19 July, 2013

Virtual Machine Disk Consolidation is needed !?

vSphere 5 introduced a new feature to clean up VM snapshot "left-overs" which could be the result of a snapshot removal action where the consolidation step has gone bad. This would result in the snapshot manager interface telling you there are no more snapshots present, but at datastore level they still exist and could even be still in use / still be growing. This could case all kinds of problems, first of all VM performance issues as the VM is still running in snapshot mode, secondly you are not able to alter the virtual disks of this VM and in the long run your could potentiality run out of space on your datastore because the snapshot keeps on growing.
Prior to vSphere 5 there was a possibility to fix such a situation thru the CLI, now with vSphere 5 you get a extra option in the VM - Snapshot menu called "Consolidate" this feature should clean up any discrepancies between the Snapshot Manager interface and the actual situation at datastore level.
I'm always a little reluctant when I'm at a customer and they use a lot of snapshotting, it is a very helpful and useful tool but you have to use it with caution otherwise it could cause big problems. Usually the problems start if snapshots are kept for a longer period of time or if the snapshot a layered onto each other, but even if you are aware of the problems it can cause when it's used wrongly it can still happen that you run into issues when using snapshots.
That being said and when we look at features offered by the various Vendors of storage devices, I'm pointing to the VM backup solutions that they offer. When a VM is running when being backed-up they all use the vSphere snapshot (with or without the Quiescing of the guest file system option). Basically if your company uses a SAN that leverages this functionality and it's configured to backup your VM's on a daily basis you have a environment that uses snapshotting a lot (on a daily basis) and therefore you could possibly run into more snapshot / consolidation issues then when you would not have a SAN with this functionality (nobody snapshots all it's VM's manually on a daily basis, I hope).
When I recently was at a large customer (+2000 VM's) that uses their storage device feature to backup complete datastores daily and also uses vSphere snapshots to get a consistent backup of running VM's.
For some reason they run into snapshot / consolidation issues pretty often and they explained to me that the Consolidate feature did work ok on VM with a Linux guest OS , but they almost always had a problem when trying to consolidate on a VM with a Windows guest OS it would simply fail with a error.
So I had a look at one of their VM's that could not consolidate although the vCenter client was telling it did need it.

 When looking at the VM properties I cloud see that it was still had a "snapshot" (delta file) as a virtual disk, as it had a -000001.vmdk as virtual disk instead of a "normal" .vmdk

When a VM is in this state and it is still operational there is a issue, but the uptime is not directly affected, but most of the time the VM will be down and it will not power on again because of the issue. It will simply report a error of missing a snapshot on which the disk is depending. 
The way I solved it at this customer (multiple times) is by editing the VM's configuration file .vmx and re-registering the VM to vCenter and after manually cleanup the remaining snapshot files. Please note that if the VM was running in snapshot mode all changes written in the snapshot will be lost using this procedure, in other words the VM will return to it's "pre snapshot" situation. For this particular customer this was not a issue, because the failed snapshots where initiated for backup purposes so no changes where made to the VM when it ran in snapshot mode.

So if you run into this issue and you know that their where no changes made to the VM or the losing the changes is a acceptable loss you could solve it by these steps.
  1. Download the .vmx and open it with a text editor (I prefer Notepad++ for this kind of work) find the line that has the virtual disk files configured  scsi0:0.fileName = "virtual-machine-000001.vmdk" and remove "-000001" so you are left with scsi0:0.fileName = "virtual-machine.vmdk" save the file.
  2. Rename the .vmx file on the datastore to .old and upload the edited .vmx file.
  3. Either reload the VM by using PowerCLI** or remove the VM from the Inventory and re-add it again.
  4. Power on the VM
  5. If you get the "Virtual Disk Consolidation needed" message, go to the Snapshot menu and click "Consolidate" it should run correctly now and remove the message.
  6. Manually remove the unused files from the datastore .old, -000001.vmdk and -000001-flat.vmdk (I use Winscp to do this kind of work on a datastore)
** you could do this by using the following one-liner:

Get-View -ViewType VirtualMachine -Filter @{"Name" = "VM name"} |%{$_.reload()}