13 May, 2013

VMware snapshots, good and bad....

Almost every IT admin "knows", or at least thinks he knows, what a VMware snapshot is and how to use it (in a good and sensible way).
I hope the myth that a VMware snapshot can be used as a full backup has died out all over the IT world, because it is the worst myth in VMware land.
So we all know snapshots can be very helpful in day-to-day operations, like quickly reverting to a snapshot when a software patch or update turns out bad. They are also widely used when backing up a VM: most third-party backup solutions use a snapshot to capture a VM in a stable state during their backup operation.
But it is far less well known that snapshots can cause serious trouble when something goes wrong with one of the snapshots of a VM. You could potentially lose a lot of data when a snapshot (delta) file becomes unusable, not to mention the performance penalty a VM pays while it has one or more active snapshots.
So VMware snapshots are very useful, safe and good when used in the right way! Use them the way they were designed and intended to be used. This is easily said and done in a small environment with only one or two VMware admins, but it becomes more error-prone when the environment is larger and/or there are more VMware admins (or other IT people who have permission to create snapshots).
If you look at a bigger company with a larger VMware environment, most of the time there will be separate departments for IT infrastructure, hardware and software. In a lot of cases the IT people working in the software department have permission to create, revert to and delete snapshots, while the IT people working in the infrastructure department are responsible for keeping the VMware vSphere environment running and healthy.
VMware has acknowledged the risk of having snapshots you do not know about and has included a vCenter alarm for it since vSphere 4.x, but this alarm uses snapshot size (in GB) as its trigger. Note that a snapshot grows with the changes made to the VM's disks, so its size is not directly linked to the age of the snapshot.
Also note that this particular alarm has been known to be a bit unpredictable since version 5.1; please read VMware KB 1018092 for details.
To keep track of snapshots in an environment where more people are allowed to create them than there are to manage the environment, you might want some additional tools.
You could use RVTools (by Rob de Veij) and run it manually on a weekly or bi-weekly schedule. RVTools is a great tool to check your environment and have the results presented in an "Excel" style. I use it a lot when I need to assess a customer's VMware environment prior to the start of a project. You could also use PowerGUI with the community PowerPack, which includes a script to report snapshots older than X days.
But this way it remains a manual, recurring task for the VMware admins, and with the results you have two options. Either you contact the creator of the snapshot (if you have this information) and ask them to remove it, when it is no longer needed or straight away because it is very old or very big, or you contact the creator and ask what you should do with it. Either way you will get a lot of work managing snapshots created by others.
By the way, the only way to know who created a snapshot (if the creator did not mention it in the snapshot description) is to get the information from the vCenter event log.
As an extra challenge, with the introduction of vSphere 5.1 it is possible to Storage vMotion a VM with an active snapshot, which is a good thing of course. But the downside is that when you Storage vMotion VMs (for, let's say, maintenance on a storage device) the process will end up consolidating all snapshots of the VMs on the target datastore; in other words, the snapshots will be deleted!
So why not automate this process? Have the creator receive an email message with all the needed information about a snapshot that is overdue (older than X days), and receive an overview of these snapshots and their creators yourself as a reminder. If the creator ignores the email and decides to keep the snapshot, he (or she) will receive the same message again on the next run. You could even create two separate triggers, the first as a reminder about the snapshot and the second more as a warning.
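To make the two-trigger idea concrete, here is a minimal sketch of the age check on its own. It is written in plain Python purely as an illustration and does not talk to vCenter. The 7 and 14 day thresholds and the name `classify` are assumptions for this example; the post only speaks of "X days".

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for this example.
REMINDER_AFTER = timedelta(days=7)   # first trigger: friendly reminder
WARNING_AFTER = timedelta(days=14)   # second trigger: firmer warning


def classify(created, now):
    """Return 'ok', 'reminder' or 'warning' depending on snapshot age."""
    age = now - created
    if age >= WARNING_AFTER:
        return "warning"
    if age >= REMINDER_AFTER:
        return "reminder"
    return "ok"


if __name__ == "__main__":
    now = datetime(2013, 5, 13)
    print(classify(datetime(2013, 5, 10), now))  # 3 days old  -> ok
    print(classify(datetime(2013, 5, 5), now))   # 8 days old  -> reminder
    print(classify(datetime(2013, 4, 23), now))  # 20 days old -> warning
```

Because the check looks only at the creation date, it also catches small but old snapshots that a size-based alarm would never report.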
Below you will find a PowerCLI script which retrieves the creators of the snapshots older than X days and looks up their email addresses in Active Directory. It then sends an email message to each creator, specifying the details of the snapshot and the VM it belongs to. It also sends an overview email to one account (usually VMware administration) so you can keep track of the active snapshots in your environment.
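The flow described above (filter by age, group by creator, one message per creator plus one overview) can be sketched without any vCenter or mail dependencies. The following is not the PowerCLI script, only a plain Python illustration of that logic under stated assumptions: the `Snapshot` fields, the function names and the `directory` dictionary (standing in for the Active Directory lookup) are all hypothetical, and actually sending the mail is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    vm: str
    name: str
    creator: str       # account name, e.g. taken from the vCenter event log
    created: datetime
    size_gb: float


def overdue(snapshots, max_age_days, now):
    """Keep only the snapshots created more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in snapshots if s.created < cutoff]


def describe(snap, now):
    """One line of snapshot details for use in a message body."""
    age = (now - snap.created).days
    return f"{snap.vm}: '{snap.name}', {age} days old, {snap.size_gb:.1f} GB"


def build_messages(snapshots, directory, now):
    """Return {email address: body}, one message per known creator."""
    per_creator = defaultdict(list)
    for snap in snapshots:
        per_creator[snap.creator].append(snap)
    messages = {}
    for creator, snaps in per_creator.items():
        address = directory.get(creator)
        if address is None:
            continue  # no address found; the snapshot still shows up in the overview
        messages[address] = "\n".join(describe(s, now) for s in snaps)
    return messages


def build_overview(snapshots, now):
    """Body of the single overview message for the VMware admins."""
    return "\n".join(f"{s.creator} - {describe(s, now)}" for s in snapshots)


if __name__ == "__main__":
    now = datetime(2013, 5, 13)
    snaps = [
        Snapshot("web01", "before patch", "jdoe", datetime(2013, 4, 1), 12.5),
        Snapshot("db01", "pre upgrade", "asmith", datetime(2013, 5, 12), 1.0),
        Snapshot("app02", "test", "ghost", datetime(2013, 3, 1), 40.0),
    ]
    directory = {"jdoe": "jdoe@example.com", "asmith": "asmith@example.com"}
    old = overdue(snaps, max_age_days=14, now=now)
    for address, body in build_messages(old, directory, now).items():
        print(address, "->", body)
    print(build_overview(old, now))
```

Creators that cannot be resolved to an address are skipped for the personal message but kept in the overview, so the admins can follow up on them by hand.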

Hopefully being reminded of active snapshots will make creators more aware, so that they do a better job of cleaning up snapshots that are no longer needed. After all, the weakest link in IT is the object between the chair and the desk....
