16 May, 2013

New SDS (Software Defined Storage) solutions introduced

Just a quick post on new products and solutions that have been introduced or are about to be. EMC introduced ViPR at the EMC World convention last week. Great articles on ViPR can be found on the StorageIO blog by Greg Schultz, who wrote a 3-part series on this new solution: "EMC ViPR virtual physical object and software defined storage (SDS)".
Another take on this new solution has been written by Duncan Epping on his blog: "EMC ViPR; My take".
Netapp also gave notice that they are going to introduce Cobra Commander during Q3 2013, maybe just in time to have some interesting information presented in one of the Netapp-related VMworld Europe sessions....

15 May, 2013

Have a look in the crystal ball to predict the future (capacity) of your vSphere environment...

A question regularly asked: how many resources does my vSphere environment have left at the moment? Or more specifically, how many VMs can I add to my environment while still making sure that I have enough resources available in case of host(s) failure?
It is a valid question, but the answer is neither easy nor straightforward. To be clear: if you want this kind of realtime information you should be looking at an analysis tool like VMware vCenter Operations or Dell vOPS (formerly vKernel / Quest). These tools can (when you purchase the right licenses) give you the exact number of VMs you could add to your environment while retaining the needed failover capacity. vOPS even has the ability to "play" what-if scenarios and show you the results before you actually change anything in your real environment.
I will try to write one (or more) blog posts in the near future, after I have taken these tools for another test run.

In the meantime, the question remains. Perhaps your company can't or does not want to invest a considerable amount of money in one of these tools, and believe me, they are pricey (although the useful information they provide sure makes up for their price).
How many VMs can I add to my current environment, taking into account the needed (or requested) failover capacity?
For me an alternative way of getting an idea of how many VMs I can add to a specific environment is by running a PowerCLI script that works out the average CPU and memory use of the current VMs, while you manually specify what the maximum CPU and memory resource usage per cluster may be (only for non-HA clusters).
On a side note: in theory you should be able to calculate how many VMs you could add by using the HA slot size, but in real life there are too many dependencies and variables for this to be useful, especially if you don't use reservations on your VMs. Anyway, I am not going to write too much about HA, DRS, slot sizes, reservations... because there are other (Dutch) people that can (and actually did) write books on these topics. A must-read for anyone who wants to be a better VMware admin, consultant or architect is VMware vSphere 5.1 Clustering Deepdive by Duncan Epping and Frank Denneman.
So back to the quick, simple and somewhat educated-guess way of getting an idea of how many VMs you can add to your environment. As said earlier, I think the quickest way of getting such a result is by having a PowerCLI script get the average values of all VMs currently running, with the maximum resource usage allowed set by yourself.
The script I use is fairly simple and quick to run, and will give you results that you can use as a guideline when you get this question and need to answer it.

By default the output goes directly to the CLI, but you could also output to CSV, XLSX or HTML if you want. The CLI output will look something like this.
Both clusters in the screenshot above have more than enough CPU resources, but when you look at the memory resources they both allow only around 10% growth based on the current load and failover level. These results are from HA-enabled clusters, so no manual configuration is needed. In this case the script uses "EffectiveCpu", "EffectiveMemory", "FailoverLevel" and "NumEffectiveHosts" from the cluster's HA ("DasConfig") configuration to calculate the results.
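A minimal sketch of this approach could look like the snippet below. This is not the exact script I use, just an illustration of the idea: it needs a connected vCenter session (Connect-VIServer), it reads current VM demand from the vCenter quick stats, and it assumes the cluster uses the host-failures-to-tolerate admission control policy (otherwise "FailoverLevel" is not present).

```powershell
# Sketch: estimate how many "average" VMs each HA cluster can still take
# while keeping the configured failover level free. Test before relying on it.
foreach ($cluster in Get-Cluster) {
    $vms = $cluster | Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" }
    if (-not $vms) { continue }

    # Average current demand per VM from the vCenter quick stats (MHz / MB)
    $avgCpu = ($vms | ForEach-Object { $_.ExtensionData.Summary.QuickStats.OverallCpuUsage } |
               Measure-Object -Average).Average
    $avgMem = ($vms | ForEach-Object { $_.ExtensionData.Summary.QuickStats.HostMemoryUsage } |
               Measure-Object -Average).Average

    # Cluster capacity minus the share reserved by the HA failover level
    $summary  = $cluster.ExtensionData.Summary
    $failover = $cluster.ExtensionData.Configuration.DasConfig.AdmissionControlPolicy.FailoverLevel
    $hosts    = $summary.NumEffectiveHosts
    $ratio    = ($hosts - $failover) / $hosts
    $freeCpu  = $summary.EffectiveCpu    * $ratio - ($avgCpu * $vms.Count)  # MHz
    $freeMem  = $summary.EffectiveMemory * $ratio - ($avgMem * $vms.Count)  # MB

    [PSCustomObject]@{
        Cluster     = $cluster.Name
        AvgVmCpuMhz = [math]::Round($avgCpu)
        AvgVmMemMB  = [math]::Round($avgMem)
        VMsToAddCpu = [math]::Floor($freeCpu / $avgCpu)
        VMsToAddMem = [math]::Floor($freeMem / $avgMem)
    }
}
```

The lower of the two "VMsToAdd" numbers is the rough answer; as said, treat it as a guideline, not as an admission control replacement.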

13 May, 2013

VMware snapshots good and bad....

Almost every IT admin "knows", or at least thinks he knows, what a VMware snapshot is and how to use it (in a good and sensible way).
I hope the myth that a VMware snapshot can be used as a full backup has died all over the IT world, because it is the worst myth in VMware land.
So we all know snapshots can be very helpful in day-to-day operations, like quickly reverting to a snapshot when a software patch or update turns out bad. They are also widely used in the process of backing up a VM; most 3rd-party backup solutions use a snapshot to capture a VM in a stable state during their backup operation.
But it is far less known that a snapshot can also cause real trouble when something goes wrong with one of the snapshots of a VM. You could potentially lose a lot of data when a snapshot (delta) file becomes unusable, not to mention the performance loss a VM suffers when it has one or more snapshots active.
So VMware snapshots are very useful, safe and good when used in the right way! Use them in the way they are designed / intended for. This is easily said and done within a small environment with only one or two VMware admins, but it becomes more prone to error when the environment is larger and/or there are more VMware admins (or other IT persons that have the permissions to create snapshots).
If you look at a bigger company with a larger VMware environment, most of the time there will be separate departments for IT infrastructure, hardware and software. In a lot of cases the IT persons working in the software department will have the permissions to create (revert to and delete) snapshots, while the IT persons working in the IT infrastructure department are responsible for keeping the VMware vSphere environment running and healthy.
VMware has acknowledged the risk of having snapshots without knowing it, and has included a vCenter alarm for this since vSphere version 4.x, but this alarm uses snapshot size (in GB) as its trigger. Note that the growth of a snapshot is linked to the changes made on the VM's disks, so it is not directly linked to the age of the snapshot.
Also note that this particular alarm is known to be a bit unpredictable since version 5.1; please read VMware KB 1018092 for details.
To keep track of snapshots in an environment where more persons are allowed to create them than there are to manage the environment, you might want some additional tools.
You could use RVTools (by Rob de Veij) and manually run it on a weekly or bi-weekly schedule. RVTools is a great tool to check your environment and have the results presented in an "Excel" style; I use it a lot when I need to assess a customer's VMware environment prior to the start of a project. You could also use PowerGUI with the community powerpack, which includes a script to report snapshots older than X days.
But this way it remains a manually repeating task for the VMware admins, and with the results you have two options: you either contact the creator of the snapshot (if you have this information) and ask them to remove it when it is no longer needed, or have them remove it directly because it is very old / very big. Or you contact the creator and ask them what you should do with those snapshots. Either way, you will end up doing a lot of work managing the snapshots created by others.
By the way, the only way to know who created a snapshot (if the creator did not mention it in the snapshot description) is by getting the info from the vCenter event log.
As an extra challenge, since the introduction of vSphere 5.1 it is possible to Storage vMotion a VM with an active snapshot, which is a good thing of course. But the downside is that when you Storage vMotion VMs (for, let's say, maintenance on a storage device) the process will end up consolidating all snapshots of the VMs on the target datastore; in other words, the snapshots will be deleted!
So why not automate this process and have the creator receive an email message with all needed info about the snapshot that is overdue (older than X days), while you also receive an overview of these snapshots and their creators yourself as a reminder? If the creator ignores the email and decides to keep the snapshot, he (or she) will receive the same message again on the next run. You could even create two separate triggers, the first as a reminder of the snapshot and a second one more as a warning.
Below you will find a PowerCLI script which retrieves the creators of snapshots older than X days and looks up their email addresses in Active Directory. It will then send an email message to each creator specifying the details of the snapshot and the VM it belongs to. It will also send an overview email to one account (VMware administration usually) so you can keep track of the active snapshots in your environment.
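The core of such a script could be sketched as below. This is a simplified illustration, not the full script: the SMTP relay, the recipient address and the use of the Microsoft ActiveDirectory module for the email lookup are assumptions to adapt (the Quest AD cmdlets would work just as well), and it assumes a connected vCenter session.

```powershell
# Sketch: mail the creators of snapshots older than $maxAge days.
# Requires Connect-VIServer and, for Get-ADUser, the ActiveDirectory module.
$maxAge    = 7
$smtp      = "smtp.example.local"            # hypothetical mail relay
$adminRcpt = "vmware-admins@example.local"   # hypothetical overview mailbox
$report    = @()

foreach ($snap in Get-VM | Get-Snapshot |
         Where-Object { $_.Created -lt (Get-Date).AddDays(-$maxAge) }) {

    # The creator is only recorded in the vCenter event log: look for the
    # snapshot-creation task event around the snapshot's creation time
    $event = Get-VIEvent -Entity $snap.VM -Start $snap.Created.AddMinutes(-5) `
                         -Finish $snap.Created.AddMinutes(5) |
             Where-Object { $_ -is [VMware.Vim.TaskEvent] } |
             Select-Object -First 1
    $user = if ($event) { $event.UserName.Split('\')[-1] }

    # Resolve the email address in Active Directory
    $mail = if ($user) { (Get-ADUser $user -Properties mail).mail }

    if ($mail) {
        Send-MailMessage -SmtpServer $smtp -From $adminRcpt -To $mail `
            -Subject "Snapshot '$($snap.Name)' on VM $($snap.VM) is overdue" `
            -Body "This snapshot was created $($snap.Created) and is $([math]::Round($snap.SizeGB,1)) GB. Please remove it if no longer needed."
    }
    $report += "$($snap.VM);$($snap.Name);$($snap.Created);$user"
}

# Overview mail for the VMware admins
Send-MailMessage -SmtpServer $smtp -From $adminRcpt -To $adminRcpt `
    -Subject "Active snapshots older than $maxAge days" -Body ($report -join "`n")
```

Scheduled as a weekly task, this turns the manual follow-up into a self-running reminder.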

Hopefully reminding people of their active snapshots will make the creators more aware, and they will do a better job of cleaning up snapshots that are no longer needed. After all, the weakest link in IT is the object between the chair and the desk....

Are there still use cases for custom attributes within a vSphere 5.1 environment?

VMware introduced "tags" with the release of vCenter 5.1. During a "what's new" presentation at VMworld, one of the speakers mentioned that these tags could be seen as the 2.0 replacement for custom attributes. Tags do offer a lot more than custom attributes; for instance, they are indexed by the search feature of the vSphere Web Client and they can be used on (almost) all objects in a vSphere environment. Custom attributes cannot be searched within the vSphere (Web) Client and can only be used on Virtual Machine objects.
But is the statement also true when looking at VMware vSphere 5.1 environments currently running in production at VMware customers? Probably not at the moment. I believe custom attributes will disappear eventually, but right now there are too many (legacy) 3rd-party applications and (PowerCLI) scripts that still use custom attributes.
A good example of the use of custom attributes by 3rd-party applications are backup applications: Netbackup and also Veeam can write the last backup result of a VM to a custom attribute value.
If wanted / needed you could export these values to create a report by the use of PowerCLI.

During a recent project, a customer was changing their backup application and procedures. They formerly used Netbackup in a disk-to-disk-to-tape way; the new solution for this customer became Netapp Snap Manager, which provides very fast, full backups at the storage level. The only downside for the VM admins is that they were no longer able to "see" backup details, such as the last backup time and result. These used to be written by Netbackup to a custom attribute value per VM, and also exported to a CSV file to update a status webpage.
The Netapp solution does not provide such a thing out-of-the-box, so it was PowerCLI to the "rescue". I started to work on a script that would add a custom attribute to every VM containing the timestamp of the last successful backup. Since Snap Manager uses a VMware snapshot in its backup procedure, I only needed to find the removal of the snapshot created by the Netapp service account in the event log per VM to retrieve this data. After some testing and searching the internet for some extra info, I stumbled upon the "Who created that VM?" script by Alan Renouf.
This script retrieves the user account which created a VM and the timestamp of the creation action. The username is then used to retrieve the full name from Active Directory, and the full name and timestamp are written to two custom attributes. I saw the potential of knowing who created a VM and when, especially if you want to report on the growth of your environment (I will write about this in a later post). So I took the main part of Alan's script, modified it to suit the customer's needs and also added my last-backup part to the script.

The VM admins can now easily see from within the vSphere client when a VM was created, by whom, and when the last successful backup was made, and the script also creates the CSV file needed to update their internal status webpage.
So in this particular case there is still a very good use case for custom attributes. In a later post I will go into more detail on how to use the value of the "CreatedOn" custom attribute to create a report which shows by how many VMs your environment is growing per month.
For those who want to use my script or parts of it, please do, but there are some points of attention:
1. To get the full name from the AD, the script uses the Quest AD cmdlets to resolve the username to a full name.
2. If a VM is removed from the vCenter inventory and later added again, the creator name will be the person who re-added the VM, not the original creator.
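The custom-attribute part of such a script could be sketched as follows. This is a hedged illustration rather than the customer script: the attribute names, the service-account name and the event matching are assumptions to adapt, and it assumes a connected vCenter session.

```powershell
# Sketch: stamp each VM with creator, creation date and last successful backup.
$backupAccount = "DOMAIN\svc-snapmanager"   # hypothetical Netapp service account

# Make sure the custom attributes exist (VirtualMachine scope)
foreach ($name in "CreatedBy", "CreatedOn", "LastBackup") {
    if (-not (Get-CustomAttribute -Name $name -ErrorAction SilentlyContinue)) {
        New-CustomAttribute -Name $name -TargetType VirtualMachine | Out-Null
    }
}

foreach ($vm in Get-VM) {
    $events = Get-VIEvent -Entity $vm -MaxSamples 5000

    # Creation event (the "Who created that VM?" approach)
    $created = $events | Where-Object {
        $_ -is [VMware.Vim.VmCreatedEvent] -or
        $_ -is [VMware.Vim.VmDeployedEvent] -or
        $_ -is [VMware.Vim.VmClonedEvent]
    } | Select-Object -First 1
    if ($created) {
        $vm | Set-Annotation -CustomAttribute "CreatedBy" -Value $created.UserName   | Out-Null
        $vm | Set-Annotation -CustomAttribute "CreatedOn" -Value $created.CreatedTime | Out-Null
    }

    # Last snapshot removal by the backup service account = last successful backup
    $backup = $events | Where-Object {
        $_.FullFormattedMessage -match "Remove snapshot" -and $_.UserName -eq $backupAccount
    } | Select-Object -First 1
    if ($backup) {
        $vm | Set-Annotation -CustomAttribute "LastBackup" -Value $backup.CreatedTime | Out-Null
    }
}
```

Note point 2 above: the creation event is gone once a VM is removed and re-added, so the sketch inherits that limitation.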

In general, when you want to add / attach informational data to a VM by the use of (PowerCLI) scripting, you will still want to use custom attributes instead of the newer tags. But keep in mind that custom attributes can only be made visible in the vSphere Web Client if you run them through the "migrate to tag" wizard.

06 May, 2013

VMware Tools and their Upgrade Policy

With the introduction of vSphere 5.1, VMware has made some changes to how you go about upgrading VMtools in your VMs. Prior to version 5.1 a reboot of the guest OS was mandatory to complete a VMtools upgrade, but as of version 5.1 this is no longer required.
Keep in mind that you need to run the latest VM hardware version and also the VMtools version that goes with vSphere 5.1 to be able to do an upgrade without a reboot.
All of this is great, but some questions remain: how do you easily get all your VMs up to the correct VMtools version, how can you automate upgrading once VMware releases a new VMtools version,
and how do you make sure you run the latest version of VMtools on imported VMs or appliances?

vSphere has VMtools options / features within the VM properties, and one of them is called "Check and Upgrade Tools during power cycling". This option has been in vSphere since at least version 4.1, but to my knowledge it is not used a lot.
When you check this option box, the VM will check on every reboot whether a newer version of VMtools is available and, if so, upgrade it. Looking at VMs with a Microsoft OS, those need to reboot on a regular basis anyway to complete their patches and updates. So why not enable this option on your VMs?
I find keeping your VMs up-to-date with (security) patches important, and VMtools is something a lot of VM admins tend to forget in this process. But if you run a large vSphere environment you don't want to open the properties of all VMs one by one to enable this feature!
That's where PowerCLI comes in handy, because you can script the enabling (or disabling) of this option.

In the first example script you can set the option on a per-cluster basis.
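A minimal sketch of such a per-cluster script (the cluster name is an example; it assumes a connected vCenter session and uses the vSphere API's ToolsConfigInfo via Get-View-style ExtensionData):

```powershell
# Sketch: enable "Check and Upgrade Tools during power cycling" for every VM
# in one cluster by setting the Tools upgrade policy to "upgradeAtPowerCycle".
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
$spec.Tools.ToolsUpgradePolicy = "upgradeAtPowerCycle"   # "manual" would disable it again

foreach ($vm in Get-Cluster "Cluster01" | Get-VM) {
    $vm.ExtensionData.ReconfigVM($spec)   # reconfigure each VM in place
}
```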

You could change this script so it suits your needs; for instance, you could use it to only set this option on a selected number of VMs gathered in a CSV file, like in the example script below.
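The CSV variant could be sketched like this (the file path is an example and the file is assumed to have a "Name" column listing the VMs; again a connected vCenter session is required):

```powershell
# Sketch: set the Tools upgrade policy only for the VMs listed in a CSV file.
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
$spec.Tools.ToolsUpgradePolicy = "upgradeAtPowerCycle"

Import-Csv "C:\scripts\vms.csv" | ForEach-Object {
    (Get-VM -Name $_.Name).ExtensionData.ReconfigVM($spec)
}
```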

After all, changing such a small script is done a lot quicker than having to go through all of your VMs one by one to set the option; just imagine having 1000+ VMs in your vSphere environment!
This option is still valuable in vSphere 5.1: although you no longer need a reboot when upgrading VMtools, it is still a task that is easily forgotten. Another way of making sure you don't forget is by the use of VUM (VMware Update Manager), but this takes some time to set up a baseline and baseline groups. That is, if you already have VUM installed in vCenter.

26 April, 2013

vSphere and ESXi 5.1 U1 released

VMware released the much "needed" Update 1 yesterday. If you want all the ins and outs of the enhancements and bug fixes incorporated in this update, please read the ESXi release notes and the vCenter release notes.

I took a look at the release notes and I found some bug fixes that I want to highlight:


Reinstallation of ESXi 5.1 does not remove the Datastore label of the local VMFS of an earlier installation.
  • Re-installation of ESXi 5.1 with an existing local VMFS volume retains the Datastore label even after the user chooses the overwrite datastore option to overwrite the VMFS volume.
When the quiesced snapshot operation fails the redo logs are not consolidated
  • When you attempt to take a quiesced snapshot of a virtual machine, if the snapshot operation fails towards the end of its completion, the redo logs created as part of the snapshot are not consolidated. The redo logs might consume a lot of datastore space.


ESXi 5.x host appears disconnected in vCenter Server and logs the ramdisk (root) is full message in the vpxa.log file
  • If Simple Network Management Protocol (SNMP) is unable to handle the number of SNMP trap files (.trp) in the /var/spool/snmp folder of ESXi, the host might appear as disconnected in vCenter Server. You might not be able to perform any task on the host. 
    The vpxa.log contains several entries similar to the following: 
    WARNING: VisorFSObj: 1954: Cannot create file
    /var/run/vmware/f4a0dbedb2e0fd30b80f90123fbe40f8.lck for process vpxa because the inode table of its ramdisk (root) is full.
    WARNING: VisorFSObj: 1954: Cannot create file
    /var/run/vmware/watchdog-vpxa.PID for process sh because the inode table of its ramdisk (root) is full

vSphere 5 Storage vMotion is unable to rename virtual machine files on completing migration
  • In vCenter Server, when you rename a virtual machine in the vSphere Client, the VMDK disks are not renamed following a successful Storage vMotion task. When you perform a Storage vMotion task for the virtual machine to have its folder and associated files renamed to match the new name, the virtual machine folder name changes, but the virtual machine file names do not change.

    This issue is resolved in this release. To enable this renaming feature, you need to configure the advanced settings in vCenter Server and set the value of the provisioning.relocate.enableRename parameter to true.


There are a lot more resolved issues and enhancements, so take some time to go through both release notes. It will surely benefit you!




vCloud loses sync with the vCenter Inventory Service

Yesterday the vCloud admin of a customer I am working for (on another project) ran into a strange problem. He told me that it had become impossible to deploy new vApps from the catalog; the process would stop with all kinds of errors.
A few days before, when we were testing the deployment of vApps on NFS datastores that were on new storage devices, he also ran into a strange problem which looked quite similar. When deploying, vCloud would first generate another set of "shadow VMs" before actually deploying the vApp. This was strange because the vApp already had a set of running "shadow VMs" and it should have been using those.
Because yesterday's issue had stopped production on the vCloud environment, the vCloud admin opened a support request with VMware GSS.
Once they had a look at the issue, it quickly became clear what was causing these strange problems. It looked like the vCloud Director cell had lost sync with the vCenter Inventory Service; this is not uncommon, and you can find several "solutions" to this problem when searching through various blogs.
In short, these are the steps you need to take to restart the syncing process (if you are running a multi-cell environment):


1. First disable the cell and pass the active jobs to the other cells.

2. Display the current state of the cell to view any active jobs.
#  /opt/vmware/vcloud-director/bin/cell-management-tool -u <username> cell --status

3. Then Quiesce the active jobs.
#  /opt/vmware/vcloud-director/bin/cell-management-tool -u <username> cell --quiesce true

4. Confirm the cell isn’t processing any active jobs.
#  /opt/vmware/vcloud-director/bin/cell-management-tool -u <username> cell --status

5. Now shut the cell down to prevent any other jobs from becoming active on the cell.
#  /opt/vmware/vcloud-director/bin/cell-management-tool -u <username> cell --shutdown

6. Then restart the services.
# service vmware-vcd restart

If you are not running a multi-cell environment, you can just restart the services, but this will involve a loss of service for a couple of minutes. If you want to keep the loss of service to a minimum, you can "monitor" or tail the cell.log file (when it reads 100%, it's done):
# tail -f /opt/vmware/vcloud-director/logs/cell.log

In case the above does not work, you can also reboot the complete cell (in a multi-cell environment, first pass all active tasks to the other cells). Upon reboot the vCloud cell will reconnect and sync again.

OK, back to the issue: in this case none of that worked, vCloud did not start the sync again either way.
The support engineer wanted to restart all vCenter services to make sure that they were all running OK. Unfortunately this did not help. However, in this specific environment the Inventory Service runs on a separate server (VM), and after restarting the Inventory Service followed by another restart of the cell services, vCloud did sync upon starting.
Afterwards, the vCloud admin told me that he had found other minor issues that could probably be interpreted as signs that vCloud and the Inventory Service were getting out of sync again.
He found that some VMs were present in vCloud but not in vCenter; hence you could not find them when using the search feature (which is driven by the Inventory Service) in the vSphere client. And he found that the "VMs and Clusters" view of the vSphere client had become very, very slow, at times even unresponsive, while all other views in the client were working as usual.
As this issue can occur again, we decided to keep an eye out for either of these "signs", and when we detect them, restart the Inventory Service ASAP.

Better to be safe than sorry.