20 December, 2013

ESXi 5.1 host cannot synchronize due to an incorrect user name or password

Recently at a customer I found an issue with 8 hosts of a 10-host HA cluster: the 8 hosts reported a warning as seen below.
Usually this error is seen when something has changed on the vpxuser user account, as also described in VMware's KB2017460.
The solution to the problem is pretty straightforward: you disconnect the host from vCenter and use the DCUI or an SSH session to remove the vpxuser user account. When you re-connect the host to vCenter, you ignore the warning about the bad user name and password and enter the root credentials of the host when prompted. The host will now be re-connected and vCenter will create a new vpxuser user account. After this process is finished the host should be as it was before the issue.
In this particular case I found something strange: of the 8 hosts with this issue, 7 did not have a vpxuser account at all. When I tried to delete the user account it just reported back "unknown". Only the 8th host (I know, it's always the last one) did have a vpxuser user account.
The important part to know is that when there is no vpxuser user account, just a disconnect followed by a re-connect will solve the issue.
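
To find out up front which of the two situations applies, a quick check from an SSH session can help. The sketch below is only an illustration: user_exists is a helper name I made up, and it assumes the local accounts of the host are listed in a passwd-style file (/etc/passwd by default).

```shell
#!/bin/sh
# Sketch: report whether a local account exists in a passwd-style file.
# user_exists is a hypothetical helper, not a VMware tool.
# Assumption: local accounts are listed one per line as "name:...".
user_exists() (
    user="$1"
    passwd_file="${2:-/etc/passwd}"
    # Match the account name only at the start of a line, up to the first colon.
    grep -q "^${user}:" "$passwd_file"
)

# Decide which recovery path from the post applies on this host.
if user_exists vpxuser; then
    echo "vpxuser present: remove it, then re-connect with root credentials"
else
    echo "vpxuser absent: a disconnect followed by a re-connect should do"
fi
```

Anchoring the match on the start of the line avoids false hits on accounts whose name merely contains "vpxuser".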

29 November, 2013

Resetting (local) Windows 2008 and 2012 Administrator password

Although not an actual virtualization topic, I think it's something a lot of admins have to deal with every now and then. For me it's occasionally when working with templates of Windows Server based VMs: when they were created a long time ago, admins tend to forget which local administrator password was used, because it is usually changed to a different password after a VM has been deployed.
When you then need to work on such a template for some reason, you will get stuck when you do not remember the password. In the past I used the "Offline NT Password & Registry Editor" to reset the administrator password, but recently an admin at a customer informed me there was another way to reset the password, and this way would also work on Windows Server 2012.

You need to have the Windows Server install DVD or, if you're into virtualization like me, an ISO of that DVD. For both Windows Server versions you need to boot the VM from this DVD (if it's a template, first convert it to a VM) and go to the "Repair your computer" option on the Setup menu.

After selecting this you need to select the OS instance you want to repair in the "System Recovery Options". Next select "Command Prompt". Once you are in the command prompt you need to go to the disk that holds the system32 folder; by default this is C:
Then run the following commands:

  • cd windows\system32
  • ren Utilman.exe Utilman.exe.old
  • copy cmd.exe Utilman.exe
Afterwards reboot the server (not from the DVD!). At the logon screen press Windows key + U; this will open a command prompt.
Now run the command "net user administrator Password", where Password stands for the password of your choice. After closing the command prompt you will be able to log in to the server as local administrator with the password you just entered.

You're able to log on again, but you are not completely done yet: there is some clean-up to be done.

Again boot your VM from the Windows Server DVD and follow the previously mentioned steps to get to the Command Prompt and go to the disk that holds the system32 folder.

  • cd windows\system32
  • ren utilman.exe utilman.exe.new
  • copy utilman.exe.old utilman.exe
When finished, close the Command Prompt and click "Continue". Now your VM should be running as it should and you should be able to log on as local administrator.

If it was a template, do not forget to convert your VM back to a template. And store the local administrator password in some sort of password vault!

21 November, 2013

How to manage Hardware Version 10 VM's on a standalone (free) ESXi ?

As you might already know, to manage or configure VMs at hardware version 10, which comes with ESXi 5.5 and vSphere 5.5, you need the vSphere Web Client (and vCenter). So what about a standalone or free ESXi 5.5 host? The easiest way is NOT to upgrade the hardware version to version 10, but to simply keep it at any version below; this way you can fully manage and configure the VM with the C# client (the full vSphere Client).
I got caught by my enthusiasm to try the new features of vSphere 5.5 and I went and upgraded a VM to version 10 in a lab environment not connected to a vCenter. The quickest way I found to get out of this "unmanaged" situation was to create a new VM with hardware version 9 without disks; after creation I moved the disks of the unmanageable version 10 VM to this new VM and powered it on.
This will give you back control, but you still won't have any of the new features of 5.5.
Another way to manage these version 10 VMs is to install VMware Workstation 10 on your computer and connect to either your vCenter or your standalone host; this way you will be able to manage your VMs, including the hardware version 10 VMs.

More information and screenshots will follow soon in upcoming blog posts.

29 October, 2013

20th Belgium VMUG meeting (agenda updated !)

VMUG meetings are always interesting and I think that the Belgium VMUG has very good sessions scheduled for its 20th VMUG meeting.
Please check out the details below ! For those who are attending this VMUG, see you December 5th !

If you are a VMUG member you can register here for this free meeting. You can also find the location details on that page.

Belgium VMUG 20th meeting agenda:

Time   Speaker: Session (a second line at the same time is a parallel session)
9:00   Erik Schils: Welcome
9:15   Chad Sakac: Software Defined Storage – State of the Art, and the Art of the Possible (EMC Gold Sponsor)
10:00  Lee Dilworth: Stretched Storage Clusters, Best Practices
       Seb Hakiel: User Case: A 4K VDI
11:00  Dheeraj Pandey: "Bringing Google-like Infrastructure to the Enterprise" (Nutanix Gold Sponsor)
11:45  Chris Van Den Abbeele: Turning the Tables on Cyber Attacks (Trend Micro Silver Sponsor)
       Gilles Chekroun: Application Centric Infrastructure: Redefine the Power of IT (Cisco Silver Sponsor)
13:45  Joe Baguley: "Thoughts from a CTO - Where's it all going?" (VMware Gold Sponsor)
14:30  Luc Dekens: Manage and Automate VMware View with PowerCLI
       Viktor van den Berg and Arnim van Lieshout: vCac
15:30  Vaughn Stewart: Accelerate vSphere with Flash Storage Technologies
       Bouke Groenescheij: Practical Vscsistats Result Analysis
16:15  Hans & Guests: Panel Discussion: "How Does the Future of Enterprise IT Impact Us?" (Pure Storage Silver Sponsor)
       Kris Vandermeulen: Everything you always wanted to know about Microsoft licensing on VMware but were afraid to ask
16:45  Cormac Hogan: vSAN
       Peter vandermeulen: User Case: Datacenters, Branch Office en Process Controle MIZ with VSA and How We Want to Integrate

24 October, 2013

VCAP5 exam experiences

I have been looking into getting VCAP5-DCA and VCAP5-DCD certified for some time now, and when I was making plans to attend VMworld 2013 in Barcelona those two came together. I was lucky enough to get some study time alongside my work for a couple of weeks just before VMworld.
For the DCA exam I thought I did not have to study that much, so I just did some brushing up on my vMA "skills" (I don't use it regularly) and on configuring SATP rules from the CLI.
On the other hand, I did study a fair amount for the DCD exam as I knew it would be a real challenge.
I planned my DCA exam on Monday afternoon (Partner / TAM day), because in the morning I was still travelling to Barcelona. I planned my DCD exam on Tuesday morning, to be sure I could spend the rest of my time at VMworld attending sessions (and going to the various parties).
When I sat my DCA exam I found it a very good exam; I would almost say I enjoyed taking it. I think most enterprise admins will feel the same, as it is within your comfort zone. Nevertheless it is a challenging exam where you are really tested on your hands-on knowledge. I did manage to complete all questions / tasks within the given time. I expected that I would not receive the results immediately after completing the last question, as it could take up to 15 days before you got the results back by email. But VMware has recently changed this and they will get you the results within 8 hours! As I did not know this, I was surprised to find an email "Notice of VMware Technical Certification" the next morning, and the best part: I passed!
That day started very well, and with a boost in confidence I sat my DCD exam. Now this is a whole different kind of exam, and I didn't like it as much as the DCA exam, for mainly 2 reasons. The first is that the design questions have to be done in an environment somewhat similar to MS Visio, but it feels like it is in beta. It just isn't user friendly, especially if you need to make modifications along the way; I was struggling more with the user interface than with the actual content of the question itself.
The second reason is that a lot of questions have vague descriptions. I understand that when you are designing in real life there will also be points that are not really clear or are vague, but in that case you can ask for clarification. This is not an option during the exam, of course.
I have to say that the DCD exam is a difficult exam, in which time is your biggest enemy. It's just a huge amount of questions that you need to answer, and the questions do cover every objective in the blueprint!
When I got to the last question I had 5 minutes left; luckily I was able to answer it in the blink of an eye. After answering the last question I had some exciting seconds before the results were shown: I passed this exam as well.
Needless to say, this has been the best VMworld ever for me!

Both exams are difficult and need some sort of exam strategy if you ask me, but both strategies are very different. If you are pursuing either or both of them, please take a look at my VCAP5-DCA and VCAP5-DCD study pages for useful links and tips.

11 October, 2013

vCenter 5.1 takes ESXi 5.1 host automatically out of maintenance mode

A couple of days ago I wrote about changing the scratch location on ESXi hosts in relation to the use of SNMP, in "Enabling SNMP on ESXi 5.1 host results in "The ramdisk 'root' is full" events".
In addition to this, the same customer ran into another issue which also relates back to having the scratch location on non-persistent storage.
For hardware maintenance, hosts were put into maintenance mode and "handed over" to datacenter engineers, who needed to update the firmware and BIOS of these hosts. When they finished the first host, they powered it on and it booted up ESXi as normal.
When they came by to find out if all was OK with this host, the VM admin looked up the host and found it in a fully operational state, and not in maintenance mode as expected! Looking at the host's tasks and events, it appeared that "system" had taken the host out of maintenance mode after it came back online in vCenter. This could cause some serious issues if the host is a member of a cluster with HA and DRS (fully automated) enabled but lacks the network uplinks that provide the VM network(s): VMs would be vMotioned to this host and would lose their network connection!
For this to happen, the host has to be in an operational state before it re-connects to vCenter. So why did this host "forget" it was in maintenance mode during the hardware maintenance?
When I saw this happening I remembered that in the past similar events happened when patching ESXi 4.1 hosts, caused by the scratch location (/tmp/scratch) getting damaged or becoming inaccessible during the patching. When this happened the host booted normally, re-connected to vCenter and got taken out of maintenance mode by the system account.
A quick check showed that the host we were working on now had its scratch location on non-persistent storage. Next I checked how the datacenter engineers shut down or reboot an ESXi host. I learned that, as they don't have the rights to shut down or reboot a host through the vSphere Client (either connected directly to the host or connected to vCenter), they used the out-of-band management (iLO, DRAC, iRMC etc.). They always tried to do a graceful shutdown or reboot, but for this agents need to be installed on the host. As this is not the case, this does not work for them, and the other option is a hard reset or power cycle. Let's be clear, this is not a good thing: in my opinion you always need to do a clean and graceful shutdown or reboot! Especially if the host concerned has its scratch location on non-persistent storage (read: ramdisk), because for this type of storage a hard reset acts like a power failure, and nothing is written to a persistent location as it would be during a clean reboot or shutdown.

VMware KB on changing the scratch location KB1033696
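
A quick way to judge whether a host is exposed to this is to look at where its scratch location points. The sketch below is a rough heuristic only: scratch_kind is a name I made up, and it assumes that datastore-backed locations live under /vmfs/volumes/ while anything else (such as /tmp/scratch) sits on a ramdisk.

```shell
#!/bin/sh
# Sketch: classify a scratch path as persistent or not.
# scratch_kind is a hypothetical helper name.
# Assumption: datastore-backed locations live under /vmfs/volumes/,
# anything else (such as /tmp/scratch) is treated as ramdisk-backed.
scratch_kind() {
    case "$1" in
        /vmfs/volumes/?*) echo "persistent" ;;
        *)                echo "non-persistent" ;;
    esac
}

# On a host the current value could be read with something like:
#   esxcli system settings advanced list -o /ScratchConfig/CurrentScratchLocation
# Here the path from the post is passed in directly.
scratch_kind "/tmp/scratch"
```

Keeping the path as a parameter means the same check can be fed from whatever source you use to collect the setting across hosts.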


07 October, 2013

Enabling SNMP on ESXi 5.1 host results in "The ramdisk 'root' is full" events

Recently I ran into an issue when doing some work on an ESXi 5.1 cluster where I needed to put the hosts in maintenance mode one by one. When I put the first host in maintenance mode I assumed that the host would be evacuated by migrating all VMs with vMotion, as Enterprise Plus licenses were in place. But when the progress bar hit 13% the vMotion process stopped with an error. The error referred to "ramdisk (root) is full". When I checked with the customer they told me that this started happening after they configured SNMP. They found that it would for some reason fill up the disk containing /var, and they also found that sometimes those hosts became unresponsive to SSH and/or DCUI.
After looking up the error message it quickly became clear what the relation was between SNMP and "ramdisk (root) is full": the SNMP service generated a .trp file for every SNMP trap sent. I believe this is not normal behavior. When I checked the functionality of SNMP with "esxcli system snmp test" it reported the error "Agent not responding, connect uds socket(/var/run/snmp.ctl) failed 2, err= No such file or directory", which proved my assumption was right (a successful test should result in "Comments: There is 1 target configured, send warmStart requested, test completed normally."). These files are stored in /var/spool/snmp and this location was on non-persistent storage; in fact it was on a 4GB ramdisk. Please check VMware KB2042772 for details on the error related to the scratch location.
The SNMP service will write a maximum of 8191 .trp files; if the /var/spool/snmp location runs out of space before hitting this number you will have a host which is no longer able to vMotion, and it can also become disconnected / unresponsive. In some cases you are not able to start the DCUI as there are no free inodes on the host. In this case connect to the host console (iLO, DRAC, ...) and make sure you can log in, then stop the vpxa service; this will free up an inode and you will be able to start the DCUI from the host's console (Troubleshooting Options).
Now you need to remove the files that fill up the ramdisk, but first be sure that SNMP is the cause of the issue by checking the file count in /var/spool/snmp with "ls /var/spool/snmp | wc -l"; if the result is above 2000 files, SNMP is most likely the cause.
To remove the files you can go two ways: move to the /var/spool/snmp directory and remove all .trp files with "for i in $(ls | grep trp); do rm -f $i; done", or stop the SNMP service with "esxcli system snmp set --enable false", which I found also clears the directory most of the time.
When the files are removed the host will start responding normally again, and you will be able to start the vpxa service again with "/etc/init.d/vpxa start" if you had to stop it previously.
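
The count and clean-up steps can also be wrapped in two small functions. This is a minimal sketch, not the procedure from the KB: count_traps and purge_traps are names I made up, the lowercase .trp extension is taken from the grep in the one-liner above, and the directory defaults to /var/spool/snmp.

```shell
#!/bin/sh
# Sketch: count and remove SNMP trap files without parsing `ls` output.
# count_traps / purge_traps are hypothetical helper names; the .trp
# extension is an assumption based on the grep pattern used in the post.
count_traps() (
    dir="${1:-/var/spool/snmp}"
    n=0
    for f in "$dir"/*.trp; do
        # The glob stays literal when nothing matches, so test for existence.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
)

purge_traps() (
    dir="${1:-/var/spool/snmp}"
    for f in "$dir"/*.trp; do
        [ -e "$f" ] && rm -f -- "$f"
    done
    :   # always report success, also when there was nothing to remove
)

# Possible usage on a host, with the 2000-file threshold from the post:
#   if [ "$(count_traps)" -gt 2000 ]; then purge_traps; fi
```

Looping over the glob keeps everything inside the shell, so a directory with thousands of files does not run into argument-length limits, and only files ending in .trp are touched.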
To permanently fix this issue you need, as stated earlier, to change the scratch location, preferably to a local or shared datastore (VMFS or NFS). You can do this by editing the advanced settings (software) of the host, "ScratchConfig -> ScratchConfig.ConfiguredScratchLocation"; after changing it, a reboot is mandatory to apply the change.
If you have to go through a number of hosts, you might want to do this by using PowerCLI. If you're lucky and the naming convention of the (local) datastores is uniform, you will be able to automate all actions. If not (like in my particular case), you either do it host by host with the vSphere (Web) Client, or you could use a small script to look up all datastores, let you select the (local) datastore and update the advanced settings for you. Sample script below.

VMware KB used as reference : KB2001550 KB2040707 KB1010837

29 August, 2013

vSphere 5.1 All Paths Down (APD) warning messages Part-2

In my previous post I wrote about troubleshooting APD and finding the root cause. At that time the root cause turned out to be an IP address conflict which bound 2 different NFS networks to 1 VMkernel interface, causing random APD events on all hosts connected to either of the NFS networks.
Besides these random APD events the same customer also had APD events that appeared at set times, mainly around the time backups were running (NetApp backups).
So for these APD events it looked like the NAS head became overloaded when it had to run a backup task on top of the normal load. When talking to the storage admins, they wanted to see if any other background process (like deduplication) could trigger the same issue.
When looking into this we found that deduplication could indeed trigger APD events on vSphere and write "NFS slow" events in the NetApp logs. As we were not sure why this happened, a support case was opened with NetApp. Within this case all known performance issues were looked at, and from the perfstat captures they could tell there were misaligned VMs. The only way to measure the effect of misaligned I/O on your NetApp is by looking at the pw.over_limit counter. This counter is only available in the "priv set advanced" command line mode.
So we ran the Scan Manager from the NetApp plugin for vCenter to see how many VMs were misaligned and we found there were a lot, due to the vCloud environment which has a lot of misaligned base VMs with multiple linked clones (which are automatically also misaligned).
During the search one of the storage admins found that a deduplication task on NAS head "A" in Datacenter A caused load not only on NAS head "A" in Datacenter A but also on NAS head "B" in Datacenter B. This was caused by a feature called "alternated write", which combined with the storage design in use had a negative effect on the load.
All these factors turned a more than capable storage system into a stressed-out, overloaded storage system. Like they say, "The devil is in the details".
The vSphere environment suffered massively from these storage performance issues, as you can imagine. This specific customer had Enterprise+ licenses and had SIOC (Storage I/O Control) enabled on all datastores, but even with SIOC they still experienced unresponsive and crashing VMs.

12 August, 2013

vSphere 5.1 All Paths Down (APD) warning messages Part-1

As you might know, there have been some changes to APD and PDL behaviour in vSphere 5.1.
APD got a new way of handling I/O during APD scenarios, and by setting an advanced option you can now even choose between the old way and the new I/O "fast fail" way.
In short, the old way kept retrying failed I/Os for as long as the APD condition lasted. The new way retries for a period of 140 seconds and after that stops, fast-failing further non-VM I/Os to the affected device. There are situations in which the old way could cause the host(s) to disconnect from vCenter or even become unresponsive, which is something you want to prevent. The new way prevents these issues.
If you want to know more about APD and PDL behaviour you should read the different articles on these subjects on Duncan Epping's blog Yellow Bricks, or to be more specific, start with this blog post.

So why am I writing a post about this subject when there is already a lot of good information out there? A couple of days ago I was asked to troubleshoot APD warning events in a customer's vCenter log and I found that it was very difficult to pinpoint the cause of these APD messages.

There were multiple factors that made the troubleshooting difficult, one being that the customer has a stretched metro-cluster setup and another being that part of the messages would appear at recurring times and the other part at "random" times.
The origin of the APD messages at recurring times was quickly found, as these would only occur if there were "extra" background processes running on the NetApp NAS heads. By background processes I mean processes like backup or deduplication tasks. The timeslot in which these APD events occurred made it a not so urgent issue. I will write another blog post on how the APD events related to the background processes were solved, as soon as all is double-checked and confirmed by both VMware and NetApp.
On the other hand, the "random" APD events were a lot more difficult to pinpoint, and the issue was a lot more high-profile as customers were complaining about slowness and unresponsiveness of their VMs and vApps during the APD events. The customer used HP blades and Flex 10 modules for connecting the enclosures to the core network and NFS network. After troubleshooting and ruling out all possible enclosure and network related causes, only the NetApp NAS heads or the ESXi hosts could be the root cause of the APD events. These APD events occurred at random times, and most of these times the NetApp NAS heads didn't have any background processes running, nor did we find any information in the NetApp system logs pointing to the cause. The last place to look was the ESXi hosts. First I checked all physical NICs (which are actually virtual NICs, as they are presented to the blade by the Flex 10 module): no issues there. Next I checked the network config of all hosts. Luckily, 1 host within an HA cluster assigned to a vCloud environment wrote warning messages about a duplicate IP address being used on 1 of its VMkernel interfaces.
When I checked the network config of this host I saw nothing strange, so I started checking all other hosts within the same cluster, finding nothing... I continued checking another HA cluster assigned to the same vCloud environment, and finally I found another host which had a VMkernel interface configured with the same IP address. Both VMkernel interfaces were used for NFS, but this IP address was not from the subnet the NetApp NAS heads were in. They were in a separate subnet to which another NFS NAS was connected, and this NAS was used by only 1 of the 2 hosts. But on the host where it was not used, the interface was configured on the same dvSwitch as the NFS network to the NetApp NAS heads.
I updated the network config for the unused VMkernel interface and the "random" APD events disappeared. So I guess an IP address conflict on an interface that is not used, but sits on the same (d)vSwitch as an interface that is being used for NFS, can cause APD events for multiple hosts and even multiple HA clusters. In fact it even affected hosts outside the vCloud environment; the only thing they had in common was that they were all connected to the same NetApp NAS heads.

26 July, 2013

The downside of "Fast Provisioned" vApps in vCloud

When you or your company replace the existing SAN on which your vSphere and/or vCloud environment is running, you are probably going to replace it with a VAAI capable storage solution.
VAAI (vStorage APIs for Array Integration) was introduced with ESX(i) 4.1 and it basically provides a way to offload storage related tasks to the storage device. This reduces the load on the ESX(i) hosts and vCenter, and it also speeds up those tasks.
vCloud Director has also supported VAAI since version 5.1, making it possible to offload the cloning of VMs ("Fast Provisioning" as it is called in vCloud).
This speeds up the deployment of vApps considerably, and it also helps to be very storage efficient when combined with deduplication and thin provisioning.
So far still no downside. Well, the point is that when you need to move vApps to a datastore on a different storage system, or you need to move your entire vCloud environment to a new storage solution, that is the moment you could run into some unpleasant surprises (limitations) that VAAI brings.

A customer whose vSphere / vCloud environment I was recently working on was running both environments on the same storage solution, but due to the explosive growth of the vCloud part they were starting to experience performance issues: the storage solution could not deliver the needed IOPS. Their current storage solution is a VAAI capable NetApp; they purchased a new storage solution which would be used only for their vCloud environment. This is also a VAAI capable NetApp, only a higher-performance model.
They had enabled "Fast Provisioning" within vCloud from the get-go, so all vApps were deployed like this. This means all linked clones were actually "Flex clones". When the new high-performance storage solution was set up and ready to be used they needed to move / migrate their deployed vApps to the new datastores. They knew that this would cause the linked clones to be consolidated and become full clones, as this is the only way a Storage vMotion could move the VMs within the vApps.
But an unexpected error stopped this: vCenter reported an error explaining that it could not consolidate and therefore could not migrate the VMs. When I had a look at it I first thought the error could be caused by a compatibility mismatch between vSphere and NetApp ONTAP, so to gather more info on this I opened a support case with VMware GSS. After I provided vCloud logs and some additional information the SR became a PR and went to the engineering department. Quickly after this I got an answer from VMware, which basically said that we were trying to do something that is not supported.
Not the answer I was looking for! And when I continued to read through the list of unsupported actions regarding vCloud and VAAI storage I became even more unhappy: no consolidate or Storage vMotion operations are permitted on VAAI enabled storage (for Fast Provisioned VMs and vApps, that is). Luckily, at the bottom of the email it read that the move could be done with the "relocate" method, which is not present in any UI but is only available through the vCD API. So there was a way of accomplishing the move, but it would take me some time to figure out how to do this. The mail provided a link to the vCloud 5.1 API guide.

Unsupported / not permitted actions on "Fast provisioned" or "VAAI" clones:

  • Consolidate operations are not permitted for VMs created through the VAAI cloning process, even for powered-off VMs.
  • Using Storage vMotion through VC UI to move VMs created through the VAAI cloning process, is not permitted. However it is possible to relocate VAAI clones through the VCD API
  • Reporting on capacity remaining on a given datastore may be inaccurate. 
  • This release supports a maximum VAAI chain length of 256. Once VAAI clones reach this limit, further clones will be full-copy, handled by vCloud Director. This maximum is configurable using the db flag, Config.VirtualMachine.AllowedMaxVaaiChainLength
  • Source VMs, when cloned, have a REDO log attached to them, which, if they are running, may cause a negative read performance (compared to non-cloned vm) impact.  
  • No explicit way to prevent an admin from turning on Vaai flag for NAS datastores that do not support Vaai.
  • Vaai clones will not work for vSphere  < 5.0 (Linked clones in VCD only work for VC versions after 5.0).
  • Relocate of vaai clones (VMs residing on vaai enabled volumes) will not work if the vm has user created snapshot. The snapshots will need to be removed for clones to work.
  • There may be additional constraints on Vaai Clone support imposed by arrays. Please contact vendors.

On page 18 of the guide you find the information about authentication and headers required and on page 231 there is information on relocating a VM to a different datastore, example code is provided.

With this new information I started to figure out what this relocate actually does. When I was looking for some additional information I stumbled upon the blog of Matt Vogt. He wrote an article on VMware opening up a lot of APIs with the 5.1 release of vCloud Director, while noting there was still a lot more to be desired. One of those things is being able to change the storage profile of VMs, and for this you need to know the Href of the storage profile you want to change to.
When you change the storage profile of a VM in vCloud it will trigger a relocate action, because the different storage profile refers to a different datastore.
Matt used a script from Jake Robinson, posted on the VMware Community, which could retrieve the Href of storage profiles, and created his own script which could change the storage profiles of all VMs within a vApp. I took his script and adjusted it to my needs, which resulted in a script that can do the following.
It can change the storage profile of the VMs within the same vApp in one go (sequentially). When the VM has a chain length lower than 2 it can be powered on; when the chain length is equal to or greater than 2 the VM needs to be powered off to successfully complete the change. In any case the linked clone will be consolidated to a full/thick clone, so it becomes independent of its base disk(s).
When I tried the script it did not work: for some reason it would not retrieve the Href of the new storage profile. For my purpose I did not spend any time on solving this and just hard-coded the Href in the script; be sure to retrieve it yourself and update the script before use. This can be done easily with a PowerCLI one-liner.
Steps to get Href of destination (new) storage profile:

  1. Manually change the storage profile of a VM to the new storage profile, this VM will be relocated to the corresponding datastore(s).
  2. Set up a PowerCLI connection to vCloud Director
  3. Run $VM = Get-CIVApp "vApp name" | Get-CIVM "VM name"
  4. Run $VM.ExtensionData.storageprofile.name (verify the name reflects the name of the destination storage profile)
  5. Run $VM.ExtensionData.storageprofile.Href

This last line gives you an output that should look like: 
This is the Href of the storage profile; update this in the script at the $profileHref line and you are ready to relocate vApps.

Script code

Of course you can also use the script when relocating traditional/vSphere linked clones.

19 July, 2013

Virtual Machine Disk Consolidation is needed !?

vSphere 5 introduced a new feature to clean up VM snapshot "left-overs" which could be the result of a snapshot removal action where the consolidation step has gone bad. This would result in the snapshot manager interface telling you there are no more snapshots present, but at datastore level they still exist and could even be still in use / still be growing. This could case all kinds of problems, first of all VM performance issues as the VM is still running in snapshot mode, secondly you are not able to alter the virtual disks of this VM and in the long run your could potentiality run out of space on your datastore because the snapshot keeps on growing.
Prior to vSphere 5 there was a possibility to fix such a situation thru the CLI, now with vSphere 5 you get a extra option in the VM - Snapshot menu called "Consolidate" this feature should clean up any discrepancies between the Snapshot Manager interface and the actual situation at datastore level.
I'm always a little reluctant when I'm at a customer that uses a lot of snapshots. It is a very helpful and useful tool, but you have to use it with caution, otherwise it can cause big problems. Usually the problems start when snapshots are kept for a longer period of time or when snapshots are layered on top of each other, but even if you are aware of the problems that wrong use can cause, you can still run into issues when using snapshots.
That being said, let's look at the features offered by the various vendors of storage devices, in particular the VM backup solutions that they offer. When a VM is running while being backed up, they all use a vSphere snapshot (with or without quiescing the guest file system). Basically, if your company uses a SAN that leverages this functionality and it is configured to back up your VMs on a daily basis, you have an environment that uses snapshots a lot (on a daily basis), and therefore you are more likely to run into snapshot / consolidation issues than you would be without a SAN with this functionality (nobody snapshots all of their VMs manually on a daily basis, I hope).
I was recently at a large customer (2000+ VMs) that uses their storage device feature to back up complete datastores daily and also uses vSphere snapshots to get a consistent backup of running VMs.
For some reason they run into snapshot / consolidation issues pretty often. They explained to me that the Consolidate feature worked fine on VMs with a Linux guest OS, but on VMs with a Windows guest OS it would almost always fail with an error.
So I had a look at one of their VMs that could not be consolidated, although the vCenter client reported that it needed consolidation.

When looking at the VM properties I could see that it still had a "snapshot" (delta file) as its virtual disk: it pointed to a -000001.vmdk instead of a "normal" .vmdk

When a VM is in this state and still operational there is an issue, but the uptime is not directly affected. Most of the time, however, the VM will be down and will not power on again because of the issue. It will simply report an error about a missing snapshot on which the disk depends.
The way I solved it at this customer (multiple times) is by editing the VM's configuration file (.vmx), re-registering the VM in vCenter and afterwards manually cleaning up the remaining snapshot files. Please note that if the VM was running in snapshot mode, all changes written to the snapshot will be lost using this procedure; in other words, the VM will return to its "pre snapshot" situation. For this particular customer this was not an issue, because the failed snapshots were initiated for backup purposes, so no changes were made to the VM while it ran in snapshot mode.

So if you run into this issue and you know that there were no changes made to the VM, or losing the changes is acceptable, you can solve it with these steps.
  1. Download the .vmx and open it with a text editor (I prefer Notepad++ for this kind of work). Find the line that configures the virtual disk file, scsi0:0.fileName = "virtual-machine-000001.vmdk", and remove "-000001" so you are left with scsi0:0.fileName = "virtual-machine.vmdk". Save the file.
  2. Rename the .vmx file on the datastore to .old and upload the edited .vmx file.
  3. Either reload the VM by using PowerCLI** or remove the VM from the Inventory and re-add it again.
  4. Power on the VM
  5. If you get the "Virtual Disk Consolidation needed" message, go to the Snapshot menu and click "Consolidate"; it should now run correctly and remove the message.
  6. Manually remove the unused files from the datastore: .old, -000001.vmdk and -000001-delta.vmdk (I use WinSCP for this kind of work on a datastore).
** you could do this by using the following one-liner:

Get-View -ViewType VirtualMachine -Filter @{"Name" = "VM name"} |%{$_.reload()}
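The manual edit of step 1 can also be sketched in PowerShell. The helper below is hypothetical (it is not part of PowerCLI) and assumes the disks are attached as scsiX:Y and that the snapshot suffix is a dash followed by six digits, as in the example above. It only rewrites text; downloading and uploading the .vmx still happens as described in the steps.

```powershell
# Hypothetical helper: strips a snapshot suffix such as -000001 from the
# scsiX:Y.fileName lines of .vmx text, so the VM points at its base disk again.
function Remove-SnapshotSuffix {
    param([Parameter(Mandatory)][string]$VmxText)

    # Group 1: everything up to the suffix, group 2: the .vmdk extension and closing quote.
    $pattern = '(?m)^(scsi\d+:\d+\.fileName\s*=\s*".*?)-\d{6}(\.vmdk")'
    return $VmxText -replace $pattern, '$1$2'
}

# Example with the line from step 1.
Remove-SnapshotSuffix -VmxText 'scsi0:0.fileName = "virtual-machine-000001.vmdk"'
# -> scsi0:0.fileName = "virtual-machine.vmdk"
```

To apply it to a downloaded file you could pass the file content, for example (Get-Content .\virtual-machine.vmx -Raw), and write the result to a new file with Set-Content, keeping the original as a backup.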

05 July, 2013

New Fling from VMwareLABS called "VisualEsxtop"

Most of the Flings coming from VMwareLABS are worth trying, and every once in a while there is a Fling that is really cool and above all useful (like InventorySnapshot, which I wrote a post on a while back). Just a couple of days ago some of the engineers from the VMware performance group released VisualEsxtop.
The name says it all: it is a graphical version of esxtop which can be run on Microsoft OS, Linux OS and Mac OS**, so it is really "cross-platform".
It works remotely and can be run on any computer which has network access to an ESX(i) host or vCenter (although I haven't been able to connect it to a vCenter successfully).

When you run it, it looks like an enhanced version of esxtop

It will color code important counters and issues automatically. Furthermore, it has the ability to record and play back batch output, it can create line charts for selected counters, and it shows counter descriptions when you "mouse-over" them.

To download this Fling, which I recommend as it is a very useful tool to have, please go to VMwareLABS

** As said before it is cross-platform, but to get it to run on Mac OS you need to take some extra steps. For details on how to do this please read How to Run VMware's New Fling VisualEsxtop on Mac OS X from the virtuallyGhetto blog of William Lam

Shutdown unresponsive VM on ESX(i) 4.x / 5.x

Sometimes you will run into a VM that is unresponsive and/or unreachable by RDP, and even the vSphere client console will not work. The only way to solve this is to shut down the VM, but on some occasions even this will not work, nor will the power-off or reset commands.
When this is the case you will be getting error messages like: The operation is not allowed in current state or The attempted operation cannot be performed in the current state (Powered Off) or anything else along this line.
Recently I was working at a customer and they had a similar situation with 2 VMs; the admins working on it were not able to successfully shut down either of the VMs. They asked if I knew a way to do it, maybe from the CLI.
I knew I could kill the process of these VMs thru the CLI with the command esxcli vm process list to get the world ID, followed by esxcli vm process kill -t [soft,hard,force] -w WorldNumber to kill the process running the VM. But for some reason I was not able to find the VMs concerned in the output presented in the CLI (perhaps there were too many VMs on the host and I overlooked them).
As an alternative I also knew there was a way to kill processes from esxtop (only available on ESXi 4.x and 5.x). I did not have all the details on the steps to perform, but this was quickly solved by a search thru the VMware KB.
As it turned out, I think I prefer the "esxtop way" above the other ways of doing it, for 2 reasons; first esxtop is something you use regularly (I assume) and second it is a very clear and "visible" way.

How to use esxtop to kill processes of unresponsive VMs:

  1. On the ESXi console, enter Tech Support mode or connect thru SSH and log in as root.
  2. Run esxtop
  3. Press c to switch to the CPU resource utilization screen.
  4. Press Shift+v to limit / filter the view to virtual machines. 
  5. Press f to display the list of fields.
  6. Press c to add the column for the Leader World ID.
  7. Identify the target virtual machine by its Name and Leader World ID (LWID).
  8. Press k.
  9. At the World to kill prompt, type in the Leader World ID from step 7 and press Enter.
  10. Wait 30 seconds and validate that the process is no longer listed.
For more information on ways to shutdown unresponsive VMs please read VMware KB1014165

*Note: for those running ESX instead of ESXi, please refer to VMware KB1004340, as there are different ways to do this on these systems.

06 June, 2013

PowerCLI 5.1 Release 2 and VDS (Distributed Virtual Switch)

So prior to PowerCLI 5.1 Release 2 there was not a lot you could script with PowerCLI regarding the VMware Distributed vSwitch (I deliberately use different names and abbreviations for the VDS, as the official name changed over time and I lost track).
But now with the Release 2 there are a whole bunch of new cmdlets to work with VDS. Having these new cmdlets and the added support for vCloud Director 5.1 will give the VCD admin a lot of new automation possibilities.
I won't write a blog post on the new VDS cmdlets, for this I refer you to the blog post of Alan Renouf on Virtu-al.net.
Another thing supported by PowerCLI 5.1 Release 2 is PowerShell v3.0 and its enhancements, and there is one cool and very handy feature I want to point out. This feature is very handy when you're new to PowerShell or PowerCLI and will give you a better understanding of how a command is "built". PowerCLI always had a syntax explanation of a cmdlet when you added "-Syntax" to your cmdlet, but this would only give you a short description of all the parameters you could use with the specific cmdlet, which is not really helpful if you do not know exactly what you are looking for or what you may be doing wrong.
When you install PowerShell v3.0 and PowerCLI 5.1 Release 2 you can use the "show-command" cmdlet to get a small GUI for the cmdlet you want to use.

Let's look at a simple cmdlet, "Connect-VIServer". You will need it for most PowerCLI scripts, as it connects you to vCenter server(s) or ESXi host(s).
When you run "show-command connect-viserver"

It will present a GUI in which you can enter the needed data. When you click OK it will show the properly formed command at the PowerCLI prompt and run it. This way you will know how to use the cmdlet the next time to get what you want (OK, maybe it will take you a couple of runs before you get it).
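For reference, a small sketch of what this looks like at the prompt. The server name is a placeholder, and the second command is only an example of the kind of command the dialog builds for you.

```powershell
# Opens a small GUI listing the parameters of the cmdlet (PowerShell v3.0 and later).
Show-Command Connect-VIServer

# Example of a command the dialog could build. 'vcenter.example.local' is a
# placeholder; Get-Credential prompts for the user name and password.
Connect-VIServer -Server 'vcenter.example.local' -Credential (Get-Credential)
```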

05 June, 2013

VMtools on vSphere 5.1 reboot no longer needed ?

VMware wrote in their "What's new in VMware vSphere 5.1" document that starting from version 5.1 no reboots (zero downtime) would be required when updating VMtools to a newer version on a Microsoft OS (Vista and up).
Soon after, William Lam wrote a clarification blog post on the VMware blog which explained in more detail under what circumstances a reboot is still needed when updating VMtools.
The blog post informs us that fewer VMtools update actions need a reboot, but there are still some drivers or other components that will require a reboot when being updated or replaced.
From the vSphere (web) client you can update VMtools automatically and with advanced options you can suppress the reboot (even if it is required to complete the update).
But if you want to quickly update VMtools on multiple VMs at once, the easiest way is to do it with PowerCLI; with a simple one-liner you can update them and suppress a reboot.
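A minimal sketch of such a one-liner, assuming an existing Connect-VIServer session. The cluster name is a placeholder, and the filter on powered-on Windows guests is my own assumption to keep the example safe.

```powershell
# Assumes an existing Connect-VIServer session; 'Cluster name' is a placeholder.
# -NoReboot suppresses the reboot, even when one is needed to complete the update.
Get-Cluster -Name 'Cluster name' | Get-VM |
    Where-Object { $_.PowerState -eq 'PoweredOn' -and $_.Guest.OSFullName -like '*Windows*' } |
    Update-Tools -NoReboot
```

Keep in mind that a suppressed reboot is only postponed; components that require one are not fully updated until the guest restarts.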

If you want more details on which update situations require a reboot, please take a look at the VMware KB2015163

vSphere 5.X Storage vMotion

VMware has addressed a lot of bugs with their Update 1a for vSphere (ESXi and vCenter), one of them being the long awaited "renaming" feature when SvMotioning a VM. This "feature" slipped into vSphere somewhere in version 4.X as an undocumented feature, and as it turned out it was pretty useful for online renaming of VMs. The team responsible for Storage vMotion thought differently and reported this as a bug that needed to be fixed, which they did with the introduction of vSphere 5.0.
After a lot of "complaints" from VMware customers around the globe they re-introduced the bug / feature with Update 2 for vSphere 5.0, but the feature acted differently than before. It would now only rename the folder of the VM on the datastore, but would not rename the files that make up the VM.
Now with vSphere 5.1 Update 1a the latter is possible again, but not out of the box. You will need to add an advanced setting to the vCenter settings for it to work. So now you have a choice whether you want to use this renaming feature or not, which is a nice gesture, but why leave it disabled by default? In my opinion it would have been better to have it enabled by default with the option to disable it. I'm sure VMware has a good reason why they didn't do it.
Anyway, if you want to use the feature and keep your VM names consistent with the corresponding folders and files on the datastore, you will have to add the key "provisioning.relocate.enableRename" to the "Advanced Settings" in "vCenter Server Settings" and give it the value "true".
After adding the key it will show up as "config.provisioning.relocate.enableRename"
And after closing "vCenter Server Settings" the renaming of VM files during a SvMotion should work.
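If you want to check the setting from PowerCLI afterwards, something like the sketch below should work. It assumes a single Connect-VIServer session to vCenter, and that the key is exposed under the same name the client shows after adding it.

```powershell
# Assumes one active Connect-VIServer session to vCenter.
# The setting name is taken from what the client displays after adding the key.
Get-AdvancedSetting -Entity $global:DefaultVIServer -Name 'config.provisioning.relocate.enableRename' |
    Select-Object Name, Value
```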

30 May, 2013

Where is my VM ?

I think every VM admin has experienced the following situation; For some reason planned or unplanned your vCenter server is down, no real issue because it will not affect the VM's running in your vSphere environment. Even HA will continue to work.
But how do you know where a specific VM is located (on which host) when you don't have vCenter ?
I already had a small script which allowed me to search for a VM on a selection of hosts or on all active hosts of an environment, because if your vCenter is down unplanned there might be some other servers / VMs that are also down with the same root cause. If this is the case, you will not only have to worry about getting your vCenter back up-and-running, but you will also get a lot of questions on how to find and access other unresponsive VMs. This will be a challenge for any admin who relies solely on the vSphere (web) client, because this admin will have to log on manually and separately to every host with his vSphere (web) client and search the inventory per host to find the VM he is looking for.
This is not a big problem if the environment has 5 or 6 hosts, but the problem gets bigger as the environment gets bigger. Just imagine the amount of work and time it will take when you have to search thru 50 hosts!
I had a "vCenter down" situation recently with a customer and some other VMs became unresponsive. These VMs did not respond to RDP, and because vCenter was down there was no direct way to know on which of the 58 hosts (!) the VMs were running. So I got out my old small script to look up some of these VMs. One of the admins saw this and asked if he could also run it to find some VMs. This admin had little experience with vSphere, so he had some trouble providing the needed input to the script (the IP addresses or DNS names of the ESXi hosts to connect to, and the name under which the VM was registered in vSphere), but with a little help he managed.
After this event I thought I would simplify the script by using pre-created files with host information of the vSphere environment and have a selection menu to choose from, the only info that you need to provide is the full name of the VM.
Of course you will need to create the files, but you can do this when vCenter is up and it is easy to retrieve this kind of information.

The files you need to create are one file per search selection; in my script I have 1 file for every cluster and 1 file for the complete environment. Each file is a plain text file with 1 host IP address or DNS name per line.
For me this is a script (with or without the menu) that you want to have available in case of an emergency. It takes little time to set up and will help you big time in case of an issue.
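To give an idea of the approach, here is a minimal sketch without the selection menu. It is not the script described above: the function names and the file naming (hosts-<cluster>.txt, hosts-all.txt) are assumptions made for this example.

```powershell
# Run while vCenter is still up: writes one text file per cluster plus one
# for the whole environment, each with one host name per line.
# The file naming (hosts-<cluster>.txt, hosts-all.txt) is an assumption.
function Export-HostLists {
    param([string]$Folder = '.')
    foreach ($cluster in Get-Cluster) {
        $cluster | Get-VMHost | Select-Object -ExpandProperty Name |
            Set-Content (Join-Path $Folder "hosts-$($cluster.Name).txt")
    }
    Get-VMHost | Select-Object -ExpandProperty Name |
        Set-Content (Join-Path $Folder 'hosts-all.txt')
}

# Run when vCenter is down: connects to each host listed in the file and
# reports where the VM with the given full name is registered.
function Find-VMOnHosts {
    param(
        [string]$HostFile,
        [string]$VMName,
        $Credential = (Get-Credential)   # an account valid on the ESXi hosts
    )
    foreach ($esx in Get-Content $HostFile) {
        $conn = Connect-VIServer -Server $esx -Credential $Credential -ErrorAction SilentlyContinue
        if (-not $conn) { Write-Warning "Could not connect to $esx"; continue }

        $vm = Get-VM -Name $VMName -Server $conn -ErrorAction SilentlyContinue
        if ($vm) { "$VMName is registered on $esx (power state: $($vm.PowerState))" }

        Disconnect-VIServer -Server $conn -Confirm:$false
    }
}
```

Typical use would be Export-HostLists once in a while when everything is healthy, and Find-VMOnHosts -HostFile .\hosts-all.txt -VMName 'full VM name' during an outage.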

27 May, 2013

VMs grayed out (Status Unknown) after an APD (All Paths Down) event on NFS datastores

Last week, during a change on one of the core switches of the NFS storage network at a customer, we ran into a big problem causing an outage of 50 % of all VMs for around 4 hours.
The problem started with a network related error, on which I will not elaborate other than to say that the result was an unstable NFS network causing random disconnects of NFS datastores on the majority of the customer's ESXi hosts. On top of that it also caused latencies, which triggered vMotion actions that ramped up the latencies even more, resulting in a storm of failing vMotions.
In theory this should never have happened, as the NFS network of the customer is completely redundant, but in real life it turned out completely different in this particular case.
After putting DRS into "partially automated" the vMotion storm stopped, but the latency continued on the NFS network and this also had its effect on the responsiveness of the ESXi hosts. Only after powering down the core switch (the one which had the change) did everything return to normal: datastores were connected to the ESXi hosts again and the latency disappeared. When looking into the vSphere client I found lots and lots of VMs that had an inaccessible or invalid status. Trying to power on such a VM would not work and you would get an "action not allowed in this state" message. The only way I knew at the time to get them accessible again was to unregister the VMs from vCenter (Remove from Inventory) and add them again by browsing to the .vmx file with the Datastore Browser and selecting "Add to Inventory". This was time consuming and tedious work, but it was the only quick fix for getting those VMs back into vCenter. Mind you, most of the VMs were still up-and-running, but in no way manageable thru vCenter.
By the time I had all VMs registered again (some also needed a reboot as their OS had crashed due to the high disk latencies), I was contacted by the vCloud admin, who had also lost around 100 VMs from his vCloud environment. It looked like another long task of getting those VMs back, but we faced an extra problem. vCloud relies heavily on MoRef IDs for identification of VMs; in other words, if the MoRef ID changes, vCloud will no longer recognise the VM as it cannot match it to anything in its database.
But removing a VM from Inventory and re-adding it changes its MoRef ID, so the quick fix I had could not be used on the VMs in vCloud. Luckily the vCloud admin found VMware kb1026043. It looked like VMware had the solution to our problem, but for some reason this solution was not working for us, and it required the host of the affected VMs to be in maintenance mode. It did help us in the search for a working solution, which the vCloud admin found shortly after on www.hypervisor.fr, a French VMware related blog by Raphael Schitz. He wrote an article "Reload du vmx en Powershell" (Reload a vmx with PowerShell) on how to reload VMs into Inventory without the need for maintenance mode on your host(s). It all comes down to a PowerCLI one-liner that does the trick. You can alter the command to run it against an entire Datacenter or just a Cluster.
In the end it saved our day by reloading all inaccessible and invalid VMs within just 5 minutes. This is a very useful one-liner, as NFS is getting more and more used as preferred storage.
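A sketch along those lines, assuming an existing Connect-VIServer session to vCenter. It follows the idea of the article mentioned above but is written from memory, so treat it as an illustration rather than the original one-liner.

```powershell
# Assumes an existing Connect-VIServer session to vCenter.
# Reloads every VM that vCenter reports as inaccessible or invalid.
Get-View -ViewType VirtualMachine |
    Where-Object { $_.Runtime.ConnectionState -eq 'inaccessible' -or $_.Runtime.ConnectionState -eq 'invalid' } |
    ForEach-Object { $_.Reload() }

# To limit the search to one cluster, a search root can be added, for example:
#   Get-View -ViewType VirtualMachine -SearchRoot (Get-Cluster 'Cluster name').ExtensionData.MoRef
# ('Cluster name' is a placeholder.)
```

Because the VM is reloaded in place instead of being removed and re-added, its MoRef ID stays the same, which is what vCloud needs.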

17 May, 2013

Database redundancy for your vCenter database(s)

The most important database within a vSphere environment is without a doubt the vCenter database. VMware has therefore included detailed instructions on how to set up and configure this database, with a guide for every supported database type. Recently I ran into a situation which made me believe that VMware "forgot" some details in this database configuration guide, at least when you have your vCenter database running on Oracle.
A customer had chosen to put their vCenter database on Oracle, as this was their preferred database knowledge wise. They also set it up to be resilient, which they achieved by having an active and a standby database placed on 2 different database servers in separate datacenters. To me it looked like a very solid solution. On the vCenter side they modified the TNSNames.ora in such a way that it included both database server addresses and also contained the parameters for connect-time failover and load balancing.
By doing this they made sure that vCenter could (almost) always connect to one of the two database servers; it would simply fail over when the connection attempt timed out. The failover would not be quick enough to keep vCenter up-and-running, so vCenter would need a reboot (or at least a restart of its services) to get a connection again, but this would not affect the running VMs at all.
For maintenance on the database servers, we had to switch from the active server to the standby server. As this was a planned action, we could first gracefully stop the vCenter services and afterwards switch to the standby database server. After the switch all vCenter services were started again and vCenter came up-and-running like it is supposed to.
One issue that occurred during this database server switch was that VMware Orchestrator, which was installed on a separate server, stopped working and logged all kinds of database related errors. With a quick look at the database configuration of Orchestrator I remembered that it could not cope with multiple database server addresses and was set to connect to the database server that had now become the standby. By changing the database server and starting the Orchestrator services again this problem was solved.
At least until the next day when I took a look at the vCenter Operations dashboard and found that the health of vCenter was 0

When I looked in more detail at what caused this, I found "VMware vCenter Storage Monitoring Service - Service initialization failed". The only thing I found that could link this alert to the database failover was the timestamp: it was recorded at the same time the failover had happened.

Not really knowing where to start investigating on the vCenter server, I first tried to find some information on the VMware KB and the first article that came up described the exact same error message. When reading kb2016472 I quickly found confirmation that this issue was related to the database failover although it refers to vCenter 4.X and 5.0 with the use of a SQL database instead of vCenter 5.1 / Oracle database.
It appears that this vCenter Storage Monitoring Service does not use the TNSNames.ora for the database connection; it has its own configuration / connection file called vcdb.properties. This file has only the first of the two database server addresses.
Thru the information in the KB article I knew what to change to get the connection set to the backup database server, and after a restart of the vCenter Server service the vCenter Storage Monitoring Service initialized ok and started without any error.
So my conclusion is that even when you have redundancy or failover setup on vCenter database level, there are still some vCenter related products and services that need some manual action to continue to work in case of a (planned) database failover.

16 May, 2013

VMware Horizon View Optimization Guide for Windows 7 and 8

VMware has published a new Horizon View Optimization Guide for VDI / View Windows 7 and 8 virtual machines. This guide contains recommended configuration settings to optimize the OS and provide overall better scalability and performance in a Horizon View environment.
Horizon View 5.2 supports the Windows 8 Metro style user interface and supports the basic touch gestures in a View client running on an Intel Surface tablet.
With Windows 8 you can improve performance by using the new service start type "Manual (Triggered Start)": services are started when the user accesses a component that requires the service.

New SDS (Software Defined Storage) solutions introduced

Just a quick post on new products / solutions that have been or are going to be introduced. EMC introduced ViPR at the EMC World convention last week. You can find great articles on ViPR on the StorageIO blog by Greg Schulz, who wrote a 3 part series on this new solution of EMC, "EMC ViPR virtual physical object and software defined storage (SDS)".
Another take on this new solution has been written by Duncan Epping on his blog, "EMC ViPR; My take".
Also Netapp gave notice that they are going to introduce Cobra Commander during Q3 2013, maybe just in time to have some interesting information presented in one of the Netapp related VMworld Europe sessions....

15 May, 2013

Have a look in the crystal ball to predict the future (capacity) of your vSphere environment...

A question regularly asked: how many resources does my vSphere environment have at the moment? Or more specifically, how many VMs can I add to my environment and still make sure that I have enough resources available in case of host failure(s)?
It is a valid question that is often asked, but the answer is neither easy nor straightforward. To be clear, if you want this kind of real-time information you should be looking at an analysis tool like VMware vCenter Operations or Dell vOPS (formerly vKernel / Quest). These tools can (when you purchase the right licenses) give you the exact number of VMs you could add to your environment while retaining the needed failover capacity. vOPS even has the ability to "play" what-if scenarios and show the results before you actually change anything in your environment.
I will try to write one (or more) blog posts in the near future after I have taken these tools for another test run.

In the meantime the question remains, especially if your company can't or does not want to invest a considerable amount of money in one of these tools, and believe me they are pricey (but the useful information they provide sure makes up for their price):
How many VMs can I add to my current environment, taking into account the needed (or requested) failover capacity?
For me an alternative way of getting an idea of how many VMs I can add to a specific environment is running a PowerCLI script that calculates the average CPU and memory use of the current VMs, while you manually specify what the maximum CPU and memory resource usage per cluster may be (only for non-HA clusters).
On a side note: in theory you should be able to calculate how many VMs you could add by using the HA slot size, but in real life there are too many dependencies and variables for this to be useful, especially if you don't use reservations on your VMs. Anyway, I am not going to write too much about HA, DRS, slot sizes, reservations... because there are other (Dutch) people that can (and actually did) write books on these topics. A must read for anyone who wants to be a better VMware admin, consultant or architect is VMware vSphere 5.1 Clustering Deepdive by Duncan Epping and Frank Denneman.
So back to the quick, simple and somewhat educated guess way of getting an idea of how many VMs you can add to your environment. As said earlier, I think the quickest way of getting such a result is by having a PowerCLI script get the average values of all VMs currently running, with the maximum allowed resource usage set by yourself.
The script I use is fairly simple and quick to run, and will give you results that you can use as a sort of guideline when you get this question and you need to answer it.

By default the output is directly to the CLI, but you could also output to csv, xlsx or HTML if you want. The CLI output will look something like this.
Both clusters in the screenshot above have more than enough CPU resources, but when you look at the MEM resources they both allow around 10% growth based on the current load and failover level. These results are from HA enabled clusters, so no manual config is needed. In this case the script uses "EffectiveCpu", "EffectiveMemory", "FailoverLevel" and "NumEffectiveHosts" from "DasConfig" to calculate the results.
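The arithmetic behind such an estimate can be sketched as a small function. This is not the script described above: the function name, its parameters and the assumption that all hosts in the cluster are equally sized are mine. The input values would come from the cluster properties named above plus the summed current usage of the running VMs.

```powershell
# Hypothetical helper: estimates how many average-sized VMs still fit in a cluster
# while keeping enough capacity to survive $FailoverLevel host failures.
# Assumes equally sized hosts and at least one running VM.
function Get-VMHeadroom {
    param(
        [double]$EffectiveCpuMhz,    # cluster EffectiveCpu (MHz)
        [double]$EffectiveMemoryMB,  # cluster EffectiveMemory (MB)
        [int]$NumEffectiveHosts,
        [int]$FailoverLevel,
        [double]$UsedCpuMhz,         # current CPU use of all running VMs
        [double]$UsedMemoryMB,       # current memory use of all running VMs
        [int]$VMCount                # number of running VMs
    )

    # Share of the capacity that remains when $FailoverLevel hosts are lost.
    $fraction = ($NumEffectiveHosts - $FailoverLevel) / $NumEffectiveHosts
    $cpuFree  = $EffectiveCpuMhz * $fraction - $UsedCpuMhz
    $memFree  = $EffectiveMemoryMB * $fraction - $UsedMemoryMB

    # Average footprint of one VM, based on the current load.
    $avgCpu = $UsedCpuMhz / $VMCount
    $avgMem = $UsedMemoryMB / $VMCount

    $byCpu = [math]::Floor($cpuFree / $avgCpu)
    $byMem = [math]::Floor($memFree / $avgMem)

    New-Object PSObject -Property @{
        ExtraVMsByCpu    = $byCpu
        ExtraVMsByMemory = $byMem
        # The scarcest resource decides; never report a negative number.
        ExtraVMs         = [math]::Max(0, [math]::Min($byCpu, $byMem))
    }
}
```

As an example, a 4 host cluster with failover level 1, 100000 MHz and 400000 MB effective capacity, and 100 VMs using 30000 MHz and 270000 MB in total, leaves room for 150 VMs by CPU but only 11 by memory, so the estimate is 11 extra VMs.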

13 May, 2013

VMware snapshots good and bad....

Almost every IT admin "knows" or at least thinks he knows what a VMware snapshot is and how to use it (in a good and sensible way).
I hope that the myth that a VMware snapshot can be used as a full backup has died all over the IT world, because that is the worst myth in VMware land.
So we all know snapshots can be very helpful in day to day operations, like quickly reverting to a snapshot when a software patch or update turns out bad. It is also widely used in the process of backing up a VM, most 3rd party backup solutions use a snapshot to capture a VM in a stable state during their backup operation.
But it is far less known that snapshots can also cause serious trouble when something has gone wrong with one of the snapshots of a VM. You could potentially lose a lot of data when a snapshot (delta) file becomes unusable, not to mention the loss in performance the VM has when it has one or more snapshots active.
So VMware snapshots are very useful, safe and good when used in the right way! Use them in the way they are designed / intended for. This is easily said and done within a small environment with only one or two VMware admins, but it becomes more prone to error when the environment is larger and/or there are more VMware admins (or other IT persons that have the permissions to create snapshots).
If you look at a bigger company with a larger VMware environment, most of the time there will be separate departments for IT infrastructure, hardware and software. In a lot of cases the IT persons working in the software department will have the permissions to create (revert to and delete) snapshots, in the same cases the IT persons working in IT infrastructure department are responsible for keeping the VMware vSphere environment running and healthy.
VMware has acknowledged the risk of having snapshots you don't know about and has included a vCenter alarm for it since vSphere version 4.X, but this alarm uses snapshot size (in GB) as its trigger. Note that the growth of a snapshot is linked to the changes made on the VM's disks, so it is not directly linked to the age of the snapshot.
Also note that this particular alarm is known to be a bit unpredictable since version 5.1, please read VMware kb 1018092 for details.
To keep track of snapshots in an environment where more persons are allowed to create them than there are to manage the environment, you might want some additional tools.
You could use RVtools (by Rob de Veij) and manually run it on a weekly or bi-weekly schedule. RVtools is a great tool to check your environment and have the results presented in an "Excel" style. I use it a lot when I need to assess a customer's VMware environment prior to the start of a project. You could also use Powergui with the community powerpack, which includes a script to report snapshots older than X days.
But this way it will be a repeating manual task for the VMware admins, and with the results you have two options. You either contact the creator of the snapshot (if you have this information) and ask them to remove the snapshot when it is no longer needed, or have them remove it directly because it is very old / very big. Or you contact the creator and ask them what you should do with those snapshots. Either way you will be getting a lot of work managing the snapshots created by others.
By the way, the only way to know who created the snapshot (if the creator did not mention it in the snapshot description) is by getting the info from the vCenter event log.
As an extra challenge, with the introduction of vSphere 5.1 it is possible to Storage vMotion a VM with an active snapshot, which is a good thing of course. But the downside is that when you Storage vMotion VMs (for, let's say, maintenance reasons on a storage device) the process will end up consolidating all snapshots of the VMs on the target datastore; in other words, the snapshots will be deleted!
So why not automate this process and have the creator receive an email message with all needed info about the snapshot that is overdue (older than X days), and also receive an overview of these snapshots with their creators yourself as a reminder? If the creator ignores the email and decides to keep the snapshot, he (or she) will receive the same message again on the next run. You could even create two separate triggers, the first as a reminder of the snapshot and the second more as a warning.
Below you will find a PowerCLI script which retrieves the creators of the snapshots older than X days and looks up their email addresses in Active Directory. It then sends an email message to each creator specifying the details of the snapshot and the VM it belongs to. It will also send an overview email to one account (usually VMware administration) so you can keep track of the active snapshots in your environment.
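The core of such a script can be sketched as follows. This is a simplified illustration and not the full script: it only builds the overview with creators, and it leaves out the Active Directory lookup and the per-creator emails. The function names and the 5 minute search window around the snapshot creation time are assumptions.

```powershell
# Hypothetical helper: looks in the vCenter events around the creation time of a
# snapshot for the "create snapshot" task and returns the user who started it.
function Get-SnapshotCreator {
    param($Snapshot)

    $events = Get-VIEvent -Entity $Snapshot.VM -Start $Snapshot.Created.AddMinutes(-5) -Finish $Snapshot.Created.AddMinutes(5) -MaxSamples 1000
    $task = $events | Where-Object {
        $_ -is [VMware.Vim.TaskEvent] -and $_.Info.DescriptionId -eq 'VirtualMachine.createSnapshot'
    } | Select-Object -First 1

    if ($task) { $task.UserName } else { 'unknown' }
}

# Hypothetical helper: lists all snapshots older than $MaxAgeDays with their creator.
# Assumes an existing Connect-VIServer session to vCenter.
function Get-OldSnapshotReport {
    param([int]$MaxAgeDays = 7)

    $limit = (Get-Date).AddDays(-$MaxAgeDays)
    Get-VM | Get-Snapshot | Where-Object { $_.Created -lt $limit } |
        Select-Object VM, Name, Created, SizeMB,
            @{N = 'Creator'; E = { Get-SnapshotCreator -Snapshot $_ }}
}
```

The overview mail could then be sent with Send-MailMessage, using the output of Get-OldSnapshotReport piped to Out-String as the body and your own sender, recipient and SMTP server. Note that the creator can only be found as long as the event is still retained in the vCenter database.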

Hopefully the reminders about active snapshots will make creators more aware, so that they do a better job of cleaning up snapshots that are no longer needed. After all, the weakest link in IT is the object between the chair and the desk....

Are there still use cases for custom attributes within a vSphere 5.1 environment ?

VMware introduced "tags" with the release of vCenter 5.1. During a "what's new" presentation at VMworld one of the speakers mentioned that these tags could be seen as a 2.0 replacement for custom attributes. Tags do offer a lot more than custom attributes: for instance, they are indexed by the search feature of the vSphere web client and they can be used on (almost) all objects in a vSphere environment. Custom attributes cannot be searched within the vSphere (web) client and can only be used on Virtual Machine objects.
But does this statement hold when looking at the vSphere 5.1 environments currently running in production at VMware customers? Probably not yet. I believe custom attributes will disappear eventually, but at the moment there are too many (legacy) 3rd party applications and (PowerCLI) scripts that still use them.
Backup applications are a good example of 3rd party applications using custom attributes: NetBackup and Veeam can both write the last backup result of a VM to a custom attribute value.
If wanted or needed, you could export these values with PowerCLI to create a report.
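Such an export could look roughly like the sketch below. It assumes an existing Connect-VIServer session; the attribute name 'Last Backup', the output path and the function names are hypothetical and need to match your own environment.

```powershell
# Sketch only: attribute name, output path and function names are placeholders.

# Flatten an annotation object into a simple row for reporting.
function ConvertTo-AnnotationRow {
    param([Parameter(ValueFromPipeline = $true)] $Annotation)
    process {
        [pscustomobject]@{
            VM        = $Annotation.AnnotatedEntity.Name
            Attribute = $Annotation.Name
            Value     = $Annotation.Value
        }
    }
}

# Export one custom attribute for all VMs to a CSV file.
function Export-VMAnnotationReport {
    param(
        [string] $AttributeName = 'Last Backup',
        [string] $Path = '.\vm-annotations.csv'
    )
    Get-VM |
        Get-Annotation -CustomAttribute $AttributeName |
        ConvertTo-AnnotationRow |
        Export-Csv -Path $Path -NoTypeInformation
}
```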

During a recent project a customer was changing their backup application and procedures. They formerly used NetBackup in a disk-to-disk-to-tape setup. The new solution for this customer became NetApp SnapManager, which provides very fast, full backups at storage level. The only downside for the VM admins was that they were no longer able to "see" backup details, such as last backup time and result. NetBackup used to write these to a custom attribute value per VM and also exported them to a csv file to update a status webpage.
The NetApp solution does not provide such a thing out-of-the-box, so it was PowerCLI to the "rescue". I started to work on a script that would add a custom attribute to every VM and fill it with the timestamp of the last successful backup. Note that SnapManager uses a VMware snapshot in its backup procedure, so to retrieve this data I only needed to find, in the event log of each VM, the removal of the snapshot created by the NetApp service account. After some testing and searching the internet for extra info, I stumbled upon the "Who created that VM?" script by Alan Renouf.
This script retrieves the user account which created the VM and the timestamp of the creation action. The username is then used to retrieve the full name from Active Directory, and the full name and timestamp are written to 2 custom attributes. I saw the potential of knowing who created a VM and when, especially if you want to report on the growth of your environment (I will write on this in a later post). So I took the main part of Alan's script, modified it to suit the customer's needs and added my last backup part to it.
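The last backup part could be sketched as follows. This is not the script described above, only an illustration of the idea: it treats the most recent snapshot-removal task issued by the service account as a proxy for the last successful backup. It assumes an existing Connect-VIServer session; the service account, attribute name, look-back window and function names are hypothetical.

```powershell
# Sketch only: service account, attribute name and look-back window are placeholders.

# Return the most recent CreatedTime among the snapshot-removal task events.
function Get-LastSnapshotRemovalTime {
    param([object[]] $VIEvent = @())
    $VIEvent |
        Where-Object { $_.Info -and $_.Info.DescriptionId -eq 'VirtualMachine.removeSnapshot' } |
        Sort-Object CreatedTime -Descending |
        Select-Object -First 1 -ExpandProperty CreatedTime
}

# Write the time of the last snapshot removal by the backup account to a custom attribute.
function Update-LastBackupAttribute {
    param(
        [string] $ServiceAccount = 'CORP\svc-netapp',  # placeholder
        [string] $AttributeName = 'LastBackup',        # placeholder
        [int] $LookBackDays = 7
    )
    # Create the custom attribute once if it does not exist yet.
    $existing = Get-CustomAttribute -Name $AttributeName -TargetType VirtualMachine -ErrorAction SilentlyContinue
    if (-not $existing) {
        New-CustomAttribute -Name $AttributeName -TargetType VirtualMachine | Out-Null
    }
    foreach ($vm in Get-VM) {
        $events = Get-VIEvent -Entity $vm -Username $ServiceAccount -MaxSamples 1000 `
            -Start (Get-Date).AddDays(-$LookBackDays)
        $last = Get-LastSnapshotRemovalTime -VIEvent @($events)
        if ($last) {
            Set-Annotation -Entity $vm -CustomAttribute $AttributeName `
                -Value $last.ToString('yyyy-MM-dd HH:mm') | Out-Null
        }
    }
}
```

VMs without a matching event inside the look-back window keep their previous attribute value, which makes a stale timestamp easy to spot in the client.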

The VM admins can now easily see from within the vSphere client when a VM was created, by whom, and when the last successful backup was made. The script also creates the csv file needed to update their internal status webpage.
So in this particular case there is still a very good use case for custom attributes. In a later post I will go into more detail on how to use the value from the "CreatedOn" custom attribute to create a report which shows by how many VMs your environment is growing per month.
For those who want to use my script or parts of it, please do, but there are some points of attention:
1 The script uses the Quest AD cmdlets to resolve the username to a full name in the AD.
2 If a VM is removed from the vCenter inventory and later added again, the creator name will be the person who re-added the VM, not the original creator.

In general, when you want to attach informational data to a VM through (PowerCLI) scripting, you will still want to use custom attributes instead of the newer tags. But keep in mind that custom attributes only become visible in the vSphere web client if you run them through the "migrate to tag" wizard.

06 May, 2013

VMware Tools and their Upgrade Policy

With the introduction of vSphere 5.1 VMware has changed the way you go about upgrading VMtools in your VMs. Prior to version 5.1 a reboot of the guest OS was mandatory to complete a VMtools upgrade, but with version 5.1 this is no longer required.
Keep in mind that you need to run the latest VM hardware version and also the VMtools version that goes with vSphere 5.1 to be able to do an upgrade without a reboot.
All of this is great, but some questions remain. How do you easily get all your VMs up to the correct VMtools version, how can you automate upgrading once VMware releases a new VMtools version, and how do you make sure you run the latest version of VMtools on imported VMs or appliances?

vSphere offers several VMtools options in the VM properties. One of them is called "Check and Upgrade Tools during power cycling"; it has been in vSphere since at least version 4.1 but, to my knowledge, is not used a lot.
When you tick this option, the VM checks on every reboot whether a newer version of VMtools is available and, if so, upgrades to it. VMs with a Microsoft OS need to reboot on a regular basis anyway to complete their patches and updates. So why not enable this option in your VMs?
I find it important to keep VMs up-to-date when it comes to (security) patches, and VMtools is something a lot of VM admins tend to forget in this process. But if you run a large vSphere environment you don't want to open the properties of every VM to enable this feature!
That's where PowerCLI comes in handy, because you can script the enabling (or disabling) of this option.

In the first example script you can set the option on a per-cluster basis.
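A minimal sketch of a per-cluster version, assuming an existing Connect-VIServer session. The function name and the cluster name in the usage line are hypothetical; the policy values 'manual' and 'upgradeAtPowerCycle' are the ones the vSphere API uses for this setting.

```powershell
# Sketch only: the function name is a placeholder.
# Set the VMtools upgrade policy on every VM received from the pipeline.
function Set-ToolsUpgradePolicy {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)] $VM,
        [ValidateSet('manual', 'upgradeAtPowerCycle')]
        [string] $Policy = 'upgradeAtPowerCycle'
    )
    process {
        # Build a reconfigure spec that only touches the tools upgrade policy.
        $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
        $spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
        $spec.Tools.ToolsUpgradePolicy = $Policy
        $VM.ExtensionData.ReconfigVM($spec)
    }
}
```

With that in place, `Get-Cluster -Name 'Cluster01' | Get-VM | Set-ToolsUpgradePolicy` would enable the option for one cluster, and adding `-Policy manual` would switch it off again.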

You could change this script so it suits your needs. For instance, you could use it to set this option only on a selected number of VMs which you gathered in a CSV file, like in the example script below.
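A CSV-driven variant could be sketched like this. It assumes an existing Connect-VIServer session and a CSV file with a 'Name' column holding the VM names; the function names are hypothetical.

```powershell
# Sketch only: assumes a CSV file with a 'Name' column; function names are placeholders.

# Read the VM names from the CSV, dropping blanks and duplicates.
function Get-VMNameFromCsv {
    param([Parameter(Mandatory = $true)] [string] $Path)
    Import-Csv -Path $Path |
        ForEach-Object { "$($_.Name)".Trim() } |
        Where-Object { $_ } |
        Sort-Object -Unique
}

# Enable "upgrade at power cycle" on every VM listed in the CSV file.
function Set-ToolsUpgradePolicyFromCsv {
    param([Parameter(Mandatory = $true)] [string] $Path)
    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
    $spec.Tools.ToolsUpgradePolicy = 'upgradeAtPowerCycle'
    foreach ($name in (Get-VMNameFromCsv -Path $Path)) {
        Get-VM -Name $name | ForEach-Object { $_.ExtensionData.ReconfigVM($spec) }
    }
}
```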

After all, changing such a small script is a lot quicker than having to go through all of your VMs one by one to set the option. Just imagine having 1000+ VMs in your vSphere environment!
This option is still valuable in vSphere 5.1: although you no longer need a reboot when upgrading VMtools, it is still a task that is easily forgotten. Another way of making sure you don't forget is to use VUM (VMware Update Manager), but it takes some time to set up a baseline and baseline groups. That is, if you already have VUM installed in vCenter.