12 February, 2014

Orphaned VM after failed migration

When a VM is orphaned, I usually check which datastore, cluster and folder the VM is placed in. Then I remove the VM from the inventory and add it again through the datastore browser (or by using PowerCLI, check at the bottom of this post).
Lately I have been seeing orphaned VMs at a customer that cannot be (re-)added because the .vmx file is locked by another process or host.
vCenter "thinks" the VM is still active on this host. When you try to put the host into maintenance mode it stops at 80%, and when you send a shutdown command to the host from the DCUI it warns you that there are still VMs active on it. From the VM's standpoint it is still active as well: the VM keeps running and is still accessible, for instance over RDP!
But when you run esxtop on that host and look at the VM processes that are still running, there is no process active for this VM.
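Besides esxtop, `esxcli vm process list` on the host prints the VM worlds that are actually running, which makes the same check scriptable. The sketch below is not from the original post: it assumes you have captured that command's text output, and that it uses the layout seen on ESXi 5.x (an unindented VM name line followed by indented `Key: value` lines such as `World ID` and `Display Name`). The function names are hypothetical; verify the output format on your own build before relying on it.

```python
from typing import Dict, List


def parse_vm_process_list(output: str) -> List[Dict[str, str]]:
    """Parse captured `esxcli vm process list` text into one dict per VM world.

    Assumed layout: an unindented line with the VM name, followed by
    indented "Key: value" lines (World ID, Display Name, Config File, ...).
    """
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in output.splitlines():
        if not raw.strip():
            continue  # skip blank separator lines between VMs
        if not raw[0].isspace():
            # An unindented line starts a new VM entry.
            if current:
                entries.append(current)
            current = {"Name": raw.strip()}
        elif current and ":" in raw:
            # Split on the first colon only; values such as paths are kept whole.
            key, _, value = raw.strip().partition(":")
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def has_running_process(output: str, display_name: str) -> bool:
    """Return True if a VM world with this display name appears in the output."""
    return any(
        entry.get("Display Name", entry.get("Name")) == display_name
        for entry in parse_vm_process_list(output)
    )
```

If `has_running_process` returns False for a VM that vCenter still shows as running on that host, you are looking at the situation described above.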
A quick and somewhat dirty way out is to put the host into maintenance mode. Most of the time this will stall at either 65% or 80%; at this point you will see that the only remaining VM on this host is the orphaned VM. Log in to the DCUI (or use PowerCLI or the CLI) and send a reboot command to the host. The host will tell you that you still have active VMs on it; ignore this and reboot the host anyway. The VM will be forcefully powered off.
After the reboot, you will find the host in maintenance mode with one powered-off VM present: the previously orphaned VM.
Take the host out of maintenance mode and power on the VM. It will probably boot normally, and if DRS is in use it will move some VMs to the host to balance out the HA cluster the host is in.
As mentioned before, the method described above is a "quick and dirty" way, and in my opinion it should be a last-resort option because it results in an outage, i.e. downtime for the VM in question.
So before you go with this method, make sure there is no other way to resolve the issue. For instance, if there is no lock on the .vmx you can easily re-register the VM in vCenter without any reboots. If you need to do this for a larger number of VMs, then have a look at one of my previous posts, "VM's grayed out (Status Unknown) after a APD (All Paths Down) event on NFS datastores".
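When several VMs are orphaned at once, the order of preference above can be written down as a small planning step: re-register everything whose .vmx is not locked, and only fall back to the maintenance-mode-plus-forced-reboot route for hosts that still hold a lock. This is a minimal sketch, not a tool from the post: the `OrphanedVM` record and `plan_recovery` function are hypothetical, and how you determine `host` and `vmx_locked` (vCenter inventory, lock inspection on the datastore) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OrphanedVM:
    name: str
    host: str         # host that vCenter believes the VM is running on
    vmx_locked: bool  # assumed input: True if another process/host holds the .vmx lock


def plan_recovery(vms: List[OrphanedVM]) -> Dict[str, List[str]]:
    """Split orphaned VMs into the two recovery paths, preferring the one without downtime."""
    plan: Dict[str, List[str]] = {"reregister": [], "reboot_hosts": []}
    for vm in vms:
        if not vm.vmx_locked:
            # No lock: remove from inventory and re-add; no reboot, no downtime.
            plan["reregister"].append(vm.name)
        elif vm.host not in plan["reboot_hosts"]:
            # Locked .vmx: last resort is maintenance mode plus a forced host
            # reboot, which powers the VM off. List each host only once.
            plan["reboot_hosts"].append(vm.host)
    return plan
```

Grouping the last-resort cases per host means each affected host only has to be rebooted once, even if it holds several orphaned VMs.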
