24 September, 2014

vCenter Orchestrator loses VM networks after host renaming

Last week I was asked to have a look at a vCenter Orchestrator (vCO) workflow issue. Workflows used to deploy VMs would fail on specific hosts. One of the customer's VMware administrators found that the workflows stopped at the point where a VM had to be connected to a specific VM port group.
This happened with every VM port group (VLAN) selectable within the workflow. The workflows normally pick a host automatically, but a host can also be selected manually. After running the workflows with manual host selection, we found some hosts on which the workflow completed successfully.
When comparing the hosts, it became clear that the hosts on which the workflow failed had recently been renamed.
The customer uses a dvSwitch for all network traffic across all hosts within the HA clusters. Renaming an ESXi host requires disconnecting it from vCenter and reconnecting it after the renaming; a PowerCLI script was used to automate the renaming process (a similar script can be found here).
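A minimal PowerCLI sketch of that rename flow, assuming PowerCLI 5.x and placeholder names for the vCenter server, cluster and host (this is not the customer's actual script):

Connect-VIServer -Server "vcenter.example.local"

$oldName = "esx01.example.local"      # current host name (placeholder)
$newName = "esx01-new.example.local"  # new host name (placeholder)
$cluster = Get-Cluster -Name "HA-Cluster01"

# Disconnect the host and remove it from vCenter under its old name
Get-VMHost -Name $oldName | Set-VMHost -State "Disconnected" -Confirm:$false
Get-VMHost -Name $oldName | Remove-VMHost -Confirm:$false

# The hostname itself is changed on the ESXi side (e.g. via esxcli or the DCUI)

# Re-add the host to the cluster under its new name
Add-VMHost -Name $newName -Location $cluster -User "root" -Password "********" -Force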
During the renaming there had been an issue when the hosts reconnected to vCenter: the renamed hosts came back with a dvSwitch error message, which you clear by manually re-adding the host to the dvSwitch. Afterwards the hosts' networking looked OK; nevertheless, this was a good reason to take a closer look at the network configuration of the renamed hosts.
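Re-adding a renamed host to the distributed switch can also be done from PowerCLI; a hedged example, with the switch and host names being assumptions:

$vds    = Get-VDSwitch -Name "dvSwitch01"
$vmhost = Get-VMHost -Name "esx01-new.example.local"

# Join the host to the dvSwitch again
Add-VDSwitchVMHost -VDSwitch $vds -VMHost $vmhost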
One detail that stood out was the colour of the dvUplink interfaces. When all is fine they are coloured green, but when, for instance, the physical NIC used by the uplink is disconnected, the colour turns white, as shown in the picture below for dvUplink2.


Now with the renamed hosts it was not just one dvUplink: all four dvUplinks were coloured white. Strangely enough, the VMs hosted on these hosts had fully functional network connections, and as expected none of the physical NICs was actually disconnected.
One of the VMware administrators tried to get all dvUplinks "green" again by simply removing and re-adding the vmnic on the dvUplink, and this seemed to work: all dvUplinks came back "green" again. Unfortunately the Orchestrator workflow issue persisted after the actions above, and since none of the VMware administrators (me included) had any ideas on how to solve it, a support case was opened with GSS.
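The "remove and re-add the vmnic" step was done in the client, but it can be approximated in PowerCLI roughly like this (switch and NIC names are assumptions, not the customer's actual values):

$vds    = Get-VDSwitch -Name "dvSwitch01"
$vmhost = Get-VMHost -Name "esx01-new.example.local"
$vmnic  = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name "vmnic1"

# Detach the physical NIC from the dvSwitch uplink and attach it again
Remove-VDSwitchPhysicalNetworkAdapter -VMHostNetworkAdapter $vmnic -Confirm:$false
Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $vmnic -Confirm:$false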
After the usual "please upload logfiles" steps, the problem was quickly solved during a WebEx session. The solution was to force an update of the dvSwitch configuration on all hosts connected to the dvSwitch.
So how do you push the configuration, or forcefully update it, on the ESXi hosts? Simple: just add a temporary dvPortgroup to the dvSwitch. By adding a dvPortgroup, all connected ESXi hosts receive an updated dvSwitch configuration.
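A minimal PowerCLI sketch of that fix, assuming a dvSwitch called "dvSwitch01" and an arbitrary name and VLAN for the temporary portgroup:

$vds = Get-VDSwitch -Name "dvSwitch01"

# Adding a portgroup makes vCenter push a fresh dvSwitch configuration to every member host
New-VDPortgroup -VDSwitch $vds -Name "temp-config-push" -VlanId 999

# Once the hosts show the updated configuration, the temporary portgroup can be removed again
Get-VDPortgroup -VDSwitch $vds -Name "temp-config-push" | Remove-VDPortgroup -Confirm:$false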
This finally solved the Orchestrator workflow issues. I can imagine that forcing an update of the dvSwitch configuration could also help with other dvSwitch "out of sync" kinds of issues.
I will try it the next time I run into such an issue.
