27 February, 2014

VMkernel port challenge on a Distributed Switch

Recently I was working on a project which consisted of adding new hosts to an existing vSphere 5.1 environment. As the form factor and specs were far from what the customer already had in place, the new hosts were put in new HA clusters. Because the new hosts have four 10GbE NICs, the virtual networking had to be redesigned.
I designed one Distributed Switch for all traffic and for all clusters, regardless of the purpose of the clusters. Security-wise the customer agreed to this; they only needed logical network separation in the form of VLANs. This kept both the design and the physical network configuration fairly simple.
One of the customer's requirements was that vMotion should be a lot faster than on their current hosts; having a large number of hosts, they wanted hosts to be able to go into Maintenance Mode quickly. They also wanted to speed up the patching and updating of the hosts by Update Manager.
So within the Distributed Switch design I added multi-NIC vMotion. I got a lot of information from the blogs of Frank Denneman and Duncan Epping, especially these two blog posts: How to setup Multi-NIC vMotion on a distributed vSwitch and Multiple-NIC vMotion in vSphere 5…. Of course NIOC (Network I/O Control) is also used to control and guarantee the bandwidth required by the various types of network traffic.
Because the new hosts use a newly created VLAN for vMotion, while the current workload needs to be moved from the current hosts to the new ones with vMotion (to prevent VM downtime), there is a challenge: vMotion traffic does not route! A simple solution is to temporarily use an extra VMkernel interface for vMotion in the VLAN that the current hosts also use for vMotion, and to remove it after the workload has been moved completely.
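As a rough illustration, adding (and later removing) such a temporary interface with PowerCLI could look like the sketch below. This is a minimal sketch, not the exact commands used: the host name, the portgroup dvPG_vMotion_legacy (backed by the old vMotion VLAN) and the IP addressing are all hypothetical.

# Minimal sketch; host name, portgroup and IP are hypothetical
$vmhost = Get-VMHost -Name "esx-new-01.example.local"
$vds    = Get-VDSwitch -Name "dvSwitch_PROD_02"

# Temporary vMotion VMkernel interface in the legacy vMotion VLAN
$vmk = New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds `
    -PortGroup "dvPG_vMotion_legacy" -IP "10.1.1.101" `
    -SubnetMask "255.255.255.0" -VMotionEnabled:$true

# ... move the workload to the new host ...

# Remove the temporary interface once the migration is done
Remove-VMHostNetworkAdapter -Nic $vmk -Confirm:$false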
All of the multi-NIC vMotion VMkernel interfaces were created with a PowerCLI script, which you can find in this previous post.
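The full script is in that post; in essence it creates one vMotion VMkernel interface per vMotion dvPortgroup on every host, roughly along the lines below. Again a minimal sketch: the cluster name, the portgroup names and the IP scheme are hypothetical, not the actual values used.

# Minimal sketch of multi-NIC vMotion creation; cluster name,
# portgroup names and IP scheme are hypothetical
$vds = Get-VDSwitch -Name "dvSwitch_PROD_02"

foreach ($vmhost in Get-Cluster -Name "NewCluster" | Get-VMHost) {
    # Re-use the last octet of the management IP to build unique vMotion IPs
    $octet = ($vmhost | Get-VMHostNetworkAdapter -Name vmk0).IP.Split('.')[3]

    # One VMkernel interface per vMotion dvPortgroup,
    # each dvPortgroup pinned to its own 10GbE uplink
    New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds `
        -PortGroup "dvPG_vMotion_A" -IP "10.2.1.$octet" `
        -SubnetMask "255.255.255.0" -VMotionEnabled:$true
    New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds `
        -PortGroup "dvPG_vMotion_B" -IP "10.2.1.$([int]$octet + 100)" `
        -SubnetMask "255.255.255.0" -VMotionEnabled:$true
}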
The temporary vMotion VMkernel interfaces, on the other hand, were created manually, and on one host something went wrong. I am not sure what, but it looks like a duplicate IP address was used for the interface. To correct this I first removed the VMkernel interface, so it would no longer interfere with the environment. After I saw that there were more IP-related issues on this host, and at some point the host lost its connection to vCenter, I used both a direct vSphere Client connection to the host and the DCUI to straighten things out and at least get the management VMkernel interface back up and running with the correct IP. Now keep in mind this is all still on the Distributed Switch, so when the host successfully re-connected to vCenter it showed an error that the switch configuration was out of sync with vCenter. This synchronization usually takes place every five minutes, and after some time had passed the error went away and all was good.
So I could start to re-add the VMkernel interfaces for multi-NIC vMotion. I ran my script, but this resulted in errors. I then tried to add the VMkernel interfaces manually, but this also resulted in an error.
It looked like I would not be able to sort this out from the vSphere Client, and the quick fix from VMware I found on the internet, restarting the vCenter service, was not a possibility for me at the time. So I resorted to the CLI and esxcli.
I connected to the host over SSH and listed all its VMkernel IP interfaces with "esxcli network ip interface list", which resulted in the following output:

vmk0
   Name: vmk0
   MAC Address: 00:50:56:6a:21:2e
   Enabled: true
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9059
   VDS Connection: 767794954
   MTU: 1500
   TSO MSS: 65535
   Port ID: 50331658

vmk1
   Name: vmk1
   MAC Address: 00:50:56:61:57:86
   Enabled: false
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9436
   VDS Connection: 415422000
   MTU: 0
   TSO MSS: 0
   Port ID: 0

So there are indeed not one but two VMkernel interfaces, yet only one shows up in the vSphere Client. If you look at the output, you can see that vmk1 has Enabled: false, which is probably why it is not visible in the vSphere Client.
With "esxcli network ip interface remove --interface-name=vmk1" it should be possible to remove this hidden VMkernel interface. After running the command I checked the VMkernel IP interfaces of the host again:

vmk0
   Name: vmk0
   MAC Address: 00:50:56:6a:21:2e
   Enabled: true
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9059
   VDS Connection: 767794954
   MTU: 1500
   TSO MSS: 65535
   Port ID: 50331658
~ #

The problem looked solved, and afterwards I was able to re-add the multi-NIC vMotion VMkernel interfaces without any problem. In the end all it took was a couple of esxcli commands.
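If you want to double-check the result from PowerCLI instead of the vSphere Client, a quick verification could look like this (the host name is hypothetical):

# List all VMkernel interfaces of the host and their vMotion setting
Get-VMHost -Name "esx-new-01.example.local" |
    Get-VMHostNetworkAdapter -VMKernel |
    Select-Object Name, IP, SubnetMask, VMotionEnabled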
