18 December, 2014

Microsoft Remote Desktop Services issue

This is clearly not the usual virtualisation-related blog post. But every now and then I'm asked by either a customer or a colleague to have a look at a non-virtualisation issue they have run into, mostly because the issue occurs on a virtualised server.
In this particular case, for some unknown reason, all RemoteApp programs served from their "Terminal Services" server suddenly stopped working overnight.
When users started a RemoteApp they were prompted for their username and password, but after entering these nothing happened. Looking in the Application event log, two events were logged for each RemoteApp start attempt.

When looking at the first information event (ID 9003), it looks as if the Windows Desktop Experience feature is not installed or its service is not started.

The second event (ID 9009) is a logical consequence of the first: the Desktop Window Manager exits.
When searching for the cause of event ID 9003, all I found was that indeed the Desktop Experience feature probably was not installed, or the RemoteApp was being started from within a Remote Desktop session.
When looking at the installed Windows features it turned out that the feature was installed; I also verified that the corresponding service was started and that no Remote Desktop sessions were in use.
The next obvious thing was to recreate one of the RemoteApp programs, to see if this would solve the issue. Unfortunately it did not.
Then I started thinking about the description of event ID 9003, "The Desktop Window Manager was unable to start because a composited theme is not in use", and the fact that all Internet resources linked this to either the Desktop Experience feature or an unsupported desktop theme on the client.
For this customer there had been no changes to the desktops, although a small change through a Windows update/patch could be possible, considering the RemoteApp programs stopped working at the same time on all workstations.
I started searching on the server side for a place where I could define the settings for the Desktop Experience or for the allowed desktop themes.
I started with the configuration of the Remote Desktop Session Host server and opened its properties.
Going through the tabs I ended up at the Client Settings tab, where you find the Color Depth limit setting. This setting was set to 32 bits per pixel; I removed the limit by deselecting it.
As a result all RemoteApp programs started working again on all workstations. About the root cause I still have no clue; for now I will keep it at a patch or update on the "Terminal Services" server.
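For reference, the same RDP colour depth limit can also be checked outside the GUI. A minimal PowerShell sketch, assuming the default RDP-Tcp listener and the standard Terminal Server registry path — both of which are assumptions you should verify against your own server first:

```powershell
# Hypothetical check: the RDP-Tcp listener settings are assumed to live under
# this registry key on the session host (verify path and value name first).
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp'
Get-ItemProperty -Path $key | Select-Object ColorDepth
```

This only reads the configured value; changing it is still best done through the Session Host configuration GUI or Group Policy.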

28 November, 2014

Virtual Ethernet card "Virtual Ethernet Adapter" is of type which is not supported

Last week I was asked if I could solve an issue at a customer who was patching all of their ESXi hosts. They ran into a host which refused to go into maintenance mode; the reason was a VM which did not vMotion to another host successfully.
Basically, the vMotion task timed out. When they investigated the error details, they found a more specific error description.
It looked like there was something wrong with the network adapter installed in this specific VM. But when the configuration of this VM was checked, it turned out to have the customer's default network adapter installed, VMXNET3.
Having a second look at the VM configuration, the Guest OS was set to Windows Server 2003 Standard for some odd reason, while the VM was actually running Windows Server 2008 R2 Enterprise.
After shutting down the VM and correcting the Guest OS, the VM was powered on again. Now the VM could be vMotioned without any issue.
We already knew it is important to select the proper Guest OS so that VMware Tools installs correctly and the right default virtual hardware is presented to the VM's OS. Now there is an extra reason to pay attention to choosing the right Guest OS for a VM.
If you want to read more about the impact of a mismatched Guest OS selection, please read this blog post of Frank Denneman.
In this blog post you can also find a PowerCLI "one-liner" by Alan Renouf to quickly scan your environment for mismatched Guest OS.
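A sketch in that spirit — not necessarily Alan's exact one-liner — compares the configured Guest OS with what VMware Tools reports, assuming a connected PowerCLI session and VMs with VMware Tools running:

```powershell
# List VMs whose configured Guest OS differs from the one VMware Tools reports.
# Guest.GuestFullName is empty for powered-off VMs or VMs without Tools,
# so those are filtered out.
Get-VM | Get-View |
    Where-Object { $_.Guest.GuestFullName -and
                   ($_.Config.GuestFullName -ne $_.Guest.GuestFullName) } |
    Select-Object Name,
        @{N='Configured'; E={$_.Config.GuestFullName}},
        @{N='Running';    E={$_.Guest.GuestFullName}}
```

Any VM this returns is a candidate for the shut-down-and-correct procedure described above.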

05 November, 2014

A personal introduction to Diablo Technologies @VMworld

A couple of weeks before VMworld Europe, I got introduced to Diablo Technologies. Before that I had heard of the company but did not really know what they did.
I got the opportunity to have a meeting with Kevin Wagner, VP Marketing, during VMworld to talk about Diablo Technologies, their current solutions and their yet-to-be-released solutions.
After this introduction, I wanted to find out what Diablo Technologies is all about and what their relation is to virtualization. Looking at their website I found out they have a technology called Memory Channel Storage (MCS), which sounds great, but I could not really figure out how this technology is actually used. I found it was Flash storage that sits in a DRAM slot, but it was not clear to me how you could leverage this Flash storage.
This changed quickly after meeting with Kevin. I was expecting a marketing talk which would hopefully also bring some technical details along, but this was not the case at all. Of course there was a fair amount of marketing info, after all I did not know the company nor their technologies really well, but the meeting was for the largest part technical.
So what is Memory Channel Storage all about then? The technology is based on a combination of hardware and software (a driver). The hardware looks somewhat like a DDR3 DRAM module with an additional heatsink attached to it.

Only on this module there is no DDR3 RAM; instead there is 19 nm MLC NAND (Flash) memory, accompanied by some controllers. The NAND and controllers are provided by SanDisk. These controllers make it possible to present this Flash memory as block-based storage within the server's OS; additionally, a driver within the OS is needed. Currently drivers are available for Microsoft Windows, VMware ESXi and a variety of Linux distributions.
So two questions popped to mind: how much Flash is on such a module? And with Flash memory in a DRAM slot presented as block-based storage, how does the server (mainboard) recognise this correctly?
The MCS modules are currently available in two sizes, 200 GB and 400 GB, and there are ideas to add an 800 GB version as well. 200 GB and 400 GB Flash modules are adequately sized for use as a caching solution, but would it also be possible to use them as a real storage solution, like you can with SSDs? More on this later, because the other question was how it is presented as block-based storage.
To be able to do this, Diablo Technologies has incorporated controllers on the module which act as a disk controller; a driver within the server's OS is also needed to control the way the module(s) work. But you are still left with a module plugged into a DRAM slot on the mainboard. For it to be detected as an MCS module instead of a DRAM module, a minor change is needed to the mainboard's UEFI (Unified Extensible Firmware Interface), the successor of the mainboard's BIOS.
My two questions were answered, but now other questions came up.
A change to the UEFI/BIOS of a server means you cannot buy MCS modules and start plugging them into your existing server hardware! So what hardware do I need to buy, and which OEMs does Diablo Technologies work with? And how do I fit these MCS modules together with DRAM modules in a server?
The biggest OEM partners of Diablo Technologies are IBM and SuperMicro; there are others and the OEM base is growing. It would be great to see the possibility for MCS in HP, Dell or even Fujitsu compute hardware if you ask me!
As for MCS module placement: as previously mentioned, the modules are plugged into DDR3 DRAM slots, and you can place them exactly like you can place DRAM modules. This means a DRAM slot can be occupied by either a DRAM module or an MCS module; in other words, you will need a server mainboard with enough DRAM slots to accommodate both types of modules.
There are a couple of things you need to take into account, so why would you choose an MCS solution over an SSD or PCIe SSD solution? Simple: the memory bus is very close to the CPU and therefore performs at a higher speed / lower latency than a PCIe or SATA/SAS bus connection. On top of that, each MCS module uses only one memory channel, just like a DRAM module does. Because of this it can make use of parallelism: instead of having one 800 GB flash device when you plug in two 400 GB MCS modules, you have two 400 GB devices, each communicating through its own memory channel, guaranteeing very low response times even when servicing heavy I/O loads.
SSDs result in low latencies when used in servers, we all know this: sub-millisecond in the most optimal situations. Latencies when using MCS technology are in the 4 μs to 5 μs range, but these numbers don't tell the complete story. Diablo Technologies' MCS is best at keeping ultra-low latencies even at high throughput.
When looking at the latencies of VMs running on a vSphere hypervisor, it is not only the use of the memory bus that yields the very low latencies. When you look at how a traditional storage stack is built up compared to an MCS storage stack, you can easily see that the I/O path through the MCS storage stack is much shorter and passes through fewer components. This reduction also reduces the total latency.

Now take a new storage technology based on local storage, VMware's VSAN. It uses SSDs for caching and spinning disks for persistent storage, so with MCS being presented as block-based storage, you could use MCS for caching. This could possibly improve the already impressive VSAN performance.
Looking a bit further down the road, VMware has announced it will probably support full-flash VSAN configurations in the "2.0" release. Just imagine the performance you could get when you use MCS in a full-flash VSAN.
Also further down the road for Diablo Technologies is Carbon2, which is just around the corner: their MCS solution based on a DDR4 memory channel interface, which will bring MCS to the newest server architectures.

I must say I was really impressed with the technology and would recommend having a look at Diablo Technologies if you are searching for a high-throughput / ultra-low-latency storage solution. Diablo Technologies offers the possibility to "Test Drive" MCS; they will set you up with a time slot to try out MCS running on their own servers at their HQ in Canada.
Both IBM and SanDisk have their own name for the MCS technology: at IBM it is called eXFlash DIMM, and SanDisk runs with ULLtraDIMM.

27 October, 2014

21st VMUGBE+ Meeting

Over the last couple of years I have been attending VMUG meetings; they are all interesting, although some have more interesting sessions than others. I have always found the Belgian VMUG to have very good sessions.
The upcoming meeting also promises to be a good one, especially if you are into EVO:RAIL, VSAN and/or NSX and missed VMworld. Please check out the details below!
For those who are attending this VMUG meeting, see you November 21!

VMUG BE Agenda:

If you are a VMUG member you can register here for this free meeting. You can also find the location details on that page.

21 October, 2014

vSphere Distributed Switch health check gotcha !

Last week I was asked by a customer to have a look at an issue they had recently been having with the Distributed Switch health check feature.
This customer uses the feature to regularly check whether all physical network connections provide all the VLANs configured on the Distributed Switches in their vSphere environment. About two weeks ago the health check suddenly started reporting errors on all hosts connected to a specific Distributed Switch: a missing VLAN and an MTU size mismatch, the VLAN being reported missing on the physical switch ports connected to the physical NICs of the hosts.
When I looked at the details of the error, I saw the missing VLAN was VLAN 0. This got me interested, as you usually don't use VLAN 0 for standard network traffic and therefore don't often see it in vSphere on a Distributed Switch that handles Virtual Machine network traffic.
When I checked the Distributed Switch configuration, more specifically the dvPortgroups configured on it, I found a dvPortGroup named "name_test". There were no VMs using it, and when I looked at its configuration I found it had the VLAN type set to "None".
After asking around I learned that this dvPortGroup had been created as a test while troubleshooting another issue and was simply left in place afterwards. As there was no reason to leave it there, I removed the dvPortGroup and, after a refresh, checked the health status of the Distributed Switch.
To no surprise, the health check came back without any errors this time: not only was the missing VLAN error solved, the MTU size mismatch was solved along with it.
So my takeaway is this: if you as a vSphere admin use the Distributed Switch health check feature to keep tabs on the status of your virtual-to-physical network, please keep a tight procedure when it comes to changes on the Distributed Switches! And if you have to add a dvPortGroup for testing purposes, please do not add it with the VLAN type set to "None", but with a VLAN that is available on your physical network interfaces.

24 September, 2014

vCenter Orchestrator loses VM networks after host renaming

Last week I was asked to have a look at a vCenter Orchestrator (vCO) workflow issue. There was a problem with workflows used to deploy VMs: the workflows would fail on specific hosts. One of the customer's VMware administrators found that the workflow stopped at the point where a VM had to be linked to a specific VM port group.
This happened with every VM port group (VLAN) available within the workflow. These workflows use automatic host selection, but a manual selection can also be made, and after running workflows with manual host selection some hosts were found that completed the workflow successfully.
When verifying what the difference was between the hosts, it became clear that the hosts that failed the workflow had recently been renamed.
The customer uses a dvSwitch for all network traffic across all hosts within the HA clusters. During the renaming you have to disconnect the ESXi host from vCenter and re-connect it after the renaming; a PowerCLI script was used to automate the renaming process, and a similar script can be found here.
During the renaming there had been an issue with the hosts upon reconnecting to vCenter: after renaming, the hosts reconnected with a dvSwitch error message, and to get rid of this error you manually re-add the host to the dvSwitch. Afterwards the hosts' network looked OK; nevertheless, this was a good reason to take a better look at the network configuration of those renamed hosts.
One detail that stood out was the colour of the dvUplink interfaces. When all is fine they are coloured green, but when for instance the physical NIC used by the uplink is disconnected, the colour turns to white, as shown in the picture below for dvUplink2.

Now with the renamed hosts it was not one dvUplink, but all four dvUplinks that were coloured white. Strangely enough the VMs hosted on these hosts had a fully functional network connection, so as expected none of the physical NICs was disconnected.
One of the VMware administrators tried to get all dvUplinks "green" again by simply removing and re-adding the vmnic from the dvUplink. This seemed to work: all dvUplinks came back "green" again. Unfortunately the Orchestrator workflow issue persisted after the actions above, and since none of the VMware administrators (me included) had any ideas on how to solve it, a support case was opened with GSS.
After the usual "please upload logfiles" steps, the problem was quickly solved during a WebEx session. The solution was to force an update of the dvSwitch configuration across all hosts connected to this dvSwitch.
So how do you push, or forcefully update, the configuration on the ESXi hosts? Simple: just add a temporary new dvPortgroup to the dvSwitch. By adding a dvPortgroup, all connected ESXi hosts get an updated dvSwitch configuration.
This finally solved the Orchestrator workflow issues. I can imagine that updating the dvSwitch configuration this way could also help in other dvSwitch "out of sync" kinds of issues.
I will try it the next time I run into such an issue.
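The temporary-dvPortgroup trick can also be scripted; a minimal PowerCLI sketch, assuming a connected session with the distributed switch cmdlets available (the switch name, portgroup name and VLAN ID are examples):

```powershell
# Create a throw-away dvPortgroup to push an updated dvSwitch configuration
# to all connected hosts, then clean it up again.
$vds = Get-VDSwitch -Name 'dvSwitch01'
$tmp = New-VDPortgroup -VDSwitch $vds -Name 'tmp_config_push' -VlanId 100
Remove-VDPortgroup -VDPortgroup $tmp -Confirm:$false
```

Using an explicit VLAN ID that exists on your physical uplinks avoids creating new health check noise while the temporary portgroup exists.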

10 September, 2014

Exciting VMware revelations which make VMworld Europe a valuable event

Why would you go to VMworld Europe when all new product releases and most other revelations were already shown at VMworld US?
VMworld US is the bigger event: more days, more sessions and a bigger venue.
The answer is simple if you ask me: when a new product (version) is introduced, not all technical info on the product is always available. For instance, at VMworld US EVO:RAIL was introduced, and at VMworld Europe there are at least three sessions scheduled, one of them being a technical deep dive (the kind of session you want to attend, right?).

In short: EVO:RAIL is a management layer on top of the usual vSphere tools, and wizards are what make this management layer stand out. It will only be bundled with standardised hyper-converged hardware built by selected vendors.
EVO:RAIL is based on Enterprise Plus licensing and also has VMware's Log Insight "built in".

For some of the announcements it takes some time for them to sink in or to get your head around them, like for instance the VAIO I/O filters announced at VMworld US. You can read some info about them in this blog post written by Cormac Hogan. At first the announcement did not attract my attention; only after reading press releases and some blog posts (like Cormac's) did I get interested. If I had been at VMworld US, I would probably have skipped sessions regarding VAIO. But with the extra info I now have, I put a VAIO-related session on my "must see" list.
Please check out my blog post "My take on interesting sessions @VMworld Europe" for my complete "must see" session list.

I am planning on writing more on EVO:rail and VAIO (amongst other topics) during and after VMworld Europe.

09 September, 2014

My take on interesting sessions @VMworld Europe

Like most VMworld attendees I have also been busy building my session schedule. This can be a rather time-consuming task: there are so many sessions to choose from, and there is also the Solution Exchange, the Hands-On Labs (HOL), certification possibilities and meeting up with peers at the Hang Space.
You can fill up your entire schedule with sessions, but that would leave you very limited time to engage in any (or all) of the activities mentioned above. There are numerous blogs where the writer has presented his or her "must see" session list for VMworld; although most of those lists refer to the US edition, a lot of the sessions will also be presented in Europe. I suggest you look at some of those "must see" lists (mine is at the bottom of this post) and make a selection out of them and the complete VMworld catalog, taking into account your interests and field of work.
Next, decide which sessions you want to see live (sessions like "Ask the Experts" or "The vExpert Storage Game Show" you have to see live because of their interactive nature) and which you could optionally watch later online.
Making this selection will free up time you can use to attend the other activities mentioned above.

My selection of "must see" sessions:

NET1743 - VMware NSX - A Technical Deep Dive

INF1601 - Taking Reporting and Command Line Automation to the Next Level with PowerCLI

NET1592 - Under the Hood: Network Virtualization with OpenStack Neutron and VMware NSX

NET2745 - vSphere Distributed Switch: Technical Deep Dive

STO2197 - Storage DRS: Deep Dive and Best Practices

STO2554-SPO - Zooming In: How VMware Virtual Volumes (vVols) Will Provide Shared Storage with X-ray Vision

SDDC2095 - Overview of EVO: RAIL: The Radically New Hyper-Converged Infrastructure Appliance 100% Powered by VMware

TEX1991 - vCenter Orchestrator - What's Next?

SEC2238 - Security and Microsegmentation for the Software Defined Data Center

TEX1492 - IO Filters: Adding Data Services to ESXi

NET1468 - A Tale of Two Perspectives: IT Operations with VMware NSX

INF1864-SPO - Software Defined Storage - What’s Next?

BCO2629 - Site Recovery Manager and vSphere Replication: What’s New Technical Deep Dive

STO2997-SPO - The vExpert Storage Game Show EMEA

STO1965 - Virtual Volumes Technical Deep Dive

INF2311 - vCenter Server Architecture and Deployment Deep Dive

SDDC1337 - Technical Deep Dive on EVO: RAIL, the new VMware Hyper-Converged Infrastructure Appliance

STO2480 - Software Defined Storage - The VCDX Way Part II : The Empire Strikes Back

INF2427 - DRS : Advanced Concepts, Best Practices and Future Directions

SDDC2370 - Introduction to OpenStack for VMware Administrators

STO3098 - Virtual SAN Best Practices for Monitoring and Troubleshooting

I hope you will enjoy this year's VMworld Europe edition; I know I will for sure!

20 August, 2014

Network micro-segmentation using VMware NSX

Last week VMware hosted a vExpert-only WebEx event. The topic was VMware's network virtualization platform NSX, with a focus on (micro-)segmentation. The main presenter was Scott Lowe; he explained what management and security benefits micro-segmentation can bring in general, and especially how you can keep a securely micro-segmented network environment manageable by leveraging NSX. With traditional firewall and routing solutions, network isolation or segmentation at a per-business-unit, per-vApp or even per-VM granularity quickly becomes a painstaking management task which is, on top of that, very prone to error. Using NSX and deploying multiple virtual firewalls and routers (as many as the environment needs to meet the customer and/or security demands), all of which can be managed from one central interface, takes away the "pain" of managing such an environment. In addition, most firewall and segmentation configuration is policy based; by defining the needed policies specific to the customer's demands and applying them where needed (business unit, vApp or VM), the error factor is mitigated enormously as well.
While I have no hands-on experience with NSX in a customer's production environment, this presentation gave me a clear view of how to apply NSX in an environment where micro-segmentation is needed, whether the business requirement is security related or otherwise.
So if you are attending VMworld (US or Europe), make sure you have some sessions around NSX in your schedule, to hear for yourself about all the awesomeness NSX is bringing!
No Limits is the theme for VMworld 2014; this certainly goes for network virtualization with VMware NSX.

25 July, 2014


During the last NLVMUG event in March, there was a keynote by Mike Laverick. He put into words what most VMUG members were thinking: a VMUG event should be all about sharing VMware knowledge and, above all, sharing real-life experiences with VMware products (and all other related products, for that matter).
Of course we all enjoy hearing about the latest and greatest new products, but the community sessions are what it is all about.
Mike then shared what he thought was causing only a few VMUG members to step up to the plate and present during a VMUG meeting: all the obvious reasons everyone can think of, especially if the thought of presenting yourself has ever come to mind! What would really help is to have someone review or mentor you before you are up on the stage doing the actual presentation. After a presentation there is always some feedback, but in most cases your presentation is a one-time deal, so getting feedback afterwards is not really useful. You need this feedback in advance, before you do the actual presentation: #FeedForward, so to speak!
This is what #FeedForward is all about: there is much more value in getting feedback in advance. It will make sure your presentations and presentation skills improve, and that you will enjoy presenting your own story.

After the NLVMUG keynote, Mike "launched" the hashtag #FeedForward, so it was easy for people to find each other on Twitter.
The global VMUG organisation has also created a new landing page for #FeedForward.


On this page you can express the intention to either:

  • Becoming a Mentor and reviewing presentations for VMUG events
  • Sharing a presentation at a VMUG event
  • Serving on a Committee to promote FeedForward

If you need help, want to help mentor your fellow VMUG'ers, or want to present yourself, please sign up on the website.

11 July, 2014

vExpert 2014

Yesterday, late in the evening, I received an email from Corey Romero of VMware with the subject "Welcome to the 2014 vExpert Program!". That was a great way to finish the day!

I'm thankful to be awarded the vExpert award in 2014. The VMware vExpert program acknowledges the people within the community who have contributed to evangelising virtualization as a whole. It feels great to be part of that group, and in my day-to-day role as a consultant, as well as my role as a blogger, I will hopefully be able to continue contributing to the community.

For a complete list of vExperts, please check out the following link:

If you think you are vExpert material (or if you know someone you would recommend), then please see the following link:

18 June, 2014

Improve vSphere Webclient performance

Several other blogs have posted ways to improve the speed and responsiveness of the vSphere Web Client. I was browsing through the different blog posts, in addition to the official VMware installation and migration documentation and Knowledge Base articles, while writing a migration plan.
This migration plan must guide a customer through the replacement of vCenter and the upgrade of ESXi hosts; both need to get to vSphere 5.5.
Grabbing the different adjustments to improve the speed and responsiveness of the vSphere Web Client was not that difficult, but it became clear that there is no single place with the complete list (at least I didn't find it).
So I thought I would write a post with all the changes I know of that improve the "look and feel" of the vSphere Web Client.
Let's get started: changes to the JVM settings of the various vCenter components are the ones that make the biggest improvement.

VirtualCenter Management WebServices
Configuration file location:
Heap size parameter:

vCenter Inventory Service
Configuration file location:
installation_directory\VMware\Infrastructure\Inventory Service\conf\wrapper.conf
Heap size parameter:
# Maximum Java Heap Size (in MB)

vSphere Profile-Driven Storage
Configuration file location:
installation_directory\VMware\Infrastructure\Profile-Driven Storage\conf\wrapper.conf
Heap size parameter:
# Maximum Java Heap Size (in MB)

vSphere Web Client
Configuration file location:
installation_directory\Program Files\VMware\Infrastructure\vSphereWebClient\server\bin\service\conf\wrapper.conf
Heap size parameter:
# JVM Memory
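For reference, in these wrapper.conf files the heap size is set through the Java Service Wrapper's wrapper.java.maxmemory parameter. A sketch of such an entry is shown below; the value of 3072 is only an illustration, as the appropriate size depends on your inventory size and the RAM available on the vCenter server:

```
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=3072
```

The relevant service must be restarted for a changed heap size to take effect.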

There are some other changes to the vSphere Web Client which improve its usability, for instance by changing the page timeout and disabling the animations within the Web Client.
The list of things I would change you can find below, but please keep in mind that the settings may differ for your or your customers' environment. Please adjust accordingly!

vSphere Web Client
Configuration file location:
%ALLUSERSPROFILE%\VMware\vSphere Web Client\webclient.properties
session.timeout = 0
navigator.disableAnimation = true
refresh.rate = 600
feature.facetedSearch.enabled = true

If you have read this post and find there is an update or modification missing, please leave a comment and I will review it and update the post.

29 April, 2014

Removing orphaned replica VM's

I have been working on a SAN replacement project with a customer; for me this also meant moving all VM workloads from the old SAN to the new SAN. Usually I work on VMware vSphere environments (Data Center Virtualisation), but this project also involved moving their VMware View 5 environment and View workloads.
With this View environment not using linked clones, the storage migration was pretty straightforward, even for a DCV guy like me. At least until I came across VMs called Replica-GUID, with which I could not do much, as these VMs no longer existed within the View environment (assuming they were leftovers from a linked clone experiment) but were still registered with vCenter.
So I thought this should be easy: just right-click and "Remove from inventory" or "Delete from disk". But both of these "solutions" were greyed out.
When I searched the VMware KB I came across a very detailed article on how to manually remove replica virtual machines, KB1008704.
Before I could start removing the replicas I wanted to double-check that the VMs were really obsolete. The View admin thought / assumed they were leftovers from an experiment or on-site training, but as we all know, assumptions are the mother of all #$&@.
Given the fact that they were powered off, I figured why not check how long they had been in this state. By using the datastore browser to locate the files corresponding with the replica VMs and checking the "Modified" column, I found these replica VMs had not been altered or powered on for over a year. This is what I was expecting, and it aligns with the View admin's thoughts.
I started manually deleting the replica VMs by following the steps outlined in KB 1008704. Although this is pretty straightforward, please do note that the SviConfig command used is case-sensitive.

At first I had absolutely no luck removing the replica VMs; it looked like there was a permission / rights issue somewhere. But I used a user account with full Administrator access to vCenter, used the correct SQL user and password to access the View database, and still no luck.
As it turns out, the vCenter user account you use does not only need full Administrator privileges on vCenter, it also needs Administrator privileges on the View environment (Composer, broker).

When searching for a solution to SviConfig not working at first, I came across a blog post by Terence Luk in which he explains in good detail how to use the SviConfig command. He even provides some extra info on top of the VMware KB article.

Change the name of an ESXi host

Recently I needed to rename a considerable number of ESXi 5.x hosts. VMware has published a KB article, KB1010821, that describes the various ways of doing this very well, but for me there are two things missing from this information. The first is that nothing is written specifically about the consequences of a host renaming action for the distributed vSwitch(es) in which the host's physical adapters were used as dvUplinks.
When I followed the manual steps of the KB to test-run the procedure, I got errors regarding the adapters on the dvUplinks when I removed the host from vCenter (after I first put the host in maintenance mode and disconnected it).
These errors came back after the renaming was done and I added the host to vCenter again. The host was added successfully to the HA cluster it was previously part of, but failed to reconnect its management, NFS and VM networks through the distributed vSwitches. I had to manually run the "add host" procedure to add the physical adapters to the correct distributed vSwitches; the physical adapters used for vmkernel ports were pre-selected, but for the VM network I had to manually select the physical adapters I wanted to use. With these additional steps the procedure was successful.
The second thing that is not mentioned, although understandable, is that you lose all historical (performance, event, task, etc.) data of the renamed host because of the "remove from vCenter" step mentioned in the KB.
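The vCenter side of the manual remove-and-re-add procedure can be sketched in PowerCLI as follows. This is a minimal sketch, not the exact script from the KB: the host names, cluster name and credentials are examples, it assumes an existing Connect-VIServer session, the hostname change on the ESXi host itself still has to be done per the KB, and the dvSwitch uplinks still need to be re-added afterwards as described above:

```powershell
# Example names/credentials; the host must already resolve under its new name in DNS.
$oldName = 'esx01.old.local'
$newName = 'esx01.new.local'
$cluster = Get-Cluster 'Cluster01'

# Evacuate and disconnect the host, then remove it from the vCenter inventory
Set-VMHost -VMHost $oldName -State Maintenance -Confirm:$false
Set-VMHost -VMHost $oldName -State Disconnected -Confirm:$false
Remove-VMHost -VMHost $oldName -Confirm:$false

# Re-add the host under its new name and bring it back online
Add-VMHost -Name $newName -Location $cluster -User root -Password '***' -Force
Set-VMHost -VMHost $newName -State Connected -Confirm:$false
```

Note that this flow is exactly what causes the loss of historical data mentioned above; the approach discussed next avoids the remove step altogether.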

Then I read a blog post by Reuben Stump on the Virtuin blog called "Rename ESXi Hosts in vCenter (Without Losing Historical Data)", which describes a way of renaming a host without removing it from vCenter, letting you keep the historic data. I started thinking it could also work around the distributed vSwitch issue of having to re-add the uplinks. The way Reuben describes it is through a Perl script. Running a Perl script can be done in different ways; one I like is to have VMware vCLI (VMware vSphere Command-Line Interface) installed on a Windows computer, especially the same computer you have PowerCLI installed on, because you can then easily use PowerCLI scripting to invoke a Perl script. Please take a look at the blog post of Robert van den Nieuwendijk, How to run VMware vSphere CLI perl scripts from PowerCLI, on a PowerCLI function he has written to do this.
With the host no longer being disconnected during the renaming process, you not only keep the historical data, you also do not have to re-add the uplinks to the distributed vSwitches.

Of course you will need to take care that the DNS records are also updated so they reflect the new host name. vCenter will try to resolve the DNS name upon adding it to the inventory, so make sure that the DNS records on the vCenter server are refreshed / updated before you run the script.

If you don't want to resort to using Perl, please have a look at the blog post Rename an ESXi 5.x Host by Luc Dekens. As always, he has a PowerCLI solution for almost everything, although his script does remove and re-add the host from vCenter.

28 April, 2014

vMotion fails on MAC-address of virtual NIC

During one of my recent projects (replacing ESXi hosts, from rack servers to blades) there was also a second project ongoing that touched the VMware environment. The current EMC SAN solution was being replaced by a new EMC SAN solution comprised of VPLEX and VMAX components.
One of the inevitable tasks involved is moving VM's and Templates to datastores that reside on the new SAN. After all VM's of a particular datacenter were moved successfully, it was time to move the templates.
As templates cannot be moved by the use of Storage vMotion, the customer first converted them to normal VM's. In this way they could leverage the ease of migrating them by Storage vMotion. Well, so much for the idea: about 80% of the former template VM's failed the storage migration task. They failed at 99% with an "invalid configuration for device 12" error.
When I first looked at this issue I had no idea what the cause could be, although it looked like it had something to do with the VM virtual hardware. I took a look at the former template VM's that did go through a successful storage migration and compared their virtual hardware to the ones that failed. There was no difference between the two. The only thing different was the OS used; this was also pointed out by the customer. Now the difference in OS is not what is important, but the point in time the template was created is!
It stood out that the former template VM's with the older OS's were failing, so I asked the customer if he knew when these templates were created and, more importantly, on which version of vSphere.
As you might know, the MAC-address of a virtual NIC has a relation to the vCenter managing the virtual environment; I don't know the exact details, but there is a relation. And I remembered reading an old blog post about an invalid configuration for virtual hardware device 12, which related device 12 to the virtual NIC of the VM. The templates were originally created on a vSphere 4.1 environment of which the vCenter was decommissioned instead of upgraded along with the rest of the environment. When you put this information (or these assumptions) together, it could very well be that the MAC-address of the virtual NIC was not in a "good" relation with the current vCenter and that this resulted in failing Storage vMotion tasks. I know it was a bit far-fetched, but still I gave it a go: I removed the current vNIC from one of the failed VM's and added a new vNIC. I checked, and the replacement changed the MAC-address of the vNIC.
After the replacement I retried the Storage vMotion and this time it succeeded! I did the same replacement on the remaining failed VM's and they could all now successfully be migrated to the new datastores.
So for some reason, when doing a Storage vMotion, vCenter needs a VM to have a "compatible" MAC-address to succeed.
In short, if you ever run into an "invalid configuration for device 12" error when trying to perform a Storage vMotion, check if the MAC-address of the VM "aligns" with the MAC-addresses of VM's that can be Storage vMotioned.
If they don't, replacing the virtual NIC might solve your issue.

04 April, 2014

vSphere sees datastore as snapshot datastore

Last week a colleague contacted me to get my thoughts on an issue he was facing at a customer with a small VMware vSphere 5.5 environment.
Apparently the customer had faced a power outage for a longer period than their UPS could cope with; the result was that 2 hosts and an HP P2000 iSCSI SAN were powered off without a clean shutdown.
When the power was restored this resulted in 1 RAID set being in degraded mode and the other RAID set being OK. While the degraded RAID set was recovering just fine, there were some issues on the vSphere environment.
After the hosts booted and vCenter was started, it was possible to connect to vCenter with the web client. There it looked like the first datastore was OK and the second was not; all VM's on the first datastore booted without any issues. But because the second datastore presented itself as a VMFS volume on a snapshot LUN, the VM's that resided on it couldn't be powered on. The real second datastore was not visible at all from vCenter.
I came up short on ideas during the phone call, so my colleague resorted to VMware support (GSS) and they came up with a rather quick solution to this issue. I thought I would share this.
The first thing that was done was to rename the snapshot datastore. Next, they added storage and selected the existing LUN with the resignature option. After this completed, the only thing left was to re-register the VM's residing on the second datastore with vCenter.
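GSS did these steps through the vSphere client, but the resignature can also be done from the ESXi shell. Below is a minimal sketch of what I believe the equivalent esxcli commands look like on vSphere 5.x; the datastore label is a placeholder for your environment.

```shell
#!/bin/sh
# Sketch of the resignature step from the ESXi shell (vSphere 5.x).
# "datastore2" is a placeholder label for the snapshot volume.
DS_LABEL="datastore2"

if command -v esxcli >/dev/null 2>&1; then
  # List VMFS volumes the host detects as unresolved snapshots
  esxcli storage vmfs snapshot list
  # Resignature the volume so it can be mounted as a new datastore
  esxcli storage vmfs snapshot resignature -l "$DS_LABEL"
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```

After the resignature the volume mounts with a new "snap-xxxxxxxx-" prefixed name, which you can then rename back.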
For me the solution that GSS came up with was a good one; it solved the issue quickly without too much effort.

Adding new datastores to an existing vSphere environment

Today I was asked by a VMware admin at a customer how he could prevent, or maybe schedule, storage rescans.
He asked me this because he was adding 25 new datastores to 12 ESXi 5.1 hosts in an existing cluster, and every time he added a datastore a rescan of the HBA adapters was automatically initiated. As the cluster was already under a pretty heavy workload, the "rescan storm" started by his actions was having an impact on the performance of most of the VM's running in the cluster.
As far as I know it is not possible to schedule storage rescans, and I don't see any added value in such a feature anyway.
But what is possible is disabling the automatic host rescan of HBA adapters; this is done at the vCenter level with the advanced setting "config.vpxd.filter.hostRescanFilter" set to "False".
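If I remember correctly, vCenter advanced settings with dotted names end up as nested XML in vpxd.cfg, so the filter would look something like the fragment below. Treat this as an assumption on my part and prefer setting it from the (web) client as described in the KB.

```xml
<config>
  <vpxd>
    <filter>
      <!-- disables the automatic HBA rescan after datastore operations -->
      <hostRescanFilter>false</hostRescanFilter>
    </filter>
  </vpxd>
</config>
```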

VMware has a KB article about this, so if you want to have a reference or want to know how to make this advanced setting from the webclient please have a look at KB1016873
One very important thing not to forget: change the value of the advanced setting back to "True" as soon as you have finished adding the datastores!

01 April, 2014

VM stops at POST screen

Recently at a customer I was asked to have a look at 2 VM's that supposedly did not boot well after receiving and installing Microsoft patches. These VM's had been running just fine up until the mandatory reboot after patching.
They showed strange boot behaviour; usually you would expect the boot to halt or go wrong when loading the Windows OS, but these 2 VM's wouldn't even get beyond the BIOS POST screen.

For troubleshooting purposes I created a new diskless VM and attached the system disk of one of the failing VM's to it; this combination resulted in a successful boot. So the boot issue was not related to the recently installed Microsoft patches, it had to be something within the VM configuration.
When looking more closely at the configuration of the 2 VM's I found both of them had RDM's.

I checked if the RDM's had active path(s) to their LUNs and it turned out about half didn't.
Once I removed the RDM's with the dead path(s) and powered on the VM again, it successfully booted the OS.
I never thought a dead RDM path would prevent a VM from getting through its BIOS POST screen. I checked if there was a VMware KB article about this VM behaviour, but came up with only one blog that had info about this issue, the Enterprise IT blog. The author also has some good pointers and checks to verify there is no other cause to the issue, so do check out the article.
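Checking the path state as described above can also be done from the ESXi shell; a minimal sketch, where the device identifier is a placeholder taken from the VM's RDM mapping:

```shell
#!/bin/sh
# Sketch: inspect path state (active / dead) for an RDM's backing LUN.
# "naa.60000000000000000000000000000001" is a placeholder device id.
DEVICE="naa.60000000000000000000000000000001"

if command -v esxcli >/dev/null 2>&1; then
  # Every path to the device and its current state
  esxcli storage core path list -d "$DEVICE"
  # Multipathing plugin view of the same device
  esxcli storage nmp device list -d "$DEVICE"
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```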

27 February, 2014

VMkernel port challenge on a Distributed Switch

Recently I was working on a project which consisted of adding new hosts to an existing vSphere 5.1 environment. As the form-factor and specs were far from what the customer already had in place, the new hosts were put in new HA-clusters. Because the new hosts have 4 10GbE NIC's, the virtual networking had to be re-designed.
I designed one Distributed Switch for all traffic and for all clusters, regardless of the purpose of the clusters. Security-wise this was agreed on by the customer; they only needed logical network separation in the form of VLAN's. This would keep the design, and also the physical network configuration, fairly simple.
One of the customer's requirements was that the pace at which they could vMotion should be a lot higher than on their current hosts; having a large number of hosts, they wanted hosts to be able to go into maintenance mode quickly. They also wanted to speed up the patching and updating of the hosts by Update Manager.
So within the Distributed Switch design I added the use of multi-NIC vMotion. I got a lot of information from the blogs of Frank Denneman and Duncan Epping, especially these two blog posts: How to setup Multi-NIC vMotion on a distributed vSwitch and Multiple-NIC vMotion in vSphere 5…. Of course NIOC (Network I/O Control) is also used to control / guarantee the required bandwidth for the various sorts of network traffic.
Because the new hosts will use a newly created VLAN for vMotion, but the current workload needs to be moved from the current hosts to the new hosts (by the use of vMotion, to prevent VM down-time), there is a challenge: vMotion traffic does not route! A simple solution to this is temporarily using an extra VMkernel interface for vMotion traffic in the VLAN that is also used on the current hosts for vMotion, and removing it after the workload is completely moved.
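A sketch of what creating (and later removing) such a temporary VMkernel interface could look like from the ESXi shell. For simplicity this assumes a standard-switch portgroup (on a distributed switch you would use the --dvs-name / --dvport-id options instead), and the portgroup name, vmk number and addresses are placeholders.

```shell
#!/bin/sh
# Sketch: temporary vMotion VMkernel interface in the "old" vMotion VLAN.
# Portgroup name, vmk number, IP and netmask are placeholders.
if command -v esxcli >/dev/null 2>&1; then
  # Create the interface on an existing portgroup
  esxcli network ip interface add --interface-name=vmk9 --portgroup-name=vMotion-Temp
  # Give it a static IP in the VLAN the current hosts use for vMotion
  esxcli network ip interface ipv4 set -i vmk9 -I 192.168.10.15 -N 255.255.255.0 -t static
  # Tag the interface for vMotion traffic
  vim-cmd hostsvc/vmotion/vnic_set vmk9
  # And once the workload has been moved, clean up again:
  # vim-cmd hostsvc/vmotion/vnic_unset vmk9
  # esxcli network ip interface remove --interface-name=vmk9
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```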
All of the multi-NIC vMotion VMkernel interfaces were created by the use of a PowerCLI script, which you can find in this previous post.
The temporarily created vMotion VMkernel interfaces, on the other hand, were made manually, and on one host something went wrong; I am not sure what, but it looks like a duplicate IP was used for the interface. To correct this I first removed the VMkernel interface, so it would not interfere with the environment anymore. After I saw that there were more IP related issues on this host, and at some point the host lost its connection to vCenter, I used both a direct vSphere client connection to the host as well as the DCUI to straighten things out and get at least the VMkernel interface for management back up and running with the correct IP. Now keep in mind this is still all on the Distributed Switch, so when the host successfully re-connected to vCenter it showed an error that the switch configuration was not in sync with vCenter. This synchronization usually takes place every five minutes; after some time passed the error went away and all was good.
So I could start to re-add the VMkernel interfaces for multi-NIC vMotion. I ran my script and this resulted in errors; I tried to add the VMkernel interfaces manually, but this also resulted in errors.
It looked like I would not be able to sort this out from the vSphere client, and the quick fix from VMware I found on the internet was to restart the vCenter service, which was not a possibility for me at the time. So I resorted to the CLI and esxcli.
I connected to the host by ssh and listed all its VMkernel IP interfaces with "esxcli network ip interface list" which resulted in the following output:

   Name: vmk0
   MAC Address: 00:50:56:6a:21:2e
   Enabled: true
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9059
   VDS Connection: 767794954
   MTU: 1500
   TSO MSS: 65535
   Port ID: 50331658

   Name: vmk1
   MAC Address: 00:50:56:61:57:86
   Enabled: false
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9436
   VDS Connection: 415422000
   MTU: 0
   TSO MSS: 0
   Port ID: 0

So there are indeed not one but two VMkernel interfaces, while only one shows up in the vSphere client. If you look at the output you see that vmk1 has "Enabled: false"; this is probably why it is not visible in the vSphere client.
With "esxcli network ip interface remove --interface-name=vmk1" it should be possible to remove this hidden VMkernel interface. After running the command I checked the VMkernel IP interfaces of the host again:

   Name: vmk0
   MAC Address: 00:50:56:6a:21:2e
   Enabled: true
   Portset: DvsPortset-0
   Portgroup: N/A
   VDS Name: dvSwitch_PROD_02
   VDS UUID: 78 94 06 50 7b 69 04 20-9e 3c 07 0f e2 0f cf f5
   VDS Port: 9059
   VDS Connection: 767794954
   MTU: 1500
   TSO MSS: 65535
   Port ID: 50331658
~ #

Problem solved; afterwards I was able to re-add the multi-NIC vMotion VMkernel interfaces without any problem. In the end all it took was a couple of esxcli commands.

26 February, 2014

My take on how to deploy ESXi hosts quicker with PowerCLI

When building a new or expanding an existing vSphere environment, there are a lot of helpful features and tools that can speed up the deployment and configuration of new hosts.
For instance, VMware Auto Deploy and Host Profiles can be very useful; they will also help you set up a cluster or datacenter with consistently configured hosts.
Nevertheless, there are several situations where you either can't or don't want to use Auto Deploy and/or Host Profiles. Please note that I believe Host Profiles is a very powerful feature which, if possible (license-wise), should always be used to ensure that you run an environment with consistently configured hosts.
For one, if you use FC shared storage, something you need to do before you are able to access it is provide the FC adapter WWN's to your storage team in order for them to set up the LUN zoning correctly.
You can get the information you need by clicking through the vSphere (web) client and copy/pasting the info of all the FC adapters of every host you are going to use. But if you have 10+ hosts this becomes tedious, time consuming and, not to forget, prone to mistakes. Why not use PowerCLI? This will make the job a lot easier and quicker; below you find a script which will get the WWN's per host and also provide vendor/type info (in case you have multiple types of FC adapters present in the host, as with most CNA / FCoE cards).
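The PowerCLI script itself is not reproduced here; as a per-host alternative, a sketch of how to read the same adapter info straight from the ESXi shell:

```shell
#!/bin/sh
# Sketch: list storage adapters on one host; for FC adapters the UID
# column contains the WWNN/WWPN needed for zoning.
if command -v esxcli >/dev/null 2>&1; then
  esxcli storage core adapter list
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```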

With most newer server hardware there is either an internal USB or flash (SD) slot to be used with boot media for the OS. When running ESXi off such media it is advised to move the scratch location to persistent storage, to have a crash-consistent location for ESXi to store its log files, as referred to in VMware KB1033696. The KB also explains in great detail all ways to set up a persistent scratch location.
The script below will help you to automate these steps by the use of PowerCLI.
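The PowerCLI script is not reproduced here, but as a per-host sketch, the same change can be made from the ESXi shell. The datastore path is a placeholder, and per KB1033696 the new location takes effect after a host reboot.

```shell
#!/bin/sh
# Sketch: point the scratch location at persistent storage (see KB1033696).
# The datastore and folder name are placeholders for your environment.
SCRATCH="/vmfs/volumes/datastore1/.locker-esxi01"

if command -v esxcli >/dev/null 2>&1; then
  # The directory must exist before the host will use it
  mkdir -p "$SCRATCH"
  # Set the advanced option; the host uses it after the next reboot
  esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s "$SCRATCH"
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```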

Two other settings that are frequently set when deploying new ESXi hosts are NTP and Syslog; when using host profiles these settings are applied with the host profile. With the script below you are able to set them on multiple hosts with ease, without the use of host profiles.
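Again the PowerCLI script itself is not shown here, but as an indication, the syslog part of it maps to these esxcli commands on a single host; the loghost URL is a placeholder.

```shell
#!/bin/sh
# Sketch: set and reload the syslog target on one ESXi host.
# "udp://syslog.example.com:514" is a placeholder loghost.
if command -v esxcli >/dev/null 2>&1; then
  esxcli system syslog config set --loghost='udp://syslog.example.com:514'
  esxcli system syslog reload
else
  echo "esxcli not found: run this from the ESXi shell"
fi
```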

I hope the scripts provided will help you to do quicker and more consistent ESXi deployments.

12 February, 2014

Orphaned VM after failed migration

When a VM is orphaned, I usually check on which datastore, cluster and folder the VM is placed. Then I remove the VM from the inventory and add it again through the datastore browser (or by using PowerCLI, check the bottom of this post).
Lately I am seeing orphaned VM's at a customer that cannot be (re-)added because the .vmx has a lock on it by another process / host.
vCenter "thinks" the VM is still active on this host: when trying to go into maintenance mode the host will stall at 80%, and when you send a shutdown command to the host from the DCUI it will notify you that there are still VM's active on it. And from the VM's standpoint it is still active; the VM will still be running and accessible by RDP, for instance!
But when you use esxtop on that host to look at the running VM processes, there will be no process active for this VM.
A quick and somewhat dirty way out is to get the host to go into maintenance mode; most of the time this will stall at either 65% or 80%, at which time you will see that the only remaining VM on this host is the orphaned VM. Then log in to the DCUI (or use PowerCLI or the CLI) to send a reboot command to the host. The host will tell you that you still have active VM's on it; ignore this and have the host reboot anyway. The VM will be forcefully powered off.
After the reboot, you will find the host in maintenance mode with 1 powered-off VM present: the previously orphaned VM.
Take the host out of maintenance mode and power on the VM; it will probably boot normally and, if DRS is used, some VM's will be moved to the host to balance out the HA-cluster it is in.
The method described above is, as mentioned, a "quick and dirty" way and should be a last resort option in my opinion, as it results in an outage, i.e. downtime, for the VM in question.
So before you go with this method, make sure that there is no other way to resolve the issue; for instance, if there is no lock on the .vmx you could easily re-register it to vCenter without any reboots needed. If you need to do this for a larger number of VM's, then have a look at one of my previous posts, "VM's grayed out (Status Unknown) after a APD (All Paths Down) event on NFS datastores".
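Re-registering a single unlocked VM without any reboot can, for example, be done from the ESXi shell; a sketch with a placeholder path:

```shell
#!/bin/sh
# Sketch: register a VM's .vmx directly on a host (no reboot involved).
# The datastore path is a placeholder.
VMX="/vmfs/volumes/datastore1/myvm/myvm.vmx"

if command -v vim-cmd >/dev/null 2>&1; then
  # Registers the VM on this host; it reappears in the vCenter inventory
  vim-cmd solo/registervm "$VMX"
else
  echo "vim-cmd not found: run this from the ESXi shell"
fi
```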

28 January, 2014

NLVMUG conference 2014 Agenda

VMUG meetings are always interesting and I think the NLVMUG meetings are getting better sessions since they joined the global VMUG.
Please check out the details in the link below !

If you are a VMUG member you can register here for this meeting. You can also find the location details on this page. The complete agenda for this meeting can be found here.

For those who are attending this NLVMUG conference, see you March 6th !