18 December, 2014

Microsoft Remote Desktop Services issue

This is clearly not a usual virtualisation-related blog post, but every now and then I'm asked by either a customer or a colleague to have a look at a non-virtualisation issue they run into, mostly because the issue is occurring on a virtualised server.
In this particular case, for some unknown reason, all RemoteApp programs served from their "Terminal Services" server suddenly stopped working overnight.
When users started a RemoteApp they were prompted for their username and password, but after entering these nothing happened. Looking in the Application event log, two events were logged for each RemoteApp start attempt.
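If you want to pull these events out quickly, a short PowerShell sketch against the Application log looks like this:

```powershell
# Quick look at the Desktop Window Manager events (9003 and 9009) in the Application log
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 9003, 9009 } |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-List
```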

Looking at the first information event (ID 9003), it appears the Windows Desktop Experience feature is not installed or its service is not started.

The second event (ID 9009) is a logical result of the first: the Desktop Window Manager exits.
When searching for the cause of event ID 9003, all I found was that indeed the Desktop Experience feature probably was not installed, or that the RemoteApp was being started within a Remote Desktop session.
When looking at the installed Windows features it turned out that the feature was installed; I also verified that the service belonging to it was started and that no Remote Desktop sessions were being used.
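For reference, this is roughly how you can double-check both on the server itself (a sketch for Windows Server 2008 R2; UxSms is the Desktop Window Manager Session Manager service):

```powershell
# Sketch for Windows Server 2008 R2: check the Desktop Experience feature and the DWM service
Import-Module ServerManager

# Is the Desktop Experience feature installed?
Get-WindowsFeature -Name Desktop-Experience | Select-Object Name, Installed

# Is the Desktop Window Manager Session Manager service (UxSms) running?
Get-Service -Name UxSms | Select-Object Name, Status
```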
The next obvious step was to recreate one of the RemoteApp programs to see if that would solve the issue. Unfortunately it did not.
Then I started thinking about the description of event ID 9003, "The Desktop Window Manager was unable to start because a composited theme is not in use", and the fact that all Internet resources linked this to either the Desktop Experience feature or an unsupported desktop theme on the client.
For this customer there were no changes to the desktops, although a small change due to a Windows update or patch could be possible, considering the RemoteApp programs stopped working at the same time on all workstations.
I started searching on the server side for where I could define the settings for the Desktop Experience or for the allowed desktop themes.
I started with the configuration of the Remote Desktop Session Host server and opened its properties.
Going through the tabs I ended up at the Client Settings tab, where you find the Color Depth limit setting. This was set to 32 bits per pixel; I removed the limit by deselecting it.
As a result all RemoteApp programs started working again on all workstations. As for the root cause, I still have no clue; for now I will keep it at a patch or update on the "Terminal Services" server.
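If you prefer to check this setting from a script rather than the RD Session Host Configuration GUI, the Client Settings of the RDP-Tcp connection are also exposed through WMI. A hedged sketch, assuming the Win32_TSClientSetting class is available on your RD Session Host:

```powershell
# Hedged sketch: read the colour depth settings of the RDP-Tcp connection through WMI;
# verify the meaning of ColorDepthPolicy/ColorDepth against the Win32_TSClientSetting documentation
Get-WmiObject -Namespace 'root\cimv2\TerminalServices' -Class Win32_TSClientSetting `
    -Filter "TerminalName='RDP-Tcp'" -Authentication PacketPrivacy |
    Select-Object TerminalName, ColorDepthPolicy, ColorDepth
```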

28 November, 2014

Virtual Ethernet card "Virtual Ethernet Adapter" is of type which is not supported

Last week I was asked if I could solve an issue at a customer who was patching all of their ESXi hosts. They ran into a host which refused to go into maintenance mode; the reason was a VM which didn't vMotion to another host successfully.
Basically, the vMotion task timed out. When they investigated the error details, a more specific error description turned up, the one in the title of this post.
It looked like there was something wrong with the network adapter installed in this specific VM. But when the configuration of this VM was checked, it turned out that it had the customer's default network adapter installed, VMXNET3.
Having a second look at the VM configuration, the Guest OS was set to Windows Server 2003 Standard for some odd reason, even though this VM was running Windows Server 2008 R2 Enterprise.
After shutting down the VM and correcting the Guest OS, the VM was powered on again. Now the VM could be vMotioned without any issue.
We already knew it is important to select the proper Guest OS so that VMware Tools installs correctly and the right default virtual hardware is presented to the VM's OS; now there is an extra reason to pay attention to choosing the right Guest OS for a VM.
If you want to read more about the impact of a mismatched Guest OS selection, please read this blog post by Frank Denneman.
In that blog post you can also find a PowerCLI "one-liner" by Alan Renouf to quickly scan your environment for mismatched Guest OS selections.
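I'm not reproducing Alan's exact one-liner here, but a rough PowerCLI equivalent that compares the configured Guest OS with what VMware Tools reports could look like this:

```powershell
# Rough sketch: list powered-on VMs where the configured Guest OS differs from what VMware Tools reports
Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' } |
    Select-Object Name,
        @{ N = 'ConfiguredGuestOS'; E = { $_.ExtensionData.Config.GuestFullName } },
        @{ N = 'RunningGuestOS';    E = { $_.ExtensionData.Guest.GuestFullName } } |
    Where-Object { $_.RunningGuestOS -and ($_.ConfiguredGuestOS -ne $_.RunningGuestOS) }
```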

05 November, 2014

A personal introduction to Diablo Technologies @VMworld

A couple of weeks before VMworld Europe, I got introduced to Diablo Technologies. Before that I had heard of the company but did not really know what they did.
I got the opportunity to have a meeting with Kevin Wagner, VP of Marketing, during VMworld to talk about Diablo Technologies, their current solutions and their yet-to-be-released solutions.
After I got introduced, I wanted to find out what Diablo Technologies is all about and what their relation to virtualization is. Looking at their website I found out they have a technology called Memory Channel Storage (MCS), which sounds great, but I could not really figure out how this technology is actually used. I found it was Flash storage that sits in a DRAM slot, but it was not clear to me how you could leverage this Flash storage.
This changed quickly after meeting with Kevin. I was expecting a marketing talk which hopefully would also bring some technical details along, but this was not the case at all: of course there was a fair amount of marketing info, after all I did not know the company nor their technologies really well, but the meeting was for the largest part technical.
So what is Memory Channel Storage all about then? The technology is based on a combination of hardware and software (a driver). The hardware looks somewhat like a DDR3 DRAM module with an additional heatsink attached to it.

Only on this module there is no DDR3 RAM; instead there is 19 nm MLC NAND (Flash) memory accompanied by some controllers, with both the NAND and the controllers provided by SanDisk. These controllers make it possible to present this Flash memory as block-based storage within the server's OS; additionally, a driver within the OS is needed. Currently drivers are available for Microsoft Windows, VMware ESXi and a variety of Linux distributions.
So two questions came to mind: how much Flash is on such a module? And with Flash memory in a DRAM slot presented as block-based storage, how does the server (mainboard) recognise this correctly?
The MCS modules are currently available in two sizes, 200 GB and 400 GB, and there are ideas to add an 800 GB version as well. 200 GB and 400 GB Flash modules are adequately sized for use as a caching solution, but would it also be possible to use them as a real storage solution like you can do with SSDs? More on this later, because the other question was how the modules are presented as block-based storage.
Well, to be able to do this Diablo Technologies has incorporated controllers on the module which act as a disk controller, and a driver within the server's OS is needed to control the way the module(s) work. But you are still left with a module plugged into a DRAM slot on the mainboard; for it to be detected as an MCS module instead of a DRAM module, a minor change is needed to the mainboard's UEFI (Unified Extensible Firmware Interface), the newer version of the mainboard's BIOS.
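Since the modules end up as ordinary block devices, on an ESXi host with the MCS driver installed you would expect them to show up alongside the local disks. A hedged PowerCLI sketch (the host name is a made-up example):

```powershell
# Hedged sketch: list the local block devices an ESXi host presents
$esx = Get-VMHost -Name 'esxi01.lab.local'   # made-up host name
Get-ScsiLun -VmHost $esx -LunType disk |
    Where-Object { $_.IsLocal } |
    Select-Object CanonicalName, Vendor, Model, CapacityGB
```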
My two questions were answered, but now other questions came up.
A change to the UEFI/BIOS of a server means you cannot just buy MCS modules and start plugging them into your existing server hardware! So what hardware do I need to buy, which OEMs does Diablo Technologies work with? And how do I fit these MCS modules together with DRAM modules in a server?
The biggest OEM partners of Diablo Technologies are IBM and SuperMicro; there are others, and the OEM base is growing. It would be great to see the possibility for MCS in HP, Dell or even Fujitsu compute hardware, if you ask me!
As for MCS module placement: as previously mentioned, the modules are plugged into DDR3 DRAM slots and the way you can place them is exactly like placing DRAM modules. This means a DRAM slot can be occupied by either a DRAM module or an MCS module; in other words, you will need a server mainboard with enough DRAM slots to accommodate both types of modules.
There are a couple of things you need to take into account, so why would you choose an MCS solution over an SSD or PCIe SSD solution? Simple: the memory bus is very close to the CPU and therefore delivers higher speed and lower latency than a PCIe or SATA/SAS bus connection. On top of that, each MCS module uses only one memory channel, just like a DRAM module does. Because of this it can make use of parallelism: instead of having 800 GB of flash available when you plug in two 400 GB MCS modules, you only have 400 GB of flash, with both modules communicating through separate memory channels, guaranteeing very low response times even when servicing heavy I/O loads.
So SSDs result in low latencies when used in servers, we all know this; in the most optimal situations this is in the sub-1 ms range. Latencies when using MCS technology are in the 4 μs to 5 μs range, but these numbers don't tell the complete story: Diablo Technologies' MCS is best at keeping ultra-low latencies even with high throughput.
When looking at the latencies of VMs running on a vSphere hypervisor, it's not only the use of the memory bus that delivers the very low latencies. When you look at how a traditional storage stack is built up compared to an MCS storage stack, you can easily see that the I/O path through the MCS storage stack is much shorter and passes fewer components. This reduction also reduces the total latency.



Now take a new storage technology based on local storage: VMware's VSAN. It uses SSDs for caching and spinning disks for persistent storage, so with MCS being presented as block-based storage you could use MCS for caching. This could possibly improve the already impressive VSAN performance.
But if we look a bit further down the road, VMware has announced it will probably support all-flash VSAN configurations in the "2.0" release. Just imagine the performance you could get if you used MCS in an all-flash VSAN.
Also further down the road for Diablo Technologies: they have Carbon2 just around the corner, their MCS solution based on the DDR4 memory channel interface, which will bring MCS to the newest server architectures.

I must say I was really impressed with the technology and would recommend you have a look at Diablo Technologies if you are searching for a high-throughput / ultra-low-latency storage solution. Diablo Technologies offers the possibility to "Test Drive" MCS: they will set you up with a time slot to try out MCS running on their own servers at their HQ in Canada.
Both IBM and SanDisk have their own name for the MCS technology: at IBM it's called eXFlash DIMM and SanDisk goes with ULLtraDIMM.


27 October, 2014

21st VMUGBE+ Meeting

Over the last couple of years I have been attending VMUG meetings; they are all interesting, although some have more interesting sessions than others. I have always found the Belgian VMUG to have very good sessions.
The upcoming meeting promises to be a good one as well, especially if you are into EVO:RAIL, VSAN and/or NSX and missed VMworld. Please check out the details below!
For those who are attending this VMUG meeting, see you on November 21!

VMUG BE Agenda:




If you are a VMUG member you can register here for this free meeting. You can also find the location details on that page.

21 October, 2014

vSphere Distributed Switch health check gotcha!

Last week at a customer I was asked to have a look at an issue they had recently been having with the Distributed Switch health check feature.
This customer uses the feature to regularly check whether all physical network connections are providing all the VLANs configured on the Distributed Switches in their vSphere environment. About two weeks ago the health check suddenly started reporting errors on all hosts connected to a specific Distributed Switch: a missing VLAN and an MTU size mismatch, the VLAN missing on the physical switch ports connected to the physical NICs of the hosts.
When I looked at the details of the error, I saw the missing VLAN was VLAN 0. This got me interested, as you usually don't use VLAN 0 for standard network traffic and therefore you don't often see it in vSphere on a Distributed Switch which handles virtual machine network traffic.
When I checked the Distributed Switch configuration, more specifically the dvPortgroups configured on it, I found a dvPortGroup named "name_test". There were no VMs using it, and when I looked at its configuration I found it had the VLAN type set to "None".
After asking around I learned that this dvPortGroup was created as a test while troubleshooting another issue and was simply left in place afterwards. Since there was no reason to keep it, I removed this dvPortGroup and, after a refresh, checked the health status of the Distributed Switch again.
To no surprise, the health check came back without any errors this time; not only was the missing VLAN error solved, the MTU size mismatch was solved along with it.
So my takeaway is this: if you as a vSphere admin use the Distributed Switch health check feature to keep tabs on the status of your virtual-to-physical network, please keep a tight procedure when it comes to changes on the Distributed Switches! And if you have to add a dvPortGroup for testing purposes, please do not add it with the VLAN type set to "None", but add it with a VLAN that is available on your physical network interfaces.
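If you want to keep an eye on this, a quick PowerCLI sketch to list dvPortgroups without any VLAN configuration could look like this (uplink port groups may show up as well and can be ignored):

```powershell
# Sketch: find distributed port groups that have no VLAN configured (VLAN type "None")
Get-VDSwitch | Get-VDPortgroup |
    Where-Object { -not $_.VlanConfiguration } |
    Select-Object VDSwitch, Name
```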

24 September, 2014

vCenter Orchestrator loses VM networks after host renaming

Last week I was asked to have a look at a vCO workflow issue. There was a problem with workflows used to deploy VMs: the workflows would fail on specific hosts. One of the customer's VMware administrators found that the workflow stopped at the point where a VM had to be linked to a specific VM port group.
This happened with any selected VM port group (VLAN) available within the workflow. These workflows use automatic host selection, but a manual selection can also be made, and after running workflows with manual host selection some hosts were found that completed the workflow successfully.
When verifying what the difference was between the hosts, it became clear that the hosts that failed the workflow had recently been renamed.
The customer uses a dvSwitch for all network traffic across all hosts within the HA clusters. During the renaming you have to disconnect the ESXi host from vCenter and reconnect it after the rename; a PowerCLI script was used to automate the renaming process, a similar script can be found here.
During the renaming there had been an issue with the hosts upon reconnecting to vCenter: after the rename, hosts reconnected with a dvSwitch error message, and to get rid of this error you manually re-add the host to the dvSwitch. After that the hosts' network looked OK; nevertheless, this was a good reason to take a better look at the network configuration of those renamed hosts.
One detail which stood out was the colour of the dvUplink interfaces. When all is fine they are coloured green, but when, for instance, the physical NIC used by the uplink is disconnected, the colour turns to white, as shown in the picture below for dvUplink2.


Now with the renamed hosts it was not one dvUplink, but all four dvUplinks that were coloured white. Strangely enough, the VMs hosted on these hosts had a fully functional network connection, so as expected none of the physical NICs was disconnected.
One of the VMware administrators tried to get all dvUplinks "green" again by simply removing and re-adding the vmnic from the dvUplink. This seemed to work, all dvUplinks came back "green" again. Unfortunately the Orchestrator workflow issue persisted after the actions above, and none of the VMware administrators (me included) had any ideas on how to solve it, so a support case was opened with GSS.
After the usual "please upload logfiles" steps, the problem was quickly solved during a WebEx session. The solution was to force an update of the dvSwitch configuration across all hosts connected to this dvSwitch.
So how do you push the configuration, or how do you forcefully update the configuration on the ESXi hosts? Simple: just add a temporary new dvPortgroup to the dvSwitch. By adding a dvPortgroup, all connected ESXi hosts get an updated dvSwitch configuration.
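As a rough PowerCLI sketch of that trick (switch name, port group name and VLAN ID are made-up examples):

```powershell
# Sketch: force a dvSwitch configuration push by adding (and later removing) a temporary dvPortgroup
$vds = Get-VDSwitch -Name 'dvSwitch01'
New-VDPortgroup -VDSwitch $vds -Name 'temp-config-push' -VlanId 999

# Clean up once the hosts have picked up the updated configuration
Get-VDPortgroup -VDSwitch $vds -Name 'temp-config-push' | Remove-VDPortgroup -Confirm:$false
```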
This finally solved the Orchestrator workflow issues. I can imagine that this way of updating the dvSwitch configuration could also be of help in other dvSwitch "out of sync" kinds of issues.
I will be trying it the next time I run into such an issue.

10 September, 2014

Exciting VMware revelations which make VMworld Europe a valuable event

Why would you go to VMworld Europe when all new product releases and most other revelations were already shown at VMworld US?
VMworld US is the bigger event: more days, more sessions and a bigger venue.
The answer is simple if you ask me: when a new product (version) is introduced, not all technical info on that product is always available yet. For instance, EVO:RAIL was introduced at VMworld US, and at VMworld Europe there are at least three sessions scheduled, one of them being a technical deep dive (the kind of session you want to attend, right?).

In short: EVO:RAIL is a management layer on top of the usual vSphere tools; wizards are what make this management layer stand out. It will only be bundled with standardised hyper-converged hardware built by selected vendors.
EVO:RAIL is based on Enterprise Plus licensing and also has VMware's Log Insight "built in".







Some of the announcements take some time to sink in or to get your head around, like the VAIO I/O filters announced at VMworld US. You can read some info about them in this blog post written by Cormac Hogan. At first they did not attract my attention; only after reading press releases and some blog posts (like Cormac's) did I get interested. If I had been at VMworld US, I would probably have skipped sessions regarding VAIO, but with the extra info I now have, I have put a VAIO-related session on my "must see" list.
Please check out my blog post "My take on interesting sessions @VMworld Europe" for my complete "must see" session list.

I am planning on writing more on EVO:RAIL and VAIO (amongst other topics) during and after VMworld Europe.