21 February, 2013

VMware Inventory Service causes vCenter server to run out of disk space

Recently I ran into an issue with a vCenter server: the server had apparently crashed because it ran out of disk space on its system disk. This particular vCenter server was running on physical hardware. When the system disk was investigated in an attempt to find out what had consumed 18 GB of free space in less than 48 hours, 4 large log files were found in the Inventory Service database logs folder. On a second look, they had all been created within 20 minutes, and for some reason the log files had not been flushed to the database. Usually a new log file is created every 10 minutes or so and/or when the file size reaches around 4 GB.
In search of an explanation and a solution, a good look through the VMware vCenter documentation turned up a part of the "After you install the vCenter server" chapter called "Back up the Inventory Service Database on Windows", and this part mentions a "scripts" folder. One of the scripts is intended to back up the Inventory Service database.
I presume everyone who has anything to do with vSphere knows how important the vCenter database is, and I hope that everyone backs it up accordingly. But I personally have never read an article or best practice stating that the Inventory Service database should also be backed up separately; this includes the VMware documentation that is provided as a guideline for upgrading from vSphere 5.0 to 5.1.

Additional information:
After another day of testing, the excessive growth of the logs could be traced back to vCloud deploying vApps. As a test we deployed 5 vApps and monitored the xhive log files, which started growing during these deployments at a rate of 2 GB per minute.
Whether this can be regarded as "normal" behaviour, I am not sure, and neither is GSS. The case investigating this is still open at the time of writing.

So the big question is / was: what caused the log files to grow this fast, and why weren't these logs written to the database as they normally are? For now the only explanation is a hardware malfunction.
This particular vCenter server was a physical server with a malfunctioning disk controller cache battery, so the controller had disabled the write cache to prevent data corruption.
This limited the write performance from the usual 100 MB/s to only 5 MB/s, and the server simply could not keep up with writing log entries during the vCloud deployments.
The battery has been replaced. vCloud still generates a huge amount of log entries during deploy actions, but for now the server is able to keep up and write them to the database in a timely fashion.
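
For anyone who wants to keep an eye on this kind of growth, here is a minimal Python sketch of the monitoring described above: it samples the size of a log folder, derives a growth rate, and estimates how long the free space will last. The folder path is not given in this post, so `log_dir` is a placeholder you have to fill in yourself, and the function names, sampling interval and warning threshold are all hypothetical choices, not anything VMware ships.

```python
import os
import shutil
import time


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A log file may be rotated or deleted while we walk the tree.
                pass
    return total


def growth_rate(size_before, size_after, seconds):
    """Return growth in bytes per second between two size samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (size_after - size_before) / seconds


def seconds_until_full(free_bytes, rate):
    """Estimate seconds until free_bytes is used up; None if not growing."""
    if rate <= 0:
        return None
    return free_bytes / rate


def watch(log_dir, interval=60.0, warn_seconds=3600.0):
    """Sample log_dir every `interval` seconds and print a warning when the
    disk would fill within `warn_seconds` at the current growth rate.

    log_dir, interval and warn_seconds are assumptions for this sketch.
    Runs until interrupted.
    """
    previous = directory_size(log_dir)
    while True:
        time.sleep(interval)
        current = directory_size(log_dir)
        rate = growth_rate(previous, current, interval)
        free = shutil.disk_usage(log_dir).free
        eta = seconds_until_full(free, rate)
        if eta is not None and eta < warn_seconds:
            print(f"WARNING: {log_dir} grows at {rate / 2**20:.1f} MiB/s; "
                  f"disk full in about {eta / 60:.0f} minutes")
        previous = current
```

Taking the numbers in this post at face value, `seconds_until_full(18 * 2**30, growth_rate(0, 2 * 2**30, 60))` comes out at roughly 540 seconds: at 2 GB per minute, 18 GB of free space lasts about 9 minutes, which is why a check like this would need a short sampling interval to be of any use.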

6 comments:

  1. I had the same issue last night where the C: drive got filled up with these xhive logs in a matter of a few minutes.

    The difference though is that my vCenter is a VM running on SAN storage and I wasn't deploying any vApps or doing any tasks at all.

    I have seen a log file grow large after importing an existing ESX host that had a lot of VMs imported with it, but the log would shrink or go away after a few minutes.

    Last night, I ended up resetting the Inventory Service database.

    I will be adding vCloud Director to my vCenter soon, but it's scary now after last night and reading this article.

    Have you come across any further info on the cause/resolution?

    Replies
    1. Matt, the normal behaviour I have seen is that the xhive log does indeed grow larger as more events happen in vCenter. And you are right, the log is archived / shrunk after a set time or at a set size (I don't know which of the two). Resetting the Inventory Service database will also reset the xhive log. This particular environment had been running without this issue before vCloud was added. vCloud has a tendency to generate a lot of actions and events in vCenter when you deploy vApps. With this customer the problem was caused by multiple things: not enough space on disk for the xhive log to grow, a defective write cache backup battery (the server could only write at 10 MB/s max.), and a backup client that somehow corrupted the Inventory Service database. After we sorted this out, we moved the Inventory Service to a separate server (VM) with enough disk space, replaced the battery of the vCenter server and solved the backup of the Inventory Service VM in an alternative way. Since then the complete environment has been running without this issue reappearing, from the end of February up until today. So if you add vCloud, take special care of the Inventory Service and you will be fine.

  2. This comment has been removed by the author.

  3. Any solution for this? I'm seeing the Inventory Service eat nearly 100% CPU of a quad core and write a few hundred megs per second. I can't find a good solution. I used the MSSQL Express DB for the vCenter Inventory Service since I have a very small install base.

    Replies
    1. In our case it was just the password for the built-in SSO administrator account administrator@vsphere.local: avoid using special characters. Most problems with special and non-ASCII characters should be solved according to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2060637
      We are running vCenter 5.5.0b and it still was a problem…

    2. Bryan, with the CPU load and disk I/O you are experiencing, I would suggest first moving away from the SQL Express solution and going with a separate database server, even if you run a small install base. This will move some of the load to a secondary server.
