10 April, 2013

Real life benefits and caveats of NFS storage with VAAI

I was recently involved in a storage replacement project. The storage was used for a fairly large vSphere and vCloud environment, among other things.
During the replacement the customer went from FC storage to NFS storage. The FC storage did not incorporate VAAI (actually it did, but it was disabled for stability reasons). The new NFS storage did incorporate VAAI, and the design and sizing were done using all size reduction techniques available on both the storage and vSphere. Thin provisioning and deduplication were therefore must-haves to avoid running out of storage space before the end of the project.
In the initial phase of the project, right after the new NFS datastores were presented to the vSphere environment, the customer ran into a big surprise. The VMware administrators had started to move VMs from the FC storage to the NFS storage, and they forced the disk provisioning of these VMs from "thin" to "thick lazy zero", as they were convinced that running "thin on thin" was very dangerous and could lead to an out-of-space issue without receiving any warnings or alerts from vCenter.
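For illustration, forcing a disk format during a Storage vMotion is a one-liner in PowerCLI; the sketch below (with placeholder datastore names, not the customer's actual commands) shows roughly what such a forced "thick" migration looks like:

    # Hypothetical example: move all VMs from the old FC datastore to the new
    # NFS datastore while forcing the disks to thick (lazy zeroed) format.
    # Assumes Connect-VIServer has already been run.
    Get-VM -Datastore (Get-Datastore "FC-Datastore-01") |
        Move-VM -Datastore (Get-Datastore "NFS-Datastore-01") -DiskStorageFormat Thick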
Not very well known, but luckily documented very well recently by Cormac Hogan on his blog, is that of the three disk provisioning types ("thin", "lazy zero thick" and "eager zero thick"), only two are usable on NFS storage. Please read his blog posts on NFS best practices to get all the details, but in short: NFS does not use the "lazy zero thick" type, it will provision such disks as "eager zero thick"! The administrators thought the deduplication feature of the storage device would straighten this out by deduplicating all the zeros. What actually happens with VAAI enabled is that vSphere and the storage device become aware of each other, and because vSphere has the disks provisioned as "eager zero thick", the storage device will not touch these zeroed blocks, as they are reserved by vSphere (the "Reserve Space" feature of VAAI). So there was no deduplication at all on these eager zero thick provisioned VMs.
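A quick way to verify which format vSphere actually reports for the disks is to query them with PowerCLI; a minimal sketch, assuming a placeholder datastore name:

    # List the provisioning format vSphere reports for every virtual disk
    # on a given datastore (datastore name is a placeholder).
    Get-VM -Datastore (Get-Datastore "NFS-Datastore-01") | Get-HardDisk |
        Select-Object Parent, Name, StorageFormat, CapacityGB |
        Format-Table -AutoSize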
After the storage admins alerted them that they were about to run out of space on their datastores, as they only got up to 25% deduplication, I started looking into this and found the reason why.
The issue was solved without losing any uptime on the VMs, by creating and using a PowerCLI script that would Storage vMotion all VMs from one datastore to another (temporary) datastore, and back to the original location after the first move was completed. On the way back the disks were converted from "eager zero thick" to "thin"; a simplified sketch of this approach is shown below.
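The actual script is not included here, but the approach boils down to something like the following sketch (datastore names are placeholders and the VMs are moved one by one; the real script can of course add error handling):

    # Round-trip Storage vMotion: move each VM to a temporary datastore and
    # back again, converting the disks to thin provisioning on the way back.
    $source = Get-Datastore "NFS-Datastore-01"     # original datastore
    $temp   = Get-Datastore "NFS-Temp-Datastore"   # temporary datastore

    foreach ($vm in Get-VM -Datastore $source) {
        # First hop: move the VM to the temporary datastore
        Move-VM -VM $vm -Datastore $temp | Out-Null
        # Second hop: move it back, forcing the disks to thin provisioning
        Move-VM -VM $vm -Datastore $source -DiskStorageFormat Thin | Out-Null
    }

Because each hop is an ordinary Storage vMotion, the VMs stay powered on throughout.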
After all VMs were back on their original datastore with "thin" provisioned disks, the dedupe factor went up to almost 70%.
