Storage capacity reporting seems like a pretty straightforward topic: how much storage am I using? But when you introduce multiple levels of thin provisioning AND data reduction, not all usage is equal (does it compress well? does it dedupe well? is it zeroes?).
This multi-part series will break it down in the following sections:
- VMFS and thin virtual disks
- VMFS and thick virtual disks
- Thoughts on VMFS Capacity Reporting
- VVols and capacity reporting
- VVols and UNMAP
Let’s talk about the ins and outs of these in detail, then of course finish it up with why VVols makes this so much better.
NOTE: Examples in this series are given from a FlashArray perspective, so mileage may vary depending on the type of array you have. Everything from the VMFS layer up, though, is the same for all arrays. This is the benefit of VMFS: it abstracts the physical layer. This is also the downside, as I will describe in these posts.
So we have talked about thin disks and we have talked about thick disks. In short, thin disks have the major benefit of insight into what the host has actually written, and of correcting mismatches through in-guest UNMAP. Thick disks do not have this ability. These idiosyncrasies can make space reporting and management a problem.
So this begs the question:
Can VMFS capacity reporting be changed? Can VMFS take advantage of array-based data reduction? Should it?
It is an interesting question.
The answer to the first question is no. VMFS reporting is what it is.
The next two questions are different. VMFS is no different in concept from any other file system.
So let’s ignore the fact that we cannot override VMFS space reporting for now. Let’s assume that we can. So should we?
Well, the whole point of VMFS is to provide abstracted storage to a VM, so it can be moved around and treated identically regardless of the physical layer.
If we override how big or small a file is on VMFS, how do we know how big or small it will be when copied to the same VMFS or a different one? If one VMFS is on a data-reducing array and the other is not, the difference could be colossal. If they are different vendors but both offer data reduction, the difference could still be stark, as not all data reduction is equal.
So that introduces a large question mark.
Furthermore, VMFS semantics and locks are tied to blocks on the physically presented device. If a VMDK's size is overridden, where does that translation layer occur? In VMFS? In the array? Who controls how it is done? What about file locks? Do those break? What about UNMAP?
What about different types of virtual disks? Is this enabled for thick and not thin? There are different types of thick too, what about that?
Does this break VMFS large file blocks (LFBs) and small file blocks (SFBs)? Probably. A lot of this would need to be rewritten, and how it would work would vary wildly between vendors.
VMFS is a file system; overriding its accounting would cause far more problems than it solves.
If we just override the overall number, what benefit does that really provide us over what we already have?
Regardless, VMware has addressed this in three main ways:
- Creating new file techniques that are more efficient: linked clones, thin virtual disks, SE Sparse, Instant Clone. Different file features that allow VMFS to do its own type of data reduction.
- vSAN. This gives VMware control of the whole storage stack, so they have much more flexibility.
- Virtual Volumes. This is the other side of the first two. Instead of VMware taking more control, it allows the vendor to take full control of capacity reporting and data management. More on this in the next blog post.
In short, the solution is not to entirely rewrite VMFS; in a sense VMware already did, by removing it from the picture and introducing VVols. VVol datastores are just logical abstractions that allow the storage array to report whatever it wants. More on this in the next post.
Conclusion
So before we go into VVols: how do I manage VMFS?
A few recommendations:
First, monitor your VMFS used capacity.
When it is full, it is full. I don't care how well the array has reduced the footprint; VMFS doesn't know about data reduction.
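For reference, here is a minimal sketch of checking this from the ESXi shell (in practice, vCenter datastore alarms are the more convenient day-to-day mechanism):

```
# List mounted filesystems with their size, free space, and usage.
# VMFS datastores show up here with the capacity VMFS itself reports,
# independent of any array-side data reduction.
esxcli storage filesystem list
```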
First off, try to correct inefficiencies:
- Use thin virtual disks when possible. Most applications cannot tell the minor difference in performance, and the benefits far outweigh that slight latency overhead. A major benefit being:
- In-guest UNMAP. Enable it. This works with Windows 2012 R2 and later on ESXi 6.0, and with Linux on ESXi 6.5 and later. It does not work well until ESXi 6.5 Patch 1 (for either OS), so move to that release.
- If you are using thick-type virtual disks, UNMAP is not an option and you are left with zeroing to reclaim space. This makes keeping capacity accurate on your array more of a targeted, reactive exercise: "I suspect I have dead space in this VM, so I will zero its free space now." Unlike in-guest UNMAP, which takes care of itself automatically and continuously. (A sketch of both reclamation approaches follows this list.)
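To make the two approaches concrete, here is a minimal sketch from inside a Linux guest; mount points and file names are illustrative. The Windows equivalents would be Optimize-Volume -ReTrim for UNMAP and sdelete -z for zeroing free space.

```
# Thin virtual disk (ESXi 6.5+): confirm the disk advertises UNMAP support,
# then reclaim free space on demand (or mount with "discard", or enable a
# scheduled fstrim).
lsblk --discard          # non-zero DISC-GRAN/DISC-MAX means UNMAP is supported
sudo fstrim -v /

# Thick virtual disk: UNMAP is not an option, so zero the free space and let
# the array discard the zeroes. dd stops on its own when the filesystem fills.
sudo dd if=/dev/zero of=/zerofile bs=1M || true
sudo rm -f /zerofile
sync
```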
If this has all been done, then it is time to:
- Expand your VMFS
- Add a new VMFS
- Use Storage vMotion or Storage DRS to balance capacity.
Beyond monitoring your VMFS volumes, monitor your array's overall usage. This might vary a bit by vendor (maybe you need to monitor a pool or something similar instead).
Monitoring your VMFS is all you need at the volume level. The only cases where you might look at volume usage on the array are to identify dead space or to report on space chargeback, for example.
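Since the examples in this series are FlashArray-flavored, a rough sketch of those array-side checks might look like this (the volume name is a placeholder; other vendors will have their own equivalents):

```
# Overall array usage and data reduction ratios.
purearray list --space

# Per-volume space for the volume backing a given VMFS datastore. Comparing
# its host-written footprint to what VMFS reports as used can expose dead space.
purevol list --space MyVMFSVolume
```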
When your array is almost full (or to prevent that from happening):
- If using VMFS-5, have a practice around UNMAP. Run it on a schedule, run it when your array is almost full, or run it when a volume's usage seems out of whack with what VMFS reports.
- If using VMFS-6, just keep automatic UNMAP enabled. (A quick way to do both is sketched after this list.)
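A short sketch of both, run from the ESXi shell; the datastore name is a placeholder:

```
# VMFS-5: manually reclaim dead space on a datastore. The optional reclaim
# unit (-n) controls how many VMFS blocks are unmapped per iteration.
esxcli storage vmfs unmap -l MyDatastore -n 200

# VMFS-6: verify that automatic space reclamation is enabled on the datastore
# ("low" priority is the default).
esxcli storage vmfs reclaim config get -l MyDatastore
```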
When you have exhausted the UNMAP option and the array is still too full for comfort, then:
- Rebalance capacity if multiple arrays
- Add more physical capacity to the array
By the way, I talk more about the above topic specifically for the FlashArray here:
VMFS Capacity Monitoring in a Data Reducing World
Or….
Move to VVols. This is where these trade-offs and problems have been resolved. The next post will dive into how VVol capacity reporting works. Specifically, of course, on the FlashArray.
Can VMware/Pure Storage use a replication snapshot to populate a VVol?
Currently, we refresh our DR environment from replication snapshots. But as we move to virtual MS SQL clusters, it would be great to test VVols.
Absolutely! Replicate the snapshot to array B and it can be copied to a VVol. Take the pgroup snapshot of the volume and do a purevol copy. The easiest thing is to create a same-sized VVol virtual disk in the VM, and then purevol copy from the snapshot to the volume that backs that virtual disk. I'll work on a blog post. A rough sketch of the CLI steps is below.
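For what it's worth, a rough sketch of the FlashArray CLI side, assuming the protection group snapshot has already replicated to array B and you have identified the volume that backs the new, same-sized VVol virtual disk; all names below are placeholders:

```
# Replicated protection group snapshots appear on the target array as
# <source-array>:<pgroup>.<snapshot-suffix>.<volume>.
purevol list --snap

# Copy the replicated snapshot of the SQL data volume onto the volume that
# backs the VVol virtual disk in the DR VM (overwriting its contents).
purevol copy --overwrite source-array:sql-pgroup.refresh1.sql-data vvol-dr-vm-data
```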