A bit of a long, rambling one here. Here goes…
Virtual disk allocation mechanism choice was somewhat of a hot topic a few years back, with the “thin on thin” question almost becoming a religious debate at times. That debate has essentially cooled down: vendors have made their recommendations, end users have their preferences, and that’s that. With the true advent of the all-flash array, such as the Pure Storage FlashArray with deduplication, compression, pattern removal and so on, I feel this somewhat basic topic is worth revisiting now.
To review, there are three main virtual disk types (there are others, namely SESparse, but I am going to stick with the most common for this discussion); a quick sketch after the list shows how they are typically expressed in the vSphere API:
- Eagerzeroedthick–This type is fully allocated upon creation. It reserves the entire indicated capacity on the VMFS volume and zeroes the entire encompassed region on the underlying storage prior to allowing writes from a guest OS. This means that it takes longer to provision, since GBs or TBs of zeroes must be written before the virtual disk creation is complete and ready to use. I will refer to this as EZT from now on.
- Zeroedthick–This type fully reserves the space on the VMFS but does not pre-zero the underlying storage. Zeroing is done on an as-needed basis: when a guest OS writes to a new segment of the virtual disk, the encompassing block is zeroed first, then the new write is committed.
- Thin–This type neither reserves space on the VMFS nor pre-zeroes. Zeroing is done on an as-needed basis like zeroedthick. The virtual disk's physical capacity grows in segments defined by the block size of the VMFS, usually 1 MB.
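For reference, here is a minimal sketch in Python of how these formats map onto a virtual disk's backing in the vSphere API. The thinProvisioned and eagerlyScrub flag names are the real ones on a flat VMDK backing; everything else in the snippet just restates the behaviors described above and is illustrative only.

```python
# Sketch only: how each format is typically expressed on a flat VMDK backing
# (thinProvisioned / eagerlyScrub are the vSphere API flag names); the other
# fields just restate the behavior described in the list above.
DISK_FORMATS = {
    "thin":             {"thinProvisioned": True,  "eagerlyScrub": False, "reserves_vmfs_space": False, "pre_zeroed": False},
    "zeroedthick":      {"thinProvisioned": False, "eagerlyScrub": False, "reserves_vmfs_space": True,  "pre_zeroed": False},
    "eagerzeroedthick": {"thinProvisioned": False, "eagerlyScrub": True,  "reserves_vmfs_space": True,  "pre_zeroed": True},
}

for name, props in DISK_FORMATS.items():
    print(f"{name:>16}: {props}")
```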
Traditionally the choice was pretty simple: whatever was most important to you, whether performance, space efficiency or protection against out-of-space conditions, dictated your choice of virtual disk. EZT had the best performance and capacity protection, while thin had the lowest performance and no protection against space exhaustion, but the best efficiency. Zeroedthick was pretty much somewhere in the middle.
As management of virtual disks has matured and storage-based integration has developed, these lines have blurred considerably. From what I have seen (yes, this is anecdotal and maybe entirely inaccurate) most customers these days use zeroedthick or thin. In the past I would have 100% agreed with those preferences. Today though, with AFAs like the FlashArray, I believe this needs to be re-examined. I almost invariably recommend using EZT; let me tell you why.
Let’s examine the factors that have led me to this:
- Performance
- Protection against space exhaustion
- Space efficiency
- XCOPY performance
- Time to deploy
Performance
Starting with performance. Don’t get me wrong–the performance differences between the formats have definitely shrunk, but EZT will always perform at least as well as, and usually better than, the others. VAAI WRITE SAME plays a big role in this, but it is not a panacea.
Prior to WRITE SAME support, the performance differences between these allocation mechanisms were distinct. Before any unallocated block could be written to, zeroes had to be written first, causing an allocate-on-first-write penalty. Therefore, for every new block to be written there were two writes: the zeroes, then the actual data. For thin and zeroedthick virtual disks this zeroing was on-demand, so the effect was observed by the virtual machine writing to new blocks. For eagerzeroedthick, zeroing occurred during deployment, so large virtual disks took a long time to create, but any zeroing penalty for new writes was eliminated. To reduce this latency, VMware introduced WRITE SAME support. WRITE SAME is a SCSI command that tells a target device (or array) to write a pattern (in this case zeros) to a target location. ESXi uses this command to avoid actually sending a payload of zeros; instead it simply tells the array to write zeros to a certain location on a certain device. This not only reduces traffic on the SAN fabric, but also speeds up the overall process since the zeros do not have to traverse the data path.
This process is optimized even further on the Pure Storage FlashArray. Since the array does not store space-wasting patterns like contiguous zeroes, metadata is created or changed to simply note that these locations are supposed to be all-zero, so any subsequent reads will result in the array returning contiguous zeros to the host. This additional array-side optimization further reduces the time and penalty caused by pre-zeroing of newly-allocated blocks. Consequently WRITE SAME improves performance within thin and zeroedthick virtual disks: since both types zero out blocks only on demand (new writes to previously unallocated blocks), those new writes suffer additional latency compared to over-writes. The introduction of WRITE SAME reduces this latency (but does not eliminate it!) by speeding up the process of initializing that space. EZT never suffers from any zero-on-demand/WRITE SAME latency.
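To make the allocate-on-first-write penalty concrete, here is a toy Python sketch of my own (not anything VMware or Pure ships) that simply counts the host-issued operations needed to write a batch of never-before-touched blocks under the three behaviors just described: zero-on-demand without WRITE SAME, zero-on-demand with WRITE SAME, and eagerzeroedthick.

```python
# Toy model of the allocate-on-first-write penalty described above.
# It only counts host-issued operations; real latency depends on the array and fabric.

def ops_for_new_writes(num_new_blocks: int, mode: str) -> dict:
    if mode == "zero_on_demand":            # thin/zeroedthick, WRITE SAME disabled
        # Each new block needs a full zero write across the data path, then the data write.
        return {"zero_payload_writes": num_new_blocks, "write_same_cmds": 0, "data_writes": num_new_blocks}
    if mode == "zero_on_demand_write_same": # thin/zeroedthick, WRITE SAME enabled
        # The zero payload is replaced by a small WRITE SAME command; the data write remains.
        return {"zero_payload_writes": 0, "write_same_cmds": num_new_blocks, "data_writes": num_new_blocks}
    if mode == "eagerzeroedthick":          # zeroing already happened at provisioning time
        return {"zero_payload_writes": 0, "write_same_cmds": 0, "data_writes": num_new_blocks}
    raise ValueError(f"unknown mode: {mode}")

for mode in ("zero_on_demand", "zero_on_demand_write_same", "eagerzeroedthick"):
    print(f"{mode:>26}: {ops_for_new_writes(1000, mode)}")
```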
Let’s look at some numbers.
The following test was designed so that a large proportion of the workload was new writes, ensuring the write workload always encountered the allocation penalty from pre-zeroing (with the exception of the eagerzeroedthick test, which was more or less a control). Five separate tests were run:
- Thin virtual disk with WRITE SAME disabled.
- Thin virtual disk with WRITE SAME enabled.
- Zeroedthick virtual disk with WRITE SAME disabled.
- Zeroedthick virtual disk with WRITE SAME enabled.
- Eagerzeroedthick virtual disk.
The workload was a 100% sequential 32 KB write profile in all tests. As expected, the lowest performance (lowest throughput, lowest IOPS and highest latency) came from thin or zeroedthick with WRITE SAME disabled (zeroedthick slightly out-performed thin). Enabling WRITE SAME improved both, but eagerzeroedthick virtual disks out-performed all of the other virtual disks regardless of WRITE SAME use. With WRITE SAME enabled, eagerzeroedthick performed better than thin and zeroedthick by 30% and 20% respectively in both IOPS and throughput, and improved latency over both by 17%. The following three charts show the results for throughput, IOPS and latency.
Note that the charts do not start the vertical axis at zero—this is to better illustrate the deltas between the different tests.
It is important to understand that these tests are not meant to authoritatively describe performance differences between virtual disk types—instead they are meant to express the performance improvement introduced by WRITE SAME for writes to uninitialized blocks. Once blocks have been written to, the performance difference between the various virtual disk types diminishes. Furthermore, as workloads become more random and/or more read intensive, the overall performance difference will become less perceptible.
From this set of tests we can conclude:
- Regardless of WRITE SAME status, eagerzeroedthick virtual disks will always out-perform the other types for new writes.
- The latency overhead of zeroing-on-demand with WRITE SAME disabled is about 30% (in other words the new write latency of thin/zeroedthick is 30% greater than with eagerzeroedthick).
- The latency overhead is reduced from 30% to 20% when WRITE SAME is enabled.
- The IOPS and throughput reduction caused by zeroing-on-demand with WRITE SAME disabled is about 23% (in other words the possible IOPS/throughput of thin/zeroedthick to new blocks is 23% lower than with eagerzeroedthick).
- The possible IOPS/throughput to new blocks is reduced from 23% to 17% when WRITE SAME is enabled (a quick sanity check after this list shows how the throughput numbers follow from the latency numbers).
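A quick back-of-the-envelope check (my own math, assuming the benchmark kept a fixed number of outstanding I/Os, which the tests above don't state explicitly) shows the throughput bullets follow directly from the latency bullets: if IOPS scale inversely with latency, a 30% latency overhead costs about 23% of the IOPS and a 20% overhead about 17%.

```python
# With a fixed number of outstanding I/Os, IOPS scale roughly as 1/latency,
# so a latency overhead of X implies an IOPS/throughput reduction of 1 - 1/(1 + X).
def iops_reduction(latency_overhead: float) -> float:
    return 1 - 1 / (1 + latency_overhead)

print(f"30% latency overhead -> ~{iops_reduction(0.30):.0%} fewer IOPS")  # ~23%
print(f"20% latency overhead -> ~{iops_reduction(0.20):.0%} fewer IOPS")  # ~17%
```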
Conclusion: Over the lifetime of a virtual disk performance differences between allocation formats diminish but there is a noted difference upon initial use. Eagerzeroedthick virtual disks perform the best followed by zeroedthick and then thin. While the introduction of WRITE SAME reduced this zero-on-first-write performance difference, this gap—however small—still exists. For the most predictable and constant performance eagerzeroedthick is the best option. Recommendation: eagerzeroedthick.
Protection against space exhaustion
This is a bit of a simpler discussion than performance, but the introduction of a FlashArray does change the answer here. Traditionally, due to how arrays allocated capacity, EZT was the safest as it reserved space on both the VMFS and the backend storage, zeroedthick was in the middle because it didn’t reserve on the array, and thin was the riskiest–it was exposed to running out of both logical space on the VMFS and physical space on the array. EZT and ZT are exposed only to physical capacity exhaustion.
With management alerts and automation technologies like Storage DRS, running out of space on a VMFS became less of a concern, paving the way for people to feel comfortable deploying thin or zeroedthick for more critical apps. So the question more often came down to physical capacity exhaustion on the array. Previously only EZT was immune to this. With the FlashArray, though, those zeroes are not stored–they are removed, and metadata simply notes that they are supposed to be zeroes if ever read. Therefore, while the logical space is reserved the physical capacity is not, so a slight risk of running out of space still exists. If those zeroes are “overwritten” when the array is full and the new data cannot be sufficiently reduced, an EZT virtual disk could run out of underlying space. This is true for any array that removes zero patterns. So, due to zero removal in FlashReduce, EZT does not provide additional protection over ZT.
Conclusion: From a physical capacity perspective there is NO DIFFERENCE between the three types of virtual disks when it comes to out-of-space protection. ZT and EZT are slightly better on the logical side due to reserving space on the VMFS, but with the aforementioned VMware features that help monitor this situation, and the very simple ability to grow a volume on a FlashArray and resize a VMFS on the fly, thin is not at as much of a disadvantage as before. But if we are looking at it from a purely protection perspective, EZT and ZT are the best. Recommendation: eagerzeroedthick or zeroedthick.
Space efficiency
The Pure Storage FlashArray (and for that matter every AFA) is very concerned (or should be) with space efficiency to get the $ per GB down to reasonable levels. This is why Pure offers FlashReduce with dedupe, compression, pattern removal and so on. Since the FlashArray does this reduction for you, it makes no difference on the VMware side of things which virtual disk type you use: they all have the exact same footprint on the array. Therefore, as discussed in the previous section, only how they behave at the logical VMFS capacity level matters. Thin is the obvious winner on that front. But once again, for the reasons discussed before, is winning this particular race that important? Not really. Yes, you can have a higher virtual-machine-to-VMFS density and reduce the number of volumes in use–but with extremely high volume size limits this is less of a problem. Also, having so many VMs on one VMFS could conceivably increase the chance of device queuing issues. So I don’t really see too much value these days in high density, but maybe you do. Arguably it is somewhat of a wash, but in the end I still have to give it to thin.
Conclusion: Thin reigns supreme, but in my opinion it has an asterisk for winning this race. Recommendation: thin
Note: yes, that was written really trying to convince you that thin is not a great option, and it wasn’t exactly without the stink of biased opinion… But the reason I am so against it mainly has to do with the next section, which is:
XCOPY Performance
This is an ugly one, and one that I have encountered on many occasions during my time at both Pure and EMC. XCOPY performance from a thin virtual disk is horrendous compared to ZT or EZT: EZT and ZT virtual disks copy significantly faster than thin virtual disks when cloned or moved using VAAI XCOPY.
There are a few reasons for this. One is the XCOPY transfer size. The XCOPY transfer size is a value that dictates the maximum amount of capacity that can be described by a single XCOPY SCSI command; the larger the value, the fewer XCOPY commands are required to copy a virtual disk. For arrays that are sensitive to this value, a larger transfer size can make a profound difference in the performance of XCOPY operations; the Symmetrix is a prime example of this. At Pure we do recommend upping this value from the default of 4 MB to 16 MB, but it doesn’t have a very big effect on performance–it is only really noticeable in very large/simultaneous XCOPY deployments. So what does this have to do with thin? Well, unlike ZT and EZT, thin virtual disks disregard this maximum transfer size and instead use the block size of the VMFS, which is essentially invariably 1 MB, or 1/16th of the transfer size possible with EZT and ZT. If your array is sensitive to this, your XCOPY performance will be very poor with thin virtual disks.
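To put that 1/16th into command counts, here is the simple arithmetic for a hypothetical 100 GB virtual disk, assuming the 16 MB transfer size we recommend for thick disks and the 1 MB VMFS block size used for thin (actual counts also depend on extent contiguity):

```python
# Command-count math for the transfer sizes discussed above. Real counts also
# depend on how contiguous the virtual disk's extents are; this just shows the 16x gap.
MB = 1024 * 1024
GB = 1024 * MB

def xcopy_commands(disk_bytes: int, max_transfer_bytes: int) -> int:
    return -(-disk_bytes // max_transfer_bytes)  # ceiling division

disk = 100 * GB
print("thin  (1 MB per XCOPY command): ", xcopy_commands(disk, 1 * MB))   # 102,400
print("thick (16 MB per XCOPY command):", xcopy_commands(disk, 16 * MB))  # 6,400
```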
Furthermore, due to possible fragmentation and various other reasons (I don’t claim to know all of them), thin XCOPY is just slower in general. ESXi has to do more work to move/copy these properly. So even if your array isn’t sensitive to the transfer size there will be a noticeable difference. See the following chart on XCOPY performance with different virtual disk types.
Speaks for itself I’d say.
So PLEASE, PLEASE do not use thin virtual disks for virtual machine templates!
***UPDATE*** This behavior is fixed in vSphere 6 and there is no longer a difference.
Conclusion: The delta between ZT and EZT XCOPY performance diminishes as the virtual disk itself becomes more full, but in general ZT will beat out EZT because EZT is treated as if it is always full. (The VMs in the chart, by the way, had 100 GB virtual disks with 50 GB of data.) Stay away from thin here. Recommendation: Zeroedthick
Time to deploy
This last variable is how long it takes to create a new virtual disk. Note that I mean new, not cloned with XCOPY, because that changes the story (see above). Since thin and zeroedthick don’t pre-zero, EZT takes much longer to deploy, even with WRITE SAME support. In a test creating a 100 GB virtual disk, a thin one took .15 seconds to create, zeroedthick .55 seconds and eagerzeroedthick 24 seconds. For the most part I think this is a silly thing to care about (how often do you actually create NEW virtual disks, and does it really matter if it takes a second or a few dozen?), but maybe it does to you. For very large virtual disks this can still take a while; tens of TB can easily be an hour or more, but that is still far faster than it would be without WRITE SAME. Anyway, thin just edges out zeroedthick, but not in any meaningful way.
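For what it's worth, you can extrapolate roughly from that EZT measurement. This is a back-of-the-envelope sketch that assumes the effective initialization rate (about 4 GB/s, from 100 GB in 24 seconds) holds at larger sizes, which it may not:

```python
# Rough extrapolation from the measured EZT creation time above (100 GB in ~24 s).
# Assumes the effective WRITE SAME initialization rate stays constant, which it may not.
rate_gb_per_s = 100 / 24   # ~4.2 GB/s of zeroed space per second

for size_tb in (1, 10, 30):
    minutes = size_tb * 1024 / rate_gb_per_s / 60
    print(f"{size_tb:>2} TB EZT virtual disk: ~{minutes:.0f} minutes to create")
```

That works out to roughly 40 minutes for 10 TB and about two hours for 30 TB, in line with the "hour or more" figure for tens of TB.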
Conclusion: EZT takes the longest, so you’ll have to twiddle your thumbs for a bit while it deploys. Recommendation: Thin or Zeroedthick.
Conclusion
I think after all of this an argument could be made for zeroedthick, and I don’t really have a problem with that. But my thinking is that the places where ZT edges out EZT are either minor, or the difference is so close that they are overridden by more important factors like in-guest performance. With physical efficiency not really being a factor, and most other things essentially being a toss-up between ZT and EZT, why not go with the one with the most consistent performance?
So EZT is my default recommendation. Specifically though, how about this?
Sensitive to performance? Use EZT
Sensitive to time to deploy? Use ZT/thin
Not sensitive to performance and no plans to clone/SvMotion? Go ahead with thin.
My main goal here, though, was to drive home the point that AFAs with data reduction technologies like the FlashArray change this conversation. So if you are moving from legacy disk to an AFA (hopefully you are!) you might need to rethink some of your virtual disk choice strategies.
Comment: With vSphere 6 having been out a while, and now vSphere 6.5 out, do you still recommend EZT over thin, or has that changed? Looking at the Pure documentation, they seem to recommend thin.
Reply: Today we (Pure) recommend thin in pretty much any scenario except the most latency-sensitive applications, which should use EZT. I probably should update this post; some updates would be useful here.