This is a topic I have posted about in the past, but this time I am going to cover it for the Pure Storage FlashArray. Anyone familiar with the VMware Native Multipathing Plugin (NMP) probably knows about the Round Robin “IOPS” value, which I will also refer to interchangeably as the IO Operation Limit. This value dictates how often NMP switches paths to a device: after a configured number of I/Os, NMP moves to a different path. The default is 1,000, but it can be set as low as 1. For the highest performance Pure recommends changing this setting to 1 for all devices. The tricky part is that it has to be done for every device on every host, and a simple way to do that isn’t immediately obvious. Here is the procedure.
The most common way to do this has been to set it on each device using esxcli, but that does not scale well because it has to be repeated for every device on every host until the end of time. It is much easier to create a rule that sets an IOPS value for every Pure device as it is claimed. The SATP that claims Pure devices is the standard ALUA one, VMW_SATP_ALUA, so the rule needs to be assigned for Pure devices claimed by that SATP. First you need some information.
To create a rule specific enough to encompass only Pure devices we need to get the vendor information from an existing device. The simplest way to do this (or a simple one at least) is to just grep the vmkernel log after a rescan:
grep -i scsiscan /var/log/vmkernel.log
This will give you lines that look like so:
2014-05-14T21:54:50.756Z cpu13:33081 opID=2ac75bde)ScsiScan: 976: Path 'vmhba3:C0:T5:L11': Vendor: 'PURE ' Model: 'FlashArray ' Rev: '342 '
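If you have a lot of scan lines to read through, a small filter can reduce them to the distinct vendor and model pairs. This is only a sketch: the function name scsiscan_vendor_model is my own invention, and it assumes the log lines keep the Vendor: '...' Model: '...' layout shown above.

```shell
# Print each distinct "vendor|model" pair found in ScsiScan lines on stdin.
# Assumption: lines follow the layout  Vendor: '...' Model: '...'
# (hypothetical helper name, not part of ESXi).
scsiscan_vendor_model() {
    # Pull out the two quoted fields, trim their space padding, de-duplicate.
    sed -n "s/.*Vendor: '\([^']*\)' *Model: '\([^']*\)'.*/\1|\2/p" |
        sed 's/ *|/|/; s/ *$//' |
        awk '!seen[$0]++'
}

# Example use on a host:
#   grep -i scsiscan /var/log/vmkernel.log | scsiscan_vendor_model
```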
We just need to take the vendor and model names, which unsurprisingly are PURE and FlashArray respectively. To create a rule that makes sure Pure devices use Round Robin and that the IOPS value is always set to 1, run this command on all of your ESXi hosts:
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"
This is case sensitive so make sure you type this exactly as above.
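If you script this across many hosts, it can help to skip the add when a rule is already there. Here is a minimal sketch that wraps the same command. The function name ensure_pure_rr_rule is hypothetical, and the check is just a loose text match for PURE and FlashArray in the rule list output, so treat it as an assumption rather than a strict test.

```shell
# Add the Pure Round Robin rule only when the rule list does not already
# mention PURE and FlashArray. Hypothetical helper; the grep check is a
# loose text match on the list output, not an exact rule comparison.
ensure_pure_rr_rule() {
    if esxcli storage nmp satp rule list -s VMW_SATP_ALUA | grep PURE | grep -q FlashArray; then
        echo "rule already present"
    else
        esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1" &&
            echo "rule added"
    fi
}

# Example use on a host:
#   ensure_pure_rr_rule
```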
***See how to do this with PowerCLI here***
Note that existing devices will not get this change! If they are currently using MRU or something or have a different IOPS value this will not change them. You either need to specifically change existing devices or unclaim and reclaim them (which requires the device going offline) or reboot the host. If you want to change specific devices without taking them offline you can run (with a different NAA of course):
esxcli storage nmp device set -d naa.6006016055711d00cff95e65664ee011 --psp=VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set -d naa.6006016055711d00cff95e65664ee011 -I 1 -t iops
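Since those two commands always go together, you could wrap them in a small function that takes the NAA as an argument. This is a sketch only, and the name set_rr_iops1 is made up for illustration.

```shell
# Switch one existing device to Round Robin with an IO Operation Limit of 1.
# Usage: set_rr_iops1 <naa id>     (hypothetical helper name)
set_rr_iops1() {
    dev="$1"
    [ -n "$dev" ] || { echo "usage: set_rr_iops1 <naa id>" >&2; return 1; }
    # Only set the IOPS value if the PSP change succeeded.
    esxcli storage nmp device set -d "$dev" --psp=VMW_PSP_RR &&
        esxcli storage nmp psp roundrobin deviceconfig set -d "$dev" -I 1 -t iops
}

# Example use on a host:
#   set_rr_iops1 naa.624a9370753d69fe46db318d00010000
```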
Regardless, all new devices will now be claimed with Round Robin using an IOPS value of 1 from this point on. You can check the IO Operation Limit value for a given device by running:
esxcli storage nmp psp roundrobin deviceconfig get --device naa.624a9370753d69fe46db318d00010000
   Byte Limit: 10485760
   Device: naa.624a9370753d69fe46db318d00010000
   IOOperation Limit: 1
   Limit Type: Default
   Use Active Unoptimized Paths: false
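If you only want the number, for example to check many devices in a loop, you can filter that output. A small sketch, assuming the "IOOperation Limit:" label shown above; the function name io_operation_limit is hypothetical.

```shell
# Print just the IO Operation Limit for one device.
# Assumption: the output contains a line labelled "IOOperation Limit: N".
# Usage: io_operation_limit <naa id>     (hypothetical helper name)
io_operation_limit() {
    esxcli storage nmp psp roundrobin deviceconfig get --device "$1" |
        awk -F': *' '/IOOperation Limit/ { print $2 }'
}

# Example use on a host:
#   io_operation_limit naa.624a9370753d69fe46db318d00010000
```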
To change or remove the rule, you cannot simply run the add command again with 1,000 or some other number. You must first remove the existing rule; then you can create a new one with a different value, or leave it without a rule to go back to the default of 1,000.
esxcli storage nmp satp rule remove -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"
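Because changing the value is always a remove followed by an add, the two steps can be sketched as one function. It uses VMW_SATP_ALUA so that the remove matches the rule that was added earlier. The name replace_pure_iops_rule and its two arguments are my own, for illustration only.

```shell
# Replace the Pure rule's IOPS value: remove the old rule, then add the new one.
# Usage: replace_pure_iops_rule <old iops> <new iops>   (hypothetical helper name)
replace_pure_iops_rule() {
    old="$1"
    new="$2"
    # The remove must match the options the existing rule was created with.
    esxcli storage nmp satp rule remove -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=$old" &&
        esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=$new"
}

# Example use on a host (change the rule from 1 to 10):
#   replace_pure_iops_rule 1 10
```

As with the original rule, this only affects devices claimed after the change; existing devices keep their current setting.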
If you don’t remember what you set or want to take a look at the existing rules, run:
esxcli storage nmp satp rule list -s VMW_SATP_ALUA
Pretty straightforward!
Why is it so hard for vendors to add their best practice claim rules to ESX? I am going mad adding this for different vendors.
Agreed–I wish this was easier for vendors to change. When I was at EMC it took us almost four years to get them to change the VMAX default to RR from Fixed.
By the way, Pure Storage best practices are now default in ESXi, so you do not need to do this anymore.
Don’t you also want to set TPGS to on in this case?
No, since the FlashArray is active/active, TPGS doesn’t really need to be messed with. You could set it to off, but it doesn’t really matter.
If you just don’t want to pay too much attention to the naa attribute, using these two commands should help.
RR activation :
# for i in `esxcli storage nmp device list | grep PURE | awk '{gsub(/[()]/,""); print $8}'` ; do esxcli storage nmp device set -d $i --psp=VMW_PSP_RR; done
Path Switching to 1 :
# for i in `esxcli storage nmp device list | grep PURE | awk '{gsub(/[()]/,""); print $8}'` ; do esxcli storage nmp psp roundrobin deviceconfig set -d $i -I 1 -t iops;done
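The loops above pick the NAA out of the eighth field of the display name line, which fits a name like "PURE Fibre Channel Disk (naa...)". A variant that takes the last field instead should also cope with display names of a different length. This is a sketch under the assumption that the display name still ends with the NAA in parentheses; the function name pure_naa_ids is hypothetical.

```shell
# List the NAA IDs of all devices whose display name starts with PURE.
# Assumption: each device has a line like
#   Device Display Name: PURE <transport> Disk (naa.xxxx)
# with the NAA in parentheses as the last field. Hypothetical helper name.
pure_naa_ids() {
    esxcli storage nmp device list |
        awk '/Device Display Name: PURE/ { id = $NF; gsub(/[()]/, "", id); print id }'
}

# Example use on a host, applying both settings to every Pure device:
#   for i in $(pure_naa_ids); do
#       esxcli storage nmp device set -d "$i" --psp=VMW_PSP_RR
#       esxcli storage nmp psp roundrobin deviceconfig set -d "$i" -I 1 -t iops
#   done
```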
Hi Cody, Isn’t round robin now the default for vsphere 6.5 ?
Yup: https://www.codyhosterman.com/2017/07/nmp-multipathing-rules-for-the-flasharray-are-now-default/
I am not seeing any ScsiScan info in /var/log/vmkernel.log
Do you know of another way to retrieve the information?
For any device presented you should see it. esxcfg-scsidevs -l should show it too. What vendor are you looking to configure for?
Oh wow that is cool, I ran “esxcfg-scsidevs -l” and looks like there are 15 different ones.
Three pertain to “Vendor: PURE” and “Model: FlashArray”.
I just have one array, should there be 15 different naa.#’s?
Each is Multipath Plugin: NMP
Every volume (or datastore or LUN or whatever you want to call it) you provision will have its own NAA. The NAA is based on the volume serial number, so each one is unique, and it is what VMware uses to identify each datastore uniquely. For the FlashArray, though, the vendor and model info will always be PURE and FlashArray; this is not unique to a volume but common to all storage from our array. To create a SATP rule you would use those values for us. If you are running the latest versions of ESXi, though, you do not need to do this anymore.
Ah ok so I am on esxi 5.5. I’d just run the rule for PURE FlashArray and should be good. Or alternatively, I’d upgrade to esxi 6.5 and wouldn’t need to add the rule to change the round robin io limit?
Thanks again
You’re welcome! Yep, exactly! If you are on 5.5, run that rule on each ESXi host once and you are good. If you are on 6.0 Express Patch 5 or later, or 6.5 U1 or later, you don’t need to do it at all, as these recommendations are now default in ESXi for the FlashArray in those releases and later.