SIOC / StorageRM causes very high traffic on SAN datastores

Excessive and unexplained READs on iSCSI and FC datastores observed on ESXi hosts.

This behaviour was observed in a large VMware environment with multiple iSCSI and/or FC LUNs.
On average, each host generated 200 MB of READs per LUN.

If SIOC was ever enabled on these datastores, the storageRM service on the ESXi hosts keeps reading the slotsfile and the .iormstats.sf file, even after SIOC has been disabled in vCenter.
When SIOC is enabled, two files are created on the datastore:

/vmfs/volumes/iscsi_datastore/.iormstats.sf - keeps track of the latency values and datastore details
/vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile - contains the UUIDs of all hosts that access this datastore

The storageRM service tries to read the slotsfile every 4 seconds. If the slotsfile is corrupt or contains too many hosts, storageRM generates a huge amount of READs, which shows up in esxtop as well as in the switch and storage array statistics.
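
To check whether a datastore still carries the two files described above after SIOC has been disabled, they can simply be searched for from the ESXi shell. This is a minimal sketch, assuming the datastores are mounted under /vmfs/volumes (the root can be overridden, which also makes it easy to try on a scratch directory); `list_sioc_files` is a hypothetical helper name. Because /vmfs/volumes holds both a symlink named after each datastore and the datastore's UUID directory, each file may be listed twice.

```shell
#!/bin/sh
# Sketch: list the SIOC files (.iormstats.sf and slotsfile) on every datastore.
# Assumption: datastores live under /vmfs/volumes unless another root is given.
list_sioc_files() {
  root="${1:-/vmfs/volumes}"
  for f in "$root"/*/.iormstats.sf "$root"/*/.naa.*/slotsfile; do
    # An unmatched glob stays as a literal pattern, so skip paths that do not exist.
    [ -e "$f" ] || continue
    # ls -l only reads file metadata, so it does not add READs on the file contents.
    ls -l "$f"
  done
}
```

Running `list_sioc_files` with no argument scans /vmfs/volumes and prints one `ls -l` line per file found, including its size.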

The storageRM log shows entries similar to:

2016-04-20T12:23:51.270Z: stat file /vmfs/volumes/iscsi_datastore/.iormstats.sf already exists.
2016-04-20T12:23:51.270Z: <iscsi_datastore, 0> Opening slot count file /vmfs/volumes//iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile
2016-04-20T12:23:51.278Z: open /vmfs/volumes//iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile(0x10000042, 0x0) -> 6 succeeded
2016-04-20T12:23:51.278Z: <iscsi_datastore, 0> File too big => number of hosts > 128. Not Supported
2016-04-20T12:23:51.279Z: Successfully closed file 6.
2016-04-20T12:23:51.279Z: <iscsi_datastore, 0> Error -1 in opening & reading the slot file
2016-04-20T12:23:51.279Z: Couldn't get a slot
2016-04-20T12:23:51.279Z: Successfully closed file 5.
2016-04-20T12:23:51.279Z: <iscsi_datastore, 0> Error in opening stat file for device: naa.60000000000000000200000000000008.Ignoring this device.
2016-04-20T12:23:51.281Z: open /vmfs/volumes/nexsanluns(0x10000, 0x0) -> 5 succeeded
2016-04-20T12:23:51.281Z: open /vmfs/volumes/nexsanluns(0x0, 0x0) -> 6 succeeded
2016-04-20T12:23:51.339Z: Successfully closed file 6.
2016-04-20T12:23:51.339Z: Stat file: numblocks= 2048 blocksize = 1048576
2016-04-20T12:23:51.340Z: stat file /vmfs/volumes/nexsanluns/.iormstats.sf already exists.
2016-04-20T12:23:51.340Z: <nexsanluns, 0> Opening slot count file /vmfs/volumes//nexsanluns/.naa.6000402e50000000344532d2ce974e6a/slotsfile
2016-04-20T12:23:51.343Z: open /vmfs/volumes//nexsanluns/.naa.6000402e50000000344532d2ce974e6a/slotsfile(0x10000042, 0x0) -> 6 succeeded
2016-04-20T12:23:51.343Z: <nexsanluns, 0> File too big => number of hosts > 128. Not Supported
2016-04-20T12:23:51.344Z: Successfully closed file 6.
2016-04-20T12:23:51.344Z: <nexsanluns, 0> Error -1 in opening & reading the slot file
2016-04-20T12:23:51.344Z: Couldn't get a slot
2016-04-20T12:23:51.344Z: Successfully closed file 5.
2016-04-20T12:23:51.344Z: <nexsanluns, 0> Error in opening stat file for device: naa.6000402e50000000344532d2ce974e6a.Ignoring this device.
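
To find which datastores and devices are affected, the log lines above can be filtered for the final "Ignoring this device" message, which names both the datastore and its naa ID. This is a sketch, assuming the storageRM log is at /var/log/storagerm.log (pass another path if the host logs elsewhere); `affected_devices` is a hypothetical helper name. It matches that message whatever the underlying cause, so the output is a list of candidates to compare against the "File too big" lines, not proof of a corrupt slotsfile.

```shell
#!/bin/sh
# Sketch: print "datastore naa-ID" once for each device storageRM is ignoring.
# Assumption: the log lives at /var/log/storagerm.log unless a path is given.
affected_devices() {
  log="${1:-/var/log/storagerm.log}"
  # Matches lines such as:
  #   <nexsanluns, 0> Error in opening stat file for device: naa.6000402e50000000344532d2ce974e6a.Ignoring this device.
  # keeps the datastore name and the naa ID, and sort -u drops the repeats
  # produced by storageRM retrying every few seconds.
  sed -n 's/.*<\([^,]*\), [0-9]*> Error in opening stat file for device: \(naa\.[0-9a-fA-F]*\).*/\1 \2/p' "$log" | sort -u
}
```

The naa IDs it prints are the ones that appear in the vsish and slotsfile paths used in the solution steps.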

Solution:
1. Stop storageRM on each host generating this traffic:

[root@esxi-host:~]  /etc/init.d/storageRM stop

2. Disable the feature on each datastore and each host:

[root@esxi-host:~] vsish -e set /storage/scsifw/devices/naa.60000000000000000200000000000008/iormState 2000
[root@esxi-host:~] vsish -e set /storage/scsifw/devices/naa.6000402e50000000344532d2ce974e6a/iormState 2000

The exact value does not really matter, but 2000 is known to be the default value, which means disabled.

3. Remove the slotsfile on each datastore:

[root@esxi-host:~] rm -rf /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile

If the file cannot be removed because it is corrupt, move (rename) it first and then remove it:

[root@esxi-host:~] mv /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile.old
[root@esxi-host:~] rm -rf /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile
[root@esxi-host:~] rm -rf /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile.old

Removing this file is safe: it is automatically recreated by the next host that accesses the datastore via storageRM.
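
When many hosts or devices are affected, the three steps above can be wrapped in small shell functions. This is a sketch rather than a tested tool: the function names `stop_storagerm`, `disable_iorm` and `remove_slotsfile` and the `RUN=echo` dry-run switch are hypothetical conveniences. The commands are the ones from the steps, except that `rm -f` is used instead of `rm -rf`, since only a single file is removed.

```shell
#!/bin/sh
# Sketch of steps 1-3 as functions for the ESXi shell.
# Hypothetical dry-run switch: set RUN=echo to print commands instead of running them.

# Step 1: stop the storageRM service on the local host.
stop_storagerm() {
  ${RUN:-} /etc/init.d/storageRM stop
}

# Step 2: set iormState to 2000 for every naa ID passed as an argument.
# Run this on each host that accesses the affected datastores.
disable_iorm() {
  for dev in "$@"; do
    ${RUN:-} vsish -e set "/storage/scsifw/devices/$dev/iormState" 2000
  done
}

# Step 3: delete one slotsfile (full path as the argument).
# If rm fails, move the file aside first and then delete the moved copy.
remove_slotsfile() {
  f="$1"
  if ${RUN:-} rm -f "$f"; then
    return 0
  fi
  ${RUN:-} mv "$f" "$f.old" && ${RUN:-} rm -f "$f.old"
}
```

For example, on each host generating the traffic: `stop_storagerm`, then `disable_iorm naa.60000000000000000200000000000008 naa.6000402e50000000344532d2ce974e6a`, then `remove_slotsfile /vmfs/volumes/iscsi_datastore/.naa.60000000000000000200000000000008/slotsfile`. Since the slotsfile sits on the shared datastore, step 3 presumably only needs to succeed once per datastore. Setting `RUN=echo` beforehand prints the commands for review; `unset RUN` executes them.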

NOTE: VMware Support was completely clueless.