Improving iSCSI Native Multi Pathing Round Robin Performance

May 27, 2013 Category: Storage

10 Gb ethernet is still quite expensive. You not only need to buy appropriate NICS, but you must also upgrade your network hardware as well. You may even need to replace existing fiber optic cabling if it's not rated for 10 Gbit.

So I decided to still just go for plain old 1 Gbit iSCSI based on copper for our backup SAN. After some research I went for the HP MSA P2000 G3 with dual 1 Gbit iSCSI controllers.

Each controller has 4 x 1 Gbit ports, so the box has a total of 8 Gigabit ports. This is ideal for redundancy, performance and cost. This relatively cheap SAN does support active/active mode, so both controllers can share the I/O load.

The problem with storage is that a single 1 Gbit channel is just not going to cut it when you need to perform bandwidth intensive tasks, such as moving VMs between datastores (within VMware).

Fortunately, iSCSI Multi Pathing allows you to do basically a RAID 0 over multiple network cards, combining their performance. So four 1 Gbit NICS can provide you with 4 Gbit of actual storage throughput.

The trick is not only to configure iSCSI Multi Pathing using regular tutorials, but also to enable the Round Robin setting on each data store or each RAW device mapping.

So I dit all this and still I got less than 1 Gb/s performance, but fortunately, there is only one little trick to get to the actual performance you might expect.

I found this at multiple locations but the explanation on Justin's IT Blog is best.

By default, VMware issues 1000 IOPS to a NIC before switching (Round Robin) to the next one. This really hampers performance. You need to set this value to 1.

esxcli storage nmp psp roundrobin deviceconfig set -d $DEV --iops 1 --type iops

This configuration tweak is recommended by HP, see page 28 of the linked PDF.

Once I configured all iSCSI paths to this setting, I got 350 MB/s of sequential write performance from a single VM to the datastore. That's decent enough for me.

How do you do this? It's a simple one liner that sets the iops value to 1, but I'm so lazy, I don't want to copy/past devices and run the command by hand each time.

I used a simple CLI script (VMware 5) to configure this setting for all devices. SSH to the host and then run this script:

for x in `esxcli storage nmp device list | grep ^naa`
    echo "Configuring Round Robin iops value for device $x"
    esxcli storage nmp psp roundrobin deviceconfig set -d $x --iops 1 --type iops

This is not the exact script I used, I have to verify this code, but basically it just configures this value for all storage devices. Devices that don't support this setting will raise an error message that can be ignored (if the VMware host also has some local SAS or SATA storage, this is expected).

The next step is to check if this setting is permanent and survives a host reboot.

Anyway, I verified the performance using a Linux VM and just writing a simple test file:

dd if=/dev/zero of=/storage/test.bin bs=1M count=30000

To see the Multi Pathing + Round Robin in action, run esxtop at the cli and then press N. You will notice that with four network cards, VMware will use all four channels available.

This all is to say that plain old 1 Gbit iSCSI can still be fast. But I believe that 10 Gbit ethernet does probably provide better latency. If that's really an issue for your environment, is something I can't tell.

Changing the IOPS parameter to 1 IOPS also seems to improve random I/O performance, according to the table in Justin's post.

Still, although 1 Gbit iSCSI is cheap, it may be more difficult to get the appropriate performance levels you need. If you have time, but little money, it may be the way to go. However, if time is not on your side and money isn't the biggest problem, I would definitely investigate the price difference with going for fibre channel or with 10Gbit iSCSI.