Using InfiniBand for Cheap and Fast Point-To-Point Networking

Sat 25 March 2017 Category: Networking

InfiniBand networking is quite awesome. It's mainly used for two reasons:

  1. low latency
  2. high bandwidth

As a home user, I'm mainly interested in setting up a high bandwidth link between two servers.

I was using quad-port network cards with Linux Bonding, but this solution has some downsides:

  1. you can only go to 4 Gbit with Linux bonding (or you need more ports)
  2. you need a lot of cabling
  3. it is similar in price to InfiniBand

So I've decided to take a gamble on some InfiniBand gear. You only need InfiniBand PCIe network cards and a cable.

1 x SFF-8470 CX4 cable                                              $16
2 x MELLANOX DUAL-PORT INFINIBAND HOST CHANNEL ADAPTER MHGA28-XTC   $25
                                                            Total:  $66

[Image: view of installed InfiniBand card and cable]

I find $66 quite cheap for 20 Gbit networking. Regular 10Gbit Ethernet networking is often still more expensive than using older InfiniBand cards.

InfiniBand is similar to Ethernet in that you can run your own protocol over it (for lower latency), but you can also run IP over InfiniBand. The InfiniBand card will just show up as a regular network device (one per port).

ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00  
      inet addr:10.0.2.3  Bcast:10.0.2.255  Mask:255.255.255.0
      inet6 addr: fe80::202:c902:29:8e01/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
      RX packets:7988691 errors:0 dropped:0 overruns:0 frame:0
      TX packets:17853128 errors:0 dropped:10 overruns:0 carrier:0
      collisions:0 txqueuelen:256 
      RX bytes:590717840 (563.3 MiB)  TX bytes:1074521257501 (1000.7 GiB)

Configuration

I've followed these instructions to get IP over InfiniBand working.

Modules

First, you need to make sure that at least the following modules are loaded:

ib_mthca
ib_ipoib

I only had to add the ib_ipoib module to /etc/modules. As soon as this module is loaded, you will notice some ibX interfaces, which can be configured like regular Ethernet interfaces.
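
For reference, loading the module by hand and making it persistent across reboots could look something like this (the ib_mthca hardware driver was already loaded automatically in my case):

modprobe ib_ipoib
echo ib_ipoib >> /etc/modules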

Subnet manager

In addition to loading the modules, you also need a subnet manager, which you can install like this:

apt-get install opensm

This service needs to run on just one of the endpoints.
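
If you want to check that a subnet manager is actually running on the fabric, the infiniband-diags package (which also provides the ibstat tool used below) includes sminfo:

apt-get install infiniband-diags
sminfo

It should report the LID of the active subnet manager, which corresponds to the 'SM lid' field in the ibstat output below.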

Link status

If you want, you can check the link status of your InfiniBand connection like this:

# ibstat
CA 'mthca0'
    CA type: MT25208
    Number of ports: 2
    Firmware version: 5.3.0
    Hardware version: 20
    Node GUID: 0x0002c90200298e00
    System image GUID: 0x0002c90200298e03
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 1
        LMC: 0
        SM lid: 2
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200298e01
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200298e02
        Link layer: InfiniBand

Set mode and MTU

Since my systems run Debian Linux, I've configured /etc/network/interfaces like this:

auto ib0
iface ib0 inet static
    address 10.0.2.2
    netmask 255.255.255.0
    mtu 65520
    pre-up echo connected > /sys/class/net/ib0/mode

Please take note of the 'mode' setting. The 'datagram' mode gave abysmal network performance (less than 1 Gbit). The 'connected' mode made everything perform acceptably.

The MTU setting of 65520 improved performance by another 30 percent.
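
If you want to experiment with these settings without editing /etc/network/interfaces and restarting networking, you can also apply them to a live interface (it may be necessary to bring the interface down before switching modes):

ip link set ib0 down
echo connected > /sys/class/net/ib0/mode
ip link set ib0 mtu 65520
ip link set ib0 up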

Performance

I've tested the cards on two systems based on the Supermicro X9SCM-F motherboard. Using these systems, I was able to achieve file transfer speeds of up to 750 megabytes per second, or about 6.5 Gbit as measured with iperf.

~# iperf -c 10.0.2.2
------------------------------------------------------------
Client connecting to 10.0.2.2, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 10.0.2.3 port 40098 connected with 10.0.2.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.49 GBytes  6.43 Gbits/sec
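
For completeness: the other endpoint only needs to run iperf in server mode for this test:

~# iperf -s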

A similar test with netcat and dd:

~# dd if=/dev/zero bs=1M count=100000 | nc 10.0.2.2 1234
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 128.882 s, 814 MB/s
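
The receiving side of this netcat test would be something along these lines, throwing the data away so disks are not the bottleneck (the exact listen syntax depends on your netcat flavour):

~# nc -l -p 1234 > /dev/null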

Testing was done on Debian Jessie.

During earlier testing, I also used these cards in HP ProLiant MicroServer Gen8 servers, running Ubuntu 16.04 LTS.

As tested on Ubuntu with the HP MicroServer:

------------------------------------------------------------
Client connecting to 10.0.4.3, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  5] local 10.0.4.1 port 52572 connected with 10.0.4.3 port 5001
[  4] local 10.0.4.1 port 5001 connected with 10.0.4.3 port 44124
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  71.9 GBytes  10.3 Gbits/sec
[  4]  0.0-60.0 sec  72.2 GBytes  10.3 Gbits/sec

Using these systems, I was eventually able to achieve 15 Gbit as measured with iperf, although I have no 'console screenshot' of it.

Closing words

IP over InfiniBand seems to be a nice way to get high-performance networking on the cheap. The main downside is that when using IP over IB, CPU usage will be high.

Another thing I have not researched, but which could be of interest, is running NFS or other protocols directly over InfiniBand using RDMA, bypassing the overhead of IP.
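
As a rough, untested sketch of what NFS over RDMA could look like, based on the Linux kernel's NFS/RDMA documentation (the module names, the 20049 port and the mount options are assumptions on my part and may differ per distribution and kernel version):

# server: after setting up a normal NFS export, add the RDMA transport
modprobe svcrdma
echo rdma 20049 > /proc/fs/nfsd/portlist

# client: load the client transport and mount over RDMA
modprobe xprtrdma
mount -o proto=rdma,port=20049 10.0.2.2:/export /mnt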
