Setting Up Infiniband

So we got some new IB cards and needed to set them up on our servers. The servers in this post run Ubuntu 14.04, but I believe 16.04 should be similar.

Install the cards physically.

To check whether the system detects your cards, enter the following:

lspci -v | grep Mellanox
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

You should get something like the above.
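If you want to script this check, for example to run it on several servers, a minimal sketch could look like the following. The helper name count_mellanox is made up for this example, and it only counts matching lines in lspci output, so treat it as an illustration rather than a definitive detection method.

```shell
# Count Mellanox devices in lspci-style output read from stdin.
# count_mellanox is a hypothetical helper name, not part of any package.
count_mellanox() {
    grep -c 'Mellanox'
}

# On a real host (lspci comes from the pciutils package):
#   lspci | count_mellanox
```

A result of 0 means lspci did not list any Mellanox device, in which case it is worth reseating the card before going further.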

Install Infiniband Driver

Refer to the release notes for version v4_2-1_2_0_0. They list the packages that are required before installation. I found out afterwards that the installer seems to check these dependencies and install them itself, but why not prepare your system beforehand.

apt-get install perl dpkg autotools-dev autoconf libtool automake1.10 automake m4 dkms debhelper tcl tcl8.4 chrpath swig graphviz tcl-dev tcl8.4-dev tk-dev tk8.4-dev bison flex dpatch zlib1g-dev curl libcurl4-gnutls-dev python-libxml2 libvirt-bin libvirt0 libnl-3-dev libglib2.0-dev libgfortran3 pkg-config libnuma-dev logrotate ethtool lsof

For the libnuma and libnl-dev packages, the corresponding Ubuntu package names are libnuma-dev and libnl-3-dev.

Afterwards, check out the ConnectX-3 Pro VPI Single and Dual QSFP+ Port Adapter Card User Manual for more help with installing.

Now go ahead and install the Mellanox OFED. Download the installer from the Mellanox website under Products->Software->Infiniband VPI drivers. Choose Mellanox OFED Linux and click the Download button at the bottom. If nothing shows up and you are using Chrome, make sure to enable unsafe scripts.

Download the tgz file (or iso if you prefer iso) for your distribution. Untar the file.

Install the Mellanox OFED by executing the following script:

./mlnxofedinstall [OPTIONS if applicable. I didn't need any]

Afterwards, I rebooted the system.

Assigning IP addresses to each IB

Infiniband supports IPoIB (IP over Infiniband), which lets the IB interfaces be addressed with ordinary IP addresses. For this part I referred to the following post. To make sure IPoIB is installed, run the following command:

lsmod | grep ipoib

There should be an ib_ipoib module loaded.

Now check your IB interface names with the ifconfig -a command. Then set your IB IP addresses in the /etc/network/interfaces file:
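If the module is missing, it can usually be loaded by hand. Here is a small sketch that loads it only when lsmod does not already list it. The helper name ipoib_loaded is made up for this example, and I am assuming the OFED install placed the ib_ipoib module where modprobe can find it.

```shell
# Succeed if lsmod-style output on stdin lists the ib_ipoib module.
# ipoib_loaded is a hypothetical helper name used only in this sketch.
ipoib_loaded() {
    grep -q '^ib_ipoib[[:space:]]'
}

# On a real host, as root:
#   lsmod | ipoib_loaded || modprobe ib_ipoib
```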

auto ib0
iface ib0 inet static
address 10.0.0.1
netmask 255.255.255.0
broadcast 10.0.0.255

Then bring up your IB network device via

ifup ib0
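If you pick a different subnet for your IB network, the broadcast line in the interfaces file has to change with it. Each broadcast octet is the address octet ORed with the inverted netmask octet. The sketch below does that arithmetic in plain shell; the function name broadcast_addr is made up for this example.

```shell
# Compute the IPv4 broadcast address for an address/netmask pair.
# Each broadcast octet is: address octet OR (255 - netmask octet).
# broadcast_addr is a hypothetical helper name used only in this sketch.
broadcast_addr() {
    addr=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    set -- $addr $mask      # split both dotted quads into 8 positional args
    IFS=$old_ifs
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

broadcast_addr 10.0.0.1 255.255.255.0   # prints 10.0.0.255
```

For the address and netmask used in this post, the result matches the broadcast value in the interfaces file.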

Setting up the Subnet Manager (If you're not using an IB Switch)

Now if you check the status of your IB cards via ibstat, you may find that the ports report State: Initializing. The Intel Developer Zone has a guide called Troubleshooting InfiniBand connection issues using OFED tools. In its section on port states, I found that the INIT state means the hardware is initialized but no subnet manager is available.

If you are in a situation like mine, where you do not have an Infiniband switch and are just connecting nodes directly, you need to start a software subnet manager. Another Intel guide showed me how to start it:

/etc/init.d/opensmd start

Afterwards, ibstat showed State: Active.
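The port can take a moment to move from Initializing to Active after the subnet manager starts. If you want to script the wait, a rough sketch follows. The helper name port_states is made up for this example, and it relies only on the State: lines that ibstat prints.

```shell
# Print the logical state of each port from ibstat-style output on stdin.
# Only lines that begin with "State:" are matched, so the separate
# "Physical state:" lines are ignored.
# port_states is a hypothetical helper name used only in this sketch.
port_states() {
    sed -n 's/^[[:space:]]*State:[[:space:]]*//p'
}

# On a real host, wait until at least one port reports Active:
#   until ibstat | port_states | grep -qx 'Active'; do sleep 1; done
```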

I tried a few tests, such as ib_send_bw to check the performance between two nodes, and found that my system was working as expected.

Also, to set up the subnet manager to start at boot, execute the following command:

update-rc.d opensmd defaults