So we got some new IB cards, and we needed to set them up on our servers. Our servers are Ubuntu 14.04 for this post, but I believe 16.04 should be similar.
Install the cards physically.
To check whether the system detected your cards, enter the following:
lspci -v | grep Mellanox
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
You should get something like the above.
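If you have several machines to check, the same test is easy to script. The following is only a minimal sketch: `count_mellanox` is a hypothetical helper name, and it simply counts the Mellanox lines in whatever `lspci` output is piped into it.

```shell
# Hypothetical helper: count Mellanox devices in lspci output read from stdin.
count_mellanox() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'Mellanox' || true
}

# Usage on a real machine (commented out so the snippet has no side effects):
# lspci | count_mellanox
```

A result of 0 means the card was not detected, so reseat it before going any further.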
Install Infiniband Driver
Refer to the release notes for version v4_2-1_2_0_0, which list the packages required before installation. I found out afterwards that the installer seems to check these dependencies and install them itself, but why not prepare your system beforehand?
$ apt-get install perl dpkg autotools-dev autoconf libtool automake1.10 automake m4 dkms debhelper tcl tcl8.4 chrpath swig graphviz tcl-dev tcl8.4-dev tk-dev tk8.4-dev bison flex dpatch zlib1g-dev curl libcurl4-gnutls-dev python-libxml2 libvirt-bin libvirt0 libnl-3-dev libglib2.0-dev libgfortran3 pkg-config libnuma-dev logrotate ethtool lsof
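To see which prerequisites are still missing without installing anything, you can ask dpkg directly. This is a sketch under my own assumptions: `missing_packages` is a hypothetical helper, and it treats a package as installed only when `dpkg -s` reports the status line `install ok installed`.

```shell
# Hypothetical helper: print each package (given as arguments) that dpkg
# does not report as fully installed.
missing_packages() {
    for pkg in "$@"; do
        # dpkg -s prints a "Status:" line for known packages; anything other
        # than "install ok installed" (or no output at all) counts as missing.
        if ! dpkg -s "$pkg" 2>/dev/null | grep -q '^Status: install ok installed'; then
            echo "$pkg"
        fi
    done
}

# Usage (commented out): check a few of the prerequisites listed above.
# missing_packages perl dkms libnl-3-dev libnuma-dev ethtool lsof
```

An empty result means everything you asked about is already in place.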
libnuma package and the
libnl-dev package, the corresponding package names are
Afterwards, check out the ConnectX-3 Pro VPI Single and Dual QSFP+ Port Adapter Card User Manual for more help with installing.
Now, go ahead and install the Mellanox OFED. Download the installer from the Mellanox website under Products->Software->Infiniband VPI drivers. Go for Mellanox OFED Linux and at the bottom click the
Download button. If nothing shows up and you are using Chrome, make sure to enable
Download the tgz file (or iso if you prefer iso) for your distribution. Untar the file.
Install the Mellanox OFED by executing the following script:
./mlnxofedinstall [OPTIONS if applicable. I didn't need any]
Afterwards, I rebooted the system.
Assigning IP addresses to each IB interface
Infiniband supports IPoIB (IP over Infiniband), which lets the IB interfaces be reached through ordinary IP addresses. For this part I referred to the following post. Just to make sure IPoIB is loaded, run the following command:
lsmod | grep ipoib
There should be an ib_ipoib module loaded.
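A plain `grep ipoib` also matches other modules that merely list ib_ipoib in their "Used by" column, so a stricter check can be handy in scripts. The sketch below uses a hypothetical helper name, `ipoib_loaded`, and only looks at the first column of the `lsmod` output.

```shell
# Hypothetical helper: succeed if the lsmod output on stdin lists ib_ipoib.
ipoib_loaded() {
    # The first column of lsmod output is the module name; match it exactly.
    awk '$1 == "ib_ipoib" { found = 1 } END { exit !found }'
}

# Usage (commented out): load the module by hand if it is missing.
# lsmod | ipoib_loaded || sudo modprobe ib_ipoib
```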
Now check your IB interface names with the ifconfig -a command. Then set your IB IP addresses in
iface ib0 inet static
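A static stanza needs at least an address and a netmask after the `iface` line. The sketch below prints a complete one. Everything specific in it is an assumption: `ipoib_stanza` is a hypothetical helper, the address and netmask in the usage line are placeholders, and `/etc/network/interfaces` is assumed because it is the standard ifupdown file on Ubuntu 14.04.

```shell
# Hypothetical helper: print an ifupdown stanza for a static IPoIB interface.
# Arguments: interface name, IPv4 address, netmask.
ipoib_stanza() {
    printf 'auto %s\n' "$1"
    printf 'iface %s inet static\n' "$1"
    printf '    address %s\n' "$2"
    printf '    netmask %s\n' "$3"
}

# Usage (commented out): review the output first, then append it to the
# config file. The address and netmask below are placeholders.
# ipoib_stanza ib0 192.168.10.1 255.255.255.0 | sudo tee -a /etc/network/interfaces
```

Give each node a different address in the same subnet. With ifupdown, `sudo ifup ib0` is the usual way to activate the interface afterwards.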
And bring your network device (ib) up via
Setting up the Subnet Manager (if you're not using an IB switch)
Now if you check the status of your IB cards with ibstat, you may find that your card states are State: Initializing. Intel Developer Zone has a guide, Troubleshooting InfiniBand connection issues using OFED tools. Under the state part, I found that the INIT state corresponds to a situation where the hardware is initialized but the subnet manager is unavailable.
If you are in a situation like mine, where you do not have an Infiniband switch and are just connecting nodes directly, you need to start up a software subnet manager. Another Intel guide showed me how to start up the subnet manager.
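The check and the fix can be chained together. This is a sketch, not the exact commands from the Intel guide: `needs_subnet_manager` is a hypothetical helper that inspects `ibstat` output, and the usage line assumes the `opensmd` init script that the OFED installation provides.

```shell
# Hypothetical helper: succeed if any port in the ibstat output on stdin
# reports "State: Initializing" (hardware up, but no subnet manager found).
needs_subnet_manager() {
    # "Physical state:" lines have "Physical" as their first field, so only
    # the logical port state is matched here.
    awk '$1 == "State:" && $2 == "Initializing" { found = 1 } END { exit !found }'
}

# Usage (commented out; assumes the opensmd init script is installed):
# ibstat | needs_subnet_manager && sudo service opensmd start
```

Only one node on a directly connected link needs to run the subnet manager.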
ibstat showed that my
I tried a few tests, including ib_send_bw to check the performance between two nodes, and found that my system was working as expected.
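For anyone who has not used the perftest tools before, `ib_send_bw` runs as a server/client pair: start it with no host argument on one node, then point the second node at the first. The wrapper below is a hypothetical convenience (`bw_test` is my own name for it), and the server address is whatever hostname or IP address reaches the first node.

```shell
# Hypothetical wrapper around ib_send_bw (from the perftest package).
# Run "bw_test server" on one node first, then "bw_test client <server>"
# on the other, where <server> is a hostname or IP address of the first node.
bw_test() {
    case "$1" in
        server)
            # With no host argument, ib_send_bw waits for a client to connect.
            ib_send_bw
            ;;
        client)
            # With a host argument, ib_send_bw connects to that server.
            ib_send_bw "$2"
            ;;
        *)
            echo "usage: bw_test server | bw_test client <server>" >&2
            return 2
            ;;
    esac
}
```

The client prints the bandwidth figures once the run completes.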
Also, to set up the subnet manager to start at boot, execute the following command:
update-rc.d opensmd defaults