To Infiniband and beyond (or not)
After going through a Fibre Channel disaster trying to get it to work as an iSCSI Target in Windows Server 2008 R2, Storage Server 2008 R2, OpenSuse, OpenSolaris, Nexenta and Solaris 11, I thought trying “Infiniband” would be worthwhile. Checking on eBay, there are several cards in the thousands of dollars, but a couple by Mellanox were in the more affordable range of $40-$50. In researching a bit more, I found that Dell, HP and others simply rebadge these cards and call them their own (as many other card manufacturers do, for instance LSI Logic MegaRaid == Dell PERC series). Installing the first card in my Windows 7 x64 machine went pretty smoothly. I installed the drivers from the Mellanox site and had 2 new adapters in my Network Adapter listing:
On my Solaris 11 Express SAN, however, the drivers installed but the system was not seeing the card. After a few hours of troubleshooting, I gave in and wiped my ZFS array and Solaris 11 installation in favor of Storage Server 2008 R2. Like Windows 7, Storage Server was up and running with the card in no time. After running the subnet manager, OpenSM, on my SAN to allow a direct connection between the 2 Infiniband cards, I was happy to see:
Having never even gotten this far with my Fibre Channel cards, I was pretty excited, but that was immediately followed by disappointment. I ran the popular IPerf network benchmark over my existing gigabit connection:
About what I expected: near 1000 Mbits/sec, minus the TCP/IP overhead. Now on to the Infiniband test:
Only around 2 Gbits/sec? Something wasn’t right. I set up a simple Windows Share on my SAN, just mounted the root drive and copied 7 GB of data over the Infiniband link. To my disappointment I was only getting 99-100 MB/sec, i.e. less than the max throughput of gigabit Ethernet. After some tweaking (namely setting the MTU to 65536), I was able to get 180-250 MB/sec on larger files. Still not satisfied with that performance (about twice gigabit speeds), I wanted to find the bottleneck. Was it simply my hard drives? Results of my RAID 0 array of 2 1TB Western Digital Black 7200rpm SATA II drives with 64 MB cache:
About what I expected, so copying files to and from the SAN would be limited by the speed of these 2 drives. What about accessing files straight off the network? Results of my RAID 0 array of 8 Seagate 7200.11 7200rpm SATA drives with 32 MB cache:
So when accessing files I shouldn’t be hindered like I am when writing between the 2 machines. To test this theory I attempted to play a 14 second 1920x1080 uncompressed video clip of an upcoming trailer I am working on right off the network share. This is a 2.2 GB video file at about 157 MB/sec (2.2 GB / 14 seconds), well outside what gigabit can handle, but well within 10 gigabit. To my astonishment, it stuttered almost worse than it had over gigabit a few weeks ago, which is what triggered my pursuit of something to alleviate my bandwidth issues in the first place.
Thinking maybe the file was simply overloading some sort of caching inside SMB, the Windows file sharing protocol, I then tried a 30 second 1280x720 uncompressed video clip of an upcoming promo I am working on. This only comes out to around 67 MB/sec (2 GB / 30 seconds). This played smoothly. So it was not a caching issue, but then what was the issue? Not happy with the lack of an answer, I started monitoring Task Manager during a file transfer. This was taken on my SAN (a dual-core 3.2 GHz Athlon II X2 260, btw):
Thinking maybe it was simply a fluke, I then monitored it on my 6-core 3.2 GHz Phenom II X6 1090T:
Eating up nearly 3 cores on my Windows 7 machine. Very weird. At this point I realized Infiniband cannot work for my needs. With Adobe After Effects and Premiere eating 100% of my CPUs just rendering, giving up half of my CPU to accessing data isn’t acceptable: I’d be doubling my rendering times in exchange for quicker previews when editing. My search continues for a solution to my network woes, off to eBay these cards go….
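For anyone who wants to sanity-check the bandwidth arithmetic in this post, here is a rough Python sketch. The function names are my own, it assumes decimal units (1 GB = 1000 MB, 8 bits per byte), and it ignores TCP/IP and SMB overhead entirely, so treat the link figures as best-case ceilings rather than what you would actually see on the wire.

```python
def link_capacity_mb_per_s(gbits_per_s):
    """Best-case link rate in MB/sec (decimal units, no protocol overhead)."""
    return gbits_per_s * 1000 / 8


def required_mb_per_s(file_size_gb, duration_s):
    """Average data rate needed to play a clip of this size in real time."""
    return file_size_gb * 1000 / duration_s


# Clip sizes and durations are the ones quoted in the post.
clips = {
    "1080p trailer (2.2 GB, 14 s)": required_mb_per_s(2.2, 14),
    "720p promo (2 GB, 30 s)": required_mb_per_s(2.0, 30),
}

# Link rates in Gbits/sec; the 2 Gbits/sec entry is the IPerf result above.
links = {
    "gigabit": 1,
    "Infiniband as measured by IPerf": 2,
    "10 gigabit": 10,
}

for clip, need in clips.items():
    for name, gbits in links.items():
        capacity = link_capacity_mb_per_s(gbits)
        verdict = "fits" if need <= capacity else "too slow"
        print(f"{clip}: needs {need:.0f} MB/sec, "
              f"{name} tops out at {capacity:.0f} MB/sec -> {verdict}")
```

By this arithmetic the 1080p clip needs roughly 157 MB/sec and the 720p clip roughly 67 MB/sec, against ceilings of 125, 250 and 1250 MB/sec. So on paper even the 2 Gbits/sec IPerf result had room for the 1080p clip, which fits with the stuttering coming from somewhere other than raw link speed.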