1. HP Proliant Microserver Gen10 as Router or NAS

    September 14, 2017

    Introduction

    In the summer of 2017, HP released the Proliant Microserver Gen10. This machine replaces the older Gen8 model.

    [Image: HP Proliant Microserver Gen10]

    For hobbyists, the Microserver has always been an interesting device for a custom home NAS build or as a router.

    Let's find out if this is still the case.

    Price

    In The Netherlands, the price of the entry-level model is similar to the Gen8: around €220 including taxes.

    CPU

    The new AMD X3216 processor has slightly better single-threaded performance than the older Celeron G1610T in the Gen8. Overall, both devices seem to have similar CPU performance.

    The biggest difference is the TDP: 35 Watt for the Celeron vs 15 Watt for the AMD CPU.

    Memory

    By default, it has 8 GB of unbuffered ECC memory, that's 4 GB more than the old model. Only one of the two memory slots is occupied, so you can double that amount just by adding another 8 GB stick. It seems that 32 GB is the maximum.

    Storage

    This machine has retained the four 3.5" drive slots. There are no drive brackets anymore. Before inserting a hard drive, you need to remove a bunch of screws from the front of the chassis and put four of them in the mounting holes of each drive. These screws then guide the drive through grooves into the drive slot. This caddy-less design works perfectly and the drive is mounted rock-solid in its position.

    To pop a drive out, you have to press the appropriate blue lever, which latches on to one of the front screws mounted on your drive and pulls it out of the slot.

    There are two on-board SATA controllers:

    00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 49)
    01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
    

    The Marvell controller is connected to the four drive bays. The AMD controller is probably connected to the fifth on-board SATA port.
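
    If you want to verify which disk hangs off which controller, the PCI path of each block device tells you. A minimal sketch, assuming device names like sda that will differ on your system:

    # Show the SCSI host:channel:target:lun address and model of each disk
    lsblk -d -o NAME,HCTL,MODEL,SIZE

    # Show the PCI device path behind a particular disk, e.g. sda
    readlink -f /sys/block/sda/device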

    As with the Gen8, you need a floppy-power-to-SATA-power adapter cable if you want to use a SATA drive with the fifth onboard SATA port.

    Thanks to the internal SATA header or the internal USB 2.0 header, you could decide to run the OS without redundancy and use all four drive bays for storage. As solid-state drives tend to be very reliable, you could use a small SSD to keep cost and power usage down and still retain decent reliability (although not the level of reliability RAID 1 provides).

    Networking

    Like the Gen8, the Gen10 has two Gigabit network interfaces. The brand and model: Broadcom Limited NetXtreme BCM5720.

    As tested with iperf3 I get full 1 Gbit network performance. No problems here (tested on CentOS 7).
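
    For those who want to reproduce such a test, a minimal iperf3 run looks like this (the IP address is just an example):

    # On the receiving machine:
    iperf3 -s

    # On the sending machine:
    iperf3 -c 192.168.1.10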

    PCIe slots

    This model has two half-height PCIe slots (1x and 8x in a 4x and 8x physical slot) which is an improvement over the single PCIe slot in the Gen8.

    USB

    The USB configuration is similar to the Gen8, with both USB2 and USB3 ports and one internal USB2 header on the motherboard.

    Sidenote: the onboard micro SD card slot as found in the Gen8 is not present in the Gen10.

    Graphics

    The Gen10 also has a built-in GPU, but I have not looked into it as I have no use for it.

    The Gen10 differs in output options as compared to the Gen8: it supports one VGA and two DisplayPort connections. Those DisplayPort connectors could make the Gen10 an interesting DIY HTPC build, but I have not looked into it.

    iLO

    The Gen10 has no support for iLO, so there is no remote management unless you have an external KVM-over-IP solution.

    This is a downside, but for home users, this is probably not a big deal. My old Microserver N40L didn't have iLO and it never bothered me.

    And most of all: iLO is a small on-board mini-computer that increases idle power consumption. So the lack of iLO support should mean better idle power consumption.

    Boot

    Both Legacy and UEFI boot are supported. I have not tried UEFI booting.

    Booting from the fifth internal SATA header is supported and works fine (unlike the Gen8).

    For those who care: booting is a lot quicker than on the Gen8, which took ages to get through its BIOS.

    Power Usage

    I have updated this segment as I have used some incorrect information in the original article.

    The Gen10 seems to consume 14 Watt at idle, booted into CentOS 7 without any disk drives attached (I removed all drives after booting). This 14 Watt figure is reported by my external power meter.

    Adding a single old 7200 RPM 1 TB drive pushes power usage up to 21 Watt (as expected).

    With four older 7200 RPM drives the entire system uses about 43 Watt according to the external power meter.

    As an experiment, I've put two old 60 GB 2.5" laptop drives in the first two slots, configured as RAID1. Then I added two 1 TB 7200 RPM drives to fill up the remaining slots. This resulted in a power usage of 32 Watt.

    Dimensions and exterior

    Exactly the same as the Gen8, they stack perfectly.

    The Gen8 had a front door protecting the drive bays, attached to the chassis with two hinges. HP has been cheap on the Gen10: there is no hinge, so when you open the door, it basically falls off. It's not a big issue; the overall build quality of the Gen10 is excellent.

    I have no objective measurements of noise levels, but the device seems almost silent to me.

    Evaluation and conclusion

    At first, I was a bit disappointed about the lack of iLO, but it turned out for the best. What makes the Gen10 so interesting is the idle power consumption. The lack of iLO support probably contributes to the improved idle power consumption.

    The Gen8 measures between 30 and 35 Watt idle power consumption, so the Gen10 does fare much better (~18 Watt).

    Firewall/Router

    At this level of power consumption, the Gen10 could be a formidable router/firewall solution. The only real downside is its size as compared to purpose-built firewalls/routers. The two network interfaces may provide sufficient connectivity, but if you need more ports and using VLANs is not enough, it's easy to add extra ports through the PCIe slots.
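
    As an illustration of the VLAN route, adding a VLAN interface on Linux with iproute2 is a one-liner per VLAN; the parent interface name (eno1), VLAN ID and addressing below are assumptions:

    # Create a VLAN subinterface with ID 10 on top of eno1
    ip link add link eno1 name eno1.10 type vlan id 10
    ip addr add 192.168.10.1/24 dev eno1.10
    ip link set eno1.10 up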

    If an ancient N40L with its piss-poor CPU can handle a 500 Mbit internet connection, this device will have no problems with it, I'd presume. Once I've taken this device into production as a replacement for my existing router/firewall, I will share my experience.

    Storage / NAS

    The Gen8 and Gen10 both have four SATA drive bays and a fifth internal SATA header. From this perspective, nothing has changed. The reduced idle power consumption could make the Gen10 an even more attractive option for a DIY home grown NAS.

    All things considered, I think the Gen10 is a great device and I have not really encountered any downsides. If you have no problem putting a bit of effort into a DIY solution, the Gen10 is a great platform for a NAS or router/firewall that can compete with most purpose-built devices.

    I may update this article as I gain more experience with this device.

    Tagged as : Storage Networking
  2. Using InfiniBand for Cheap and Fast Point-To-Point Networking

    March 25, 2017

    InfiniBand networking is quite awesome. It's mainly used for two reasons:

    1. low latency
    2. high bandwidth

    As a home user, I'm mainly interested in setting up a high bandwidth link between two servers.

    I was using quad-port network cards with Linux Bonding, but this solution has some downsides:

    1. you can only go to 4 Gbit with Linux bonding (or you need more ports)
    2. you need a lot of cabling
    3. it is similar in price to InfiniBand

    So I've decided to take a gamble on some InfiniBand gear. You only need InfiniBand PCIe network cards and a cable.

    1 x SFF-8470 CX4 cable                                              $16
    2 x MELLANOX DUAL-PORT INFINIBAND HOST CHANNEL ADAPTER MHGA28-XTC   $25
                                                                Total:  $66
    

    [Image: installed InfiniBand card and cable]

    I find $66 quite cheap for 20 Gbit networking. Regular 10 Gbit Ethernet networking is often still more expensive than using older InfiniBand cards.

    InfiniBand is similar to Ethernet in that you can run your own protocol over it (for lower latency), but you can also run IP over InfiniBand. The InfiniBand card will just show up as a regular network device (one per port).

    ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.0.2.3  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:29:8e01/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:7988691 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17853128 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:590717840 (563.3 MiB)  TX bytes:1074521257501 (1000.7 GiB)
    

    Configuration

    I've followed these instructions to get IP over InfiniBand working.

    Modules

    First, you need to make sure that at least the following modules are loaded:

    ib_mthca
    ib_ipoib
    

    I only had to add the ib_ipoib module to /etc/modules. As soon as this module is loaded, you will notice that you have some ibX interfaces available, which can be configured like regular Ethernet cards.
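
    On Debian, loading the module both immediately and at boot comes down to something like this (a sketch, not a full walkthrough):

    # Load the IPoIB module right away
    modprobe ib_ipoib

    # Make sure it is loaded on every boot
    echo ib_ipoib >> /etc/modules

    # The ibX interfaces should now be visible
    ip link show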

    Subnet manager

    In addition to loading the modules, you may also need a subnet manager, but this seems only relevant if you have an InfiniBand switch. Such switches either have a built-in subnet manager, or you can just install and use 'opensm'.
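
    For completeness, on Debian installing and starting opensm would look roughly like this (only needed if your switch lacks a built-in subnet manager):

    apt-get install opensm
    service opensm start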

    Link status

    If you want, you can check the link status of your InfiniBand connection like this:

    # ibstat
    CA 'mthca0'
        CA type: MT25208
        Number of ports: 2
        Firmware version: 5.3.0
        Hardware version: 20
        Node GUID: 0x0002c90200298e00
        System image GUID: 0x0002c90200298e03
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 20
            Base lid: 1
            LMC: 0
            SM lid: 2
            Capability mask: 0x02510a68
            Port GUID: 0x0002c90200298e01
            Link layer: InfiniBand
        Port 2:
            State: Down
            Physical state: Polling
            Rate: 10
            Base lid: 0
            LMC: 0
            SM lid: 0
            Capability mask: 0x02510a68
            Port GUID: 0x0002c90200298e02
            Link layer: InfiniBand
    

    Set mode and MTU

    Since my systems run Debian Linux, I've configured /etc/network/interfaces like this:

    auto ib0
    iface ib0 inet static
        address 10.0.2.2
        netmask 255.255.255.0
        mtu 65520
        pre-up echo connected > /sys/class/net/ib0/mode
    

    Please take note of the 'mode' setting. The 'datagram' mode gave abysmal network performance (less than Gigabit). The 'connected' mode made everything perform acceptably.

    The MTU setting of 65520 improved performance by another 30 percent.
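
    Both settings can also be applied at runtime, which is convenient for experimenting before committing them to /etc/network/interfaces:

    # Switch from datagram to connected mode
    echo connected > /sys/class/net/ib0/mode

    # Raise the MTU (connected mode allows up to 65520)
    ip link set ib0 mtu 65520

    # Verify
    cat /sys/class/net/ib0/mode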

    Performance

    I've tested the cards in two systems based on the Supermicro X9SCM-F motherboard. Using these systems, I was able to achieve file transfer speeds of up to 750 MB (megabytes) per second, or about 6.5 Gbit/s as measured with iperf.

    ~# iperf -c 10.0.2.2
    ------------------------------------------------------------
    Client connecting to 10.0.2.2, TCP port 5001
    TCP window size: 2.50 MByte (default)
    ------------------------------------------------------------
    [  3] local 10.0.2.3 port 40098 connected with 10.0.2.2 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  7.49 GBytes  6.43 Gbits/sec
    

    Similar test with netcat and dd:

    ~# dd if=/dev/zero bs=1M count=100000 | nc 10.0.2.2 1234
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 128.882 s, 814 MB/s
    

    Testing was done on Debian Jessie.
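
    For reference, the receiving end of the netcat test above was simply listening on the same port and discarding the data; something along these lines, depending on your netcat variant:

    # On the receiving host (traditional netcat syntax):
    nc -l -p 1234 > /dev/null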

    During earlier testing, I've also used these cards in HP Proliant Microserver Gen8 servers. On those servers, I was running Ubuntu 16.04 LTS.

    As tested on Ubuntu with the HP Microserver:

    ------------------------------------------------------------
    Client connecting to 10.0.4.3, TCP port 5001
    TCP window size: 4.00 MByte (default)
    ------------------------------------------------------------
    [  5] local 10.0.4.1 port 52572 connected with 10.0.4.3 port 5001
    [  4] local 10.0.4.1 port 5001 connected with 10.0.4.3 port 44124
    [ ID] Interval       Transfer     Bandwidth
    [  5]  0.0-60.0 sec  71.9 GBytes  10.3 Gbits/sec
    [  4]  0.0-60.0 sec  72.2 GBytes  10.3 Gbits/sec
    

    Using these systems, I was eventually able to achieve 15 Gbit as measured with iperf, although I have no 'console screenshot' of it.

    Closing words

    IP over InfiniBand seems to be a nice way to get high-performance networking on the cheap. The main downside is that when using IP over IB, CPU usage will be high.

    Another thing I have not researched, but which could be of interest, is running NFS or other protocols directly over InfiniBand using RDMA, which would bypass the overhead of IP.

  3. Tracking Down a Faulty Storage Array Controller With ZFS

    December 15, 2016

    One day, I lost two virtual machines on our DR environment after a storage vMotion.

    Further investigation uncovered that any storage vMotion of a virtual machine residing on our DR storage array would corrupt the virtual machine's disks.

    I could easily restore the affected virtual machines from backup, and once that was done, I continued my investigation.

    I needed a way to quickly verify whether a virtual machine's virtual hard drive was corrupted after a storage vMotion, in order to understand what the pattern was.

    First, I created a virtual machine based on Linux and installed ZFS. Then, I attached a second disk of about 50 gigabytes and formatted this drive with ZFS. Once I had filled the drive to about 40 gigabytes using 'dd', I was ready to test.
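
    The test setup was roughly the following; the pool name and device name are just examples and will differ per system:

    # Create a ZFS pool on the 50 GB test disk
    zpool create testpool /dev/sdb

    # Fill it with about 40 GB of data
    dd if=/dev/urandom of=/testpool/testdata.bin bs=1M count=40000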

    ZFS was chosen for testing purposes because it stores hashes of all blocks of data. This makes it very simple to quickly detect any data corruption. If the hash doesn't match the hash generated from the data, you just detected corruption.

    Other file systems don't store hashes and don't check for data corruption so they just trust the storage layer. It may take a while before you find out that data is corrupted.

    I performed a storage vMotion of this secondary disk towards different datastores and then ran a scrub ('zpool scrub') to track down any corruption. This worked better than expected: the scrub command would hang if the drive had been corrupted by the storage vMotion. The test virtual machine then required a reboot and a reformat of the secondary hard drive with ZFS, as the previous file system, including its data, had been corrupted.
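
    Checking the disk after each storage vMotion then boils down to something like this (using the example pool name from above):

    # Verify all checksums on the pool
    zpool scrub testpool

    # Watch for checksum errors (or, in this case, a hang)
    zpool status -v testpool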

    After performing storage vMotions of the drive in different directions, from various datastores to others, a pattern slowly emerged.

    1. Storage vMotion corruption happened independently of the VMware ESXi host used.

    2. A storage vMotion never caused any issues when the disk was residing on our production storage array.

    3. The corruption only happened when the virtual machine was stored on particular datastores on our DR storage array.

    Now it got really 'interesting'. The thing is that our DR storage array has two separate storage controllers running in active-active mode. However, each LUN is always owned by a particular controller. Although the other controller can take over from the controller that 'owns' the LUNs in case of a failure, the owner processes the I/O when everything is fine. Particular LUNs are thus handled by a particular controller.

    So first I made a table listing each controller and the LUNs it had ownership over, like this:

            Controller a    Controller b
            ------------    ------------
            LUN001          LUN002
            LUN003          LUN004
            LUN005          LUN006
    

    Then I started to perform storage vMotions of the ZFS disk from one LUN to another. After performing several tests, the pattern became quite obvious.

                LUN001  ->  LUN002  =   BAD
                LUN001  ->  LUN004  =   BAD
                LUN004  ->  LUN003  =   BAD
                LUN003  ->  LUN005  =   GOOD
                LUN005  ->  LUN001  =   GOOD
    

    I continued to test some additional permutations but it became clear that only LUNs owned by controller b caused problems.

    With the evidence in hand, I managed to convince our vendor support to replace storage controller b and that indeed resolved the problem. Data corruption due to a Storage vMotion never occurred after the controller was replaced.

    There is no need to name/shame the vendor in this regard. The thing is that all equipment can fail and what can happen will happen. What really counts is: are you prepared?

    Tagged as : ZFS
