1. How Traffic Shaping Can Dramatically Improve Internet Responsiveness

    At work, access to the internet is provided by a 10 Mbit down / 1 Mbit up ADSL connection. As we are a mid-size company, bandwidth is clearly a severe constraint. But bandwidth was not our biggest problem: even simple web browsing was very slow.

    As I was setting up a monitoring environment based on Nagios and pnp4nagios, I started to graph the latency of our internet connection, just to prove that we had a problem.

    Boy did we get that proof:

    bad latency

    Just look at the y-axis, whose scale is in milliseconds. For most of the day, the average latency is 175 ms, with some high spikes. Just browsing the web was a pain during these periods of high latency, which was clearly almost all of the time.

    I became so fed up with our slow internet access that I decided to take matters into my own hands and resolve the latency issue. The solution? Traffic shaping.

    I learned that when an ADSL connection is saturated, especially its upload capacity, you will experience high latency and packet loss. So the trick is to never saturate the connection.

    I grabbed a Linux box with two network interfaces and placed it between our internet router and our firewall in bridge mode.
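
    Setting up such a transparent bridge is straightforward. This is a minimal sketch, assuming the two NICs are eth0 and eth1 and the bridge-utils package is available (your interface names may differ):

    apt-get install bridge-utils
    brctl addbr br0
    brctl addif br0 eth0
    brctl addif br0 eth1
    ip link set eth0 up
    ip link set eth1 up
    ip link set br0 up

    Because the box acts as a bridge, it does not strictly need an IP address on these interfaces for shaping to work, although you will probably want one (on br0) for management.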

    For the actual traffic shaping I used wondershaper, which is packaged in Debian and Ubuntu.

    apt-get install wondershaper
    

    The wondershaper script is extremely simple: it's specifically built to resolve the problem we faced with our ADSL connection. It not only prioritises traffic, it also allows you to limit bandwidth usage and thus prevents you from saturating the connection.

    The simple example below limits bandwidth to just below full line capacity, which dramatically improved latency.

    Syntax (the rx and tx rates are specified in kilobits per second):

    wondershaper <interface> <rx> <tx>
    

    Example:

    wondershaper eth1 9500 700
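
    To make the shaping persistent across reboots, one option is to call wondershaper from a post-up hook in /etc/network/interfaces. A minimal sketch, assuming eth1 is the interface facing the ADSL router and that the wondershaper script is in the default PATH:

    auto eth1
    iface eth1 inet manual
        post-up wondershaper eth1 9500 700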
    

    As you can see, latency improved dramatically:

    good latency

    Again, look at the y-axis. We went from an average latency of 175 ms to an average of 35 ms. That's quite an improvement.

    Can you spot on which day I implemented traffic shaping?

    week latency

    At the time of writing, the company is working on getting fiber internet access, which should resolve our internet woes, but it will take quite some time before that is installed, so this is a nice interim solution.

    Tagged as : traffic shaping
  2. Monitoring HP MSA P2000 G3 I/O Latency With Nagios

    At work we have a couple of HP MSA P2000 G3 SANs. These are entry-level SANs that still seem to have almost all features you might want from a SAN, except for official SSD-support.

    It seems that the new MSA 2040 adds support for SSDs and also provides 4 GB of cache per controller instead of the somewhat meager 2 GB of the P2000.

    Anyway, a very nice feature of the MSA P2000 G3 is the fact that the management interface also provides a well-documented API that allows you to collect detailed stats on subjects like:

    • Overall enclosure status
    • Reports on failed drives
    • Controller CPU usage
    • IOPs per controller
    • IOPs per vdisk
    • IOPs per disk

    Thomas Weaver has written a Nagios plugin that does just that: it collects this information, which you can then graph with pnp4nagios.

    In more recent firmware updates HP has added support to monitor read and write I/O latency per vdisk. Latency is an important indicator for application-level performance so I was quite happy with that.

    As the plugin by Thomas did not support reading these parameters yet, I spent some time implementing this check and submitted the new version of check_p2000_api.php back to Thomas.
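
    For reference, wiring a plugin like this into Nagios follows the usual command plus service definition pattern. This is only a sketch: the command name, host name and plugin arguments below are hypothetical placeholders, so consult the plugin's documentation for the options it actually accepts:

    # commands.cfg -- everything after the plugin name is a placeholder
    define command {
        command_name  check_p2000_vdisk_latency
        command_line  $USER1$/check_p2000_api.php $ARG1$
    }

    # services.cfg
    define service {
        use                  generic-service
        host_name            msa-p2000
        service_description  vdisk01 write latency
        check_command        check_p2000_vdisk_latency!<plugin options for vdisk01>
    }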

    Read Latency of a RAID 6 array

    readlatency

    Write Latency of a RAID 6 array

    writelatency

    You will notice that the write latency of this disk array is very high at times, which seems to indicate that this vdisk is being taxed with too many I/O requests.

    I'd like to thank Thomas Weaver for writing this plugin; I think it's very useful.

    Tagged as : Storage Nagios
  3. Creating a Basic ZFS File System on Linux

    Here are some notes on creating a basic ZFS file system on Linux, using ZFS on Linux.

    I'm documenting the scenario where I just want to create a file system that can tolerate at least a single drive failure and can be shared over NFS.

    Identify the drives you want to use for the ZFS pool

    The ZFS on Linux project advises not to use plain /dev/sdx (/dev/sda, etc.) devices but to use /dev/disk/by-id/ or /dev/disk/by-path device names.

    Device names for storage devices are not fixed, so /dev/sdx names may not always point to the same physical disk. I was bitten by this when first experimenting with ZFS: I did not follow this advice, removed a drive from the system, and after a reboot could no longer access my zpool.

    So you should pick the appropriate devices from the /dev/disk/by-[id|path] directories. However, it's often difficult to determine which entry in those directories corresponds to which physical disk drive.
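
    You can inspect how these persistent names map to the kernel device names by simply listing the symlinks, but on a system with many drives this quickly becomes hard to read:

    ls -l /dev/disk/by-path/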

    So I wrote a simple tool called showdisks which helps you identify which identifiers you need to use to create your ZFS pool.

    diskbypath

    You can install showdisks yourself by cloning the project:

    git clone https://github.com/louwrentius/showtools.git
    

    And then just use showdisks like this (-s shows the drive size, -p shows the by-path name):

    ./showdisks -sp
    

    For this example, I'd like to use all the 500 GB disk drives for a six-drive RAIDZ1 vdev. Based on the information from showdisks, this is the command to create a pool containing this vdev:

    zpool create tank raidz1 pci-0000:03:00.0-scsi-0:0:21:0 pci-0000:03:00.0-scsi-0:0:19:0 pci-0000:02:00.0-scsi-0:0:9:0 pci-0000:02:00.0-scsi-0:0:11:0 pci-0000:03:00.0-scsi-0:0:22:0 pci-0000:03:00.0-scsi-0:0:18:0
    

    The name 'tank' can be anything you want; it's just the name of the pool.

    Please note that with newer, bigger disk drives (which use 4K sectors), you should test whether the ashift=12 option gives you better performance.

    zpool create -o ashift=12 tank raidz1 <devices>
    

    I used this option on 2 TB disk drives and read performance improved twofold.
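
    If you want to verify which ashift value an existing pool uses, zdb can show it. A sketch, assuming the pool is listed in the default cache file (/etc/zfs/zpool.cache):

    zdb | grep ashift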

    How to set up a RAID10-style pool

    This is how to create the ZFS equivalent of a RAID10 setup:

    zpool create tank mirror <device 1> <device 2> mirror <device 3> <device 4> mirror <device 5> <device 6>
    

    How many drives should I use in a vdev

    I've learned to use a 'power of two' number of data drives (2, 4, 8, 16) for a vdev, plus the appropriate number of drives for parity: RAIDZ1 = 1 disk, RAIDZ2 = 2 disks, etc.

    So the optimal number of drives for RAIDZ1 would be 3,5,9,17. RAIDZ2 would be 4,6,10,18 and so on. Clearly in the example above with six drives in a RAIDZ1 configuration, I'm violating this rule of thumb.

    How to disable the ZIL or disable sync writes

    You can expect poor throughput performance if you use the ZIL / honour synchronous writes. For safety reasons, ZFS honours sync writes by default; it's an important feature of ZFS that guarantees data integrity. For storage of virtual machines or databases, you should not turn off the ZIL, but instead use an SSD as a SLOG (separate log device) to get performance to acceptable levels.
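
    Adding an SSD as a SLOG works just like adding a cache device (shown further down). A sketch, assuming you have already looked up the SSD's /dev/disk/by-path or by-id name:

    zpool add tank log <ssd-device>

    For production use, mirrored log devices (zpool add tank log mirror <ssd 1> <ssd 2>) are commonly recommended.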

    For a simple (home) NAS box, the ZIL is not so important and sync writes can quite safely be disabled, as long as you have your server on a UPS and have it shut down cleanly before the UPS battery runs out.

    This is how you turn off the ZIL / support for synchronous writes:

    zfs set sync=disabled <pool name>
    

    Disabling sync writes is especially important if you use NFS, which issues sync writes by default.

    Example:

    zfs set sync=disabled tank
    

    How to add an L2ARC cache device

    Use showdisks to look up the actual /dev/disk/by-path identifier and add it like this:

    zpool add tank cache <device>
    

    Example:

    zpool add tank cache pci-0000:00:1f.2-scsi-2:0:0:0
    

    This is the result (on another zpool called 'server'):

    root@server:~# zpool status
      pool: server
     state: ONLINE
      scan: none requested
    config:
    
        NAME                               STATE     READ WRITE CKSUM
        server                             ONLINE       0     0     0
          raidz1-0                         ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:0:0  ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:1:0  ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:2:0  ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:3:0  ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:4:0  ONLINE       0     0     0
            pci-0000:03:04.0-scsi-0:0:5:0  ONLINE       0     0     0
        cache
          pci-0000:00:1f.2-scsi-2:0:0:0    ONLINE       0     0     0
    

    How to monitor performance / I/O statistics

    One time sample:

    zpool iostat
    

    A sample every 2 seconds:

    zpool iostat 2
    

    More detailed information every 5 seconds:

    zpool iostat -v 5
    

    Example output:

                                          capacity     operations    bandwidth
    pool                               alloc   free   read  write   read  write
    ---------------------------------  -----  -----  -----  -----  -----  -----
    server                             3.54T  7.33T      4    577   470K  68.1M
      raidz1                           3.54T  7.33T      4    577   470K  68.1M
        pci-0000:03:04.0-scsi-0:0:0:0      -      -      1    143  92.7K  14.2M
        pci-0000:03:04.0-scsi-0:0:1:0      -      -      1    142  91.1K  14.2M
        pci-0000:03:04.0-scsi-0:0:2:0      -      -      1    143  92.8K  14.2M
        pci-0000:03:04.0-scsi-0:0:3:0      -      -      1    142  91.0K  14.2M
        pci-0000:03:04.0-scsi-0:0:4:0      -      -      1    143  92.5K  14.2M
        pci-0000:03:04.0-scsi-0:0:5:0      -      -      1    142  90.8K  14.2M
    cache                                  -      -      -      -      -      -
      pci-0000:00:1f.2-scsi-2:0:0:0    55.9G     8M      0     70    349  8.69M
    ---------------------------------  -----  -----  -----  -----  -----  -----
    

    How to start / stop a scrub

    Start:

    zpool scrub <pool>
    

    Stop:

    zpool scrub -s <pool>
    

    Mount ZFS file systems on boot

    Edit /etc/default/zfs and set this parameter:

    ZFS_MOUNT='yes'
    

    How to enable sharing a file system over NFS:

    zfs set sharenfs=on <poolname>
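
    Note that this requires a running NFS server (the nfs-kernel-server package on Debian/Ubuntu) and that the file system is exported at its mountpoint. The sharenfs property can also take regular NFS export options; the subnet below is just an example:

    zfs set sharenfs="rw=@192.168.1.0/24" tank
    # on a client:
    mount -t nfs <server>:/tank /mnt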
    

    Links to important ZFS information sources:

    Tons of information on using ZFS on Linux by Aaron Toponce:

    https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/

    Understanding the ZIL (ZFS Intent Log)

    http://nex7.blogspot.nl/2013/04/zfs-intent-log.html

    Information about 4K sector alignment problems

    http://www.opendevs.org/ritk/zfs-4k-aligned-space-overhead.html

    Important read about using the proper number of drives in a vdev

    http://forums.freenas.org/threads/getting-the-most-out-of-zfs-pools.16/

    Tagged as : ZFS
  4. Why You Should Not Use IPsec for VPN Connectivity

    IPsec is a well-known and widely-used VPN solution. It seems that it's not widely known that Niels Ferguson and Bruce Schneier performed a detailed security analysis of IPsec and that the results were not very positive.

    We strongly discourage the use of IPsec in its current form for protection of any kind of valuable information, and hope that future iterations of the design will be improved.
    

    I conveniently left out the second part:

    However, we even more strongly discourage any current alternatives, and recommend IPsec when the alternative is an insecure network. Such are the realities of the world.
    

    To put this in context: keep in mind that this paper was released in 2003 and the actual research may even be older (1999!). OpenVPN, an open-source SSL-based VPN solution, was born in 2001 and was still maturing in 2003. So there really was no alternative back then.

    It worries me that this research by Ferguson and Schneier is more than a decade old. I've been looking for more recent articles on the current security status of IPsec, but I couldn't find much. There have been some new RFCs published about IPsec, but I'm not familiar enough with the material to understand the implications. Ferguson and Schneier make a lot of recommendations in their paper to improve IPsec security, but have they actually been implemented?

    I did find a presentation from 2013 by Peter Gutmann (University of Auckland). Based on his Wikipedia page, he seems to 'have some knowledge' about cryptography. The presentation addresses the Snowden leaks about the NSA and also touches on IPsec. On that subject, he basically relies on the paper written by Ferguson and Schneier.

    But let's think about this: Ferguson and Schneier criticise the design of IPsec. It is flawed by design. That's one of the worst criticisms anything related to cryptography can get. From what I understand, that design has not changed much since. So if their critique of IPsec is still mostly valid, that's all the more reason not to use it.

    So this is part of the conclusion and it doesn't beat around the bush:

    We have found serious security weaknesses in all major components of IPsec. As always in security, there is no prize for getting 90% right; you have to get everything right. IPsec falls well short of that target, and will require some major changes before it can possibly provide a good level of security.
    What worries us more than the weaknesses we have identified is the complexity of the system. In our opinion, current evaluation methods cannot handle systems of such a high complexity, and current implementation methods are not capable of creating a secure implementation of a system as complex as this.
    

    So if not IPsec, what should you use? I would opt to use an SSL/TLS-based VPN solution like OpenVPN.

    I can't vouch for the security of OpenVPN, but the well-known Dutch security firm Fox-IT has released a stripped-down version of the OpenVPN software (with features removed) that they consider fit for (Dutch) governmental use. Not to say that you should use that particular OpenVPN version: the point is that OpenVPN is deemed secure enough for governmental use. For whatever that's worth.

    SSL-based VPN solutions have the benefit that they use SSL/TLS, which may have its own problems, but is at least not as complex as IPsec.

    Tagged as : IPsec security
  5. Achieving 450 MB/s Network File Transfers Using Linux Bonding

    Linux Bonding

    In this article I'd like to show the results of using regular 1 Gigabit network connections to achieve 450 MB/s file transfers over NFS.

    I'm again using Linux interface bonding for this purpose.

    Linux interface bonding can be used to create a virtual network interface with the aggregate bandwidth of all the interfaces added to the bond. Two gigabit network interfaces will give you - guess what - two gigabit or ~220 MB/s of bandwidth.

    This bandwidth can be used by a single TCP-connection.

    So how is this achieved? The Linux bonding kernel module has a special bonding mode: mode 0, or round-robin bonding. In this mode, the kernel stripes packets across the interfaces in the bond, like RAID 0 with hard drives. As with RAID 0, you get additional performance with each device you add.
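
    A minimal sketch of selecting this mode, assuming the bonding driver is loaded as a module and configured through module options (it can also be configured through sysfs or your distribution's network scripts):

    # /etc/modprobe.d/bonding.conf
    options bonding mode=0 miimon=100

    With the default settings, loading the module (modprobe bonding) then creates a bond0 interface in round-robin mode.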

    So I've added HP NC364T quad-port network cards to my servers. Thus each server has a theoretical bandwidth of 4 Gigabit. These HP network cards cost just 145 Euro and I even found the card for 105 Dollar on Amazon.

    hp quad port nic

    With two servers, you just need four UTP cables to connect the two network cards directly and you're done. This would cost you ~300 Euro or ~200 Dollar in total.

    If you want to connect additional servers, you need a managed gigabit switch with VLAN-support and sufficient ports. Each additional server will use 4 ports on the switch, excluding interfaces for remote access and other purposes.

    Managed gigabit switches are quite inexpensive these days. I bought a 24-port switch, the HP 1810-24G v2 (J9803A), for about 180 Euro (209 Dollars on Newegg), and it can even be rack-mounted.

    switch

    Using VLANs

    So you can't just use a gigabit switch and connect all these quad-port network cards to a single VLAN. I tested this scenario first and only got a maximum transfer speed of 270 MB/s while copying a file between servers over NFS.

    The trick is to create a separate VLAN for every network port. So if you use a quad-port network card, you need four VLANs. Also, you must make sure that corresponding ports on every network card are in the same VLAN: for example, port 1 on every card goes in VLAN 21, port 2 in VLAN 22, and so on. You must also add the appropriate switch ports to the correct VLANs. Finally, you must add the network interfaces to the bond in the right order.

    So why do you need to use VLANs? The reason is quite simple. Bonding works by using the same hardware or MAC address on all interfaces. So the switch sees the same hardware address on four ports and gets confused: to which port should a packet be sent? If you put each port in its own VLAN, the shared MAC address is seen only once in each VLAN, so the switch won't be confused. What you are in fact doing by creating VLANs is creating four separate logical switches. So if you have - for example - four cheap 8-port unmanaged gigabit switches, this would work too.

    So assuming that you have four ethernet interfaces, this is an example of how you can create the bond:

    ifenslave bond0 eth1 eth2 eth3 eth4
    

    Next, you just assign an IP-address to the bond0 interface, like you would with a regular eth(x) interface.

    ifconfig bond0 192.168.2.10 netmask 255.255.255.0
    

    Up to this point, I only achieved about 350 MB/s. I needed to enable jumbo frames on all interfaces and on the switch to achieve 450 MB/s.

    ifconfig bond0 mtu 9000 up
    

    Next, you can just mount any NFS share over the interface and start copying files. That's all.
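
    For completeness, the whole bond can also be configured persistently in /etc/network/interfaces on Debian/Ubuntu. A sketch, assuming the ifenslave package is installed; exact bond-* option names vary slightly between versions, and some versions also want per-slave stanzas with bond-master bond0:

    auto bond0
    iface bond0 inet static
        address 192.168.2.10
        netmask 255.255.255.0
        mtu 9000
        bond-mode balance-rr
        bond-miimon 100
        bond-slaves eth1 eth2 eth3 eth4

    You can check which slaves are active, and in which order, with cat /proc/net/bonding/bond0.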

    Once I added each interface to the appropriate VLAN, I got about 450 MB/s for a single file transfer over NFS.

    450 MB/s

    Note that I did not perform the test with 'cp' but with 'dd', because I don't (yet) have a disk array fast enough to write at 450 MB/s.

    So for three servers, this solution will cost me 600 Euro or 520 Dollar.

    10 Gigabit ethernet?

    Frankly, it's quite expensive if you need to connect more than two servers. An entry-level 10GbE NIC like the Intel X540-T1 costs about 300 Euro or 350 Dollar. This card allows you to use Cat 6a UTP cabling. (Pricing from Dutch webshops in Euros and Newegg in Dollars.)

    With two of those, you can have 10Gbit ethernet between two servers for 600 Euro or 700 Dollar. If you need to connect more servers, you would need a switch. The problem is that 10Gbit switches are not cheap. An 8-port unmanaged switch from Netgear (ProSAFE Plus XS708E) does about 720 Euro's or 900 Dollar.

    If you want to connect three servers, you need three network cards and a switch. The network cards will cost you 900 Euro (1050 Dollar) and the switch 720 Euro (900 Dollar), totalling about 1620 Euro or 1950 Dollar.

    You will get higher transfer speeds, but at a significantly higher price.

    For many business purposes this higher price can be easily justified, and I would select 10 Gb over 1 Gb bonding in a heartbeat. Fewer cables, higher performance, lower latency.

    However, bonding gigabit interfaces allows you to use off-the-shelf equipment and may be a nice compromise between cost, usability and performance.
