1. ZFS: Performance and Capacity Impact of Ashift=9 on 4K Sector Drives

    July 31, 2014

    Update 2014-8-23: I was testing with ashift for my new NAS. The ashift=9 write performance deteriorated from 1.1 GB/s to 830 MB/s with just 16 TB of data on the pool. Also I noticed that resilvering was very slow. This is why I decided to abandon my 24 drive RAIDZ3 configuration.

    I'm aware that drives are faster at the outside of the platter and slower on the inside, but the performance deteriorated so dramatically that I did not wanted to continue further.

    My final setup will be a RAIDZ2 18 drive VDEV + RAIDZ2 6 drive VDEV which will give me 'only' 71 TiB of storage, but read performance is 2.6 GB/s and write performance is excellent at 1.9 GB/s. I've written about 40+ TiB to the array and after those 40 TiB, write performance was about 1.7 GB/s, so still very good and what I would expect as drives fill up.

    So actually, based on these results, I have learned not to deviate from the ZFS best practices too much. Use ashift=12 and put drives in VDEVS that adhere to the 2^n+parity rule.

    The uneven VDEVs (18 disk vs. 6 disks) are not according to best practice but ZFS is smart: it distributes data across the VDEVs based on their size. So they fill up equally.

    Choosing between ashift=9 and ashift=12 for 4K sector drives is not always a clear cut case. You have to choose between raw performance or storage capacity.

    My testplatform is Debian Wheezy with ZFS on Linux. I'm using a system with 24 x 4 TB drives in a RAIDZ3. The drives have a native sector size of 4K, and the array is formatted with ashift=12.

    First we create the array like this:

    zpool create storage -o ashift=12 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]

    Note: NEVER use /dev/sd? drive names for an array, this is just for testing, always use /dev/disk/by-id/ names.

    Then we run a simple sequential transfer benchmark with dd:

    root@nano:/storage# dd if=/dev/zero of=ashift12.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 66.4922 s, 1.6 GB/s
    root@nano:/storage# dd if=ashift12.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.0371 s, 2.5 GB/s

    This is quite impressive. With these speeds, you can saturate 10Gbe ethernet. But how much storage space do we get?

    df -h:

    Filesystem                            Size  Used Avail Use% Mounted on
    storage                                69T  512K   69T   1% /storage

    zfs list:

    storage  1.66M  68.4T   435K  /storage

    Only 68.4 TiB of storage? That's not good. There should be 24 drives minus 3 for parity is 21 x 3.6 TiB = 75 TiB of storage.

    So the performance is great, but somehow, we lost about 6 TiB of storage, more than a whole drive.

    So what happens if you create the same array with ashift=9?

    zpool create storage -o ashift=9 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]

    These are the benchmarks:

    root@nano:/storage# dd if=/dev/zero of=ashift9.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 97.4231 s, 1.1 GB/s
    root@nano:/storage# dd if=ashift9.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.3805 s, 2.5 GB/s

    So we lose about a third of our write performance, but the read performance is not affected, probably by read-ahead caching but I'm not sure.

    With ashift=9, we do lose some write performance, but we can still saturate 10Gbe.

    Now look what happens to the available storage capacity:

    df -h:

    Filesystem                         Size  Used Avail Use% Mounted on
    storage                             74T   98G   74T   1% /storage

    zfs list:

    storage   271K  73.9T  89.8K  /storage

    Now we have a capacity of 74 TiB, so we just gained 5 TiB with ashift=9 over ashift=12, at the cost of some write performance.

    So if you really care about sequential write performance, ashift=12 is the better option. If storage capacity is more important, ashift=9 seems to be the best solution for 4K drives.

    The performance of ashift=9 on 4K drives is always described as 'horrible' but I think it's best to run your own benchmarks and decide for yourself.

    Caveat: I'm quite sure about the benchmark performance. I'm not 100% sure how reliable the reported free space is according to df -h or zfs list.

    Edit: I have added a bit of my own opinion on the results.

    Tagged as : ZFS Linux
  2. Linux: Script That Creates Table of Network Interface Properties

    August 15, 2013

    My server has 5 network interfaces and I wanted a quick overview of some properties. There may be an existing linux command for this but I couldn't find it so I quickly wrote my own script (download).

    This is the output:


    The only requirement for this script is that you have 'ethtool' installed.

    Update 2013-08-17

    I recreated the script in python (download) so I can just dynamically format the table and not use ugly hacks I used in the bash script.

    Tagged as : Linux Networking
  3. Why Filtering DHCP Traffic Is Not Always Possible With Iptables

    December 27, 2010

    When configuring my new firewall using iptables, I noticed something very peculiar. Even if all input, forward and output traffic was dropped, DHCP traffic to and from my DHCP server was not blocked even if there were no rules permitting this traffic.

    I even flushed all rules, put a drop all rule on all chains and only allowed SSH to the box. It did not matter. The DHCP server received the DHCP requests and happily answered back.

    How on earth is this possible? In my opinion, a firewall should block all traffic no matter what.

    But at least I found out the cause of this peculiar behaviour. The ISC DHCP daemon does not use the TCP/UDP/IP stack of the kernel. It uses RAW sockets. Raw sockets bypass the whole netfilter mechanism and thus the firewall.

    So remember: applications using RAW sockets cannot be fire walled by default. Applications need root privileges to use RAW sockets, so RAW sockets thankfully cannot be used by arbitrary unprivileged users on a system, but never the less. Be aware of this issue.

    Please understand that if a serious security vulnerability is found in the ISC DHCP daemon, you cannot protect your daemon with a local firewall on your system. Patching or disabling would then be the only solution.

Page 1 / 4