1. ZFS: Performance and Capacity Impact of Ashift=9 on 4K Sector Drives

    Choosing between ashift=9 and ashift=12 for 4K sector drives is not always a clear-cut case. You have to choose between raw performance and storage capacity.

    My test platform is Debian Wheezy with ZFS on Linux. I'm using a system with 24 x 4 TB drives in a RAIDZ3. The drives have a native sector size of 4K, and the array is formatted with ashift=12.

    First we create the array like this:

    zpool create storage -o ashift=12 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]
    

    Note: never use /dev/sd? device names for a real array; I only do so here for testing. Always use /dev/disk/by-id/ names.
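
    For reference, a pool created with persistent device names would look something like this (the disk IDs below are placeholders, not my actual drives):

    zpool create storage -o ashift=12 raidz3 \
        /dev/disk/by-id/ata-EXAMPLE_DISK_01 \
        /dev/disk/by-id/ata-EXAMPLE_DISK_02 \
        /dev/disk/by-id/ata-EXAMPLE_DISK_03 \
        /dev/disk/by-id/ata-EXAMPLE_DISK_04   # ...and so on for all 24 drives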

    Then we run a simple sequential transfer benchmark with dd:

    root@nano:/storage# dd if=/dev/zero of=ashift12.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 66.4922 s, 1.6 GB/s
    root@nano:/storage# dd if=ashift12.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.0371 s, 2.5 GB/s
    

    This is quite impressive. With these speeds, you can saturate 10 gigabit Ethernet, which tops out at roughly 1.25 GB/s. But how much storage space do we get?

    df -h:

    Filesystem                            Size  Used Avail Use% Mounted on
    storage                                69T  512K   69T   1% /storage
    

    zfs list:

    NAME      USED  AVAIL  REFER  MOUNTPOINT
    storage  1.66M  68.4T   435K  /storage
    

    Only 68.4 TiB of storage? That's not good. With 24 drives minus 3 for parity, there should be 21 x 3.6 TiB ≈ 75 TiB of storage.

    So the performance is great, but somehow, we lost about 6 TiB of storage, more than a whole drive.
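
    If you want to double-check which ashift value a pool was actually created with, you can query the pool configuration with zdb, something along these lines:

    # show the ashift recorded for the vdev of the 'storage' pool
    zdb -C storage | grep ashift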

    So what happens if you create the same array with ashift=9?

    zpool create storage -o ashift=9 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]
    

    These are the benchmarks:

    root@nano:/storage# dd if=/dev/zero of=ashift9.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 97.4231 s, 1.1 GB/s
    root@nano:/storage# dd if=ashift9.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.3805 s, 2.5 GB/s
    

    So we lose about a third of our write performance, while the read performance is not affected, probably because of read-ahead caching, but I'm not sure.

    With ashift=9, we do lose some write performance, but at 1.1 GB/s we still come close to saturating 10 gigabit Ethernet.

    Now look what happens to the available storage capacity:

    df -h:

    Filesystem                         Size  Used Avail Use% Mounted on
    storage                             74T   98G   74T   1% /storage
    

    zfs list:

    NAME      USED  AVAIL  REFER  MOUNTPOINT
    storage   271K  73.9T  89.8K  /storage
    

    Now we have a capacity of almost 74 TiB, so we gained about 5.5 TiB with ashift=9 over ashift=12, at the cost of some write performance.

    So if you really care about sequential write performance, ashift=12 is the better option. If storage capacity is more important, ashift=9 seems to be the best solution for 4K drives.

    The performance of ashift=9 on 4K drives is always described as 'horrible' but I think it's best to run your own benchmarks and decide for yourself.

    Caveat: I'm quite sure about the benchmark performance. I'm not 100% sure how reliable the free space reported by df -h or zfs list is.
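
    To put those numbers in perspective, zpool list reports the raw pool size, before parity and filesystem overhead are subtracted, which can help when comparing the two configurations:

    zpool list storage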

    Edit: I have added a bit of my own opinion on the results.

    Tagged as : ZFS Linux
  2. Achieving 2.3 GB/s With 16 X 4 TB Drives

    I'm in the process of building a new storage server to replace my 18 TB NAS.

    The server is almost finished, it's now down to adding disk drives. I'm using the HGST 4 TB 7200 RPM drive for this build (SKU 0S03356) (review).

    I have not bought all the drives at once, but I'm slowly adding them in smaller quantities. I just don't want to feel too much pain in my wallet at once, I guess.

    According to my own tests, this drive has a read/write throughput of 160 MB/s, which is in line with its specification.

    So the theoretical performance of a RAID 0 with 16 drives x 160 MB/s = 2560 MB/s. That's over 2.5 gigabytes per second.
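
    For illustration, a 16-drive stripe can be assembled with Linux software RAID (mdadm) along these lines; the device names are placeholders and this is not necessarily how my own array is configured:

    # create a 16-disk RAID 0 array, format it and mount it
    mdadm --create /dev/md0 --level=0 --raid-devices=16 /dev/sd[b-q]
    mkfs.xfs /dev/md0
    mount /dev/md0 /storage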

    This is the actual real-life performance I was able to achieve.

    root@nano:/storage# dd if=pureawesomeness.dd of=/dev/null bs=1M
    1000000+0 records in
    1000000+0 records out
    1048576000000 bytes (1.0 TB) copied, 453.155 s, 2.3 GB/s
    

    2.3 GB/s is not too shabby, in my opinion. Please note that I used a test file of one terabyte, so the 16 GB of RAM in my server doesn't skew the result.
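
    A test file of that size can be generated with a plain dd from /dev/zero, something like:

    # write a 1 TB test file (1,000,000 blocks of 1 MiB), far larger than the 16 GB of RAM
    dd if=/dev/zero of=pureawesomeness.dd bs=1M count=1000000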

    This result is very nice, but in practice it's almost useless. I could saturate dual 10 Gbit NICs with this system, but I don't have that kind of network equipment, or any other device that could handle such performance.

    But I think it's amazing anyway.

    I'm quite curious how the final 24 drive array will perform in a RAID 0.

    Tagged as : Storage
  3. Affordable Server With Server-Grade Hardware Part II

    If you want to build a home server, it is advisable to actually use server-grade components. I documented the reasons for choosing server-grade hardware in an earlier post on this topic.

    It is recommended to read the old post first. In this new post, I only show newer hardware that can be chosen as a more modern option.

    My original post dates back to December 2013 and centers on the popular X9SCM-F, which is based on the LGA 1155 socket. Please note that the X9SCM-F / LGA 1155 based solution may be cheaper if you want the Xeon processor.

    So I'd like to introduce two Supermicro motherboards that may be of interest.

    Supermicro X10SLL-F

    Some key features are:

    • 2 x Gigabit NIC on-board
    • 6 onboard SATA ports
    • 3 x PCIe (2 x 8x + 1 x 4x)
    • Costs $169 or €160

    This board is one of the cheapest Supermicro boards you can get, and it has 3 x PCIe slots, which may be of interest if you need to install extra HBAs, RAID cards, SAS expanders and/or network controllers.

    Supermicro X10SL7-F

    This board is about $80 or €90 more expensive than the X10SLL-F, but in return you get eight extra SAS/SATA ports, for a total of 14 SATA ports. With 4 TB drives, this would give you 56 TB of raw storage capacity. This motherboard provides a cheaper solution than an add-on HBA card, which would occupy a PCIe slot. However, there's a caveat: this board has 'only' two PCIe slots. But there's still room for an additional quad-port or 10 GbE NIC and an extra HBA if required.

    • 2 x Gigabit NIC on-board
    • 6 onboard SATA ports
    • 8 onboard SAS/SATA ports via LSI 2308 chip
    • 2 x PCIe (8x and 4x)
    • Costs $242 or €250

    Overview of CPUs

    CPU                               Passmark score   Price in Euro   Price in Dollars
    Intel Pentium G3420 @ 3.20GHz     3459             55 Euro         74 Dollar
    Intel Core i3-4130 @ 3.40GHz      4827             94 Euro         124 Dollar
    Intel Xeon E3-1230 V3 @ 3.30GHz   9459             216 Euro        279 Dollar

    • Dollar prices are from Newegg, Euro prices are from Tweakers.net.
    • Euro prices include taxes.
    Tagged as : Supermicro Intel ECC
  4. How to Resolve Extreme Memory Usage on Windows 2008 R2-Based File Servers

    I'm responsible for a file server with about 5 terabytes of data. The file server is based on Windows 2008 R2. I've noticed extreme memory usage on the server: after a reboot, it slowly builds up until almost all RAM is consumed.

    So I googled around and found this post, and it turned out I had the exact same issue.

    I've confirmed with the tool 'RAMmap' that NTFS metadata is the issue. Microsoft also created a blog post about this.

    The author of the first article resolved the issue by adding more RAM. But with 16 GB already assigned, I was not too happy to add even more memory to the virtual file server, eating away at the RAM resources of our virtualisation platform.

    I could never find the root cause of the issue. If you're in the same position, you need to obtain the 'Microsoft Windows Dynamic Cache Service'. This application allows you to configure how large the metadata cache may grow.

    Please note that this service is not a next-next-finish installation. Follow the included Word document with instructions carefully and configure a sane memory setting for your server. I limited the cache to half the RAM available to the server and this works out well.

    Tagged as : Windows file server
  5. My Experiences With DFS Replication on Windows 2008 R2

    If you are considering implementing DFS Replication, consider using Windows 2012 R2, because DFS Replication has been massively improved: it supports larger data sets and performance is dramatically better than on Windows 2008 R2.

    I've implemented DFS Replication to keep two file servers synchronised. Click here if you want to learn more about DFS itself.

    With DFS, I wanted to create a high-available file server service, based on two file servers, each with their own physical storage. DFS replication makes sure that both file servers are kept in sync.

    If you set up DFS, you need to copy all the data from the original server to the secondary server. This is called seeding, and I've used robocopy as recommended by Microsoft in the linked article.
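
    A typical pre-seeding run with robocopy looks something like this (server and path names are just examples; check the linked Microsoft article for the exact switches recommended for your setup):

    robocopy D:\Data \\FILESERVER2\D$\Data /E /B /COPYALL /R:6 /W:5 /MT:64 /XD DfsrPrivate /LOG:C:\preseed.log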

    Seeding is not mandatory. You can also start with an empty folder on the secondary server and have DFS replicate all files, but I've experienced myself that DFS replication can be extremely slow on Windows 2008 R2.

    Once all files are seeded and DFS is configured, the initial replication can still take days. Replication times depend on:

    1. the number of files
    2. the size of the data
    3. the performance of the disk subsystems of both source and destination

    Note: Windows 2012 R2 improves DFS replication dramatically, all the more reason to upgrade your file servers to 2012 R2 or higher.

    If you seed the files, DFS will not transfer files that are identical, which saves bandwidth and time. DFS determines whether files differ based on their hash. So even if you seed all data, the initial replication can still take a while.

    On our virtualised platform, the initial replication of 2.5 GB of data consisting of about five million files took about a full week. To me, that is not a very desirable outcome, but once the initial replication is done, there is no performance issue and all changes are nearly instantly replicated to the secondary server.

    For the particular configuration I've set up, the performance of the storage subsystem may have contributed to the slow initial replication.

    To speed up the replication process, it's important that you install the latest version of robocopy for Windows 2008 R2 on both systems. Older versions of robocopy contain a bug that prevents permissions from being set properly on files. This results in file hash mismatches, causing DFS to replicate all files anyway, nullifying the benefit of seeding.

    Hotfixes for Windows 2008 R2:

    Hotfixes for Windows 2012 R2:

    To verify that a file has identical hashes on both servers, follow these instructions.
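
    On Windows 2008 R2 this can be done with the dfsrdiag tool; run something along these lines on both servers and compare the output (the file path is just an example, and you may want to check dfsrdiag /? for the exact syntax):

    dfsrdiag filehash /filepath:"D:\Data\example.docx"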

    If you've checked a few files and made sure the hashes are identical, it's OK to configure DFS replication. If you see a lot of Event ID 4412 messages in the DFS Replication event log, there is probably an issue with the file hashes.
