Articles in the Storage category

  1. Scrub Your NAS Hard Drives Regularly if You Care About Your Data

    Wed 22 April 2020

    Introduction

    Lots of people run a NAS at home. Maybe it's a COTS device from one of the well-known vendors1, or a custom-built (DIY2) solution based on hardware you bought and assembled yourself.

    Buying or building a NAS is one thing, but operating it in a way that ensures you won't lose data is something else.

    Obviously, the best way to protect against data loss is to make regular backups. Ideally, even if the NAS were to go up in flames, you would still have your data.

    Since backup storage costs money, people make tradeoffs. They may decide to take the risk and only back up a small portion of the really important data, taking their chances with the rest.

    That is their right, of course. But still, it would be nice to reduce the risk of data loss to a minimum.

    The risk: bad sectors

    The problem is that hard drives may develop bad sectors over time. Bad sectors are tiny portions of the drive that have become unreadable3. However small a sector may be, any data stored in it is now lost, and this can cause data corruption (one or more corrupt files).

    This is the thing: those bad sectors may never be discovered until it is too late!

    With today's 14+ TB hard drives, it's easy to store vast amounts of data. Most of that data is probably not frequently accessed, especially at home.

    One or more of your hard drives may be developing bad sectors and you wouldn't even know it. How would you?

    Your data might be at risk right at this moment while you are reading this article.

    A well-known disaster scenario in which people tend to lose data is a double hard drive failure in a setup that can only tolerate a single drive failure (RAID 1 (mirror) or RAID 5, and in some scenarios RAID 10).

    In this scenario, a hard drive in their RAID array has failed and a second drive (one of the remaining good drives) has developed bad sectors. That means effectively a second drive has failed although the drive may still seem operational. Due to the bad sectors, data required to rebuild the array is lost because there is no longer any redundancy4.

    If you run a (variant of) RAID 5, you can only lose a single disk, so if a second disk fails, you lose all data5.

    The mitigation: periodic scrubbing / checking of your disks

    The only way to find out if a disk has developed bad sectors is to just read them all. Yes: all the sectors.

    Checking your hard drives for bad sectors (or other issues) is called 'data scrubbing'. If you bought a NAS from QNAP, Synology or another vendor, there is a menu which allows you to control how often and when you want to perform a data scrub.

    RAID solutions are perfectly capable of handling bad sectors. For a RAID array, they are just equivalent to a failed drive: an affected drive will be kicked out of the array once bad sectors start causing read errors. The big issue we want to prevent is multiple drives developing bad sectors at the same time, because that is the equivalent of multiple simultaneous drive failures, which many RAID arrays can't recover from.

    For home users I would recommend checking all hard drives once a month. I would recommend configuring the data scrub to run at night (often the default) because a scrub may impact performance in a way that can be noticeable and even inconvenient.

    Your vendor may have already configured a default schedule for data scrubs, so you may have been protected all along. If you take a look, at least you know.

    People who have built a DIY NAS have to set up and configure periodic scrubs themselves or they won't happen at all. However, that's not entirely true: I've noticed that on Ubuntu, all Linux software RAID (MDADM) arrays are checked once a month, at night. So if you use Linux software RAID, you may already be scrubbing.
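
    If you want to kick off or monitor such a check by hand, it looks roughly like the sketch below; /dev/md0 and the ZFS pool name 'tank' are just example names and the commands need root privileges.

    # start a 'check' of the md array: every sector is read and parity/mirror
    # consistency is verified (bad sectors surface as read errors)
    echo check > /sys/block/md0/md/sync_action

    # watch progress; the check shows up as a resync-style progress bar
    cat /proc/mdstat

    # on a ZFS-based NAS, the equivalent is a pool scrub
    zpool scrub tank
    zpool status tank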

    A drive that develops bad sectors should be replaced as soon as possible. It should no longer be trusted. The goal of scrubbing is to identify these drives as soon as possible. You don't want to get in a position that multiple drives have started developing bad sectors. You can only prevent that risk by scanning for bad sectors periodically and replacing bad drives.

    You should not be afraid of having to spend a ton of money replacing drives all the time. Bad sectors are not that common. But they are common enough that you should check for them. There is a reason why NAS vendors offer the option to run data scrubs and recommend them6.

    You probably forgot to configure email alerting

    If a disk in your NAS failed, how would you know? If a scrub discovered bad sectors, would you ever notice7?

    The answer may be: only when it's too late. Maybe a drive already failed and you haven't even noticed yet!

    When you've finished reading this article, it may be the right moment to take some time to check the status of your NAS and configure email alerting (or any other alerting mechanism that works for you). Make your NAS send out a test message just to confirm that alerting actually works!
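
    For DIY builders running Linux software RAID, a minimal sketch of what this looks like with mdadm; the email address is obviously a placeholder, and you still need a working local mail setup (e.g. a relay) for the message to actually leave the machine.

    # /etc/mdadm/mdadm.conf: where mdadm's monitor should send alerts
    MAILADDR admin@example.com

    # send a test alert for every array to verify that mail delivery actually works
    mdadm --monitor --scan --test --oneshot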

    Closing words

    So I would like to advise you to do two things:

    1. Make sure your NAS runs a data scrub once a month
    2. Make sure your NAS is able to send email alerts about failed disks or scrubs.

    These actions allow you to fix problems before they become catastrophic.

    P.S. S.M.A.R.T. monitoring

    Hard drives have a built-in monitoring system called S.M.A.R.T.

    If you have a NAS from one of the NAS vendors, it will alert on SMART monitoring data that indicates a drive is failing. DIY builders may have to spend time setting up this kind of monitoring manually.

    For more information about SMART, I would recommend this article and this one.


    Linux users can take a look at the SMART status of their hard drives with this tool (which I made).
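
    For a quick manual check on Linux, smartmontools also works fine; /dev/sda is just an example device and the commands require root.

    # overall health verdict (PASSED or FAILED)
    smartctl -H /dev/sda

    # full attribute dump; reallocated, pending and uncorrectable sector counts
    # are the attributes most relevant to bad sectors
    smartctl -a /dev/sda | grep -Ei 'reallocated|pending|uncorrect'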


    1. QNAP, Synology, Netgear, Buffalo, Thecus, Western Digital, and so on. 

    2. FreeNAS, Unraid, Windows/Linux with Snapraid, OpenMediaVault, or a custom solution, and so on. 

    3. Bad sectors cause 'unrecoverable read errors' or UREs. Bad sectors have nothing to do with 'silent data corruption'. There's nothing silent about unrecoverable read errors: hard drives report read errors back to the operating system, so they won't go unnoticed. 

    4. A DIY NAS based on ZFS (FreeNAS is based on ZFS) may help mitigate the impact of such an event. ZFS can continue reading data from the remaining drives, even if bad sectors are encountered. Some files will be corrupted, but most of the data would still be readable. I think this capability by itself is not enough reason to pick a NAS based on ZFS, because ZFS also comes with costs that you need to accept. For my large NAS I have chosen ZFS because I was prepared to 'pay the cost'. 

    5. Some people may choose to go with RAID 6, which tolerates two simultaneous drive failures, but they also tend to run larger arrays with more drives, which in turn increases the risk of a drive failing or one of the drives developing bad sectors. 

    6. Enterprise storage solutions (even entry-level storage arrays) often run patrol reads, both on individual hard drives and on the RAID arrays on top of them. These patrol reads are also enabled by default. 

    7. At one time I worked for a small company that ran their own (single) email server. One of the system administrators discovered totally by accident that one of the two drives in a RAID 1 had failed. It turned out we had been running on a single drive for months before we discovered it, because we forgot to set up email alerting. We didn't lose data, but we came close. 

    Tagged as : Storage
  2. Benchmarking Storage With Fio and Generating Charts of the Results

    Tue 21 April 2020

    Introduction

    Fio is a widely-used tool for performing storage benchmarks. Fio offers a lot of options to create a storage benchmark that would best reflect your needs. Fio allows you to assess if your storage solution is up to its task and how much headroom it has.

    Fio outputs .json and .log files that need further processing if you would like to make nice charts. Charts may help better communicate your test results to other people.
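
    For context, this is roughly what a single fio run that produces such a .json file could look like; the target device and output file name are placeholders, and the parameters simply mirror the kind of mixed random read/write test discussed below. Be aware that writing to a raw device is destructive.

    fio --name=randrw-test \
        --filename=/dev/md0 --ioengine=libaio --direct=1 \
        --rw=randrw --rwmixread=75 --bs=4k \
        --iodepth=16 --numjobs=4 \
        --runtime=60 --time_based --group_reporting \
        --output-format=json --output=randrw-test.json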

    To make graphs of Fio benchmark data, I've created fio-plot. With fio-plot you can generate charts like these:

    [Example charts generated by fio-plot: example1, example2, example3]

    It's very common to want to run multiple benchmarks with different parameters and compare the results. To generate the data for such charts, many benchmarks need to be run, so this process needs to be automated.

    Automating Fio benchmarks

    I've chosen to build my own tool to automate Fio benchmarking. This tool is called bench_fio and is part of fio-plot. I'm aware that fio itself ships a tool called genfio to generate fio job files containing multiple benchmarks; it's up to you which one you want to use. Bench-fio is tailored to output data in a way that aligns with fio-plot.

    Bench-fio allows you to benchmark loads with different iodepths, simultaneous jobs, block sizes and other parameters. A benchmark run can consist of hundreds of tests and take many hours.

    When you run bench_fio, you can expect output like this:

    ████████████████████████████████████████████████████
        +++ Fio Benchmark Script +++
    
    Job template:                  fio-job-template.fio
    I/O Engine:                    libaio
    Number of benchmarks:          98
    Estimated duration:            1:38:00
    Devices to be tested:          /dev/md0
    Test mode (read/write):        randrw
    IOdepth to be tested:          1 2 4 8 16 32 64
    NumJobs to be tested:          1 2 4 8 16 32 64
    Blocksize(s) to be tested:     4k
    Time per test (s):             60
    Mixed workload (% Read):       75 90
    
    ████████████████████████████████████████████████████
    4% |█                        | - [0:04:02, 1:35:00]-]
    

    Bench-fio runs in real time and shows the expected remaining time. It also shows all relevant parameters that have been configured for this benchmark run, which makes it easier to spot any misconfigurations.

    Notice that this benchmark run consists of 98 individual tests: iodepth x NumJobs x mixed workload parameters (7 x 7 x 2). With the standard 60 seconds per test, the whole run takes about 98 minutes, matching the estimated duration of 1:38:00 shown above.

    This is an example of the command-line syntax:

    ./bench_fio --target /dev/md0 -t device --mode randrw -o RAID_ARRAY --readmix 75 90

    More examples can be found here.

    Tagged as : Fio
  3. The Impact of the MDADM Bitmap on RAID Performance

    Mon 06 April 2020

    Introduction

    I'm aware that most people with intensive storage workloads won't run those workloads on hard drives anymore; that ship sailed a long time ago. SSDs (or 'the cloud') have taken their place.

    For those few left who do use hard drives in Linux software RAID setups and run workloads that generate a lot of random IOPS, this may still be relevant.

    I'm not sure how much a bitmap affects MDADM software RAID arrays based on solid state drives as I have not tested them.

    The purpose of the bitmap

    By default, when you create a new software RAID array with MDADM, a bitmap is also configured. The purpose of the bitmap is to speed up recovery of your RAID array in case the array gets out of sync.

    A bitmap won't help speed up recovery from a drive failure, but an array can also get out of sync due to a hard reset or a power failure during write operations, and that is the situation in which the bitmap saves time.
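
    You can check whether an existing array has a bitmap at all; /dev/md0 is just an example device.

    # 'Intent Bitmap : Internal' in the output means a write-intent bitmap is present
    mdadm --detail /dev/md0 | grep -i bitmap

    # /proc/mdstat also shows a 'bitmap:' line for arrays that have one
    cat /proc/mdstat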

    The performance impact

    During some benchmarking of various RAID arrays, I noticed very bad random write IOPS performance. No matter what the test conditions were, I got the random write performance of a single drive, although the RAID array should perform better.

    Then I noticed that the array was configured with a bitmap. Just for testing purposes, I removed the bitmap altogether with:

    mdadm --grow --bitmap=none /dev/md0
    

    Random write IOPS figures improved immediately. This resource explains why:

    If  the  word internal is given, then the bitmap is stored with the metadata
    on the array, and so is replicated on all devices.
    

    So when you write data to your RAID array, the bitmap is also constantly updated. Since that bitmap lives on each drive in the array, it's probably obvious that this really deteriorates random write IOPS.

    Some examples of the performance impact

    Bitmap disabled

    An example of a RAID 5 array with 8 x 7200 RPM drives.

    [Chart: random write IOPS of a RAID 5 array with 8 x 7200 RPM drives, bitmap disabled]

    Another example, with 10,000 RPM drives:

    [Chart: random write IOPS with 10,000 RPM drives, bitmap disabled]

    Bitmap enabled (internal)

    We observe significantly lower random write IOPS performance overall:

    [Chart: random write IOPS of the same RAID 5 array with 8 x 7200 RPM drives, internal bitmap enabled]

    The same is true for the 10,000 RPM drives:

    [Chart: random write IOPS with 10,000 RPM drives, internal bitmap enabled]

    External bitmap

    You could keep the bitmap and still get great random write IOPS by putting the bitmap on a separate SSD. Since my boot device is an SSD, I tested this option like this:

    mdadm --grow --bitmap=/raidbitmap /dev/md0
    

    I noticed excellent random write IOPS with this external bitmap, similar to running without a bitmap at all. An external bitmap has its own risks and caveats, so make sure it really fits your needs.

    Note: external bitmaps are only known to work on ext2  and  ext3. 
    Storing bitmap files on other filesystems may result in serious problems.
    

    Conclusion

    For home users who build DIY NAS servers and who do run MDADM RAID arrays, I would recommend leaving the bitmap enabled. The impact on sequential file transfers is negligible and the benefit of a quick RAID resync is very obvious.

    Only if you have a workload that would cause a ton of random writes on your storage server would I consider disabling the bitmap. An example of such a use case would be running virtual machines with a heavy write workload.
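
    If you removed the bitmap while experimenting and decide to keep it after all, an internal bitmap can simply be re-added (again assuming /dev/md0):

    mdadm --grow --bitmap=internal /dev/md0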

    Update on bitmap-chunks

    Based on feedback in the comments, I've performed a benchmark on a new RAID 5 array with the --bitmap-chunk option set to 128M (the default is 64M).
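
    For reference, a larger bitmap chunk can be set either at array creation time or on an existing array; the sketch below assumes /dev/md0 and requires removing the old bitmap first.

    # remove the existing bitmap, then re-add it with a 128M chunk size
    mdadm --grow --bitmap=none /dev/md0
    mdadm --grow --bitmap=internal --bitmap-chunk=128M /dev/md0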

    The results seem to be significantly worse than the default for random write IOPS performance.

    [Chart: random write IOPS with an internal bitmap and a 128M bitmap chunk]

    Tagged as : mdadm
