1. Don't Be Afraid of RAID

    Fri 22 May 2020

    Introduction

    I sense this sentiment on the internet that RAID is dangerous, that the likelihood of your RAID array failing during a rebuild is almost a certainty, because hard drives have become so large.

    I think nothing is further from the truth and I would like to dispel this myth.

    Especially for home users and small businesses, RAID arrays are still a reliable and efficient way of storing a lot of data in a single place.

    Perception of RAID reliability

    There are many horror stories to be found on the internet about people at home losing their RAID array. These stories may have contributed to a negative attitude towards RAID in general.

    You may acuse me of victim blaming, but in many cases, I do wonder if those incidents were due to user error1, due to bad luck or actual RAID causing problems. And there is a bias in reporting: you won't hear from the countless people who have no issues.

    In any case, the damage is done, but I still think (software) RAID is perfectly fine.

    The myth about the Unrecoverable Read Error (URE)

    I think the trouble started with this terrible article on ZDNET from 2007.

    In this article, it's argued that as drives become bigger, but not more reliable, you will see more unrecoverable read errors (UREs). More capacity means more sectors, so more risk of one of them going bad.

    An URE is an incident where the hard drive can't read a sector5. For old people like me, that sounds like the definition of a 'bad sector'. The article argues that on average you would encounter an URE for every 12.5 TB of data read.

    By the logic of the ZDNET acticle, just copying all data from a 14 TB drive would probably be impossible, because you would probably hit an URE / bad sector before you finish your copy.

    This is a very big issue for RAID arrays. A RAID array rebuild consists of reading the contents of all remaining drives in their entirety2. So you are guaranteed to hit an URE during a RAID rebuild.

    The good news is that you don't have to worry about any of this. Because it is not true.

    Hard drives are not that unreliable in practice. On the contrary. They are remarkably reliable, I would say. Just look at the Backblaze drive statistics6.

    The prediction of the infamous ZDNET article has not come true. The URE specification for hard drive describes a worst-case scenario and seem to be more about marketing (a way to differentiate enterprise drives from consumer drives) than about reality.

    If the ZDNET article were true, I, myself, should have encountered many UREs because of the many RAID array scrubs/patrol reads that have completed acros various RAID arrays.

    RAID has never stopped working and is still going strong.

    Card

    Scrubbing protects against the impact of bad sectors

    When a drive fails in a RAID array that can only tollerate one drive failure, it's very important that all remaining drives won't encounter any read errors. Because redundancy is lost, any read errors due to bad sectors could mean that the entire array is lost or at least some files are corrupted7.

    Every RAID array supports 'scrubbing'. It's a process where every sector of the RAID array is read, which in effect causes all sectors of all hard drives to be read.

    A scrub is a process to check for bad sectors in advance. If bad sectors are found on a hard drive, the drive can be replaced so it will not cause problems during a potential future rebuild. Replacing the drive itself will cause a rebuild, but assuming the scrub didn't find any other drives with bad sectors, that rebuild will be fine.

    A RAID array that doesn't undergo a regular scrub is a disaster waiting to happen. Bad sectors may be building up on one of the other drivs and when a drive actually fails, the entire array may be lost because of the undetected bad sectors on (one of) the remaining drives.

    If you want to store data in a reliable way on a RAID array, you need to assure the array is scrubbed periodically. And even if you don't use RAID, I would recommend running a long SMART test once a month against every hard drive you own.

    By default, a Linux software RAID array is scrubbed once a week on Ubuntu. For details, look at the contents of /etc/cron.d/mdadm.

    If you use ZFS on Linux, your array is automatically scrubbed on the second Sunday of every month if you run Ubuntu.

    NAS vendors like Synology or QNAP have data scrubs enabled by default. Consider the manual of your particular NAS to adjust the frequency. I would recommend to scrub at least once a month and at night.

    Why is RAID 5 considered harmful?

    Frankly, I wonder that too.

    I notice a lot of people on the internet claiming that you should never use RAID 5 but I disagree. It all depends on the circumstances. Finding a balance between cost and risk is important.

    This page dating back to 2003 advocated not to use RAID 5 but that's focused on the enterprise environment and even there I see its uses.

    For small RAID arrays with five or less drives I think RAID 5 is still a great fit. Especially if you run a small 4-bay NAS it would make total sense to use RAID 5. You get a nice balance between capacity and the cost of availability.

    It's not really recommended to create larger RAID 5 arrays. Compared to a single drive, a RAID array with 8 drives is 8 times more likely to experience a drive failure. You multiply the risk of a single drive failing by eight. With larger arrays, double drive failure becomes a serious risk.

    This is why it's really recommended to use RAID 6 for larger RAID arrays, because RAID 6 can tollerate two simultaneous drive failures. I've used RAID 6 in the past and I use RAIDZ2 (ZFS) as the basis for my current NAS.

    I also run an 8-drive RAID 5 in one of my servers that hosts not so important data that I still want to keep around and would rather not lose, but not at every cost. It's all about a balance between risk and cost. Please also read the postscript of this post, you will like it.

    It is true that during a rebuild, hard drives are strained more, but unless the RAID array is also in heavy use, the load on the drive isn't that big: the data is read sequentially, which is quite easy on the drives.

    RAID rebuild performance is mostly determined by the size of the drives and not by the number of drives in the RAID array3.

    Years ago I ran a 20-drive RAID 6 based on 1 TB drives and it did a rebuild in 5 hours. Recently I tested a rebuild of 8 drives in RAID 5 (using the same drives) and it also took almost 5 hours (4H45M).

    The RAID write hole

    The RAID 5/6 'write hole' is often mentioned as something you should be afraid about.

    Parity-based RAID like RAID 5 and RAID 6 may be affected by an issue called the 'write hole'. To (over)simplify: if a computer would experience a sudden power failure, a write to the RAID array may be interrupted. This could cause a partial write to the RAID array, leaving it in an inconsistent state.

    As a side note, I would always recommend protecting your NAS with a UPS (battery backup) so your server can shut down in a clean way, before power is lost as the battery runs out.

    ZFS RAIDZ is not affected by the 'write hole' issue, because it writes data to a log first before writing it to the actual array4.

    Linux MDADM software RAID also is protected against the 'write hole' phenomenon by using a bitmap (which is enabled by default4).

    Hardware RAID is also protected against this by using a battery backup for the cache memory. The data in the cache memory is written to disk as soon as the computer is powered back on.

    Setup alerting if you care about your data

    I think that a lot of RAID horror stories are due to the fact that people may never notice any problems until it is too late because they never set up any kind of alerting (by email or other).

    Ideally, you would also make sure your system monitors the SMART data of your hard drives and alert when critical numbers start to rise (Reallocated Sector count and Current Pending Sector count).

    This is also a moment of personal reflection. Do you run a RAID array? Did you setup alerting? Or could your RAID array be failing this very moment and you wouldn't know?

    Anyway: I think a lack of proper alerting is a nice way of getting into trouble with RAID, but that's not on RAID. Any storage solution that is not monitored is just a disaster waiting to happen.

    Why people choose not to use RAID

    If a RAID array fails, all data is lost. Some people are not comfortable with this risk. They would rather lose the contents of some drives, but not all of them.

    Solutions like Unraid and SnapRAID use one or more dedicated hard drives to store redundant (parity) data. The other hard drives are formatted with your filesystem of choice and can be accessed as normal hard drives. Altough I have no experience with this product, StableBit DrivePool seems to work in a similar manner.

    If you would have six hard drives, thus five data drives and one parity disk, the loss of two drives would result in data loss, as with RAID 5. However, the data on the remaining four drives would still be intact. The data loss is limited to just one drive worth of data.

    The 'all-or-nothing' risk associated with regular software RAID is thus mitigated. I myself don't think those risks aren't that large, but Unraid and snapraid are popular product and I think they are reasonable alternatives.

    Mergerfs could also be an interesting option, although it only supports mirroring.

    Backups are still important

    Storing your data on any kind of RAID array is never a substitute for a backup.

    You should still copy your data to some other storage if you want to protect your data. You may chose to only make a backup of a subset of all of the data, but at least you take an informed risk.

    Evaluation

    I hope I have demonstrated why RAID is still a valid and reliable option for data storage.

    Feel free to share your own views in the comments.

    P.S.

    I ran a scrub on my 8-disk RAID 5 array (based on 2 TB drives) as I was writing this article. My servers are only powered on when I need them and while powered off, it's easy for them to miss their periodic scrub window.

    So as to practice what I preach I ran a scrub. Lo and behold, one of the drives was kicked out of my Linux software RAID array. Don't you love the irony?

    sd 0:0:4:0: [sde] tag#29 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:4:0: [sde] tag#29 Sense Key : Medium Error [current] 
    sd 0:0:4:0: [sde] tag#29 Add. Sense: Unrecovered read error
    sd 0:0:4:0: [sde] tag#29 CDB: Read(10) 28 00 9f 42 9e 30 00 04 00 00
    print_req_error: critical medium error, dev sde, sector 2671943216
    

    Followed by:

    md/raid:md6: Disk failure on sde, disabling device.
    md/raid:md6: Operation continuing on 7 devices.
    

    The drive was clearly kicked out because the drive encountered bad sectors. A quick check of the SMART data revealed more than 300+ sectors were already remapped, but the data stored in them could not be recovered, causing read errors.

    This drive is clearly done, although it was still operational.

    After swapping this defective drive with a spare replacement, I started the rebuild proces, which took four hours and twenty minutes. My RAID 5 has rebuild and is now perfectly fine.

    If an event like this doesn't drive the point home that scrubs are important, I don't know what will.


    1. Sometimes I read what hardware people use for storage and I think about this quote by John Glenn: ‘I felt exactly how you would feel if you were getting ready to launch and knew you were sitting on top of 2 million parts — all built by the lowest bidder on a government contract.’ 

    2. ZFS works differently, it only reads the sectors containing actual data. 

    3. ZFS rebuilds or 'resilvers' become slower as you add more drives to a RAIDZ(2/3) VDEV, it seems. I'm not sure this is still the case with more recent ZFS versions. 

    4. Both ZFS and MDADM will take a performance hit by using a log/bitmap. Both solutions support using an SSD to accelerate the log/bitmap to remove this performance hit. Most home users probably won't need this. 

    5. The smallest unit of storage a drive can store, often 4K or 512 bytes for older, smaller drives. 

    6. Those hard drive live in a datacenter with a conditioned environment, which you probably don't have at home. But as long as you keep the temperature of hard drive within limits, I don't think it matters that much. 

    7. ZFS is both a RAID solution and a filesystem in one and can tell you exactly which file is affected. A nice feature. 

    Tagged as : storage RAID

Page 1 / 1