Articles in the Storage category

  1. An Ode to the 10,000 RPM Western Digital (Veloci)Raptor

    Sat 30 October 2021

    Introduction

    Back in 2004, I visited a now-bankrupt Dutch computer store called MyCom1, located on the Kinkerstraat in Amsterdam. I was there to buy a Western Digital Raptor, model WD740, with 74 GB of capacity, running at 10,000 RPM.

    [photo: my Western Digital Raptor WD740]

    When I bought this drive, we were still in the middle of the transition from the PATA interface to SATA2. My Raptor hard drive still had a Molex connector because older computer power supplies didn't have SATA power connectors.

    [photo: the drive's Molex and SATA power connectors]

    You may notice that I eventually managed to break off the plastic tab of the SATA power connector. Fortunately, I could still power the drive through the Molex connector.

    A later version of the same drive came with the Molex connector disabled, as you can see below.

    [photo: a later Raptor revision with the Molex connector disabled]

    Why did the Raptor matter so much?

    I was very eager to get this drive as it was quite a bit faster than any consumer drive on the market at that time.

    This drive not only made your computer start up faster, but it made it much more responsive. At least, it really felt like that to me at the time.

    The faster-spinning drive wasn't so much about higher throughput in MB/s - although that improved too - it was all about reduced latency.

    A drive that spins faster3 can complete more I/O operations per second, or IOPs4. Because each operation takes less time, it can do more work in the same amount of time than slower-turning drives.
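    To give a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. The average seek times are my own illustrative assumptions, not datasheet values for any particular drive; the point is how rotational speed translates into latency and random IOPs.

    def avg_rotational_latency_ms(rpm: float) -> float:
        # On average, the platter must turn half a revolution before the
        # requested sector passes under the read/write head.
        return (60_000 / rpm) / 2

    def rough_random_iops(rpm: float, avg_seek_ms: float) -> float:
        # One random I/O roughly costs a seek plus half a rotation.
        service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
        return 1000 / service_time_ms

    # Assumed average seek times: ~8.5 ms for a 7200 RPM desktop drive,
    # ~4.5 ms for the Raptor. Illustrative guesses, not measurements.
    for rpm, seek_ms in [(7200, 8.5), (10_000, 4.5)]:
        latency = avg_rotational_latency_ms(rpm)
        iops = rough_random_iops(rpm, seek_ms)
        print(f"{rpm} RPM: {latency:.1f} ms rotational latency, ~{iops:.0f} random IOPs")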

    The Raptor - mostly focused on desktop applications5 - brought a lot of relief for professionals and consumer enthusiasts alike. Hard disk performance, notably latency, was one of the big performance bottlenecks at the time.

    For the vast majority of consumers or employees, this bottleneck would only start to be alleviated well after 2010, when SSDs slowly started to become standard in new computers.

    And that's mostly also the point of SSDs: their I/O operations are measured in microseconds instead of milliseconds. It's not that throughput (MB/s) doesn't matter, but for most interactive applications, you care about latency. That's what makes an old computer feel like new when you swap out the hard drive for an SSD.

    The Raptor as a boot drive

    For consumers and enthusiasts, the Raptor was an amazing boot drive. The 74 GB model was large enough to hold the operating system and applications. The bulk of the data would still be stored on a second hard drive, either also connected through SATA or even still through PATA.

    Running your computer with a Raptor as the boot drive resulted in lower boot times and application load times. But most of all, the system felt more responsive.

    And despite the 10,000 RPM speed of the platters, it wasn't that much louder than regular drives at the time7.

    In the video above, a Raspberry Pi 4 boots from a 74 GB Raptor hard drive.

    Alternatives to the Raptor at that time

    To put things into perspective, 10,000 RPM drives were quite common even in 2003/2004 for use in servers. These server-oriented drives used the SCSI interface/protocol, which was incompatible with the on-board IDE/SATA controllers.

    Some enthusiasts - who had the means to do so - did buy both the controller8 and one or more SCSI 'server' drives to increase the performance of their computer. They could even get 15,000 RPM hard drives! These drives, however, were extremely loud and had even less capacity.

    The Raptor did perform remarkably well in almost all circumstances, especially those that mattered to consumers and enthusiasts alike. Suddenly, you could get SCSI/server performance at consumer prices.

    The in-depth review of the WD740 by Techreport really shows how significant the Raptor was.

    The Velociraptor

    The Raptor was eventually replaced by the Velociraptor. The Velociraptor had a 2.5" form factor, but it was much thicker than a regular 2.5" laptop drive. Because it spun at 10,000 RPM, the drive would get hot, so it was mounted in an 'icepack' to dissipate the generated heat. This gave the Velociraptor a 3.5" form factor, just like the older Raptor drives.

    [photo: the Western Digital Velociraptor in its icepack]

    In the video below, a Raspberry Pi 4 boots from a 500 GB Velociraptor hard drive.

    Benchmarking the (Veloci)raptor

    Hard drives do well with sequential read/write patterns, but their performance implodes when the data access pattern becomes random. This is due to the mechanical nature of the device. That random access pattern is where 10,000 RPM drives outperform their slower-turning siblings.

    The chart below shows random 4K read performance, both IOPs and latency. This is something of a worst-case benchmark to understand the raw I/O and latency performance of a drive.

    [chart: random 4K read IOPs and latency per drive]

    Drive ID        | Form Factor  | RPM    | Size (GB) | Description
    ST9500423AS     | 2.5"         | 7200   | 500       | Seagate laptop hard drive
    WD740GD-75FLA1  | 3.5"         | 10,000 | 74        | Western Digital Raptor WD740
    SAMSUNG HD103UJ | 3.5"         | 7200   | 1000      | Samsung Spinpoint F1
    WDC WD5000HHTZ  | 2.5" in 3.5" | 10,000 | 500       | Western Digital Velociraptor
    ST2000DM008     | 3.5"         | 7200   | 2000      | Seagate 3.5" 2 TB drive
    MB1000GCWCV     | 3.5"         | 7200   | 1000      | HP-branded Seagate 1 TB drive

    I've tested the drives on an IBM M1015 SATA RAID card flashed to IT mode (HBA mode, no RAID firmware). The image is generated with fio-plot, which also comes with a tool to run the fio benchmarks.
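    If you want to reproduce a similar test without fio-plot, the sketch below runs a single 4K random read job with plain fio and pulls the IOPs and mean latency out of its JSON output. Treat it as a rough sketch: the JSON field names may differ between fio versions, and /dev/sdX is a placeholder for the drive under test (the job only reads, but double-check the device name anyway).

    # Sketch: 4K random read benchmark with fio, assuming fio 3.x is installed.
    import json
    import subprocess

    DEVICE = "/dev/sdX"  # placeholder for the drive under test

    result = subprocess.run(
        [
            "fio", "--name=randread-4k", f"--filename={DEVICE}",
            "--rw=randread", "--bs=4k", "--iodepth=1", "--direct=1",
            "--ioengine=libaio", "--runtime=60", "--time_based",
            "--output-format=json",
        ],
        capture_output=True, text=True, check=True,
    )

    job = json.loads(result.stdout)["jobs"][0]["read"]
    print(f"IOPs: {job['iops']:.0f}")
    print(f"Mean latency: {job['lat_ns']['mean'] / 1e6:.2f} ms")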

    It is quite clear that both 10,000 RPM drives outperform all 7200 RPM drives, as expected.

    If we compare the original 3.5" Raptor to the 2.5" Velociraptor, the performance increase is significant: 22% more IOPs and 18% lower latency. I think that performance increase is due to a combination of the higher data density, the smaller platter size (the read/write head can reach the right spot faster) and maybe better firmware.

    Both the laptop and desktop Seagate drives seem to be a bit slower than they should be, based on theory. The opposite is true for the HP drive (a rebranded Seagate), which seems to perform better than expected for its capacity and rotational speed. I have no idea why that is. I can only speculate that because the HP drive came out of a server, its firmware was tuned for server usage patterns.

    Closing words

    Although the performance increase of the (Veloci)Raptor was quite significant, it never gained widespread adoption. Especially when the Raptor first came to market, its primary role was that of a boot drive because of its small capacity. You still needed a second drive for your data. So the increase in performance came at a significant extra cost.

    The Raptor and Velociraptor are now obsolete. You can get a solid state drive for $20 to $40 and even those budget-oriented SSDs will outperform a (Veloci)raptor many times over.

    If you are interested in more pictures and details, take a look at this article.

    This article was discussed on Hacker News here.

    A Reddit thread about this article can be found here.


    1. MyCom, a chain store with quite a few shops in all major cities in the Netherlands, went bankrupt twice: once in 2015 and finally in 2019. 

    2. We are talking about the first SATA version, with a maximum bandwidth of 150 MB/s. Plenty for hard drives at that time. 

    3. https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics 

    4. https://louwrentius.com/understanding-storage-performance-iops-and-latency.html 

    5. I read that WD intended the first Raptor (the 37 GB version) to be used in low-end servers as a cheaper alternative to SCSI drives. After the adoption of the Raptor by computer enthusiasts and professionals, it seems that Western Digital pivoted, so the next version - the 74 GB model I have - was geared more towards desktop usage. That also meant that this 74 GB model got fluid bearings, making it quieter6.

    6. The 74 GB model is actually a rather quiet drive at idle. Drive activity sounds rather smooth and pleasant, no rattling. 

    7. Please note that the first model, the 37 GB version, used ball bearings instead of fluid bearings, and was reported to be significantly louder. 

    8. Low-end SCSI cards were often used to power flatbed scanners, Iomega ZIP drives, tape drives or other peripherals, but in order to benefit from the performance of those server hard drives, you needed a SCSI controller supporting higher bandwidth, and those were more expensive. 

    Tagged as : Storage
  2. Don't Be Afraid of RAID

    Fri 22 May 2020

    Introduction

    I sense this sentiment on the internet that RAID is dangerous, that your RAID array failing during a rebuild is almost a certainty because hard drives have become so large.

    I think nothing is further from the truth and I would like to dispel this myth.

    Especially for home users and small businesses, RAID arrays are still a reliable and efficient way of storing a lot of data in a single place.

    Perception of RAID reliability

    There are many horror stories to be found on the internet about people at home losing their RAID array. These stories may have contributed to a negative attitude towards RAID in general.

    You may accuse me of victim blaming, but in many cases, I do wonder whether those incidents were due to user error1, bad luck, or RAID actually causing problems. And there is a bias in reporting: you won't hear from the countless people who have no issues.

    In any case, the damage is done, but I still think (software) RAID is perfectly fine.

    The myth about the Unrecoverable Read Error (URE)

    I think the trouble started with this terrible article on ZDNET from 2007.

    In this article, it's argued that as drives become bigger, but not more reliable, you will see more unrecoverable read errors (UREs). More capacity means more sectors, so more risk of one of them going bad.

    A URE is an incident where the hard drive can't read a sector5. For old people like me, that sounds like the definition of a 'bad sector'. The article argues that on average you would encounter a URE for every 12.5 TB of data read.

    By the logic of the ZDNET article, just copying all data from a 14 TB drive would probably be impossible, because you would likely hit a URE / bad sector before you finish your copy.

    This would be a very big issue for RAID arrays. A RAID array rebuild consists of reading the contents of all remaining drives in their entirety2, so by that logic you would be almost guaranteed to hit a URE during a rebuild.
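    For what it's worth, the arithmetic behind that 12.5 TB figure is simple: consumer drives are commonly specced at less than one unrecoverable error per 10^14 bits read. The sketch below works that out, and also shows what reading a full 14 TB drive would look like if you took the spec literally as an independent per-bit probability (which, as argued below, you shouldn't).

    bits_per_ure = 1e14                      # '< 1 error per 1e14 bits read' spec
    tb_per_ure = bits_per_ure / 8 / 1e12     # bits -> bytes -> terabytes
    print(f"One URE per ~{tb_per_ure:.1f} TB read")           # ~12.5 TB

    # Chance of reading an entire 14 TB drive without a URE, if every bit
    # really had an independent 1e-14 chance of being unreadable:
    bits_read = 14 * 1e12 * 8
    p_clean = (1 - 1e-14) ** bits_read
    print(f"P(no URE while reading 14 TB) = {p_clean:.2f}")   # ~0.33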

    The good news is that you don't have to worry about any of this. Because it is not true.

    Hard drives are not that unreliable in practice. On the contrary. They are remarkably reliable, I would say. Just look at the Backblaze drive statistics6.

    The prediction of the infamous ZDNET article has not come true. The URE specification for hard drives describes a worst-case scenario and seems to be more about marketing (a way to differentiate enterprise drives from consumer drives) than about reality.

    If the ZDNET article were true, I myself should have encountered many UREs, because of the many scrubs/patrol reads that have completed across my various RAID arrays.

    RAID has never stopped working and is still going strong.


    Scrubbing protects against the impact of bad sectors

    When a drive fails in a RAID array that can only tolerate one drive failure, it's very important that the remaining drives don't encounter any read errors. Because redundancy is lost, any read errors due to bad sectors could mean that the entire array is lost, or at least that some files are corrupted7.

    Every RAID array supports 'scrubbing'. It's a process where every sector of the RAID array is read, which in effect causes all sectors of all hard drives to be read.

    A scrub is a process to check for bad sectors in advance. If bad sectors are found on a hard drive, the drive can be replaced so it will not cause problems during a potential future rebuild. Replacing the drive itself will cause a rebuild, but assuming the scrub didn't find any other drives with bad sectors, that rebuild will be fine.

    A RAID array that doesn't undergo a regular scrub is a disaster waiting to happen. Bad sectors may be building up on one of the other drives, and when a drive actually fails, the entire array may be lost because of the undetected bad sectors on (one of) the remaining drives.

    If you want to store data in a reliable way on a RAID array, you need to ensure the array is scrubbed periodically. And even if you don't use RAID, I would recommend running a long SMART test once a month against every hard drive you own.

    By default, a Linux software RAID array is scrubbed once a week on Ubuntu. For details, look at the contents of /etc/cron.d/mdadm.

    If you use ZFS on Linux and run Ubuntu, your pool is automatically scrubbed on the second Sunday of every month.

    NAS vendors like Synology and QNAP have data scrubs enabled by default. Consult the manual of your particular NAS to adjust the frequency. I would recommend scrubbing at least once a month, at night.
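    On Linux software RAID you can also kick off a scrub by hand through the md sysfs interface (the same mechanism the cron job uses). A minimal sketch, assuming an array called md0 and root privileges:

    # Sketch: start a 'check' scrub on /dev/md0 and poll its progress.
    import time
    from pathlib import Path

    ARRAY = "md0"  # placeholder for your md device
    md = Path(f"/sys/block/{ARRAY}/md")

    (md / "sync_action").write_text("check\n")   # start the scrub

    while (md / "sync_action").read_text().strip() != "idle":
        progress = (md / "sync_completed").read_text().strip()
        if progress != "none":
            done, total = (int(x) for x in progress.split(" / "))
            print(f"scrub progress: {100 * done / total:.1f}%")
        time.sleep(60)

    mismatches = (md / "mismatch_cnt").read_text().strip()
    print(f"scrub finished, mismatch_cnt = {mismatches}")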

    Why is RAID 5 considered harmful?

    Frankly, I wonder that too.

    I notice a lot of people on the internet claiming that you should never use RAID 5, but I disagree. It all depends on the circumstances. Finding a balance between cost and risk is important.

    This page, dating back to 2003, advocated against using RAID 5, but it focuses on the enterprise environment, and even there I see its uses.

    For small RAID arrays with five or fewer drives, I think RAID 5 is still a great fit. Especially if you run a small 4-bay NAS, it would make total sense to use RAID 5. You get a nice balance between capacity and the cost of availability.

    It's not really recommended to create larger RAID 5 arrays. Compared to a single drive, a RAID array with eight drives is eight times more likely to experience a drive failure: you multiply the risk of a single drive failing by eight. With larger arrays, a double drive failure becomes a serious risk.
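    A quick illustration of how that risk scales with the number of drives. The 2% annual failure rate is an assumption picked for the example, not a statistic for any particular drive model:

    afr = 0.02  # assumed per-drive annual failure rate (illustrative only)

    for drives in (1, 4, 8, 20):
        p_any = 1 - (1 - afr) ** drives
        print(f"{drives:>2} drives: {p_any * 100:4.1f}% chance of at least one failure per year")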

    This is why it's really recommended to use RAID 6 for larger RAID arrays, because RAID 6 can tolerate two simultaneous drive failures. I've used RAID 6 in the past and I use RAIDZ2 (ZFS) as the basis for my current NAS.

    I also run an 8-drive RAID 5 in one of my servers that hosts not-so-important data that I still want to keep around and would rather not lose, but not at any cost. It's all about a balance between risk and cost. Please also read the postscript of this post; you will like it.

    It is true that during a rebuild, hard drives are strained more, but unless the RAID array is also in heavy use, the load on the drive isn't that big: the data is read sequentially, which is quite easy on the drives.

    RAID rebuild performance is mostly determined by the size of the drives and not by the number of drives in the RAID array3.

    Years ago I ran a 20-drive RAID 6 based on 1 TB drives and it did a rebuild in 5 hours. Recently I tested a rebuild of 8 drives in RAID 5 (using the same drives) and it also took almost 5 hours (4H45M).
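    That matches the simple model that a rebuild reads (or writes) each drive once, sequentially, so the wall-clock time is roughly capacity divided by the average sequential speed, regardless of the number of drives. A small sanity check, where the ~60 MB/s effective average for those 1 TB drives is my assumption:

    def rebuild_hours(capacity_tb: float, avg_speed_mb_s: float) -> float:
        # capacity in TB, average sequential speed in MB/s -> hours
        return capacity_tb * 1e6 / avg_speed_mb_s / 3600

    print(f"~{rebuild_hours(1, 60):.1f} hours")  # ~4.6 hours, close to the ~5 hours observed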

    The RAID write hole

    The RAID 5/6 'write hole' is often mentioned as something you should be afraid of.

    Parity-based RAID like RAID 5 and RAID 6 may be affected by an issue called the 'write hole'. To (over)simplify: if a computer experiences a sudden power failure, a write to the RAID array may be interrupted. This could cause a partial write to the RAID array, leaving it in an inconsistent state.

    As a side note, I would always recommend protecting your NAS with a UPS (battery backup) so your server can shut down in a clean way, before power is lost as the battery runs out.

    ZFS RAIDZ is not affected by the 'write hole' issue, because it writes data to a log first before writing it to the actual array4.

    Linux MDADM software RAID is also protected against the 'write hole' phenomenon by using a bitmap (which is enabled by default4).

    Hardware RAID is also protected against this by using a battery backup for the cache memory. The data in the cache memory is written to disk as soon as the computer is powered back on.

    Set up alerting if you care about your data

    I think a lot of RAID horror stories are due to the fact that people never notice any problems until it is too late, because they never set up any kind of alerting (by email or otherwise).

    Ideally, you would also make sure your system monitors the SMART data of your hard drives and alert when critical numbers start to rise (Reallocated Sector count and Current Pending Sector count).
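    As a sketch of what such a check could look like, the script below asks smartctl (from smartmontools) for the SMART attributes of a drive and warns when the two attributes mentioned above rise above zero. In practice, the smartd daemon from the same package can do this for you, including e-mail alerts; /dev/sda is a placeholder and the raw-value parsing is deliberately simple.

    import subprocess

    DEVICE = "/dev/sda"  # placeholder
    WATCHED = {"5": "Reallocated_Sector_Ct", "197": "Current_Pending_Sector"}

    output = subprocess.run(
        ["smartctl", "-A", DEVICE], capture_output=True, text=True
    ).stdout

    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] in WATCHED:
            raw_value = int(fields[-1])   # assumes a plain integer raw value
            if raw_value > 0:
                print(f"WARNING: {DEVICE} {WATCHED[fields[0]]} = {raw_value}")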

    This is also a moment of personal reflection. Do you run a RAID array? Did you set up alerting? Or could your RAID array be failing this very moment without you knowing?

    Anyway: I think a lack of proper alerting is a nice way of getting into trouble with RAID, but that's not on RAID. Any storage solution that is not monitored is just a disaster waiting to happen.

    Why people choose not to use RAID

    If a RAID array fails, all data is lost. Some people are not comfortable with this risk. They would rather lose the contents of some drives, but not all of them.

    Solutions like Unraid and SnapRAID use one or more dedicated hard drives to store redundant (parity) data. The other hard drives are formatted with your filesystem of choice and can be accessed as normal hard drives. Although I have no experience with this product, StableBit DrivePool seems to work in a similar manner.

    If you have six hard drives - five data drives and one parity drive - the loss of two drives would result in data loss, as with RAID 5. However, the data on the remaining four drives would still be intact. The data loss is limited to just one drive's worth of data.
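    The underlying idea is plain single-parity protection: the parity block is the XOR of the data blocks, so any one missing block can be reconstructed from the others. The sketch below is purely conceptual; Unraid, SnapRAID and RAID 5 are of course far more sophisticated in practice.

    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Five equally sized 'data drives' and one parity block.
    data_blocks = [b"block-A", b"block-B", b"block-C", b"block-D", b"block-E"]
    parity = reduce(xor, data_blocks)

    # Pretend the drive holding block-C died: rebuild it from the survivors.
    survivors = data_blocks[:2] + data_blocks[3:]
    rebuilt = reduce(xor, survivors + [parity])
    print(rebuilt)  # b'block-C'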

    The 'all-or-nothing' risk associated with regular software RAID is thus mitigated. I myself don't think those risks are that large, but Unraid and SnapRAID are popular products and I think they are reasonable alternatives.

    Mergerfs could also be an interesting option, although it only supports mirroring.

    Backups are still important

    Storing your data on any kind of RAID array is never a substitute for a backup.

    You should still copy your data to some other storage if you want to protect it. You may choose to only make a backup of a subset of all your data, but at least then you take an informed risk.

    Evaluation

    I hope I have demonstrated why RAID is still a valid and reliable option for data storage.

    Feel free to share your own views in the comments.

    P.S.

    I ran a scrub on my 8-disk RAID 5 array (based on 2 TB drives) as I was writing this article. My servers are only powered on when I need them, and while powered off, they can easily miss their periodic scrub window.

    So, to practice what I preach, I ran a scrub. Lo and behold, one of the drives was kicked out of my Linux software RAID array. Don't you love the irony?

    sd 0:0:4:0: [sde] tag#29 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:4:0: [sde] tag#29 Sense Key : Medium Error [current] 
    sd 0:0:4:0: [sde] tag#29 Add. Sense: Unrecovered read error
    sd 0:0:4:0: [sde] tag#29 CDB: Read(10) 28 00 9f 42 9e 30 00 04 00 00
    print_req_error: critical medium error, dev sde, sector 2671943216
    

    Followed by:

    md/raid:md6: Disk failure on sde, disabling device.
    md/raid:md6: Operation continuing on 7 devices.
    

    The drive was clearly kicked out because it encountered bad sectors. A quick check of the SMART data revealed that more than 300 sectors were already remapped, but the data stored in them could not be recovered, causing read errors.

    This drive is clearly done, although it was still operational.

    After swapping this defective drive with a spare replacement, I started the rebuild process, which took four hours and twenty minutes. My RAID 5 array has been rebuilt and is now perfectly fine.

    If an event like this doesn't drive the point home that scrubs are important, I don't know what will.


    1. Sometimes I read what hardware people use for storage and I think about this quote by John Glenn: ‘I felt exactly how you would feel if you were getting ready to launch and knew you were sitting on top of 2 million parts — all built by the lowest bidder on a government contract.’ 

    2. ZFS works differently, it only reads the sectors containing actual data. 

    3. ZFS rebuilds or 'resilvers' become slower as you add more drives to a RAIDZ(2/3) VDEV, it seems. I'm not sure this is still the case with more recent ZFS versions. 

    4. Both ZFS and MDADM will take a performance hit by using a log/bitmap. Both solutions support using an SSD to accelerate the log/bitmap to remove this performance hit. Most home users probably won't need this. 

    5. The smallest unit of storage on a drive, often 4K, or 512 bytes for older, smaller drives. 

    6. Those hard drives live in a datacenter with a conditioned environment, which you probably don't have at home. But as long as you keep the temperature of your hard drives within limits, I don't think it matters that much. 

    7. ZFS is both a RAID solution and a filesystem in one and can tell you exactly which file is affected. A nice feature. 

    Tagged as : storage RAID
  3. What Home NAS Builders Should Understand About Silent Data Corruption

    Thu 23 April 2020

    Introduction

    When it comes to dealing with storage in a DIY NAS context, two important topics come up:

    1. Unrecoverable read errors (UREs) or what old people like me call 'bad sectors'
    2. Silent data corruption (data corruption unnoticed by the storage layers)

    I get a strong impression that people tend to confuse these two concepts, which often come up when people evaluate their options for buying or building their own do-it-yourself NAS.

    In this article, I want to make a clear distinction between the two and assess their risk. This may help you evaluate these risks and make an informed decision.

    Unrecoverable read errors (due to bad sectors)

    When a hard drive hits a 'bad sector', it means that it can't read the contents of that particular sector anymore.

    If the hard drive is unable to read that data even after multiple attempts, the operating system will return an Unrecoverable Read Error (URE).

    This is an example (on Linux) of a drive experiencing read errors, as pulled from /var/log/syslog (culled a bit for readability):

    sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:0:0: [sda] tag#19 Sense Key : Medium Error [current] 
    sd 0:0:0:0: [sda] tag#19 Add. Sense: Unrecovered read error
    sd 0:0:0:0: [sda] tag#19 CDB: Read(10) 28 00 02 1c 8c 00 00 00 98 00
    blk_update_request: critical medium error, dev sda, sector 35425280 op 0x0:(READ)
    sd 0:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:0:0: [sda] tag#16 Sense Key : Medium Error [current] 
    sd 0:0:0:0: [sda] tag#16 Add. Sense: Unrecovered read error
    sd 0:0:0:0: [sda] tag#16 CDB: Read(10) 28 00 02 1c 8d 00 00 00 88 00
    blk_update_request: critical medium error, dev sda, sector 35425536 op 0x0:(READ)
    

    If a sector cannot be read, the data stored in that sector is lost. And in my experience, if you encounter a single bad sector, soon, there will be more. So if this happens, it's time to replace the hard drive.

    We use RAID to protect against drive failure. RAID (no matter the implementation) can also deal with 'partial failure', such as a drive encountering bad sectors.

    In a RAID array, a drive encountering unrecoverable read errors is just kicked out of the array, so it doesn't 'hang' or 'stall' the entire RAID array.

    Please note that this behaviour does depend on the particular RAID solution of choice1. The point, though, is that bad sectors or UREs are a common event, and RAID solutions can deal with them properly.

    The real problem with bad sectors (resulting in UREs) is that they can remain undiscovered until it is too late. So to uncover them in an early state, it's very important to run regular data scrubs. I've written an article specifically about this topic.

    Silent data corruption

    An unrecoverable read error means that we can't read (a portion of) a file. Although that is unfortunate - because we had better have an intact backup of that file - we are also fortunate.

    Why are we fortunate?

    We are fortunate because the storage system - the hard drive, and in turn the operating system - reported an error. We were able to diagnose the problem and take action.

    But it is possible that bits and bytes get mangled without your hard drive, SATA controller or operating system noticing. Somewhere, somehow, a bit is read or transmitted as a 1 where it should have been a 0.

    This is really bad, because the data corruption is undetected. It is 'silent'; there is no notification.

    Imagine what happens: the corrupted file is happily backed up by your backup software, because it's unaware that anything is wrong. And by the time you discover the data corruption, the original pristine file is no longer part of the backup (rotated out). You are left with a lot of backups of a corrupted file. We encounter data loss.

    This is one of the scariest kinds of data loss, because it's very difficult to detect. You would have to constantly calculate checksums of your files and verify they are still OK.
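    The manual, file-level version of that idea is a checksum manifest: compute a hash of every file once, store the result, and re-verify it periodically. A conceptual sketch (the file names and manifest location are arbitrary):

    import hashlib
    import json
    from pathlib import Path

    MANIFEST = Path("checksums.json")

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(root: Path) -> None:
        manifest = {str(p): sha256(p) for p in root.rglob("*") if p.is_file()}
        MANIFEST.write_text(json.dumps(manifest, indent=2))

    def verify_manifest() -> None:
        for name, digest in json.loads(MANIFEST.read_text()).items():
            if sha256(Path(name)) != digest:
                print(f"corruption (or a legitimate change): {name}")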

    And that's - although rather simplified - exactly what ZFS does (amongst many other things). ZFS uses checksums at the block level and thus verifies on every read whether the data contained in the block is still valid. ZFS is one of the few file systems that has this very powerful feature (BTRFS is another example).

    Regular RAID arrays (be they hardware-based or software-based) cannot detect silent data corruption (although it could be possible with RAID 6). So it must be clear that ZFS is capable of protecting against a risk 'regular' RAID cannot cope with.

    Is silent data corruption a significant threat for home DIY NAS builders?

    Although silent data corruption is a very scary threat, from what I can tell there is no significant independent evidence that the risk of silent data corruption is so high that the average home DIY NAS builder should take this risk into account2.

    Maybe I'm wrong, but I think many people mistakenly confuse UREs or unrecoverable read errors (caused by bad sectors) with silent data corruption. And I think that's wrong, because there's nothing silent about an unrecoverable read error.

    The truth is that hard drives are in fact very reliable when it comes to silent data corruption, because they make heavy use of error detection and correction algorithms. A significant portion of the raw capacity of a hard drive is sacrificed to store redundant information that aids in detecting and correcting data corruption. According to Wikipedia, hard drives used Reed-Solomon error correction in the past, and more modern drives use LDPC.

    These error correction codes assure data integrity. Although 'soft' read errors may occur, there is enough additional redundant information stored on the hard drive to detect errors and even reconstruct the data (to some extent). Your hard drive handles all of this by itself; it's part of normal operation.
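    The detection half of that story can be illustrated with something as simple as a CRC: store a checksum next to the data, and a flipped bit no longer matches. This is only a conceptual sketch; real drives use much stronger codes (Reed-Solomon, LDPC) that can also correct a limited number of bad bits, not just detect them.

    import zlib

    # A made-up 512-byte 'sector' and its stored checksum.
    sector = bytearray(b"some data stored in a 512-byte sector" + bytes(475))
    stored_crc = zlib.crc32(sector)

    # Simulate a single flipped bit somewhere in the sector.
    sector[100] ^= 0b00000001

    if zlib.crc32(sector) != stored_crc:
        print("read error detected: checksum mismatch")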

    So this is my point: it's important to understand that there is a lot of protection against silent data corruption in a hard drive. The risk of silent data corruption is therefore small3.

    Sometimes the read data is so garbled that even the error correction codes cannot reconstruct the data as it was originally stored, and that's what we then experience as an unrecoverable read error. But the disk notices! And it will report it! This is not silent at all!

    To really create silent data corruption, something very special needs to happen. And to be very clear: such events do happen. But they are very rare.

    Somehow, a bit must flip and this event is not detected by the error correction algorithm. Maybe the bit flipped in the hard drive cache memory when it was read from the drive. Maybe it flipped during transport over the SATA cable.

    But it's fun to realise that the SATA protocol also has error detection embedded in the protocol for reliable data transmission. It's error detection and correction all the way down.

    The risk that silent data corruption happens is thus very small, especially for home users.

    Again, make no mistake: the risk is real, and larger-scale storage solutions (SANs / storage arrays) with hundreds, thousands or tens of thousands of drives really do have to take the risk of silent data corruption into account. At scale, even very small risks become a certainty.

    Enterprise storage solutions often employ their own proprietary solutions to protect against silent data corruption. Although it depends on the particular solution4, it's often part of the storage array. ZFS was revolutionary because it put data integrity checking in the filesystem itself.

    So if you think the risk of silent data corruption is still high enough that you should protect yourself against it, I would recommend considering ECC memory to protect against corrupted data in memory. To be frank: I consider non-ECC memory a more likely cause of silent data corruption than the storage subsystem, which already employs all these error detection and correction algorithms. Non-ECC memory is totally unprotected.

    Anecdote: I myself run a 24-drive NAS based on ZFS and it has been rock-solid for 6 years straight.

    [photo: my 24-drive ZFS NAS]

    From time to time, I do run disk 'scrubs', which can take quite some time. Although I have many terabytes of data protected by ZFS, not a single instance of silent data corruption has been detected. And I have performed so many scrubs that I've read more than a petabyte worth of data.

    Anecdote: somebody made a mistake and used the wrong type of cable to connect the hard drives to the HBA controller card. This caused actual silent data corruption. Because that person was running ZFS, it was detected, so ZFS saved his data. This is an example where ZFS did protect someone against silent data corruption.

    Evaluation

    I hope that the difference between unrecoverable read errors and silent data corruption is clear and that we should not confuse the two. They have different risk profiles associated with them.

    Furthermore, I have argued that silent data corruption is real and a serious issue at scale, and that at scale it is dealt with accordingly.

    However, I've also argued that unless you are a home user running a small datacenter inside your basement, the risk of silent data corruption is so small that it is reasonable to accept the risk as a DIY NAS builder and not seek specific protection against it.

    The decision is up to you. If you want to go with ZFS and protect against silent data corruption, you should also be aware of and accept the cost of ZFS. I myself have accepted that cost for my own NAS, but it's OK if you don't. If you care about silent data corruption that much, please also consider using ECC memory.

    But in my opinion, you are not taking an unreasonable risk if you choose to go with Unraid, SnapRAID, Linux kernel RAID, Windows Storage Spaces or maybe other options in the same vein. I would say that this is reasonable and up to you.

    Remember: the famous vendors of home user NAS boxes all seem to use regular Linux kernel RAID under the hood. And they seem to think that's fine.

    In the end, what really matters is a solution that suits your needs and also fits your budget and level of expertise. Can you fix problems when something goes wrong?


    1. I've noticed while testing with this particular drive that the drive was not kicked out of the array, and it just kept trying to read, grinding the Linux software RAID array to a halt. Removing the drive from the array fixed this. There is a 'failfast' option that only works with RAID1 or RAID10. 

    2. I don't want to suggest in any way that it would be wrong to take silent data corruption into account, but just to say I think it's not mandatory to really fret over it. 

    3. The most significant risk is that enterprise grade hard drives use on-board ECC cache memory, whereas consumer drives use non-ECC cache memory. So silently corrupted data in the cache memory of the drive could be a risk. 

    4. Storage vendors often choose to reformat hard drives with larger sector sizes5. Those larger sectors then also incorporate additional checksum data to better protect against data corruption or unrecoverable read errors. 

    5. https://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/safeguarding-data-from-corruption-technology-paper-tp621us.pdf 

    Tagged as : Storage
