How important is availability of an information system to you and your
company? What are the costs of, let's say, a couple of hours of downtime and
maybe the loss of all the work since the last backup?
Depending on the information system, the impact can be quite grave, I presume.
So what are the biggest risks regarding the availability of your systems?
Human error is probably number one. Number two might be the hardware.
One of the most unreliable components of the hardware on which your precious
information systems run is the good old hard drive. There are not two but
three certainties in life: death, taxes and the fact that sooner or later hard
drives will fail.
So back in the eighties (a patent had already been awarded back in 1978) some
smart people invented RAID. Using RAID, your information system can tolerate
a disk failure and still continue to operate.
There are many different flavors of RAID, so-called RAID levels. One of the
most popular RAID levels is RAID 1: two disks acting as one. If one fails,
the other takes over. For performance, you can stripe several of these mirrored
pairs together, which gives you a RAID 10 array. However, 50% of your storage
space is wasted, because for every n of storage you need (n / c) * 2 disks,
where c represents the capacity of a single drive.
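To make the RAID 10 arithmetic concrete, here is a minimal Python sketch of the (n / c) * 2 formula. The function names are hypothetical, and the sketch assumes you round up to whole disks (you cannot buy a fraction of a drive), which the formula itself leaves implicit.

```python
import math

def raid10_disks(storage_tb: float, disk_tb: float) -> int:
    """Disks needed for RAID 10: every data disk gets a mirror, so (n / c) * 2."""
    data_disks = math.ceil(storage_tb / disk_tb)  # round up to whole disks
    return data_disks * 2

def raid10_usable_fraction(disks: int) -> float:
    """Half of the raw capacity holds the mirror copies."""
    return (disks // 2) / disks

print(raid10_disks(10, 1))         # 20
print(raid10_usable_fraction(20))  # 0.5
```

So 10 TB of usable space on 1 TB drives costs 20 drives, half of which only hold copies.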
RAID level 5 is a more efficient solution. Using this RAID level, only the
capacity of one disk is lost in order to provide redundancy. So for every n of
storage, you need (n / c) + 1 disks. It is easy to see that for larger
arrays with more disks, RAID 5 is much more efficient. The downside of RAID 5
is mainly its write performance compared to RAID 10. However, if that
performance is sufficient for the workload at hand, this is often not an issue.
Hence the popularity of RAID 5.
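A rough side-by-side of the two formulas shows how the gap widens as the array grows. This is a Python sketch with hypothetical helper names, using the same round-up-to-whole-disks assumption as before.

```python
import math

def raid10_disks(storage_tb: float, disk_tb: float) -> int:
    # Mirroring: every data disk needs a partner, so (n / c) * 2.
    return math.ceil(storage_tb / disk_tb) * 2

def raid5_disks(storage_tb: float, disk_tb: float) -> int:
    # Parity: one disk's worth of capacity is lost, so (n / c) + 1.
    return math.ceil(storage_tb / disk_tb) + 1

def compare(storage_tb: float, disk_tb: float = 1.0) -> tuple[int, int]:
    """Return (RAID 5 disks, RAID 10 disks) for the requested usable storage."""
    return raid5_disks(storage_tb, disk_tb), raid10_disks(storage_tb, disk_tb)

for tb in (2, 4, 10):
    r5, r10 = compare(tb)
    print(f"{tb:>2} TB usable: RAID 5 needs {r5:>2} disks, RAID 10 needs {r10:>2}")
```

With 1 TB disks, 2 TB usable needs 3 versus 4 disks, but 10 TB usable needs 11 versus 20.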
This story is all about risk vs. costs. And there is a risk with RAID 1 and 5
that cannot be neglected and should be pointed out. If a drive fails,
redundancy is lost. From that moment until the faulty drive has been replaced
and the array rebuilt, you run the risk of losing the entire RAID array, and
all its data, if another disk fails.
How big is that risk? Well, that is the weakest point of this article: I
honestly don't know. There is some anecdotal "evidence" that it
happens occasionally. And that is not surprising: rebuilding an array puts
extra stress on all the disks involved, which might prove fatal for a second
drive.
Today, RAID arrays of 10+ disks are not a rarity. With that number of drives,
it wouldn't be surprising if a second drive failed during recovery. The
reasoning is simple: with a 10-disk array, the chance that a disk fails is
roughly twice that of a 5-disk array.
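The "roughly twice" figure follows from basic probability. The sketch below assumes each disk fails independently with the same probability p during a given time window; p = 0.01 is a made-up illustration value, not a measured failure rate. Independence is an optimistic simplification: the extra rebuild stress described earlier makes failures more correlated in practice.

```python
def p_any_failure(disks: int, p_disk: float) -> float:
    """Chance that at least one of `disks` drives fails, assuming independent
    failures with the same per-disk probability p_disk."""
    return 1 - (1 - p_disk) ** disks

p = 0.01  # hypothetical per-disk failure probability for one time window
small = p_any_failure(5, p)
large = p_any_failure(10, p)
print(f"5 disks: {small:.4f}, 10 disks: {large:.4f}, ratio: {large / small:.2f}")
# prints "5 disks: 0.0490, 10 disks: 0.0956, ratio: 1.95"
```

Algebraically the ratio is 1 + (1 - p)^5, which is just under 2 when p is small, hence "roughly twice".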
The most common solution is to fall back to RAID 10. RAID 10 consists of
mirrored disk pairs striped together into one big virtual disk. RAID 10 can
tolerate the loss of up to 50% of its drives, provided that only one member of
every pair fails. The caveat is obvious: if a disk fails and the other drive of
that pair fails during recovery, the whole array is lost. However, compared
with RAID 5, the risk is reduced. In degraded (non-redundant) mode, any
additional drive failure will destroy a RAID 5 array. RAID 10 can tolerate
additional drive failures as long as the failing drive is not the surviving
member of the pair that has already lost one.
So, although the risk that a second drive failure might destroy your array is
greatly reduced using RAID 10 (compared to RAID 5), there is still a risk that
the array is lost if the 'wrong' drive fails.
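The difference between RAID 5 and RAID 10 in degraded mode can be expressed as a simple probability. This sketch assumes exactly one further drive fails before the rebuild finishes, and that every surviving drive is equally likely to be the one; the function name is hypothetical.

```python
from fractions import Fraction

def p_loss_on_next_failure(level: str, disks: int) -> Fraction:
    """Probability that a degraded array (one drive already failed) is lost
    when one more drive, picked uniformly among the survivors, fails."""
    survivors = disks - 1
    if level == "raid5":
        # No redundancy left: whichever drive fails next, the array is lost.
        return Fraction(1)
    if level == "raid10":
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        # Only the surviving partner of the degraded pair is fatal.
        return Fraction(1, survivors)
    raise ValueError(f"unsupported level: {level}")

for d in (4, 10, 20):
    print(d, p_loss_on_next_failure("raid5", d), p_loss_on_next_failure("raid10", d))
```

For a 10-disk array, a second failure is always fatal for RAID 5 but, under these assumptions, fatal for RAID 10 only one time in nine: the risk is reduced, not removed.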
So the solution should be that redundancy is not lost if a single
drive failure occurs. RAID 6 provides that solution. RAID 6 is in nature
identical to RAID 5. However, an additional drive is sacrificed for additional
redundancy. So for every n of storage, you need (n / c) + 2 disks. If you
need 10 TB of storage using 1 TB disks, you need 12 disks. If a disk fails,
the array is still redundant. Even a second drive can fail and the array will
still continue to operate. I think that the chance that a third drive would
fail is so low that it is an acceptable risk.
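A minimal sketch of the RAID 6 arithmetic and its failure tolerance, modeling RAID 5 and RAID 6 as single- and double-parity layouts. The dictionary and function names are hypothetical, and the disk count again rounds up to whole drives.

```python
import math

# Disks' worth of capacity spent on parity: RAID 6 sacrifices one more than RAID 5.
PARITY_DISKS = {"raid5": 1, "raid6": 2}

def disks_needed(level: str, storage_tb: float, disk_tb: float) -> int:
    """(n / c) + 1 for RAID 5 and (n / c) + 2 for RAID 6, rounded up to whole disks."""
    return math.ceil(storage_tb / disk_tb) + PARITY_DISKS[level]

def survives(level: str, failed: int) -> bool:
    """The array keeps its data as long as no more drives fail than it has parity for."""
    return failed <= PARITY_DISKS[level]

def still_redundant(level: str, failed: int) -> bool:
    """Redundancy remains while at least one disk's worth of parity is left over."""
    return failed < PARITY_DISKS[level]

print(disks_needed("raid6", 10, 1))                               # 12
print(still_redundant("raid5", 1), still_redundant("raid6", 1))  # False True
print(survives("raid6", 2), survives("raid6", 3))                 # True False
```

The first line reproduces the example from the text: 10 TB on 1 TB disks needs 12 drives.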
For smaller arrays, the risk of a double drive failure might not be high enough
to justify RAID 6, but with larger arrays (more drives) RAID 6 might become the
better choice.
So there you have it. With current costs of hard drives and the wide support
for RAID 6, it is an option that should be taken into account when designing
the hardware platform for an information system.
Afterthought: this article is mainly about considering RAID 6 instead of RAID
5. RAID 5 or 6 may often not be a suitable solution if I/O (input/output)
performance is a concern. Please note that when running in degraded mode
(after a drive failure has occurred), the performance penalty for RAID 5 and
RAID 6 can be severe (possibly as much as 80%). RAID 10 suffers far less in
that regard.