1. The Impact of the MDADM Bitmap on RAID Performance

    Mon 06 April 2020


    I'm aware that most people with intensive storage workloads won't run those workloads on hard drives anymore; that ship sailed a long time ago. SSDs have taken their place (or 'the cloud' has).

    For those few left who do use hard drives in Linux software RAID setups and run workloads that generate a lot of random IOPS, this may still be relevant.

    I'm not sure how much a bitmap affects MDADM software RAID arrays based on solid state drives as I have not tested them.

    The purpose of the bitmap

    By default, when you create a new software RAID array with MDADM, a bitmap is also configured. The purpose of the bitmap is to speed up recovery of your RAID array in case the array gets out of sync.

    A bitmap won't speed up recovery from a drive failure, but it does help when the array gets out of sync, for example after a hard reset or a power failure during write operations.

    The performance impact

    During some benchmarking of various RAID arrays, I noticed very poor random write IOPS performance. No matter the test conditions, I got the random write performance of a single drive, even though the RAID array should perform better.

    Then I noticed that the array was configured with a bitmap. Just for testing purposes, I removed the bitmap altogether with:

    mdadm --grow --bitmap=none /dev/md0

    Random write IOPS figures improved immediately. This resource explains why:

    If  the  word internal is given, then the bitmap is stored with the metadata
    on the array, and so is replicated on all devices.

    So when you write data to your RAID array, the bitmap is also constantly updated. Since that bitmap lives on every drive in the array, it's easy to see why this hurts random write IOPS so badly.
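    A quick way to see whether an array carries a bitmap at all is to look at /proc/mdstat, which prints a "bitmap:" line per array that has one. A minimal sketch, assuming an array such as /dev/md0 may or may not exist on the machine:

    ```shell
    # Check for write-intent bitmaps on any md arrays on this machine.
    # Falls back to a notice on machines without md arrays, so this is
    # safe to run anywhere.
    if [ -r /proc/mdstat ] && grep -q '^md' /proc/mdstat; then
      # An array with an internal bitmap shows a line like:
      #   bitmap: 0/15 pages [0KB], 65536KB chunk
      grep 'bitmap:' /proc/mdstat || echo "no bitmap configured"
      status="checked /proc/mdstat"
    else
      status="no md arrays present"
    fi
    echo "$status"
    ```

    `mdadm --detail /dev/md0` reports the same information in its output if you prefer per-array detail.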

    Some examples of the performance impact

    Bitmap disabled

    An example of a RAID 5 array with 8 x 7200 RPM drives.


    Another example, with 10,000 RPM drives:


    Bitmap enabled (internal)

    We observe significantly lower random write IOPS performance overall:


    The same is true for the 10,000 RPM drives.


    External bitmap

    You could keep the bitmap and still get great random write IOPS by putting the bitmap on a separate SSD. Since my boot device is an SSD, I tested this option like this:

    mdadm --grow --bitmap=/raidbitmap /dev/md0

    I noticed excellent random write IOPS with this external bitmap, similar to running without a bitmap at all. An external bitmap has its own risks and caveats, so make sure it really fits your needs.

    Note: external bitmaps are only known to work on ext2  and  ext3. 
    Storing bitmap files on other filesystems may result in serious problems.
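    The full switch-over can be sketched in a few commands, assuming the article's /dev/md0 and /raidbitmap names. Since these commands rewrite array metadata, the sketch below only runs them when RUN_FOR_REAL=1 is set:

    ```shell
    # Move the write-intent bitmap from internal to an external file on the
    # boot SSD. Destructive to array metadata, hence the dry-run guard.
    if [ "${RUN_FOR_REAL:-0}" = "1" ]; then
      mdadm --grow --bitmap=none /dev/md0          # drop the internal bitmap first
      mdadm --grow --bitmap=/raidbitmap /dev/md0   # re-add it as an external file
      mdadm --detail /dev/md0 | grep -i bitmap     # verify the new location
      mode="applied"
    else
      mode="dry-run"
      echo "$mode: set RUN_FOR_REAL=1 to apply"
    fi
    ```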


    For home users who build DIY NAS servers and who do run MDADM RAID arrays, I would recommend leaving the bitmap enabled. The impact on sequential file transfers is negligible and the benefit of a quick RAID resync is very obvious.

    Only if you have a workload that would cause a ton of random writes on your storage server would I consider disabling the bitmap. An example of such a use case would be running virtual machines with a heavy write workload.

    Update on bitmap-chunks

    Based on feedback in the comments, I've performed a benchmark on a new RAID 5 array, setting the --bitmap-chunk option to 128M (the default is 64M).

    The results seem to be significantly worse than the default for random write IOPS performance.


    Tagged as : mdadm
  2. Linux RAID Level and Chunk Size: The Benchmarks

    Sun 23 May 2010


    When configuring a Linux RAID array, a chunk size needs to be chosen. But what is the chunk size?

    When you write data to a RAID array that implements striping (level 0, 5, 6, 10 and so on), the data sent to the array is broken down into pieces, each of which is written to a single drive in the array. This is how striping improves performance: the data is written to the drives in parallel.

    The chunk size determines how large such a piece will be for a single drive. For example: if you choose a chunk size of 64 KB, a 256 KB file will use four chunks. Assuming that you have set up a four-drive RAID 0 array, the four chunks are each written to a separate drive, which is exactly what we want.

    This also makes clear that choosing the wrong chunk size can hurt performance. If the chunk size were 256 KB, the file would be written to a single drive, so the RAID striping wouldn't provide any benefit, unless many such files were written to the array, in which case the different drives would handle different files.
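    The arithmetic above can be sketched in a few lines of shell: a 256 KB file split at two different chunk sizes, assuming the four-drive RAID 0 array from the example.

    ```shell
    # How many chunks a 256 KB file occupies, and how many of the four
    # drives in a RAID 0 array those chunks land on, per chunk size.
    file_kb=256
    ndrives=4
    for chunk_kb in 64 256; do
      chunks=$(( (file_kb + chunk_kb - 1) / chunk_kb ))  # ceiling division
      used=$(( chunks < ndrives ? chunks : ndrives ))    # drives actually hit
      echo "chunk=${chunk_kb}K: ${chunks} chunk(s) on ${used} drive(s)"
    done
    ```

    With 64 KB chunks the file spreads over all four drives; at 256 KB it lands on a single drive.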

    In this article, I will provide some benchmarks that focus on sequential read and write performance. Thus, these benchmarks won't be of much importance if the array must sustain a random IO workload and needs high random IOPS.

    Test setup

    All benchmarks are performed with a consumer grade system consisting of these parts:

    Processor: AMD Athlon X2 BE-2300, running at 1.9 GHz.

    RAM: 2 GB

    Disks: SAMSUNG HD501LJ (500GB, 7200 RPM)

    SATA controller: Highpoint RocketRaid 2320 (non-raid mode)

    Tests are performed with an array of 4 and an array of 6 drives.

    • All drives are attached to the Highpoint controller. The controller is not used for RAID, only to supply sufficient SATA ports. Linux software RAID with mdadm is used.

    • A single drive provides a read speed of 85 MB/s and a write speed of 88 MB/s

    • The RAID levels 0, 5, 6 and 10 are tested.

    • Chunk sizes from 4K up to 1024K are tested.

    • XFS is used as the test file system.

    • Data is read from/written to a 10 GB file.

    • The theoretical max throughput of a 4 drive array is 340 MB/s. A 6 drive array should be able to sustain 510 MB/s.
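    Those theoretical maxima follow directly from the single-drive read speed, since reads can be striped over all drives at once:

    ```shell
    # Theoretical sequential maximum: every drive streams ~85 MB/s in parallel.
    per_drive=85
    four_max=$((4 * per_drive))
    six_max=$((6 * per_drive))
    echo "4 drives: ${four_max} MB/s, 6 drives: ${six_max} MB/s"
    ```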

    About the data:

    • All tests have been performed by a Bash shell script that accumulated all data; there was no human intervention when acquiring data.

    • All values are based on the average of five runs. After each run, the RAID array is destroyed, re-created and formatted.

    • For every RAID level + chunk size, five tests are performed and averaged.

    • Data transfer speed is measured using the 'dd' utility with the option bs=1M.
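    One benchmark iteration can be sketched as follows. The mount point is an assumption, and since the test writes and reads 10 GB, the heavy I/O only runs when RUN_FOR_REAL=1 is set:

    ```shell
    # Write then read back a 10 GB file with dd at bs=1M, as in the benchmarks.
    testfile="/mnt/raidtest/bigfile"   # assumed mount point of the test array
    if [ "${RUN_FOR_REAL:-0}" = "1" ]; then
      dd if=/dev/zero of="$testfile" bs=1M count=10240 conv=fdatasync  # write test
      echo 3 > /proc/sys/vm/drop_caches    # drop the page cache (needs root)
      dd if="$testfile" of=/dev/null bs=1M                             # read test
      mode="ran"
    else
      mode="dry-run"
      echo "$mode: set RUN_FOR_REAL=1 to run the 10 GB test"
    fi
    ```

    The conv=fdatasync option makes dd flush data to disk before reporting, so the write figure isn't inflated by the page cache.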

    Test results

    Results of the tests performed with four drives:

    Test results with six drives:

    Analysis and conclusion

    Based on the test results, several observations can be made. The first one is that RAID levels with parity, such as RAID 5 and 6, seem to favor a smaller chunk size of 64 KB.

    The RAID levels that only perform striping, such as RAID 0 and 10, prefer a larger chunk size, with an optimum of 256 KB or even 512 KB.

    It is also noteworthy that RAID 5 and RAID 6 performance don't differ that much.

    Furthermore, the theoretical transfer rates that should be achievable based on the performance of a single drive are not met. The cause is unknown to me, but overhead and the relatively weak CPU may play a part in this, and the XFS file system may also be a factor. Overall, software RAID does not seem to scale well on this system. Since my big storage monster (as seen on the left) is able to perform way better, I suspect that it is a hardware issue: the M2A-VM consumer-grade motherboard may simply not be able to go any faster.

  3. 20 Disk 18 TB RAID 6 Storage Based on Debian Linux

    Tue 21 July 2009

    This system is no longer operational and has been decommissioned (2017)

    This is my NAS storage server based on Debian Linux. It uses software RAID and 20 one-terabyte hard drives. It provides a total usable storage capacity of 18 terabytes in a single RAID 6 array.
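    The 18 TB figure follows from RAID 6 reserving two drives' worth of capacity for parity:

    ```shell
    # RAID 6 usable capacity: total drives minus two parity drives' worth.
    ndrives=20
    tb_per_drive=1
    usable_tb=$(( (ndrives - 2) * tb_per_drive ))
    echo "${usable_tb} TB usable"
    ```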

    One of the remarkable side effects of using 20 drives within a single array is the read performance of over one gigabyte per second.


    Case: Norco RPC-4020
    Processor: Core 2 Duo E7400 @ 2.8 GHz
    RAM: 4 GB
    Motherboard: Asus P5Q-EM DO
    LAN: Intel Gigabit
    PSU: Corsair CMPSU-750HX 750 Watt (replaced a Coolermaster 600 Watt unit that died)
    Controller: HighPoint RocketRAID 2340 (16 ports) and on-board controller (6 ports)
    Disks: 20 x Samsung Spinpoint F1 (1 TB) and 2 x Fujitsu MHY2060BH (60 GB)
    Arrays: boot: 2 x 60 GB RAID 1, storage: 20 x 1 TB RAID 6
    RAID setup: Linux software RAID using MDADM
    Read performance: 1.1 GB/s (yes, this is correct, not a typo)
    Write performance: 450 MB/s (was 350 MB/s, suddenly faster after a Debian update)
    OS: Debian Linux Squeeze 64-bit
    Filesystem: XFS (can handle > 16 TB partitions)
    Rebuild time: about 5 hours
    UPS: Back-UPS RS 1200 LCD using Apcupsd
    Idle power usage: about 140 Watt


