Linux RAID Overview
In this KB, we are going to show you how to create a software RAID device in Linux. Note that this KB covers software RAID only; wherever we use the term "RAID device", we mean software RAID, not hardware RAID.
RAID stands for Redundant Array of Inexpensive Disks (also expanded as Redundant Array of Independent Disks). RAID was originally introduced as a way of gaining higher availability and redundancy, so that the loss of a single volume or disk is no longer a single point of failure. Since its introduction, RAID has grown in popularity and today is widely used throughout the enterprise. For example, most SAN (storage area network) technologies use a combination of hardware and software RAID techniques to present volumes or LUNs to hosts.
RAID devices provide a means of achieving higher availability by "mirroring" or copying data to multiple disks. When data is written to the disk, RAID instructs the system to write multiple copies of the data on different disks to prevent data loss should a drive fail.
RAID can not only increase your disks' availability, it can also achieve higher I/O throughput. "Striping" allows you to overcome the I/O limitations of a single disk. A RAID stripe aggregates the bandwidth of several disks by writing blocks of data horizontally across the array, spreading the I/O evenly amongst the members of the array.
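To make the striping idea concrete, here is a minimal Python sketch of how a striped array might place consecutive chunks on its member disks in round-robin order. The function name `locate_chunk` and the four-disk example are illustrative assumptions; this is not how the Linux MD driver is implemented, and a real array's layout also depends on its chunk size.

```python
from typing import List, Tuple


def locate_chunk(chunk_index: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical chunk number to (disk index, stripe row).

    Simple round-robin placement: chunk 0 goes to disk 0, chunk 1 to
    disk 1, and so on, wrapping back to disk 0 on the next stripe row.
    """
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    disk = chunk_index % num_disks   # which member disk holds the chunk
    row = chunk_index // num_disks   # which stripe (row) on that disk
    return disk, row


# Hypothetical example: eight consecutive chunks on a four-disk stripe.
layout: List[Tuple[int, int]] = [locate_chunk(i, 4) for i in range(8)]
for chunk, (disk, row) in enumerate(layout):
    print(f"chunk {chunk} -> disk {disk}, stripe row {row}")
```

Because neighbouring chunks land on different disks, a large sequential read or write keeps all members busy at the same time, which is where the aggregated bandwidth comes from.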
Depending on your particular need, you may opt for speed, redundancy or a combination. The different levels of RAID provide varying levels of speed and redundancy, so you can choose the setting that is right for you. Software RAID in Linux supports 7 different modes:
- Linear - Disks are appended to one another to form a larger device. There is no striping or redundancy in linear mode. The system simply treats the disks like an aggregate of disks, which spills over from one disk to the next as they fill up.
- RAID 0 - Is striping mode. In a RAID 0 configuration, reads and writes are done in parallel to all the members of the array. Blocks of data written to the array are broken down into smaller chunks and written simultaneously to all members of the array. The same is true of reads (but in reverse): chunks of data are fetched simultaneously from the different members of the array and assembled to build the block. RAID 0 provides the greatest I/O throughput but lacks redundancy. Losing a single drive leaves the data on the whole array unusable, because every file is spread across all members, so use with caution.
- RAID 1 - Is mirroring mode. In a RAID 1 configuration, data is mirrored between two or more volumes. RAID 1 provides the highest level of availability offered in RAID; however, because every write must go to every mirror, it is also the slowest for writes. Because each drive contains a full copy of the data, you can lose multiple drives in the array (up to N-1 failures) and continue to operate without a hiccup. It should go without saying that RAID 1 also uses a significant amount of storage, since a full copy of the data is stored on every member. It's also important to note that RAID 1 has very fast read times due to its ability to load-balance read requests amongst the members of the array.
- RAID 4 - Is striping with parity. In a RAID 4 configuration, data is striped across the members of the array (similar to RAID 0); however, parity information is also calculated and stored on a dedicated parity disk, providing the ability to recover from the loss of a single disk. The redundancy of RAID 4 costs one disk of storage: since one disk is used for parity, an array of N disks has the usable capacity of N-1 disks. RAID 4, like RAID 0, generally provides better I/O performance since reads and writes are done in parallel. However, RAID 4 is seldom used because of the bottleneck caused by the parity disk. Since each write requires the parity information to be updated on the parity disk, the total throughput of the array can be limited by the I/O of that one disk. (Use RAID 5 instead of RAID 4.)
- RAID 5 - Is striping with distributed parity and is very similar to RAID 4. However, in a RAID 5 configuration the parity information is spread evenly across all the disks in the array, eliminating the bottleneck of the RAID 4 parity disk. RAID 5 therefore offers a better combination of speed and redundancy than RAID 4, but not quite the speed of Linux RAID 10.
- RAID 6 - Is striping with dual distributed parity. RAID 6 works the same as RAID 5, but with dual parity. Although less common, the additional parity information allows up to two simultaneous disk failures in an array without data loss. Dual parity greatly reduces the risk associated with the rebuild process of a RAID 5 array. On a RAID 5 array, should another disk fail during the rebuild process, the array will be lost. Since RAID 6 can sustain two drive losses within the array, the loss of a single disk does not pose a risk of data loss should another disk fail during the rebuild. However, the added redundancy comes at a cost in both capacity and performance. RAID 6 requires a minimum of 4 drives in the array, and an array of N disks has the usable capacity of N-2 disks. Also, because of the dual parity, writes can take significantly longer than on RAID 5.
- RAID 10 (available in kernel v.2.6.9 and greater) could probably warrant an article all of its own, but I'll try to sum it up briefly so we can move on to the configurations. Linux RAID 10, aka MD (multiple devices) RAID 10, can be configured in "near" or "far" layouts. The near layout keeps the copies of each chunk at about the same position on neighboring disks and behaves more like RAID 1, while the far layout places the copies far apart on the disks so that reads can be striped, making it more like RAID 0 (the far layout generally provides better read performance than the near layout). Linux MD RAID 10 should not be confused with RAID 1+0 (a combination of RAID levels); it is a single RAID level in its own right and, unlike RAID 1+0, does not require an even number of disks. MD RAID 10 in a far layout will have performance characteristics similar to RAID 0 for reading, but roughly half the speed for writing, since every chunk is written twice. Generally speaking, RAID 10 offers better performance than RAID 5; however, with two copies of the data it leaves only N/2 of the raw capacity usable.
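The capacity figures and the single-parity recovery described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MD driver's implementation: all disks are assumed to be the same size, RAID 10 is assumed to keep two copies of each chunk, and the names `usable_disks`, `xor_parity` and `rebuild_missing` are hypothetical. The XOR example applies to the single parity of RAID 4 and RAID 5 only; the second parity of RAID 6 is computed differently.

```python
from functools import reduce


def usable_disks(level: str, n: int) -> float:
    """Usable capacity of an n-disk array, in units of one member disk."""
    if n < 1:
        raise ValueError("an array needs at least one disk")
    capacity = {
        "linear": n,      # disks are simply appended
        "raid0": n,       # striping, no redundancy
        "raid1": 1,       # every disk holds a full copy
        "raid4": n - 1,   # one dedicated parity disk
        "raid5": n - 1,   # one disk's worth of distributed parity
        "raid6": n - 2,   # two disks' worth of distributed parity
        "raid10": n / 2,  # assumes two copies of every chunk
    }
    if level not in capacity:
        raise ValueError(f"unknown RAID level: {level}")
    return capacity[level]


def xor_parity(chunks):
    """XOR equal-length chunks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving_chunks, parity):
    """Recover the one missing chunk from the survivors plus the parity."""
    return xor_parity(list(surviving_chunks) + [parity])


# Hypothetical stripe: three data chunks plus one parity chunk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the disk holding data[1] failed and rebuild its chunk.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])

for level in ("raid0", "raid1", "raid5", "raid6", "raid10"):
    print(level, usable_disks(level, 4))
```

The rebuild works because XOR-ing a value with itself cancels it out: combining the parity with every surviving chunk leaves only the missing chunk. It also shows why single parity cannot survive a second failure, since two unknown chunks cannot be separated from one parity value.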
Now that we know a little more about the different RAID levels, let's begin with our configuration.