I work with the MetaArchive, a Private LOCKSS Network (PLN) with 20+ members distributed internationally.
RAID is not a perfect solution. A bad controller can destroy data on a drive. Another problem is the write hole in certain configurations such as RAID1 and RAID5, where data in flight between drives during a power failure is silently lost. The risk of loss from random bit flips is also not completely mitigated by RAID: even with multiple levels of parity information, a full array reconstruction can hit an error during its read/write operations. See here for an explanation of the math.
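As a rough illustration of why a full reconstruction is risky, here is a back-of-envelope sketch in Python. It is not taken from the linked explanation. It assumes an unrecoverable read error rate of 1 per 10^14 bits, a figure often quoted on consumer drive spec sheets, and a hypothetical RAID5 array of four 4 TB drives; real drives and arrays may behave quite differently.

```python
import math

def rebuild_error_probability(bytes_read, ure_rate_per_bit):
    """Chance of at least one unrecoverable read error while reading bytes_read bytes.

    Treats every bit as an independent trial, which is a simplification.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding to zero
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))

TB = 10**12

# Assumption: one of four 4 TB drives has failed, so the rebuild must read
# the three surviving drives in full (12 TB).
p = rebuild_error_probability(3 * 4 * TB, 1e-14)
print(f"Chance of hitting a read error during rebuild: {p:.1%}")
```

Under these assumptions the figure comes out at roughly 62%, which is why parity alone should not be treated as a guarantee.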
Despite all of those imperfections, RAID is still a very good tool for us:
- RAID allows you to construct storage pools much larger than a single hard drive. Hard drives currently max out at 4 TB, while MetaArchive caches are 11-22 TB. Constructing a large pool means the cache admin does not have to manage exactly where the data resides on a disk, which can become complex as collections grow over time. We can manage collections at a cache level instead of tracking Archival Unit 1 on Disk 2, Archival Unit 2 on Disk 4, etc.
- With RAID5 and RAID6, if one hard drive fails, the array can be rebuilt by plugging in a new drive; the remaining drives are used to reconstruct the array automatically. Since MetaArchive caches are distributed internationally, this is much cheaper than waiting to fetch new copies of the data from other caches.
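To show how the remaining drives can reconstruct a lost one, here is a minimal Python sketch of single-parity (RAID5-style) recovery using XOR. The block contents and the `xor_blocks` helper are hypothetical, and real controllers work on striped sectors with rotating parity. RAID6 adds a second, differently computed parity block that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks, each on its own drive.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks and lives on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the drive that held data[1], then rebuild it by XORing
# everything that survived (the other data blocks plus parity).
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same property explains the limit: with only one parity block, two simultaneous losses in a stripe cannot be recovered.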
We believe we are protected from RAID's imperfections because our LOCKSS system maintains 7 copies of all content. Write-hole, bit-flip, and controller errors can be found and healed when LOCKSS compares those copies.
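The idea of comparing copies can be sketched as a simple majority vote over content hashes. This is only an illustration in Python: the real LOCKSS polling protocol is considerably more involved, and the function name `repair_by_majority` and the cache names are made up for this example.

```python
import hashlib
from collections import Counter

def repair_by_majority(copies):
    """Given {cache_name: content_bytes}, return (repaired_copies, damaged_names).

    A copy counts as damaged if its SHA-256 digest disagrees with the digest
    held by a strict majority of caches.
    """
    digests = {name: hashlib.sha256(content).hexdigest()
               for name, content in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority; cannot tell which copy is correct")
    good = next(content for name, content in copies.items()
                if digests[name] == winner)
    damaged = sorted(name for name in copies if digests[name] != winner)
    return {name: good for name in copies}, damaged

# Seven hypothetical caches, one of which has suffered a silent corruption.
copies = {f"cache{i}": b"archival unit" for i in range(7)}
copies["cache3"] = b"archival unjt"

repaired, damaged = repair_by_majority(copies)
print(damaged)
```

With seven copies, up to three can be damaged in different ways and a strict majority still identifies the good content.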
We've seen success with a variety of configurations, including virtual and rack-mounted. For reference, our recommended hardware configuration for 2014 is on page 3 of this document: http://www.metaarchive.org/public/resources/charter_member/ma_2014technicalspecifications.pdf