
What storage hardware should I use for a LOCKSS node?

+1 vote

I am planning to set up a LOCKSS (“Lots Of Copies Keep Stuff Safe”) node for a small PLN (“Private LOCKSS Network”) intended to safely store research and publication data from a few universities.

A partner in this PLN recommended against using a RAID (“Redundant Array of Inexpensive Disks”) for storage because RAID controllers can be buggy, leading to a complete loss of all data stored on this node. On the other hand, most data centers rely on RAID, so it cannot be all that bad. LOCKSS was designed for commodity hardware, but at a university, it can be more difficult to get a consumer-grade computer with a cheap external USB hard drive than to get a server with redundant storage or a virtual machine running on enterprise-class hardware.

In my mind, the two technologies should complement each other fairly well: LOCKSS guards against bitrot and other inconsistencies in the files, and RAID avoids data loss when a hard drive fails. Are there serious reasons for not running a LOCKSS node on a server with RAID storage? How about using a virtual machine?

Original Question from DP Stack Exchange by Christian Pietsch.

asked Jul 23, 2014 by jeffersonbailey (380 points)

2 Answers

0 votes

De Montfort University is a part of the UK LOCKSS Alliance and public LOCKSS network. We considered and rejected using RAID and currently have 4 separate disks in a separate server.

  • If the RAID disks were bought at the same time they could all be a part of the same batch with similar vulnerabilities;
  • The duplication within RAID can be considered wasteful given that content is already duplicated across the caches of a PLN.

LOCKSS can be put on a virtual server, but it is a busy service and may not play nicely with other servers co-hosted on the same hardware.

Original Answer from Digital Preservation Stack Exchange by Fulup

answered Jul 23, 2014 by jeffersonbailey (380 points)
Original Comments from Digital Preservation Stack Exchange:

- Christian Pietsch: Thank you for answering the VM aspect of my question. Can you please elaborate on how you use the 4 separate disks you mentioned? Is there any redundancy between them locally, e.g. do you rsync one drive to another regularly or something like that?

- Fulup: We have just added disks as they become full; there is no syncing between them. The first disk was about 250 GB. When that filled up we added a 750 GB disk. As that too became full we added two 2 TB disks and now add content to them evenly. As they fill we are likely to swap the first two disks for two larger ones and migrate the content onto them. At that point, however, we might also be looking at other disk arrangements (e.g. RAID).
0 votes

I work with the MetaArchive, a PLN with 20+ members distributed internationally.

RAID is not a perfect solution. A bad controller can destroy data on a drive. Another problem is the write hole phenomenon in certain configurations like RAID1 and RAID5, where data being written between drives during a power failure is lost silently. And the potential of loss due to random bit flips is not completely mitigated with RAID. Even with multiple levels of parity information, a full array reconstruction can contain an error during the read/write operations. See here for an explanation of the math.
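To give a feel for the rebuild math, here is a rough sketch in Python. It assumes a drive rated at one unrecoverable read error per 10^14 bits read (a figure commonly quoted on consumer drive datasheets, not something measured by MetaArchive) and treats errors as independent, which real drives do not strictly obey. The function name `p_read_error` and the four-drive array are made up for illustration.

```python
import math

def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read bytes.

    Assumes independent errors at the rated per-bit rate (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, written with log1p/expm1 to stay accurate for tiny rates
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12  # decimal terabyte, as used on drive labels

# Hypothetical RAID5 of four 4 TB drives with one failed:
# the rebuild has to read the three surviving drives in full.
p = p_read_error(3 * 4 * TB)
print(f"Chance of hitting a read error during rebuild: {p:.2f}")
```

Under those assumptions the chance comes out well above one half, which is why a second source of redundancy is worth having.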

Despite all of those imperfections, RAID is still a very good tool for us.

  • RAID allows you to construct storage pools much larger than a single hard drive. Hard drives currently max out at 4 TB. MetaArchive caches are 11-22 TB. Constructing a large pool means the cache admin does not have to manage exactly where the data resides on a disk, which can become complex as collections grow over time. We can manage collections at a cache level instead of tracking Archival Unit 1 on Disk 2, Archival Unit 2 on Disk 4, etc.
  • With RAID5 and RAID6, if 1 hard drive fails, the array can be rebuilt by plugging in a new drive and using the remaining drives to reconstruct the array automatically. Since MetaArchive caches are distributed internationally, this is much cheaper than waiting for other caches to request new copies of data.
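As a back-of-the-envelope illustration of the pool sizes above, the sketch below counts how many 4 TB drives a cache of 11 or 22 TB would need, given that RAID5 gives up one drive's worth of capacity to parity and RAID6 gives up two. It ignores filesystem overhead, hot spares, and the TB/TiB difference, and the helper `drives_needed` is a made-up name, not part of any RAID tool.

```python
import math

def drives_needed(target_tb: float, drive_tb: float, parity_drives: int) -> int:
    """Drives required for target_tb of usable space, plus parity_drives for parity."""
    data_drives = math.ceil(target_tb / drive_tb)
    return data_drives + parity_drives

for target in (11, 22):
    for level, parity in (("RAID5", 1), ("RAID6", 2)):
        n = drives_needed(target, drive_tb=4, parity_drives=parity)
        print(f"{target} TB usable on {level}: {n} x 4 TB drives")
```

For the 22 TB case this works out to seven drives on RAID5 or eight on RAID6.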

We believe we are protected from RAID's imperfections because our LOCKSS system maintains 7 copies of all content. Write hole, bit flip, and controller errors can be found and healed when LOCKSS compares those copies.
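The idea of finding damage by comparing copies can be sketched as a simple majority vote over content hashes. This is only an illustration of the principle: the real LOCKSS polling protocol is considerably more involved, and the cache names, the function `find_damaged`, and the sample content here are all hypothetical.

```python
import hashlib
from collections import Counter

def find_damaged(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority SHA-256 digest and the caches whose copy disagrees with it."""
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(node for node, d in digests.items() if d != majority_digest)
    return majority_digest, damaged

# Seven caches hold the same content; one copy has silently changed.
good = b"archival unit content"
copies = {f"cache{i}": good for i in range(1, 8)}
copies["cache3"] = b"archival unit c0ntent"

_, damaged = find_damaged(copies)
print(damaged)  # ['cache3']
```

A cache flagged this way would then fetch a fresh copy from one of the caches in the majority.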

We've seen success with a variety of configurations including virtual and rack-mounted. For reference, our recommended hardware configuration for 2014 is on page 3 of this document. http://www.metaarchive.org/public/resources/charter_member/ma_2014technicalspecifications.pdf


answered Jul 23, 2014 by nkrabben (1,990 points)