At MoMA we store all digital collections materials in the BagIt format, and we worked with Artefactual Systems to develop a fixity checker that runs on the same server that hosts our Archivematica instance. It checks every AIP found in the Archivematica Storage Service for validity (which, if you're familiar with the BagIt format, you'll know covers more than file fixity: extra files that shouldn't be there, expected files that are missing) and emails any failures to Storage Service admin accounts.
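To make that "validity, not just fixity" distinction concrete, here is a minimal sketch of that kind of bag check using only the Python standard library. It is illustrative, not our actual checker (which is Artefactual's): it verifies payload checksums against a `manifest-sha256.txt` and also flags missing and unexpected files.

```python
import hashlib
from pathlib import Path

def check_bag(bag_dir, algorithm="sha256"):
    """Validate a BagIt bag: verify payload checksums AND completeness
    (missing or unexpected files). Returns a list of failure strings;
    an empty list means the bag is valid."""
    bag = Path(bag_dir)
    failures = []

    # Parse the payload manifest: "<digest> <relative path>" per line.
    manifest = {}
    for line in (bag / f"manifest-{algorithm}.txt").read_text().splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(maxsplit=1)
        manifest[rel.strip()] = digest

    # Checksum and missing-file checks.
    for rel, expected in manifest.items():
        f = bag / rel
        if not f.exists():
            failures.append(f"missing: {rel}")
            continue
        actual = hashlib.new(algorithm, f.read_bytes()).hexdigest()
        if actual != expected:
            failures.append(f"checksum mismatch: {rel}")

    # Unexpected-file check: anything under data/ not in the manifest.
    for f in (bag / "data").rglob("*"):
        rel = str(f.relative_to(bag)).replace("\\", "/")
        if f.is_file() and rel not in manifest:
            failures.append(f"unexpected: {rel}")

    return failures
```

A real checker would also verify tag manifests and `bagit.txt`, but this shows why a bag validation catches problems a bare checksum run would miss.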
This fixity checker sends metadata about its activity to our browser-based repository management system (also built by the lovely people at Artefactual Systems), so that we can index and retain a permanent record of all fixity checks, allowing reporting in various ways. In the event of a fixity error, our management app allows users to recover the AIP from a manually retrieved offline backup (i.e. an offsite LTO copy).
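For a sense of what that per-check metadata might look like, here is a hypothetical sketch of the record a checker could build and POST to a management app after each scan. The field names are illustrative assumptions, not Artefactual's actual schema.

```python
import json
from datetime import datetime, timezone

def fixity_report(aip_uuid, success, failures=()):
    """Build a JSON fixity-check record to send to a management app.
    Field names here are illustrative, not a real API schema."""
    return json.dumps({
        "aip": aip_uuid,                                   # which AIP was checked
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when the check ran
        "success": success,                                # overall pass/fail
        "failures": list(failures),                        # per-file failure details
    })
```

Retaining every one of these records (not just the latest result) is what makes the reporting side possible: you can ask when each AIP was last checked, how often, and with what outcome.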
Here's a screenshot of the fixity audit dashboard widget (failures are fake!):
And here is a screenshot of (part of) a full report:
This system is brand new, so I don't have much in the way of lessons learned at scale. All of this is being released by Artefactual Systems (free and open source, naturally).
There are, however, some major changes we are making to our storage system this year, driven by an incoming influx of collections materials on the order of 2 PB. We are moving from a fully disk-based storage system to a hierarchical system that is primarily LTO with a fast disk cache. This will give us a dramatically lower cost per TB and lower energy bills, as well as a built-in method for making a third backup for off-site, offline storage.

This new system is a managed service, and it actually provides file-level fixity checking that can be queried via an API. We will be working to get our fixity checker app talking to this API so that the managed service's fixity-check metadata flows into our management app the same way our "own" fixity metadata does. We will probably rely solely on the managed service's fixity checks, since its scheduling logic is more sensitive to the fact that the materials are on tape rather than disk.

If you are comfortable moving away from file-level fixity checks, there are some proprietary tape formats (e.g. T10000C) with built-in fixity metadata that allow fixity to be checked without reading the full bit streams, though that's not what we're doing. We're still big on file-level checks (for now).

The kind of fixity Andy is talking about with HDFS is, in my mind, a given, or at least it should be at any institution where you have an IT department. If your storage appliance is not doing some form of integrity checking (at the filesystem level, or at the appliance level via RAID, etc.), then you're in big trouble and have bigger fish to fry than human-readable fixity checks. This is a basic requirement of any enterprise-ish storage appliance, not just for "archival" applications.
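The integration work amounts to translating the managed service's check results into the same record shape our own checker produces. A sketch of that adapter, with an entirely hypothetical response format on the service side (we don't know the real API's field names), might look like:

```python
def normalize_service_checks(api_response):
    """Translate a managed service's fixity-check payload into the same
    record shape an in-house checker produces, so both feed the
    management app identically. The service-side field names
    ("checks", "object_id", "checked_at", "status", "errors") are
    assumptions for illustration only."""
    records = []
    for item in api_response.get("checks", []):
        records.append({
            "aip": item["object_id"],
            "timestamp": item["checked_at"],
            "success": item["status"] == "ok",
            "failures": item.get("errors", []),
        })
    return records
```

The point of normalizing at this boundary is that the management app's indexing and reporting never needs to know which system performed the check.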
The AVPS fixity tool serves a great purpose, even in our situation. I'm thinking about using it for monitoring our pre-ingest staging area, where things live prior to ingest into our repository. The stats in the report are pretty great not just for monitoring fixity, but also for keeping a handle on what's going in and out of a file share.
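That staging-area use case boils down to periodic snapshots and diffs. A minimal sketch of the idea (standard library only; not the AVPS tool's actual implementation, which also handles renames, scheduling, and reporting):

```python
import hashlib
from pathlib import Path

def snapshot(root):
    """Checksum every file under a staging area, as a scheduled scan
    would. Returns {relative path: sha256 hex digest}."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def diff(old, new):
    """Compare two scans: report files added, removed, and changed.
    This is the 'what's going in and out' view, plus fixity failures
    showing up as 'changed' entries."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Run `snapshot()` on a schedule, persist the result, and `diff()` against the previous run; anything in "changed" that nobody intentionally edited is a fixity problem.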