
Is simple fixity information valuable to digital stewards?

+1 vote
From #digpres14:

The BagIt specification includes an oxsum (Payload-Oxum) in the transfer materials.  This oxsum records the total size of the payload in bytes and the number of payload files, written as OctetCount.StreamCount.  Why is simple fixity information like this included when each file also has a checksum, a much more precise measure of fixity?

More generally, why do we care about this low-level fixity information?
asked Jul 23, 2014 by SpencerGoodwin (450 points)

2 Answers

+1 vote
From #digpres14:

1) This information is very important!  If you are receiving a very large transfer, it can take a long time and a lot of resources to actually perform checksum validation on all the files.  The oxsum gives you a very simple way to eyeball whether the transfer is complete before you start the detailed validation.  As a first step, it lets you know if more work needs to be done on the ingested files.

2) We once received a bag with 1.2 million files from Portico.  The oxsum was very useful in this case!
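As a rough illustration of that first-pass check, here is a minimal Python sketch. It assumes the oxsum has BagIt's Payload-Oxum form, OctetCount.StreamCount, and that the payload lives in a single directory. The function names `payload_oxum` and `oxum_matches` are made up for this example and are not part of any BagIt library.

```python
import os

def payload_oxum(data_dir):
    """Return 'OctetCount.StreamCount' for every file under data_dir."""
    octets = 0   # total bytes across all payload files
    streams = 0  # number of payload files
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            octets += os.path.getsize(os.path.join(root, name))
            streams += 1
    return f"{octets}.{streams}"

def oxum_matches(data_dir, declared):
    """Cheap first check: compare the declared oxsum with the files on disk."""
    return payload_oxum(data_dir) == declared.strip()
```

A mismatch means the transfer is incomplete or altered, so there is no point starting checksum validation yet. A match only says the byte and file counts agree; the per-file checksums still need to be validated.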
answered Jul 23, 2014 by SpencerGoodwin (450 points)
+1 vote

Redundancy is incredibly helpful with fixity for a couple of reasons.

No single piece of fixity information is infallible. Even checksums can be defeated through collision attacks. For that reason, it can be helpful to store more than one type of checksum, such as both MD5 and SHA-256, and/or other fixity values like an oxsum, file names, and file system metadata.
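Storing two checksums does not have to mean reading every file twice. The sketch below, using Python's standard `hashlib`, feeds each chunk of a file to several hash objects in one pass. The function name `multi_digest` and the default chunk size are illustrative choices, not something prescribed by BagIt.

```python
import hashlib

def multi_digest(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Compute several digests of one file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files are never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

The result is a dictionary such as `{"md5": "...", "sha256": "..."}` that can be written to whatever manifests you keep.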

Different types of fixity information require different computing resources. Running and comparing checksums on a million files will take a significant amount of time. It's much cheaper to generate an oxsum, compare file names, and check file modification metadata than to compute checksums. It might be advantageous for your workflow to check those cheaper values frequently and use them as an indicator of whether to run a full checksum routine, in addition to less frequent but regular full checksum routines.
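One way to picture the cheaper, more frequent check is a snapshot of file names, sizes, and modification times that is compared against the previous snapshot. This is only a sketch under assumptions: `snapshot` and `cheap_diff` are hypothetical helper names, and how you store the earlier snapshot is up to your workflow.

```python
import os

def snapshot(root):
    """Map each relative file path to (size, mtime_ns) without reading contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state

def cheap_diff(old, new):
    """Return paths that were added, removed, or whose size or mtime changed."""
    changed = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(changed),
    }
```

Any non-empty list in the result is a signal to run the full checksum routine right away. Because silent corruption can leave size and modification time untouched, an empty result does not replace the regular full checksum runs.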

answered Jul 23, 2014 by nkrabben (1,760 points)