When is "Preservation Format" a useful concept in digital preservation?

+4 votes
The concept of "preservation format" is a big part of discussion in some areas of work in digital preservation and not in others. I hear a good bit of discussion of "preservation formats" in terms of digital image & moving image formats for instance. However, other areas of digital preservation involve working with troves of heterogeneous material where the term doesn’t come up as much. For example, while Web Archives have WARC as a format, it's really a bundle of whatever the crawler hoovered up and bundled. Similarly, as more and more archives and manuscript collections focus on disk imaging they end up creating bundles of myriad digital formats.

So, is it that the concept of "preservation format" is useful in some areas but not in others? If so, what kind of principle separates the cases when it is from the cases when it isn't? Or, is it that the concept should either be used more broadly than it is or less so? That is, is the concept of a "preservation format" something that is universally relevant, relevant for some types of content or becoming irrelevant?
asked Nov 18, 2014 by tjowens (2,360 points)

4 Answers

+4 votes

I think it's important to recognise that all content - regardless of which format is used to store it - requires active management to preserve it long-term. There is no 'preservation format' which absolves you of the responsibility to actively manage your content. There is no file format silver bullet -- "I migrated everything to PDF/A, burned it all to a gold CD and stuck them all in a shoebox which I keep under my desk and use as a foot rest."  NOT.

answered Dec 15, 2014 by crouchmi (270 points)
Yes, totally agree. I should have said that I understand  "Preservation Format" to imply a commitment to support and maintain access to that format. We can try to choose formats that we imagine should be easier to maintain over time, but it's the commitment that makes a "preservation format" special.
+3 votes
The idea of the "Preservation format" is only useful if you get to choose (or influence) the format of the material. This is common when the material is being digitised by the same organisation that is going to preserve it. The term is often used in contrast to the "Access format" (often a lossily-compressed version), and so may also be referred to as the "Preservation master".

Similarly, many archives operate under a deposit model for digital content where the primary "master" and "access" forms are decided by the original authors, and where the repository has to make some policy decisions about how to understand and handle those authorial decisions. In this case, "preservation formats" tend to go hand-in-hand with normalisation strategies, generating a secondary copy in a format that is believed to have a better chance at longevity (even if this does mean some risk of loss).

For web archives, forensics, etc. it's not up to us to choose the format. We can still call the original WARCs/images the "preservation master", but the formats within reflect the values and choices of the original authors, and cannot be overridden lightly (apart from anything else, it's usually not possible/feasible to negotiate with those original authors in these cases).
answered Nov 18, 2014 by anjackson (2,950 points)
+2 votes

If you have control of the process creating the files (e.g. digitisation) then it's worth thinking about characteristics which will hopefully make them easy to preserve. At other times you will have little or no control over what is going to be given to you, so you'll just have to lump it. Even then, some of the material may be better presented in derivative formats (e.g. a lower res JPEG rather than hi res TIFF or JP2, not trying to push out broadcast quality video or audio and similar), so in that sense the original is a preservation format (and some users may still wish to consume it in the original).

answered Nov 18, 2014 by DavidUnderdown (790 points)
+1 vote
A preservation format only makes sense if the file system is not able to otherwise meaningfully manage the data.

WARC for web archiving is one such example: it records full HTTP traffic, not files.

Otherwise, migration to a preservation format will typically prevent an artifact from being used in the same environment it was created in or created for, and then I wonder what was actually preserved. :)
answered Mar 26, 2015 by despens (930 points)