
Is there any best practice/guidance on where to store, in METS, metadata about digitisation/digital creation processes?


For digitised files, what is the best way to capture metadata about the digitisation process (equipment/software, settings, dates, operator names, etc) in METS? Should it be captured?

I'm not just thinking of image based digitisation, so a couple of example scenarios would be:

  1. Digitising a newspaper to TIFF or JP2
  2. "Digitising" sound content from its carrier to a WAVE file

My first thought was to use the <techMD> or <digiprovMD> elements, but neither seems to fit completely.

The LoC's METS primer says that "<techMD> records technical metadata about a component of the METS object, such as a digital content file" (http://www.loc.gov/standards/mets/METSPrimerRevised.pdf#page=41). So this appears to be aimed at the technical characteristics of the files being preserved.

Just below that, in the same document (http://www.loc.gov/standards/mets/METSPrimerRevised.pdf#page=43), it says that "<digiprovMD> can be used to record preservation-related actions taken on the various files which comprise a digital object (e.g., those subsequent to the initial digitization of the files such as transformation or migrations) or, in the case of born digital materials, the files' creation". So for digitised material, this seems to cover actions *after* digitisation, not the digitisation itself. By contrast, for born-digital content the wording implies it can be used to capture details of content creation.

Perhaps I've missed something obvious (or used the wrong search terms), but I'm not finding much in the way of digitisation provenance capture.

Does anyone else record such info in METS, and if so, how so?

asked Feb 7 by petemay (160 points)

1 Answer


It might be a bit late to answer this, but at the National Library of Luxembourg we have been using METS/ALTO for digitization projects for more than 10 years. All the technical metadata about and resulting from the digitization project is stored in MIX schema elements within a techMD section, as shown below:

<amdSec ID="IMGPARAM00004">
    <techMD ID="IMGPARAM00004TECHMD">
      <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG">
        <xmlData>
          <mix:mix xmlns:mix="http://www.loc.gov/mix/">
            <mix:BasicImageParameters>
              <mix:Format>
                <mix:MIMEType>image/tiff</mix:MIMEType>
                <mix:ByteOrder>little-endian</mix:ByteOrder>
                <mix:Compression>
                  <mix:CompressionScheme>1</mix:CompressionScheme>
                </mix:Compression>
                <mix:PhotometricInterpretation>
                  <mix:ColorSpace>1</mix:ColorSpace>
                </mix:PhotometricInterpretation>
              </mix:Format>
              <mix:File>
                <mix:ImageIdentifier imageIdentifierLocation="//Dss2/d/docworks/EXPORT/bnl_lw/1900-09-10_01/tif">1900-09-10_01-00004.tif</mix:ImageIdentifier>
                <mix:FileSize>25003209</mix:FileSize>
                <mix:Orientation>1</mix:Orientation>
                <mix:DisplayOrientation>1</mix:DisplayOrientation>
              </mix:File>
            </mix:BasicImageParameters>
            <mix:ImageCreation>
              ...
              <mix:ScanningSystemCapture>
                <mix:ScanningSystemHardware>
                  <mix:ScannerManufacturer>ASSY.SA</mix:ScannerManufacturer>
                  <mix:ScannerModel>
                    <mix:ScannerModelName>Digitizing line</mix:ScannerModelName>
                    <mix:ScannerModelNumber>DL 3000</mix:ScannerModelNumber>
                    <mix:ScannerModelSerialNo>2006 11 25</mix:ScannerModelSerialNo>
                  </mix:ScannerModel>
                </mix:ScanningSystemHardware>
                <mix:ScanningSystemSoftware>
                  <mix:ScanningSoftware>LIBFORMAT (c) Pierre-e Gougelet + Page improver</mix:ScanningSoftware>
                  <mix:ScanningSoftwareVersionNo>1.0.2519.16684</mix:ScanningSoftwareVersionNo>
                </mix:ScanningSystemSoftware>
                <mix:ScannerCaptureSettings>
                  <mix:PixelSize>0.0846666</mix:PixelSize>
                  <mix:PhysScanResolution>
                    <mix:XphysScanResolution>300</mix:XphysScanResolution>
                    <mix:YphysScanResolution>300</mix:YphysScanResolution>
                  </mix:PhysScanResolution>
                </mix:ScannerCaptureSettings>
              </mix:ScanningSystemCapture>
            </mix:ImageCreation>
            <mix:ImagingPerformanceAssessment>
               ...
            </mix:ImagingPerformanceAssessment>
             ...
          </mix:mix>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
 
A complete METS/ALTO sample package can be found on our tender information page: downloads.bnl.lu/tend2018.html
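For readers unfamiliar with how METS ties such a section to a file: in METS generally, a <file> entry in the <fileSec> points at its administrative metadata through the ADMID attribute. The fragment below is only a minimal sketch of that linkage. The techMD ID and the file name are taken from the sample above, while the file ID, the fileGrp USE value and the relative path are hypothetical and not copied from an actual package.

```xml
<!-- Sketch only: file ID, USE value and path are assumptions for illustration -->
<fileSec>
  <fileGrp USE="Images">
    <!-- ADMID references the ID of the techMD element shown above -->
    <file ID="IMG00004" MIMETYPE="image/tiff" ADMID="IMGPARAM00004TECHMD">
      <FLocat LOCTYPE="URL"
              xmlns:xlink="http://www.w3.org/1999/xlink"
              xlink:href="file://./images/1900-09-10_01-00004.tif"/>
    </file>
  </fileGrp>
</fileSec>
```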
answered Sep 17 by roxponinja (140 points)
That's right, content creation processes as well as transfer, 'preconditioning' actions prior to ingest and preservation actions after ingest are typically recorded in <techMD> or <digiprovMD> elements.
You can either
- as Roxana indicated, produce this information as internal metadata, extract it with JHOVE, and keep the outcome with other technical metadata in a <techMD> element, or
- record this as event- and agent-related information, by embedding PREMIS (or equivalent) metadata in a <digiprovMD> element (the PREMIS Editorial Committee provides guidelines to do so at https://www.loc.gov/standards/premis/guidelines2017-premismets.pdf)
…or even do both (that's what BnF does).
You can see an example of embedded PREMIS Event and Agent metadata in a <digiprovMD> at http://bibnum.bnf.fr/mets/filnumconsa_producerPackage_initialDelivery_example_20160615.xml (note that, because the METS example corresponds to a SIP, there are no <techMD> sections; they are added at ingest). See also the BnF reference document (unfortunately, it's in French!) for describing production processes in PREMIS: http://www.bnf.fr/documents/ref_num_metadonnees_mets.pdf#page=41.
Also note that the PREMIS Editorial Committee maintains a controlled vocabulary of event types at http://id.loc.gov/vocabulary/preservation/eventType.html providing general terms for common actions performed on digital objects that affect their long-term preservation.
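As a rough illustration of the second option above (event and agent information in <digiprovMD>), a digitisation event and the agent that carried it out might be wrapped as in the sketch below. This is not the BnF example: every identifier, the date, the event detail text and the agent name are hypothetical placeholders. It assumes PREMIS 3, and the eventType and linkingAgentRole values should be checked against the controlled vocabularies before use.

```xml
<!-- Sketch only: all IDs, dates, names and detail text are hypothetical -->
<amdSec ID="AMD_EXAMPLE_0001">
  <!-- The digitisation itself, recorded as a PREMIS event -->
  <digiprovMD ID="DIGIPROV_EVENT_0001">
    <mdWrap MIMETYPE="text/xml" MDTYPE="PREMIS:EVENT">
      <xmlData>
        <premis:event xmlns:premis="http://www.loc.gov/premis/v3">
          <premis:eventIdentifier>
            <premis:eventIdentifierType>local</premis:eventIdentifierType>
            <premis:eventIdentifierValue>event-0001</premis:eventIdentifierValue>
          </premis:eventIdentifier>
          <premis:eventType>creation</premis:eventType>
          <premis:eventDateTime>2018-01-15T10:30:00Z</premis:eventDateTime>
          <premis:eventDetailInformation>
            <premis:eventDetail>Digitisation of the source item (settings recorded here or in techMD)</premis:eventDetail>
          </premis:eventDetailInformation>
          <!-- Points at the agent described in the second digiprovMD -->
          <premis:linkingAgentIdentifier>
            <premis:linkingAgentIdentifierType>local</premis:linkingAgentIdentifierType>
            <premis:linkingAgentIdentifierValue>agent-0001</premis:linkingAgentIdentifierValue>
            <premis:linkingAgentRole>executing program</premis:linkingAgentRole>
          </premis:linkingAgentIdentifier>
        </premis:event>
      </xmlData>
    </mdWrap>
  </digiprovMD>
  <!-- The software (or hardware, organisation, person) that performed the event -->
  <digiprovMD ID="DIGIPROV_AGENT_0001">
    <mdWrap MIMETYPE="text/xml" MDTYPE="PREMIS:AGENT">
      <xmlData>
        <premis:agent xmlns:premis="http://www.loc.gov/premis/v3">
          <premis:agentIdentifier>
            <premis:agentIdentifierType>local</premis:agentIdentifierType>
            <premis:agentIdentifierValue>agent-0001</premis:agentIdentifierValue>
          </premis:agentIdentifier>
          <premis:agentName>Example capture software (hypothetical)</premis:agentName>
          <premis:agentType>software</premis:agentType>
        </premis:agent>
      </xmlData>
    </mdWrap>
  </digiprovMD>
</amdSec>
```

A <file> element would then list the relevant digiprovMD IDs in its ADMID attribute, which is how the event gets tied to the file it produced.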
...