
How/where to store metadata about optical media sector layout in METS/PREMIS

+3 votes
580 views

I'm drafting a METS/PREMIS profile for images/rips of optical media (ISO images for data sessions; WAV or FLAC files for audio). One piece of metadata I'd like to include is the output of the cd-info tool, which contains information about the sector layout of the disc. Here's an example:

https://gist.github.com/bitsgalore/9a2838481574c040f7c4b7da4ed59926

However, I'm unsure how (and where) to store this info in METS. My initial idea was something like this:

  • Create a METS techMD element that is associated with the structMap div element that encompasses all files that were extracted from the physical disc (typically one ISO image and/or multiple audio files).
  • Inside this techMD element, create a PREMIS object instance with xsi:type="premis:representation" (since it describes the disc as a whole, and not an individual ISO image or audio file!).
  • Then use PREMIS unit 1.5.7 objectCharacteristicsExtension as a container for wrapping the cd-info output.


The problem here is that the PREMIS 3.0 data dictionary says that the objectCharacteristicsExtension unit (and also its parent unit 1.5 objectCharacteristics) is "Not Applicable" for the Intellectual Entity and Representation object types!

This makes me wonder how others are handling this. Is this an oversight of PREMIS, or is there some other (possibly better) way to do this that I've overlooked?

Any suggestions appreciated!

asked Feb 5, 2018 by johanvanderknijff (2,060 points)
edited Feb 5, 2018 by johanvanderknijff

4 Answers

+2 votes

We're working on a METS profile related to this right now. 

 

Our approach is to add this kind of technical metadata to Wikidata (www.wikidata.org) and reference it using <mdRef> in a <techMD> element in an <amdSec>.

An example (work in progress) related to this follows:

<mets:amdSec ID="Format-f4cf59e20612806030c7b7104e2eb7efdaad1e65847a15903de7676a1d521fbf892e8a8870cedd4c48732371bba71974a27387f3a5276f318e4b22520bc761d4">
  <mets:techMD ID="FileFormat-f4cf59e20612806030c7b7104e2eb7efdaad1e65847a15903de7676a1d521fbf892e8a8870cedd4c48732371bba71974a27387f3a5276f318e4b22520bc761d4">
    <mets:mdRef ID="Q3146723" LABEL="Raw disk image" LOCTYPE="URL" MDTYPE="OTHER" xlink:href="https://www.wikidata.org/wiki/Q3146723" />
  </mets:techMD>
</mets:amdSec>
 
I haven't yet been able to find the appropriate item in Wikidata for your particular scenario, but since it's a wiki, it can be added!
 
Once the amdSec is in the METS it can be referenced in other elements using the ADMID attribute. That attribute can also reference multiple amdSec elements (as you showed in your tweet recently), so you should be able to document everything you need using this approach.
 
 
Wikidata may not be the best place for the disc-specific metadata, but the general approach (use of external references) should work, and some aspects, such as "multi-session disc", could certainly be referenced to Wikidata.

The track-specific information looks to be descriptive metadata as well as technical, so it could probably go in your descriptive catalogue even if you also decide to put it in your technical metadata section.
 
 
 
answered Feb 5, 2018 by euanc (3,910 points)
edited Feb 5, 2018 by euanc
0 votes

Euan's response made me wonder whether it would be possible to put the cd-info output directly in a METS techMD/mdWrap element. Turns out that won't work either. From the mdWrap definition in the METS schema:

Such metadata can be in one of two forms: 1) XML-encoded metadata, with the XML-encoding identifying itself as belonging to a namespace other than the METS document namespace. 2) Any arbitrary binary or textual form, PROVIDED that the metadata is Base64 encoded and wrapped in a binData element within the internal descriptive metadata element.

Form 1 doesn't apply to my use case, because cd-info output is not XML. Form 2 (binData element) would force me to Base64 encode the cd-info output. This is possible, but it would be a major pain in the ass (mainly from an accessibility point of view, since the report would no longer be human-readable without decoding it first).
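
For illustration, the Base64 route (form 2) could look roughly like the Python sketch below. It uses only the standard library. The wrap_as_bindata helper, the techMD_example ID and the sample text are made up for this example, and the MIMETYPE and OTHERMDTYPE values are assumptions rather than anything prescribed by METS.

```python
import base64
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def wrap_as_bindata(text: str, tech_id: str) -> ET.Element:
    """Wrap arbitrary text in mets:techMD/mets:mdWrap/mets:binData (hypothetical helper)."""
    tech_md = ET.Element(f"{{{METS_NS}}}techMD", {"ID": tech_id})
    md_wrap = ET.SubElement(
        tech_md,
        f"{{{METS_NS}}}mdWrap",
        # Attribute values below are assumptions chosen for this sketch.
        {"MIMETYPE": "text/plain", "MDTYPE": "OTHER", "OTHERMDTYPE": "cd-info output"},
    )
    bin_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}binData")
    # binData must hold Base64 text, so encode the UTF-8 bytes of the report.
    bin_data.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return tech_md


# Made-up two-line excerpt standing in for a real cd-info report.
sample = "CD-Plus/Extra\nsession #2 starts at track 21, LSN: 268465\n"
element = wrap_as_bindata(sample, "techMD_example")

# Reading the report back requires an explicit decoding step.
encoded = element.find(f"{{{METS_NS}}}mdWrap/{{{METS_NS}}}binData").text
decoded = base64.b64decode(encoded).decode("utf-8")

print(ET.tostring(element, encoding="unicode"))
```

Printing the element shows the drawback: the report only becomes legible again after decoding.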

Another option (mentioned by Euan on Twitter) is to serialize the cd-info output to XML. Not great either, if only because AFAIK there's no documentation that fully describes cd-info's output fields. (Having said that, the parser code in Iromlab could be a reasonable starting point).

Of course nothing would stop me from making the XML serialization ridiculously dumb (e.g. dump the whole contents of the cd-info file in one single element).
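
A minimal sketch of that "dumb" variant, in Python with the standard library only. The namespace URI is a placeholder (the thread does not name one), wrap_raw_output is a hypothetical helper, and the sample line is invented to include characters that need escaping.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this sketch only.
CDINFO_NS = "urn:example:cd-info"
ET.register_namespace("cd-info", CDINFO_NS)


def wrap_raw_output(text: str) -> ET.Element:
    """Put the whole report, unparsed, into one element (hypothetical helper)."""
    root = ET.Element(f"{{{CDINFO_NS}}}cd-info")
    full_report = ET.SubElement(root, f"{{{CDINFO_NS}}}fullReport")
    full_report.text = text
    return root


# Invented sample line containing '<', '>' and '&'.
sample = "ISO 9660: 283825 blocks, label `A <B> & C'\n"
serialized = ET.tostring(wrap_raw_output(sample), encoding="unicode")

# The serializer escapes markup characters, and parsing restores them.
restored = ET.fromstring(serialized)[0].text

print(serialized)
```

Because the wrapper element lives in a namespace other than the METS one, the result would satisfy form 1 of the mdWrap definition, while the report text stays readable apart from the escaped characters.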

answered Feb 6, 2018 by johanvanderknijff (2,060 points)
edited Feb 6, 2018 by johanvanderknijff
Hi Johan,

If you want to be perfectly PREMIS-orthodox, I guess you would have to store sector information in a contentLocation semantic unit attached to PREMIS Bitstreams that would correspond to each track...

I would personally have a METS <div> for each track and attach to it a <techMD> section with a premis:object inside, type=Bitstream, and use the storage semantic container to keep it.

All the best!

Bertrand
0 votes

On Twitter Bertrand Caron made the following suggestion (source here):

Ok, so just using METS <area>s with BEGIN / END attributes (and possibly requesting the board to add a new value to the allowed values for the BETYPE attribute) could work...?

This would actually work provided there is a BETYPE attribute that corresponds to 2048-byte sectors. Also, the METS primer only mentions the area element in the context of file elements; it is not immediately clear to me if/how this would work for a representation that does not correspond to a file (I need to check the METS schema in more detail for that).
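
For reference, the sector arithmetic behind such BEGIN / END values can be sketched in Python as follows. On a CD there are 75 frames (sectors) per second, and logical sector 0 sits at MSF 00:02:00, i.e. 150 frames in. The MSF and LSN pairs in the cd-info example posted further down this thread are consistent with that. The function names are made up for this sketch, and converting sectors to byte offsets assumes 2048-byte sectors and an image that starts at sector 0, which may not hold for every rip.

```python
FRAMES_PER_SECOND = 75   # CD frames (sectors) per second
PREGAP_FRAMES = 150      # MSF 00:02:00 corresponds to LSN 0


def msf_to_lsn(msf: str) -> int:
    """Convert an 'MM:SS:FF' address to a logical sector number."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames - PREGAP_FRAMES


def sectors_to_bytes(sectors: int, sector_size: int = 2048) -> int:
    """Convert a sector count to a byte offset (assumes fixed-size sectors)."""
    return sectors * sector_size


# MSF values taken from the cd-info example later in this thread.
for msf in ("00:02:00", "01:22:02", "59:41:40", "63:08:25"):
    print(msf, msf_to_lsn(msf))
```

If byte offsets are acceptable, this would allow the existing BYTE value of BETYPE to be used instead of a new sector-based one; whether that fits a given profile is a separate question.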

 

answered Feb 7, 2018 by johanvanderknijff (2,060 points)
0 votes

So here's a little update: in the end I serialized the cd-info output to XML, and embedded that in a METS techMD/mdWrap element. Here's an example for an 'enhanced' CD with 2 sessions that contain 20 audio tracks and 1 data track, respectively:

    <mets:techMD ID="techMD_22">
      <mets:mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="cd-info output">
        <mets:xmlData>
          <cd-info:cd-info>
            <cd-info:trackList>
              <cd-info:track>
                <cd-info:trackNumber>1</cd-info:trackNumber>
                <cd-info:MSF>00:02:00</cd-info:MSF>
                <cd-info:LSN>000000</cd-info:LSN>
                <cd-info:type>audio</cd-info:type>
              </cd-info:track>
              <cd-info:track>
                <cd-info:trackNumber>2</cd-info:trackNumber>
                <cd-info:MSF>01:22:02</cd-info:MSF>
                <cd-info:LSN>006002</cd-info:LSN>
                <cd-info:type>audio</cd-info:type>
              </cd-info:track>
              ::
              ::
              <cd-info:track>
                <cd-info:trackNumber>20</cd-info:trackNumber>
                <cd-info:MSF>55:23:38</cd-info:MSF>
                <cd-info:LSN>249113</cd-info:LSN>
                <cd-info:type>audio</cd-info:type>
              </cd-info:track>
              <cd-info:track>
                <cd-info:trackNumber>21</cd-info:trackNumber>
                <cd-info:MSF>59:41:40</cd-info:MSF>
                <cd-info:LSN>268465</cd-info:LSN>
                <cd-info:type>data</cd-info:type>
              </cd-info:track>
              <cd-info:track>
                <cd-info:trackNumber>170</cd-info:trackNumber>
                <cd-info:MSF>63:08:25</cd-info:MSF>
                <cd-info:LSN>283975</cd-info:LSN>
                <cd-info:type>leadout</cd-info:type>
              </cd-info:track>
            </cd-info:trackList>
            <cd-info:analysisReport>
              <cd-info:cdExtra>True</cd-info:cdExtra>
              <cd-info:multiSession>True</cd-info:multiSession>
              <cd-info:mixedMode>False</cd-info:mixedMode>
              <cd-info:fullReport>No CD-TEXT on Disc.
                CD-Plus/Extra
                session #2 starts at track 21, LSN: 268465, ISO 9660 blocks: 283825
                ISO 9660: 283825 blocks, label `ELL2                            '
              </cd-info:fullReport>
            </cd-info:analysisReport>
          </cd-info:cd-info>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
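
As a rough illustration of how a track list like the one above might be produced, here is a Python sketch using only the standard library. It is not the Iromlab parser. The namespace URI is a placeholder, track_list_to_xml is a hypothetical helper, and the sample text only approximates the column layout of a cd-info track list (number, MSF, LSN, type), so the regular expression is an assumption about that layout.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this sketch only.
CDINFO_NS = "urn:example:cd-info"
ET.register_namespace("cd-info", CDINFO_NS)

# Assumed line layout: "<track>: <MM:SS:FF> <LSN> <type> ..."
TRACK_LINE = re.compile(r"^\s*(\d+):\s+(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(\w+)")
FIELDS = ("trackNumber", "MSF", "LSN", "type")


def track_list_to_xml(report: str) -> ET.Element:
    """Turn track lines of a report into a cd-info:trackList element."""
    track_list = ET.Element(f"{{{CDINFO_NS}}}trackList")
    for line in report.splitlines():
        match = TRACK_LINE.match(line)
        if match is None:
            continue  # skip headers and anything that is not a track line
        track = ET.SubElement(track_list, f"{{{CDINFO_NS}}}track")
        for name, value in zip(FIELDS, match.groups()):
            ET.SubElement(track, f"{{{CDINFO_NS}}}{name}").text = value
    return track_list


# Approximate sample, reusing values from the example above.
SAMPLE = """\
CD-ROM Track List (1 - 21)
  #: MSF       LSN    Type
  1: 00:02:00  000000 audio
 21: 59:41:40  268465 data
170: 63:08:25  283975 leadout
"""

tracks = list(track_list_to_xml(SAMPLE))
print(ET.tostring(track_list_to_xml(SAMPLE), encoding="unicode"))
```

Values are kept as strings exactly as they appear in the report (including leading zeros in the LSN), which matches the example above and avoids reinterpreting the tool's output.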

A full example METS file (which also includes EbuCore audio metadata, DFXML metadata and Isolyzer output) can be found here:

https://gist.githubusercontent.com/bitsgalore/ef164e953ca52218930a4bc512bfe96c/raw/c225a00ef997c99f1898d1f80783f334d357d5c9/mets-cdinfo-dfxml.xml

Thanks again to Euan and Bertrand for your suggestions!

answered Feb 8, 2018 by johanvanderknijff (2,060 points)