
Unusual sources of file format signatures


I've been compiling a list of unusual sources of file format signatures (i.e. not the places where you'd expect to find them, like PRONOM or libmagic).

For example, MIME type registration applications often include them ("magic number" is one of the fields in the registration template). Here is an example: http://www.iana.org/assignments/media-types/application/vnd.olpc-sugar

Software applications that aren't focused on file type identification often have their own lists of format signatures, too.

For example, ClamAV has a pretty good set: https://github.com/vrtadmin/clamav-devel/blob/master/libclamav/filetypes_int.h

Camlistore (a software project worth a look if you haven't seen it already) also has one: https://github.com/bradfitz/camlistore/blob/master/pkg/magic/magic.go
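Lists like these generally boil down to a table of byte sequences, each compared against a fixed position near the start of a file. Below is a minimal Python sketch of that idea. The signatures are well-known ones, but the table layout, the `SIGNATURES` name and the `identify` function are made up for illustration. They are not ClamAV's or Camlistore's actual data structures, and real identification tools add many more checks on top of this.

```python
# Hypothetical signature table: (format name, byte offset, expected bytes).
# Illustrative only; not the layout used by ClamAV, Camlistore, PRONOM or TrID.
SIGNATURES = [
    ("PNG image", 0, b"\x89PNG\r\n\x1a\n"),
    ("GIF image (87a)", 0, b"GIF87a"),
    ("GIF image (89a)", 0, b"GIF89a"),
    ("PDF document", 0, b"%PDF-"),
    ("JPEG image", 0, b"\xff\xd8\xff"),
    ("ZIP archive", 0, b"PK\x03\x04"),
    ("SQLite 3 database", 0, b"SQLite format 3\x00"),
    # ISO base media files usually begin with a box whose type 'ftyp' sits at offset 4.
    ("ISO base media file (e.g. MP4)", 4, b"ftyp"),
]


def identify(data: bytes) -> list[str]:
    """Return the name of every signature whose bytes appear at its offset in `data`."""
    matches = []
    for name, offset, magic in SIGNATURES:
        # Slicing past the end of `data` just yields a shorter result, so short inputs never match.
        if data[offset:offset + len(magic)] == magic:
            matches.append(name)
    return matches


# Usage:
# identify(b"%PDF-1.7\n")  -> ['PDF document']
# identify(b"plain text")  -> []
```

Checking every entry, rather than stopping at the first hit, mirrors the way tools like DROID can report more than one candidate format for a file.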

SQLite has an application_id pragma that lets applications using it as a file format container declare themselves. The SQLite source has a list of these: http://www.sqlite.org/src/artifact?ci=trunk&filename=magic.txt
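For anyone who wants to check these IDs directly: according to the SQLite file format documentation, the application ID is a 32-bit signed big-endian integer at offset 68 of the 100-byte database header, and the header itself starts with the 16-byte string "SQLite format 3\0". The following is a rough Python sketch using only the standard library. The helper names and the ID value 0x12345678 are hypothetical examples and are not registered IDs from magic.txt.

```python
import os
import sqlite3
import struct
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def read_application_id(path: str):
    """Return the application ID from an SQLite file's header, or None if it isn't SQLite 3."""
    with open(path, "rb") as f:
        header = f.read(100)  # the database header is 100 bytes long
    if len(header) < 100 or not header.startswith(SQLITE_MAGIC):
        return None
    # Offset 68: application ID, 4-byte signed big-endian integer.
    (app_id,) = struct.unpack(">i", header[68:72])
    return app_id


def demo(app_id: int = 0x12345678):  # hypothetical ID, for illustration only
    """Create a throwaway database, tag it with `app_id`, then read the tag back from the raw header."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")
        conn = sqlite3.connect(path)
        # PRAGMA statements can't take bound parameters, so the int is formatted in directly.
        conn.execute(f"PRAGMA application_id = {int(app_id)}")
        conn.execute("CREATE TABLE t (x)")  # make sure page 1 (with the header) exists on disk
        conn.commit()
        via_pragma = conn.execute("PRAGMA application_id").fetchone()[0]
        conn.close()
        return via_pragma, read_application_id(path)
```

Reading the raw header, rather than opening the file with SQLite, means a signature tool never has to trust or parse the rest of the database just to find the ID.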

Are there any other strange places where you've found a set of file format signatures?

asked Jun 6, 2014 by richardlehane
edited Jul 17, 2014 by richardlehane
BTW your Camlistore link actually seems to go to the ClamAV URL.
fixed, thanks!

3 Answers

Quite a few that I've come across in the 2.5 or so years I've been doing PRONOM work:

Gary Kessler's list: http://www.garykessler.net/library/file_sigs.html

TrID (a file format identification tool) works on a similar paradigm to DROID and has a huge library of definitions: http://mark0.net/soft-trid-e.html

Source code of specialist tools: for example, http://www.freemxf.org/mxflib-docs/mxflib-1.0.0-docs/dict_8h-source.html helped me work out how to distinguish between different MXF 'operational patterns'.

Random lists found online, e.g. http://code.google.com/p/dbscope/source/browse/trunk/DBScopeMagic.xml?r=11 ; http://www.computec.ch/projekte/filerecon/?s=database ; http://stam.blogs.com/8bits/2007/11/file-headers.html

Indirectly, via lists of published format specs: http://graphcomp.com/info/specs/

Of course, I can't just take this stuff wholesale: everything that goes into PRONOM needs extra research, testing and verification. But it's useful to see what others have gathered.
answered Aug 29, 2014 by dclipsham
All good sources. There are a few more linked from the Sustainability of Digital Formats website: http://www.digitalpreservation.gov/formats/intro/resources.shtml#central4.

Of course, we check out any info as much as possible before we add it to a format description.

I use this website for quick reference: http://filesignatures.net/

answered Jun 6, 2014 by thorsted
Found another good one, the "MIME Sniffing Standard": http://mimesniff.spec.whatwg.org/
answered Aug 23, 2014 by richardlehane
That's a good source, although given how widely deployed it is, I'm not sure it should count as 'unusual'! ;-)
Fair enough, so I will resist the temptation to give myself "best answer" for that one.
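The MIME Sniffing Standard is also interesting for how it writes signatures down: each one is a byte pattern plus a same-length mask, optionally with a set of leading bytes to skip (such as whitespace before an HTML tag). What follows is a rough Python sketch of that pattern/mask matching, based on a reading of the spec rather than copied from it. The function name is made up, and the `<HTML` entry is simplified (the spec's own HTML patterns also require a trailing tag-terminating byte), so check the spec's tables before relying on any of these values.

```python
# Bytes the spec treats as whitespace when skipping leading bytes: TAB, LF, FF, CR, SPACE.
WHITESPACE = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x20})


def pattern_matches(data: bytes, pattern: bytes, mask: bytes,
                    ignored: frozenset[int] = frozenset()) -> bool:
    """Masked byte-pattern match in the style of the MIME Sniffing Standard (sketch)."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must be the same length")
    if len(data) < len(pattern):
        return False
    s = 0
    # Skip any leading bytes that belong to the ignored set.
    while s < len(data) and data[s] in ignored:
        s += 1
    # What's left of the input must still be long enough to hold the pattern.
    if len(data) - s < len(pattern):
        return False
    for p in range(len(pattern)):
        # AND each input byte with its mask byte before comparing; 0xDF clears the
        # ASCII lowercase bit, so letters compare case-insensitively.
        if data[s + p] & mask[p] != pattern[p]:
            return False
    return True


# Illustrative patterns (simplified; see the note above).
PDF = (b"%PDF-", b"\xff" * 5)
HTML = (b"<HTML", b"\xff\xdf\xdf\xdf\xdf")

# Usage:
# pattern_matches(b"%PDF-1.4", *PDF)                -> True
# pattern_matches(b"  \n<html>", *HTML, WHITESPACE) -> True
# pattern_matches(b"<body>", *HTML, WHITESPACE)     -> False
```

The mask is what lets a single entry cover case variants (or deliberately "don't care" bits) without listing every spelling separately, which plain prefix tables can't do.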
...