
What do you (or should you do) with ‘thumbs.db’ and other hidden system files?

+3 votes

Recently we (a national archive) received a transfer of born-digital records which included ‘thumbs.db’ files.

As a refresher – thumbs.db is a Windows thumbnail cache file.  It is generated by the system whenever a user selects ‘thumbnail view’ in a Windows Explorer folder, and it lives in the directory whose thumbnails it describes.  It is normally a ‘hidden’ file, but it can be made visible through system settings.

If the thumbs.db file is deleted, a new one is created whenever the contents of that directory are viewed in ‘thumbnail view’ mode.

Since it is a hidden file, created automatically as needed, my inclination is to delete the files and NOT ingest them into our digital preservation system.  They seem to serve no other function than to allow the user of a Windows machine to view the contents of a directory in a different way.

HOWEVER – the thumbs.db file can hold information about files which no longer exist in the directory.  Traces of information can linger in a thumbs.db file.  The file is amended when new content is added to the directory and viewed as thumbnails, but when files are removed from the directory their information is not removed from the file.  Therefore there may be indications of files which once existed.  According to Wikipedia, a paedophile was caught using image information contained in a thumbs.db file even though the actual images had been deleted.

From an ‘evidence’ perspective, thumbs.db files might give an interesting clue into what the user of the computer was doing.

From an ‘archival’ perspective I’m dubious – these are hidden files which are not (as far as I know) actively used by end users - and are therefore not records.  “Hmmm,” says Mr End User, “What was the name of that photograph I was looking at last month?  I know - I’ll look into my thumbs.db file and get the file name.”

Questions: do any of you have experience with thumbs.db files?  Do any of you have written policies on hidden system files you can share with us?


asked Jun 17, 2014 by crouchmi (270 points)

2 Answers

+4 votes

Government archive here.

I think these kinds of system files can be records - your paedophile example is a case in point - but we'd generally purge them anyway, as we only accept transfers of records that have been appraised (in a disposal schedule) as having archival value. These kinds of files tend to be stowaways and are not deliberately transferred by the agency under a disposal schedule. We don't have a specific written policy for this, but it falls under general guidance issued by our organisation on Normal Administrative Practice: this is a provision in our legislation that permits the destruction of ephemeral information, including "computer support records" such as these.

That said... I think there are some circumstances in which we might do a more thorough evaluation of whether thumbs.db or other system files should be kept - e.g. if there were legal proceedings involved, or if we accepted the transfer of a controversial minister's hard drive, etc. And, if I need to peek inside a thumbs.db file, I'll use this little utility

answered Jun 18, 2014 by richardlehane (1,000 points)
+1 vote

Thumbs.db belongs to a class of files that operating systems use to store temporary metadata. Another example is .DS_Store on OS X. These files can store information like thumbnails and view preferences for folders. Potentially useful, but typically incidental.

At MetaArchive, we have written a script to identify system files for removal. Our systems use both LOCKSS and Bags, both of which are based on checksums. Including files that may be silently edited causes several administrative problems.

  1. We revisit members' collections regularly to update the LOCKSS caches with new files. Filenames with new checksums are identified as new versions and added to the collection. System files changed frequently by the OS will accumulate many versions of low to no informational value, requiring needless extra computation, space, etc.
  2. We also provide tools for members to compare the LOCKSS manifests to Bag manifests. Removing these potentially ever-changing files helps us limit false warnings of corruption.

Here's a link to the script we created. The list of files to ignore can be augmented by adding to the data dictionary of rules at the beginning of the file.

Our list of files to ignore is based in part on the OS-generated file entries found in common .gitignore configurations.  Developers committing packages to a repository have a similar need to ignore files of low to no informational value that complicate administration.
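
For readers who want a feel for the approach without following the link, here is a minimal Python sketch of the idea: a rule list at the top, then a walk over a transfer directory that reports every file matching a rule. This is not the MetaArchive script. The pattern list, the function names (`is_system_file`, `find_system_files`) and the choice of case-insensitive matching are all assumptions made for illustration, and the sketch only reports matches so that a person can review them before anything is deleted.

```python
import fnmatch
import os

# Hypothetical rule list (an assumption, not the MetaArchive rules):
# glob patterns for common OS-generated files. Extend as needed.
SYSTEM_FILE_PATTERNS = [
    "Thumbs.db",    # Windows thumbnail cache
    "ehthumbs.db",  # Windows Media Center thumbnail cache
    "desktop.ini",  # Windows folder view settings
    ".DS_Store",    # OS X Finder folder metadata
    "._*",          # OS X AppleDouble companion files
]


def is_system_file(filename, patterns=SYSTEM_FILE_PATTERNS):
    """Return True if the bare file name matches any rule.

    Matching is case-insensitive because Windows file names are,
    so 'thumbs.db' and 'Thumbs.db' are treated alike.
    """
    name = filename.lower()
    return any(fnmatch.fnmatchcase(name, p.lower()) for p in patterns)


def find_system_files(root, patterns=SYSTEM_FILE_PATTERNS):
    """Walk `root` and return a sorted list of paths that match a rule.

    Nothing is deleted here; the caller decides what to do with the list.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if is_system_file(filename, patterns):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling `find_system_files("/path/to/transfer")` (a placeholder path) returns the candidate files. Running this before checksums are generated keeps such files out of the manifests in the first place, which is the point made in the numbered list above.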

answered Jun 25, 2014 by nkrabben (1,990 points)
edited Jun 27, 2014 by nkrabben