Tools for identifying obfuscated files, specifically password protected and encrypted formats?

+2 votes
627 views

I am looking for tools capable of detecting and flagging up encrypted and password protected files in any given collection.

First, I need a tool that can do this, along with an understanding of what is already available (and its scope and capabilities). Second, the tool needs to be usable by, and distributable to, a probably non-expert audience.

Note: This question was first asked in 2013 and a handful of answers can be found here: http://anjackson.github.io/zombse/062013%20Libraries%20&%20Information%20Science/static/questions/1445.html

asked Nov 23, 2014 by ross-spencer (190 points)

2 Answers

+1 vote

(Adding my original answer so I can update it.)

There are two ways of answering this question. Firstly, for formats that are known to support encryption (e.g. PDF, ePub, ZIP, etc.), you might expect that characterisation tools can report the presence of encryption. This is true, but in practice many of the tools perform poorly because encryption is used in two ways: as well as full encryption (requiring a specific password to unpack a bytestream), encryption can be used to support obfuscation (where the encryption algorithm does not require a password, or only requires a known, shared, password). For example, PDF uses a default empty string password in some cases, and the JHOVE characterisation tool cannot distinguish between this and full encryption.
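As a rough illustration of the format-aware approach, here is a minimal Python sketch. The ZIP check relies on bit 0 of each member's general purpose bit flag, which the standard library exposes as `ZipInfo.flag_bits`. The PDF check is only a crude heuristic of my own (searching the raw bytes for an `/Encrypt` key), and like JHOVE it cannot tell a default empty password from full encryption. The function names are made up for this example.

```python
import zipfile

ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general purpose bit flag


def encrypted_zip_members(source):
    """Return the names of ZIP members whose encryption flag is set.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        return [info.filename for info in zf.infolist()
                if info.flag_bits & ENCRYPTED_FLAG]


def pdf_declares_encryption(data: bytes) -> bool:
    """Crude heuristic (an assumption, not a parser): look for an /Encrypt key.

    This can give false positives if the byte string occurs elsewhere in the
    file, and it says nothing about whether a real password is required.
    """
    return b"/Encrypt" in data
```

Calling `encrypted_zip_members("collection/archive.zip")` (a hypothetical path) would list the flagged members; an empty list means no member declares encryption.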

The second way of answering this question is to ask whether there are ways of spotting encryption in general, i.e. without understanding the format of the bytestream. The answer appears to be 'sometimes'. Methods such as the visualisation of bytestream entropy have been used to spot encrypted regions of files, and so this kind of information-theoretical analysis may be useful in some cases. Formally, however, well-encrypted data is statistically very hard to distinguish from well-compressed data, and therefore this approach will lead to a lot of false positives in the presence of compressed data.
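To make the entropy idea concrete, here is a minimal Python sketch that computes the Shannon entropy of a bytestream in bits per byte and flags fixed-size windows that look close to random. The window size and threshold are arbitrary assumptions for illustration, not values taken from any established tool, and compressed regions will be flagged just as readily as encrypted ones.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def high_entropy_windows(data: bytes, window: int = 4096, threshold: float = 7.5):
    """Yield (offset, entropy) for each window at or above `threshold`.

    `window` and `threshold` are illustrative defaults (assumptions);
    they would need tuning against a real collection.
    """
    for offset in range(0, len(data), window):
        h = shannon_entropy(data[offset:offset + window])
        if h >= threshold:
            yield offset, h
```

Running `list(high_entropy_windows(open(path, "rb").read()))` over a file gives the offsets worth a closer look; a file whose windows are all flagged is a candidate for encryption or compression, and a format-aware check is then needed to tell the two apart.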

There appear to be very few established tools that perform these tasks.

As @Euan pointed out, there are in fact two approaches to my first way of answering this question. You can use/create analysis software like JHOVE, or you can instead use the 'native' software that implements the format and observe its behaviour (e.g. 'Does Adobe Reader ask for a password?').

UPDATE: During the SCAPE project, the DRMLint tool was developed for this purpose, and is now called Flint.

answered Nov 23, 2014 by anjackson (2,950 points)
0 votes

In addition to Andy's answer above, we recently did some more work on the detection of encrypted EPUB files with the Epubcheck tool (also used by Flint), see the following blog:

https://researchkb.wordpress.com/2015/03/13/policy-based-assessment-of-epub-with-epubcheck/

It also contains a link to a set of example files that may be useful for testing.
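For a quick check outside Epubcheck, here is a minimal Python sketch. It is not how Epubcheck or Flint work internally. It assumes only that an EPUB is a ZIP container and that encrypted or obfuscated resources are listed in `META-INF/encryption.xml`, each with an algorithm URI. The function names are made up for this example, and treating the IDPF font obfuscation URI as "obfuscation rather than full encryption" is my own simplification.

```python
import zipfile
import xml.etree.ElementTree as ET

XMLENC = "{http://www.w3.org/2001/04/xmlenc#}"  # W3C XML Encryption namespace
IDPF_FONT_OBFUSCATION = "http://www.idpf.org/2008/embedding"


def epub_encryption_report(source):
    """Map each resource listed in META-INF/encryption.xml to its algorithm URI.

    `source` is a file path or a binary file-like object.
    Returns an empty dict if the EPUB has no encryption.xml.
    """
    with zipfile.ZipFile(source) as zf:
        if "META-INF/encryption.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    report = {}
    for item in root.iter(f"{XMLENC}EncryptedData"):
        method = item.find(f"{XMLENC}EncryptionMethod")
        ref = item.find(f"{XMLENC}CipherData/{XMLENC}CipherReference")
        uri = ref.get("URI") if ref is not None else None
        report[uri] = method.get("Algorithm") if method is not None else None
    return report


def only_font_obfuscation(report):
    """True if every listed resource uses the IDPF font obfuscation algorithm."""
    return bool(report) and all(a == IDPF_FONT_OBFUSCATION
                                for a in report.values())
```

An empty report means no resources are declared as encrypted; a report where `only_font_obfuscation` is true suggests obfuscated fonts only, and anything else deserves a closer look with Epubcheck.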

answered Apr 1, 2015 by johanvanderknijff (2,060 points)
...