
A tool to detect encrypted Office documents?

+3 votes
2,290 views
Is anyone testing Office documents to see if they have been password protected? If so, what tools/scripts are you using?
asked May 29, 2014 by richardlehane (1,000 points)

1 Answer

+2 votes
Best answer

As part of our full-text indexing stack, we run Apache Tika on every resource to extract the text. When it hits an encrypted Office document, it throws an error like this:

org.apache.poi.EncryptedDocumentException: Cannot process encrypted word file

...and so we can use that to spot the (very few) encrypted documents in the archive. You can do the same from the command line:

$ tika password-protected.doc
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@119e7782
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:142)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:418)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: org.apache.poi.EncryptedDocumentException: Cannot process encrypted word file
    at org.apache.poi.hwpf.model.FileInformationBlock.<init>(FileInformationBlock.java:77)
    at org.apache.poi.hwpf.HWPFDocumentCore.<init>(HWPFDocumentCore.java:155)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:218)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:80)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:167)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)

Or, for DOCX:

$ tika password-protected.docx
Exception in thread "main" org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:245)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:167)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:142)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:418)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
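
To apply this check to many files, one option is to wrap the command-line call in a small script. The sketch below is only an illustration, not part of our indexing stack: it assumes a `tika` command on the PATH that behaves like the examples above, and it treats any stderr output containing "EncryptedDocumentException" as a sign of encryption. The function names are made up for this example, and other Tika versions may word their errors differently.

```python
import subprocess
import sys

# Both traces above (DOC and DOCX) contain this class name.
MARKER = "EncryptedDocumentException"


def output_indicates_encryption(text: str) -> bool:
    """Return True if the given Tika error output mentions the marker."""
    return MARKER in text


def is_encrypted(path: str, tika_cmd: str = "tika") -> bool:
    """Run the Tika CLI on one file and inspect its stderr.

    Assumption: `tika_cmd` is a launcher for the Tika CLI, as in the
    shell examples above; adjust it to match your installation.
    """
    result = subprocess.run(
        [tika_cmd, path],
        stdout=subprocess.DEVNULL,  # the extracted text is not needed here
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",  # tolerate odd bytes in the error output
    )
    return output_indicates_encryption(result.stderr)


if __name__ == "__main__":
    # Print the path of every file that appears to be encrypted.
    for path in sys.argv[1:]:
        if is_encrypted(path):
            print(path)
```

Saved under a name of your choosing and run with a list of file paths as arguments, it prints only the files whose Tika output mentions the exception. Any other failure (a corrupt file, say) is reported as not encrypted, so it is worth checking Tika's exit status separately if that distinction matters.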

answered May 29, 2014 by anjackson (2,950 points)
selected May 29, 2014 by richardlehane