
What are current best practices for acquiring & preserving Google Docs?

+2 votes
602 views
What are current (2018) best practices for acquiring and preserving records created in Google's cloud platform?

In particular, are there methods that preserve the rich metadata around document creation, editing, and commenting that exist in the native apps?
asked Feb 5, 2018 by ChristiePeterson (580 points)

2 Answers

+5 votes
Some background on Google Docs: I haven't been able to get an example of a native Google Docs file. If you sync Google Drive to your computer, all '.gdoc' files are JSON files that only store the link to the file in the editor.
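
To illustrate what such a pointer file holds, here is a minimal Python sketch that reads one and pulls out the link. The key names "url" and "doc_id" are assumptions based on files I'd expect from a synced folder, not a documented format, and the sample content is made up.

```python
import json
from pathlib import Path


def parse_gdoc_pointer(text):
    """Parse the JSON body of a synced pointer file; return (url, doc_id)."""
    data = json.loads(text)
    # "url" and "doc_id" are assumed key names; .get() yields None if absent.
    return data.get("url"), data.get("doc_id")


def read_gdoc_pointer(path):
    """Read a pointer file from disk and parse it."""
    return parse_gdoc_pointer(Path(path).read_text(encoding="utf-8"))


# Hypothetical sample content, not taken from a real file.
sample = '{"url": "https://docs.google.com/document/d/EXAMPLE_ID/edit", "doc_id": "EXAMPLE_ID"}'
url, doc_id = parse_gdoc_pointer(sample)
print(url, doc_id)
```

The point is that there is no document content to preserve in these files, only a reference back to Google's servers.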

I've found the data format referred to as the kix format. It seems to be stored as a long series of diff operations. There's a great illustration of how this works here: http://features.jsomers.net/how-i-reverse-engineered-google-docs/
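
As a rough idea of what "a long series of diff operations" means in practice, this Python sketch replays a list of insert and delete operations to rebuild the text. The field names ("ty", "is", "ds", "ibi", "s", "si", "ei") and the 1-based indexing follow my reading of the linked post and should be treated as assumptions; the real log carries many more operation types (styling, for instance) that this ignores.

```python
def replay(ops):
    """Rebuild plain text by applying insert/delete operations in order."""
    text = ""
    for op in ops:
        if op["ty"] == "is":
            # Insert string "s" before 1-based index "ibi".
            i = op["ibi"] - 1
            text = text[:i] + op["s"] + text[i:]
        elif op["ty"] == "ds":
            # Delete the 1-based, inclusive range "si".."ei".
            text = text[:op["si"] - 1] + text[op["ei"]:]
        # Any other operation type is skipped in this sketch.
    return text


# Hypothetical operation log, not taken from a real document.
ops = [
    {"ty": "is", "ibi": 1, "s": "Hello world"},
    {"ty": "ds", "si": 6, "ei": 11},
    {"ty": "is", "ibi": 6, "s": "!"},
]
print(replay(ops))
```

Replaying only a prefix of the list gives the document as it stood at that earlier point, which is why this representation is interesting for preserving edit history.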

Jenny Mitcham experimented with exporting Google Docs into various formats and with what the Takeout service produces: http://digital-archiving.blogspot.co.uk/2017/04/how-can-we-preserve-google-documents.html That's probably the best source of digipres knowledge on these formats at the moment.

edit: I dug a bit further into the James Somers post about the JSON representation of a Google Doc. If you have edit access to a Google Doc, you can use the following URL to see the JSON.

https://docs.google.com/document/d/{doc_id}/showrevision?id={doc_id}&end={end}&start={start}
doc_id is the long alphanumeric string that identifies the document
start is a number between 0 and the last revision number, which can be in the thousands
end is a number between start and the last revision number

If you craft that URL, a JSON file will be downloaded that shows how the data is stored. This JSON file does not store everything. I think comments are in a separate datastore, although I haven't figured out how they're referenced yet.
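
For anyone crafting these by hand, a small Python helper along these lines fills in the URL template above. The function name is my own, and it only builds the string; fetching it still needs a signed-in session with edit access to the document.

```python
def build_showrevision_url(doc_id, start, end):
    """Fill in the showrevision URL template for a revision range."""
    # start must be >= 0 and end must not be smaller than start.
    if not (0 <= start <= end):
        raise ValueError("need 0 <= start <= end")
    return (
        f"https://docs.google.com/document/d/{doc_id}/showrevision"
        f"?id={doc_id}&end={end}&start={start}"
    )


# EXAMPLE_ID is a placeholder, not a real document ID.
print(build_showrevision_url("EXAMPLE_ID", 1, 50))
```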
answered Feb 5, 2018 by nkrabben (1,990 points)
edited May 11, 2018 by nkrabben
0 votes
rclone can mass-download Google Docs and Spreadsheets and convert them into Microsoft Office or LibreOffice formats on the fly; see https://rclone.org/drive/#import-export-of-google-documents

Of course that will lose the edit history, but you will get a reasonable current snapshot of a document, and you can perhaps download regularly to track development over time.
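
A sketch of how regular snapshots could be scripted, in Python: it only assembles an rclone command that copies into a dated folder, so each run lands in its own directory. The remote name "gdrive:" and the "snapshots" folder are placeholders for whatever you configured; --drive-export-formats is the rclone option described on the page linked above.

```python
import datetime


def build_rclone_snapshot_command(remote, dest_root, formats=("docx", "xlsx"), day=None):
    """Return an rclone argument list that copies `remote` into a dated folder."""
    day = day or datetime.date.today()
    dest = f"{dest_root}/{day.isoformat()}"
    return [
        "rclone", "copy", remote, dest,
        # Formats that Google-native files are exported to during the copy.
        "--drive-export-formats", ",".join(formats),
    ]


# "gdrive:" is a hypothetical remote name from `rclone config`.
cmd = build_rclone_snapshot_command("gdrive:", "snapshots", day=datetime.date(2018, 5, 29))
print(" ".join(cmd))
```

Passing the list to subprocess.run from a scheduled job would give one exported copy per run, which is a coarse substitute for the revision history.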
answered May 29 by despens (990 points)
...