
What are current best practices for acquiring & preserving Google Docs?

What are current (2018) best practices for acquiring and preserving records created in Google's cloud platform?

In particular, are there methods that preserve the rich metadata around document creation, editing, and commenting that exist in the native apps?
asked Feb 5 by ChristiePeterson (580 points)

1 Answer

Some background on Google Docs: I haven't been able to get an example of a native Google Docs file. If you sync Google Drive to your computer, every '.gdoc' file is a small JSON file that only stores a link to the document in the online editor.
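To inspect one of those synced pointer files, a minimal Python sketch might look like the following. The key names "url" and "doc_id" are assumptions about what the JSON contains, not a documented format, so check a real file first.

```python
import json
from pathlib import Path


def parse_gdoc_pointer(text):
    """Return (url, doc_id) from the JSON text of a synced .gdoc file.

    Assumption: the file is a small JSON object; "url" and "doc_id" are
    guessed key names, and either may be missing (returned as None).
    """
    data = json.loads(text)
    return data.get("url"), data.get("doc_id")


def read_gdoc_pointer(path):
    """Read a .gdoc file from disk and parse it with parse_gdoc_pointer."""
    return parse_gdoc_pointer(Path(path).read_text(encoding="utf-8"))
```

This only recovers the pointer; the document content itself stays in Google's cloud.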

I've found references to the underlying data format being called the "kix" format. It seems to be stored as a long series of diff operations. There's a great illustration of how this works here: http://features.jsomers.net/how-i-reverse-engineered-google-docs/
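The idea of a document stored as a log of edits can be illustrated by replaying operations in order. This Python sketch is loosely modeled on the insert ("is") and delete ("ds") operations described in the Somers post; the field names ("ty", "ibi", "s", "si", "ei", "mlti", "mts") and the 1-based, inclusive indexing are assumptions for illustration, not a verified spec of the kix format.

```python
def apply_op(text, op):
    """Apply one hypothetical kix-style operation and return the new text."""
    kind = op.get("ty")
    if kind == "is":
        # Insert string op["s"] at 1-based index op["ibi"].
        i = op["ibi"] - 1
        return text[:i] + op["s"] + text[i:]
    if kind == "ds":
        # Delete the 1-based, inclusive range op["si"]..op["ei"].
        return text[:op["si"] - 1] + text[op["ei"]:]
    if kind == "mlti":
        # A bundle of operations applied one after another.
        for sub in op["mts"]:
            text = apply_op(text, sub)
        return text
    # Anything else (e.g. styling) leaves the plain text unchanged here.
    return text


def replay(ops):
    """Rebuild plain text by applying every operation to an empty document."""
    text = ""
    for op in ops:
        text = apply_op(text, op)
    return text


ops = [
    {"ty": "is", "ibi": 1, "s": "Hello world"},
    {"ty": "ds", "si": 6, "ei": 11},
    {"ty": "is", "ibi": 6, "s": "!"},
]
print(replay(ops))  # Hello!
```

In principle, replaying only a prefix of the log gives the text as it stood at an earlier revision, which is what makes this representation interesting for preserving edit history.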

Jenny Mitcham experimented with exporting Google Docs into various formats and looked at what the Takeout service produces: http://digital-archiving.blogspot.co.uk/2017/04/how-can-we-preserve-google-documents.html That's probably the best source of digipres knowledge on these formats at the moment.

edit: I dug a bit further into the James Somers post about the JSON representation of a Google Doc. If you have edit access to a Google Doc, you can use the following URL to see the JSON.

https://docs.google.com/document/d/{doc_id}/showrevision?id={doc_id}&end={end}&start={start}
doc_id is the long alphanumeric string that identifies the document
start is a revision number between 0 and the last revision number, which can be in the thousands
end is a revision number between start and the last revision number
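Assembling that URL by hand is error-prone, so here is a small Python helper. It only builds the string (fetching it still needs a logged-in browser session with edit access), and because showrevision is an undocumented endpoint, it may have changed since this was written.

```python
from urllib.parse import urlencode


def showrevision_url(doc_id, start, end):
    """Build the showrevision URL for revisions start..end of a document."""
    if not (0 <= start <= end):
        raise ValueError("expected 0 <= start <= end")
    # urlencode keeps the dict's insertion order: id, end, start.
    query = urlencode({"id": doc_id, "end": end, "start": start})
    return f"https://docs.google.com/document/d/{doc_id}/showrevision?{query}"


print(showrevision_url("YOUR_DOC_ID", 0, 100))
```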

If you request that URL, a JSON file is downloaded that shows how the data is stored. This JSON does not capture everything: I think comments live in a separate datastore, although I haven't figured out how they're referenced yet.
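Once downloaded, the file can be mined for the edit metadata the question asks about. This Python sketch rests on two assumptions worth checking against a real download: that the body may begin with the ")]}'" guard string some Google endpoints prepend to JSON, and that it contains a "changelog" list whose entries start with the operation, a timestamp, and a user ID. Both are guesses based on the Somers write-up, not a documented format.

```python
import json

# Assumed anti-JSON-hijacking prefix; stripped only if present.
XSSI_PREFIX = ")]}'"


def load_revision_json(raw):
    """Parse a saved showrevision response, dropping the guard prefix if any."""
    body = raw.lstrip()
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def list_changes(data):
    """Yield (operation, timestamp, user_id) for each assumed changelog entry."""
    for entry in data.get("changelog", []):
        # Skip entries too short to hold the three assumed leading fields.
        if len(entry) < 3:
            continue
        yield entry[0], entry[1], entry[2]
```

Grouping the yielded tuples by user ID or by timestamp would be one way to start recovering who edited what and when, though comments would still be missing.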
answered Feb 5 by nkrabben (1,960 points)
edited May 11 by nkrabben
...