
Best Practices for Accessioning Email from Web Applications?

People increasingly use web applications to manage their email. So going forward, there is a good chance that many donors to archives will not have any local copies of their email messages. What issues should archives consider in accessioning email from these third-party, web-based systems? Specifically, is it better to have a donor export their email and submit it? Or is it better for an organization to ask for the donor's account credentials, connect to the account via POP, IMAP, or a similar protocol, and slurp in all the messages?
asked Sep 30, 2014 by tjowens (2,350 points)

2 Answers

Best answer

I've been spending a lot of time with this question lately, too. As with almost every digital preservation conundrum, I think the initial answer is “It depends…”

There are a lot of factors to consider with web email: the type of email provider; whether the platform uses the POP or IMAP protocol; the number of folders and messages in the account; whether the account's folder structure carries contextual information; whether the donor wants to provide the entire account or just selected folders; whether the donor is able to export the email and has the technical skills to do so; how to ensure the integrity of the download and transfer (e.g., making sure that the sent/received dates of the original messages aren't changed to the date of download); whether the donor's email system or client exports to a proprietary format; and, not least, the personally identifiable information (PII) and security complications of requesting account usernames and passwords.
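For the transfer-integrity piece, one common approach is to record a checksum of the exported file before it leaves the donor (or the workstation that did the download) and verify it on arrival. Here's a minimal Python sketch, assuming the export is a single file such as an MBOX; the function names are just illustrative, and SHA-256 is one reasonable choice rather than a requirement:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so large mailboxes don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path, expected_digest):
    """True if the received file matches the digest recorded before transfer."""
    return sha256_of_file(path) == expected_digest.strip().lower()
```

Keep in mind that a matching checksum only shows the file wasn't altered in transit. It can't tell you whether dates were already changed when the messages were pulled down from the provider; that needs a look at the message headers themselves.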

We recently accessioned our first web-based email collection (38,000 messages), which was hosted by Earthlink Web Mail. Earthlink retired its desktop client application (TotalAccess Mailbox) some years ago and now recommends downloading web mail to Windows Live Mail or Outlook. We weren’t thrilled with those options, so we experimented with using Mozilla Thunderbird to download the account to a local workstation and save the email in the emerging-standard .MBOX format.
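Once the account is saved as MBOX, Python's standard-library mailbox module can give a quick inventory and a rough check on the date concern mentioned above: if a lot of Date headers are missing or fall on or after the download day, something may have rewritten them. This is only a sketch; the function name, the report fields, and the "on or after the download day" heuristic are my own assumptions, not part of the workflow described here:

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def inventory_mbox(path, download_day):
    """Summarize an MBOX file: message count, Date-header range, and a list
    of (key, subject) for messages whose Date header is missing, unparseable,
    or on/after download_day (a datetime.date)."""
    box = mailbox.mbox(path, create=False)
    dates, suspect = [], []
    try:
        for key, msg in box.iteritems():
            raw = msg.get("Date")
            try:
                sent = parsedate_to_datetime(raw) if raw else None
            except (TypeError, ValueError):
                sent = None
            if sent is not None and sent.tzinfo is None:
                # Treat timezone-less dates as UTC so all values are comparable.
                sent = sent.replace(tzinfo=timezone.utc)
            if sent is None or sent.date() >= download_day:
                suspect.append((key, str(msg.get("Subject", ""))))
            else:
                dates.append(sent)
    finally:
        box.close()
    return {
        "count": len(dates) + len(suspect),
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
        "suspect": suspect,
    }
```

A flagged message isn't necessarily a problem (drafts, for instance, may have no Date at all), but the list gives you something concrete to spot-check against the web account before it's closed.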

Since Earthlink uses a POP server, it was only possible to download messages from the account’s Inbox. To capture the other folders, we had to download the Inbox, move those messages to a folder in Thunderbird, and empty the Inbox in the web account; then move the contents of one remaining Earthlink folder into the Inbox, download the “Inbox” again, move those messages to a matching labeled folder in Thunderbird, and repeat for every folder. It was a laborious, time-intensive task, and we had to take great care to preserve the associations between messages and the original folder hierarchy (for example, if a message had been saved in the “Drafts” folder, we wanted to make sure that information was preserved).
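When messages have to be funneled through the Inbox one folder at a time, it can help to write down the folder association outside the mail client as well, without modifying the messages themselves. A minimal sketch, assuming each pass was saved as its own MBOX file named after the original web-mail folder; the CSV manifest and its column names are a hypothetical local convention, not a standard:

```python
import csv
import mailbox


def build_folder_manifest(folder_files, manifest_path):
    """Write a CSV recording which original folder each message came from.

    folder_files maps an original web-mail folder name (e.g. "Drafts") to the
    MBOX file captured for that folder. The MBOX files are only read, never
    modified. Returns the number of message rows written.
    """
    rows = 0
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["folder", "position", "message_id", "date", "subject"])
        for folder, mbox_path in sorted(folder_files.items()):
            box = mailbox.mbox(mbox_path, create=False)
            try:
                # position is the message's order within that folder's MBOX file
                for position, msg in enumerate(box):
                    writer.writerow([
                        folder,
                        position,
                        str(msg.get("Message-ID", "")),
                        str(msg.get("Date", "")),
                        str(msg.get("Subject", "")),
                    ])
                    rows += 1
            finally:
                box.close()
    return rows
```

Keying on Message-ID plus position means the folder context survives even if the messages are later merged into a single file or loaded into another tool.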

In our case, the email creator had passed away in 2011, and the web account was being maintained by the donor. Since Earthlink's POP-only access complicated the transfer, and because we did not have much information about the creation and use of the account, we definitely wanted to handle the download ourselves so that we could slurp in all the messages and keep the metadata and context intact. Requesting the username and password was less of a concern, since the creator was no longer alive (and because the account was to be closed after we downloaded the email).

Ideally, I think it's best to have an ongoing process with creators and donors to appraise and securely capture the email they wish to transfer. I’m really looking forward to the outcome of Stanford University’s ePADD Project, which will produce an open-source tool that allows individuals and repositories to evaluate and process email archives before and after they have been transferred.

I found these resources especially helpful, too, and would be glad to hear of others:

answered Oct 3, 2014 by KateTasker (480 points)
selected Oct 3, 2014 by tjowens

I'm pretty new to the whole topic, but I've been researching and thinking about it a lot, so I hope you don't mind if I use my answer to throw a couple of thoughts out there and hope they get some response from others.

I think first and foremost, the method of acquisition is going to be limited by the email service in question. As with pretty much everything else archivists collect, we like to figure out what formats we prefer, but we'll ultimately take whatever we're given (provided we want it badly enough).

After technical limitations, the donor's personal preference and the general situation come into play. Are you collecting from a private donor or an outgoing leader in your organization? What kind of private information might other people have sent to the donor via email? What kind of expectation of privacy does the donor have? How strong is our need to maintain an authentic, transparent record by collecting everything?

Barring some truly exceptional situations, I personally would prefer to let the donor decide exactly which emails they donate and which ones they keep private (in addition to any privacy-related weeding I might do). We expect that donors go through paper correspondence before it comes to us, and I personally am OK with email getting the same treatment. In that sense, I think it's better to have the donor export their email and submit it (or use whatever method the donor is most comfortable with).

What does everyone else think?

answered Oct 2, 2014 by sarah.barsness (1,060 points)