Great question! For the Parliamentary Archives, we have done exactly as you describe and outsourced the majority of our web archive capability to the Internet Memory Foundation (IMF). The justification is simple: we would not have the resources or infrastructure to do this in-house. The Houses of Parliament web estate forms a vital corporate record of the organisation’s activity, which should be preserved and made accessible.
To put our Quality Assurance (QA) practices into context, it will help to give a very brief introduction to our web archive process:
- First, Archives staff sign off the Seed List (usually around 30 URLs).
- Second, IMF carries out the crawls. Archives staff also have the option to set off crawls themselves using ArchiveTheNet (AtN).
- IMF then carries out an initial QA stage to identify and rectify initial issues (for example, checking that crawl parameters have been met and noting any technical limitations).
- Using project management software (JIRA), IMF then hands over the captured URLs for the second QA stage, which is carried out by Archives staff. We assign one person to lead the QA and distribute the QA tasks across multiple staff.
- Any further QA issues are reported through JIRA and resolved by IMF.
- Finally, the captured crawls are delivered by IMF to the Archives for ingest into Parliament’s digital repository (Preservica Enterprise Edition).
As you can see, there are multiple levels of QA, from the initial QA carried out by IMF to the second QA undertaken by the Archives. I would argue that there is a final level of QA during the ingest workflow, as the ARC/WARC files are characterised, validated, virus scanned and so on. Setting aside the ingest workflow, the bulk of our QA is a manual, human process which involves a significant amount of time spent checking the captured data. We also have a Service Level Agreement (SLA) with IMF, which is vital in ensuring that the required service levels and QA approaches are met. For obvious reasons, I can’t go into too much detail about what the SLA contains!
I am aware that our process could be improved, and that our QA steps rely upon manual intervention. I’d be happy to discuss in more detail the QA steps I have outlined and potential areas of improvement. I’d also be particularly interested to hear from other web archive experts with suggestions as to what we should be doing as part of our QA activities.
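Purely as an illustration of the kind of manual check that might be partly automated, here is a minimal Python sketch that compares a signed-off Seed List against a list of captured URLs and reports any seeds with no matching capture. This is not a description of our tooling or of IMF’s: the function names, the normalisation rules and the example URLs are all assumptions made for the sketch, and a real check would need to read the captured URLs from the crawl output itself.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to scheme, host and path so trivially different forms compare equal.

    Assumption for this sketch: case differences in scheme/host and a
    trailing slash are ignored, and any query string or fragment is dropped.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def missing_seeds(seed_urls, captured_urls):
    """Return the seeds (in their original order) that have no captured counterpart."""
    captured = {normalise(u) for u in captured_urls}
    return [seed for seed in seed_urls if normalise(seed) not in captured]


if __name__ == "__main__":
    # Hypothetical data: a two-entry seed list and a single captured URL.
    seeds = ["http://www.example.org/", "http://www.example.org/about"]
    captured = ["http://WWW.example.org"]
    for seed in missing_seeds(seeds, captured):
        print("No capture found for seed:", seed)
```

A check like this would only flag seeds that are missing outright; it says nothing about whether a captured page renders correctly, which is where the manual QA effort described above still goes.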
~Chris Fryer, Senior Digital Archivist, Parliamentary Archives