What makes for well-defined selection criteria for an Archive-It web archive collection?

+1 vote
Tools like Archive-It make it relatively easy to archive the web. However, given how heterogeneous the web and web content are, it becomes a challenge to pin down and define what web content matters to a particular organization and how to identify that content's place in seed URLs. To this end, I am curious to hear what people think are the core requirements for selection criteria for an Archive-It collection. Ideally, answers here could point to some Archive-It collections that clearly follow through on well-defined selection criteria.
asked Jul 30, 2014 by tjowens (2,360 points)
It's worth giving some thought to defining selection criteria for each Archive-It collection because there is no automated way of moving seeds from one collection to another. You have to disable a seed from one collection and then start it in the new collection. Internet Archive states that there is currently no way to move archived materials from one collection to another. So if you don't get it "right" the first time, you end up with a bifurcated collection.
I can speak for Archive-It and say that the ability to move seeds between collections is high on the list of functionality to add in a future AIT release. It is a complicated technical change, but one that has been requested and that we plan on addressing in the near future.
good to know!

3 Answers

+2 votes
Like all acquisition decisions, the development of a web archive should be based on the collection development policies and strategies of the collecting institution. To my mind, web archives are just another format an organization may choose to collect, and practitioners should base curatorial and selection decisions for web archives on the same commitment to long-term collection development and maintenance that the institution employs in other media.
answered Jul 31, 2014 by ChristiePeterson (580 points)
+2 votes
Totally agree with Christie. Speaking only for myself, I'd say appraisal criteria for web archives are no different than any other format. What is needed, regardless of whether something is web-based or paper-based, are clearly conceptualized, articulated, and executed collecting policies whose outcomes (aka collection-specific decision making) are documented upon acquisition/accession.

It is, of course (and were it your mandate), easier to collect the web presence of x-number of organizations than it is to collect the documentary records of x-number of organizations (even in just a rote logistics comparison), but I think that makes the application of institutionally-defined appraisal criteria all the easier, really. In my experience, archives are more likely to "fuzzy" their selection standards for analog collections that "show up on their doorstep." The web isn't going to show up on your doorstep (though how awesome would it be if it did!) -- you have to go out and get it. Which, again imo, makes those selection decisions a bit less prone to contingency and thus more aligned with existing institutional practices.
answered Aug 19, 2014 by jeffersonbailey (380 points)
+2 votes

I agree with Christie and Jefferson on the importance of well-defined, format-agnostic collection development policies, but I also think that format-specific considerations can usefully inform the selection of web content for archiving.

Case-in-point: the Curator for the Russia and Eurasia Collection at the Hoover Institution Library and Archives approached me about archiving some websites about the Ukraine conflict that weren't represented in Archive-It's collaborative collection. Archive-It had already facilitated the creation of the most essential curated archive of web materials related to the topic. The marginal value of nominating the additional websites for archiving within the Archive-It collection, to researchers at both the Hoover Institution and, well, anywhere else, far exceeded the value of a proprietary Hoover Institution web archive collection with an arbitrary subset of its content. So, I plugged him into the Archive-It effort.

Collection development policy doesn't typically account for the fact that what other organizations are collecting and, incidentally, making publicly available through Archive-It (or other web archive access points) matters a lot more than it did in a context where acquired analog resources were at most shared within a regional consortium. The potential volume of web content that turns out to be in scope for even a highly-targeted collection development policy and the comparatively small fraction of the changing, disappearing web that our collective web archiving efforts are able to capture and re-present suggest to me that we should actively avoid duplicating efforts and focus foremost on at-risk content. On the other hand, maybe the scale of the problem means that all of our organizations can freely pursue their collection development policies without concern for collisions?

At Stanford University Libraries, web content collecting has not been incorporated into subject-specific policies, but we have created format-specific recommendations that operate as "overlay" guidance for collection development.

answered Nov 12, 2014 by nullhandle (320 points)