
How to browse web archives using an old browser?


We have a researcher who would like to browse the Internet Archive's web collection using a browser contemporary with the archived pages. I've come across some examples, such as Accessing Authentic Archived Websites Well (and indeed Euan has offered to help out), but I was wondering whether anyone else had any advice. The researcher seems reasonably technically able, so non-trivial solutions are fine, but perhaps something like the KEEP Framework could help simplify things?

asked Jul 8, 2014 by anjackson (2,950 points)
edited Jul 8, 2014 by anjackson
I would like to answer this, but first I should say I'm not quite sure what problem you are trying to solve here. Is the challenge setting up a good emulator with the right browsers? Or have you done that, and the challenge is that Wayback injects contemporary HTML and JS for the histogram interface? Both?
I was primarily concerned with setting up an emulator, as this is something the researcher would have to do. Perhaps I could split the playback service issues off into a separate question?

1 Answer


Setting up and configuring an internet-connected emulator or virtual machine is relatively trivial. If your researcher wants to emulate a mid-1990s Macintosh, emulators such as Basilisk II and SheepShaver are likely sufficient. There's also MESS, which is more technically accurate but trickier to configure, and its machine implementations are not always complete. If they prefer PCs, VirtualBox is a good, free option for virtualizing early Windows systems.
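As a rough illustration of the VirtualBox route, the sketch below only builds the `VBoxManage` commands for a small legacy-Windows VM whose first network adapter uses NAT, which is the usual way to give the guest the host's internet connection. The VM name, memory size, OS type, controller name, and disk path are hypothetical placeholders, and the sketch assumes a system disk image already exists; check each subcommand against the VBoxManage manual for your VirtualBox version.

```python
import shlex
import subprocess
from typing import List


def vbox_commands(vm: str, disk: str, memory_mb: int = 64,
                  ostype: str = "Windows98") -> List[List[str]]:
    """Build VBoxManage commands for a minimal legacy-Windows VM.

    All names and sizes here are illustrative placeholders.
    """
    return [
        # Create and register an empty VM with the given guest OS type.
        ["VBoxManage", "createvm", "--name", vm, "--ostype", ostype, "--register"],
        # Small RAM footprint; NAT networking shares the host's connection.
        ["VBoxManage", "modifyvm", vm, "--memory", str(memory_mb), "--nic1", "nat"],
        # Old Windows versions expect an IDE controller.
        ["VBoxManage", "storagectl", vm, "--name", "IDE", "--add", "ide"],
        # Attach the system disk image as the primary master.
        ["VBoxManage", "storageattach", vm, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run(vbox_commands("win98-browser", "win98.vdi"))` prints the commands for review; passing `dry_run=False` would actually execute them on a machine with VirtualBox installed.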

The best approach depends entirely on the skills and equipment of the researcher, or whoever is configuring the emulator. There are essentially three ways to get the aforementioned emulators up and running:

  1. Illegally download disk images of the OS installer and the machine ROM.
  2. Purchase OS installer disks and make disk images of them; illegally download the machine ROM.
  3. Source a vintage machine supported by your emulator of choice that is already running the OS you want in the emulated system. Make a disk image of its hard drive, and dump its ROM.
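For option 3, imaging the drive is essentially a byte-for-byte copy of the raw device into a file. The sketch below assumes the old drive is attached to a modern host and exposed as a readable device node (the path `/dev/sdb` in the usage note is hypothetical and usually needs root access); it records a SHA-256 checksum so the copy can be verified later. Dumping the ROM is a separate step and is not covered here.

```python
import hashlib

CHUNK = 1024 * 1024  # read 1 MiB at a time to keep memory use flat


def image_device(source: str, dest: str, chunk_size: int = CHUNK) -> str:
    """Copy `source` byte for byte into `dest`; return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "wb") as out:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            out.write(block)
            digest.update(block)
    return digest.hexdigest()


def verify_image(path: str, expected: str, chunk_size: int = CHUNK) -> bool:
    """Re-hash an image file and compare it with a previously recorded digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected
```

For example, `checksum = image_device("/dev/sdb", "vintage-hd.img")` followed by `verify_image("vintage-hd.img", checksum)`. Dedicated tools such as `dd` do the same job; the Python version just makes the steps explicit.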

In every case, a machine ROM is needed only for the Macintosh emulators; PC virtualization tools like VirtualBox don't need one. For options 1 and 2, you start with a "blank" disk image for the system disk, mount the installer disk images, and install the OS onto the blank system disk image. For option 3, you can more or less load the raw disk image into the emulator, and most of the time this "just works." MESS uses its own disk image format, so you need to convert the raw disk image with a tool it provides. VirtualBox is sometimes fussy about raw disk images as well.
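The two disk-image chores above can be sketched as follows, under stated assumptions. A "blank" system disk for options 1 and 2 is just a zero-filled file of the desired size (the emulated OS installer is then expected to initialize it; whether a given emulator accepts a plain zero-filled file is an assumption to check in its documentation). For the VirtualBox fussiness with raw images, one common workaround is converting the raw image to VirtualBox's native VDI format with `VBoxManage convertfromraw`; the file names in the usage note are hypothetical.

```python
from typing import List


def create_blank_image(path: str, size_mb: int) -> int:
    """Create a zero-filled raw disk image; returns its size in bytes.

    truncate() extends the empty file with zeros (stored sparsely on
    filesystems that support it).
    """
    size = size_mb * 1024 * 1024
    with open(path, "wb") as f:
        f.truncate(size)
    return size


def vbox_convert_command(raw_image: str, vdi_image: str) -> List[str]:
    """Build the command that converts a raw image into a VirtualBox VDI."""
    return ["VBoxManage", "convertfromraw", raw_image, vdi_image,
            "--format", "VDI"]
```

For example, `create_blank_image("macos-system.img", 500)` makes a 500 MB blank system disk, and `subprocess.run(vbox_convert_command("old-pc.img", "old-pc.vdi"), check=True)` would convert an imaged PC drive, assuming VirtualBox is installed.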

To make a long story short, if the researcher is willing to take a personal legal risk (arguably a very small one) by digging around online for the OS installer disk images and machine ROMs, this is really the easiest way to go. These files are out there, and it doesn't take much googling to find them. With this option, setting up an emulator or virtual machine is incredibly simple, and the researcher will find a wealth of information online about setting these systems up (unlike something like the KEEP project, which is far more obscure). Getting all of the emulators I've mentioned to use the host machine's internet connection is documented and trivial. The researcher can likely find a common web browser of the right period on oldapps.com. So, short of providing step-by-step instructions for the three options above, these are my recommendations. For resources and help with Basilisk II and SheepShaver, see the Emaculation forums. For help with MESS, see its documentation or the #MESSDEV IRC channel. For VirtualBox, see its documentation and user forums.
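For Basilisk II, networking and machine setup live in a plain-text prefs file of `key value` lines. The sketch below writes a minimal one; the keys `rom`, `disk`, `ramsize`, `modelid`, `cpu`, and `ether` follow the Basilisk II documentation as I understand it, but `ether slirp` (built-in NAT over the host's connection) depends on how your build was compiled, and the prefs file's name and location vary by platform, so treat every value here as an assumption to check against your build's README.

```python
from typing import Dict


def basilisk_prefs(rom: str, disk: str, ram_mb: int = 64) -> str:
    """Return a minimal Basilisk II prefs file body (illustrative values)."""
    prefs: Dict[str, str] = {
        "rom": rom,                            # path to the dumped machine ROM
        "disk": disk,                          # system disk image
        "ramsize": str(ram_mb * 1024 * 1024),  # emulated RAM, in bytes
        "modelid": "14",                       # Quadra 900, needed for Mac OS 8
        "cpu": "4",                            # 68040
        "ether": "slirp",                      # NAT via host (build-dependent)
    }
    return "\n".join(f"{key} {value}" for key, value in prefs.items()) + "\n"
```

Writing `basilisk_prefs("quadra.rom", "macos-system.img")` to the prefs file gives a starting point that can be refined through the emulator's own settings GUI.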

As I mentioned in my comment above, though, the researcher will inevitably find that browsing the Internet Archive with vintage software is quite useless, because of the contemporary markup and JS that Wayback injects into the pages for the Internet Archive's histogram interface. While this interface is of course incredibly useful for the way 99% of users interact with the Wayback Machine, it is incompatible with older systems and amounts to an "inauthentic" rendering of the web page. One workaround would be to set up a content adaptation proxy server. This would take a bit of work and some familiarity with configuring such systems, but it would be entirely feasible to use something like Squid's ICAP or eCAP support to remove the histogram interface from the pages served through the proxy.
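To illustrate the idea without Squid, here is a much simpler stand-in: a tiny standalone Python HTTP proxy that forwards GET requests and strips the injected banner from HTML responses. It assumes the banner is delimited by `<!-- BEGIN WAYBACK TOOLBAR INSERT -->` and `<!-- END WAYBACK TOOLBAR INSERT -->` comments; that marker text is an assumption to verify against real Wayback pages, and other injected scripts are left untouched. This is a sketch for experimentation, not a replacement for a properly configured ICAP/eCAP setup.

```python
import re
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed delimiters around the injected Wayback banner.
TOOLBAR = re.compile(
    rb"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)


def strip_toolbar(html: bytes) -> bytes:
    """Remove every delimited toolbar block from an HTML document."""
    return TOOLBAR.sub(b"", html)


class StrippingProxy(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # A browser configured to use this proxy sends the absolute URL as the path.
        try:
            with urllib.request.urlopen(self.path) as upstream:
                status = upstream.status
                ctype = upstream.headers.get("Content-Type", "application/octet-stream")
                body = upstream.read()
        except urllib.error.HTTPError as err:
            # Pass upstream error pages (404 etc.) through unchanged.
            status = err.code
            ctype = err.headers.get("Content-Type", "text/html")
            body = err.read()
        if ctype.startswith("text/html"):
            body = strip_toolbar(body)
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the proxy until interrupted."""
    HTTPServer(("", port), StrippingProxy).serve_forever()
```

Running `serve()` on the host and pointing the emulated browser's HTTP proxy setting at the host's address and port 8080 would route archive pages through the filter.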

answered Jul 9, 2014 by benfinoradin (460 points)
edited Jul 9, 2014 by benfinoradin