How to access a subset of Web Archive pages from a vintage web browser on vintage hardware?

+5 votes
1 view

I have a 1998 Powerbook and want to browse era-specific websites (roughly 1995-2000) from the Web Archive using Netscape Communicator 4.08.  I'm doing this specifically to demonstrate the full capabilities of this particular browser, not simply to view the websites from some other browser or on some other machine.  Furthermore, I need to show that these websites are, in fact, web-accessible, so saving the pages on a modern computer, removing the incompatible parts, transferring the files to the Powerbook by other means and opening them locally in the browser doesn't cut it.  Obviously, there are many obstacles, chief among them the browser's incompatibility with the modern https protocol now used by Web Archive and with the version of JavaScript its pages use.  The solutions I'm thinking of attempting are as follows:

1) Save the web pages I want from Web Archive, manually strip away the incompatible elements, host the pages on local servers on my network using protocols compatible with my vintage browser.  This is my "brute force" option.

2) Write a transcoding proxy that automatically strips away the incompatible elements from the Web Archived versions of the web sites, so that any attempt to access the site by my vintage browser will return these transcoded versions.

Both options are time-consuming, so I was wondering whether this problem has already been tackled before I spend my time reinventing a solution.

asked Sep 29, 2020 by EriolGaurhoth (190 points)
How about using https://oldweb.today ?
"I'm doing so specifically to demonstrate the full capabilities of this specific browser, not simply trying to view the websites from any other browser, nor on any other machine."

oldweb.today is a really awesome resource, and is great for emulating the look-and-feel of an old browser...from a new browser on a modern computer.  I'm still looking for a solution that allows you to access web archived sites directly from an actual old browser running on actual vintage hardware.  The tag editing method mentioned by nullhandle is a great start, I just need to write a quick proxy to automate the process and append the "id_" string.
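A minimal sketch of what such a quick proxy might look like in Python, assuming the vintage browser is configured to use it as a plain HTTP proxy and that one fixed snapshot date is good enough. The names raw_snapshot_url, WaybackProxy and run, the timestamp, and port 8080 are illustrative assumptions, not part of any existing tool. The proxy talks https to the Wayback Machine and plain HTTP to the old browser.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

WAYBACK_PREFIX = "https://web.archive.org/web/"
TIMESTAMP = "19980908120000"  # assumed target date, YYYYMMDDhhmmss


def raw_snapshot_url(original_url, timestamp=TIMESTAMP):
    """Build the Wayback URL for the raw capture by appending "id_" to the date string."""
    return f"{WAYBACK_PREFIX}{timestamp}id_/{original_url}"


class WaybackProxy(BaseHTTPRequestHandler):
    """Answers plain-HTTP proxy requests with raw archived content."""

    def do_GET(self):
        # A browser talking to an HTTP proxy sends the absolute URL in the
        # request line, so self.path is the full original URL.
        target = raw_snapshot_url(self.path)
        try:
            with urllib.request.urlopen(target, timeout=30) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "text/html")
        except urllib.error.HTTPError as exc:
            # Pass the archive's status (for example 404) on to the browser.
            self.send_error(exc.code)
            return
        except urllib.error.URLError:
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Serve until interrupted; point the browser's HTTP proxy setting at this port."""
    HTTPServer(("", port), WaybackProxy).serve_forever()
```

Calling run() on a modern machine on the same network and entering that machine's address and port as the HTTP proxy in Netscape's preferences would be the intended use. This sketch handles GET requests only and does no caching.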

Another good resource I came across is https://wiby.me/, which searches mostly for "old"-style web pages that display well on older browsers. However, it won't retrieve archived versions of sites that still exist from the Web Archive (so I can't see Apple's 2001 homepage from wiby).  There was another one, whose name escapes me, that is a search engine that strips every result down to its basic HTML 1.0 text-only version, which of course breaks most modern websites and has the same problem as wiby: it doesn't retrieve old web pages, it just old-ifies new ones.

2 Answers

+2 votes

You can request the raw, unrewritten archived web content from the Internet Archive Wayback Machine by appending "id_" to the date string in the URL (e.g., https://web.archive.org/web/19961022173245/http://www.geocities.com/ becomes https://web.archive.org/web/19961022173245id_/http://www.geocities.com/). See https://web.archive.org/web/20130329115724/http://faq.web.archive.org/page-without-wayback-code/.
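As a rough illustration, the URL change can be automated with a few lines of Python. The function name to_raw_url and the regular expression are my own assumptions about the URL layout (a numeric date string, optionally followed by a two-letter modifier such as "im_"), not an official API.

```python
import re

# Assumed layout: prefix, numeric date string, optional two-letter modifier, original URL.
_WAYBACK_URL = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{1,14})([a-z]{2}_)?(/.*)$"
)


def to_raw_url(url):
    """Return the Wayback URL with "id_" appended to its date string."""
    match = _WAYBACK_URL.match(url)
    if match is None:
        raise ValueError(f"not a Wayback Machine snapshot URL: {url}")
    prefix, timestamp, _modifier, rest = match.groups()
    # Any existing modifier is replaced, so the function is safe to apply twice.
    return f"{prefix}{timestamp}id_{rest}"


print(to_raw_url("https://web.archive.org/web/19961022173245/http://www.geocities.com/"))
# prints https://web.archive.org/web/19961022173245id_/http://www.geocities.com/
```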

answered Oct 9, 2020 by nullhandle (320 points)
0 votes

Get yourself a pywb proxy and set it up like this (config.yaml):

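framed_replay: false
enable_cdx_api: true
enable_http_proxy: true
enable_coll_info: false

# The "proxy:" key is missing from the post as it appears here; it is assumed from the options that follow.
proxy:
    cookie_resolver: false
    coll: proxy
    enable_client_rewrite: false
    enable_banner: false
    default_timestamp: '19980908120000'

# The "collections:", "proxy:" and "sequence:" keys are likewise assumed, based on the indentation and the mention of "sequence" in the explanation.
collections:
    proxy:
        sequence:
            - index_group:
                ia: memento+http://web.archive.org/web/
            - index_group:
                loc: memento+http://webarchive.loc.gov/all/
                uk_na: memento+http://webarchive.nationalarchives.gov.uk/
                pt_wa: memento+http://arquivo.pt/wayback/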

Then connect your legacy browser to the service on port 8080 or whatever other port you're running pywb on.
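Before pointing the legacy browser at it, it may help to check the proxy from a modern machine. Here is a small sketch using only the Python standard library; the host name, port 8080 and the function names proxy_opener and fetch_via_proxy are assumptions for illustration.

```python
import urllib.request


def proxy_opener(proxy_host="localhost", proxy_port=8080):
    """Build a urllib opener that sends plain-HTTP requests through the given proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy_host}:{proxy_port}"}
    )
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, opener):
    """Fetch a URL through the proxy and return the response body as bytes."""
    with opener.open(url, timeout=30) as response:
        return response.read()
```

With pywb running, something like fetch_via_proxy("http://www.geocities.com/", proxy_opener()) should return the archived page rather than the live one, if the proxy is set up as intended.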

Pywb will act as a proxy to the web archives listed in "sequence" and deliver the unrewritten, original raw data via the Memento protocol. The default_timestamp is the point in time for which the web archives are asked for resources. The first index_group is just the Internet Archive. The second lists the Library of Congress, the UK National Archives, and the Portuguese Web Archive, in case the Internet Archive doesn't have a resource. This way you can build your own oldweb.today, so to speak, on real hardware.
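The default_timestamp value above is a 14-digit string in the form YYYYMMDDhhmmss, so '19980908120000' means 8 September 1998, 12:00:00. If you want to generate one for a different date, a small helper along these lines would do (the function names are made up for this example):

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDhhmmss


def wayback_timestamp(moment):
    """Format a datetime as a 14-digit timestamp string."""
    return moment.strftime(TIMESTAMP_FORMAT)


def parse_wayback_timestamp(value):
    """Parse a 14-digit timestamp string back into a datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


print(wayback_timestamp(datetime(1998, 9, 8, 12, 0, 0)))
# prints 19980908120000
```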

answered Feb 14, 2022 by despens (980 points)