
Setting a custom date for wget or wpull?

0 votes
334 views
When saving to WARC, both wget and wpull record the date and time of the capture in the WARC-Date field of every record, including the record for each HTTP request.

Is there a way to modify this date, either during or after recording?
asked Dec 17, 2015 by despens (930 points)

2 Answers

+1 vote
One approach that certainly works is setting the system clock on the host computer to the date you want the capture to appear to come from. We've done this, and it worked well.

However, we've since become uncomfortable with having done it. Such WARCs are essentially lying, in that they look exactly as if they were *really* taken in the past. If this is a common need, it would be better to produce 'honest' WARCs that record both the date they were actually taken and the date they are 'simulating', and to extend the playback tools to handle that.
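As a rough illustration of the 'honest' approach, the Python sketch below leaves the real WARC-Date line untouched and adds a second, separate field for the simulated date. The field name `WARC-Simulated-Date` is hypothetical. It is not part of the WARC standard, and playback tools would have to be extended to read it. Content-Length and WARC-Block-Digest describe only the record block, not the header, so adding a header field like this should not invalidate them.

```python
from datetime import datetime, timezone

# Hypothetical field name: NOT part of the WARC standard.
SIMULATED_DATE_FIELD = "WARC-Simulated-Date"


def w3c_iso8601(dt: datetime) -> str:
    """Format a timezone-aware datetime like a WARC-Date value (UTC, 'Z' suffix)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def add_simulated_date(header: str, simulated: datetime) -> str:
    """Return a copy of a WARC record header with an extra simulated-date field.

    The original WARC-Date line is kept as-is, so the record still states
    when it was really captured. The new field is placed right after it.
    """
    out = []
    for line in header.split("\r\n"):
        out.append(line)
        if line.lower().startswith("warc-date:"):
            out.append(f"{SIMULATED_DATE_FIELD}: {w3c_iso8601(simulated)}")
    # Joining with CRLF restores the original line endings, including the
    # blank line that terminates the header block.
    return "\r\n".join(out)
```

For example, `add_simulated_date(header, datetime(1999, 1, 1, tzinfo=timezone.utc))` keeps the capture date and adds `WARC-Simulated-Date: 1999-01-01T00:00:00Z` beneath it.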
answered Dec 17, 2015 by anjackson (2,950 points)
+1 vote

A useful analog may be our work on the SLAC web archive. There, we published historical web content on a web server, re-captured it into WARC with wget, and then adulterated the CDX indices to reflect the "true" timestamps, i.e. the dates of the backups from which we had restored the content. That way, the WARCs remained a faithful record of the actual HTTP communications, while the index carried timestamp-appropriate entries for the purposes of access.
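A minimal Python sketch of that kind of index adjustment might look like this. It is not the SLAC team's actual script, and it rests on some assumptions. The CDX files are assumed to use the common 11-field layout (` CDX N b a m s k r M S V g`), with the 14-digit timestamp as the second field and the original URL as the third. The "true" dates are assumed to come from a hypothetical mapping of original URL to timestamp. Only the index lines change; the WARC files are left untouched.

```python
import re

# Assumes the 11-field CDX layout " CDX N b a m s k r M S V g":
# field 0 = urlkey (N), field 1 = timestamp (b), field 2 = original URL (a).
TIMESTAMP_FIELD = 1
URL_FIELD = 2
TIMESTAMP_RE = re.compile(r"^\d{14}$")  # YYYYMMDDhhmmss


def rewrite_cdx_timestamps(lines, true_timestamps):
    """Replace capture timestamps in CDX lines with 'true' (backup) timestamps.

    true_timestamps is a hypothetical dict mapping original URL to a
    14-digit timestamp string. URLs not in the mapping keep their
    recorded capture timestamp.
    """
    header, body = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(" CDX"):
            header.append(line)  # keep the field-legend line as-is
            continue
        fields = line.split(" ")
        new_ts = true_timestamps.get(fields[URL_FIELD])
        if new_ts is not None:
            if not TIMESTAMP_RE.match(new_ts):
                raise ValueError(f"not a 14-digit timestamp: {new_ts!r}")
            fields[TIMESTAMP_FIELD] = new_ts
        body.append(" ".join(fields))
    # CDX lookups expect lines sorted by urlkey, then timestamp, so re-sort
    # after the timestamps have changed.
    body.sort()
    return header + body
```

The re-sort matters because replay tools typically binary-search a sorted CDX file. After the timestamps change, the lines for a URL that was captured several times may no longer be in timestamp order.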

answered Dec 17, 2015 by nullhandle (320 points)
...