
Community & Business Groups

Archiving Format

I was poking around for good ways to grab pages for preservation, since I don’t think “Save as Web Archive” in browsers is a good long-term solution. Is there anything preferable to Web Curator?

(Added 7 June 2012, 7:45pm) The reason I ask about Web Curator is that it appears to save content in the WARC format, which, according to the answers on this Stack Exchange post, is the preferred format for archiving (static) web content. Is grabbing stuff with wget sufficient? Should we go with WARC regardless of tool, or is it too much, not enough, or not right for us?
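The two options in the question need not be exclusive: newer builds of GNU Wget can write a WARC file alongside the normal download. A minimal sketch, assuming a wget build that supports the `--warc-file` option; the helper name `archive_page` and the `DRY_RUN` switch are made up for illustration and are not part of wget:

```shell
# Hypothetical helper (not part of wget): fetch one page plus the images,
# CSS and scripts it needs, and record the HTTP traffic in NAME.warc.gz
# next to the usual downloaded files.
# Assumes a wget build with WARC support (the --warc-file option).
# Set DRY_RUN=1 to print the command instead of running it.
archive_page() {
    url=$1
    name=$2
    # Build the command as an argument list so the URL stays one argument.
    set -- wget --page-requisites --warc-file="$name" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling `archive_page http://example.com/ example` would then leave both the ordinary files and `example.warc.gz` in the current directory, so the WARC question can be decided later without re-crawling.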

4 Responses to Archiving Format

  • For publicly accessible sites I have relied on HTTrack for years (or WinHTTrack in my case) to pull down completely stand-alone copies of sites with resource paths remapped. It has limitations, but it has performed well for me.


  • Shane Hudson

    I was planning to ask the same question. I was thinking of perhaps creating a text file full of links and having wget download them whenever a new one is added. But I’m not sure that is a very nice way to do it.
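The "text file full of links" idea can be sketched roughly as follows, so that only newly added links are fetched on each run. The helper name `new_links` and the second "done" file are assumptions for illustration; both files are assumed to hold one URL per line:

```shell
# Hypothetical helper: print the URLs that appear in the list file but not
# yet in the "done" file. Both file names are assumptions for this sketch.
new_links() {
    list=$1
    done_file=$2
    # Create an empty "done" file on first use so grep has something to read.
    [ -f "$done_file" ] || : > "$done_file"
    # -F fixed strings, -x whole-line match, -v invert: keep unseen URLs only.
    grep -Fxv -f "$done_file" "$list"
}
```

One possible use is `new_links links.txt done.txt | wget -x -i -`, since wget reads URLs from standard input when `-i -` is given; the fetched URLs would then need to be appended to the "done" file so they are skipped next time.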


  • Adrian: looks like a decent tool! But I’m on a Mac, so it’s not very handy for me.

    Shane: I had also considered wget (and it was suggested to me via email) but I don’t know if that’s a better solution or not. I don’t have a lot of experience with real content preservation. I should update the post to flesh out the question.


  • Shane Hudson

    Right, for now I have set up on my computer (I will move it to a server soon) a list of websites in backup_list.txt and a Backups folder where I run ‘wget -x -i ../backup_list.txt’.

    If anyone has a better solution then please let us know, but for now this is working well enough.

    EDIT: Oops, sorry Eric, I didn’t see your comment or edited post. We should definitely find something that works best for both sharing and searching. Also, perhaps store it on a server? It shouldn’t take up too much space or bandwidth, and a collaborative archive would be useful. I don’t have a dedicated server at the moment but may do soon (thinking about upgrading).

