Stuff that occurs to me: How are we cache-ing pages?

Think of this blog as a sort of nursery for my half-baked ideas hence 'stuff that occurs to me'.

Contact: @JoBrodie Email: jo DOT brodie AT gmail DOT com

Wednesday, 28 October 2009

How are we cache-ing pages?

When people say, in their delightfully sneaky blog posts, "But what's this? Here's a copy I cached earlier", what is it that they're actually doing?

I'm curious to know about the different methods used and if there's a 'best' way of doing this. I suppose it would have to be something that also doesn't permit tampering with the source code to falsify the webpage.

1. Rely on Google cache

Probably unreliable (websites can ask Google's crawlers not to keep a cached copy, for example with a 'noarchive' robots meta tag) but at least search terms are nicely highlighted. I don't think this is easily falsifiable.

2. File / Save As... / Web archive, single file (.mht)

The option presented to me by MSIE - I think I may have used this for "working offline" but not with any particular competence. Is this what people are doing? Does it save an entire website or just the page you're on? I don't think this is easily falsifiable either.
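On the question of what actually ends up inside an .mht file: as far as I understand it, the format is a MIME multipart document, much like a saved email, with the page and each of its images stored as separate parts. Here is a rough sketch, assuming that layout, which uses Python's standard email module to list the parts. The sample text and the function name `list_parts` are made up for illustration; a real file saved by a browser will be much longer.

```python
from email import message_from_string

# A tiny, hand-written stand-in for a saved .mht file (hypothetical content).
SAMPLE_MHT = """\
From: <Saved by a browser>
Subject: Example page
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"

------=_Part_0
Content-Type: text/html; charset="utf-8"
Content-Location: http://example.com/index.html

<html><body><img src="logo.png"></body></html>
------=_Part_0
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Location: http://example.com/logo.png

iVBORw0KGgo=
------=_Part_0--
"""


def list_parts(mht_text):
    """Return (content type, original address) for each part in the archive."""
    message = message_from_string(mht_text)
    return [
        (part.get_content_type(), part.get("Content-Location"))
        for part in message.walk()
        if not part.is_multipart()  # skip the outer wrapper itself
    ]


for content_type, location in list_parts(SAMPLE_MHT):
    print(content_type, location)
```

If the listing only ever shows the one page plus its images and stylesheets, that would suggest the file holds just the page you were on rather than the whole website.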

3. Save the html code and regurgitate as a page later on

View / Source opens a small Notepad file containing all the text needed to recreate the page. This can be saved as .htm and then opened for editing in Notepad, or viewed in any browser as a webpage. Images need to be saved separately. Very very falsifiable.
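One way to make a saved copy of the source a little harder to quibble with is to note a fingerprint (a hash) of the file at the time you save it. The sketch below is only an illustration: the function names `fetch_source`, `make_record` and `matches_record` are invented here, and a hash only shows that the file hasn't changed since the hash was noted. It doesn't prove the copy was genuine in the first place unless the hash was shared with someone else at the time.

```python
import hashlib
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_source(url):
    """Download the raw bytes of a page, roughly what View / Source shows."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def make_record(url, html_bytes, fetched_at=None):
    """Note where and when the copy was taken, plus a SHA-256 fingerprint."""
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "fetched_at": fetched_at.isoformat(),
        "sha256": hashlib.sha256(html_bytes).hexdigest(),
        "size_bytes": len(html_bytes),
    }


def matches_record(html_bytes, record):
    """True if the saved copy still has the fingerprint noted earlier."""
    return hashlib.sha256(html_bytes).hexdigest() == record["sha256"]
```

In use, you would call `fetch_source` on the page's address, save the bytes to a .htm file, and keep the result of `make_record` somewhere separate. Changing even one character of the saved file later makes `matches_record` return False.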

4. Wait for the Wayback Archive to do the work for you

Wayback archives a lot of pages, and they seem to appear about six months after the page was live. Changes might be harder to find, depending on how many 'impressions' the Archive takes of the page, unless you remember the date on which the information you want to record was available. Doesn't seem to be falsifiable.
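If you do remember the date, the Archive's addresses can be built by hand. This sketch assumes the usual pattern of web.archive.org/web/ followed by a timestamp and then the original address, which takes you to the capture nearest that moment, along with the Archive's 'available' lookup address. The function names are made up for illustration and nothing here actually contacts the Archive.

```python
from datetime import datetime
from urllib.parse import urlencode


def wayback_url(page_url, when):
    """Address of the capture closest to the given date and time."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{page_url}"


def availability_query(page_url, when):
    """Address that asks whether a capture exists near the given date."""
    params = urlencode({"url": page_url, "timestamp": when.strftime("%Y%m%d")})
    return f"https://archive.org/wayback/available?{params}"


print(wayback_url("http://example.com/", datetime(2009, 10, 28)))
```

Pasting the first address into a browser should land on whichever impression is nearest the date, assuming one exists at all.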

5. Take a screen shot

Press the button marked PrtSc (or something similar) and a copy of the entire visible screen is placed on the clipboard. Paste (Ctrl+V) this into Paint or other image editing software, select the relevant bit and save it as a .bmp (or .jpeg etc). Probably quite fiddly to falsify the picture of words in Paint but might be doable in other software.

6. Something clever on Firefox

I haven't used it for a while but I think there was a gadget which helped with cacheing pages.

This post is all about creating copies of web pages but for more on 'finding old web pages' go here http://www.searchengineshowdown.com/others/archive.shtml

2 comments:

Jo, Wed Oct 28, 04:39:00 pm 2009
@zeno001 suggested http://www.freezepage.com which says:-

"You already have a free account on this computer. Use it to:

>>Take copies of Web pages and keep them for your own records.
>>Easily and safely share Web Pages with friends or colleagues.
>>Prove exactly what was at a Web address at a specific date."
Zeno, Sat Oct 31, 08:42:00 pm 2009
There's an add-on for Firefox called Resurrect Pages that gives easy access to several cached versions of the current page as well as to archive.org.

Also, try ScreenGrab! for easy capture of pages as a graphic - far better than PrtScr!