
Saturday, 20 February 2010

Finding things that aren't there any more on the internet, and storing things that are

EDIT 21 July 2010: New and improved: now with added McKeith - see bits below in poo brown.
I started editing this but didn't finish it - I wanted to add in a couple of useful things sent to me by chums on Twitter which relate to the sort of clever tips used in the McKeith case (see below).

Here they are, until I put them in order:

Antisocial networking
http://www.nickfitz.co.uk/2010/07/14/antisocial-networking/
via @EvidenceMatters

@herring67 suggested using the Bing cache in addition to the Google one.


Page is still there
1. FreezePage - http://www.freezepage.com
"Free Web service for freezing Web pages. Save, share and prove what is on the Web at a specific point of time." Hat tip to @Zeno001

Page is not there
1. Google cache
You've searched for a website via Google, or typed its address in directly, and it's not there. If the word 'Cached' appears below the address on the Google results page, click on it and you'll get the 'last known' version that Google crawled and cached before the page went down. Google is not the only search engine with a cache that can do this. More on the cache from Google's own guide.

2. Google database
If your page of choice has disappeared and isn't cached or available in the Internet Archive (see point 4), you could try searching for remembered phrases, in quotation marks, to see if information from the page is stored elsewhere.

3. Who has linked to that page?
This is part of the advanced features of Google, but is pretty straightforward: simply type link: before the URL, e.g. link:www.targeturl.com
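
Points 2 and 3 both boil down to typing a particular query into a search box, so here's a minimal Python sketch of building those two queries as URLs. The search address and the function names are my own assumptions for illustration, not anything official, and the code doesn't check whether the search engine still honours the link: operator.

```python
from urllib.parse import urlencode

# Assumed search address, used only for illustration.
SEARCH_BASE = "https://www.google.com/search"


def phrase_query_url(phrase: str) -> str:
    """Build a search URL for an exact remembered phrase (point 2)."""
    # Wrapping the phrase in quotation marks asks for an exact match.
    return SEARCH_BASE + "?" + urlencode({"q": f'"{phrase}"'})


def link_query_url(target: str) -> str:
    """Build a search URL using the link: operator (point 3)."""
    # No space between "link:" and the address, as in the example above.
    return SEARCH_BASE + "?" + urlencode({"q": f"link:{target}"})


if __name__ == "__main__":
    print(phrase_query_url("remembered phrase"))
    print(link_query_url("www.targeturl.com"))
```

Paste either printed address into a browser to run the search.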

3b. Dead URL - finding the missing link
This assumes you still have a copy of the URL. If not, you could try http://deadurl.com to piece together what it should be.

If you find that someone's linked to it you could ask them if they stored a copy.

Added 14 Nov 2018: Wikipedia has a useful section on Link Rot, see note 1 in particular. There are some alternatives to try and uncover a missing link.

4. Wayback Machine from the Internet Archive - http://www.archive.org/web/web.php
I think this displays websites six months after storing them so, for example, a page stored on 1 Jan will be visible on 1 July but not before (I might be wrong about this).

Type your URL in and off you go. You'll be given a list of years, months and days on which the archive stored copies - basically have a browse around.
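
If you'd rather build the Wayback Machine address yourself than type into the form, here's a rough Python sketch. It assumes the address pattern web.archive.org/web/<date>/<url> for a capture near a given date, with an asterisk in place of the date for the full list of captures. The function name is made up for this example.

```python
from datetime import date
from typing import Optional

# Assumed base address for Wayback Machine captures.
WAYBACK_BASE = "https://web.archive.org/web"


def wayback_url(page_url: str, on: Optional[date] = None) -> str:
    """Return an address for browsing archived copies of page_url.

    With no date, use '*' to ask for the list of all captures.
    With a date, use a YYYYMMDD stamp to ask for a capture near that day.
    """
    if on is None:
        return f"{WAYBACK_BASE}/*/{page_url}"
    return f"{WAYBACK_BASE}/{on:%Y%m%d}/{page_url}"


if __name__ == "__main__":
    print(wayback_url("http://example.com/"))
    print(wayback_url("http://example.com/", date(2010, 2, 20)))
```

The code only builds the address; whether a copy exists for that day is something you find out by visiting it.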

Edit: 10 April 2014 - @zeno001 has just told me about this one http://archive-org.com/

4a. Search Engine Showdown
Another list of cached options can be found here http://www.searchengineshowdown.com/others/archive.shtml

5. Ask for help :)

Updated
Sneakery: @gillianmckeith - a case study
The last few days on Twitter (June 2010) didn't go particularly well for Gillian McKeith. In a highly diverting chain of events her official account tweeted some unwise tweets which drew the attention of Twitter. I imagine it as a sort of ring / Sauron type of thing. Google has plenty of blogged accounts, including Jack of Kent's.


Deleting an unwise tweet seems like a sensible move, and it was the one her official Twitter account took. Unfortunately deleted tweets are not fully deleted for some time: they might not show up in your tweetstream but they are cached on Google (it's doing real-time searching now, so once you press 'tweet' it's out there). People favourite tweets (if I favourite a tweet containing a link it's sent to Delicious via Packrati.us), people retweet tweets, and tweets are picked up by other aggregators (Topsy, TweetMeme) and so on.

Finding deleted tweets:
The least techy method is to search for the tweet on Google and look at the cache. Searching for gillianmckeith twitter soon after the great deletion of 2010 would have brought up many of her tweets, and clicking on the cached link underneath any of them would show the tweet. Now that a few days have passed since the excitement, fewer of the tweets are still accessible.

But one is still viewable at the time I'm writing this - FreezePage (see point 1 above) is very useful for capturing a page while it's still there.

Whenever a tweet says something like
"10:22 AM Jul 8th via web in reply to twittername"
be aware that there are two links there ('10:22 AM Jul 8th' and 'in reply to twittername')

Clicking on the time link in this tweet would take you to a page with just that tweet on it; clicking on the 'in reply to' link will take you to a page containing just the original tweet.
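
The time link is the tweet's own address, which is the thing worth saving or searching the caches for. Here's a small Python sketch that pulls the account name and tweet number out of such an address. It assumes the address looks like twitter.com/<name>/status/<number>; the function name and the number in the example are invented for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a single-tweet address: twitter.com/<name>/status/<number>
_PERMALINK = re.compile(
    r"^https?://(?:www\.)?twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)/?$"
)


def parse_tweet_permalink(url: str) -> Optional[Tuple[str, int]]:
    """Return (account name, tweet number), or None if url is not a single-tweet address."""
    match = _PERMALINK.match(url.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical tweet number, for illustration only.
    print(parse_tweet_permalink("http://twitter.com/twittername/status/12345"))
    # A profile page is not a single-tweet address, so this prints None.
    print(parse_tweet_permalink("http://twitter.com/twittername"))
```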

Finding deleted links on pages:
Things got rather interesting when the @gillianmckeith Twitter account indicated that it wasn't the real account after all. Most people following had wondered about that the previous day, if not before, and it was already fairly well confirmed as genuine. There were many clues: the official McKeith pages carried several blue icons directing her visitors to "follow me on Twitter". Hovering over the icon flashes up http://twitter.com/gillianmckeith in the status bar. An example of her newsletter with the Twitter icon present and linking to that account can be seen here: http://www.freezepage.com/1279125352BWNIMGKGOL
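
The hover-over check can also be done on a saved copy of a page. This is a minimal Python sketch using the standard library's HTML parser to list every link that mentions a given address. The class name, function name and sample HTML are all made up for the example and are not taken from the real McKeith pages.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to(html: str, needle: str) -> List[str]:
    """Return the links in html whose address contains needle (case-insensitive)."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.links if needle.lower() in href.lower()]


# Invented sample markup standing in for a saved page.
SAMPLE = (
    '<p><a href="http://twitter.com/gillianmckeith">'
    '<img src="icon.png" alt="follow me on Twitter"></a> '
    '<a href="/about">About</a></p>'
)

if __name__ == "__main__":
    print(links_to(SAMPLE, "twitter.com"))
```

Saving the page first (FreezePage, or just 'Save as' in the browser) means the check still works after the live page has been changed.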

