Google crawls and indexes web pages periodically and saves a cached copy of each page as it appears at the time of the crawl. If a page changes between two crawls, Google discards its previously cached version and replaces it with the new one. At that point it's much harder to recover the older material, but you might find it on the Wayback Machine (although new captures usually take about six months to appear there).
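If you'd rather check the Wayback Machine from a script than through its website, the Internet Archive has a simple availability endpoint (archive.org/wayback/available) that returns JSON describing the archived snapshot closest to a given date. The Python sketch below only builds the query and reads the response. The function names are my own (hypothetical), and it assumes the documented response layout `archived_snapshots` → `closest` → `url`.

```python
import json
from urllib.parse import urlencode

# Internet Archive's Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the availability-API URL for page_url.

    timestamp is optional, in YYYYMMDD[hhmmss] form; the API then looks for
    the snapshot closest to that moment.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from an API response, or None if there isn't one."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

To use it for real, fetch the query with `urllib.request.urlopen(availability_query(url)).read().decode()` and pass the text to `closest_snapshot`; a `None` result just means nothing has been archived yet.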
This is a 'worked example' based on something I've just tried to do, so it might not work in every situation. It really helps if you have a Gmail / Google account and use Google Drive (formerly known as Google Docs).
1. My source page was a job advert and job description for a post that closed on Friday 19 July 2012. The page now returns a '404 not found' error, as neither the page nor the job is available any more.
The URL of the page was
http://www.britishscienceassociation.org/
which gives enough information to find the cached version two days later.
A quick way of doing this is just to paste that URL into Google. In this case you get (predictably) a single result.
2. Hover your mouse over the right-hand side of the search result until the >> arrows appear (highlighted with a vertical yellow oval), then click on them and a panel appears on the right-hand side. Click on Cached (highlighted with a horizontal yellow oval). You can then get at the text you need.
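The Cached link itself was just a URL. As far as I know it took the form webcache.googleusercontent.com/search?q=cache: followed by the page address (typing `cache:` plus the URL into Google did much the same). The helper below is a hypothetical sketch that assumes that format; Google has since retired its cache, so treat it as a record of how the links looked rather than something guaranteed to work today.

```python
from urllib.parse import quote

# Assumed (historical) prefix of Google's cache links.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(page_url):
    """Build a Google cache link for page_url, using the assumed format above."""
    # Percent-encode characters such as ?, = and & that would otherwise be read
    # as part of the cache link's own query string; keep ':' and '/' readable.
    return CACHE_PREFIX + quote(page_url, safe=":/")
```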
3. Retrieving the attached file might work here too; it depends on whether the file has been removed as well as the page. Often the page itself isn't removed, only the link pointing to it, though of course the page can be deleted too. Neither situation necessarily means that the file originally uploaded with the page has been deleted as well: they're separate things.
However, in this case clicking on the link to the file didn't work, so my next plan was to look for a cached copy of the PDF by searching for its URL, which was
http://www.britishscienceassociation.org/NR/rdonlyres/28AC5D8C-C584-4E23-959A-62D1C243B236/0/JobDescriptionCRESTRoleJune2012final.pdf <-- this live link no longer works.
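Because the page and the file are separate things, it can be worth checking each address on its own before hunting for caches. This sketch sends an HTTP HEAD request with Python's standard `urllib` and reports the status code (200 if something is still there, 404 for the 'not found' case above). The function names are hypothetical, and some servers refuse HEAD requests, in which case a normal GET would be needed instead.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, using a HEAD request.

    A 404 only means the server no longer has anything at that address;
    it says nothing about whether a search engine or archive kept a copy.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the code is still the answer.
        return err.code


def check_page_and_file(page_url, file_url):
    """Check the page and its attached file separately, since either may survive."""
    return {"page": http_status(page_url), "file": http_status(file_url)}
```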
4. Searching for the file's URL in Google brings up the following result. Click on 'Quick View' and the cached version of the file opens in Google Docs format. I've never tried this without being logged into Google, so I'm not sure how well it works if you don't use Google Drive / Docs.
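Pasting a URL into Google can also be turned into a plain search link, which is handy if you have several dead links to try. This hypothetical helper just puts the address into the `q` parameter of a google.com/search URL, percent-encoding it on the way.

```python
from urllib.parse import urlencode


def google_search_url(target_url):
    """Build a Google search link whose query is the target URL itself."""
    # urlencode percent-encodes ':' and '/' so the address survives as one q= value.
    return "https://www.google.com/search?" + urlencode({"q": target_url})
```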
5. To get a PDF copy, first click 'Save in Google Docs', then click on the 'documents list' notification that appears. Click on the file's name to open it as a Google Doc, then click on File >> Email as an attachment. That's not the only way of doing it, though (on 21 Aug I managed it a slightly different way).
Although there's a File menu when you first save the doc, the 'email as attachment' option isn't in it (I checked), so you have to do this extra step first.
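Once the attachment arrives, a quick sanity check is to confirm it really is a PDF: PDF files normally begin with the bytes `%PDF-`. The helper below (hypothetical name) reads the first five bytes of a saved file and compares them. It doesn't prove the document is complete, only that it isn't, say, a saved web page with a .pdf name.

```python
def looks_like_pdf(path):
    """Return True if the file at path starts with the PDF signature b'%PDF-'."""
    with open(path, "rb") as fh:
        # Files shorter than five bytes simply fail the comparison.
        return fh.read(5) == b"%PDF-"
```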
Hopefully you'll have a copy of your searched-for file, but there are no guarantees alas!