Stuff that occurs to me

All of my 'how to' posts are tagged here. The most popular posts are about blocking and private accounts on Twitter, along with the science communication jobs list. None of the science or medical information I might post to this blog should be taken as medical advice (I'm not medically trained).

Think of this blog as a sort of nursery for my half-baked ideas, hence 'stuff that occurs to me'.

Contact: @JoBrodie Email: jo DOT brodie AT gmx DOT com


Saturday, 21 July 2012

How to find deleted files using Google Cache

Note: there is only a short window of opportunity for accessing a recently deleted page or file, and it closes once Google has reindexed* the page. You probably won't be able to do this if a week or more has elapsed.

*Google indexes / crawls web pages periodically and saves a cached version of each page as it appears when the crawl happens. If a page changes between two crawls then Google clears its previously cached version and replaces it with the new one. At that point it's much harder to recover the older material, but you might find it on the Wayback Machine (although stuff usually takes about six months to appear).
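For the Wayback Machine fallback, here is a rough sketch in Python of asking the Internet Archive whether it holds a snapshot of a page. It assumes the Archive's public availability endpoint at archive.org/wayback/available and the JSON shape it normally returns; the function names (availability_url, closest_snapshot, lookup) are my own, made up for illustration, and none of this is part of the method described in this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's availability lookup.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived snapshot URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Query the Archive over the network (needs an internet connection)."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup() with the address of the deleted page returns either a link to the nearest archived copy or None if the Archive hasn't captured it.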

This is a 'worked example' from something I've just tried to do - it might not work in every situation. It really helps if you have a Gmail / Google account and use Google Drive (formerly known as Google Docs).

1. My source page was a job advert and job description that closed on Friday 19 July 2012 and the page now returns a '404 not found', as both page and job are no longer available.

The URL of the page was
http://www.britishscienceassociation.org/web/AboutUs/_CRESTExpansionCoordinator2012.htm
which gives enough information to find the cached version two days later. 

A quick way of doing this is just to paste that URL into Google. In this case you get (predictably) a single result.




2. You need to hover your mouse to the right hand side of the search result until the >> arrows show up (highlighted with a vertical yellow oval), click on those and the panel appears on the right hand side. Click on Cached (highlighted with horizontal yellow oval). You can then access the text of interest if needed.

3. Capturing the file might work here - it depends on whether or not the file has been removed as well as the page. Often the page itself isn't removed and only the link pointing to it is, but the page can be deleted of course. Neither situation necessarily means that the file that was originally uploaded with the page has been deleted too though, because the page and the file are separate things.

However in this case clicking on the link for the file didn't work, so my next plan was to look for the cache of the PDF by searching for the URL of the PDF, which was 
http://www.britishscienceassociation.org/NR/rdonlyres/28AC5D8C-C584-4E23-959A-62D1C243B236/0/JobDescriptionCRESTRoleJune2012final.pdf <-- this live link no longer works.

4. Searching for the file's URL in Google brings up the following - click on 'Quick View' and this will bring up the cached version of the file in a Google Docs format. I've never tried doing this without being logged into Google, so I'm not sure how well it will work if you don't use Google Drive / Docs.



You can keep a copy of the file in your Google Drive / Docs folder using the 'Save in Google Docs' link, and you can share it with others using the obvious link. The best bit is that you can send yourself (or anyone else) a PDF copy.

5. To acquire a PDF copy, first 'Save in Google Docs', then click on the 'documents list' notification that appears. Click on the name of the file to open it as a Google Doc and then click on File >> Email as an attachment. That's not the only way of doing it though (as of 21 Aug I managed this in a slightly different way).

Although there's a File menu option when you first save the doc, the 'email as attachment' option isn't there (I checked), so you have to do this extra step first.

Hopefully you'll have a copy of your searched-for file, but there are no guarantees alas!
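Steps 1 and 4 both begin by pasting an exact URL into Google. If you have several addresses to try, a small Python sketch like this one builds the search links for you. The helper name google_search_url is made up for illustration, and it only produces the search link: you'd still need to look for the 'Cached' or 'Quick View' option on the results page yourself, assuming Google still offers them.

```python
from urllib.parse import quote_plus


def google_search_url(target_url):
    """Return a Google search link whose query is the given URL."""
    return "https://www.google.com/search?q=" + quote_plus(target_url)


# The two addresses from the worked example above.
PAGE_URL = (
    "http://www.britishscienceassociation.org/web/AboutUs/"
    "_CRESTExpansionCoordinator2012.htm"
)
FILE_URL = (
    "http://www.britishscienceassociation.org/NR/rdonlyres/"
    "28AC5D8C-C584-4E23-959A-62D1C243B236/0/"
    "JobDescriptionCRESTRoleJune2012final.pdf"
)

for url in (PAGE_URL, FILE_URL):
    print(google_search_url(url))
```

Each printed line is a link you can open in a browser to run the same search as in the steps above.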

