
Tuesday, 12 February 2013

How to back up or download your blog if it is hosted on a free site (e.g. Blogger)

Short version: get hold of a copy of the XML file for your blog. It contains all of your blog's information (posts, site hierarchy, comments) in one convenient file that can be used to relaunch your blog in case of disaster.



This post is for people like me, who have their blog hosted on sites like Blogger, Wordpress.com or Posterous. I've no idea about setting up, or backing up, a self-hosted blog (where you interact with a server). If someone who knows more about that fancies contributing a few lines for me to add here... that would be nice.

Things you can do now while your blog is still up and running
Remember, backing up your blog isn't a one-time thing but something you might want to do every few days or weeks (depending on how frequently you blog). Probably there's some way to automate this process but if there is, I don't know it.
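One partial way to take the tedium out of this: the export itself still has to be downloaded by hand, but a small script can file each downloaded XML file away under a dated name and keep only the most recent few copies. The Python sketch below assumes you tell it where the downloaded file is. The function name, folder layout and `keep` count are all hypothetical choices, not anything Blogger or Wordpress provide.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_export(export_file, backup_root, blog_name, keep=10, today=None):
    """Copy a downloaded export into backup_root/blog_name/YYYY-MM-DD.xml.

    Keeps only the newest `keep` dated copies (keep must be >= 1) and
    returns the path of the copy just made. Names here are hypothetical.
    """
    folder = Path(backup_root) / blog_name  # one folder per blog
    folder.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).isoformat()  # e.g. "2013-02-12"
    target = folder / f"{stamp}.xml"
    shutil.copy2(export_file, target)  # copy2 also keeps the file's timestamps

    # ISO dates sort correctly as plain text, so the oldest copies come first.
    copies = sorted(folder.glob("*.xml"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

For example, after each manual export you might run `archive_export("downloaded-export.xml", "backups", "myblog")`, where both names are placeholders for your own file and blog.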

1. Backing up your blog

Blogger (Google)
To download your blog, log in, click the 'Design' link at the top of your blog, then go to Settings | Other, click "Export Blog" and choose the Download option. This saves an .xml file, which you can keep safe until you need to import it into another blog host.
From http://support.google.com/blogger/bin/answer.py?hl=en&answer=97416
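Before relying on the downloaded file it's worth a quick check that your posts are actually in it. A Blogger export is an Atom feed in which each entry carries a 'kind' category (post, comment, template, settings and so on). The sketch below counts entries by kind using only Python's standard library. The kind addresses it looks for are the ones Blogger exports have used, so treat them as an assumption and compare them against your own file.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"


def count_blogger_entries(xml_path):
    """Count entries in a Blogger export by kind (post, comment, ...).

    Assumes kind terms look like
    'http://schemas.google.com/blogger/2008/kind#post'.
    """
    root = ET.parse(xml_path).getroot()
    counts = {}
    for entry in root.iter(f"{ATOM}entry"):
        for cat in entry.findall(f"{ATOM}category"):
            if cat.get("scheme") == KIND_SCHEME:
                # Keep only the part after '#', e.g. 'post' or 'comment'.
                kind = cat.get("term", "").rsplit("#", 1)[-1]
                counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Running `count_blogger_entries("blog-export.xml")` (a placeholder file name) should give a post count close to the number of posts you think you've written.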

Posterous
http://posterous.com/#backup - then click on the Request Backup button for your blog(s).

WordPress
Go to your blog's admin area (add /wp-admin/ to the end of your blog's address).
In the menu on the left click Tools | Export and choose the free option, which gives you an xml file.
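The same kind of sanity check works for a Wordpress export. That file is RSS with extra Wordpress-specific tags: each post or page is an `<item>` with a `<wp:post_type>`, and comments sit inside the item as `<wp:comment>`. Because the namespace address changes between export versions (it ends in a version number such as 1.2), this sketch matches on tag names only. It is an illustration under those assumptions, not an official tool.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_wxr_items(xml_path):
    """Count items in a Wordpress export by post_type, plus comments."""
    root = ET.parse(xml_path).getroot()
    counts = {}
    for item in root.iter("item"):  # plain RSS <item>, no namespace
        for child in item:
            name = local(child.tag)
            if name == "post_type":
                key = (child.text or "").strip()  # e.g. 'post', 'page'
            elif name == "comment":
                key = "comment"
            else:
                continue
            counts[key] = counts.get(key, 0) + 1
    return counts
```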
 
Tumblr
I am less familiar with this one so have taken advice from Google, but it seems that the following options may help:
• HTTrack (to be honest this will back up any website; I use it for other purposes)
• See also all Tumblr posts tagged with Tumblr backup

2. Mirroring your blog while it's still up and running

I have a copy of this very blog (hosted at Blogger.com) over at Wordpress.com, though it's private because it's confusing to have two copies of the same information. I also keep the xml files of blogs I'd be annoyed to lose.

As far as I can tell, you can mirror your blog in two ways:
(1) Authorise the two blog hosts to talk to each other and let the software do the importing for you.
(2) Import a blog using your xml file (this ought to work whether your original blog is up and running or suspended).

Importing into Wordpress from another site
• Create an account and a new blog to house your content.
• Add /wp-admin/import.php to the end of your new blog's address to go to the import page (or use Tools | Import). You can choose to import from several different blogging platforms, including Blogger, Posterous and Tumblr.
• You'll then be asked to authorise Wordpress to interact with your original blog and you can then import the posts.

Importing into Blogger from another site (not sure if this will work)
Blogger's import option says 'import from a Blogger xml' rather than just any old xml. XML is only a general-purpose format and each platform lays out its export file differently, so a file exported from somewhere else may not be accepted as it is; 'more research needed'. Apparently there is a Wordpress to Blogger conversion tool if your 'another site' is Wordpress.

• Create an account and set up a new blog at http://www.blogger.com/home.
• If doing so takes you to a different page, go back to the link above (home)
• To the right of your new blog are an orange pencil symbol (create new post), a couple of other icons and a 'view blog' icon. Click the small down arrow between them to bring up the options, choose Settings | Other, and then choose Import blog from the options along the top.
• Upload your xml file and hope for the best.
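If you're not sure which kind of export file you're holding, its root element gives a rough clue: Blogger's export is an Atom feed with a `<feed>` root, while a Wordpress export is an `<rss>` file that declares a `wordpress.org/export/` namespace. The heuristic below is a sketch that only knows about those two formats; anything else (Tumblr or Posterous files, say) comes back as unknown.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def detect_export_format(xml_path):
    """Rough guess at an export file's format from its root element.

    Returns 'atom' (the format Blogger exports use), 'wordpress-wxr',
    or 'unknown'. Stops reading once the root element has been seen.
    """
    namespaces = []
    root_tag = None
    with open(xml_path, "rb") as f:
        for event, item in ET.iterparse(f, events=("start-ns", "start")):
            if event == "start-ns":
                namespaces.append(item[1])  # item is a (prefix, uri) pair
            else:
                root_tag = item.tag  # the first element seen is the root
                break
    if root_tag == ATOM_FEED:
        return "atom"
    if root_tag == "rss" and any("wordpress.org/export/" in ns for ns in namespaces):
        return "wordpress-wxr"
    return "unknown"
```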

See also Blogger's own advice on importing and exporting blogs (also mentioned above).

Recovering your posts when it's a bit late for the above
1. Put your blog's URL into Google and capture what you can from Google's cache. Alternatively, search for the address with the http:// bit replaced by cache: (for example cache:yourblog.blogspot.com) to go straight to Google's cached copy.

The result will be pages of search results, so capturing them all involves quite a bit of labour; you can use 'File / Save page as...' to save each one to your hard drive.
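When there are dozens of posts, a little bookkeeping helps. The sketch below does two small jobs: it turns a post address into the cache: search described above, and it turns a page address into a safe filename to save that page under. Fetching the pages is left to you (your browser's 'Save page as...' still works), and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def cache_query(url):
    """Replace the http:// or https:// prefix with cache:."""
    return "cache:" + re.sub(r"^https?://", "", url)


def filename_for(url):
    """Turn a page address into a filename that is safe on most systems."""
    parts = urlsplit(url)
    name = (parts.netloc + parts.path).strip("/") or "index"
    name = re.sub(r"\.html?$", "", name)  # avoid ending up with '.html.html'
    # Replace anything other than letters, digits, dot, dash or underscore.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name) + ".html"
```

For a placeholder address like http://example.blogspot.com/2013/02/my-post.html this gives the search cache:example.blogspot.com/2013/02/my-post.html and the filename example.blogspot.com_2013_02_my-post.html.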

Edit: Alan Henness has suggested the following caches, which can be reached using a Web Cache extension for the Chrome browser. I confess I'd never heard of some of these :)

Google's cache
Yahoo's cache
Bing's cache
CoralCDN
WebCite
GigaBlast
Wayback Machine

2. Once on a cached page, make a note of any table of contents or monthly archives. Google will return your blog's posts in no particular order, making it difficult to know whether you've got everything, and it's much easier when you know what you're searching for. You can also get hold of a cached copy of your month-by-month archives: each one brings up a page listing the titles and links of your posts for that month. The links themselves won't work if your blog is down but, again, put cache: in front of them to see if you can grab a cached copy from Google.
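To work through the archives systematically, you can list one cache: search per month your blog ran, then pull the post links out of each saved archive page. The sketch below assumes monthly archives live at addresses ending in /YYYY/MM/; some Blogger templates link to addresses like 2013_02_01_archive.html instead, so check your own blog's archive links and change `pattern` to match. The blog address, function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser


def month_archive_queries(blog, start, end, pattern="{year}/{month:02d}/"):
    """List cache: searches for every monthly archive from start to end.

    start and end are (year, month) tuples, inclusive. `pattern` is an
    assumption about how the archive addresses look.
    """
    host = re.sub(r"^https?://", "", blog).rstrip("/")
    year, month = start
    queries = []
    while (year, month) <= end:
        queries.append(f"cache:{host}/" + pattern.format(year=year, month=month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return queries


class LinkCollector(HTMLParser):
    """Collect links ending in .html that point into the given blog."""

    def __init__(self, host):
        super().__init__()
        self.host = host
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and self.host in href and href.endswith(".html"):
            self.links.append(href)
```

Feed a saved archive page into `LinkCollector("example.blogspot.com")` with `.feed(html_text)` and its `.links` list gives you the post addresses to try with cache: in turn.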

3. Use the Internet Archive's Wayback Machine to find older posts if Google cache doesn't have them. Note also that Google Cache might not have very recently published posts.
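The Internet Archive also has a simple 'availability' lookup at https://archive.org/wayback/available?url=..., which replies with a short JSON description of the closest archived copy of an address. The sketch below builds that lookup address and picks the archived copy's link out of the reply, with the network call kept in its own small function. The function names are hypothetical, and the reply format assumed here (archived_snapshots, closest, available, url) is worth checking against a live reply before you rely on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a lookup address; timestamp is optional, YYYYMMDD[hhmmss]."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the copy closest to this date
    return API + "?" + urlencode(params)


def closest_snapshot(reply_text):
    """Return the archived copy's address from the JSON reply, or None."""
    closest = json.loads(reply_text).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Fetch the reply over the network and return the archived address."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```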

Background
This post arose after a few posts, and entire blogs I enjoy reading, were apparently taken down, including the Retraction Watch website (see also Ars Technica on the story), the 21st Floor and Josephine Jones' blog. The latter turned out to be a glitch, and I've done the same thing myself: when importing my blog to Wordpress I triggered their spam warning and had to ask them to restore it, which they did within a couple of hours. It does look, though, as if someone is keen to see Josephine's blog taken down [copy].

In the case of Retraction Watch (RW), a seemingly mistaken DMCA (Digital Millennium Copyright Act) 'takedown' request was made to the organisation hosting RW's blog (Wordpress), stating that RW had posted material that belonged to someone else. Nonsense as this may be, DMCA takedowns seem to work on a 'shoot first, ask questions later' basis. Similarly, blogs can be terminated for violating the host's terms of service.

Sometimes blogs get it wrong and post material that isn't theirs to post, in which case fair enough; other times a takedown is just a blunt tool for removing perfectly valid but perhaps inconvenient information.

Chilling Effects is a site where people can upload DMCA notices; the site collects and comments on them (not all are unreasonable). It also has an online DMCA counter-claim form if you believe your material has been removed in error.

Be careful about putting in a claim or counter-claim - you may be committing a crime if you're wrong.

If you spot any mistakes or omissions in methods listed above please let me know, thanks.

2 comments:

  1. Maybe useful to know.
    In Blogger, from Safari 5.1.7, Mac OS 10.6.8
    The Export blog buttons correctly invoked a window with a Download Blog button. But THIS Download blog button evoked instead a window to add RSS feeds. I've reported this to Google. I expect it might be taken care of within the next few years.
    In the meantime, I tried the export operation using Firefox instead of Safari and was able to download the .xml file to my Desktop.

    1. Thanks Hélène - admittedly I've only ever tried it in Firefox and it all worked perfectly but good to be aware that if there's a problem it might be worth trying another browser.


Comment policy: I enthusiastically welcome corrections and I entertain polite disagreement ;) Because of the nature of this blog (I write about spam practices, misleading marketing and unevidenced quackery) it attracts a LOT of spam comments, 5 a day at the moment, so I'm more likely to post a pasted version of your comment with any hyperlinks removed.

Comments written in ALL CAPS LOCK will be deleted and I won't publish any pro-homeopathy comments, that ship has sailed I'm afraid (it's nonsense).