by Brian Tomasik
First written: 2013 and 29 Feb. 2016; last update: 15 Oct. 2017
This page explains how I back up my most important data on a regular basis, including data from Google, Facebook, and my websites.
- 1 Summary
- 2 Introduction
- 3 Website backups
- 4 Backing up all my data
- 5 How to collect data for backup
- 6 (Optional) Geomagnetic storms and EMPs
- 7 My backup schedule
- 8 Footnotes
An important fraction of my extended mind is composed of information on Google, Facebook, my websites, etc. I expect the risk of losing such data is fairly low, but it might happen if someone hacked the accounts and deleted the data, or if I lost all ability to log in to the accounts, or if I accidentally deleted the accounts, or if the sites had major data-storage failures. Since it's cheap to back up the data, it seems reasonable to do so periodically.
My websites are the most dense form of important data that I don't want to lose. One copy of them lives directly on the servers. Most pages also have one or more backups on Internet Archive. Many iterations of particular pages can be found in my email, since I send myself pages after editing them as a crude form of version control. However, it's helpful to have additional backups besides these.
For additional redundancy, I formerly used Dropmysite, which made monthly backups of all the websites I maintained. It also let me download zip copies of each site's WordPress database file (which contains essay texts) and WordPress "uploads" folder (which contains images and PDFs), which I then uploaded to the cloud. Note that a WordPress database .sql file contains potentially sensitive information, so if you do this, you should encrypt the file before uploading it.
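As a minimal sketch of the encryption step, the snippet below builds a GnuPG command for password-based (symmetric) encryption of a file such as a database dump. GnuPG is an assumption here, since this page doesn't name a particular encryption tool, and the helper names and the example file name are hypothetical.

```python
import subprocess
from pathlib import Path


def gpg_encrypt_command(path):
    """Build a gpg command that writes an encrypted copy next to the original.

    Assumes GnuPG is installed and on the PATH; gpg prompts for a passphrase.
    """
    src = Path(path)
    out = src.with_name(src.name + ".gpg")
    return [
        "gpg", "--symmetric", "--cipher-algo", "AES256",
        "--output", str(out), str(src),
    ]


def encrypt_file(path):
    """Run the command above and stop with an error if gpg fails."""
    subprocess.run(gpg_encrypt_command(path), check=True)


# Hypothetical usage: encrypt_file("wordpress-database.sql")
```

The unencrypted original stays on disk after this runs, so it still needs to be deleted or kept somewhere private.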
As of 2017, I've stopped using Dropmysite, both to save money and because I now do website backups by hand using website-downloading tools: HTTrack on Windows or SiteSucker on Mac. I do these downloads every few months and then upload them to the cloud. Because these copies of the websites are downloaded from the public web, they don't contain any sensitive information and so can be stored unencrypted (and even unzipped).
I occasionally send copies of my website backups to a few friends for them to back up so that even if all of my accounts went away simultaneously, someone else would still have the data. Since these sites are downloaded from the public web, there's no need to take precautions to keep the data private.
Finally, every few years, I plan to create paper copies of my websites, as discussed here. I completed my first paper backup of my websites in 2017.
Backing up all my data
Every ~6 months, I do a full backup of all my important data, including emails, Facebook conversations, calendars, etc. Some files contain sensitive information, like email conversations that the interlocutors may not want to become public. Therefore, I encrypt the sensitive files from these downloads.
I store my data both in the cloud and on discs at home for redundancy. This leaves at least one extant copy of the data in the event of a variety of possible disasters: house fire, theft, losing/breaking physical discs, cloud data loss, account hacking, and so on.
How to collect data for backup
This section describes how to pull your data from various places.
Google Takeout allows you to download roughly all of your Google content to a collection of zip files.
The Gmail .mbox file may be huge (mine is ~6 GB), which can make it too large to copy onto a flash drive as a single file (for example, FAT32-formatted drives limit each file to 4 GB). To fix this problem, I use this program to split the .mbox file into smaller pieces.
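For readers without such a program, here is a rough sketch of the splitting idea. It is not the program mentioned above. It assumes the common mbox convention that each message begins with a line starting with "From " that follows a blank line, and the function name and part-file naming scheme are made up for illustration.

```python
from pathlib import Path


def split_mbox(src, dest_dir, max_bytes):
    """Copy src into numbered part files, starting a new part only at a
    message boundary once the current part has reached max_bytes."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    written = 0
    prev_blank = True  # the start of the file counts as a boundary
    with open(src, "rb") as f:
        for line in f:  # streams line by line, so a multi-GB file is fine
            starts_message = prev_blank and line.startswith(b"From ")
            if out is None or (starts_message and written >= max_bytes):
                if out is not None:
                    out.close()
                part = dest_dir / f"{Path(src).stem}.part{len(parts) + 1:03d}.mbox"
                parts.append(part)
                out = open(part, "wb")
                written = 0
            out.write(line)
            written += len(line)
            prev_blank = line in (b"\n", b"\r\n")
    if out is not None:
        out.close()
    return parts


# Hypothetical usage: split_mbox("All mail.mbox", "mbox-parts", 2 * 1024**3)
```

Because a part is only closed at the next message boundary, each piece can run slightly over max_bytes, so it's worth choosing a limit comfortably below the drive's per-file maximum.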
If you have multiple Google accounts (e.g., one for an organization that you're involved with), remember to do Takeout for all of them if you have permission to do so.
Google Takeout exports both the contents of a Google Doc (as a .docx file) and its comment threads (as an .html file). The export doesn't seem to include version history for the Google Doc, even though the online Google Doc does have version history.[a]
YouTube backups are included in Google Takeout, but if you want to back videos up individually, you can download your own videos.
Google Calendar is also included in Google Takeout, but if you want to download it individually, you can go to "Settings" -> "Calendars" -> "Export calendars".
Facebook also has a takeout feature. Click the down arrow in the upper right of any Facebook page -> "Settings" -> "General" -> "Download a copy of your Facebook data."
One benefit of having the download is that it contains all your Facebook messages in a single text file (mine is ~25 MB uncompressed as of Oct. 2014), which allows you to search for specific old content more easily than using Facebook's interface, especially if you don't remember what conversation thread it was in.
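A simple way to do that kind of search is sketched below. It assumes only that the messages file is plain UTF-8 text; the function name and the example file name are hypothetical, and the actual layout of Facebook's export may differ.

```python
def search_text_file(path, phrase):
    """Return (line_number, line) pairs for lines containing phrase,
    ignoring upper/lower case."""
    needle = phrase.lower()
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if needle in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical usage:
# for number, line in search_text_file("messages.txt", "backup"):
#     print(number, line)
```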
Unfortunately, Facebook group discussions aren't backed up in this process. There are tools for backing up Facebook groups that I hope to explore eventually.
If you have a shared Google Drive folder that you don't own, you can download its contents by clicking the down arrow on the folder path name -> "Download", which will download the whole folder as a zip file. Make sure you have permission to store the data.
You can search for some systematic ways to back up your GitHub content, but if you only have a few repositories, a simple solution is just to clone all of them and then back up those folders.
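The clone-everything approach can be sketched as follows. This assumes git is installed and that you maintain the list of repository URLs by hand; the function names and the example URL are placeholders.

```python
import subprocess
from pathlib import Path


def clone_command(repo_url, dest_root):
    """Build a git clone command that puts the repository in a folder
    named after it under dest_root."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return ["git", "clone", repo_url, str(Path(dest_root) / name)]


def clone_all(repo_urls, dest_root):
    """Clone each repository in turn, stopping if any clone fails."""
    Path(dest_root).mkdir(parents=True, exist_ok=True)
    for url in repo_urls:
        subprocess.run(clone_command(url, dest_root), check=True)


# Hypothetical usage:
# clone_all(["https://github.com/example-user/example-repo.git"], "github-backup")
```

The resulting folders can then be zipped and stored along with the rest of the backup.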
Stray documents on your desktop
I try to keep all important data in the cloud in case my laptop crashes, but you can also make sure that any data on your desktop that's not already stored with Google gets included in your backup.
(Optional) Geomagnetic storms and EMPs
This discussion has moved to its own page.
My backup schedule
Here's a summary of how I do periodic backups.
Do every ~2 months:
Use SiteSucker to download the latest versions of important websites. I enter the following urls one by one:
- http://reducing-suffering.org/
- http://briantomasik.com/
- http://briantomasik.com/wp-content/uploads/
- http://www.simonknutsson.com/
- https://foundational-research.org/
- https://casparoesterheld.com/
- http://www.wallowinmaya.com/
- http://prioritizationresearch.com/
- http://s-risks.org/
For some reason, the "wp-content/uploads" URL for briantomasik.com has to be entered separately in order to force SiteSucker to fetch that content.
Once this completes, quickly spot-check the downloads to make sure the needed content was retrieved. Zip the folders, name the zip file with the date of download, and upload it to the cloud. I also keep at least one copy of my website downloads unzipped and unencrypted on the cloud just in case zip files for some reason fail to uncompress in the future. Plain text is relatively robust against data corruption.
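The zip-and-date step could be scripted along these lines. This is a sketch using Python's standard library rather than a description of the tools actually used here; the function name, the archive naming pattern, and the example folder names are assumptions.

```python
import shutil
import zipfile
from datetime import date
from pathlib import Path


def zip_with_date(folder, out_dir, day=None):
    """Zip folder into out_dir as <folder-name>-<YYYY-MM-DD>.zip and
    check that the archive can be read back."""
    folder = Path(folder)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    day = day or date.today()
    base = out_dir / f"{folder.name}-{day.isoformat()}"
    archive = shutil.make_archive(
        str(base), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
    # testzip() returns the name of the first corrupt member, or None.
    with zipfile.ZipFile(archive) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise RuntimeError(f"corrupt entry in {archive}: {bad}")
    return Path(archive)


# Hypothetical usage: zip_with_date("downloads/example-site", "website-backups")
```

Keep out_dir outside the folder being zipped so the archive doesn't end up inside itself.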
Let me know if you'd like me to add your site to the list of sites I download periodically.
Do every ~6 months:
- Send the latest zip file of the backed-up websites to at least 3 friends for them to back up as well.
- Back up Facebook data.
- Back up Google data. When doing Google Takeout, I only care about including the following products: Blogger, Calendar, Contacts, Drive, and Mail. Because Google sometimes confusingly intermingles these products within the different download chunks, you might download all the small products first (Blogger, Calendar, Contacts) and then do Drive and Mail separately. I also care about YouTube videos, but I back those up in real time whenever I create a new video and thus don't need to do so using Google Takeout.
- If I've been active on other platforms recently (such as GitHub), download that content as well.
Currently I don't back up shared Google Drive folders because other people are on top of that.
Do every ~5 years:
Create new paper printouts of my websites.
[a] In July 2017, I verified this for myself as follows. I created a test Google Doc that contained a very large volume of text and let the Google Doc save itself. Then I deleted most of the text, leaving only a tiny amount of text, and then let the Google Doc save itself again. Then I ran Google Takeout. The downloaded .docx file was smaller in kilobytes than the amount of text I had added to the Google Doc originally, so the downloaded doc must not have contained the full version-history data anywhere. Moreover, the downloaded .docx was smaller than a compressed version of the original text using 7-Zip's most compact compression, so it's unlikely the .docx file was hiding the version-history content in compressed format. The large volume of text was still visible in the online version history for the Google Doc, though.
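The size comparison in this footnote can be approximated in code. The sketch below uses Python's lzma module at its strongest preset as a stand-in for 7-Zip's most compact setting. That substitution is an assumption: the two are related but not identical, so the resulting byte counts are only a rough bound. The function names are hypothetical.

```python
import lzma


def compressed_size(text):
    """Size in bytes of text after strong LZMA compression."""
    data = text.encode("utf-8")
    return len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME))


def could_hide_history(export_size_bytes, original_text):
    """True if the exported file is at least as large as the compressed
    original text, so it could in principle still contain that text."""
    return export_size_bytes >= compressed_size(original_text)


# Hypothetical usage:
# could_hide_history(os.path.getsize("test-doc.docx"), large_original_text)
```

A False result matches the reasoning in the footnote: an export smaller than even a tightly compressed copy of the deleted text is unlikely to be carrying that text.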