A General Theory of Backups

In working with people who are new to web design and development, I have often found that backing up a website is one of the last things on their minds. They’re more interested in the fun stuff. Backing up your website is more like a chore. It’s the computer equivalent of cleaning the litter box.

Of course, backing up your website is important, because websites get hacked, servers go down, and sometimes you want to change web hosts. A clean, solid backup makes all of these things a lot easier to deal with.

In this article, I’m going to outline a general theory of backups—the when, the where, and the what. In a later series of posts, I will talk about the how.

Contents

1 Back Up Early and Often
2 Know What to Back Up
3 Store Your Backups in a Secure Location
4 Know How to Recover From a Backup
5 Conclusion

Back Up Early and Often

It does no good to back up your site after it’s been hacked or a server has gone down, but unfortunately, this is exactly when most people start thinking about backups. Get in the habit of backing up your website while it’s still in development, and once it’s in production, you’ll automatically have a workflow for backing it up.

By the same token, develop a schedule for backing up your website that makes sense for how you use it. If you are adding, deleting, or changing content on a daily basis, then back it up on a daily basis. If you have a site that you never modify, back it up at least once a quarter. (Yes, I know a one-time backup could work here. But backups get lost, corrupted, or sometimes fail. Having redundant backups gives you a buffer between an ideal situation and reality. Buffers are a good thing.) Develop a schedule that works for you and then stick to it.
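As a rough illustration of such a schedule, here is a minimal Python sketch that decides whether a backup is due. The function name and the exact intervals (one day for sites that change daily, 90 days as a stand-in for “once a quarter”) are assumptions made for this example; adjust them to match how you actually use your site.

```python
from datetime import datetime, timedelta

# Assumed intervals based on the rule of thumb above: daily for sites that
# change every day, roughly quarterly (90 days) for sites that never change.
DAILY = timedelta(days=1)
QUARTERLY = timedelta(days=90)


def backup_is_due(last_backup, now, changes_daily):
    """Return True if a new backup should be made.

    last_backup is a datetime, or None if no backup has ever been made.
    """
    if last_backup is None:
        # No backup exists yet, so one is always due.
        return True
    interval = DAILY if changes_daily else QUARTERLY
    return now - last_backup >= interval


if __name__ == "__main__":
    now = datetime(2025, 1, 10)
    print(backup_is_due(datetime(2025, 1, 9), now, changes_daily=True))
    print(backup_is_due(datetime(2024, 12, 1), now, changes_daily=False))
```

A check like this could run from a scheduled job, so that sticking to the schedule doesn’t depend on remembering to do it.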

Know What to Back Up

Every website is unique, but in general, each falls into one of three broad categories:

Type 1: These are sites that consist entirely of custom-written code, generally HTML and CSS, and often PHP files. (I’ve created a lot of these over the years. Why overthink things when all you need are five or six pages?) Because there is no clean original to download from somewhere else, back up every file.

Type 2: These sites are based on flat-file web applications that consist of a set of files (mostly PHP, generally with a sprinkling of CSS, JS, HTML, and XML files) but don’t rely on a database to store content. Instead, all the content is stored as files, which are often .xml files, but could be anything. In general, you don’t need to back up the entire website, just the content files. In fact, you really shouldn’t back up the entire website, because if your site is infected, you’re also backing up the infection. You can always download a clean copy of the original web app from its website. DokuWiki, GetSimple, and Shaarli are good examples of these.

Type 3: Database-driven web apps that consist of a set of files (again, the majority are often PHP) but with a database (generally MySQL, but there are others) to store content. WordPress, Moodle, and MediaWiki are all good examples. As with type 2 websites, you don’t need to back up the entire web application, and you really shouldn’t. Just back up the portions that change as you interact with the site: the database itself, the directory where images or other files are stored, and any settings files. Again, you can always download a clean copy of the original web app when it’s time to recover.

I urge you not to back up the entire web app for another reason: doing so puts more strain on your server, because you’re backing up more data than you need to. You will also pay more to store it. Why store multiple copies of files that are readily available elsewhere? The takeaway here is to make your backups as efficiently as possible to save time, server resources, and money.
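To make the idea of a selective backup concrete, here is a minimal Python sketch that archives only a chosen list of files and directories rather than the whole web app. The function name, the archive naming scheme, and the example paths (WordPress’s wp-content/uploads directory and wp-config.php file) are assumptions for illustration; the paths that matter depend on your own application. A database dump would be a separate step that this sketch does not cover.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_changed_paths(site_root, relative_paths, dest_dir, today=None):
    """Archive only the listed files and directories under site_root.

    relative_paths holds paths relative to site_root. Paths that do not
    exist are skipped. Returns the path of the archive that was written.
    """
    site_root = Path(site_root)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Date-stamped file name, so repeated backups do not overwrite each other.
    stamp = (today or date.today()).isoformat()
    archive = dest_dir / f"site-backup-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        for rel in relative_paths:
            source = site_root / rel
            if source.exists():
                # arcname stores the path relative to the site root.
                tar.add(str(source), arcname=rel)
    return archive


if __name__ == "__main__":
    # Hypothetical locations; replace them with your own.
    written = backup_changed_paths(
        "/path/to/site",
        ["wp-content/uploads", "wp-config.php"],
        "/path/to/backup-staging",
    )
    print(written)
```

Because the application’s own code is left out, the archive stays small and is less likely to carry an infection along with it.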

Store Your Backups in a Secure Location

Don’t store your backups on your web server. Ever.

I know, it’s a fairly simple process to compress all your site files and save the resulting zip file or tarball on your web server in a place that is not readily available (i.e., it’s above the public_html directory). But I said “readily available” for a reason. While that may seem like a safe place, it’s not. Here’s why.

First, the average visitor to your site won’t be able to access your backups. But hackers aren’t your average visitor. If they can gain write access to the publicly available portion of your website, it’s not that much more work for them to get to the parts of your server that aren’t publicly available. It’s a bit of an inconvenience for them, at most.

Imagine this scenario: a hacker gets into your website, infects it with some sort of malware, notices that you have a bunch of backups sitting there, and then deletes them. If you log in and see that your backups are missing, your first instinct may be to back up your site immediately, which is a good instinct, except that now you’re backing up a site that’s already infected. Not good. And if your backups are automated, from this point on, you’re automatically backing up an infected site. Again, not good.

Second, servers fail. Servers aren’t magic boxes; they’re just computers that need electricity, maintenance, and, eventually, replacement. Server failure is rare, but it does happen. Web hosting companies do upgrade servers, which sometimes means that they’re upgrading the software on them (minimal risk to you, but still a risk) and occasionally means that they are replacing a machine, in which case your data needs to be transferred to an entirely new one. Moving data around is always a risk, and the best way to manage that risk is by having a backup of it. Imagine the day when you get an email from your web host telling you that the server your website is on got fried, and while they’re awfully sorry, they do have a backup from four months ago. Of course you’ve made a lot of changes in the intervening months, but that’s okay, because you back up your website on a daily basis. You can get to those backups easily because they’re on the same server as your website. Except that server is now toast and so are your backups.

Oops.

Keep your backups safe by not keeping them on your server. In a pinch, you can always download them to your local machine (which you back up on a regular basis, right?), but there are better ways of handling this. I said earlier that I would be writing a series of posts that show you how to write server scripts to back up your websites on a regular basis. I’m also going to show you how to automatically move those backups to a third-party cloud solution, such as Amazon Web Services.
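Since moving data around is always a risk, it can help to confirm that a backup arrived intact wherever you put it. The following Python sketch copies a backup file to another directory and compares SHA-256 checksums of the two copies. The function names are hypothetical, and the destination is assumed to be a folder that really lives off the web server (for example, on your local machine or a mounted remote volume); uploading to a cloud service would need that service’s own tools, which are not shown here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(backup_file, offsite_dir):
    """Copy backup_file into offsite_dir and check that the copy matches.

    Raises OSError if the checksums differ. Returns the path of the copy.
    """
    backup_file = Path(backup_file)
    offsite_dir = Path(offsite_dir)
    offsite_dir.mkdir(parents=True, exist_ok=True)

    copied = offsite_dir / backup_file.name
    shutil.copy2(backup_file, copied)

    if sha256_of(backup_file) != sha256_of(copied):
        raise OSError(f"Checksum mismatch after copying {backup_file}")
    return copied
```

Only after a check like this passes would it make sense to delete the original from the server.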

Know How to Recover From a Backup

It’s not enough to just have a backup of your website if you have no idea how to recover from that backup file. I first ran into this issue with WordPress, where there are plenty of backup plugins, but as far as I know, none that will automatically restore that backup for you. A lot of people found this out, much to their chagrin, after their sites were hacked. A backup file is absolutely worthless if you don’t know what to do with it.

In the series of posts to follow this one, I’ll also show you how to restore your site from a backup file.
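In the meantime, one way to find out whether a backup file is usable is to practice restoring it somewhere harmless. The Python sketch below unpacks a gzip-compressed tarball into a scratch directory and lists the files it recovered. It assumes the backup is a .tar.gz archive; the function name and the scratch location are hypothetical, and a real restore would also involve steps specific to your application.

```python
import tarfile
from pathlib import Path


def restore_to_scratch(archive_path, scratch_dir):
    """Unpack a .tar.gz backup into scratch_dir and list the restored files.

    Returns a sorted list of file paths relative to scratch_dir.
    """
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a safer extraction filter.
            tar.extractall(scratch_dir, filter="data")
        else:
            # Older versions: only unpack archives you created yourself.
            tar.extractall(scratch_dir)

    return sorted(
        p.relative_to(scratch_dir).as_posix()
        for p in scratch_dir.rglob("*")
        if p.is_file()
    )
```

If the list that comes back is missing files you expected, it is far better to learn that during a practice run than after your site has gone down.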

Conclusion

In talking to people, I have come to the conclusion that most of them (almost all, in fact) perform backups only because they feel it is something they should do. They don’t appreciate what a proper backup involves, including knowing how to get to it and how to restore from it, until after they experience an issue.

My goal is to move you beyond the “oh, crap—get the duct tape!” mode of being a webmaster and to get you to act proactively with regard to site backups and security. Remember, it’s not a matter of if your website will get hacked or a server will get fried; it’s a matter of when. And when that day comes, my goal is to ensure you have the skills, confidence, and backups to deal with it.

Except for material released under a Creative Commons License: ©2025 Kenneth John Odle. All Rights Reserved.
Permalink for this article:
https://techblog.kjodle.net/2018/12/01/a-general-theory-of-backups/
