Open Thread: How to Prevent Data Loss

Earlier today social bookmarking site Ma.gnolia suffered what is undoubtedly the worst nightmare for any web startup: a massive, and possibly irreversible, loss of data. The site is currently offline, and a note from company founder Larry Halff says that the problem will take days to evaluate.

“Early on the West-coast morning of Friday, January 30th, Ma.gnolia experienced every web service’s worst nightmare: data corruption and loss,” wrote Halff. “For Ma.gnolia, this means that the service is offline and members’ bookmarks are unavailable, both through the website itself and the API. As I evaluate recovery options, I can’t provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours.”

It should be pretty clear to anyone that this is a very, very bad thing. Even if Ma.gnolia does somehow recover all the user data it lost, irreparable damage has very likely been done to its reputation and to the confidence that users have in its service. We expect that the cloud will go down from time to time, but we also expect that it will be back up quickly and with all of our data intact.

Longtime members of SitePoint will remember that I went through something similar — albeit on a much smaller scale — about four years ago. Looking back on it now, I can laugh and marvel at how naive I was. But now, as I prepare to launch a software-as-a-service application in the next couple of months, I also realize how much I still have to learn about making sure that customer data is securely and reliably backed up.

So I’d love to get a discussion going here about backup strategies. How do you manage backups of your web sites? Do you do it manually, either to another server or locally using DVD-Rs or external hard drives? Do you pay for a third-party backup product? Do you use a cloud storage service like Amazon S3 or Mosso Cloud Files?

Let us know in the comments.

  • http://www.mockriot.com/ Josh Catone

    The current plan for my site is to host all development and production code in private repositories on GitHub, and back them up nightly on Amazon S3.
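
    For example, a rough nightly script for that (just a sketch; the repo, bucket, and paths below are placeholders, and s3cmd is assumed to be installed and configured with the S3 credentials) might look like:

    # mirror the GitHub repo, tar it up, and push the archive to S3
    date=`date -I`
    git clone --mirror git@github.com:myuser/myapp.git /tmp/myapp-$date
    tar -czf /tmp/myapp-$date.tgz -C /tmp myapp-$date
    s3cmd put /tmp/myapp-$date.tgz s3://my-backup-bucket/code/
    rm -rf /tmp/myapp-$date /tmp/myapp-$date.tgz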

    User data (mainly images) will be stored on S3 as well, and possibly backed up to another storage service, such as Mosso (maybe the other way around — I haven’t evaluated the pricing structures of either yet to figure out which would be better for our needs). I’m not sure if it is necessary to use both, though, since they both replicate files across multiple data centers.

    Any thoughts? Would love to hear what everyone else does.

  • Nathanael Padgett

    While I don’t have a web service offered to customers, I am looking to stretch in that direction as an automatic revenue generator one day. For now, though, I perform backups manually inside phpMyAdmin and directly download files via FTP on a regular basis.

  • http://art4eye.com -T-

    I have set up a script that gives me nightly database backups. After 30 days, a monthly backup is created and the daily ones are deleted.
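
    Something along these lines would do it (a sketch only; credentials and paths are placeholders):

    # dump and compress the database, keep a monthly copy, prune dailies after 30 days
    date=`date -I`
    mysqldump -uxxx -pxxxx dbname | gzip > /backups/daily/db_$date.sql.gz
    if [ `date +%d` = "01" ]; then cp /backups/daily/db_$date.sql.gz /backups/monthly/; fi
    find /backups/daily -name '*.sql.gz' -mtime +30 -delete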

    I also have a script set up to create an additional monthly backup that goes off-site, in case both servers die on me.

    Every now and again I download the backups and put them on an external hard drive, just in case.

    This is easy for me as I have a pretty small site; the database is only a couple of hundred MiB. I can’t even imagine the cost of doing something like this for Facebook, with all the user-uploaded content, images, etc.

  • http://www.mockriot.com/ Josh Catone

    I can’t even imagine the cost of doing something like this for Facebook, with all the user-uploaded content, images, etc.

    Facebook gets 5 million videos and 800 million new photos uploaded each month, not to mention all the text data created by hundreds of millions of fan pages, wall postings, status updates, user profiles, etc.

    And if you think that’s bad, take a look at this top ten list of the world’s largest databases from 2007: http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

    (not on the list: Facebook, MySpace, or Archive.org … all of which I would imagine are pretty enormous)

  • eretz

    We run all our production servers with Rackspace, who are without a doubt the best people to keep a mission critical configuration online, and bring it back up if the unexpected occurs.

    Though this is *not* a DR strategy, all our servers have RAID1 drives. We didn’t at one point, but it is not worth having a machine go down because the statistically inevitable just happened (unless it is configured in an entirely redundant environment). Hardware fails, something which happened to us on a brand-new config and was handled flawlessly thanks to chipkill and fantastic support. That said, the most awesome part of Rackspace is the support, planning, assurance of hardware, fast bandwidth, etc.: we get to spend less time planning for disasters and more time forging ahead with deployments at maximum efficiency.

    Our DR strategy starts with virtualization, and we back up all our Server 2008 Hyper-V VMs at the VHD level via Rackspace’s unmetered Managed Backup service. It is a luxury to be able to store daily full backups amounting to terabytes of data per month at no additional cost, something I don’t think we’d do otherwise. Before we had this configured, we tried Amazon S3 + JungleDisk – it was slow. When Rackspace first introduced Managed Backup years back, they had various pricing plans and we only backed up files. Now they offer it unmetered at no additional charge per server – and it is indispensable. However, it isn’t a silver bullet: backing up such large amounts of data is a two-edged sword, as it can take some time to restore, though we haven’t had downtime yet.

    Most recently, our data also resides on their uNAS (Utility NAS) facility, which gives us unlimited on-demand data storage with billing based on daily usage. It is very flexible in terms of being able to create VM snapshots and store them on uNAS for immediately accessible restores. This in turn is backed up to Mosso’s Cloud Files. Mosso is a Rackspace company and resides in the same DC, so it is a much better fit than S3 for us, and they don’t charge Rackspace Managed customers for bandwidth transfer. It could also help us serve static content via the built-in CDN if we needed to, as well as scale out the less complex web services to their Cloud Sites offering. Both Cloud Files and Cloud Sites have great support, though we still love the managed offering and we couldn’t/wouldn’t leave it in a hurry, so cloud services are complementary rather than a replacement for us.

  • Andy

    Want to try out Amazon S3 for personal backup? Try CloudBerry Explorer for Amazon S3 Freeware http://cloudberrylab.com/

  • sys admin

    Our sites run on 2 dedicated servers. We have a 3rd server on a DSL line that is behind a NAT with no forwarded ports. This machine has passwordless root on both dedicated servers.

    server c does a differential rsync pull of server a
    server c does a differential rsync push of server a’s backup to server b
    server c does a differential rsync pull of server b
    server c does a differential rsync push of server b’s backup to server a

    This means that a is backed up on b and c
    This means that b is backed up on a and c
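
    Something like this on server c, in other words (a rough sketch; hostnames and paths are placeholders):

    # -a preserves permissions/timestamps over ssh; --delete keeps the mirror exact
    rsync -a --delete servera:/var/www/ /backups/a/       # pull a onto c
    rsync -a --delete /backups/a/ serverb:/backups/a/     # push a's backup onto b
    rsync -a --delete serverb:/var/www/ /backups/b/       # pull b onto c
    rsync -a --delete /backups/b/ servera:/backups/b/     # push b's backup onto a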

    Note 1: RAID5 is for fools; I’ve seen RAID arrays get wiped out.
    Note 2: The only real backup is a 1:1 offsite copy at 2 different locations.
    Note 3: Do not allow your public-facing machines to access their data backups, if a bad guy gets to your server he can also wipe out your backup.
    Note 4: Do not allow public-facing machines to access other public-facing machines
    Note 5: Really hide your private machine and monitor the hell out of it.

  • http://www.dangrossman.info Dan Grossman

    All code sits in SVN repositories hosted by CVSDude.com, which is live-mirrored to backup servers. That mirrored copy, plus the copy on the servers, plus my local copies for local development, means there are 4 copies of all my websites at any given time.

    My sites are hosted across 4 physical servers. Databases back themselves up nightly, and the backups get synced to one of the other 3 servers, so there are two copies of each backup file somewhere within the 4 servers.

  • sys admin

    Regarding MySQL backups, all you have to do is back up the /usr/local/mysql/var directory (or wherever the var directory is on your system; I compile MySQL from source).

    To recover, all you have to do is re-create the MySQL dsn with the right username and password, then overwrite the tree for the dsn under /usr/local/mysql/var.

    Easier than any other backup scheme for MySQL data.
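
    A rough sketch of that approach (paths and credentials are placeholders); the server should be stopped, or the tables flushed and locked, while the files are copied so that the backup is consistent:

    # shut down cleanly, archive the data directory, then bring MySQL back up
    /usr/local/mysql/bin/mysqladmin -uroot -pxxxx shutdown
    tar -czf /backups/mysql_var_`date -I`.tgz /usr/local/mysql/var
    /usr/local/mysql/bin/mysqld_safe --user=mysql &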

  • http://www.sitepoint.com/ mmj

    A lot of people’s backup strategies take care of some problems, but not all. Ideally, a backup strategy for any data you can’t afford to lose needs to be able to cope with:

    1. Your building burns to the ground
    2. You find out all your data became corrupted or lost some time ago, and the backups you’ve made since then contain the damage

    For the first, you need off-site backup. This ensures that if an entire building is burgled, burned down, flooded, etc., the data is still recoverable.

    For the second, you need some sort of history of backups. Incremental backups are good because they allow history but save space, though of course you’ll need to think about how easy it is to restore from backup.
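
    One common way to keep that history without storing everything many times over is hard-linked snapshots with rsync (a sketch only; paths are placeholders, and unchanged files share disk space between snapshots):

    today=`date -I`
    rsync -a --delete --link-dest=/backups/latest /var/www/ /backups/$today/
    rm -f /backups/latest && ln -s /backups/$today /backups/latest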

    Common backup myth: “RAID is backup”
    RAID is not backup. It provides the ability to replace a failed drive without taking down the system, but that is “availability”, not backup. For example, it is not intended to protect against either of the two above scenarios. If the building burns down, it’s all lost. If data is corrupted, it’s all lost (instantly). A faulty disk controller or power supply, or a power surge which your power supply can’t cope with, can ruin the entire RAID set. RAID is good if you need high availability, but even if you have RAID you still need backup.

  • eretz

    I entirely agree with mmj on this point, and with the debunking of the “RAID is backup” myth.

    Another point probably worth mentioning here is backup MONITORING (is it actually working?) and TESTING (frequently testing restores to catch any glitches before you need them). These are two areas where we’ve learned some painful lessons. How many times have people *thought* they had backups, and then, lo and behold, come D-day they find the restore just didn’t work?

    We handle both in terms of history by taking full backups every time (which go back around 14 days), as incremental backups can require every tape in a set to complete a restore and can take significant time. If something happened the day before yesterday, we haven’t overwritten our backup with the disaster. It’s also worth considering realistically how long it might take to spot a potential data loss/failure/corruption/compromise and extending one’s history to a feasible window, which could pay dividends later on.

    I also second sys admin’s comment re: not letting machines access their own data backups (something for the S3 crowd); we have a solution for this that only allows one container to be accessible at a time (e.g. we have a container per backup which cannot be accessed outside its specific backup window).

    Another point here is encryption. It’s all well and good having security on physical servers, etc, but client backups can be an astoundingly weak attack surface in many cases. Passwords on backup containers, encryption on tapes. And please tell me nobody still backs up via vanilla FTP (I’ve seen it happen)!

  • Ben D.

    I’m not sure if it is necessary to use both, though, since they both replicate files across multiple data centers.

    If S3 is inaccessible when you need to restore, then it really doesn’t matter how many data centers your data is replicated across. And S3 does become inaccessible from time to time.

    My painful experience with backup is that, as soon as one thing goes wrong, another thing will. So the more redundancy, the better.

  • Fred Hamranhansenhansen

    To get around hard drive failure, I’m heavily into Drobo and Time Machine. Both are ridiculously simple to use, and both do their thing basically in real time. As you create documents, they’re automatically stored on a second disk either almost immediately or within the hour, with no interaction from the user at all. If there is any kind of manual backup task that must be done, you’re simply not going to make as many backups.

    The Drobo is basically the ultimate USB/FireWire hard disk enclosure. It houses 4 3.5″ SATA disk mechanisms in 4 VCR-like slots in the front. Drobo uses the disks to create a “storage pool” that appears as one volume on the computer. At any time, one of the disks can fail and Drobo will prompt you to eject it and slot in a new one and that’s that. The volume does not have to be unmounted to do this, you can even be in the middle of a disk copy and it will continue. Your data is abstracted away from hard disk failure completely.

    Time Machine is great because it’s hourly and automatic and even painlessly easy to set up. You just plug a huge disk into a Mac and Time Machine asks if it can claim that disk for its own. You say yes, and then in System Preferences you can tell Time Machine to back up additional volumes aside from the startup disk. So attach your web server and any other volumes, and one Mac can create an hourly browsable backup with versioning, completely automatically. You can then browse “back in time” on all of your storage from that one Mac’s Finder. For the cost of a backup disk and less than 5 minutes of setup you can back up a lot of data this way.

    Finally, I use an offsite backup to just store one copy of everything in a mountain in case of disaster. A key thing here is to do a test recovery once you’re backed up.

  • F.Danials

    I don’t currently have my dream site running live, but when that day finally arrives, I plan to use RAID (mirroring), so if one HD fails I’ve got a complete copy to use straight away.

  • Eric Ferraiuolo

    My MacBook:
    Hourly Time Machine backups to a USB external drive attached to my AirPort Extreme. Daily diff backups of the really important stuff to S3 via JungleDisk.

    My Slice (at Slicehost):
    Daily and weekly snapshots using Slicehost’s backup service, along with daily Duplicity diffs, GPG-encrypted (2048-bit), sent to S3.
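
    A hypothetical Duplicity invocation for that last part (the key ID, bucket, and paths are placeholders; the AWS credentials come from the environment, and Duplicity produces an encrypted incremental backup by default once a full one exists):

    # placeholders for the real credentials
    export AWS_ACCESS_KEY_ID=placeholder
    export AWS_SECRET_ACCESS_KEY=placeholder
    duplicity --encrypt-key AB12CD34 /var/www s3+http://my-backup-bucket/slice-www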

  • Ben D.

    I plan to use RAID (mirroring), so if one HD fails I’ve got a complete copy to use straight away

    You are joking, right?

  • eretz

    Eric Ferraiuolo: I don’t have experience with Slicehost, as we’re primarily Windows and they don’t offer Server 2008 (yet). However, they were recently acquired by Rackspace. How do you find their backup service?

    Interestingly enough, JungleDisk was also bought by Rackspace at the same time, and should soon work smoothly with Mosso’s Cloud Files as well as S3. They have a well-supported API for the major languages (from C# to Python), which I found quite good for quick custom backup scripts.

    Ben D: I hope so too, but I’ve seen enterprises sold “RAID is backup” – it was pretty chilling!

    eretz

  • mikefarrow

    In my view the days of backup are over; backups worked for standalone machines which ran for eight hours a day with the weekends off.

    As has been mentioned above, unless you are restoring and testing your backup on a regular basis you are no different to Ma.gnolia.

    If you are operating 24/7 in the cloud with a proper business model, by the time you restore and recover the world will have moved on.

    Payments will be half processed, but which half? Transactions will have been made that aren’t on the backup, but which ones? The world will be Twittering about your mishaps. Will you recover? Unlikely!

    Unless you are making offsite backups at a transaction level, for each and every transaction, with the ability to switch DNS and state, you will be dead before you know you have been hit.

    Let’s be honest: most Web operations exist on luck, which in a venture-fueled boom with no money changing hands was just about possible.

    But in a cash-poor market with endless global competitors, forget it; stop hoping and start engineering solutions before someone else offers your customers one.

  • http://www.rebeccahaden.com rhaden

    Mozy? File Replication Pro? Printing out the really important stuff and putting it into a safe deposit box? We’ve got a choice of spreading around into multiple clouds (I’m unable to avoid a mental image of everyone in robes with harps, reclining on puffy clouds at varying levels) or relying on fallible hardware.

    In old-fashioned paper file drawers, people meticulously saved all kinds of stuff which was actually never referenced again. Chances are, we’re doing the same. Maybe identifying what’s really most important and implementing different levels of care would be sensible.

  • http://www.lucidsurf.com LucidSurf

    On a practical note, a simple (not very secure) way of ensuring your database gets backed up is by employing the services of a humble cron job. Here’s an example on a typical Linux server which will back your database up at any predefined interval and email it to you.
    FIRST CRON (backs up): date=`date -I` ; mysqldump -uxxx -pxxxx dbname > /home/mysite/public_html/backups/xbackup_tablename_$date.sql
    SECOND CRON (emails it to you): date=`date -I` ; /usr/sbin/sendmail email@email.com < /home/mysite/public_html/backups/xbackup_tablename_$date.sql

    This is a quick and easy (yet unsecured, as email is not encrypted) method for backing up your database. You’ll receive an email containing the full SQL which can be used to restore the database should the unthinkable happen. The larger your database, the larger your email will be. I post this only as an example for the smaller sites out there, not as an enterprise solution (obviously)!

  • Jeroen

    We had BIG problems with RAID-1 when one of the disks didn’t fail completely, but had some ‘bad sector’ issues. Since files are read from 1 disk and written to 2 disks, and there’s no read-after-write, we had many corrupted files.

    E.g. you open a large document from the bad disk and don’t notice that several pages are corrupted, and save it back to both RAID drives. Not good!

    In my opinion RAID-1 is only useful when one of the mirror disks crashes and dies right away.

  • http://www.yourversion.com dolsen

    At my social discovery and bookmarking startup YourVersion, we focus on remote backup of 3 important components: the code base, the database, and our server configuration files. Our code is stored remotely at a hosted code repository service (which also performs backups). We have automated daily backups of our DB being pushed to a cloud storage service and do the same (less frequently) for our server configuration files.
    In the unlikely event that a server were to have a major issue, our goal (at this point at least) is “the ability to recover”, i.e., making sure we have all the files we need to get a replacement box ready. We’re not currently using cloud computing, but if “speed of recovery” were very important to a company, the ability that cloud computing offers to have “pre-canned” machine images (e.g., AMIs) and to quickly bring up new instances would be beneficial in reducing the recovery time.
    Dan Olsen
    CEO and Founder, YourVersion
    discover your version of the web
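
    As a hypothetical illustration of the “pre-canned machine image” point above, bringing up a replacement instance with the EC2 API tools is a one-liner (the AMI ID and key pair are placeholders):

    # launch a fresh instance from a prepared machine image
    ec2-run-instances ami-1a2b3c4d --instance-type m1.small -k my-keypair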

  • http://www.kinetasystems.com NikLP

    Dropbox, Cron … but I need something else for automated system backups, so thanks for the above advice :)

  • Farside

    I have a solution that is similar to the one LucidSurf uses. Daily I run these cron jobs:

    /bin/sh /usr/home/myusername/mysqlbackup.sh
    /bin/sh /usr/home/myusername/zipmysqlbackup.sh

    The first sh file dumps my databases; use one such line for each database:
    mysqldump -uxxx -pxxxx --opt dbname > /usr/home/myusername/backups/backup_dbname.sql

    The second zips it into the zip directory:
    cd /usr/home/myusername/backups/zip
    tar -zcvf db_backups.tgz ../*.sql

    Then I use Cobian Backup (http://www.educ.umu.se/~cobian/index.htm) to download my zip file daily via FTP, keeping separate versions for the last 4 days, so I have a copy on my server and one locally. As LucidSurf said, it’s not the most elegant solution, but it’s cheap.

  • Ryu

    For MySQL I’ve used the PHP tool MySQLDumper several times; it’s a nice tool, so give it a try :)
    http://www.mysqldumper.de/en/

  • BSBC

    About backup strategies: my web site is small now and I can manage data loss easily because I have a copy of it on my PC. However, as the data grows I feel somewhat afraid of losing it, either through a small mistake like saving the wrong data over new data, or by losing links between pages. I will start using a database as my web site’s topics increase, but it is too early for me. The best way, from my point of view, is saving data frequently to DVD or an external drive, but not to a flash drive; they are not safe.
    Using another server is not efficient; it costs money and you may never need it.
    I like that my web site is still small.

  • Anonymous

    A complete business model should also include an option for end-user managed backups. That allows prudent customers to control their own data.

  • Michael Linehan

    All these technical solutions — I am very interested in the educational/psychological aspects. It makes sense when a one-person business owner who barely knows how to turn on the computer doesn’t have a back-up strategy. But how the heck can a company like this be so clueless, and what can be done about that phenomenon?
    Part of that, maybe, is: how can back-up solution companies get through the fog surrounding such people’s brains?

  • essexboyracer

    For the small site owners here, mainly running cPanel (which, at least in the version I’m on, doesn’t have automated email backups): you could try something along the lines of LucidSurf’s approach, but use OpenSSL to encrypt the archive before emailing it (or whatever), making it a tad more secure.

    $command .= 'openssl enc -aes-256-cbc -salt -in ' . $document_root . 'backup_.tgz -out ' . $compath . '/username_backup.enc';
    $output = shell_exec($command);
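
    Note that when this runs from cron via shell_exec, openssl needs the passphrase supplied non-interactively (e.g. with -k or -pass env:SOMEVAR, where SOMEVAR is a placeholder). The matching decrypt on the restore side is then just:

    openssl enc -d -aes-256-cbc -in username_backup.enc -out backup_.tgz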

  • picohax

    (repeating for emphasis)
    Once you have your own personal copy of all your data (any and all data that helps keep your business running), make sure to convert it to long-term storage media, like CD/DVD or tape.
    Use multisession burning for CDs and DVDs if your data is significantly less than a gigabyte but your backups need to be more frequent than weekly.
    Disks, flash drives, external storage – basically electronic storage of any kind can get screwed unexpectedly. Optical storage is the cheapest and safest option for small businesses or personal data.
    Get a good DVD burner and a pack of 25-100 CDs/DVDs.
    It helps a lot.
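
    A hypothetical pair of growisofs commands for the multisession part (the device and paths are placeholders): -Z writes the first session to the disc, and -M appends another session to the same disc later.

    growisofs -Z /dev/dvd -R -J /backups/2009-01-30
    growisofs -M /dev/dvd -R -J /backups/2009-01-31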

    Make a written backup policy that every employee must be able to recite like their favorite music/number or dearest prayer. No kidding. Only when you say that will your employees take it seriously.

    “Don’t know backup steps? Lose one day’s pay every month.”

  • Dorsey

    My comments assume that you’re using a reputable hosting service with all of the customary security, and one that performs full server backups nightly. We use two systems, not for failover (our host provides that), but rather for data and site backup. One system is exposed to the world, while the other is private to us. We run automysqlbackup.sh from SourceForge to create DB backups nightly during a low-load period, and then FTP the whole shooting match to our other server.
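
    One way to script that nightly job (a sketch only; lftp, the host, the credentials, and the paths are all illustrative assumptions, with automysqlbackup.sh assumed to write its dumps under /backup/db):

    # create the DB dumps, then mirror them up to the private server over FTP
    /usr/local/bin/automysqlbackup.sh
    lftp -u backupuser,xxxx -e "mirror -R /backup/db /backups/site1; quit" private.example.com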

    The proof is in the pudding (to use another aphorism): we had to recover a DB for one of our sites this past fall, and there it was in all its glory. We suffered a few tense moments before the relief of full recovery took hold, and there was joy throughout the land.

    Dorsey

  • Eric Ferraiuolo

    eretz: Slicehost’s backup service uses disk images/snapshots, so I find it useful: if I were to mess something up, I could restore my slice from a previous snapshot.

    Yes, you’re correct, both Slicehost and JungleDisk were acquired by Rackspace at the same time; I wrote my thoughts about that on my blog: Don’t Sell Your Small Giant