In the beginning of time, we had tape backup. In the days before terabyte hard drives, it was the only medium that could hold the large amounts of data a full backup required. It did have some problems, though…
- It was very slow.
- It was, for the most part, a manual process. You had to change the tapes every day and establish a rotation by which you would take tapes off-site in case the building burned down.
- It had a nasty habit of letting you think you had a good backup, but when you went to use it, there was nothing on the tape.
- You had to replace the tapes every year.
- Since it was so slow, you had to wait until the end of the day and run the backup overnight, so if your server crashed at 4:59 p.m. you lost a whole day's worth of work.
- The software that ran the tape backups was expensive and difficult to use, because configuring it meant making small changes each day, waiting for a backup to run, and then seeing whether the changes worked.
- The tape drives themselves were expensive and prone to failure.
- Restoration of a failed server generally took 3-5 days: get the new server, load and configure the OS, install the apps, retrieve the data from tape, and restore it.
- Depending on how many tapes you had, restoring iterative files (getting something from 3 months ago) was not possible because that tape had been over-written 2 months ago.
A few years ago, we came up with multiple redundancy off-site backup with a mirrored hard drive enclosure using “the Cloud”. This brought us several improvements…
- Backup was now automated. You just told it what to do and it sent you an e-mail saying it was done.
- Restores of individual files became much faster since you could now copy files directly from the local backup drives or the cloud to the server.
- No more tapes to switch out.
- You could now restore iterative data. Most Cloud backup providers keep 6 months' worth of data.
There were still some problems though…
- Rebuild time for a catastrophically failed server was reduced to 2-4 days, but that still meant 2-4 days to get the equipment, load and configure the OS, install the apps, and restore the data from the local enclosure or cloud storage. So nobody's working for those 2-4 days.
- You still had to wait until the end of the day to do your backup, so if your server died at 4:59 p.m. you still lost a whole day's worth of work.
We are now in the age of Virtualization and BDRs (Backup and Disaster Recovery Servers). With these advancements we have resolved all of the problems of tape backup and MROSBWMHDE…
A virtualized server doesn’t care what hardware it is running on. If you have a virtualized backup of your server, you can copy it from an HP to a Dell like a common data file and fire it up in minutes, assuming you have another server lying around.
The BDR is the final piece of the puzzle. There is a high-speed connection between your main server and the BDR. The BDR takes snapshots of your entire server as often as every 15 minutes, with no perceptible performance impact on your main server. Instead of waiting until the end of the day, the BDR sends its data off-site into the cloud continually, and it stores an entire virtualized copy of your server in the cloud along with the ability to get iterative copies of that server and its data.
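To make that cadence concrete, here is a minimal sketch of the snapshot-and-replicate loop in Python. It is purely illustrative, not how any particular BDR product works: the directory paths, the 15-minute interval, and the copy into a “cloud” folder are assumptions standing in for the appliance's real snapshot and replication mechanisms.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path

# Illustrative stand-ins: a real BDR snapshots whole server images,
# not a plain folder, and replicates to a provider's cloud storage.
SERVER_DATA = Path("/srv/main-server-data")   # data being protected (assumed path)
LOCAL_SNAPSHOTS = Path("/bdr/snapshots")      # on-site BDR storage (assumed path)
OFFSITE_COPY = Path("/mnt/cloud-replica")     # stand-in for off-site cloud storage
INTERVAL_SECONDS = 15 * 60                    # "as often as every 15 minutes"

def take_snapshot() -> Path:
    """Copy the protected data into a timestamped local snapshot."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = LOCAL_SNAPSHOTS / stamp
    shutil.copytree(SERVER_DATA, destination)
    return destination

def replicate_offsite(snapshot: Path) -> None:
    """Push the newest snapshot off-site so a copy survives a fire or theft."""
    shutil.copytree(snapshot, OFFSITE_COPY / snapshot.name)

if __name__ == "__main__":
    while True:
        snap = take_snapshot()        # local copies for fast restores
        replicate_offsite(snap)       # iterative copies kept off-site
        time.sleep(INTERVAL_SECONDS)  # repeat every 15 minutes
```

A real appliance takes image-level snapshots of the whole server (OS, applications, and data) rather than copying individual files, which is what makes it possible to spin the backup up as a running virtual machine in the scenarios below.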
So, let’s discuss failure and recovery scenarios and the options for handling them.
Your main server has a catastrophic hardware failure.
- As soon as a technical resource is available, we spin up the virtualized copy of your main server inside the BDR (it is a server after all).
- Your employees are pointed at that server, so instead of not working for 3-5 days, they are back to work in as little as 1-3 hours, using the BDR as a temporary server while we get the new hardware and put the main server back in order.
- Instead of losing a day's worth of work/data, we’ve lost maybe 15 minutes to an hour of data.
- When your new server hardware comes in, instead of taking 1-3 days of labor to get it configured, set up, and the data restored, we copy the latest version of the virtualized server onto it from the BDR, fire it up, and switch the users back to the main server in a day or less. The only choke point is waiting for the new hardware to arrive, but that doesn’t really matter because everyone keeps working from the copy of your server running on the BDR while the repairs are going on.
You have a fire. The main server and BDR are destroyed.
- Since we have a virtualized copy of your server in the Cloud, we spin it up there. Your users connect to it and are working again in a matter of hours. It will be slow, because everything has to go through your broadband connection instead of your LAN and MPLS, but it will function.
- The cloud provider charges a good amount to implement this option, but everyone is working the same or next day, as opposed to days or weeks of downtime.
The price to purchase and implement a BDR is a little higher than what the old tape drive and Backup Exec software used to cost, but not by much. And the benefits in “time to full recovery” and “user down time during recovery” are astounding.
If you would like to know more, please contact me, David Daichednt, VP of Operations at Micro Doctor Inc., at dave@microdoctor.com.