Reviewing the why? how? and where? of today's enterprise disk backup techniques
by Andrew Brewerton, Technical Director (Europe), BakBone Software
published July 14, 2009
Why backup?
It is an inescapable
fact that computers and their disk storage go wrong - not often, but they do.
Even when they do not, they can be stolen, destroyed by fires or other
disasters, damaged by power failures, or suffer one of a host of other events
that cause data to be lost, such as a user deleting a file by mistake or saving
the wrong version. When that happens, you need a backup.
Replication and mirroring do fill some very important data protection
needs, because if one whole system is lost, the other should be able to take
over. However, a replica or mirror is not a backup, because if data on one half
of a pair is corrupted or deleted, it will also be deleted or corrupted on the
other half.
Backup is all
about saving data at a specific time and in a consistent state. It allows you to
go back to a specific recovery point, for example to recover an earlier version
of a corrupted or deleted file or database, to restore a crashed system to its
previous working state, or to take a copy off-site for safekeeping. A backup
traditionally has been stored on non-volatile and removable media, such as
tape, due to that need to
move copies off-site. Increasingly though, this role is being filled by
non-removable media at a different site - either your own secondary site, or a
facility managed by a service provider - with data networks replacing lorries as
the transport mechanism.
When it comes to restoring data, disk's big advantage over tape is
that it is random-access rather than sequential access. That means that if you
only need one file or a few files back, it will be faster and easier to find and
recover from disk.
What backup and recovery methods you use will depend on two factors:
- the recovery point objective (RPO), i.e. how much data the
organization can afford to lose or re-create, and
- the recovery time objective (RTO), which is how long you have to
recover the data before its absence causes business continuity problems.
For instance, if the RPO is 24 hours, daily backups to tape could be
acceptable, and any data created or changed since the last backup must be
manually recovered or re-created. An RTO of 24 hours similarly means the
organization can manage without the system for a day.
If the RPO and RTO were seconds rather than hours, the backup
technology would not only have to track data changes as they happened, but it
would also need to restore data almost immediately. Only disk-based continuous
data protection (CDP) schemes could do that.
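As a rough illustration of how those two objectives steer the choice of method, here is a minimal sketch that maps RPO and RTO figures to the broad approaches discussed in this article; the numeric thresholds and the function name are assumptions invented for the example, not vendor guidance.

```python
# Illustrative sketch only: maps hypothetical RPO/RTO targets (in seconds)
# to the broad backup approaches discussed in this article. The threshold
# values are assumptions chosen for the example, not vendor guidance.

DAY = 24 * 60 * 60

def suggest_backup_method(rpo_seconds: int, rto_seconds: int) -> str:
    """Return a coarse backup-method suggestion for the given objectives."""
    if rpo_seconds <= 60 and rto_seconds <= 60:
        # Only continuous data protection tracks changes as they happen
        # and can restore almost immediately.
        return "disk-based CDP"
    if rpo_seconds <= 4 * 60 * 60:
        # Frequent point-in-time copies cover sub-day recovery points.
        return "disk snapshots (plus replication off-site)"
    if rpo_seconds <= DAY and rto_seconds <= DAY:
        # A daily backup window is enough; disk (D2D/VTL) or tape both work.
        return "daily D2D or VTL backup, cloned to tape for off-site copies"
    return "periodic tape backup with off-site rotation"

if __name__ == "__main__":
    for rpo, rto in [(30, 30), (3600, 4 * 3600), (DAY, DAY)]:
        print(f"RPO={rpo}s, RTO={rto}s -> {suggest_backup_method(rpo, rto)}")
```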
Ways to use disk
Most current disk-based backup technologies fall into one of four basic
groups, and can be implemented either as an appliance, or as software which
writes to a dedicated partition on a NAS system or other storage array:
Virtual tape library (VTL)
One of the first backup
applications for disk was to emulate a tape drive. This technique has been used
in mainframe tape libraries
for many years, with the emulated tape acting as a kind of cache - the backup
application writes a tape volume to disk, and this is then copied or cloned to
real tape in the background.
Using a VTL means there is no need to change your software or
processes - they just run a lot faster. However, it is still largely oriented
towards system recovery, and the restore options are pretty much the same as
from real tape. Generally, the virtual tapes can still be cloned to real tapes
in the background for longer-term storage; this process is known as D2D2T, or
disk-to-disk-to-tape.
Simpler VTLs take a portion of the disk file space, create files
sequentially and treat them as tape, so the save-set is the same as on real tape.
That can waste space though, as the full tape capacity is allocated on disk even
if the tape volume is not full.
More advanced VTLs get around this problem by layering on storage
virtualization technologies. In particular this means thin provisioning, which
allocates a logical volume of the desired capacity but does not physically write
to disk unless there is actual data to write. Thin provisioning also has the
ability to take capacity from anywhere, e.g. from a SAN, from local disk, and
even from NAS.
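To make the space saving concrete, here is a minimal sketch contrasting a "fat" virtual tape, which reserves its full emulated capacity up front, with a thin-provisioned one that consumes physical disk only as data is written; the class names and capacities are invented for illustration.

```python
# Minimal sketch of fat vs thin provisioning for virtual tape volumes.
# Class names and capacities are invented for illustration only.

class FatVirtualTape:
    """Reserves the full emulated cartridge capacity as soon as it is created."""
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.physical_gb = capacity_gb   # space is consumed immediately
        self.used_gb = 0

    def write(self, gb: int):
        self.used_gb += gb               # physical footprint does not change

class ThinVirtualTape:
    """Presents the same logical capacity but allocates disk only when written to."""
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.physical_gb = 0             # nothing allocated until data arrives
        self.used_gb = 0

    def write(self, gb: int):
        self.used_gb += gb
        self.physical_gb = self.used_gb  # grows with the data actually written

if __name__ == "__main__":
    fat, thin = FatVirtualTape(800), ThinVirtualTape(800)
    for tape in (fat, thin):
        tape.write(120)                  # a 120 GB backup to an 800 GB "cartridge"
        print(type(tape).__name__, "physical GB used:", tape.physical_gb)
```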
Disk-to-disk (D2D)
Typically this involves backing up
to a dedicated disk-based appliance or a low-cost
SATA array, but this
time the disk is acting as disk, not as tape. Most backup applications now
support this. It makes access to individual files easier, although system
backups may be slower than streaming to a VTL.
An advantage of not emulating tape is that you are no longer bound by
its limitations. D2D systems work as random-access storage, not sequential,
which allows the device to send and receive multiple concurrent streams, for
example, or to recover individual files without having to scan the entire backup
volume.
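The practical difference shows up at restore time: with a catalogue of offsets, a disk-based backup can seek straight to one file instead of reading through the whole volume. A simplified sketch, with the container and catalogue formats invented for the example:

```python
# Simplified sketch: single-file restore from a random-access disk backup.
# The backup container and catalogue layout are invented for this example.

from pathlib import Path

def restore_file(container: Path, catalogue: dict, name: str, dest: Path):
    """Seek directly to one file's data inside a disk backup container."""
    offset, length = catalogue[name]        # catalogue maps name -> (offset, length)
    with container.open("rb") as backup, dest.open("wb") as out:
        backup.seek(offset)                 # random access: jump straight there
        out.write(backup.read(length))      # no scan of the whole backup volume

if __name__ == "__main__":
    # Build a tiny demonstration container holding two "backed-up" files.
    files = {"a.txt": b"alpha" * 10, "b.txt": b"bravo" * 10}
    catalogue, offset = {}, 0
    container = Path("backup.bin")
    with container.open("wb") as out:
        for name, blob in files.items():
            catalogue[name] = (offset, len(blob))
            out.write(blob)
            offset += len(blob)
    restore_file(container, catalogue, "b.txt", Path("b_restored.txt"))
    print(Path("b_restored.txt").read_bytes() == files["b.txt"])
```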
D2D can also be as simple as using a
removable disk
cartridge instead of tape. The advantage here is backup and recovery speed,
while the disk cartridge can be stored or moved offsite just as a tape cartridge
would be.
Snapshot
This
takes a point-in-time copy of your data at scheduled intervals, and is almost
instant. However, unless it is differential (which is analogous to an
incremental backup) or includes some form of compression, data reduction or
de-duplication technology, each snapshot will require the same amount of disk
storage as the original.
Differential snapshot technologies are good for roll-backs and file
recovery, but may be dependent on the original copy, so are less useful for
disaster recovery.
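A minimal sketch of a block-level differential snapshot: only the changed blocks are kept, and a restore needs the original base copy as well, which is why such snapshots are less useful for disaster recovery. The tiny block size and data structures are assumptions made for the example.

```python
# Sketch: differential snapshot at block level - only changed blocks are
# stored, and a restore depends on the base copy. Block size and data
# structures are assumptions for this example.

BLOCK = 4  # tiny block size so the example is easy to follow

def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def differential_snapshot(base: bytes, current: bytes) -> dict:
    """Store only the blocks that differ from the base copy."""
    base_blocks, cur_blocks = split_blocks(base), split_blocks(current)
    return {i: blk for i, blk in enumerate(cur_blocks)
            if i >= len(base_blocks) or blk != base_blocks[i]}

def restore(base: bytes, diff: dict) -> bytes:
    """Rebuild the point-in-time image from the base plus the differences."""
    blocks = split_blocks(base)
    for i, blk in sorted(diff.items()):
        if i < len(blocks):
            blocks[i] = blk
        else:
            blocks.append(blk)
    return b"".join(blocks)

if __name__ == "__main__":
    base = b"AAAABBBBCCCCDDDD"
    later = b"AAAAXXXXCCCCDDDDEEEE"          # one changed block, one new block
    diff = differential_snapshot(base, later)
    print("blocks stored:", len(diff), "of", len(split_blocks(later)))
    print("restore ok:", restore(base, diff) == later)
```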
Many NAS vendors offer tools which can snapshot data from a NAS
server or application server on one site to a NAS server at a recovery location.
However, in recent years snapshot technology has become less dependent on the
hardware - it used to be mainly an internal function of a disk array or NAS
server, but more and more software now offers snapshot capabilities.
Continuous data protection (CDP)
Sometimes called
real-time data protection, this captures and replicates file-level
changes as they happen, allowing you to wind the clock back on a file or system
to almost any previous point in time. The changes are stored at byte or block
level with metadata that notes which blocks changed and when, so there is often
no need to reconstruct the file for recovery - the CDP system simply gives you
back the version that existed at your chosen time. Any changes made since then
will need to be recovered some other way, for example via journaling within the
application.
CDP is only viable on disk, not tape, because it relies on having
random access to its stored data.
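A toy illustration of that idea: a journal of timestamped block-level writes which can be replayed up to any chosen moment, giving back the version that existed then. The journal layout and class names are assumptions made for this sketch, not any particular product's format.

```python
# Toy continuous-data-protection journal: every write is recorded with a
# timestamp, an offset and the changed bytes, so the file can be rebuilt
# as it existed at any chosen moment. The layout is an illustrative assumption.

from dataclasses import dataclass, field
from typing import List

@dataclass
class JournalEntry:
    timestamp: float      # when the change happened
    offset: int           # where in the file it was written
    data: bytes           # the changed bytes themselves

@dataclass
class CdpJournal:
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, timestamp: float, offset: int, data: bytes):
        self.entries.append(JournalEntry(timestamp, offset, data))

    def version_at(self, when: float) -> bytes:
        """Replay all changes up to 'when' to recover that point in time."""
        image = bytearray()
        for e in sorted(self.entries, key=lambda e: e.timestamp):
            if e.timestamp > when:
                break
            end = e.offset + len(e.data)
            if end > len(image):
                image.extend(b"\x00" * (end - len(image)))
            image[e.offset:end] = e.data
        return bytes(image)

if __name__ == "__main__":
    j = CdpJournal()
    j.record(1.0, 0, b"hello world")
    j.record(2.0, 6, b"there")        # overwrites "world"
    print(j.version_at(1.5))          # b'hello world'
    print(j.version_at(2.5))          # b'hello there'
```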
Depending on how the CDP process
functions, one potential drawback is that the more granular you make your CDP
system, the more it impacts the performance of the system and application, so
CDP technologies that do not rely solely on snapshots have an advantage here.
In addition, it can be necessary to roll forward or backward to find the version
you want. One option here is to use CDP to track and store changes at very
granular level, then convert the backed-up data to point-in-time snapshots for
easier recovery.
Beyond data protection, a well designed CDP solution can bring other
advantages, such as a lower impact on the application and server. It also moves
less data over the network than file-based protection schemes, as it sends only
the changed bytes.
Coherency and recovery
In order to be useful, a backup has to be coherent - a copy of
something that is in the middle of being updated cannot reliably be restored.
With traditional backup methods, applications would be taken offline
for backup, usually overnight, but newer backup methods such as snapshots and
CDP are designed to work at any time.
Snapshots provide a relatively coarse temporal granularity, so are
more likely to produce a complete and coherent backup. However, they will miss
any updates made since the last snapshot. The fine-grained approach of CDP is
less likely to lose data, but it may be harder to bring the system back to a
coherent state.
How you achieve a coherent backup will depend on the application or
data.
For instance, with unstructured file systems you need to find a
known-good file version - typically the last closed or saved version. For files
that can stay open a long time, you need to initiate a file system flush and
create a pointer to that in the metadata.
To recover data, you would then find the right point in the CDP
backup, wait for the data to copy back to the application server and then
reactivate the application. However, that means that the more data you have, and
the slower your network is, the longer recovery will take.
Fortunately, technologies are emerging to speed up this process. These
provide the application with an outline of the restored data that is enough to
let it start up, even though all the data has not yet truly been restored; a
software agent running alongside the application then watches for data requests
and reprioritizes the restoration process accordingly - in effect it streams
the data back as it is called for.
Schemes such as this can have applications up and running in less than
10 minutes, as the quickly recovered shell-file is just a few megabytes. Of
course it does still take time to fully restore the application, but it does
allow users to start using it again immediately.
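A rough, single-threaded sketch of that restore-on-demand approach: a background task copies blocks back in order, and any block the application asks for before it has arrived jumps the queue. All names and structures here are invented simplifications.

```python
# Simplified single-threaded simulation of restore-on-demand ("streaming" the
# data back as it is called for). All names and structures are invented.

from collections import deque

class StreamingRestore:
    def __init__(self, backup: dict):
        self.backup = backup                      # block_id -> data held at the backup site
        self.restored = {}                        # blocks already copied back
        self.queue = deque(sorted(backup))        # background restore order

    def read(self, block_id):
        """Application read: restore the requested block immediately if needed."""
        if block_id not in self.restored:
            self.queue.remove(block_id)           # jump the queue for this block
            self.restored[block_id] = self.backup[block_id]
        return self.restored[block_id]

    def background_step(self):
        """Restore the next queued block when no application request is pending."""
        if self.queue:
            block_id = self.queue.popleft()
            self.restored[block_id] = self.backup[block_id]

if __name__ == "__main__":
    r = StreamingRestore({i: f"block-{i}".encode() for i in range(5)})
    print(r.read(3))          # application needs block 3 right away
    r.background_step()       # background restore continues with block 0
    print(sorted(r.restored)) # [0, 3]
```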
One other issue that may affect the choice of snapshots or CDP is the
level of interdependency within the application and its files. If there is too
much interdependency, it will be more difficult to find a consistent recovery
point. A potential solution is to choose software that is application-aware and
can apply granular recovery intelligently, because it knows the dependencies
involved.
Power and efficiency issues
One thing that must be said in tape's favor is that its power
consumption for offline data storage is very low - potentially as low as the
cost of the air-conditioning for the shelf space to keep the cartridges on.
Removable disk cartridges can match that of course, but only for traditional
backup processes with their attendant delays.
To use newer backup processes such as snapshots and CDP requires the
disk storage to be online. D2D hardware developers have therefore come up with
schemes such as MAID (massive array of idle disks), which reduces power
consumption by putting hard disks into a low-power state when they are not
being accessed.
MAID-type systems from the likes of COPAN, Hitachi Data Systems and Nexsan
Technologies, and related technologies such as Adaptec's IPM (intelligent power
management) RAID controllers, therefore allow banks of disk drives to operate
in different power states at varying times.
For instance, they can automate drives to go into standby mode or even
spin down completely during idle periods. If a drive is accessed while powered
down, the controller will spin it back up; alternatively the administrator can
define peak IT activity periods when drives will never be spun down. The
controller also monitors drives that have been powered down for a while, to make
sure they still work OK. Conversely, when drives do need to be accessed these
storage arrays implement staggered spin-up techniques. This is to avoid
overloading an array's power supply by trying to power up all its drives at the
same time.
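The sketch below simulates the two behaviours described above for a small bank of drives: spinning a drive down after an idle timeout, and staggering spin-ups so that several drives never draw start-up current at the same instant. The timings, class names and policy are illustrative assumptions only.

```python
# Sketch of MAID-style power management: drives spin down after an idle
# timeout and are spun back up with a staggered delay so that several
# drives never draw start-up current at once. Values are illustrative.

import time

IDLE_TIMEOUT_S = 300        # spin a drive down after 5 minutes without I/O
SPIN_UP_STAGGER_S = 2       # wait between successive drive spin-ups

class Drive:
    def __init__(self, name: str):
        self.name = name
        self.spinning = True
        self.last_access = time.monotonic()

    def access(self):
        self.last_access = time.monotonic()

class MaidController:
    def __init__(self, drives):
        self.drives = drives

    def enforce_idle_policy(self):
        """Spin down any drive that has been idle longer than the timeout."""
        now = time.monotonic()
        for d in self.drives:
            if d.spinning and now - d.last_access > IDLE_TIMEOUT_S:
                d.spinning = False

    def spin_up(self, wanted):
        """Staggered spin-up: never start all requested drives at the same time."""
        for d in wanted:
            if not d.spinning:
                d.spinning = True
                d.access()
                time.sleep(SPIN_UP_STAGGER_S)   # spread the power-supply load

if __name__ == "__main__":
    bank = [Drive(f"disk{i}") for i in range(4)]
    ctrl = MaidController(bank)
    bank[2].last_access -= IDLE_TIMEOUT_S + 1   # pretend disk2 has been idle
    ctrl.enforce_idle_policy()
    print([d.spinning for d in bank])           # disk2 is now spun down
    ctrl.spin_up([bank[2]])                     # access needed again: spin it back up
    print(bank[2].spinning)
```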
It is claimed that these power management techniques can be
configured to reduce a drive's power consumption by up to 70%, without
sacrificing performance. Higher reductions are possible, but may come at the
cost of added latency and/or lower throughput.
Deduplication
There is more to using disks for backup than merely speed. A big
advantage of disk over tape is that disk storage is random-access, whereas tape
can only be read sequentially. That makes it feasible to reprocess the data on
disk once it has been backed up, which, as well as supporting snapshots and CDP,
has enabled another key innovation in backup: de-duplication.
This is a compression or data reduction technique which takes a whole
data set or stream, looks for repeated elements, and then stores or sends only
the unique data. Obviously, some data sets contain more duplication than others
- for example, virtual servers created from templates will be almost identical.
It is not unusual for users to report compression ratios of 10:1 or more, while
figures of 50:1 have been reported in some cases (by ExaGrid).
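In its simplest form the technique splits the data into chunks, fingerprints each chunk with a hash, and stores a chunk only the first time its fingerprint is seen; a "recipe" of fingerprints is kept so the original stream can be rebuilt. The fixed chunk size and in-memory index below are simplifying assumptions; real products vary considerably.

```python
# Simplest form of de-duplication: fixed-size chunks, fingerprinted with a
# hash, stored only the first time each fingerprint is seen. The chunk size
# and the in-memory index are simplifying assumptions for this sketch.

import hashlib

CHUNK_SIZE = 4096

def deduplicate(data: bytes):
    """Return (ordered list of fingerprints, dict of unique chunks)."""
    recipe, store = [], {}
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)              # needed to rebuild the original stream
        store.setdefault(digest, chunk)    # store each unique chunk only once
    return recipe, store

def rehydrate(recipe, store) -> bytes:
    return b"".join(store[d] for d in recipe)

if __name__ == "__main__":
    # Highly repetitive data, e.g. virtual machines built from the same template.
    data = (b"A" * CHUNK_SIZE) * 9 + (b"B" * CHUNK_SIZE)
    recipe, store = deduplicate(data)
    ratio = len(data) / sum(len(c) for c in store.values())
    print(f"chunks: {len(recipe)}, unique: {len(store)}, ratio: {ratio:.0f}:1")
    print("rebuild ok:", rehydrate(recipe, store) == data)
```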
In the past, de-duplication has typically been built into storage systems or
hardware appliances, and has therefore been hardware-dependent. That is changing
now though, with the emergence of backup software that includes de-duplication
features and is hardware-independent.
The technology is also being used for backups between data centers, or
between branch offices and headquarters, as it reduces the amount of data that
must be sent over a WAN connection.
D2D in branch offices and remote offices
There are many challenges involved in backing up branch offices
and remote offices.
Who changes the tapes and takes them off-site, for
instance?
Plus, local data volumes are growing and more sites now run
applications locally, not just file-and-print, so what do you do when the backup
window becomes too small?
One possibility is to back up or replicate to headquarters, preferably
using CDP or de-duplication technology to reduce the load on the WAN by sending
only the changed data blocks. The drawback with anything online or consolidated
is how long it takes to restore a failed system, however. Even if you have the
skills on hand and a fast connection, it can take a very long time to restore
just a few hundred gigabytes of data.
D2D is the obvious next step - it can be installed as a VTL, so it
functions the same way as tape but faster, and it also gives you a local copy of
your files for recovery purposes. That local copy will probably answer 90 to 95%
of recovery needs.
Add asynchronous replication to headquarters, and you can store one
generation of backups locally with more consolidated at the data center. Layer
de-duplication on top, and there is less data to back up from the branch office
and therefore less bandwidth consumed.
Consolidating backups at the data center can bring other benefits too;
in particular, it enables information to be searched and archived more readily.
It also takes the backup load off the branch offices, as their backups are simply
for staging and fast local recovery and so no longer need to be retained.
Should the entire branch or remote office be lost, there are
techniques to speed up the process of restoring a whole server or storage
system. An example is the use of external
USB hard drives, sent by
courier and used to 'seed' the recovered system. Even faster though are
data-streaming technologies, which virtualize the recovery process, presenting
the application with an image of its data and streaming the underlying data back
as it is called for.
Editor's comments:-
Thanks Andrew for bringing us up to date with the range of processes and
technologies available in the disk backup market.
See also:-
BakBone - editor mentions on StorageSearch.com
StorageSearch.com has had a long affinity with enterprise disk backup.
In 2001, in an article called the Next Decade in Storage, StorageSearch.com
precisely foretold the reasons why D2D would eventually replace tape backup.
And in March 2002, StorageSearch.com became the first major storage publication
to create a dedicated web page for D2D news and vendors. Since 2006, this
subject has always been in the top 5 vertical subject directories viewed by
our readers.