Backup Site

Posted by admin | Uncategorized | Thursday 20 November 2008 3:34 am
A backup site is a location where an organization can easily relocate following a disaster, such as fire, flood, terrorist threat or other disruptive event. This is an integral part of the disaster recovery plan and wider business continuity planning of an organisation.

A backup site can be another location operated by the organisation, or contracted via a company that specializes in disaster recovery services. In some cases, an organisation will have an agreement with a second organisation to operate a joint backup site.

There are three types of backup sites: cold sites, warm sites, and hot sites. The differences between the types are determined by the costs and effort required to implement each. Another term used to describe a backup site is a work area recovery site.

Cold Sites

A cold site is the least expensive type of backup site for an organization to operate. It does not include backed-up copies of data and information from the original location of the organization, nor does it include hardware already set up. The lack of hardware contributes to the minimal startup costs of the cold site, but additional time is required following the disaster to get the operation running at a capacity close to that prior to the disaster.

Hot Sites

A hot site is a duplicate of the original site of the organization, with full computer systems as well as near-complete backups of user data. Real-time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide area network links and specialized software. Following a disruption to the original site, the hot site exists so that the organization can relocate with minimal losses to normal operations. Ideally, a hot site will be up and running within a matter of hours or even less. Personnel may still have to be moved to the hot site, so it is possible that the hot site may be operational from a data processing perspective before staff have relocated. The capacity of the hot site may or may not match the capacity of the original site, depending on the organization's requirements. This type of backup site is the most expensive to operate. Hot sites are popular with organizations that operate real-time processes, such as financial institutions, government agencies and ecommerce providers.

Choosing

Choosing the type of backup site is mainly decided by an organisation's cost vs. benefit strategy. Hot sites are traditionally more expensive than cold sites since much of the equipment the company needs has already been purchased and thus the operational costs are higher. However, if the same organisation loses a substantial amount of revenue for each day it is inactive, then it may be worth the cost. Another advantage of a hot site is that it can be used for operations prior to a disaster happening.

The advantage of a cold site is simple: cost. It requires far fewer resources to operate a cold site because no equipment has been bought prior to the disaster. The downside of a cold site is the potential cost that must be incurred in order to make the cold site effective. The costs of purchasing equipment on very short notice may be higher, and the disaster may make the equipment difficult to obtain.
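
The trade-off between the two site types can be illustrated with rough arithmetic. The sketch below is only an illustration: the function name, the cost figures, and the assumption of exactly one disaster in the period are all hypothetical and do not come from any real organisation.

```python
def total_cost(annual_site_cost, downtime_days, revenue_loss_per_day):
    """Yearly running cost of the site plus revenue lost during one outage."""
    return annual_site_cost + downtime_days * revenue_loss_per_day

# Hypothetical figures: the hot site is costly to run but resumes in a day;
# the cold site is cheap to run but needs two weeks to be equipped.
hot = total_cost(annual_site_cost=500_000, downtime_days=1, revenue_loss_per_day=100_000)
cold = total_cost(annual_site_cost=50_000, downtime_days=14, revenue_loss_per_day=100_000)
print(hot, cold)
```

With a much lower daily revenue loss the comparison reverses, which is why the choice depends on each organisation's own figures.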

When contracting services from a commercial provider of backup site capability, organisations should take note of contractual usage provisions and invocation procedures. Providers may sign up more than one organisation for a given site or facility, often depending on various service levels. This is a reasonable proposition, as it is unlikely that all organisations using the service will need it at the same time, and it allows the provider to offer the service at an affordable cost. However, in a large-scale incident that affects a wide area, it is likely that these facilities will become oversubscribed.

source: http://en.wikipedia.org/wiki/Backup_site

Backup Software

Posted by admin | Uncategorized | Thursday 20 November 2008 3:32 am

Backup software is a computer program used to perform a complete backup of a file, data, database, system or server. Backup software enables you to make an exact duplicate of everything contained on the original source. This software must also be used to perform a recovery of the data or system in the event of a disaster.

Choosing the right backup software is very important. Different software uses different methods to back up data. For example, some software will back up only to tape, while other software will back up to disk. Some software uses a file-by-file method to back up. Other software backs up an entire volume or drive using drive image technology.

Some things to consider when choosing backup software are the length of time it will take to back up your systems and data, the reliability of the hardware on which your backup will be stored and the experience of the developers of the software.

Key features of backup software

There are several features of backup software that make it more effective in backing up data.

Volumes

Voluming makes it possible to compress and split backup data into separate parts for storage on smaller, removable media such as CDs. It was often used because CDs were easy to transport off-site and inexpensive compared to hard drives or servers.

However, the recent increase in hard drive capacity and decrease in drive cost have made voluming a far less popular solution. The introduction of small, portable, durable USB drives and the increase in broadband capacity have provided easier and more secure methods of transporting backup data off-site.
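
As a rough illustration of voluming, the sketch below compresses a byte string and cuts the result into fixed-size parts. The function names and the volume size are hypothetical; real backup software would also write headers and sequence numbers to each volume.

```python
import zlib

def make_volumes(data, volume_size):
    """Compress data, then cut the compressed stream into volume_size-byte parts."""
    compressed = zlib.compress(data)
    return [compressed[i:i + volume_size]
            for i in range(0, len(compressed), volume_size)]

def restore_volumes(volumes):
    """Join the parts in their original order and decompress."""
    return zlib.decompress(b"".join(volumes))

data = bytes(range(256)) * 100
volumes = make_volumes(data, volume_size=64)   # 64 bytes stands in for one disc
restored = restore_volumes(volumes)
```

Every volume is needed for a restore, and in the right order, which is one reason the technique became less attractive once single large media were affordable.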

Data compression

Because hard drive space has a cost, compressing the data reduces its size, so less drive space is needed and money is saved.

Remote backup

Several factors have contributed to a surge in the use of remote or offsite backup of data to geographically distant sites.

1. The rapid growth of data and its importance to business.
2. The rapid adoption of high-speed broadband internet.
3. The falling price of disk drive technology.
4. The rise of risks such as hackers, hurricanes, viruses, and hardware failure.

These structural changes present opportunities for young startups, which are serving this growing market with next-generation backup technologies that automatically back up data to offsite data centers (sometimes called vaults) via the Internet. Many banks, stock exchanges, and other large institutions do this to ensure data integrity.

Access to open files

Many backup solutions offer a plug-in for access to exclusive, in use, and locked files.

Incremental backups

Backup solutions generally support incremental backups in addition to full backups, so only material that is newer or changed compared to the backed up data is actually backed up, in order to dramatically increase the speed of the backup process.

Schedules

Backup schedules are usually supported to reduce maintenance of the backup tool and increase the reliability of the backups.

Encryption

To prevent data theft some backup software offers cryptography features to protect the backup.

source: http://en.wikipedia.org/wiki/Backup_software

Backup

Posted by admin | Uncategorized | Thursday 20 November 2008 3:32 am
In information technology, backup refers to making copies of data so that these additional copies may be used to restore the original after a data loss event. These additional copies are typically called “backups.” Backups are useful primarily for two purposes. The first is to restore a state following a disaster (called disaster recovery). The second is to restore small numbers of files after they have been accidentally deleted or corrupted.

Since a backup system contains at least one copy of all data worth saving, the data storage requirements are considerable. Organizing this storage space and managing the backup process is a complicated undertaking. A data repository model can be used to provide structure to the storage. In the modern era of computing there are many different types of data storage devices that are useful for making backups. There are also many different ways in which these devices can be arranged to provide geographic redundancy, data security, and portability.

Before data is sent to its storage location, it is selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others. Many organizations and individuals want confidence that the process is working as expected, and work to define measurements and validation techniques. It is also important to recognize the limitations and human factors involved in any backup scheme.

Due to a considerable overlap in technology, backups and backup systems are frequently confused with archives and fault-tolerant systems. Backups differ from archives in that archives are the primary copy of data, usually put away for future use, while backups are a secondary copy, kept on hand to replace the original item. Backup systems differ from fault-tolerant systems in that backup systems assume that a fault will cause a data loss event, while fault-tolerant systems assume that a fault will not.

Storage, the base of a backup system

Data repository models

Any backup strategy starts with a concept of a data repository. The backup data needs to be stored somehow and probably should be organized to a degree. It can be as simple as a sheet of paper with a list of all backup tapes and the dates they were written or a more sophisticated setup with a computerized index, catalog, or relational database. Different repository models have different advantages. This is closely related to choosing a backup rotation scheme.

Unstructured

An unstructured repository may simply be a stack of floppy disks or CD-R/DVD-R media with minimal information about what was backed up and when. This is the easiest to implement, but probably the least likely to achieve a high level of recoverability.

Full + Incrementals

A Full + Incremental repository aims to make storing several copies of the source data more feasible. At first, a full backup (of all files) is taken. After that, an incremental backup (of only the files that have changed since the previous full or incremental backup) can be taken. Restoring whole systems to a certain point in time would require locating the full backup taken before that time and all the incremental backups taken between that full backup and the particular point in time to which the system is supposed to be restored. This model offers a high level of assurance that something can be restored and can be used with removable media such as tapes and optical disks. The downside is dealing with a long series of incrementals and the high storage requirements.
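
The restore procedure described above can be sketched as a lookup over a backup catalog. The catalog layout (a time-sorted list of `(time, kind)` tuples) and the function name are assumptions made for this illustration.

```python
def restore_chain(backups, target_time):
    """Return the backups to apply, in order, to restore to target_time.

    backups is a list of (time, kind) tuples sorted by time, where kind is
    "full" or "incr". Raises ValueError if no full backup precedes target_time.
    """
    candidates = [b for b in backups if b[0] <= target_time]
    # Index of the most recent full backup at or before the target time.
    last_full = max(i for i, b in enumerate(candidates) if b[1] == "full")
    return candidates[last_full:]

catalog = [(1, "full"), (2, "incr"), (3, "incr"), (8, "full"), (9, "incr")]
chain = restore_chain(catalog, target_time=3)
```

If any incremental in the chain is missing or unreadable, later points in time cannot be reconstructed, which is the practical cost of a long series of incrementals.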

Full + Differential

A full + differential backup differs from a full + incremental in that after the full backup is taken, each partial backup captures all files created or changed since the full backup, even though some may have been included in a previous partial backup. Its advantage is that a restore involves recovering only the last full backup and then overlaying it with the last differential backup. The downside would be that it takes more storage than the Full + Incrementals.

Variable Dump Level

With variable dump levels (0-9), a level 0 dump is a full backup. Other levels back up all files that have changed since the last backup of a lower level. This allows planning strategies that achieve a compromise between the advantages of incrementals and differentials.
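
The rule for dump levels can be expressed as a small function that finds the earlier dump a new dump is taken relative to. The history format and function name are hypothetical and only follow the rule stated above.

```python
def reference_dump(history, level):
    """Return the most recent earlier dump with a lower level, or None.

    history is a chronological list of (time, level) tuples. A level 0 dump
    has no reference and is therefore a full backup.
    """
    for time, lvl in reversed(history):
        if lvl < level:
            return (time, lvl)
    return None

# Repeating the same level behaves like a differential:
differential_style = reference_dump([(0, 0), (1, 2), (2, 2)], level=2)
# Increasing the level each time behaves like an incremental:
incremental_style = reference_dump([(0, 0), (1, 1), (2, 2)], level=3)
```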

Mirror + Reverse Incrementals

A Mirror + Reverse Incrementals repository is similar to a Full + Incrementals repository. The difference is that instead of an aging full backup followed by a series of incrementals, this model offers a mirror that reflects the system state as of the last backup and a history of reverse incrementals. One benefit of this is that it only requires an initial full backup. Each incremental backup is immediately applied to the mirror, and the files it replaces are moved to a reverse incremental. This model is not suited to removable media since every backup must be done in comparison to the mirror.
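
A minimal in-memory sketch of this model follows. Files are represented as a dictionary from path to contents, and all names are hypothetical; a real tool would store the mirror and reverse incrementals on disk.

```python
def apply_backup(mirror, history, source):
    """Update mirror to match source, saving replaced state as a reverse incremental."""
    reverse = {}
    for path in set(mirror) | set(source):
        old, new = mirror.get(path), source.get(path)
        if old != new:
            reverse[path] = old  # None marks "file did not exist before"
    history.append(reverse)
    mirror.clear()
    mirror.update(source)

def roll_back(mirror, history, steps):
    """Return the state as of `steps` backups ago, applying reverse incrementals newest first."""
    state = dict(mirror)
    for reverse in reversed(history[len(history) - steps:]):
        for path, old in reverse.items():
            if old is None:
                state.pop(path, None)
            else:
                state[path] = old
    return state

mirror, history = {}, []
apply_backup(mirror, history, {"a": "1"})
apply_backup(mirror, history, {"a": "2", "b": "x"})
apply_backup(mirror, history, {"b": "x"})
```

The newest state is always available directly from the mirror, while older states cost progressively more work to reconstruct, the opposite of a Full + Incrementals repository.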

Continuous data protection

This model takes it a step further: instead of scheduling periodic backups, the system immediately logs every change on the host system. This is generally done by saving byte or block-level differences rather than file-level differences.[5] It differs from simple disk mirroring in that it enables a roll-back of the log and thus restoration of an old image of the data.
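
The difference from plain mirroring can be shown with a toy change log that records the bytes each write overwrites. The class and method names are hypothetical, and writes are assumed to stay within the fixed size of the data.

```python
class ChangeLog:
    """Toy continuous data protection: every write logs the bytes it replaced."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.log = []  # (offset, old_bytes) for every write, oldest first

    def write(self, offset, new_bytes):
        old = bytes(self.data[offset:offset + len(new_bytes)])
        self.log.append((offset, old))
        self.data[offset:offset + len(new_bytes)] = new_bytes

    def image_at(self, n_writes):
        """Return the data as it was after the first n_writes writes."""
        image = bytearray(self.data)
        for offset, old in reversed(self.log[n_writes:]):
            image[offset:offset + len(old)] = old
        return bytes(image)

disk = ChangeLog(4)
disk.write(0, b"ab")
disk.write(1, b"XY")
```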

Storage media

Regardless of the repository model that is used, the data has to be stored on some data storage medium somewhere.

Magnetic tape

Magnetic tape has long been the most commonly used medium for bulk data storage, backup, archiving, and interchange. Tape has typically had an order of magnitude better capacity/price ratio when compared to hard disk, but recently the ratios for tape and hard disk have become a lot closer.[6] There are myriad formats, many of which are proprietary or specific to certain markets like mainframes or a particular brand of personal computer. Tape is a sequential access medium, so even though access times may be poor, the rate of continuously writing or reading data can actually be very fast. Some new tape drives are even faster than modern hard disks.

Hard disk

The capacity/price ratio of hard disk has been rapidly improving for many years. This is making it more competitive with magnetic tape as a bulk storage medium. The main advantages of hard disk storage are low access times, availability, capacity and ease of use.[7] External disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via longer distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, such as Virtual Tape Libraries, support data de-duplication which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data.

Optical disc

A recordable CD can be used as a backup device. One advantage of CDs is that they can be restored on any machine with a CD-ROM drive. In addition, recordable CDs are relatively cheap. Another common format is recordable DVD. Many optical disk formats are WORM type, which makes them useful for archival purposes since the data can’t be changed. Other rewritable formats can also be utilized, such as CD-RW or DVD-RAM. The newer HD-DVDs and Blu-ray Discs dramatically increase the amount of data possible on a single optical storage disk, though, as yet, the hardware may be cost prohibitive for many people. Additionally, the physical lifetime of the optical disk has become a concern, as it is possible for some optical disks to degrade and lose data within a couple of years.

Floppy disk

During the 1980s and early 1990s, many personal/home computer users associated backup mostly with copying floppy disks. The low data capacity of a floppy disk makes it an unpopular and obsolete choice today.

Solid state storage

Also known as flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Stick, Secure Digital cards, etc., these devices are relatively costly for their low capacity, but offer excellent portability and ease-of-use.

Remote backup service

As broadband internet access becomes more widespread, remote backup services are gaining in popularity. Backing up via the internet to a remote location can protect against some worst-case scenarios such as fires, floods, or earthquakes which would destroy any backups in the immediate vicinity along with everything else. There are, however, a number of drawbacks to remote backup services. First, internet connections (particularly domestic broadband connections) are generally substantially slower than the speed of local data storage devices, which can be a problem for people who generate or modify large amounts of data. Second, users need to trust a third party service provider with both privacy and integrity of backed up data. The risk associated with putting control of personal or sensitive data in the hands of a third party can be managed by encrypting sensitive data so that its contents cannot be viewed without access to the secret key.

Managing the data repository

Regardless of the data repository model or data storage media used for backups, a balance needs to be struck between accessibility, security and cost.

On-line

On-line backup storage is typically the most accessible type of data storage, and can begin a restore within milliseconds. A good example would be an internal hard disk or a disk array (possibly connected to a SAN). This type of storage is very convenient and speedy, but is relatively expensive. On-line storage is vulnerable to being deleted or overwritten, either by accident, or in the wake of a data-deleting virus payload.

Near-line

Near-line storage is typically less accessible and less expensive than on-line storage, but still useful for backup data storage. A good example would be a tape library with restore times ranging from seconds to a few minutes. A mechanical device is usually involved in moving media units from storage into a drive where the data can be read or written.

Off-line

Off-line storage is similar to near-line, except it requires human interaction to make storage media available. This can be as simple as storing backup tapes in a file cabinet. Media access time can be anywhere from a few seconds to more than an hour.

Off-site vault

To protect against a disaster or other site-specific problem, many people choose to send backup media to an off-site vault. The vault can be as simple as the System Administrator’s home office or as sophisticated as a disaster hardened, temperature controlled, high security bunker that has facilities for backup media storage.

Backup site, Disaster Recovery Center or DR Center

In the event of a disaster, the data on backup media will not be sufficient to recover. Computer systems onto which the data can be restored and properly configured networks are necessary too. Some organizations have their own data recovery centers that are equipped for this scenario. Other organizations contract this out to a third-party recovery center. Note that because a DR site is itself a huge investment, backup is very rarely considered the preferred method of moving data to a DR site. A more typical way would be remote disk mirroring, which keeps the DR data as up-to-date as possible.

Selection, extraction and manipulation of data

Selection and extraction of file data

Deciding what to back up at any given time is a harder process than it seems. By backing up too much redundant data, the data repository will fill up too quickly. Backing up an insufficient amount of data can eventually lead to the loss of critical information.

Copying files

Making copies of files is the simplest and most common way to perform a backup. A means to perform this basic function is included in all backup software and all operating systems.

Partial file copying

Instead of copying whole files, one can limit the backup to only the blocks or bytes within a file that have changed in a given period of time. This technique can use substantially less storage space on the backup medium, but requires a high level of sophistication to reconstruct files in a restore situation. Some implementations require integration with the source filesystem.
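
One way to find changed blocks is to compare per-block hashes against those recorded at the previous backup. The sketch below assumes a fixed block size and in-memory data; the names are hypothetical, and it does not handle files that shrink.

```python
import hashlib

BLOCK = 4096  # hypothetical block size in bytes

def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]

def changed_blocks(old_hashes, new_data, block=BLOCK):
    """Return {block_index: block_bytes} for blocks that are new or differ."""
    delta = {}
    for i, h in enumerate(block_hashes(new_data, block)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            delta[i] = new_data[i * block:(i + 1) * block]
    return delta

old = b"a" * 8 + b"b" * 8
new = b"a" * 8 + b"c" * 8 + b"d" * 4
delta = changed_blocks(block_hashes(old, 8), new, 8)
```

Only the second and third blocks are stored here; reconstructing the file later requires the earlier backup plus this delta, which is the added restore complexity the text mentions.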

Filesystem dump

Instead of copying files within a filesystem, a copy of the whole filesystem itself can be made. This is also known as a raw partition backup and is related to disk imaging. The process usually involves unmounting the filesystem and running a program like dump. This type of backup has the possibility of running faster than a backup that simply copies files. A feature of some dump software is the ability to restore specific files from the dump image.

Identification of changes

Some filesystems have an archive bit for each file that says it was recently changed. Some backup software looks at the date of the file and compares it with the last backup, to determine whether the file was changed.
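
The date comparison can be sketched with the Python standard library. The function name is hypothetical, and the time of the last backup is assumed to be stored elsewhere as a Unix timestamp.

```python
import os

def files_changed_since(root, last_backup_time):
    """Yield paths under root modified after last_backup_time (a Unix timestamp)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path
```

Modification times can be reset by other programs or skewed by clock changes, so this check is a heuristic rather than a guarantee.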

Versioning file system

A versioning filesystem keeps track of all changes to a file and makes those changes accessible to the user. Generally this gives access to any previous version, all the way back to the file’s creation time. An example of this is the Wayback versioning filesystem for Linux.

Selection and extraction of live data

If a computer system is in use while it is being backed up, the possibility of files being open for reading or writing is real. If a file is open, the contents on disk may not correctly represent what the owner of the file intends. This is especially true for database files of all kinds. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at any single point in time. This is because the data being backed up changed in the period of time between when the backup started and when it finished. For databases in particular, fuzzy backups are worthless.

Snapshot backup

A snapshot is an instantaneous function of some storage systems that presents a copy of the filesystem as if it were frozen at a specific point in time, often by a copy-on-write mechanism. An effective way to back up live data is to temporarily quiesce it (e.g. close all files), take a snapshot, and then resume live operations. At this point the snapshot can be backed up through normal methods. [10] While a snapshot is very handy for viewing a filesystem as it was at a different point in time, it is hardly an effective backup mechanism by itself.
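
The copy-on-write idea can be shown with a toy volume made of a list of blocks. This is a simplified sketch with hypothetical names, not how any particular storage system implements snapshots.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot maps block index -> preserved original

    def snapshot(self):
        """Taking a snapshot copies nothing; it only starts tracking overwrites."""
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, value):
        # Preserve the old block in every snapshot that has not saved it yet.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = value

    def read_snapshot(self, snap):
        """Preserved blocks come from the snapshot, untouched ones from the live volume."""
        return [snap.get(i, block) for i, block in enumerate(self.blocks)]

vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")
```

Because unchanged blocks are read from the live volume, the snapshot is lost together with the volume, which is why it is not a backup on its own.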

Open file backup

Many backup software packages feature the ability to handle open files in backup operations. Some simply check for openness and try again later. File locking is useful for regulating access to open files.

When attempting to understand the logistics of backing up open files, one must consider that the backup process could take several minutes to back up a large file such as a database. In order to back up a file that is in use, it is vital that the entire backup represent a single-moment snapshot of the file, rather than a simple copy of a read-through. This represents a challenge when backing up a file that is constantly changing. Either the database file must be locked to prevent changes, or a method must be implemented to ensure that the original snapshot is preserved long enough to be copied, all while changes are being preserved. If a file is backed up while it is being changed, so that the first part of the backup represents data from before the change and later parts represent data from after it, the result is a corrupted, unusable file, because most large files contain internal references between their various parts that must remain consistent throughout the file.

Cold database backup

During a cold backup, the database is closed or locked and not available to users. The datafiles do not change during the backup process so the database is in a consistent state when it is returned to normal operation.

Hot database backup

Some database management systems offer a means to generate a backup image of the database while it is online and usable (“hot”). This usually includes an inconsistent image of the data files plus a log of changes made while the procedure is running. Upon a restore, the changes in the log files are reapplied to bring the database back in sync.
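
The combination of an inconsistent image and a change log can be simulated with a dictionary standing in for the database. Everything here is hypothetical: writes are simulated as arriving between copy steps, and replaying the log works because each logged write simply sets a key to a value.

```python
def hot_backup(db, concurrent_writes):
    """Copy db key by key while writes arrive; return (image, change log)."""
    image, log = {}, []
    writes = iter(concurrent_writes)
    for key in sorted(db):
        image[key] = db[key]         # copy one item
        write = next(writes, None)   # simulate a write landing mid-backup
        if write is not None:
            k, v = write
            db[k] = v
            log.append(write)
    return image, log

def restore(image, log):
    """Start from the image and reapply the logged changes in order."""
    db = dict(image)
    for k, v in log:
        db[k] = v
    return db

db = {"a": 1, "b": 2, "c": 3}
image, log = hot_backup(db, [("a", 10), ("c", 30)])
recovered = restore(image, log)
```

In this run the image holds the old value of `a` but the new value of `c`, a state the database was never in, yet replaying the log yields the correct final state.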

Selection and extraction of metadata

Not all information stored on the computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too.

System description

System specifications are needed to procure an exact replacement after a disaster.

Boot sector

It can sometimes be easier to recreate the boot sector than to save it. Still, it usually isn’t a normal file and the system won’t boot without it.

Partition layout

The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.

File metadata

Each file’s permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment.

System metadata

Different operating systems have different ways of storing configuration information. Windows keeps a registry of system information that is more difficult to restore than a typical file.

Manipulation of data

It is frequently useful to manipulate the data being backed up to optimize the backup process. These manipulations can improve backup speed, restore speed, data security, and media usage.

Compression

Various schemes can be employed to shrink the size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.

De-duplication

When multiple similar systems are backed up to the same destination storage device, there exists the potential for much redundancy within the backed up data. For example, if 20 Windows workstations were backed up to the same data repository, they might share a common set of system files. The data repository only needs to store one copy of those files to be able to restore any one of those workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
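
File-level de-duplication can be sketched as a store keyed by content hash. The class below is a hypothetical in-memory illustration; the machine names and file contents are made up.

```python
import hashlib

class DedupStore:
    """Stores each distinct file content once, keyed by its SHA-256 hash."""

    def __init__(self):
        self.blobs = {}     # content hash -> file contents
        self.catalogs = {}  # machine name -> {path: content hash}

    def backup(self, machine, files):
        catalog = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            self.blobs.setdefault(digest, content)  # skip content already stored
            catalog[path] = digest
        self.catalogs[machine] = catalog

    def restore(self, machine):
        return {path: self.blobs[d] for path, d in self.catalogs[machine].items()}

system_files = {"kernel.sys": b"K" * 100}
store = DedupStore()
store.backup("ws1", {**system_files, "doc.txt": b"one"})
store.backup("ws2", {**system_files, "doc.txt": b"two"})
```

Four files were backed up, but only three blobs are stored because the shared system file is kept once.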

Duplication

Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed, or to have a second copy at a different location or on a different storage medium.

Encryption

High capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem, but presents new problems. First, encryption is a CPU intensive process that can slow down backup speeds. Second, once data has been encrypted, it cannot be effectively compressed, so the data compression function of many tape drives is ineffective. For this reason, and since redundant data makes cryptanalytic attacks easier, many encryption implementations compress the data before encrypting it. Third, the security of the encrypted backups is only as effective as the security of the key management policy.
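
The effect of ordering can be seen by compressing repetitive data and random bytes of the same length. In this sketch `os.urandom` is only a stand-in for ciphertext, on the assumption that well-encrypted data looks random; no actual encryption is performed.

```python
import os
import zlib

plain = b"backup data " * 1000
random_like = os.urandom(len(plain))  # stand-in for ciphertext, not real encryption

plain_compressed = len(zlib.compress(plain))
random_compressed = len(zlib.compress(random_like))

print(len(plain), plain_compressed)         # repetitive data shrinks dramatically
print(len(random_like), random_compressed)  # random-looking data does not shrink
```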

Staging

Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. This can be useful if there is a problem matching the speed of the final destination device with the source device as is frequently faced in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.

Managing the backup process

It is important to understand that backup is a process. As long as new data is being created and changes are being made, backups will need to be updated. Individuals and organizations with anything from one computer to thousands (or even millions) of computer systems all have requirements for protecting data. While the scale is different, the objectives and limitations are essentially the same. Likewise, those who perform backups need to know to what extent they were successful, regardless of scale.

Objectives

Recovery Point Objective (RPO)

The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.

Recovery Time Objective (RTO)

The amount of time elapsed between disaster and restoration of business functions.

Data security

In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner’s undertaking. This can be achieved with data encryption and proper media handling policies.

Limitations

An effective backup scheme will take into consideration the limitations of the situation.

Backup window

The period of time when backups are permitted to run on a system is called the backup window. This is typically the time when the system sees the least usage and the backup process will have the least amount of interference with normal operations. The backup window is usually planned with users’ convenience in mind. If a backup extends past the defined backup window, a decision is made whether it is more beneficial to abort the backup or to increase the backup window.
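
Whether a backup is likely to fit its window can be estimated from the data size and the sustained throughput. The figures and function names below are hypothetical, and the estimate ignores start-up overhead and contention from other work.

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Estimated backup duration in hours, taking 1 GB as 1000 MB."""
    return data_gb * 1000 / throughput_mb_per_s / 3600

def fits_window(data_gb, throughput_mb_per_s, window_hours):
    """True if the estimated duration does not exceed the backup window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours

estimate = backup_hours(2000, 100)  # 2000 GB at 100 MB/s
```

Under these assumptions 2000 GB takes roughly five and a half hours at 100 MB/s, which fits a six-hour window, while the same data at 50 MB/s would not.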

Performance impact

All backup schemes have some performance impact on the system being backed up. For example, for the period of time that a computer system is being backed up, the hard drive is busy reading files for the purposes of the backup, and its full bandwidth is no longer available for other tasks. Such impacts should be analyzed.

Costs of hardware, software, labor

All types of storage media have a finite capacity with a real cost. Matching the correct amount of storage capacity (over time) with the backup needs is an important part of the design of a backup scheme. Any backup scheme has some labor requirement, but complicated schemes have considerably higher labor requirements. The cost of commercial backup software can also be considerable.

Network Bandwidth

Distributed backup systems can be impacted by limited network bandwidth.

Implementation

Meeting the defined objectives in the face of the above limitations can be a difficult task. The tools and concepts below can make that task more achievable.

Scheduling

Using a job scheduler can greatly improve the reliability and consistency of backups by removing part of the human element. Many backup software packages include this functionality.

Authentication

Over the course of regular operations, the user accounts and/or system agents that perform the backups need to be authenticated at some level. The power to copy all data off of or onto a system requires unrestricted access. Using an authentication mechanism is a good way to prevent the backup scheme from being used for unauthorized activity.

Chain of trust

Removable storage media are physical items and must only be handled by trusted individuals. Establishing a chain of trusted individuals (and vendors) is critical to defining the security of the data.

Measuring the process

To ensure that the backup scheme is working as expected, the process needs to include monitoring key factors and maintaining historical data.

Backup validation

Backup validation (also known as “Backup Success Validation”) is the process by which owners of data can get information regarding how their data was backed up. This same process is also used to prove compliance to regulatory bodies outside of the organization. For example, an insurance company might be required under HIPAA to show “proof” that its patient data are meeting records retention requirements[16]. Disaster, data complexity, data value and increasing dependence upon ever-growing volumes of data all contribute to the anxiety around and dependence upon successful backups to ensure business continuity. For that reason, many organizations rely on third-party or “independent” solutions to test, validate, and optimize their backup operations (backup reporting).

Reporting

In larger configurations, reports are useful for monitoring media usage, device status, errors, vault coordination and other information about the backup process.

Logging

In addition to the history of computer generated reports, activity and change logs are useful for monitoring backup system events.

Validation

Many backup programs make use of checksums or hashes to validate that the data was accurately copied. These offer several advantages. First, they allow data integrity to be verified without reference to the original file: if the file as stored on the backup medium has the same checksum as the saved value, then it is very probably correct. Second, some backup programs can use checksums to avoid making redundant copies of files, to improve backup speed. This is particularly useful for the de-duplication process.
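
Checking a stored file against a saved checksum can be done with the standard `hashlib` module. The function names are hypothetical, and SHA-256 is chosen here only as an example; the text does not say which algorithm any given backup program uses.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(backup_path, saved_checksum):
    """True if the file on the backup medium still matches the saved checksum."""
    return sha256_of(backup_path) == saved_checksum
```

The saved checksum would normally be recorded in the backup catalog at the time the copy is made, so verification later needs only the backup medium.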

Monitored Backup

Backup processes are monitored by a third party monitoring center. This center alerts users to any errors that occur during automated backups. Monitored backup requires software capable of pinging the monitoring center’s servers in the case of errors.

Lore

Advice

* The more important the data stored on the computer, the greater the need to back it up.
* A backup is only as useful as its associated restore strategy.
* Storing the copy near the original is unwise, since many disasters such as fire, flood and electrical surges are likely to cause damage to the backup at the same time.
* Automated backup and scheduling should be considered, as manual backups can be affected by human error.
* Backups will fail for a wide variety of reasons. A verification or monitoring strategy is an important part of a successful backup plan.
* It is good to store backed up archives in open/standard formats. This helps with recovery in the future when the software used to make the backup is obsolete. It also allows different software to be used.

Events

* In 1996, during a fire at the headquarters of Credit Lyonnais, a major bank in Paris, system administrators ran into the burning building to rescue backup tapes because they didn’t have offsite copies. Crucial bank archives and computer data were lost. [17] [18]
* Privacy Rights Clearinghouse has documented [19] 16 instances of stolen or lost backup tapes (among major organizations) in 2005 & 2006. Affected organizations included Bank of America, Ameritrade, Citigroup, and Time Warner.
* On 3 January 2008, an email server crashed at TeliaSonera, a major Nordic telecom company and internet service provider. It was subsequently discovered that the last serviceable backup set was from 15 December 2007. Three hundred thousand customer email accounts were affected.

source: http://en.wikipedia.org/wiki/Backup