BytesOfProgress

Wiki


Backups & Snapshots

What is the difference?

A backup is a copy of your data kept for long-term storage and for recovery after data loss or system failure. A full system or disk image backup also includes the operating system, whereas a typical data backup does not. Backups take more time and space because they duplicate the data to another location, such as an external hard drive (ideally stored off-site) or the cloud.

A snapshot is an image of your system at a specific moment. It’s used for short-term recovery and testing, allowing you to quickly revert to a previous state. Snapshots are faster to create and take up less disk space because they only record changes made since the last snapshot and are usually stored on the same device as the original data.


Backup

There are different types of backups designed for various needs and scenarios. These include full backups, incremental backups, and differential backups.


Full Backup

A full backup involves copying all the data from a system to a backup storage medium. This includes all files, folders, applications, and optionally, the operating system. The advantage of a full backup is that it’s easy to restore because everything is contained in one backup set. Full backups take a significant amount of time and storage space, which can be an exclusion criterion.


Incremental Backup

An incremental backup only saves the changes made since the last backup, whether it was a full or an incremental backup. This method is faster and requires less storage space compared to full backups because it only includes new or modified data. However, restoring from incremental backups can be more complex and time-consuming, as it involves using the last full backup plus all subsequent incremental backups in the correct sequence.

How to restore incremental backups? First restore the full backup, then restore the incremental backups in the order they were made:

Full ---> incr-01 ---> incr-02 ---> incr-03
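
The restore order above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular backup tool: the Backup record, the "full"/"incr" labels, and the function name incremental_restore_chain are all hypothetical, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    name: str
    kind: str          # "full" or "incr" (hypothetical labels)
    created: datetime


def incremental_restore_chain(backups, target):
    """Return the backups to restore, in order, to reach the state at `target`."""
    # Only backups taken at or before the target time are relevant.
    candidates = sorted(
        (b for b in backups if b.created <= target), key=lambda b: b.created
    )
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    base = fulls[-1]
    # Every incremental taken after the base full backup, oldest first.
    incrementals = [
        b for b in candidates if b.kind == "incr" and b.created > base.created
    ]
    return [base] + incrementals


backups = [
    Backup("Full", "full", datetime(2024, 1, 1)),
    Backup("incr-01", "incr", datetime(2024, 1, 2)),
    Backup("incr-02", "incr", datetime(2024, 1, 3)),
    Backup("incr-03", "incr", datetime(2024, 1, 4)),
]
chain = incremental_restore_chain(backups, datetime(2024, 1, 4))
print(" ---> ".join(b.name for b in chain))
# prints: Full ---> incr-01 ---> incr-02 ---> incr-03
```

If any incremental in the chain is missing or damaged, everything after it cannot be applied reliably, which is why the whole sequence matters.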


Differential Backup

A differential backup saves all changes made since the last full backup. This method strikes a balance between full and incremental backups. It is faster to create than a full backup and simpler to restore than incremental backups, as it only requires the last full backup and the most recent differential backup. Differential backups take more time and space than incremental backups but less than full backups.

How to restore differential backups? First restore the full backup, then the latest differential backup:

Full ---> latest diff
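
For comparison, here is a small Python sketch of how the differential restore set could be selected. The tuple layout, the "full"/"diff" labels, and the function name differential_restore_chain are assumptions made for illustration only.

```python
from datetime import date


def differential_restore_chain(backups, target):
    """Pick the backups needed to restore the state at `target`.

    `backups` is a list of (name, kind, day) tuples, where kind is
    "full" or "diff" (hypothetical labels for this sketch).
    """
    usable = sorted((b for b in backups if b[2] <= target), key=lambda b: b[2])
    fulls = [b for b in usable if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target date")
    base = fulls[-1]
    diffs = [b for b in usable if b[1] == "diff" and b[2] > base[2]]
    # Only the most recent differential is needed, because each one
    # already contains every change since the full backup.
    return [base] + diffs[-1:]


backups = [
    ("Full", "full", date(2024, 1, 1)),
    ("diff-01", "diff", date(2024, 1, 2)),
    ("diff-02", "diff", date(2024, 1, 3)),
    ("diff-03", "diff", date(2024, 1, 4)),
]
chain = differential_restore_chain(backups, date(2024, 1, 4))
print(" ---> ".join(name for name, _, _ in chain))
# prints: Full ---> diff-03
```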


Creating & storing backups the right way

Backups can be stored in various locations, each with its own advantages and disadvantages. External hard drives are portable and easy to use but can be lost or damaged. Network Attached Storage (NAS) offers centralized storage accessible from multiple devices but can be costly and requires network setup. Cloud storage provides scalability and remote access but relies on internet connectivity and may have ongoing costs. Traditional magnetic tape is cost-effective for large volumes of data and suitable for long-term storage, though it has slower read/write speeds and requires tape management.

Effective backup strategies are essential for ensuring data integrity and availability. The Grandfather-Father-Son (GFS) strategy involves rotating backups on a daily, weekly, and monthly basis, providing multiple recovery points and historical versions. The 3-2-1 rule recommends keeping three copies of your data on two different types of media, with one copy stored offsite. Scheduling backups during low activity periods, known as backup windows, helps minimize impact on system performance.
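
To make the GFS rotation more concrete, the following Python sketch assigns each backup date to a tier. The convention used here is only one possible choice and is an assumption of this example: backups taken on the first day of a month count as monthly ("grandfather"), backups taken on a Sunday count as weekly ("father"), and all others count as daily ("son"). Real schedules often pick different days.

```python
from datetime import date


def gfs_tier(day):
    """Classify a backup date as a GFS tier (assumed convention, see above)."""
    if day.day == 1:
        return "grandfather"   # monthly backup, kept longest
    if day.weekday() == 6:     # Monday is 0, so 6 is Sunday
        return "father"        # weekly backup
    return "son"               # daily backup, rotated most often


for day in (date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 7)):
    print(day.isoformat(), gfs_tier(day))
```

Each tier would then get its own retention period, for example keeping daily backups for a week, weekly backups for a month, and monthly backups for a year.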

Data corruption during backups can occur if data is being written to or modified while the backup is running. This happens because the backup process might capture data in an inconsistent state, leading to errors or unusable backups. For instance, if a file is being updated during the backup, the backup may only capture parts of the file, resulting in corruption. To prevent this, many backup systems use techniques like snapshots or file system freezing, which temporarily pause write operations to ensure a consistent state. Additionally, scheduling backups during periods of low activity and using backup software that handles open files properly can help minimize the risk of data corruption.
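
Where snapshots or file system freezing are not available, a backup script can at least detect that a file changed while it was being copied. The Python sketch below is a simplified illustration of that idea, using the hypothetical helper copy_if_stable: it compares the file's size and modification time before and after the copy and retries if they differ. It is not a substitute for a consistent snapshot, and it cannot notice changes that leave both values untouched.

```python
import os
import shutil
import tempfile


def copy_if_stable(src, dst, attempts=3):
    """Copy src to dst, retrying if src changed while it was being read."""
    for _ in range(attempts):
        before = os.stat(src)
        shutil.copyfile(src, dst)
        after = os.stat(src)
        unchanged = (before.st_mtime_ns, before.st_size) == (
            after.st_mtime_ns,
            after.st_size,
        )
        if unchanged:
            return True
    # The file kept changing; the copy in dst may be inconsistent.
    return False


# Small demonstration with a temporary file that nobody else modifies.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "data.txt")
    target = os.path.join(workdir, "data.txt.bak")
    with open(source, "w") as handle:
        handle.write("important data\n")
    stable = copy_if_stable(source, target)
    with open(target) as handle:
        copied = handle.read()

print(stable, repr(copied))
```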


Snapshot

A snapshot is a point-in-time image of a system or data set, capturing its state at a specific moment. Unlike a full backup, which creates a complete copy of the data, a snapshot only records the state of the data and the changes made since the last snapshot. This makes snapshots an efficient tool for quickly reverting to a previous state, testing, or short-term recovery.

When the first snapshot is taken, it captures the state of all data at that moment, serving as the baseline. After this initial snapshot, subsequent snapshots only record changes made to the data since the previous snapshot. These changes can include modifications, deletions, or additions.

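Because snapshots only store changes, they are much more storage-efficient than full backups. They use a technique called "copy-on-write" or "redirect-on-write." With copy-on-write, when data changes, the system first copies the original data to the snapshot storage and then writes the new data to the original location. With redirect-on-write, the new data is written to a new location instead, and the original data stays where it is for the snapshot to reference.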
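
The copy-on-write idea can be illustrated with a toy model in Python. This is a simplified sketch, not how any real file system or hypervisor implements it: the class CowVolume, its methods, and the in-memory block list are all invented for this example.

```python
class CowVolume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data
        self.snapshots = []          # one dict per snapshot: {block_index: old_value}

    def snapshot(self):
        # A new snapshot starts empty: nothing is copied until a block changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, value):
        # Copy-on-write: save the original block in the newest snapshot
        # (only the first time it changes), then overwrite it in place.
        if self.snapshots and index not in self.snapshots[-1]:
            self.snapshots[-1][index] = self.blocks[index]
        self.blocks[index] = value

    def read_snapshot(self, snap_id):
        # Start from the live data and undo changes, newest snapshot first,
        # back to the requested snapshot.
        view = list(self.blocks)
        for saved in reversed(self.snapshots[snap_id:]):
            for index, old_value in saved.items():
                view[index] = old_value
        return view


vol = CowVolume(["a", "b", "c"])
s0 = vol.snapshot()
vol.write(1, "B")
s1 = vol.snapshot()
vol.write(1, "X")
vol.write(2, "C")

print(vol.blocks)             # ['a', 'X', 'C']
print(vol.read_snapshot(s0))  # ['a', 'b', 'c']
print(vol.read_snapshot(s1))  # ['a', 'B', 'c']
```

Note that the snapshots hold only the blocks that changed, and that reading a snapshot still relies on the live blocks for everything else. That dependency is why a snapshot cannot replace a backup.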

Snapshots depend on the original data, meaning if the underlying data storage fails, snapshots alone cannot recover the lost data because they are not full copies. Snapshots are not designed for long-term storage or disaster recovery and are best used in addition to traditional backups to ensure data protection. Over time, the accumulated changes can fill up storage space, so old snapshots should be cleaned up regularly to keep storage use in check.

In dynamic environments, scheduling snapshots frequently—such as hourly or several times a day—ensures up-to-date recovery points. Implementing a retention policy to manage the number of snapshots retained and regularly deleting old snapshots helps free up storage space and reduce complexity.
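
A retention policy like this can be expressed as a small function. The Python sketch below assumes a simple rule chosen for illustration: keep every snapshot younger than max_age, and always keep the newest min_keep snapshots regardless of age. The function name apply_retention and both parameters are hypothetical, and real snapshot tools offer their own policy options.

```python
from datetime import datetime, timedelta


def apply_retention(snapshot_times, now, max_age, min_keep):
    """Split snapshot timestamps into (keep, delete) lists."""
    newest_first = sorted(snapshot_times, reverse=True)
    keep, delete = [], []
    for position, taken in enumerate(newest_first):
        # Keep the newest `min_keep` snapshots and anything recent enough.
        if position < min_keep or now - taken <= max_age:
            keep.append(taken)
        else:
            delete.append(taken)
    return keep, delete


now = datetime(2024, 1, 3, 12, 0)
# Hourly snapshots covering the last 48 hours.
hourly = [now - timedelta(hours=h) for h in range(48)]

keep, delete = apply_retention(
    hourly, now, max_age=timedelta(hours=24), min_keep=3
)
print(len(keep), "kept,", len(delete), "to delete")
```

The min_keep floor guards against deleting every recovery point if snapshot creation has silently stopped for longer than max_age.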

Snapshots are widely used in various scenarios. In hypervisors, they capture the state of virtual machines, enabling quick rollback and testing. Databases use snapshots to capture consistent states for replication and testing without interrupting operations. Developers use snapshots to test new features or configurations, allowing them to revert to a known good state if something goes wrong. Snapshots are also useful for system updates and patching, as they provide a quick way to revert if an update causes issues.



