Backup Retention Process

This section describes the Trilio retention process for backup images.

Deprecated Documentation

This document is deprecated and no longer supported. For accurate, up-to-date information, please refer to the documentation for the latest version of Trilio.

A Kubernetes application may have multiple PVs, but the Trilio retention policy works at the PV level. This document discusses Trilio's implementation of the retention policy. Before diving into the internals of the retention process, keep a few things in mind about Trilio backup images:

  1. All backup images are QCOW2 images. Full backups are base images and incremental backups are overlay files. Each overlay file holds a backing-file reference to the previous backup. The number of overlay files depends on the number of incremental backups, and the newest overlay file represents the latest backup.

  2. Trilio supports forever-incremental backups. Incremental backups are efficient in both network bandwidth and data storage. Once a full backup is taken, the user does not need to take another full backup, which improves overall backup efficiency.

  3. A synthetic full backup is a full backup image created at the backup target by combining one or more existing backup images. This avoids the need to take new full backups and thereby improves backup performance. Trilio supports synthetic full backups.
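The relationship between a base image, its overlays, and a synthetic full backup can be illustrated with a toy model. This is only a sketch: it assumes an image can be represented as a mapping from block number to block contents, and the names `Image` and `synthesize_full` are hypothetical, not part of Trilio or QCOW2 tooling.

```python
from typing import Dict, List

# Toy model (assumption): an image maps block numbers to block contents.
# A full backup holds every block; an incremental overlay holds only the
# blocks that changed since the previous backup.
Image = Dict[int, bytes]


def synthesize_full(chain: List[Image]) -> Image:
    """Merge a chain (full backup first, newest overlay last) into one image."""
    merged: Image = {}
    for image in chain:
        # Later overlays win, mirroring how an overlay shadows its backing file.
        merged.update(image)
    return merged


full_day1 = {0: b"A", 1: b"B", 2: b"C"}
incr_day2 = {1: b"B2"}  # only block 1 changed on day 2
incr_day3 = {2: b"C3"}  # only block 2 changed on day 3

synthetic_full = synthesize_full([full_day1, incr_day2, incr_day3])
print(synthetic_full)  # {0: b'A', 1: b'B2', 2: b'C3'}
```

The merged result is equivalent to a full backup taken on day 3, yet no new full backup had to be transferred from the application.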

PV Retention Policy

The following section describes the retention policy per PV, using two scenarios.

Backups to Retain: 3, Forever Incremental

Assume that the number of backups to retain is three (3) and the backup policy is set to "forever incremental". Four days of backups look as follows:

[Figures not available: state of the PV backup chain on Day 1, Day 2, Day 3, and Day 4]
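The four days of this scenario can be approximated with a small simulation. It is a sketch under stated assumptions: once the chain grows beyond the retention count, the oldest image is merged into its successor, which then becomes the new full image. The names `PvBackup` and `take_backup` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PvBackup:
    day: int
    kind: str  # "full" or "incremental"


def take_backup(chain: List[PvBackup], day: int, retain: int) -> List[PvBackup]:
    """Append one backup to a forever-incremental chain, then apply retention."""
    kind = "full" if not chain else "incremental"
    chain = chain + [PvBackup(day, kind)]
    while len(chain) > retain:
        # Assumption: the oldest image is merged into its successor, which
        # becomes the new full (base) image of the chain.
        successor = chain[1]
        chain = [PvBackup(successor.day, "full")] + chain[2:]
    return chain


chain: List[PvBackup] = []
for day in range(1, 5):
    chain = take_backup(chain, day, retain=3)
    print(f"Day {day}:", [(b.day, b.kind) for b in chain])
```

On day 4 the chain holds days 2, 3, and 4, with day 2 promoted to a full image.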

Backups to Retain: 3, Full Backup After Every Two Backups

[Figures not available: state of the PV backup chain on Day 1, Day 2, Day 3, Day 4, and Day 5]

In the two scenarios above, the retention policy is applied on a per-PV basis. For a complex application that has many more PVs, some more interesting scenarios can arise.

Whether to perform a full or an incremental backup is first determined by the backup policy chosen when the backup job is created. The first backup is always full, and the type of each subsequent backup depends on the backup policy. If a user chooses a full backup after every three backups, the backup types look like:

F ← I ← I F ← I ← I F ← I ← I

Where F stands for full and I stands for incremental.
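The pattern above can be reproduced with a few lines of code. This sketch assumes that "a full backup after every three backups" means every third backup, counted from the first, is full. The function names are hypothetical.

```python
from typing import List


def scheduled_backup_type(index: int, full_interval: int) -> str:
    """Return "F" or "I" for the backup at zero-based position `index`."""
    return "F" if index % full_interval == 0 else "I"


def render_schedule(count: int, full_interval: int) -> str:
    """Render backup chains in the notation used above."""
    groups: List[List[str]] = []
    for index in range(count):
        kind = scheduled_backup_type(index, full_interval)
        if kind == "F":
            groups.append(["F"])      # a full backup starts a new chain
        else:
            groups[-1].append("I")    # an incremental extends the current chain
    return " ".join(" ← ".join(group) for group in groups)


print(render_schedule(9, 3))  # F ← I ← I F ← I ← I F ← I ← I
```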

However, if a new PV is added to an application between two backups, the next backup must be a full backup for the newly added PV and an incremental backup for all existing PVs. The retention policy for each PV still follows the algorithm above.

Whether to perform a full or an incremental backup also depends on whether a CSI snapshot exists for the given PV. For a newly added PV, Trilio creates the first CSI snapshot, which ensures a full backup. The same check applies if a user deletes the Trilio-generated CSI snapshot for a given PV: in that case the next backup of the PV must be a full backup.
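The per-PV decision described above can be summarized as a small function. This is a sketch only: `choose_pv_backup_type` and its parameters are hypothetical names, and the absence of a previous CSI snapshot is modeled as `None`.

```python
from typing import Optional


def choose_pv_backup_type(policy_type: str, previous_csi_snapshot: Optional[str]) -> str:
    """Decide the backup type for one PV.

    policy_type: "full" or "incremental", as dictated by the backup policy.
    previous_csi_snapshot: name of the PV's last CSI snapshot, or None when
        the PV is new or the snapshot was deleted by a user.
    """
    if previous_csi_snapshot is None:
        # Without a reference snapshot there is nothing to diff against.
        return "full"
    return policy_type


# Existing PV with a snapshot follows the policy.
print(choose_pv_backup_type("incremental", "snap-pv1-day3"))  # incremental
# Newly added PV, or a PV whose snapshot was deleted, gets a full backup.
print(choose_pv_backup_type("incremental", None))  # full
```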

Retention Policy Pseudo Code

In this discussion, a backup is the application backup that Trilio takes, and a PV snapshot is what CSI creates. We also assume that the backup CRD has the following fields. In practice it may have many more, but the retention algorithm relies only on these.

TrilioBackup:
  name:  # name of the backup
  size:  # size of the backup in bytes, including the backup size of all PVs. Only editable by Trilio
  backup_type:  # full or incremental. Only editable by Trilio
  pvs:  # list of PVs
    pv1:  # PV1
      type:  # type of backup, full or incremental. Only editable by Trilio
      size:  # size of the PV backup. Only editable by Trilio
      backup_location:  # location of the backup file
      pv_snapshot:  # CSI snapshot of the PV that corresponds to this backup
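For readers who prefer code to YAML, the same fields can be mirrored in a pair of data classes. This is an illustrative model, not the actual CRD schema: the class names, the example paths, and the choice to compute the application-level size as the sum of the PV sizes are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PvBackupRecord:
    type: str             # "full" or "incremental"
    size: int             # size of this PV backup in bytes
    backup_location: str  # location of the backup file
    pv_snapshot: str      # CSI snapshot that corresponds to this backup


@dataclass
class TrilioBackupRecord:
    name: str
    backup_type: str      # "full" or "incremental"
    pvs: Dict[str, PvBackupRecord] = field(default_factory=dict)

    @property
    def size(self) -> int:
        """Total size in bytes across all PV backups (assumed to be a sum)."""
        return sum(pv.size for pv in self.pvs.values())


# Hypothetical example values.
backup = TrilioBackupRecord(
    name="day2",
    backup_type="incremental",
    pvs={
        "pv1": PvBackupRecord("incremental", 512, "/target/day2/pv1.qcow2", "snap-pv1-day2"),
        "pv2": PvBackupRecord("full", 1024, "/target/day2/pv2.qcow2", "snap-pv2-day2"),
    },
)
print(backup.size)  # 1536
```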

Assume that the latest backup is day4 and the number of backups to retain is three (3). Iterate through the backups and identify those that fall beyond the three most recent. Backups in an error state do not count toward the number of backups to retain. The oldest backup that is kept is backup_to_retain. Any backup older than backup_to_retain should be merged into backup_to_retain; let this list be backups_to_merge, in time-sorted order. Backups in the list are merged with the top of the list.

Assume that there are five (5) backups and the backup retention policy is set to three (3). The following diagram represents backup_to_retain and backups_to_merge.
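The selection step can be sketched as follows. The `Backup` type, the `status` values, and the choice to order `backups_to_merge` newest first (so that the head of the list is adjacent to `backup_to_retain`) are assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Backup(NamedTuple):
    name: str
    status: str  # "available" or "error" (hypothetical status values)


def split_for_retention(
    backups: List[Backup], retain: int
) -> Tuple[Optional[Backup], List[Backup]]:
    """Return (backup_to_retain, backups_to_merge) for backups ordered oldest first."""
    if retain < 1:
        raise ValueError("retain must be at least 1")
    # Errored backups do not count toward the retention number.
    good = [b for b in backups if b.status != "error"]
    if len(good) <= retain:
        return None, []  # nothing to merge yet
    backup_to_retain = good[-retain]
    # Assumption: newest first, so the list head sits next to backup_to_retain.
    backups_to_merge = list(reversed(good[:-retain]))
    return backup_to_retain, backups_to_merge


five = [Backup(f"day{d}", "available") for d in range(1, 6)]
keep, merge = split_for_retention(five, retain=3)
print(keep.name, [b.name for b in merge])  # day3 ['day2', 'day1']
```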

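# If backup_to_retain is already full, every older backup can simply be deleted.
delete_backup = backup_to_retain is full

backups_to_commit = [backup_to_retain]

for backup in backups_to_merge:
    if delete_backup:
        delete backup
    else:
        if backup is full:
            # Nearest full backup found; anything older is no longer needed.
            delete_backup = True
        backups_to_commit.append(backup)

# Fold each overlay into its backing file.
for backup in backups_to_commit:
    for pv in backup:
        qemu-img commit pv_disk_image

# The committed full image takes over the name of the retained backup's image.
for pv in backup_to_retain:
    for backup in backups_to_commit:
        for pv1 in backup:
            if pv1.id is not pv.id:
                continue
            if pv1 is full backup:
                rename pv1 disk image to pv disk image
                break

mark backup_to_retain as full backup
update the backup size to full backup size

for pv in backup_to_retain:
    mark pv backup image as full
    update pv size to full backup image size

# backup_to_retain itself stays; only the merged backups are removed.
for backup in backups_to_commit:
    if backup is backup_to_retain:
        continue
    delete backup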
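The planning part of the pseudocode above can be expressed as runnable code. This is one reading of it, not Trilio's implementation: it assumes that `backups_to_merge` is ordered newest first and that every backup up to and including the nearest full backup joins the commit list. The `Backup` type and `plan_retention` are hypothetical names, and no images are touched.

```python
from typing import List, NamedTuple, Tuple


class Backup(NamedTuple):
    name: str
    is_full: bool


def plan_retention(
    backup_to_retain: Backup, backups_to_merge: List[Backup]
) -> Tuple[List[Backup], List[Backup]]:
    """Return (backups_to_commit, backups_to_delete) without changing any image."""
    delete_backup = backup_to_retain.is_full
    to_commit = [backup_to_retain]
    to_delete: List[Backup] = []
    for backup in backups_to_merge:
        if delete_backup:
            # Older than the nearest full backup: not needed by the chain.
            to_delete.append(backup)
        else:
            to_commit.append(backup)
            if backup.is_full:
                delete_backup = True
    return to_commit, to_delete


retain = Backup("day5", is_full=False)
older = [
    Backup("day4", True),
    Backup("day3", False),
    Backup("day2", False),
    Backup("day1", True),
]
commit, delete = plan_retention(retain, older)
print([b.name for b in commit])  # ['day5', 'day4']
print([b.name for b in delete])  # ['day3', 'day2', 'day1']
```

Separating the plan from the image operations makes the decision logic easy to check before any commit or delete is issued against the backup target.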

Delete a Backup

Deleting backups can be a bit tricky. When a backup is deleted, Trilio generally does not delete the corresponding images on the backup media. Instead, the retention algorithm consolidates the backup images the next time the retention policy is applied.

Consider the following scenario:

The above application has three PVs, added at different times: PV1 has four (4) backups, PV2 has three (3) backups, and PV3 has two (2) backups.

Now suppose a user decides to delete the day1 backup. Trilio marks the corresponding backup object as deleted but does not delete the underlying backup image, because deleting the day1 image of PV1 would break the chain and make the day2 backup unusable. The chain is therefore left unchanged during the delete operation. The next time the retention algorithm runs, with the retention policy set to four (4) for example, it commits the day2 overlay file into the day1 image and then renames the day1 image to day2.

If a user chooses to delete the day4 backup, which is the latest one and has no overlay depending on it, Trilio deletes its backup images from the backup media.
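The two delete cases can be modeled as follows. This is a simplified sketch for a single PV chain: `ChainEntry` and `delete_backup` are hypothetical names, and a "deleted" flag stands in for marking the Trilio backup object as deleted while its image stays on the backup media.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ChainEntry:
    name: str
    deleted: bool = False  # marked deleted, image still on the backup media


def delete_backup(chain: List[ChainEntry], name: str) -> List[ChainEntry]:
    """Return the chain (oldest first) after a user deletes backup `name`."""
    index = [entry.name for entry in chain].index(name)  # ValueError if unknown
    if index == len(chain) - 1:
        # Latest backup: no overlay depends on it, so the image can be removed.
        return chain[:-1]
    # Older backup: later overlays still need this image, so only mark it.
    marked = replace(chain[index], deleted=True)
    return chain[:index] + [marked] + chain[index + 1:]


pv1_chain = [ChainEntry(f"day{d}") for d in range(1, 5)]
after_day1 = delete_backup(pv1_chain, "day1")
after_day4 = delete_backup(after_day1, "day4")
print([(e.name, e.deleted) for e in after_day4])
# [('day1', True), ('day2', False), ('day3', False)]
```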

Errored Backups

If a backup of the application fails, its backup images should not appear in the backup image chain of any PV. A backup may fail for various reasons and at various points in the backup process. In the above example, the backup of one PV failed to upload its data to the backup media. In that case, none of the PV backup chains should contain any backup image from this backup. Furthermore, the CSI snapshot of each PV from the last known good backup should be preserved, and any CSI snapshots created for the failed backup job should be cleaned up (deleted). When the next backup runs, incremental backups are generated relative to the PV CSI snapshots of the last known good backup job.
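The cleanup rule can be sketched as a small planning function. The names and the dictionary layout (PV name mapped to CSI snapshot name) are assumptions for illustration; the function only computes what to delete and what to keep as the base for the next incremental backup.

```python
from typing import Dict, List, NamedTuple


class CleanupPlan(NamedTuple):
    snapshots_to_delete: List[str]    # CSI snapshots created by the failed job
    incremental_base: Dict[str, str]  # PV name -> last known good CSI snapshot


def plan_failed_backup_cleanup(
    last_good_snapshots: Dict[str, str],
    failed_job_snapshots: Dict[str, str],
) -> CleanupPlan:
    """Plan the cleanup after a failed application backup."""
    preserved = set(last_good_snapshots.values())
    # Never delete a snapshot that belongs to the last known good backup.
    to_delete = sorted(
        snap for snap in failed_job_snapshots.values() if snap not in preserved
    )
    return CleanupPlan(to_delete, dict(last_good_snapshots))


plan = plan_failed_backup_cleanup(
    last_good_snapshots={"pv1": "snap-pv1-day3", "pv2": "snap-pv2-day3"},
    failed_job_snapshots={"pv1": "snap-pv1-day4", "pv2": "snap-pv2-day4"},
)
print(plan.snapshots_to_delete)  # ['snap-pv1-day4', 'snap-pv2-day4']
print(plan.incremental_base)     # {'pv1': 'snap-pv1-day3', 'pv2': 'snap-pv2-day3'}
```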
