Backup and Restore Details

This section describes how TrilioVault for Kubernetes handles the backup and restore processes.

Backup Details

The following sections describe the overall backup process and how metadata and data objects are handled.

High-level Backup Process

  1. Backup Controller

    1. Reconciles on Backup CRD

    2. Spawns Metamover job

      1. Identifies data components (persistent volumes) to backup

      2. Snapshots metadata

      3. Uploads metadata to target

    3. Creates Data snapshot of each Persistent Volume

    4. Creates PV from snapshot

    5. Spawns Datamover pod

      1. PV attached to Datamover pod

      2. Converts PV data to QCOW2 image

      3. Calculates delta between backups

      4. Uploads delta to target

      5. PV detached and deleted
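The "calculate delta between backups" step above can be pictured as a block-level comparison. The following is a minimal sketch of that idea only: the source does not describe how TrilioVault computes the delta, and the block size, function names, and sample data are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of the volume data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new contents for each block that differs from the previous backup."""
    old = block_digests(previous, block_size)
    delta = {}
    for index, digest in enumerate(block_digests(current, block_size)):
        # A block belongs to the delta if it is new or its digest changed.
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            delta[index] = current[start:start + block_size]
    return delta


previous_backup = b"AAAABBBBCCCC"     # hypothetical volume contents at the last backup
current_volume = b"AAAAXXXXCCCCDDDD"  # hypothetical contents now
delta = changed_blocks(previous_backup, current_volume)
print(delta)  # {1: b'XXXX', 3: b'DDDD'}
```

Only blocks 1 and 3 would be uploaded in this model; the unchanged blocks stay in the earlier backup.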

Full Backup Process - Animation

Full Backup Animation

Incremental Backup Process - Animation

Incremental Backup Animation

Container Storage Interface (CSI)

TrilioVault relies on CSI snapshot functionality to capture a point-in-time copy of the volume data. A CSI snapshot creates a volume snapshot on the storage back end. These snapshots are internal to the storage back end and cannot be accessed directly from the Kubernetes cluster; reading from or writing to a CSI snapshot requires a volume. CSI supports creating a volume from a snapshot, and TrilioVault converts the data on that snapshot volume to a QCOW2 image.
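As an illustration of the two CSI constructs described above, the sketch below builds a `VolumeSnapshot` manifest and a `PersistentVolumeClaim` that uses the snapshot as its `dataSource`, using the standard Kubernetes `snapshot.storage.k8s.io/v1` API. TrilioVault creates the equivalent objects itself during a backup; every name, namespace, class, and size here is a hypothetical placeholder, not something TrilioVault generates.

```python
import json


def volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Manifest for a CSI snapshot of an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def pvc_from_snapshot(name, namespace, snapshot_name, storage_class, size):
    """Manifest for a new PVC whose contents are populated from a snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# All identifiers below are hypothetical examples.
snapshot = volume_snapshot("mysql-data-snap", "demo", "mysql-data", "csi-snapclass")
restored_pvc = pvc_from_snapshot("mysql-data-from-snap", "demo",
                                 "mysql-data-snap", "csi-storageclass", "10Gi")

# kubectl accepts JSON as well as YAML, so this output could be piped to `kubectl apply -f -`.
print(json.dumps([snapshot, restored_pvc], indent=2))
```

The PVC is the "volume construct" that makes the snapshot's data readable from inside the cluster.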

Applications Backup - Metadata and Data

TrilioVault's unit of backup is one or more Kubernetes applications. A TrilioVault backup job can target a Helm release, an Operator instance, label-based selectors, or any combination of these. The TrilioVault backup process parses each application's metadata and discovers the persistent volumes defined for each application. Backing up application metadata is straightforward: it involves copying the application YAML files to the backup media. However, persistent volumes require special handling for the following reasons:

  1. Persistent volumes are actively accessed by applications, and their data is continuously changing

  2. Persistent volumes can be sparsely written. A 1 TB volume may hold only 10 GB of application data

  3. Persistent volumes can be large, and the changes between two backups can be very small compared to the size of the PV
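The point about sparsely written volumes can be made concrete with a toy model. This sketch scans a mostly empty "volume" and keeps only the blocks that contain data. It only illustrates why a sparse-aware backup is smaller than the volume; the source does not say how TrilioVault detects unwritten regions, and the sizes and names are hypothetical.

```python
def allocated_blocks(volume: bytes, block_size: int) -> dict[int, bytes]:
    """Return block index -> contents for every block that is not entirely zero."""
    result = {}
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        # bytes(n) is n zero bytes; an all-zero block is treated as unwritten.
        if block != bytes(len(block)):
            result[offset // block_size] = block
    return result


# A hypothetical 64-byte volume with two small regions of application data.
volume = bytearray(64)
volume[8:12] = b"data"
volume[40:44] = b"more"

blocks = allocated_blocks(bytes(volume), block_size=8)
stored = sum(len(block) for block in blocks.values())
print(sorted(blocks), stored, len(volume))  # [1, 5] 16 64
```

Here 16 of 64 bytes would need to be stored, the same ratio argument as a 1 TB volume that holds 10 GB of data.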

Any backup solution must back up data from persistent volumes efficiently, without impacting the performance and scale of Kubernetes clusters. Trilio's approach has been proven in other cloud environments, including OpenStack and Red Hat Virtualization (RHV). It uses the CSI snapshot feature to capture point-in-time copies of data and the QCOW2 image format to store backup images. The following diagram describes TrilioVault's backup processes in detail.

Backup processes

Backup Image Format

TrilioVault backup images are QCOW2 images. QCOW2 images have the following properties that make them ideal for storing backup data of persistent volumes:

  1. QCOW2 images are sparse-friendly. Even if the volume size is 1 TB and the actual data is 10 GB, the backup image of the persistent volume is only 10 GB

  2. QCOW2 images can be linked together. The bottom image is called the "base image" and all other images are called "overlay files". Overlay files usually hold only changed data, with the latest data in the topmost overlay. However, each overlay file can be accessed as a full volume with the data as of that moment.

qemu-img is a Linux tool for managing QCOW2 images. TrilioVault uses a modified qemu-img to generate QCOW2 images. The QCOW2 image of a full backup is the base image. Subsequent backups are incremental and are stored as overlay files, each of which points to the previous backup.
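The relationship between a base image and its overlays can be modeled in a few lines. The sketch below is a simplified, in-memory stand-in for a backing-file chain, not the QCOW2 on-disk format. The `Image` class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    """Toy model of a backup image: it stores only the blocks written in this backup."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)
    backing: Optional["Image"] = None  # the previous backup, or None for the base image

    def read_block(self, index: int) -> Optional[bytes]:
        """Return the newest copy of a block by walking down the chain."""
        image = self
        while image is not None:
            if index in image.blocks:
                return image.blocks[index]
            image = image.backing
        return None  # never written in any backup


# Full backup (base image) followed by two incremental backups (overlays).
full = Image("full", {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"})
incr1 = Image("incr1", {1: b"XXXX"}, backing=full)
incr2 = Image("incr2", {3: b"DDDD"}, backing=incr1)

print([incr2.read_block(i) for i in range(4)])  # [b'AAAA', b'XXXX', b'CCCC', b'DDDD']
print([incr1.read_block(i) for i in range(4)])  # [b'AAAA', b'XXXX', b'CCCC', None]
```

Each overlay stores one block, yet reading through it yields the full volume as of that backup, which is the property the text describes.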

Restore Process

High-level Restore Process

  1. Restore Controller

    1. Reconciles on Restore CRD

    2. Validates whether the restore can be performed

    3. Creates PVs

    4. Spawns Data Mover job

      1. Converts QCOW2 to PV data (directly from backup image, no staging)

    5. Spawns meta processor job

      1. Restores metadata from backup images

Restore Process - Animation

Restore Process Animation

Restore Operation

TrilioVault's restore process involves recreating the application artifacts from the backup images. These artifacts include Pods, PVs, ConfigMaps, Secrets, and others. Once the application is restored, TrilioVault spawns data mover Pods to copy data from the backup media to the restored application's PVs.

Each QCOW2 image, whether an overlay file or a base image, is a fully formed image. Even if an overlay file contains only the changes made since the previous backup, the qemu-img convert command traverses the backup chain and "hydrates" the entire volume contents to the PV. No staging area is required; the data goes directly from the backup media to the PV.
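For orientation, the sketch below shows what a `qemu-img convert` invocation of this kind looks like with the stock qemu-img tool. TrilioVault uses a modified qemu-img, so its actual command line may differ. The image name and target device path are hypothetical, and the helper functions are illustrative only.

```python
import shutil
import subprocess


def build_convert_command(backup_image: str, target: str) -> list[str]:
    """Build a stock `qemu-img convert` command that writes raw volume data to the target.

    -f qcow2 : format of the source backup image
    -O raw   : write plain volume data, as needed on a PV
    qemu-img follows the backing-file references recorded in the overlay,
    so the whole chain must be reachable from the backup media.
    """
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", backup_image, target]


def hydrate(backup_image: str, target: str) -> bool:
    """Run the conversion if qemu-img is installed; return False otherwise."""
    if shutil.which("qemu-img") is None:
        return False
    subprocess.run(build_convert_command(backup_image, target), check=True)
    return True


# Hypothetical paths: the latest incremental backup and the block device of the restored PV.
command = build_convert_command("incremental-2.qcow2", "/dev/xvdb")
print(" ".join(command))
```

Pointing the command at the latest overlay is enough; the base image and earlier overlays are read through the chain rather than being copied to a staging area first.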

Restore Process