5.0.X

Backup and Restore Details

This section describes how Trilio for Kubernetes (T4K) handles backup and restore operations.

Backup Details

The following sections describe the overall backup process and how metadata and data objects are handled.

High-Level Backup Process

  1. Backup Controller

    1. Reconciles on Backup CRD

    2. Spawns Metamover job

      1. Identifies data components (persistent volumes) to backup

      2. Snapshots metadata

      3. Uploads metadata to target

      4. Uploads container images to the backup target

    3. Executes application hooks to quiesce the application. Hooks can run in parallel or sequential mode.

    4. Once the pre-hooks have run, data snapshots are triggered in parallel, and data uploads also run in parallel. Hooks are the only mechanism provided for ensuring application data consistency.

    5. Creates PV(s) from snapshot(s)

    6. Spawns Datamover pod(s)

      1. PV attached to Datamover pod

      2. Converts PV data to a QCOW2 image

      3. Calculates the delta between backups

      4. Uploads delta to target

      5. PV detached and deleted
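The ordering of the backup steps above can be sketched in a few lines. This is a hypothetical outline, not T4K code: `run_hooks`, `take_snapshot`, `run_datamover`, and `backup` are illustrative names, and the string results stand in for real Kubernetes operations. It shows the key constraint: metadata first, then pre-hooks (sequential or parallel), then snapshots and uploads fanned out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Hook = Callable[[], str]


def run_hooks(hooks: List[Hook], parallel: bool) -> List[str]:
    """Run quiesce (pre-)hooks either one after another or concurrently."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda hook: hook(), hooks))
    return [hook() for hook in hooks]


def take_snapshot(pvc: str) -> str:
    """Hypothetical stand-in for creating a CSI VolumeSnapshot of one PVC."""
    return f"snap-{pvc}"


def run_datamover(snapshot: str) -> str:
    """Hypothetical stand-in for: PV from snapshot -> QCOW2 -> delta -> upload -> PV deleted."""
    return f"uploaded {snapshot}.qcow2"


def backup(pvcs: List[str], hooks: List[Hook], parallel_hooks: bool) -> List[str]:
    events: List[str] = ["metadata and images uploaded"]  # metamover job
    events += run_hooks(hooks, parallel_hooks)            # quiesce before snapshotting
    with ThreadPoolExecutor() as pool:
        snapshots = list(pool.map(take_snapshot, pvcs))     # snapshots in parallel
        events += list(pool.map(run_datamover, snapshots))  # uploads in parallel
    return events
```

`ThreadPoolExecutor.map` returns results in input order, so the event list is deterministic even though the work runs concurrently.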

Container Storage Interface (CSI)

Trilio relies on CSI snapshot functionality to capture a point-in-time copy of the volume data. CSI snapshots generate storage back-end volume snapshots. These snapshots are internal to the storage back end and cannot be accessed directly from the Kubernetes cluster; to read data from a CSI snapshot, a volume must first be created from it. CSI supports creating a volume from a snapshot, and Trilio uses this to provision a volume from each snapshot and convert its data to a QCOW2 image.
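The snapshot-to-volume step uses standard Kubernetes CSI objects. The sketch below builds them as Python dicts (ready for `json.dumps` or a Kubernetes client). It shows generic CSI usage with the `snapshot.storage.k8s.io/v1` API, not T4K's internal manifests; object names, the snapshot class, the access mode, and the size are placeholders.

```python
from typing import Any, Dict


def volume_snapshot(name: str, namespace: str, source_pvc: str, snapshot_class: str) -> Dict[str, Any]:
    """Point-in-time CSI snapshot of an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }


def pvc_from_snapshot(name: str, namespace: str, snapshot_name: str, size: str) -> Dict[str, Any]:
    """New PVC whose initial contents come from a VolumeSnapshot via dataSource."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }
```

Once the new PVC is bound, a pod that mounts it can read the snapshot's data. This is the access path the datamover needs in order to produce a QCOW2 image.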

Applications Backup - Metadata and Data

Trilio's unit of backup is one or more Kubernetes applications. A Trilio backup can capture Helm releases, Operator instances, resources matched by label-based selectors, or any combination of these. The Trilio backup process parses each application's metadata and discovers the persistent volumes defined for each application. Backing up application metadata is straightforward: it involves copying the application's YAML definitions to the backup media. Persistent volumes, however, require special handling for the following reasons:

  1. The applications actively access persistent volumes, and data is continuously changing.

  2. Persistent volumes can be sparsely written. A 1TB volume may only have 10GB of application data.

  3. Persistent volumes can be large, and the changes between two backups can be very small compared to the size of the PV.
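Reasons 2 and 3 come down to two small numbers: the bytes actually written and the bytes changed since the last backup. Here is a toy model of both. It assumes a volume is represented as `{block_index: bytes}` holding only written blocks, with unwritten blocks reading as zeros, and a 4 KiB block size chosen for the example. It illustrates the idea and is not T4K's datamover.

```python
from typing import Dict

BLOCK_SIZE = 4096  # assumed block size for this example

Volume = Dict[int, bytes]  # only written blocks are present


def allocated_bytes(volume: Volume) -> int:
    """Bytes that actually hold data, which is all a sparse-aware backup must copy."""
    return len(volume) * BLOCK_SIZE


def delta(previous: Volume, current: Volume) -> Volume:
    """Blocks that differ since the previous backup.

    Blocks written before but no longer allocated are recorded as zero-filled
    blocks so that the delta still describes the new state of the volume.
    """
    changed = {i: data for i, data in current.items() if previous.get(i) != data}
    for i in previous.keys() - current.keys():
        changed[i] = bytes(BLOCK_SIZE)
    return changed
```

The volume's nominal size never appears in either function. A 1TB volume with three written blocks costs three blocks to back up, and an incremental backup costs only the blocks returned by `delta`.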

Any backup solution must handle data backup from persistent volumes efficiently, without impacting the performance and scale of Kubernetes clusters. Trilio's approach has been proven in other cloud environments, including OpenStack and Red Hat Virtualization (RHV). It leverages the CSI snapshot feature to capture point-in-time copies of data and stores backup images in the QCOW2 format. The following diagram describes Trilio's backup processes in detail.

Backup Image Format

Trilio backup images are QCOW2 images. QCOW2 images have the following properties that make them ideal for storing backup data of persistent volumes.

  1. QCOW2 images are sparse-friendly. Even if the volume size is 1TB and the actual data is 10GB, the backup image of the persistent volume is only about 10GB.

  2. QCOW2 images can be linked together into a chain. The bottom image is called the "base image," and all other images are called "overlay files." The most recent data is in the topmost overlay file, and overlay files usually hold only changed data. However, each overlay file can be accessed as a full volume that includes the data from every image beneath it in the chain.
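The read rule behind property 2 can be modeled with plain dicts. This is a toy model and assumes each image stores only the clusters written while it was the top layer. Real QCOW2 files locate clusters through on-disk L1/L2 tables, not Python dicts; 64 KiB is QCOW2's default cluster size.

```python
from typing import Dict, List

CLUSTER_SIZE = 65536  # QCOW2 default cluster size (64 KiB)

Image = Dict[int, bytes]  # cluster index -> data stored in this image


def read_cluster(chain: List[Image], index: int) -> bytes:
    """Read one cluster through a backing chain.

    chain[0] is the base image and chain[-1] the newest overlay. The topmost
    image that holds the cluster wins; if no image holds it, it reads as zeros.
    """
    for image in reversed(chain):
        if index in image:
            return image[index]
    return bytes(CLUSTER_SIZE)
```

Passing `chain[:k + 1]` reads the volume as it was at backup `k`. This is why every overlay can be used as a full, point-in-time volume even though it stores only deltas.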

qemu-img is a Linux tool for managing QCOW2 images. Trilio uses a modified qemu-img to generate QCOW2 images. The first, full backup is stored as the base image. Subsequent backups are incremental: each is an overlay file that points to the previous backup as its backing file.
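The chain layout can be reproduced with stock `qemu-img`. The helpers below only build the argument lists and use real `qemu-img` subcommands and flags. Trilio's modified build and its exact invocations are not documented here, so treat this as an illustration of the base/overlay structure. The helper names and file paths are hypothetical, and the sketch does not show how changed blocks are written into an overlay.

```python
import subprocess
from typing import List


def base_image_cmd(source_device: str, base_image: str) -> List[str]:
    """Full backup: copy a raw block device into a new QCOW2 base image."""
    return ["qemu-img", "convert", "-f", "raw", "-O", "qcow2", source_device, base_image]


def overlay_cmd(backing_image: str, overlay_image: str) -> List[str]:
    """Incremental backup: empty QCOW2 overlay whose backing file is the previous backup."""
    return ["qemu-img", "create", "-f", "qcow2", "-b", backing_image, "-F", "qcow2", overlay_image]


def chain_info_cmd(image: str) -> List[str]:
    """Describe the image and every backing file beneath it."""
    return ["qemu-img", "info", "--backing-chain", image]


def run(cmd: List[str]) -> None:
    """Execute one command; requires qemu-img on PATH."""
    subprocess.run(cmd, check=True)
```

For example, `run(overlay_cmd("backup-1.qcow2", "backup-2.qcow2"))` creates the second link in a chain. Passing `-F qcow2` records the backing format explicitly, which recent qemu-img releases expect.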

Restore Details

High-Level Restore Process

  1. Restore Controller

    1. Reconciles on Restore CRD

    2. Validates whether the restore operation can be performed

    3. Creates PVs

    4. Spawns Data Mover job

      1. Converts QCOW2 to PV data (directly from a backup image, no staging)

    5. Spawns meta processor job

      1. Restores metadata from backup images
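The restore controller's ordering can be outlined the same way. This is a hypothetical sketch. `RestorePlan` and `restore` are illustrative names, and the validation rule (refuse to overwrite PVCs that already exist) is an assumed example of what "validates" might check, not T4K's actual rule set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestorePlan:
    backup_pvcs: List[str]    # PVCs recorded in the backup
    existing_pvcs: List[str]  # PVCs already present in the target namespace
    steps: List[str] = field(default_factory=list)


def restore(plan: RestorePlan) -> List[str]:
    # 1. Validate (example rule only): do not overwrite PVCs that already exist.
    conflicts = sorted(set(plan.backup_pvcs) & set(plan.existing_pvcs))
    if conflicts:
        raise ValueError(f"restore blocked, PVCs already exist: {conflicts}")
    # 2. Create the PVs/PVCs that will receive the data.
    plan.steps += [f"created {pvc}" for pvc in plan.backup_pvcs]
    # 3. Data mover: write each QCOW2 chain straight into its new PV (no staging).
    plan.steps += [f"hydrated {pvc}" for pvc in plan.backup_pvcs]
    # 4. Meta processor: re-create the remaining application objects.
    plan.steps.append("metadata restored")
    return plan.steps
```

Validating before any object is created means a rejected restore leaves the target cluster untouched.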

Restore Process - Animation

Restore Operation

Trilio's restore process recreates the application artifacts from the backup images. These artifacts include Pods, PVs, ConfigMaps, Secrets, and others. Once the application is restored, Trilio spawns data mover Pods to copy data from the backup media to the restored application's PVs.

Each QCOW2 image, whether an overlay file or a base image, is a fully formed image. Even if an overlay file contains only the delta changes captured at backup time, the qemu-img convert command traverses the backing chain and "hydrates" the entire volume contents into the PV. No staging area is required; the data goes directly from the backup media to the PV.
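Hydration can be modeled as a single pass over the volume, resolving each cluster through the chain and writing it straight to the target. This is a toy model. The chain is a list of `{cluster_index: bytes}` dicts, and an `io.BytesIO` stands in for the PV's block device. Real `qemu-img convert` also skips writing zero regions when the target supports it.

```python
import io
from typing import BinaryIO, Dict, List

CLUSTER_SIZE = 65536  # QCOW2 default cluster size (64 KiB)

Image = Dict[int, bytes]


def hydrate(chain: List[Image], num_clusters: int, target: BinaryIO) -> None:
    """Write every cluster of the volume directly to target, with no staging copy.

    chain[0] is the base image and chain[-1] the overlay of the selected backup.
    Clusters found in no image are written as zeros.
    """
    for index in range(num_clusters):
        data = bytes(CLUSTER_SIZE)
        for image in reversed(chain):  # newest layer wins
            if index in image:
                data = image[index]
                break
        target.seek(index * CLUSTER_SIZE)
        target.write(data)


if __name__ == "__main__":
    pv = io.BytesIO()  # stand-in for the restored PV
    hydrate([{0: b"A" * CLUSTER_SIZE}, {1: b"B" * CLUSTER_SIZE}], 3, pv)
    print(len(pv.getvalue()) // CLUSTER_SIZE, "clusters written")
```

The loop never materializes the chain as an intermediate file. Each cluster goes from the backup images to the target exactly once.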

Trilio provides a wide range of flags to control and transform the objects restored as part of the restore plan.

Container Images Backup and Restore

For any application in Kubernetes, container images are an essential building block: every container in the application runs from one. Kubernetes pulls these images from a registry when it starts the containers. Starting with version 2.10.x, T4K supports backing up and restoring container images, which addresses the scenario where an image is deprecated or deleted from the registry. With image backup and restore, backups are self-reliant: T4K can restore them in any environment without depending on the registries used at backup time.

Inner Workings of Image Backup and Restore

This feature backs up the application's container images and stores them with the backup data. T4K stores these images on the target as QCOW2 images. If any images are deleted or the registry becomes inaccessible, users can restore the images from the backup.

Backup

  1. Image backup is enabled by default for all backups. When the user triggers a backup, all container images used by the application are backed up and stored on the target.

  2. Images can also be backed up incrementally.

  3. Users can disable image backup at the backup plan level by enabling the skipImageBackup flag.
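To see which images an image backup would need to capture, collect every image reference in the application's pod specs. This helper is hypothetical. It reads the standard `initContainers`, `containers`, and `ephemeralContainers` fields of a pod spec, and its `skip_image_backup` argument only mirrors the intent of the skipImageBackup flag. It is not T4K code.

```python
from typing import Any, Dict, List, Set


def images_to_back_up(pod_spec: Dict[str, Any], skip_image_backup: bool = False) -> List[str]:
    """Return the distinct container images referenced by a pod spec, sorted."""
    if skip_image_backup:
        return []  # image backup disabled for this plan
    images: Set[str] = set()
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for container in pod_spec.get(key, []):
            images.add(container["image"])
    return sorted(images)
```

Deduplicating matters because sidecars and init containers often share an image, and each image needs to be stored only once per backup.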

Restore

  1. Image restore is not enabled by default. To enable it, set actionFlags.imageRestore to true.

  2. The user must also provide the restore registry details, with the following fields:

    • registry: the registry to which the backed-up container images are restored.

    • repository: the repository in which the restored images are placed.

    • registryAuthSecret: an authentication secret of type kubernetes.io/dockerconfigjson, used to push the images to the restore registry.

  3. Even when image restore is enabled, an image is restored only if the original image is not accessible.

  4. If the restore registry already contains an image with the same name but different content, T4K generates a new tag for the restored image by default. To overwrite the existing image instead, enable restoreFlags.overrideImageIfExist.

  5. All backed-up images that are not accessible are restored to the registry provided in the Restore CR.

  6. The restored application pulls its images from the new registry.
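The restore rules above form a small decision procedure, sketched below. Everything in it is hypothetical: the function name and parameters are illustrative, the sketch handles only tag-based references (not `@sha256:` digests), and the `-restored-<digest suffix>` tag scheme is invented for the example. T4K's real tag format is not documented here.

```python
from typing import Callable, Dict


def restore_image_ref(
    original: str,
    digest: str,
    registry: str,
    repository: str,
    original_accessible: Callable[[str], bool],
    existing_digests: Dict[str, str],
    override_if_exists: bool = False,
) -> str:
    """Return the image reference the restored workload should use.

    existing_digests maps "name:tag" -> digest already in the restore registry.
    """
    if original_accessible(original):
        return original  # rule 3: restore only when the original is unreachable
    name, _, tag = original.rsplit("/", 1)[-1].partition(":")
    tag = tag or "latest"
    existing = existing_digests.get(f"{name}:{tag}")
    if existing is None or existing == digest or override_if_exists:
        return f"{registry}/{repository}/{name}:{tag}"
    # Rule 4: same name but different content, so use a new tag (hypothetical scheme).
    return f"{registry}/{repository}/{name}:{tag}-restored-{digest[-8:]}"
```

Returning the rewritten reference, rather than pushing images directly, mirrors rule 6: the restored workload is repointed at the new registry.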
