Example runbook for Disaster Recovery using S3
This runbook will demonstrate how to set up Disaster Recovery with Trilio for a given scenario.
The chosen scenario follows an actively used Trilio customer environment.
Scenario
Two OpenStack clouds are available: "OpenStack Cloud at Production Site" and "OpenStack Cloud at DR Site." Both clouds have Trilio installed as an OpenStack service to provide backup functionality for OpenStack workloads. Each OpenStack cloud is configured to use its own S3 bucket for storing backup images. The contents of the S3 buckets are synchronized using the aws s3 sync command. The syncing process is set up independently of Trilio.

This scenario covers the Disaster Recovery of a single workload and of a complete cloud. All processes are carried out by the OpenStack administrator.
Prerequisites for the Disaster Recovery process
This runbook will assume that the following is true:
Both OpenStack clusters have Trilio installed with a valid license.
Trilio does not automatically map the domains and projects of the OpenStack cloud at the production site to the domains and projects of the OpenStack cloud at the DR (Disaster Recovery) site. This means that domains and projects are not matched based on their names alone.
Additionally, the user carrying out the Disaster Recovery process must have admin access to the cloud at the DR site.
The admin must create the following artifacts at the DR site:
Domains and Projects
Tenant Networks, Routers, Ports, Floating IPs, and DNS Zones
Disaster recovery of a single workload
In this scenario, the admin recovers a single workload at the DR site. To do this, follow the high-level process outlined below:
Sync the workload directories to the DR site S3 bucket.
Ensure the correct mount paths are available.
Reassign the workload.
Restore the workload.
Copy the backup images of a given workload to the S3 bucket at the DR site
Identify workload prefix
Assume that the workload ID is ac9cae9b-5e1b-4899-930c-6aa0600a2105. The workload prefix on the S3 bucket is derived from this ID.
Sync the Backup Images
Use the AWS S3 sync command to sync the workload backup images to the DR site.
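The sync step can be sketched as a small helper that builds the aws s3 sync command line for one workload. This is a minimal sketch under stated assumptions: the bucket names `prod-backups` and `dr-backups` are hypothetical, and the `workload_<id>` directory naming is an assumption that should be checked against the actual bucket layout. The command is only printed, with `--dryrun` enabled by default, so nothing is copied.

```python
import shlex


def workload_prefix(workload_id: str) -> str:
    # Assumption: the workload directory on the bucket is named "workload_<id>".
    return f"workload_{workload_id}"


def build_sync_command(src_bucket: str, dst_bucket: str, workload_id: str,
                       dry_run: bool = True) -> list[str]:
    """Build the argv for syncing one workload prefix between two buckets."""
    prefix = workload_prefix(workload_id)
    cmd = [
        "aws", "s3", "sync",
        f"s3://{src_bucket}/{prefix}/",
        f"s3://{dst_bucket}/{prefix}/",
    ]
    if dry_run:
        # --dryrun makes the AWS CLI list the operations without performing them.
        cmd.append("--dryrun")
    return cmd


# Hypothetical bucket names; replace them with the real production and DR buckets.
command = build_sync_command(
    "prod-backups", "dr-backups", "ac9cae9b-5e1b-4899-930c-6aa0600a2105"
)
print(shlex.join(command))
```

Once the dry run lists the expected objects, the same argv can be built with `dry_run=False` and passed to `subprocess.run`.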
Verify Backup Images at the DR site
After successfully synchronizing the workload backup images to the DR site, you can verify the integrity of the backup images. Log in to any datamover container at the DR site and change to the S3 FUSE mount directory.
Use the qemu-img tool to inspect the backup images.
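A minimal sketch of such an inspection, assuming the backup images are qcow2 files (as is the case for Trilio backups) and that qemu-img is asked for JSON output. The image path is a hypothetical placeholder, and the sample output below is hand-written for illustration rather than captured from a real backup.

```python
import json


def qemu_img_info_command(image_path: str) -> list[str]:
    """Build the argv for a machine-readable qemu-img info call."""
    return ["qemu-img", "info", "--output=json", image_path]


def is_qcow2(info_json: str) -> bool:
    """Return True if the qemu-img info JSON reports the qcow2 format."""
    info = json.loads(info_json)
    return info.get("format") == "qcow2"


# Hypothetical path below the S3 FUSE mount; adjust to the real mount point.
print(qemu_img_info_command("/path/to/fuse-mount/vm-disk-image"))

# Hand-written sample of the JSON shape, for illustration only.
sample_output = '{"filename": "vm-disk-image", "format": "qcow2", "virtual-size": 1073741824}'
print(is_qcow2(sample_output))
```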
Reassign the workload
The metadata for each workload includes a user ID and a project ID. These IDs are not valid at the DR site, so the cloud admin must change them to valid user and project IDs.
Discover orphaned workloads at DR site
Orphaned workloads are those in the S3 bucket that don't belong to any projects in the current cloud. The orphaned workloads list must include the newly synced workload.
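The definition above can be illustrated with a few lines of Python. This is not how Trilio discovers orphaned workloads; it is only a sketch of the rule, assuming each workload record carries hypothetical `id` and `project_id` fields and that the set of project IDs in the DR cloud is known.

```python
def find_orphaned(workloads: list[dict], known_project_ids: set[str]) -> list[str]:
    """Return the IDs of workloads whose project does not exist in this cloud."""
    return [w["id"] for w in workloads if w["project_id"] not in known_project_ids]


# Illustrative data: the first workload was synced from the production site
# and still carries a production project ID that is unknown at the DR site.
workloads = [
    {"id": "ac9cae9b-5e1b-4899-930c-6aa0600a2105", "project_id": "prod-project-id"},
    {"id": "local-workload-id", "project_id": "dr-project-id"},
]
dr_projects = {"dr-project-id"}

print(find_orphaned(workloads, dr_projects))
```

The newly synced workload appears in the result because its project ID is unknown at the DR site.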
Assign the workload to new domain/project
The cloud administrator must assign the identified orphaned workloads to their new projects.
The following provides the list of all available projects viewable by the admin user in the target domain.
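As a sketch, the project list can be requested from the OpenStack CLI in JSON form and turned into a name-to-ID lookup, which is convenient when choosing the target project for a reassignment. The domain name `dr-domain` and the sample output are hypothetical; the `ID` and `Name` keys are the columns the CLI prints by default.

```python
import json


def project_list_command(domain: str) -> list[str]:
    """Build the argv for listing the projects of one domain as JSON."""
    return ["openstack", "project", "list", "--domain", domain, "-f", "json"]


def projects_by_name(cli_output: str) -> dict[str, str]:
    """Map project names to project IDs from the CLI's JSON output."""
    return {p["Name"]: p["ID"] for p in json.loads(cli_output)}


print(project_list_command("dr-domain"))  # hypothetical domain name

# Hand-written sample output, for illustration only.
sample = '[{"ID": "1111", "Name": "dr-project-a"}, {"ID": "2222", "Name": "dr-project-b"}]'
print(projects_by_name(sample))
```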
Trustee Role
Ensure that the new project has the correct trustee role assigned.
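One way to check this is to compare the project's role assignments against the users who are expected to run backups and restores. The sketch below assumes the assignments were exported as a list of records with `Role` and `User` keys, with users written as `name@domain`; that shape is an assumption about the output of a command such as `openstack role assignment list --project <project> --names -f json`. The role name `trustee-role` is a placeholder for whatever trustee role the Trilio installation is configured with.

```python
def users_missing_role(assignments: list[dict], users: list[str], role: str) -> list[str]:
    """Return the users that do not hold the given role in the project."""
    # Assumption: "User" is formatted as "name@domain"; keep only the name part.
    holders = {a["User"].split("@")[0] for a in assignments if a["Role"] == role}
    return [u for u in users if u not in holders]


# Hand-written sample records, for illustration only.
assignments = [
    {"Role": "trustee-role", "User": "alice@Default"},
    {"Role": "reader", "User": "bob@Default"},
]

print(users_missing_role(assignments, ["alice", "bob"], "trustee-role"))
```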
Reassign the workload to the target project
Reassign the workload to the target project. Please refer to reassign workloads documentation for additional options.
Verify the workload is available at the desired target_project
After the workload has been assigned to the new project, verify that the workload is managed by the target Trilio installation and is assigned to the right project and user.
Restore the workload
The workload can be restored using Horizon following the procedure described here.
This runbook will use CLI for demonstration. The CLI must be executed with the project credentials.
Get list of workload snapshots
Get the list of workload snapshots and identify the snapshot you want to restore.
Get Snapshot Details with network details for the desired snapshot
Get Snapshot Details with disk details for the desired Snapshot
Create the json payload to restore a snapshot
The selective restore CLI command takes a restore.json file as input. The restore.json file includes all the necessary mappings to the current project.
Run the selective restore
A user who has the backup trustee role can restore the snapshot to the DR cloud.
Verify the restore
To verify that the restore succeeded from a Trilio perspective, check the restore status.
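Because a restore can take a while, the status check is usually repeated until the restore reaches a final state. The sketch below shows a generic polling loop. The `get_status` callable stands in for whatever is used to query the restore status, and the status names `available` and `error` are assumptions that should be replaced with the values the Trilio installation actually reports.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to the status values reported by Trilio.
FINAL_STATES = ("available", "error")


def wait_for_restore(get_status: Callable[[], str], interval: float = 10.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until a final state is reached or the timeout expires."""
    waited = 0.0
    while True:
        status = get_status()
        if status in FINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"restore still '{status}' after {waited} seconds")
        sleep(interval)
        waited += interval


# Demonstration with canned statuses and a no-op sleep instead of a real query.
canned = iter(["executing", "executing", "available"])
print(wait_for_restore(lambda: next(canned), sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop easy to exercise without waiting in real time.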
Recovering all production workloads at DR site
The high-level process for recovering all workloads of the production cloud at the DR site includes the following steps:
Ensure that Trilio is configured to use the S3 bucket at the DR site
Replicate the production S3 bucket to the DR site S3 bucket
Reassign workloads to domains/projects at the DR site
Sync DR site S3 bucket with Production S3 bucket
Verify backup image integrity
Trilio backups are qcow2 files and can be inspected using the qemu-img tool. On one of the datamover containers at the DR site, change to the S3 FUSE mount, navigate to one of the workload snapshot directories, and perform the following operation on a VM disk.
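When a disk image refers to a backing file, it can be useful to inspect the whole chain rather than only the top image. The sketch below assumes qemu-img is called with `--backing-chain --output=json`, which reports one entry per image in the chain, and checks that every layer is qcow2. The path and the sample output are hypothetical and hand-written for illustration.

```python
import json


def backing_chain_command(image_path: str) -> list[str]:
    """Build the argv for inspecting an image together with its backing files."""
    return ["qemu-img", "info", "--backing-chain", "--output=json", image_path]


def non_qcow2_layers(chain_json: str) -> list[str]:
    """Return the filenames of all layers in the chain that are not qcow2."""
    chain = json.loads(chain_json)
    return [layer["filename"] for layer in chain if layer.get("format") != "qcow2"]


print(backing_chain_command("/path/to/fuse-mount/vm-disk-image"))  # hypothetical path

# Hand-written sample of a two-layer chain, for illustration only.
sample_chain = json.dumps([
    {"filename": "incremental-disk", "format": "qcow2", "backing-filename": "full-disk"},
    {"filename": "full-disk", "format": "qcow2"},
])
print(non_qcow2_layers(sample_chain))
```

An empty result means every layer in the sample chain reports the qcow2 format.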
Reassign the workloads to DR cloud
Trilio workloads have clear ownership. When a workload is moved to a different cloud it is necessary to change the ownership. The ownership can only be changed by OpenStack administrators.
List all orphaned workloads on the S3 fuse mount
An orphaned workload is one on the S3 bucket that is not assigned to any existing project in the cloud.
List available projects in a domain
The orphaned workloads need to be assigned to their new projects. The following provides the list of all available projects in a given domain.
Make sure the users in the project have the backup trustee role assigned.
Reassign the workload to the target project
Reassign the workload to the target project.
Verify the workload is available in the project
After the workload has been assigned to the new project, verify that the workload is assigned to the right project and user.
Restore the workload
The reassigned workload can be restored using Horizon following the procedure described here.
We will use CLI in this runbook.
Get the list of snapshots of a workload
List all Snapshots of the workload
Get Snapshot Details with network details for the desired snapshot
Get Snapshot Details with disk details for the desired Snapshot
Prepare the selective restore by creating the restore.json file
The selective restore CLI command takes a restore.json file as input. The restore.json file captures all the details of the restore operation, including the mapping of VMs to availability zones, the mapping of volume types to existing volume types, and the network mappings.
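To make the three kinds of mapping concrete, the sketch below assembles a payload in Python and writes it out as JSON. All field names and IDs here are illustrative assumptions and not the authoritative restore.json schema; the real file should be built from the Trilio restore.json reference and from the snapshot details gathered in the previous steps.

```python
import json


def build_restore_payload(name: str, instances: list[dict],
                          network_map: dict[str, str]) -> dict:
    """Assemble an illustrative selective-restore payload (field names assumed)."""
    return {
        "name": name,
        "restore_type": "selective",
        "openstack": {
            # One entry per VM: target availability zone and per-disk volume type.
            "instances": instances,
            # Snapshot network ID -> network ID that exists at the DR site.
            "networks_mapping": {
                "networks": [
                    {"snapshot_network": {"id": src}, "target_network": {"id": dst}}
                    for src, dst in network_map.items()
                ]
            },
        },
    }


payload = build_restore_payload(
    name="dr-restore",
    instances=[{
        "id": "snapshot-vm-id",              # placeholder VM ID from the snapshot
        "availability_zone": "dr-az",        # placeholder target availability zone
        "vdisks": [{"id": "snapshot-disk-id", "new_volume_type": "dr-volume-type"}],
    }],
    network_map={"snapshot-net-id": "dr-net-id"},
)

print(json.dumps(payload, indent=2))
```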
Run the selective restore
To do the actual restore use the following command:
Verify the restore
To verify that the restore succeeded from a Trilio perspective, check the restore status.
