Example runbook for Disaster Recovery using NFS

This runbook will demonstrate how to set up Disaster Recovery with Trilio for a given scenario.

The chosen scenario is following an actively used Trilio customer environment.

Scenario

There are two Openstack clouds available, "Openstack Cloud A" and "Openstack Cloud B". "Openstack Cloud B" is the Disaster Recovery restore point of "Openstack Cloud A" and vice versa. Both clouds have an independent Trilio installation integrated. These Trilio installations write their backups to NFS targets: "Trilio A" writes to "NFS A1" and "Trilio B" writes to "NFS B1". The NFS Volumes used are synced against an NFS Volume on the other side: "NFS A1" syncs with "NFS B2" and "NFS B1" syncs with "NFS A2". The syncing process is set up independently from Trilio and always favors the newer dataset.

This scenario covers the Disaster Recovery of a single Workload and of a complete cloud. All processes are done by the Openstack administrator.

Prerequisites for the Disaster Recovery process

This runbook will assume that the following is true:

  • "Openstack Cloud A" and "Openstack Cloud B" both have an active Trilio installation with a valid license

  • "Openstack Cloud A" and "Openstack Cloud B" have free resources to host additional VMs

  • "Openstack Cloud A" and "Openstack Cloud B" have Tenants/Projects available that are the designated restore points for Tenant/Projects of the other side

  • Access to a user with the admin role permissions on domain level

  • One of the Openstack clouds is down/lost

For ease of writing, this runbook assumes from here on that "Openstack Cloud A" is down and the Workloads are being restored into "Openstack Cloud B".

In the case of shared Tenant networks, the following additional requirement applies beyond the floating IP: all Tenant Networks, Routers, Ports, Floating IPs, and DNS Zones are created.

Disaster Recovery of a single Workload

A single Workload can undergo Disaster Recovery in this scenario while both clouds are still active. To do so, the following high-level process needs to be followed:

  1. Copy the Workload directories to the configured NFS Volume

  2. Make the right Mount-Paths available

  3. Reassign the Workload

  4. Restore the Workload

  5. Clean up

Copy the Workload directories to the configured NFS Volume

This process only shows how to get a Workload from "Openstack Cloud A" to "Openstack Cloud B". The vice versa process is similar.

As only a single Workload is to be recovered it is more efficient to copy the data of that single Workload over to the "NFS B1" Volume, which is used by "Trilio B".

Mount "NFS B2" Volume to a Trilio VM

It is recommended to use the Trilio VM as a connector between both NFS Volumes, as the nova user is available on the Trilio VM.

# mount <NFS B2-IP/NFS B2-FQDN>:/<VOL-Path> /mnt

Identify the Workload on the "NFS B2" Volume

Trilio Workloads are identified by the ID under which they are stored on the Backup Target. See the example below:

workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105

In case the Workload ID is not known, the metadata available inside the Workload directories can be used to identify the correct Workload.

/…/workload_<id>/workload_db <<< Contains User ID and Project ID of Workload owner
/…/workload_<id>/workload_vms_db <<< Contains VM IDs and VM Names of all VMs actively protected by the Workload
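
A minimal lookup sketch, assuming the "NFS B2" Volume is still mounted under /mnt, the Project ID of the former owner is known, and the metadata files are readable as plain text:

# grep -l "<project_id>" /mnt/workload_*/workload_db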

Copy the Workload

The identified workload needs to be copied with all subdirectories and files. Afterward, it is necessary to adjust the ownership to nova:nova with the right permissions.

# cp -R /mnt/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105
# chown -R nova:nova /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105
# chmod -R 644 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105

Make the Mount-Paths available

Trilio backups are using qcow2 backing files, which make every incremental backup a full synthetic backup. These backing files can be made visible using the qemu-img tool.

# qemu-img info bd57ec9b-c4ac-4a37-a4fd-5c9aa002c778
image: bd57ec9b-c4ac-4a37-a4fd-5c9aa002c778
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 516K
cluster_size: 65536

backing file: /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105/snapshot_1415095d-c047-400b-8b05-c88e57011263/vm_id_38b620f1-24ae-41d7-b0ab-85ffc2d7958b/vm_res_id_d4ab3431-5ce3-4a8f-a90b-07606e2ffa33_vda/7c39eb6a-6e42-418e-8690-b6368ecaa7bb
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

The MTAuMTAuMi4yMDovdXBzdHJlYW0= part of the backing file path is the base64 hash value, which is calculated for each provided NFS-Share during the configuration of a Trilio installation.

This hash value is calculated based on the provided NFS-Share path (<NFS_IP>:/<path>). If even one character differs between the provided NFS-Share paths, a completely different hash value is generated.

Workloads that have been moved between NFS-Shares require that their incremental backups can follow the same path as on the original Source Cloud. To achieve this, it is necessary to create the mount path on all compute nodes of the Target Cloud.

Afterwards, a bind mount is used to make the workload's data accessible over both the old and the new mount path. The following example shows how to identify the necessary mount points and create the bind mount.

Identify the base64 hash values

The used hash values can be calculated using the base64 tool in any Linux distribution.

# echo -n 10.10.2.20:/NFS_A1 | base64
MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

# echo -n 10.20.3.22:/NFS_B2 | base64
MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0

Create and bind the paths

Based on the identified base64 hash values the following paths are required on each Compute node.

/var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

and

/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0

In the scenario of this runbook, the workload is coming from the NFS_A1 NFS-Share, which means the mount path of that NFS-Share needs to be created and bound on the Target Cloud.

# mkdir /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl
# mount --bind /var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0/ /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

To keep the desired mount past a reboot, it is recommended to edit the fstab of all compute nodes accordingly.

# vi /etc/fstab
/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0/    /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl    none    bind    0 0
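
Since the path and the bind mount are needed on every compute node, the steps above can be rolled out in one pass. The following loop is a sketch only; it assumes a hypothetical compute_nodes.txt host list and passwordless SSH access as root to the compute nodes.

# Create the old mount path, bind the new path to it and persist the bind mount on every compute node
for node in $(cat compute_nodes.txt); do
  ssh root@"${node}" "mkdir -p /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl"
  ssh root@"${node}" "mount --bind /var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl"
  ssh root@"${node}" "echo '/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl none bind 0 0' >> /etc/fstab"
done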

Reassign the workload

Trilio workloads have clear ownership. When a workload is moved to a different cloud it is necessary to change the ownership. The ownership can only be changed by Openstack administrators.

Add admin-user to required domains and projects

To fulfill the required tasks, a user with the admin role is used. This user will be used until the workload has been restored. Therefore, it is necessary to provide this user access to the desired Target Project on the Target Cloud.

# source {customer admin rc file}  
# openstack role add Admin --user <my_admin_user> --user-domain <admin_domain> --domain <target_domain>  
# openstack role add Admin --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain>  
# openstack role add <Backup Trustee Role> --user <my_admin_user> --user-domain <admin_domain> --project <destination_project> --project-domain <target_domain>
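
If desired, the new role assignments can be double-checked before continuing; the --names option of the standard Openstack CLI shows names instead of IDs.

# openstack role assignment list --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain> --names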

Discover orphaned Workloads from NFS-Storage of Target Cloud

Each Trilio installation maintains a database of the workloads it knows about. Workloads that are not maintained by a specific Trilio installation are, from the perspective of that installation, orphaned workloads. An orphaned workload is a workload that is accessible on the NFS-Share but not assigned to any existing project in the cloud that the Trilio installation is protecting.

# workloadmgr workload-get-orphaned-workloads-list --migrate_cloud True    
+------------+--------------------------------------+----------------------------------+----------------------------------+  
|     Name   |                  ID                  |            Project ID            |  User ID                         |  
+------------+--------------------------------------+----------------------------------+----------------------------------+  
| Workload_1 | 6639525d-736a-40c5-8133-5caaddaaa8e9 | 4224d3acfd394cc08228cc8072861a35 |  329880dedb4cd357579a3279835f392 |  
| Workload_2 | 904e72f7-27bb-4235-9b31-13a636eb9c95 | 637a9ce3fd0d404cabf1a776696c9c04 |  329880dedb4cd357579a3279835f392 |  
+------------+--------------------------------------+----------------------------------+----------------------------------+

List available projects on Target Cloud in the Target Domain

The identified orphaned workloads need to be assigned to their new projects. The following provides the list of all available projects viewable by the used admin-user in the target_domain.

# openstack project list --domain <target_domain>  
+----------------------------------+----------+  
| ID                               | Name     |  
+----------------------------------+----------+  
| 01fca51462a44bfa821130dce9baac1a | project1 |  
| 33b4db1099ff4a65a4c1f69a14f932ee | project2 |  
| 9139e694eb984a4a979b5ae8feb955af | project3 |  
+----------------------------------+----------+ 

List available users on the Target Cloud in the Target Project that have the right backup trustee role

To allow project owners to work with the workloads as well, the workloads get assigned to a user with the backup trustee role that exists in the target project.

# openstack role assignment list --project <target_project> --project-domain <target_domain> --role <backup_trustee_role>
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+
| Role                             | User                             | Group | Project                          | Domain | Inherited |
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+
| 9fe2ff9ee4384b1894a90878d3e92bab | 72e65c264a694272928f5d84b73fe9ce |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
| 9fe2ff9ee4384b1894a90878d3e92bab | d5fbd79f4e834f51bfec08be6d3b2ff2 |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
| 9fe2ff9ee4384b1894a90878d3e92bab | f5b1d071816742fba6287d2c8ffcd6c4 |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+

Reassign the workload to the target project

Now that all information has been gathered, the workload can be reassigned to the target project.

# workloadmgr workload-reassign-workloads --new_tenant_id {target_project_id} --user_id {target_user_id} --workload_ids {workload_id} --migrate_cloud True    
+-----------+--------------------------------------+----------------------------------+----------------------------------+  
|    Name   |                  ID                  |            Project ID            |  User ID                         |  
+-----------+--------------------------------------+----------------------------------+----------------------------------+  
| project1  | 904e72f7-27bb-4235-9b31-13a636eb9c95 | 4f2a91274ce9491481db795dcb10b04f | 3e05cac47338425d827193ba374749cc |  
+-----------+--------------------------------------+----------------------------------+----------------------------------+ 

Verify the workload is available at the desired target_project

After the workload has been assigned to the new project, it is recommended to verify that the workload is managed by the Target Trilio and is assigned to the right project and user.

# workloadmgr workload-show ac9cae9b-5e1b-4899-930c-6aa0600a2105
+-------------------+------------------------------------------------------------------------------------------------------+
| Property          | Value                                                                                                |
+-------------------+------------------------------------------------------------------------------------------------------+
| availability_zone | nova                                                                                                 |
| created_at        | 2019-04-18T02:19:39.000000                                                                           |
| description       | Test Linux VMs                                                                                       |
| error_msg         | None                                                                                                 |
| id                | ac9cae9b-5e1b-4899-930c-6aa0600a2105                                                                 |
| instances         | [{"id": "38b620f1-24ae-41d7-b0ab-85ffc2d7958b", "name": "Test-Linux-1"}, {"id":                      |
|                   | "3fd869b2-16bd-4423-b389-18d19d37c8e0", "name": "Test-Linux-2"}]                                     |
| interval          | None                                                                                                 |
| jobschedule       | True                                                                                                 |
| name              | Test Linux                                                                                           |
| project_id        | 2fc4e2180c2745629753305591aeb93b                                                                     |
| scheduler_trust   | None                                                                                                 |
| status            | available                                                                                            |
| storage_usage     | {"usage": 60555264, "full": {"usage": 44695552, "snap_count": 1}, "incremental": {"usage": 15859712, |
|                   | "snap_count": 13}}                                                                                   |
| updated_at        | 2019-11-15T02:32:43.000000                                                                           |
| user_id           | 72e65c264a694272928f5d84b73fe9ce                                                                     |
| workload_type_id  | f82ce76f-17fe-438b-aa37-7a023058e50d                                                                 |
+-------------------+------------------------------------------------------------------------------------------------------+

Restore the workload

The reassigned workload can be restored using Horizon following the procedure described here.

This runbook will continue with the CLI-only path.

Prepare the selective restore by getting the snapshot information

To be able to do the necessary selective restore a few pieces of information about the snapshot to be restored are required. The following process will provide all necessary information.

List all Snapshots of the workload to restore to identify the snapshot to restore

# workloadmgr snapshot-list --workload_id ac9cae9b-5e1b-4899-930c-6aa0600a2105 --all True

+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+
|         Created At         |     Name     |                  ID                  |             Workload ID              | Snapshot Type |   Status  |    Host   |
+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+
| 2019-11-02T02:30:02.000000 | jobscheduler | f5b8c3fd-c289-487d-9d50-fe27a6561d78 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |      full     | available | Upstream2 |
| 2019-11-03T02:30:02.000000 | jobscheduler | 7e39e544-537d-4417-853d-11463e7396f9 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |  incremental  | available | Upstream2 |
| 2019-11-04T02:30:02.000000 | jobscheduler | 0c086f3f-fa5d-425f-b07e-a1adcdcafea9 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |  incremental  | available | Upstream2 |
+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+

Get Snapshot Details with network details for the desired snapshot

# workloadmgr snapshot-show --output networks 7e39e544-537d-4417-853d-11463e7396f9

+-------------------+--------------------------------------+
| Snapshot property | Value                                |
+-------------------+--------------------------------------+
| description       | None                                 |
| host              | Upstream2                            |
| id                | 7e39e544-537d-4417-853d-11463e7396f9 |
| name              | jobscheduler                         |
| progress_percent  | 100                                  |
| restore_size      | 44040192 Bytes or Approx (42.0MB)    |
| restores_info     |                                      |
| size              | 1310720 Bytes or Approx (1.2MB)      |
| snapshot_type     | incremental                          |
| status            | available                            |
| time_taken        | 154 Seconds                          |
| uploaded_size     | 1310720                              |
| workload_id       | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |
+-------------------+--------------------------------------+

+----------------+---------------------------------------------------------------------------------------------------------------------+
|   Instances    |                                                        Value                                                        |
+----------------+---------------------------------------------------------------------------------------------------------------------+
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-1                                                    |
|       ID       |                                         38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                        |
|                |                                                                                                                     |
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-2                                                    |
|       ID       |                                         3fd869b2-16bd-4423-b389-18d19d37c8e0                                        |
|                |                                                                                                                     |
+----------------+---------------------------------------------------------------------------------------------------------------------+

+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+
|   Networks  | Value                                                                                                                                        |
+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+
|  ip_address | 172.20.20.20                                                                                                                                 |
|    vm_id    | 38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                                                                                         |
|   network   | {u'subnet': {u'ip_version': 4, u'cidr': u'172.20.20.0/24', u'gateway_ip': u'172.20.20.1', u'id': u'3a756a89-d979-4cda-a7f3-dacad8594e44', 
u'name': u'Trilio Test'}, u'cidr': None, u'id': u'5f0e5d34-569d-42c9-97c2-df944f3924b1', u'name': u'Trilio_Test_Internal', u'network_type': u'neutron'}      |
| mac_address | fa:16:3e:74:58:bb                                                                                                                            |
|             |                                                                                                                                              |
|  ip_address | 172.20.20.13                                                                                                                                 |
|    vm_id    | 3fd869b2-16bd-4423-b389-18d19d37c8e0                                                                                                         |
|   network   | {u'subnet': {u'ip_version': 4, u'cidr': u'172.20.20.0/24', u'gateway_ip': u'172.20.20.1', u'id': u'3a756a89-d979-4cda-a7f3-dacad8594e44',
u'name': u'Trilio Test'}, u'cidr': None, u'id': u'5f0e5d34-569d-42c9-97c2-df944f3924b1', u'name': u'Trilio_Test_Internal', u'network_type': u'neutron'}      |
| mac_address | fa:16:3e:6b:46:ae                                                                                                                            |
+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+

Get Snapshot Details with disk details for the desired Snapshot

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr snapshot-show --output disks 7e39e544-537d-4417-853d-11463e7396f9

+-------------------+--------------------------------------+
| Snapshot property | Value                                |
+-------------------+--------------------------------------+
| description       | None                                 |
| host              | Upstream2                            |
| id                | 7e39e544-537d-4417-853d-11463e7396f9 |
| name              | jobscheduler                         |
| progress_percent  | 100                                  |
| restore_size      | 44040192 Bytes or Approx (42.0MB)    |
| restores_info     |                                      |
| size              | 1310720 Bytes or Approx (1.2MB)      |
| snapshot_type     | incremental                          |
| status            | available                            |
| time_taken        | 154 Seconds                          |
| uploaded_size     | 1310720                              |
| workload_id       | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |
+-------------------+--------------------------------------+

+----------------+---------------------------------------------------------------------------------------------------------------------+
|   Instances    |                                                        Value                                                        |
+----------------+---------------------------------------------------------------------------------------------------------------------+
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-1                                                    |
|       ID       |                                         38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                        |
|                |                                                                                                                     |
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-2                                                    |
|       ID       |                                         3fd869b2-16bd-4423-b389-18d19d37c8e0                                        |
|                |                                                                                                                     |
+----------------+---------------------------------------------------------------------------------------------------------------------+

+-------------------+--------------------------------------------------+
|       Vdisks      |                      Value                       |
+-------------------+--------------------------------------------------+
| volume_mountpoint |                     /dev/vda                     |
|    restore_size   |                     22020096                     |
|    resource_id    |       ebc2fdd0-3c4d-4548-b92d-0e16734b5d9a       |
|    volume_name    |       0027b140-a427-46cb-9ccf-7895c7624493       |
|    volume_type    |                       None                       |
|       label       |                       None                       |
|    volume_size    |                        1                         |
|     volume_id     |       0027b140-a427-46cb-9ccf-7895c7624493       |
| availability_zone |                       nova                       |
|       vm_id       |       38b620f1-24ae-41d7-b0ab-85ffc2d7958b       |
|      metadata     | {u'readonly': u'False', u'attached_mode': u'rw'} |
|                   |                                                  |
| volume_mountpoint |                     /dev/vda                     |
|    restore_size   |                     22020096                     |
|    resource_id    |       8007ed89-6a86-447e-badb-e49f1e92f57a       |
|    volume_name    |       2a7f9e78-7778-4452-af5b-8e2fa43853bd       |
|    volume_type    |                       None                       |
|       label       |                       None                       |
|    volume_size    |                        1                         |
|     volume_id     |       2a7f9e78-7778-4452-af5b-8e2fa43853bd       |
| availability_zone |                       nova                       |
|       vm_id       |       3fd869b2-16bd-4423-b389-18d19d37c8e0       |
|      metadata     | {u'readonly': u'False', u'attached_mode': u'rw'} |
|                   |                                                  |
+-------------------+--------------------------------------------------+

Prepare the selective restore by creating the restore.json file

The selective restore uses a restore.json file for the CLI command. This restore.json file needs to be adjusted according to the desired restore.

{
   u'description':u'<description of the restore>',
   u'oneclickrestore':False,
   u'restore_type':u'selective',
   u'type':u'openstack',
   u'name':u'<name of the restore>',
   u'openstack':{
      u'instances':[
         {
            u'name':u'<name instance 1>',
            u'availability_zone':u'<AZ instance 1>',
            u'nics':[ #####Leave empty for network topology restore
            ],
            u'vdisks':[
               {
                  u'id':u'<old disk id>',
                  u'new_volume_type':u'<new volume type name>',
                  u'availability_zone':u'<new cinder volume AZ>'
               }
            ],
            u'flavor':{
               u'ram':<RAM in MB>,
               u'ephemeral':<GB of ephemeral disk>,
               u'vcpus':<# vCPUs>,
               u'swap':u'<GB of Swap disk>',
               u'disk':<GB of boot disk>,
               u'id':u'<id of the flavor to use>'
            },
            u'include':<True/False>,
            u'id':u'<old id of the instance>'
         } #####Repeat for each instance in the snapshot
      ],
      u'restore_topology':<True/False>,
      u'networks_mapping':{
         u'networks':[ #####Leave empty for network topology restore
            
         ]
      }
   }
}
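
As an illustration, the template above could be filled in as follows for the two Test-Linux VMs of the example snapshot, assuming a network topology restore into the nova availability zone and no swap disk. The restore name and description are arbitrary, the old disk IDs are taken from the volume_id fields of the disk details shown earlier, and the volume type and flavor ID remain placeholders that must be adapted to the Target Cloud.

{
   u'description':u'Selective restore of the Test Linux workload',
   u'oneclickrestore':False,
   u'restore_type':u'selective',
   u'type':u'openstack',
   u'name':u'DR restore Test Linux',
   u'openstack':{
      u'instances':[
         {
            u'name':u'Test-Linux-1',
            u'availability_zone':u'nova',
            u'nics':[
            ],
            u'vdisks':[
               {
                  u'id':u'0027b140-a427-46cb-9ccf-7895c7624493',
                  u'new_volume_type':u'<new volume type name>',
                  u'availability_zone':u'nova'
               }
            ],
            u'flavor':{
               u'ram':512,
               u'ephemeral':0,
               u'vcpus':1,
               u'swap':u'',
               u'disk':1,
               u'id':u'<id of the flavor to use>'
            },
            u'include':True,
            u'id':u'38b620f1-24ae-41d7-b0ab-85ffc2d7958b'
         },
         {
            u'name':u'Test-Linux-2',
            u'availability_zone':u'nova',
            u'nics':[
            ],
            u'vdisks':[
               {
                  u'id':u'2a7f9e78-7778-4452-af5b-8e2fa43853bd',
                  u'new_volume_type':u'<new volume type name>',
                  u'availability_zone':u'nova'
               }
            ],
            u'flavor':{
               u'ram':512,
               u'ephemeral':0,
               u'vcpus':1,
               u'swap':u'',
               u'disk':1,
               u'id':u'<id of the flavor to use>'
            },
            u'include':True,
            u'id':u'3fd869b2-16bd-4423-b389-18d19d37c8e0'
         }
      ],
      u'restore_topology':True,
      u'networks_mapping':{
         u'networks':[
         ]
      }
   }
}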

Run the selective restore

To do the actual restore use the following command:

# workloadmgr snapshot-selective-restore --filename restore.json {snapshot id}
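
For the example snapshot used throughout this runbook, the call would be:

# workloadmgr snapshot-selective-restore --filename restore.json 7e39e544-537d-4417-853d-11463e7396f9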

Verify the restore

To verify the success of the restore from a Trilio perspective the restore status is checked.

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr restore-list --snapshot_id 5928554d-a882-4881-9a5c-90e834c071af

+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+
|         Created At         |       Name       |                  ID                  |             Snapshot ID              |   Size   |   Status  |
+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+
| 2019-09-24T12:44:38.000000 | OneClick Restore | 5b4216d0-4bed-460f-8501-1589e7b45e01 | 5928554d-a882-4881-9a5c-90e834c071af | 41126400 | available |
+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr restore-show 5b4216d0-4bed-460f-8501-1589e7b45e01
+------------------+------------------------------------------------------------------------------------------------------+
| Property         | Value                                                                                                |
+------------------+------------------------------------------------------------------------------------------------------+
| created_at       | 2019-09-24T12:44:38.000000                                                                           |
| description      | -                                                                                                    |
| error_msg        | None                                                                                                 |
| finished_at      | 2019-09-24T12:46:07.000000                                                                           |
| host             | Upstream2                                                                                            |
| id               | 5b4216d0-4bed-460f-8501-1589e7b45e01                                                                 |
| instances        | [{"status": "available", "id": "b8506f04-1b99-4ca8-839b-6f5d2c20d9aa", "name": "temp", "metadata":   |
|                  | {"instance_id": "c014a938-903d-43db-bfbb-ea4998ff1a0f", "production": "1", "config_drive": ""}}]     |
| name             | OneClick Restore                                                                                     |
| progress_msg     | Restore from snapshot is complete                                                                    |
| progress_percent | 100                                                                                                  |
| project_id       | 8e16700ae3614da4ba80a4e57d60cdb9                                                                     |
| restore_options  | {"description": "-", "oneclickrestore": true, "restore_type": "oneclick", "openstack": {"instances": |
|                  | [{"availability_zone": "US-West", "id": "c014a938-903d-43db-bfbb-ea4998ff1a0f", "name": "temp"}]},   |
|                  | "type": "openstack", "name": "OneClick Restore"}                                                     |
| restore_type     | restore                                                                                              |
| size             | 41126400                                                                                             |
| snapshot_id      | 5928554d-a882-4881-9a5c-90e834c071af                                                                 |
| status           | available                                                                                            |
| time_taken       | 89                                                                                                   |
| updated_at       | 2019-09-24T12:44:38.000000                                                                           |
| uploaded_size    | 41126400                                                                                             |
| user_id          | d5fbd79f4e834f51bfec08be6d3b2ff2                                                                     |
| warning_msg      | None                                                                                                 |
| workload_id      | 02b1aca2-c51a-454b-8c0f-99966314165e                                                                 |
+------------------+------------------------------------------------------------------------------------------------------+

Clean up

After the Disaster Recovery Process has been successfully completed, it is recommended to bring the TVM installation back into its original state to be ready for the next DR process.

Delete the workload

Delete the workload that got restored.

# workloadmgr workload-delete <workload_id>
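
For the workload restored in this runbook, the command would be:

# workloadmgr workload-delete ac9cae9b-5e1b-4899-930c-6aa0600a2105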

Remove the database entry

The Trilio database follows the Openstack standard of not deleting any database entries upon deletion of the cloud object. Any Workload, Snapshot or Restore that gets deleted is only marked as deleted.

To get the Trilio installation ready for another disaster recovery, it is necessary to completely delete the database entries of the Workloads that have been restored.

Trilio does provide and maintain a script to safely delete workload entries and all connected entities from the Trilio database.

This script can be found here: https://github.com/trilioData/solutions/tree/master/openstack/CleanWlmDatabase

Remove the admin user from the project

After all restores for the target project have been completed, it is recommended to remove the used admin user from the project again.

# source {customer admin rc file}  
# openstack role remove Admin --user <my_admin_user> --user-domain <admin_domain> --domain <target_domain>  
# openstack role remove Admin --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain>  
# openstack role remove <Backup Trustee Role> --user <my_admin_user> --user-domain <admin_domain> --project <destination_project> --project-domain <target_domain>

Disaster Recovery of a complete cloud

This scenario covers the Disaster Recovery of a full cloud. It is assumed that the source cloud is down or lost completely. To do the disaster recovery, the following high-level process needs to be followed:

  1. Reconfigure the Target Trilio installation

  2. Make the right Mount-Paths available

  3. Reassign the Workload

  4. Restore the Workload

  5. Reconfigure the Target Trilio installation back to the original one

  6. Clean up

Reconfigure the Target Trilio installation

Before the Disaster Recovery Process can start, it is necessary to make the backups to be restored available to the Trilio installation. The following steps need to be done to completely reconfigure the Trilio installation.

During the reconfiguration process, all backups of the Target Region are on hold, and it is not recommended to create new Backup Jobs until the Disaster Recovery Process has finished and the original Trilio configuration has been restored.

Add NFS B2 to the Trilio Appliance Cluster

To add NFS B2 to the Trilio Appliance cluster, the Trilio Appliance can either be fully reconfigured to use both NFS Volumes, or the configuration file can be edited and all services restarted. This procedure describes how to edit the conf file and restart the services. It needs to be repeated on every Trilio Appliance.

Edit the workloadmgr.conf

# vi /etc/workloadmgr/workloadmgr.conf

Look for the line defining the NFS mounts

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>

Add NFS B2 to that line as a comma-separated list. A space after the comma is not necessary, but can be set.

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>,<NFS_B2-IP/NFS_B2-FQDN>:/<VOL-B2-Path>

Write and close the workloadmgr.conf

Restart the wlm-workloads service

# systemctl restart wlm-workloads
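
The same change can also be scripted instead of editing the file interactively. The following is a sketch only; it assumes a single vault_storage_nfs_export line in the file, and the placeholders need to be replaced with the real NFS B2 export.

# sed -i '/^vault_storage_nfs_export/ s|$|,<NFS_B2-IP>:/<VOL-B2-Path>|' /etc/workloadmgr/workloadmgr.conf
# systemctl restart wlm-workloads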

Add NFS B2 to the Trilio Datamovers

Trilio integrates natively with the Openstack deployment tools. When using the Red Hat director or JuJu charms, it is recommended to adapt the environment files for these orchestrators and update the Datamovers through them.

To add NFS B2 to the Trilio Datamovers manually, the tvault-contego.conf file needs to be edited and the service restarted.

Edit the tvault-contego.conf

# vi /etc/tvault-contego/tvault-contego.conf

Look for the line defining the NFS mounts

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>

Add NFS B2 to that line as a comma-separated list. A space after the comma is not necessary, but can be set.

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>,<NFS_B2-IP/NFS_B2-FQDN>:/<VOL-B2-Path>

Write and close the tvault-contego.conf

Restart the tvault-contego service

# systemctl restart tvault-contego
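
When the Datamover configuration is maintained manually on the compute nodes, the edit and restart can be rolled out in one pass. The following loop is a sketch only; it assumes a hypothetical compute_nodes.txt host list, passwordless SSH access as root, and a single vault_storage_nfs_export line per node.

# Append NFS B2 to the Datamover configuration and restart the service on every compute node
for node in $(cat compute_nodes.txt); do
  ssh root@"${node}" "sed -i '/^vault_storage_nfs_export/ s|$|,<NFS_B2-IP>:/<VOL-B2-Path>|' /etc/tvault-contego/tvault-contego.conf"
  ssh root@"${node}" "systemctl restart tvault-contego"
done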

Make the Mount-Paths available

Trilio backups are using qcow2 backing files, which make every incremental backup a full synthetic backup. These backing files can be made visible using the qemu-img tool.

# qemu-img info bd57ec9b-c4ac-4a37-a4fd-5c9aa002c778
image: bd57ec9b-c4ac-4a37-a4fd-5c9aa002c778
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 516K
cluster_size: 65536

backing file: /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=/workload_ac9cae9b-5e1b-4899-930c-6aa0600a2105/snapshot_1415095d-c047-400b-8b05-c88e57011263/vm_id_38b620f1-24ae-41d7-b0ab-85ffc2d7958b/vm_res_id_d4ab3431-5ce3-4a8f-a90b-07606e2ffa33_vda/7c39eb6a-6e42-418e-8690-b6368ecaa7bb
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

The MTAuMTAuMi4yMDovdXBzdHJlYW0= part of the backing file path is the base64 hash value, which is calculated for each provided NFS-Share during the configuration of a Trilio installation.

This hash value is calculated based on the provided NFS-Share path (<NFS_IP>:/<path>). If even one character differs between the provided NFS-Share paths, a completely different hash value is generated.

Workloads that have been moved between NFS-Shares require that their incremental backups can follow the same path as on the original Source Cloud. To achieve this, it is necessary to create the mount path on all compute nodes of the Target Cloud.

Afterwards, a bind mount is used to make the workload's data accessible over both the old and the new mount path. The following example shows how to identify the necessary mount points and create the bind mount.

Identify the base64 hash values

The used hash values can be calculated using the base64 tool in any Linux distribution.

# echo -n 10.10.2.20:/NFS_A1 | base64
MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

# echo -n 10.20.3.22:/NFS_B2 | base64
MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0

Create and bind the paths

Based on the identified base64 hash values the following paths are required on each Compute node.

/var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

and

/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0

In the scenario of this runbook, the workload is coming from the NFS_A1 NFS-Share, which means the mount path of that NFS-Share needs to be created and bound on the Target Cloud.

# mkdir /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl
# mount --bind /var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0/ /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl

To keep the desired mount past a reboot, it is recommended to edit the fstab of all compute nodes accordingly.

# vi /etc/fstab
/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0/    /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl    none    bind    0 0
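
As in the single-Workload scenario, the path and the bind mount are needed on every compute node and can be rolled out in one pass. The following loop is a sketch only; it assumes a hypothetical compute_nodes.txt host list and passwordless SSH access as root.

# Create the old mount path, bind the new path to it and persist the bind mount on every compute node
for node in $(cat compute_nodes.txt); do
  ssh root@"${node}" "mkdir -p /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl"
  ssh root@"${node}" "mount --bind /var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl"
  ssh root@"${node}" "echo '/var/triliovault-mounts/MTAuMjAuMy4yMjovdXBzdHJlYW1fdGFyZ2V0 /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW1fc291cmNl none bind 0 0' >> /etc/fstab"
done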

Reassign the workload

Trilio workloads have clear ownership. When a workload is moved to a different cloud it is necessary to change the ownership. The ownership can only be changed by Openstack administrators.

Add admin-user to required domains and projects

To fulfill the required tasks, a user with the admin role is used. This user will be used until the workload has been restored. Therefore, it is necessary to provide this user access to the desired Target Project on the Target Cloud.

# source {customer admin rc file}  
# openstack role add Admin --user <my_admin_user> --user-domain <admin_domain> --domain <target_domain>  
# openstack role add Admin --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain>  
# openstack role add <Backup Trustee Role> --user <my_admin_user> --user-domain <admin_domain> --project <destination_project> --project-domain <target_domain>

Discover orphaned Workloads from NFS-Storage of Target Cloud

Each Trilio installation maintains a database of the workloads it knows about. Workloads that are not maintained by a specific Trilio installation are, from the perspective of that installation, orphaned workloads. An orphaned workload is a workload that is accessible on the NFS-Share but not assigned to any existing project in the cloud that the Trilio installation is protecting.

# workloadmgr workload-get-orphaned-workloads-list --migrate_cloud True    
+------------+--------------------------------------+----------------------------------+----------------------------------+  
|     Name   |                  ID                  |            Project ID            |  User ID                         |  
+------------+--------------------------------------+----------------------------------+----------------------------------+  
| Workload_1 | 6639525d-736a-40c5-8133-5caaddaaa8e9 | 4224d3acfd394cc08228cc8072861a35 |  329880dedb4cd357579a3279835f392 |  
| Workload_2 | 904e72f7-27bb-4235-9b31-13a636eb9c95 | 637a9ce3fd0d404cabf1a776696c9c04 |  329880dedb4cd357579a3279835f392 |  
+------------+--------------------------------------+----------------------------------+----------------------------------+

List available projects on Target Cloud in the Target Domain

The identified orphaned workloads need to be assigned to their new projects. The following provides the list of all available projects viewable by the used admin-user in the target_domain.

# openstack project list --domain <target_domain>  
+----------------------------------+----------+  
| ID                               | Name     |  
+----------------------------------+----------+  
| 01fca51462a44bfa821130dce9baac1a | project1 |  
| 33b4db1099ff4a65a4c1f69a14f932ee | project2 |  
| 9139e694eb984a4a979b5ae8feb955af | project3 |  
+----------------------------------+----------+ 

List available users on the Target Cloud in the Target Project that have the right backup trustee role

To allow project owners to work with the workloads as well, the workloads get assigned to a user with the backup trustee role that exists in the target project.

# openstack role assignment list --project <target_project> --project-domain <target_domain> --role <backup_trustee_role>
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+
| Role                             | User                             | Group | Project                          | Domain | Inherited |
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+
| 9fe2ff9ee4384b1894a90878d3e92bab | 72e65c264a694272928f5d84b73fe9ce |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
| 9fe2ff9ee4384b1894a90878d3e92bab | d5fbd79f4e834f51bfec08be6d3b2ff2 |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
| 9fe2ff9ee4384b1894a90878d3e92bab | f5b1d071816742fba6287d2c8ffcd6c4 |       | 8e16700ae3614da4ba80a4e57d60cdb9 |        | False     |
+----------------------------------+----------------------------------+-------+----------------------------------+--------+-----------+

Reassign the workload to the target project

Now that all information has been gathered, the workload can be reassigned to the target project.

# workloadmgr workload-reassign-workloads --new_tenant_id {target_project_id} --user_id {target_user_id} --workload_ids {workload_id} --migrate_cloud True    
+-----------+--------------------------------------+----------------------------------+----------------------------------+  
|    Name   |                  ID                  |            Project ID            |  User ID                         |  
+-----------+--------------------------------------+----------------------------------+----------------------------------+  
| project1  | 904e72f7-27bb-4235-9b31-13a636eb9c95 | 4f2a91274ce9491481db795dcb10b04f | 3e05cac47338425d827193ba374749cc |  
+-----------+--------------------------------------+----------------------------------+----------------------------------+ 

Verify the workload is available at the desired target_project

After the workload has been assigned to the new project, it is recommended to verify that the workload is managed by the Target Trilio and is assigned to the right project and user.

# workloadmgr workload-show ac9cae9b-5e1b-4899-930c-6aa0600a2105
+-------------------+------------------------------------------------------------------------------------------------------+
| Property          | Value                                                                                                |
+-------------------+------------------------------------------------------------------------------------------------------+
| availability_zone | nova                                                                                                 |
| created_at        | 2019-04-18T02:19:39.000000                                                                           |
| description       | Test Linux VMs                                                                                       |
| error_msg         | None                                                                                                 |
| id                | ac9cae9b-5e1b-4899-930c-6aa0600a2105                                                                 |
| instances         | [{"id": "38b620f1-24ae-41d7-b0ab-85ffc2d7958b", "name": "Test-Linux-1"}, {"id":                      |
|                   | "3fd869b2-16bd-4423-b389-18d19d37c8e0", "name": "Test-Linux-2"}]                                     |
| interval          | None                                                                                                 |
| jobschedule       | True                                                                                                 |
| name              | Test Linux                                                                                           |
| project_id        | 2fc4e2180c2745629753305591aeb93b                                                                     |
| scheduler_trust   | None                                                                                                 |
| status            | available                                                                                            |
| storage_usage     | {"usage": 60555264, "full": {"usage": 44695552, "snap_count": 1}, "incremental": {"usage": 15859712, |
|                   | "snap_count": 13}}                                                                                   |
| updated_at        | 2019-11-15T02:32:43.000000                                                                           |
| user_id           | 72e65c264a694272928f5d84b73fe9ce                                                                     |
| workload_type_id  | f82ce76f-17fe-438b-aa37-7a023058e50d                                                                 |
+-------------------+------------------------------------------------------------------------------------------------------+

Restore the workload

The reassigned workload can be restored using Horizon following the procedure described here.

This runbook will continue with the CLI-only path.

Prepare the selective restore by getting the snapshot information

To be able to do the necessary selective restore a few pieces of information about the snapshot to be restored are required. The following process will provide all necessary information.

List all Snapshots of the workload to restore to identify the snapshot to restore

# workloadmgr snapshot-list --workload_id ac9cae9b-5e1b-4899-930c-6aa0600a2105 --all True

+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+
|         Created At         |     Name     |                  ID                  |             Workload ID              | Snapshot Type |   Status  |    Host   |
+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+
| 2019-11-02T02:30:02.000000 | jobscheduler | f5b8c3fd-c289-487d-9d50-fe27a6561d78 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |      full     | available | Upstream2 |
| 2019-11-03T02:30:02.000000 | jobscheduler | 7e39e544-537d-4417-853d-11463e7396f9 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |  incremental  | available | Upstream2 |
| 2019-11-04T02:30:02.000000 | jobscheduler | 0c086f3f-fa5d-425f-b07e-a1adcdcafea9 | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |  incremental  | available | Upstream2 |
+----------------------------+--------------+--------------------------------------+--------------------------------------+---------------+-----------+-----------+

Get Snapshot Details with network details for the desired snapshot

# workloadmgr snapshot-show --output networks 7e39e544-537d-4417-853d-11463e7396f9

+-------------------+--------------------------------------+
| Snapshot property | Value                                |
+-------------------+--------------------------------------+
| description       | None                                 |
| host              | Upstream2                            |
| id                | 7e39e544-537d-4417-853d-11463e7396f9 |
| name              | jobscheduler                         |
| progress_percent  | 100                                  |
| restore_size      | 44040192 Bytes or Approx (42.0MB)    |
| restores_info     |                                      |
| size              | 1310720 Bytes or Approx (1.2MB)      |
| snapshot_type     | incremental                          |
| status            | available                            |
| time_taken        | 154 Seconds                          |
| uploaded_size     | 1310720                              |
| workload_id       | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |
+-------------------+--------------------------------------+

+----------------+---------------------------------------------------------------------------------------------------------------------+
|   Instances    |                                                        Value                                                        |
+----------------+---------------------------------------------------------------------------------------------------------------------+
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-1                                                    |
|       ID       |                                         38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                        |
|                |                                                                                                                     |
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-2                                                    |
|       ID       |                                         3fd869b2-16bd-4423-b389-18d19d37c8e0                                        |
|                |                                                                                                                     |
+----------------+---------------------------------------------------------------------------------------------------------------------+

+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+
|   Networks  | Value                                                                                                                                        |
+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+
|  ip_address | 172.20.20.20                                                                                                                                 |
|    vm_id    | 38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                                                                                         |
|   network   | {u'subnet': {u'ip_version': 4, u'cidr': u'172.20.20.0/24', u'gateway_ip': u'172.20.20.1', u'id': u'3a756a89-d979-4cda-a7f3-dacad8594e44', 
u'name': u'Trilio Test'}, u'cidr': None, u'id': u'5f0e5d34-569d-42c9-97c2-df944f3924b1', u'name': u'Trilio_Test_Internal', u'network_type': u'neutron'}      |
| mac_address | fa:16:3e:74:58:bb                                                                                                                            |
|             |                                                                                                                                              |
|  ip_address | 172.20.20.13                                                                                                                                 |
|    vm_id    | 3fd869b2-16bd-4423-b389-18d19d37c8e0                                                                                                         |
|   network   | {u'subnet': {u'ip_version': 4, u'cidr': u'172.20.20.0/24', u'gateway_ip': u'172.20.20.1', u'id': u'3a756a89-d979-4cda-a7f3-dacad8594e44',
u'name': u'Trilio Test'}, u'cidr': None, u'id': u'5f0e5d34-569d-42c9-97c2-df944f3924b1', u'name': u'Trilio_Test_Internal', u'network_type': u'neutron'}      |
| mac_address | fa:16:3e:6b:46:ae                                                                                                                            |
+-------------+----------------------------------------------------------------------------------------------------------------------------------------------+

Get Snapshot Details with disk details for the desired Snapshot

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr snapshot-show --output disks 7e39e544-537d-4417-853d-11463e7396f9

+-------------------+--------------------------------------+
| Snapshot property | Value                                |
+-------------------+--------------------------------------+
| description       | None                                 |
| host              | Upstream2                            |
| id                | 7e39e544-537d-4417-853d-11463e7396f9 |
| name              | jobscheduler                         |
| progress_percent  | 100                                  |
| restore_size      | 44040192 Bytes or Approx (42.0MB)    |
| restores_info     |                                      |
| size              | 1310720 Bytes or Approx (1.2MB)      |
| snapshot_type     | incremental                          |
| status            | available                            |
| time_taken        | 154 Seconds                          |
| uploaded_size     | 1310720                              |
| workload_id       | ac9cae9b-5e1b-4899-930c-6aa0600a2105 |
+-------------------+--------------------------------------+

+----------------+---------------------------------------------------------------------------------------------------------------------+
|   Instances    |                                                        Value                                                        |
+----------------+---------------------------------------------------------------------------------------------------------------------+
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-1                                                    |
|       ID       |                                         38b620f1-24ae-41d7-b0ab-85ffc2d7958b                                        |
|                |                                                                                                                     |
|     Status     |                                                      available                                                      |
| Security Group | [{u'name': u'Test', u'security_group_type': u'neutron'}, {u'name': u'default', u'security_group_type': u'neutron'}] |
|     Flavor     |                         {u'ephemeral': u'0', u'vcpus': u'1', u'disk': u'1', u'ram': u'512'}                         |
|      Name      |                                                     Test-Linux-2                                                    |
|       ID       |                                         3fd869b2-16bd-4423-b389-18d19d37c8e0                                        |
|                |                                                                                                                     |
+----------------+---------------------------------------------------------------------------------------------------------------------+

+-------------------+--------------------------------------------------+
|       Vdisks      |                      Value                       |
+-------------------+--------------------------------------------------+
| volume_mountpoint |                     /dev/vda                     |
|    restore_size   |                     22020096                     |
|    resource_id    |       ebc2fdd0-3c4d-4548-b92d-0e16734b5d9a       |
|    volume_name    |       0027b140-a427-46cb-9ccf-7895c7624493       |
|    volume_type    |                       None                       |
|       label       |                       None                       |
|    volume_size    |                        1                         |
|     volume_id     |       0027b140-a427-46cb-9ccf-7895c7624493       |
| availability_zone |                       nova                       |
|       vm_id       |       38b620f1-24ae-41d7-b0ab-85ffc2d7958b       |
|      metadata     | {u'readonly': u'False', u'attached_mode': u'rw'} |
|                   |                                                  |
| volume_mountpoint |                     /dev/vda                     |
|    restore_size   |                     22020096                     |
|    resource_id    |       8007ed89-6a86-447e-badb-e49f1e92f57a       |
|    volume_name    |       2a7f9e78-7778-4452-af5b-8e2fa43853bd       |
|    volume_type    |                       None                       |
|       label       |                       None                       |
|    volume_size    |                        1                         |
|     volume_id     |       2a7f9e78-7778-4452-af5b-8e2fa43853bd       |
| availability_zone |                       nova                       |
|       vm_id       |       3fd869b2-16bd-4423-b389-18d19d37c8e0       |
|      metadata     | {u'readonly': u'False', u'attached_mode': u'rw'} |
|                   |                                                  |
+-------------------+--------------------------------------------------+

Prepare the selective restore by creating the restore.json file

The selective restore uses a restore.json file for the CLI command. This restore.json file needs to be adjusted according to the desired restore.

{
   u'description':u'<description of the restore>',
   u'oneclickrestore':False,
   u'restore_type':u'selective',
   u'type':u'openstack',
   u'name':u'<name of the restore>',
   u'openstack':{
      u'instances':[
         {
            u'name':u'<name instance 1>',
            u'availability_zone':u'<AZ instance 1>',
            u'nics':[ #####Leave empty for network topology restore
            ],
            u'vdisks':[
               {
                  u'id':u'<old disk id>',
                  u'new_volume_type':u'<new volume type name>',
                  u'availability_zone':u'<new cinder volume AZ>'
               }
            ],
            u'flavor':{
               u'ram':<RAM in MB>,
               u'ephemeral':<GB of ephemeral disk>,
               u'vcpus':<# vCPUs>,
               u'swap':u'<GB of Swap disk>',
               u'disk':<GB of boot disk>,
               u'id':u'<id of the flavor to use>'
            },
            u'include':<True/False>,
            u'id':u'<old id of the instance>'
         } #####Repeat for each instance in the snapshot
      ],
      u'restore_topology':<True/False>,
      u'networks_mapping':{
         u'networks':[ #####Leave empty for network topology restore
            
         ]
      }
   }
}
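
For illustration only, a restore.json filled in with the values from the snapshot shown above could look like the following sketch. The instance ID, disk ID and availability zone are taken from the snapshot-show output; the restore name, description, volume type and flavor ID are placeholders and must be replaced with values that exist in "Openstack Cloud B". The empty nics and networks lists together with restore_topology set to True request a network topology restore.

{
   u'description':u'<e.g. DR restore of workload ac9cae9b>',
   u'oneclickrestore':False,
   u'restore_type':u'selective',
   u'type':u'openstack',
   u'name':u'<e.g. DR-Restore-Test-Linux>',
   u'openstack':{
      u'instances':[
         {
            u'name':u'Test-Linux-1',
            u'availability_zone':u'nova',
            u'nics':[
            ],
            u'vdisks':[
               {
                  u'id':u'0027b140-a427-46cb-9ccf-7895c7624493',
                  u'new_volume_type':u'<volume type available in Cloud B>',
                  u'availability_zone':u'nova'
               }
            ],
            u'flavor':{
               u'ram':512,
               u'ephemeral':0,
               u'vcpus':1,
               u'swap':u'',
               u'disk':1,
               u'id':u'<id of a matching flavor in Cloud B>'
            },
            u'include':True,
            u'id':u'38b620f1-24ae-41d7-b0ab-85ffc2d7958b'
         } #####Repeat for Test-Linux-2 (id 3fd869b2-16bd-4423-b389-18d19d37c8e0, volume 2a7f9e78-7778-4452-af5b-8e2fa43853bd)
      ],
      u'restore_topology':True,
      u'networks_mapping':{
         u'networks':[
         ]
      }
   }
}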

Run the selective restore

To do the actual restore use the following command:

# workloadmgr snapshot-selective-restore --filename restore.json {snapshot id}
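
For the snapshot used throughout this example the call would be (snapshot ID taken from the snapshot-show output above):

# workloadmgr snapshot-selective-restore --filename restore.json 7e39e544-537d-4417-853d-11463e7396f9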

Verify the restore

To verify the success of the restore from a Trilio perspective, check the restore status.

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr restore-list --snapshot_id 5928554d-a882-4881-9a5c-90e834c071af

+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+
|         Created At         |       Name       |                  ID                  |             Snapshot ID              |   Size   |   Status  |
+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+
| 2019-09-24T12:44:38.000000 | OneClick Restore | 5b4216d0-4bed-460f-8501-1589e7b45e01 | 5928554d-a882-4881-9a5c-90e834c071af | 41126400 | available |
+----------------------------+------------------+--------------------------------------+--------------------------------------+----------+-----------+

[root@upstreamcontroller ~(keystone_admin)]# workloadmgr restore-show 5b4216d0-4bed-460f-8501-1589e7b45e01
+------------------+------------------------------------------------------------------------------------------------------+
| Property         | Value                                                                                                |
+------------------+------------------------------------------------------------------------------------------------------+
| created_at       | 2019-09-24T12:44:38.000000                                                                           |
| description      | -                                                                                                    |
| error_msg        | None                                                                                                 |
| finished_at      | 2019-09-24T12:46:07.000000                                                                           |
| host             | Upstream2                                                                                            |
| id               | 5b4216d0-4bed-460f-8501-1589e7b45e01                                                                 |
| instances        | [{"status": "available", "id": "b8506f04-1b99-4ca8-839b-6f5d2c20d9aa", "name": "temp", "metadata":   |
|                  | {"instance_id": "c014a938-903d-43db-bfbb-ea4998ff1a0f", "production": "1", "config_drive": ""}}]     |
| name             | OneClick Restore                                                                                     |
| progress_msg     | Restore from snapshot is complete                                                                    |
| progress_percent | 100                                                                                                  |
| project_id       | 8e16700ae3614da4ba80a4e57d60cdb9                                                                     |
| restore_options  | {"description": "-", "oneclickrestore": true, "restore_type": "oneclick", "openstack": {"instances": |
|                  | [{"availability_zone": "US-West", "id": "c014a938-903d-43db-bfbb-ea4998ff1a0f", "name": "temp"}]},   |
|                  | "type": "openstack", "name": "OneClick Restore"}                                                     |
| restore_type     | restore                                                                                              |
| size             | 41126400                                                                                             |
| snapshot_id      | 5928554d-a882-4881-9a5c-90e834c071af                                                                 |
| status           | available                                                                                            |
| time_taken       | 89                                                                                                   |
| updated_at       | 2019-09-24T12:44:38.000000                                                                           |
| uploaded_size    | 41126400                                                                                             |
| user_id          | d5fbd79f4e834f51bfec08be6d3b2ff2                                                                     |
| warning_msg      | None                                                                                                 |
| workload_id      | 02b1aca2-c51a-454b-8c0f-99966314165e                                                                 |
+------------------+------------------------------------------------------------------------------------------------------+
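
Larger restores take some time to finish. Until the status switches from restoring to available (or error), the restore list can simply be polled. A minimal sketch, using the watch utility and the selective-restore snapshot ID from this example; the spawned instances can additionally be confirmed from the Openstack side with the standard CLI (project name is a placeholder):

# watch -n 30 "workloadmgr restore-list --snapshot_id 7e39e544-537d-4417-853d-11463e7396f9"
# openstack server list --project <target_project>    <<< confirm the restored instances exist and are ACTIVE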

Reconfigure the Target Trilio installation back to the original one

After the Disaster Recovery Process has finished, it is necessary to return the Trilio installation to its original configuration. The following steps need to be done to completely reconfigure the Trilio installation.

During the reconfiguration process, all backups of the Target Region will be on hold. It is not recommended to create new Backup Jobs until the Disaster Recovery Process has finished and the original Trilio configuration has been restored.

Delete NFS B2 from the Trilio Appliance Cluster

To remove NFS B2 from the Trilio Appliance cluster, Trilio can either be fully reconfigured to use only the original NFS Volume, or the configuration file can be edited and all services restarted. This procedure describes how to edit the conf file and restart the services. This needs to be repeated on every Trilio Appliance.

Edit the workloadmgr.conf

# vi /etc/workloadmgr/workloadmgr.conf

Look for the line defining the NFS mounts

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>,<NFS_B2-IP/NFS_B2-FQDN>:/<VOL-B2-Path>

Delete NFS B2 from the comma-separated list

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>

Write and close the workloadmgr.conf

Restart the wlm-workloads service

# systemctl restart wlm-workloads
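
To confirm that the change is active on the appliance, the configuration value and the mount list can be checked after the restart. The NFS B2 export should no longer be listed or mounted (the IP/FQDN below is the same placeholder as above):

# grep vault_storage_nfs_export /etc/workloadmgr/workloadmgr.conf
# mount | grep "<NFS_B2-IP/NFS_B2-FQDN>"    <<< expected to return no output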

Delete NFS B2 from the Trilio Datamovers

Trilio integrates natively into the Openstack deployment tools. When using the Red Hat director or JuJu charms, it is recommended to adapt the environment files for these orchestrators and update the Datamovers through them.

To remove NFS B2 from the Trilio Datamovers manually, the tvault-contego.conf file needs to be edited and the service restarted.

Edit the tvault-contego.conf

# vi /etc/tvault-contego/tvault-contego.conf

Look for the line defining the NFS mounts

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>,<NFS_B2-IP/NFS_B2-FQDN>:/<VOL-B2-Path>

Delete NFS B2 from the comma-separated list

vault_storage_nfs_export = <NFS_B1-IP/NFS_B1-FQDN>:/<VOL-B1-Path>

Write and close the tvault-contego.conf

Restart the tvault-contego service

# systemctl restart tvault-contego
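
Since this edit has to be repeated on every compute node, the change can also be scripted. A minimal sketch, assuming the placeholders are replaced with the real export values and the NFS B2 export is the second element of the list:

# sed -i 's|,<NFS_B2-IP/NFS_B2-FQDN>:/<VOL-B2-Path>||' /etc/tvault-contego/tvault-contego.conf
# systemctl restart tvault-contego
# grep vault_storage_nfs_export /etc/tvault-contego/tvault-contego.conf    <<< should now list only the NFS B1 export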

Clean up

After the Disaster Recovery Process has been successfully completed and the Trilio installation has been reconfigured to its original state, it is recommended to do the following additional steps to be ready for the next Disaster Recovery process.

Remove the database entry

The Trilio database follows the Openstack standard of not deleting any database entries upon deletion of the cloud object. Any Workload, Snapshot, or Restore that gets deleted is only marked as deleted.

To make the Trilio installation ready for another disaster recovery, it is necessary to completely delete the database entries of the Workloads that have been restored.

Trilio provides and maintains a script to safely delete workload entries and all connected entities from the Trilio database.

This script can be found here: https://github.com/trilioData/solutions/tree/master/openstack/CleanWlmDatabase
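
The repository can be cloned onto a host with access to the Trilio database; review the documentation shipped with the script for its exact invocation and required database credentials before running it:

# git clone https://github.com/trilioData/solutions.git
# cd solutions/openstack/CleanWlmDatabase
# ls    <<< review the contained documentation and script before executing anything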

Remove the admin user from the project

After all restores for the target project have been completed, it is recommended to remove the admin user from the project again.

# source {customer admin rc file}  
# openstack role remove Admin --user <my_admin_user> --user-domain <admin_domain> --domain <target_domain>  
# openstack role remove Admin --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain>  
# openstack role remove <Backup Trustee Role> --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain>
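
To confirm that no role assignments for the admin user remain on the target project, the assignments can be listed with the standard Openstack CLI:

# openstack role assignment list --user <my_admin_user> --user-domain <admin_domain> --project <target_project> --project-domain <target_domain> --names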
