Continuous Restore

This document provides step-by-step instructions for setting up continuous restore for an application. We assume the user already has two running clusters with T4K installed on each; let's call them TVK1 and TVK2. The TVK1 ID is e5e22c8a0-de58-4b20-8962-7261755b1173 and the TVK2 ID is 23d8f984-d7e8-4c52-b3c2-e6f9af5bbf13. The event target is a backup target shared between TVK1 and TVK2. It can be either NFS or S3; the choice has no bearing on this discussion. In this example, we use an S3-based event target.

Create an Event Target

Create an event target on both clusters.

apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  annotations:
    trilio.io/event-target: "true"
  name: s3-event-target
  namespace: default
spec:
  objectStoreCredentials:
    bucketName: aj-test-s3
    credentialSecret:
      name: s3-cred-secret
      namespace: default
    region: us-east-1
  type: ObjectStore
  vendor: AWS
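
The manifest above references a credential secret named s3-cred-secret. A minimal sketch of that secret is shown below; the accessKey/secretKey data keys are an assumption based on typical T4K S3 target secrets, so verify the expected key names against the target documentation for your T4K version.

apiVersion: v1
kind: Secret
metadata:
  name: s3-cred-secret
  namespace: default
type: Opaque
stringData:
  # Assumed key names; confirm against your T4K version.
  accessKey: <AWS_ACCESS_KEY_ID>
  secretKey: <AWS_SECRET_ACCESS_KEY>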

T4K 3.0.0 introduces a new flag in the target creation wizard in the UI. You can set this flag to True when creating an event target.

Trilio recommends creating the event target in the default namespace.

When the event target resource is created in TVK1 and TVK2, the target controller automatically spawns two services, a syncher and a service manager, in the default namespace.

The event target controller also creates a new service account in the default namespace.

Make sure pods and service accounts are created on both clusters.
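
You can confirm this with kubectl on each cluster; the exact pod and service-account names vary by installation, since they carry generated suffixes.

# Run on both TVK1 and TVK2. Look for the syncher and service-manager
# pods and the service account created by the target controller.
kubectl get pods -n default
kubectl get serviceaccounts -n default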

The syncher service populates the state information onto the event target, which lets the clusters discover each other. The /service-info/heartbeats directory on the event target should look as shown below.
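
Because this example uses an S3 event target, the heartbeat entries can be listed with the AWS CLI. This assumes the event target data sits at the root of the aj-test-s3 bucket; the individual file names are installation specific and are not reproduced here.

# List the heartbeat entries written by each cluster's syncher service.
aws s3 ls s3://aj-test-s3/service-info/heartbeats/ --recursive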

The event target object on each cluster lists the clusters it has discovered on the event target. For example:
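
You can inspect the discovered clusters from either side by dumping the Target object; the exact layout of the status section depends on the T4K version, so the output is not reproduced here.

# Show the event target, including the status populated by the syncher.
kubectl get targets.triliovault.trilio.io s3-event-target -n default -o yaml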

Create a BackupPlan

Create a new backup plan with the continuous restore feature enabled.

In the above example, the user chose the remote cluster with ID 23d8f984-d7e8-4c52-b3c2-e6f9af5bbf13 and ap as the continuous restore policy. A sketch of the policy YAML is shown below.
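
This sketch is illustrative only: it uses T4K's standard retention policy schema (spec.type: Retention with retentionConfig.latest), which is an assumption for the continuous restore case, so confirm the exact fields against your installation.

apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
  name: ap
  namespace: default
spec:
  type: Retention
  retentionConfig:
    # Keep the three most recent consistent sets on the remote cluster.
    latest: 3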

The above policy states that T4K maintains three consistent sets on the remote cluster.

The syncher service on the local cluster creates the following entry.

The manager on the remote cluster recognizes, based on the data on the event target, that the source cluster e5e22c8a0-de58-4b20-8962-7261755b1173 is referring to it as its backup (remote) cluster. It now spawns a watcher and a continuous restore service on the remote cluster.

Similarly, the manager spawns a continuous restore responder and a watcher service on the local cluster.
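
As with the syncher and service manager, these pods can be verified with kubectl on each cluster; pod names carry generated suffixes and may differ between T4K versions.

# Remote cluster: expect watcher and continuous restore pods.
# Local cluster: expect continuous restore responder and watcher pods.
kubectl get pods -n default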

Once the continuous restore service and the continuous restore responder reconcile the backup plan between the clusters, the backup plan in the UI is shown as follows:

Once a new backup is successfully generated on the source cluster, a new consistent set is automatically created on the remote cluster, as shown below:

Once the consistent set is successfully created, the backup summary on the source displays the consistent set information as shown below:

Users can restore from a consistent set. The current release of the UI only restores from the latest consistent set; to restore from a specific consistent set, use the CLI.

Click through the restore wizard.

The restore operation can be monitored from the CLI.
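
For example, assuming the restore was created in the default namespace (adjust to the namespace selected in the wizard):

# Watch the Restore resource until it reports completion.
kubectl get restores -n default -w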

The application pod is running as shown here.

Polling Interval

The polling interval is the time interval used to start synchronising a backup to the destination cluster as a consistent set. It is used by both the Manager and the Watcher. Users can change the polling intervals through the TVM Custom Resource.

To add or update the polling intervals, add the "pollingIntervals" field under "helmValues" in the TVM spec. In the case of an OCP installation, if the "helmValues" section is not present in the TVM spec, it can be added manually as shown below:

spec:
  helmValues:
    pollingIntervals:
      manager: 300s
      watcher: 300s
  applicationScope: Cluster
  componentConfiguration:
    ingress-controller:
      enabled: false
  dataJobResources:
    requests:
      cpu: 100m
      memory: 800Mi
  logLevel: Info
  metadataJobResources:
    requests:
      cpu: 10m
      memory: 10Mi
  tvkInstanceName: ocp-instance

Once the user adds these intervals to the TVM, the same values are propagated to both the Manager and the Watcher. (Note: the Watcher pod is only visible when continuous restore is configured in a backup plan.)
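
One way to apply this change is to edit the TVM custom resource directly. The resource is typically exposed as triliovaultmanagers, and the name and namespace placeholders below are illustrative; confirm them with kubectl api-resources and kubectl get.

# Replace <tvm-name> and <install-namespace> with your TVM resource name
# and the namespace where T4K is installed.
kubectl edit triliovaultmanagers <tvm-name> -n <install-namespace>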

Warning: Using a small interval can lead to higher memory consumption and is not recommended in production environments.
