AWS RDS snapshots using T4K hooks

How to backup/restore AWS RDS using hooks

1. Backup/Restore of Application with external DB (AWS RDS)

Objective - Support backup of external databases (AWS RDS, Google Cloud SQL to begin with)

Summary - This use case supports backup and restore of an external cloud database that is used by applications running on a k8s cluster.

Pre-requisite -

  1. Applications on the k8s cluster are configured to use Cloud Database managed service

  2. All the required details for the cloud database connectivity are available such as database instance name, credentials

Assumptions -

  1. Applications on the k8s cluster are configured with specific Database instance

  2. User provides all the information required for Database backup (on-demand)

  3. All the requirements for database backup are provided such as user access for backup, zones/regions allowed for backups, allotted space, Database instance name/identifier, location etc.

  4. There is no additional mapping required post restore of the Database i.e. application should be able to connect to the Database using same credentials, instance name etc.

Workflow -

  1. User provides inputs for:
     a. List of applications configured to use external database
     b. The Database instance name/identifier
     c. User credentials for backup
     d. Zone/region allowed for backup
     e. Location for backup

  2. A “jump” pod is needed with Cloud and kubectl libraries which are required to connect to remote Cloud databases for backup/restore. This “jump” pod is created in the same namespace where the application to be backed up is running.

  3. As per the policy created for the application backup, the application backup is triggered.

  4. Along with application backup, an on-demand backup/snapshot will be created for the external database using API calls. This will be achieved by Backup hooks.

  5. The snapshot identifier of the Cloud Database is constructed from the BackupPlan and Backup names. This gives a unique identifier, and the user always has access to these details. It also maintains a link between the T4K backup and the Database snapshot.

  6. For restore, user provides inputs for:
     a. BackupPlan and Backup name for restore
     b. The Database instance name for restore
     c. User credentials for restore
     d. Zone/region allowed for restore
     e. VPC Security Group

  7. At the time of restore of the application, the backup/snapshot will be restored using the name/identifier. This will be achieved by Restore hooks.

  8. Post restore, the application connectivity with the external database will be established based on the database instance name/identifier and the user provided credentials.
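
The naming scheme in step 5 can be sketched as a small shell helper. This is only an illustration: the function names `make_snapshot_id` and `is_valid_snapshot_id` are hypothetical, and the `tvk-` prefix follows the example used in the hooks in this document. The validity check reflects the RDS naming rule for snapshot identifiers (1 to 255 letters, digits, or hyphens; first character a letter; no trailing hyphen; no two consecutive hyphens), so Backup or BackupPlan names that break this rule would need to be adjusted before use.

```shell
#!/bin/sh
# Build the RDS snapshot identifier from the T4K Backup and BackupPlan names.
make_snapshot_id() {
  printf 'tvk-%s-%s\n' "$1" "$2"
}

# Return 0 if the identifier satisfies the RDS naming rule, 1 otherwise.
is_valid_snapshot_id() {
  id=$1
  [ "${#id}" -ge 1 ] && [ "${#id}" -le 255 ] || return 1
  case $id in
    [!A-Za-z]*) return 1 ;;        # must start with a letter
    *-) return 1 ;;                # must not end with a hyphen
    *--*) return 1 ;;              # no two consecutive hyphens
    *[!A-Za-z0-9-]*) return 1 ;;   # letters, digits and hyphens only
  esac
  return 0
}

# Usage: make_snapshot_id demo-backup wp-hook-bkpplan
```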

Example with steps Followed -

In AWS RDS, create a database instance using MySQL Engine. A free tier instance can be selected. Create a new VPC Security Group. Update inbound and outbound rules to allow access.

Once the instance is created, note down the Username, Password and Endpoint which are needed to connect to the DB instance. Also, create a database for wordpress application in the Database instance.

Deploy wordpress application with AWS RDS database configured (replace host url, admin password, DB name)

$ helm install wordpress bitnami/wordpress -n triliowp \
>   --set mariadb.enabled=false \
>   --set externalDatabase.host=<ext-db-host-url>.rds.amazonaws.com \
>   --set externalDatabase.user=admin \
>   --set externalDatabase.password=<admin_password> \
>   --set externalDatabase.database=<db name> \
>   --set externalDatabase.port=3306

Check in the AWS RDS database that the wordpress tables are populated.

$ mysql -h <ext-db-host-url>.rds.amazonaws.com -P 3306 -u admin -p<admin_password>
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 55
Server version: 8.0.23 Source distribution

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| trilio             |
+--------------------+
5 rows in set (0.00 sec)

mysql> use trilio;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>
mysql> show tables;
+-----------------------+
| Tables_in_triliowp    |
+-----------------------+
| wp_commentmeta        |
| wp_comments           |
| wp_links              |
| wp_options            |
| wp_postmeta           |
| wp_posts              |
| wp_term_relationships |
| wp_term_taxonomy      |
| wp_termmeta           |
| wp_terms              |
| wp_usermeta           |
| wp_users              |
+-----------------------+
12 rows in set (0.00 sec)

mysql>
mysql> quit;
Bye
$
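
The same check can be scripted instead of run interactively. The sketch below assumes the `mysql` client is installed; the helper name `count_wp_tables` is hypothetical, and the host and database placeholders are the ones used in the commands above. It counts the table names that start with `wp_`, which for the listing above would be 12.

```shell
#!/bin/sh
# Count WordPress tables in a table listing read from stdin (one name per line).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
count_wp_tables() {
  grep -c '^wp_' || true
}

# Usage (-N drops the column header, -B prints plain rows, -p prompts for the password):
#   mysql -h <ext-db-host-url>.rds.amazonaws.com -P 3306 -u admin -p \
#     -N -B -e 'SHOW TABLES' '<db name>' | count_wp_tables
```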

Backup hooks are used to create a snapshot of the AWS RDS database. To create the snapshot, a pod with the AWS CLI is needed in the namespace. Using the AWS CLI, we can connect to RDS and create a snapshot.

Create the “jump” pod in the same namespace where wordpress is installed. Use the script below. It uses an image which has kubectl and the AWS CLI installed. We need to pass the AWS credentials and the kubeconfig to this image. AWS credentials and region are configured using “aws configure”.

$ cat create_awsrds_jumphost.sh
#!/bin/bash

## AWS RDS Credentials and kubeconfig files are needed
## Configure AWS using "aws configure" cmd as below
## $ aws configure
##   AWS Access Key ID [****************Y4TA]:
##   AWS Secret Access Key [****************KmKp]:
##   Default region name [None]: us-east-2
##   Default output format [None]:
##
## Copy current kubeconfig to /root/.kube/config

# Create configmap for kubeconfig
if [ -e /root/.kube/config ]
then
  echo "Using /root/.kube/config as valid kubeconfig"
else
  echo "File /root/.kube/config doesn't exist, please copy valid kubeconfig to /root/.kube/config"
  exit 1
fi

# Cleanup existing configmap, if any
kubectl delete configmap kubeconf -n triliowp --ignore-not-found
kubectl delete configmap awsconf -n triliowp --ignore-not-found

kubectl create configmap kubeconf --from-file=/root/.kube/config -n triliowp
if (kubectl get configmap kubeconf -n triliowp 2>/dev/null)
then
  echo "Configmap for kubeconfig created successfully"
else
  echo "Configmap for kubeconfig is not created, exiting..."
  exit 1
fi

# Create configmap for aws rds credentials
if [[ -e /root/.aws/credentials && -e /root/.aws/config ]]
then
  echo "Using /root/.aws/credentials and /root/.aws/config"
else
  echo "Files /root/.aws/credentials and /root/.aws/config don't exist"
  echo "Please configure AWS credentials and Region using aws configure"
  exit 1
fi

kubectl create configmap awsconf --from-file=/root/.aws/ -n triliowp
if (kubectl get configmap awsconf -n triliowp 2>/dev/null)
then
  echo "Configmap for AWS config created successfully"
else
  echo "Configmap for AWS config is not created, exiting..."
  exit 1
fi

# Create a jump pod for running aws rds and kubectl cmds
cat <<EOF | kubectl apply -f - -n triliowp
apiVersion: v1
kind: Pod
metadata:
  name: jumphost-awscli
spec:
  containers:
    - name: jumphost-awscli
      image: bearengineer/awscli-kubectl
      imagePullPolicy: Always
      command: [ "sh", "-c", "tail -f /dev/null" ]
      volumeMounts:
      - mountPath: "/root/.kube"
        name: kubeconf
      - mountPath: "/root/.aws"
        name: awsconf
  volumes:
  - name: kubeconf
    configMap:
      name: kubeconf
  - name: awsconf
    configMap:
      name: awsconf
EOF

# Wait for the pod to be up/running
until (kubectl get pod jumphost-awscli -n triliowp 2>/dev/null | grep Running); do sleep 5; done
sleep 10

# All set, start the backup with hook to run rds snapshot
$

$ ./create_awsrds_jumphost.sh
Using /root/.kube/config as valid kubeconfig
configmap/kubeconf created
NAME       DATA   AGE
kubeconf   1      1s
Configmap for kubeconfig created successfully
Using /root/.aws/credentials and /root/.aws/config
configmap/awsconf created
NAME      DATA   AGE
awsconf   2      1s
Configmap for AWS config created successfully
apiVersion: v1
kind: Pod
metadata:
  name: jumphost-awscli
spec:
  containers:
    - name: jumphost-awscli
      image: bearengineer/awscli-kubectl
      imagePullPolicy: Always
      command: [ "sh", "-c", "tail -f /dev/null" ]
      volumeMounts:
      - mountPath: "/root/.kube"
        name: kubeconf
      - mountPath: "/root/.aws"
        name: awsconf
  volumes:
  - name: kubeconf
    configMap:
      name: kubeconf
  - name: awsconf
    configMap:
      name: awsconf
pod/jumphost-awscli created
jumphost-awscli   1/1     Running   0          6s
$
$ kubectl get pod jumphost-awscli
NAME              READY   STATUS    RESTARTS   AGE
jumphost-awscli   1/1     Running   0          10s
$
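
Before wiring the pod into a hook, it can help to confirm that both sets of mounted credentials actually work from inside it. The following is a minimal sketch; the helper names `in_jumphost` and `check_jumphost` are hypothetical, while the pod and namespace names are the ones used by the script above. It relies on `aws sts get-caller-identity`, which only succeeds when the AWS credentials are valid.

```shell
#!/bin/sh
# Run a command inside the jump pod.
in_jumphost() {
  kubectl exec jumphost-awscli -n triliowp -- "$@"
}

# Check that the mounted AWS credentials and kubeconfig are both usable.
check_jumphost() {
  in_jumphost aws sts get-caller-identity >/dev/null \
    || { echo "AWS credentials not usable"; return 1; }
  in_jumphost kubectl get backup -n triliowp >/dev/null \
    || { echo "kubeconfig not usable"; return 1; }
  echo "jump pod is ready"
}

# Usage: check_jumphost
```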

Create a backup hook as per the yaml below using the UI or CLI. The “pre” command of the hook runs a script which identifies the running backup and the corresponding backupPlan, then creates a snapshot of the DB instance (--db-instance-identifier to be provided by the user) with the snapshot identifier “tvk-backupName-backupPlanName”. Replace <DB Instance Identifier> below with the actual value.

apiVersion: triliovault.trilio.io/v1
kind: Hook
metadata:
  name: wp-hook
  namespace: triliowp
spec:
  pre:
    execAction:
      command:
      - sh
      - -c
      - export DB_INSTANCE_ID=<DB Instance Identifier>; export SNAPSHOT_ID=$(kubectl get backup -n triliowp | grep -i InProgress | awk '{print "tvk-"$1"-"$2}'); aws rds create-db-snapshot --db-instance-identifier $DB_INSTANCE_ID --db-snapshot-identifier $SNAPSHOT_ID; aws rds wait db-snapshot-completed --db-snapshot-identifier $SNAPSHOT_ID
    timeoutSeconds: 30
  post:
    execAction:
      command:
      - sh
      - -c
      - echo 'post hook action completed'
    ignoreFailure: true
    timeoutSeconds: 30
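
The hook command above assumes that exactly one backup in the namespace is InProgress when it runs. A more defensive variant of the identifier lookup could look like the sketch below. The function name `inprogress_snapshot_id` is hypothetical; it reads the output of `kubectl get backup --no-headers` on stdin and relies on the first two columns being the backup name and the BackupPlan name, as in the listings in this document.

```shell
#!/bin/sh
# Print the snapshot identifier for the single InProgress backup.
# Fails if no backup, or more than one backup, is InProgress.
inprogress_snapshot_id() {
  ids=$(grep -i 'InProgress' | awk '{print "tvk-" $1 "-" $2}')
  # Count non-empty lines; "|| true" because grep -c exits non-zero on a count of 0.
  count=$(printf '%s\n' "$ids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one InProgress backup, found $count" >&2
    return 1
  fi
  printf '%s\n' "$ids"
}

# Usage: kubectl get backup --no-headers -n triliowp | inprogress_snapshot_id
```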

Create a backupPlan using the UI or CLI. Specify the jump pod created earlier in the pod selector for hook execution.

apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
  name: wp-hook-bkpplan
  namespace: triliowp
spec:
  backupConfig:
    schedulePolicy:
      fullBackupCron:
        schedule: ""
      incrementalCron:
        schedule: ""
    target:
      apiVersion: triliovault.trilio.io/v1
      kind: Target
      name: demo-s3-target
      namespace: triliowp
  backupPlanComponents: {}
  hookConfig:
    hooks:
    - containerRegex: jumphost-awscli*
      hook:
        apiVersion: triliovault.trilio.io/v1
        kind: Hook
        name: wp-hook
        namespace: triliowp
      podSelector:
        regex: jumphost-awscli*
    mode: Sequential
    podReadyWaitSeconds: 120

Create a backup of the namespace where the wordpress application is installed, using the CLI or UI. This will also create a snapshot of the RDS database instance.

$ kubectl get backup demo-backup -n triliowp
NAME          BACKUPPLAN        BACKUP TYPE   STATUS      DATA SIZE   START TIME             END TIME               PERCENTAGE COMPLETED   BACKUP SCOPE   DURATION
demo-backup   wp-hook-bkpplan   Full          Available   182910976   2021-10-13T11:12:15Z   2021-10-13T11:19:03Z   100                    Namespace      6m48.695350343s
$

As part of the backup, a snapshot of the RDS database is taken. This can be verified with the command below, which lists the manual snapshots of the instance, newest first.

$ aws rds describe-db-snapshots --db-instance-identifier <DB Instance Identifier> --snapshot-type manual --query="reverse(sort_by(DBSnapshots, &SnapshotCreateTime))[*]"
[
    {
        "DBSnapshotIdentifier": "tvk-demo-backup-wp-hook-bkpplan",
        "DBInstanceIdentifier": "<DB-Instance-Identifier>",
        "SnapshotCreateTime": "2021-10-13T11:07:41.519000+00:00",
        "Engine": "mysql",
        "AllocatedStorage": 20,
        "Status": "available",
        "Port": 3306,
        "AvailabilityZone": "us-east-2b",
        "VpcId": "<vpc-aaaaa>",
        "InstanceCreateTime": "2021-10-07T11:52:03.032000+00:00",
        "MasterUsername": "admin",
        "EngineVersion": "8.0.23",
        "LicenseModel": "general-public-license",
        "SnapshotType": "manual",
        "OptionGroupName": "default:mysql-8-0",
        "PercentProgress": 100,
        "StorageType": "gp2",
        "Encrypted": false,
        "DBSnapshotArn": "arn:aws:rds:us-east-2:753922706039:snapshot:demo-backup-wp-hook-bkpplan",
        "IAMDatabaseAuthenticationEnabled": false,
        "ProcessorFeatures": [],
        "DbiResourceId": "db-3BW2VXN5YZ4LORLSKOJU3IKUIQ",
        "TagList": [],
        "OriginalSnapshotCreateTime": "2021-10-13T11:07:41.519000+00:00"
    }
]
$
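
To check a single snapshot rather than list all of them, the identifier built from the backup and BackupPlan names can be queried directly. This is a small sketch with a hypothetical helper name, `snapshot_status`; it uses the `--db-snapshot-identifier`, `--query` and `--output text` options of the AWS CLI and prints only the `Status` field shown in the JSON above.

```shell
#!/bin/sh
# Print the status of one RDS snapshot (for example "available" or "creating").
snapshot_status() {
  aws rds describe-db-snapshots \
    --db-snapshot-identifier "$1" \
    --query 'DBSnapshots[0].Status' \
    --output text
}

# Usage: snapshot_status tvk-demo-backup-wp-hook-bkpplan
```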

For Restore, create a restore hook as per the yaml file given below. We need user inputs for

  1. Database Instance name for restore (DB_INSTANCE_ID=<DB Instance Identifier>)

  2. VPC Security group to be used (VPC_SEC_GROUP="<VPC Sec Group>")

  3. Name of the backup (BACKUP_NAME="demo-backup")

  4. Name of the Backup Plan (BACKUPPLAN_NAME="wp-hook-bkpplan")

  5. Based on the backup and backup plan names, the SNAPSHOT ID is formed as tvk-<backup name>-<backup plan name>.

  6. Also note that additional maxRetryCount and timeoutSeconds are added for the “post” action as it may take more time.

Replace "DB Instance Identifier" and "VPC Sec Group" below with the actual values.

apiVersion: triliovault.trilio.io/v1
kind: Hook
metadata:
  name: restorehook
  namespace: restorens
spec:
  pre:
    execAction:
      command:
      - sh
      - -c
      - 'echo ''Pre-hook'' '
    ignoreFailure: true
    timeoutSeconds: 30
  post:
    execAction:
      command:
      - sh
      - -c
      - DB_INSTANCE_ID="<DB Instance Identifier>"; VPC_SEC_GROUP="<VPC Sec Group>"; BACKUP_NAME="demo-backup"; BACKUPPLAN_NAME="wp-hook-bkpplan"; SNAPSHOT_ID=tvk-${BACKUP_NAME}-${BACKUPPLAN_NAME}; aws rds restore-db-instance-from-db-snapshot --db-instance-identifier ${DB_INSTANCE_ID} --db-snapshot-identifier ${SNAPSHOT_ID} --vpc-security-group-ids ${VPC_SEC_GROUP}; aws rds wait db-instance-available --db-instance-identifier ${DB_INSTANCE_ID}; echo $SNAPSHOT_ID > /tmp/restorehook
    maxRetryCount: 5
    timeoutSeconds: 300

If we are restoring the RDS database with the same name, no changes are needed in the restored application, as the RDS database endpoint and credentials stay the same. However, if a different RDS database name is used for the restore, the restored application needs to be updated to point to the new database.
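
When a different instance name is used, the new endpoint has to be looked up before the application can be repointed. A minimal sketch, with a hypothetical helper name `rds_endpoint`, is shown below. How the application is then updated depends on how it was deployed; the commented `helm upgrade` line is an assumption that the restored WordPress is still managed as the Helm release installed earlier, and it reuses the `externalDatabase.host` value from the install command.

```shell
#!/bin/sh
# Print the endpoint hostname of an RDS instance.
rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Usage (release name and namespace are assumptions, adjust to your restore):
#   helm upgrade wordpress bitnami/wordpress -n restorens --reuse-values \
#     --set externalDatabase.host="$(rds_endpoint '<new DB Instance Identifier>')"
```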

Create a restore using CLI or UI.

$ kubectl get restore -n restorens
NAME           BACKUP        BACKUP NAMESPACE   STATUS      DATA SIZE   START TIME             END TIME               PERCENTAGE COMPLETED   RESTORE SCOPE   DURATION
demo-restore   demo-backup   triliowp           Completed   126657137   2021-10-13T11:31:11Z   2021-10-13T11:34:51Z   100                    Namespace       3m40.70240849s
$

As part of the restore, the RDS Database instance is created from the snapshot. The wordpress application is up/running in the restore namespace.

This completes the backup and restore of a stateful application which uses an external database such as AWS RDS, using Trilio for Kubernetes.

2. AWS RDS snapshot cleanup upon backup deletion

Objective - Provide a tool for deleting snapshots of external databases (AWS RDS, Google Cloud SQL to begin with)

Summary - This use case provides a tool to delete the snapshot of an external cloud database when the corresponding T4K backup on the cluster is deleted.

Pre-requisite -

  1. Applications on the k8s cluster are configured to use Cloud Database managed service

  2. All the required details for the cloud database connectivity are available such as database instance name, credentials

  3. k8s cluster with T4K installed and target configured. Based on the backups on the target, the snapshots will be retained/deleted.

Assumptions -

  1. Applications on the k8s cluster are configured with specific Database instance

  2. User provides all the information required for Database snapshot deletion (on-demand)

  3. All the requirements for database snapshot deletion are provided such as user access for backup, zones/regions allowed for backups, allotted space, Database instance name/identifier, location etc.

  4. The T4K backup target configured on the k8s cluster is referred for comparing the available backups against the snapshots of the Cloud database instances.

Workflow -

  1. User provides inputs for:
     a. List of applications configured to use external database
     b. The Database instance name/identifier
     c. User credentials for backup, snapshot deletion
     d. Zone/region allowed for backup, snapshot deletion
     e. Location for backup

  2. The user inputs/credentials are stored in a configmap or secret

  3. The T4K is installed and backup target is configured.

  4. A cronjob is created using an image which has access to AWS RDS and k8s cluster. It has kubectl, awscli tools and libraries installed.

  5. The cronjob will periodically check the backups available on the backup target and compare it with the snapshots of the external cloud database. If the backup is deleted from the backup target, the cronjob will delete the corresponding snapshot from the external cloud database.

  6. The cronjob is launched "on-demand" by the user.
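
The comparison in step 5 is a set difference: snapshot identifiers that have no matching backup. It can be sketched as below, assuming both lists are available as files with one identifier per line. The function name `orphaned_snapshots` is hypothetical. Whole-line, fixed-string matching (`-x`, `-F`) is used so that one identifier is never treated as a match merely because it is a prefix of another.

```shell
#!/bin/sh
# Print snapshot identifiers (file $1) that do not appear in the list of
# identifiers derived from existing backups (file $2).
# "|| true" because grep exits non-zero when it prints nothing.
orphaned_snapshots() {
  grep -vxF -f "$2" "$1" || true
}

# Usage: orphaned_snapshots snapshotlist backuplist
```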

Example with steps Followed -

Create a configmap with a script to check backups in a user-specified namespace. Refer to the yaml below. Replace <DB Instance Identifier> with the actual value.

kind: ConfigMap
apiVersion: v1
metadata:
  name: snap-delete-script
data:
  snapdelscript.sh: |
    #!/bin/sh
    echo "Cronjb to cleanup RDS snapshots if T4K backups are not found"
    echo
    snaplist=$(aws rds describe-db-snapshots --db-instance-identifier <DB Instance Identifier> --snapshot-type manual --query="reverse(sort_by(DBSnapshots, &SnapshotCreateTime))[*].DBSnapshotIdentifier" | grep -i tvk | tr -d '",')
    kubectl get backup --no-headers -n triliowp | awk '{print "tvk-"$1"-"$2}' > backuplist
    for snap in ${snaplist}
    do
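      # -x matches the whole line and -F treats the name as a fixed string,
      # so one snapshot name cannot match as a substring of another
      if ! grep -xF "$snap" backuplist
      then
        echo "Backup for snapshot $snap not found"
        echo "Deleting the snapshot $snap"
        aws rds delete-db-snapshot --db-snapshot-identifier "$snap"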
        echo
      else
        echo "Backup for $snap Found"
        echo
      fi
    done
    echo "===Done==="
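
As an alternative to cleaning up JSON output with grep and tr, the snapshot list can be filtered in the `--query` expression itself. The sketch below is one possible variant, not the script used above; the function name `list_tvk_snapshots` is hypothetical. It uses the JMESPath `starts_with` function to keep only identifiers that begin with `tvk-`, and `--output text` to print them without quotes or commas.

```shell
#!/bin/sh
# Print the identifiers of manual snapshots whose name starts with "tvk-",
# one per line. Text output separates list items with tabs, hence the tr.
list_tvk_snapshots() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "$1" \
    --snapshot-type manual \
    --query "DBSnapshots[?starts_with(DBSnapshotIdentifier, 'tvk-')].DBSnapshotIdentifier" \
    --output text | tr '\t' '\n'
}

# Usage: list_tvk_snapshots '<DB Instance Identifier>'
```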

Create a cronjob to run the script, with a schedule for the execution. It uses an image which has kubectl and the AWS CLI installed. We need to pass the AWS credentials and the kubeconfig to this image. AWS credentials and region are configured using “aws configure”. The schedule can be changed to match how often the snapshots need to be cleaned up. Refer to the yaml below.

kind: CronJob
# On Kubernetes 1.21 or later use batch/v1; batch/v1beta1 was removed in 1.25
apiVersion: batch/v1beta1
metadata:
  name: snap-delete-cronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: bearengineer/awscli-kubectl
            imagePullPolicy: IfNotPresent

            command: [ "/bin/sh" ]
            args: [ "/tmp/snapdelscript.sh" ]
            volumeMounts:
            - name: script
              mountPath: "/tmp/"
            - name: kubeconf
              mountPath: "/root/.kube"
            - name: awsconf
              mountPath: "/root/.aws"

          volumes:
          - name: script
            configMap:
              name: snap-delete-script
          - name: kubeconf
            configMap:
              name: kubeconf
          - name: awsconf
            configMap:
              name: awsconf
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0

  concurrencyPolicy: Replace
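
To run the cleanup on demand instead of waiting for the schedule, a one-off Job can be created from the CronJob with `kubectl create job --from`. The sketch below assumes the CronJob name from the yaml above and the `snapdelete` namespace used in the commands that follow; the function name `run_cleanup_now` and the `snap-delete-manual-` Job name prefix are hypothetical.

```shell
#!/bin/sh
# Start one immediate run of the cleanup CronJob as a regular Job.
# The timestamp suffix only keeps Job names unique across runs.
run_cleanup_now() {
  kubectl create job "snap-delete-manual-$(date +%s)" \
    --from=cronjob/snap-delete-cronjob -n "${1:-snapdelete}"
}

# Usage: run_cleanup_now            (uses the snapdelete namespace)
#        run_cleanup_now other-ns   (uses a different namespace)
```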

Steps and output of the commands

$ kubectl create -f snapdelete_configmap.yaml -n snapdelete
configmap/snap-delete-script created
$
$ kubectl get configmap -n snapdelete
NAME                 DATA   AGE
awsconf              2      4d21h
kube-root-ca.crt     1      4d23h
kubeconf             1      4d21h
snap-delete-script   1      9s
$
$ kubectl create -f snap-delete-cronjob.yaml -n snapdelete
cronjob.batch/snap-delete-cronjob created
$ kubectl get all -n snapdelete
NAME                                SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/snap-delete-cronjob   */1 * * * *   False     0        <none>          8s
$
$ aws rds describe-db-snapshots --db-instance-identifier <DB Instance Identifier> --snapshot-type manual --query="reverse(sort_by(DBSnapshots, &SnapshotCreateTime))[*].DBSnapshotIdentifier"
[
    "tvk-demo3-wp-hook-bkpplan",
    "tvk-demo-backup-wp-hook-bkpplan",
    "tvk-demo2-wp-hook-bkpplan",
    "tvk-demo1-wp-hook-bkpplan",
    "demo-backup-wp-hook-bkpplan"
]
$
$ kubectl get all -n snapdelete
NAME                                       READY   STATUS      RESTARTS   AGE
pod/snap-delete-cronjob-1636549440-f7bl2   0/1     Completed   0          10s

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/snap-delete-cronjob-1636549440   1/1           4s         11s

NAME                                SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/snap-delete-cronjob   */1 * * * *   False     0        18s             55s
$
$ kubectl logs pod/snap-delete-cronjob-1636549440-f7bl2 -n snapdelete
Cronjb to cleanup RDS snapshots if T4K backups are not found

Backup for snapshot tvk-demo3-wp-hook-bkpplan not found
Deleting the snapshot tvk-demo3-wp-hook-bkpplan
{
    "DBSnapshot": {
        "MasterUsername": "admin",
        "LicenseModel": "general-public-license",
        "InstanceCreateTime": "2021-10-13T11:46:33.596Z",
        "Engine": "mysql",
        "VpcId": "<vpc-aaaaa>",
        "DBSnapshotIdentifier": "tvk-demo3-wp-hook-bkpplan",
        "AllocatedStorage": 20,
        "Status": "deleted",
        "PercentProgress": 100,
        "DBSnapshotArn": "arn:aws:rds:us-east-2:753922706039:snapshot:tvk-demo3-wp-hook-bkpplan",
        "EngineVersion": "8.0.23",
        "ProcessorFeatures": [],
        "OptionGroupName": "default:mysql-8-0",
        "SnapshotCreateTime": "2021-11-10T12:51:27.805Z",
        "AvailabilityZone": "us-east-2a",
        "StorageType": "gp2",
        "Encrypted": false,
        "IAMDatabaseAuthenticationEnabled": false,
        "DbiResourceId": "db-Q5FFFZMYAU3JNOKRDFYR72YIJE",
        "SnapshotType": "manual",
        "Port": 3306,
        "DBInstanceIdentifier": "<DB-Instance-Identifier>"
    }
}

tvk-demo-backup-wp-hook-bkpplan
Backup for tvk-demo-backup-wp-hook-bkpplan Found

tvk-demo2-wp-hook-bkpplan
Backup for tvk-demo2-wp-hook-bkpplan Found

tvk-demo1-wp-hook-bkpplan
Backup for tvk-demo1-wp-hook-bkpplan Found

===Done===
$
$ aws rds describe-db-snapshots --db-instance-identifier <DB Instance Identifier> --snapshot-type manual --query="reverse(sort_by(DBSnapshots, &SnapshotCreateTime))[*].DBSnapshotIdentifier"
[
    "tvk-demo-backup-wp-hook-bkpplan",
    "tvk-demo2-wp-hook-bkpplan",
    "tvk-demo1-wp-hook-bkpplan",
    "demo-backup-wp-hook-bkpplan"
]
$