# Optimize T4K Backups with StormForge

### Introduction

Many enterprises running applications on Kubernetes today are unaware of any way to monitor and optimize the resources those applications use.

StormForge is a platform that automates resource tuning so that Kubernetes applications deliver the best performance at the lowest possible cost. Users run an experiment made up of a user-specified number of trials. Each trial runs against the application to find the resource configuration best suited to a given operation. StormForge uses machine learning to pick the combination of CPU, memory, and other resources for each trial.

### Install and configure StormForge with T4K

Follow the instructions below to install and configure StormForge with Trilio for Kubernetes (T4K), then run an experiment to find the optimal resource configuration for T4K.

#### Install and Configure the StormForge redsky-controller-manager on the K8s Cluster

1. Download and install the `redskyctl` CLI.

```bash
curl -L https://app.stormforge.io/downloads/redskyctl-linux-amd64.tar.gz | tar xz
sudo mv redskyctl /usr/local/bin/
```

2\. Authorize the Linux server from which `redskyctl` will be invoked to run the experiment.

```bash
redskyctl login --force
```

Open the URL printed by the above command in a browser to authorize the Linux server:

```bash
https://auth.carbonrelay.io/authorize?audience=https%3A%2F%2Fapi.carbonrelay.io%2Fv1%2F&client_id=pE3kMKdrMTdW4DOxQHesyAuFGNOWaEke&code_challenge=nUaqSbjv0grweFAMrv_Zk9KBg_8o-4AQAnwxpijJlGk&code_challenge_method=S256&redirect_uri=http%3A%2F%2F127.0.0.1%3A8085%2F&response_type=code&scope=register%3Aclients+offline_access&state=nHGW89zTYO39loEALGTsig
```

3\. Verify that you are connected to the Kubernetes cluster where you want to run the experiment.

```bash
kubectl get nodes
```

```bash
NAME                                          STATUS   ROLES               AGE    VERSION
ip-mytesthost-w1.us-east-2.compute.internal   Ready    worker              4d5h   v1.19.9
ip-mytesthost-w2.us-east-2.compute.internal   Ready    worker              4d5h   v1.19.9
ip-mytesthost-m1.us-east-2.compute.internal   Ready    controlplane,etcd   4d5h   v1.19.9
ip-mytesthost-m2.us-east-2.compute.internal   Ready    controlplane,etcd   4d5h   v1.19.9
ip-mytesthost-w3.us-east-2.compute.internal   Ready    worker              4d5h   v1.19.9
ip-mytesthost-m3.us-east-2.compute.internal   Ready    controlplane,etcd   4d5h   v1.19.9
```

4\. Initialize the redsky-controller-manager on the Kubernetes cluster.

```bash
redskyctl init
```

```bash
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/experiments.redskyops.dev configured
customresourcedefinition.apiextensions.k8s.io/trials.redskyops.dev configured
clusterrole.rbac.authorization.k8s.io/redsky-manager-role configured
clusterrolebinding.rbac.authorization.k8s.io/redsky-manager-rolebinding unchanged
namespace/redsky-system unchanged
deployment.apps/redsky-controller-manager unchanged
clusterrole.rbac.authorization.k8s.io/redsky-patching-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/redsky-patching-rolebinding unchanged
secret/redsky-manager configured
```

5\. Verify that the redsky-controller-manager is running on the K8s cluster.

```bash
kubectl get pod -n redsky-system
```

```bash
NAME                                         READY   STATUS    RESTARTS   AGE
redsky-controller-manager-6cbb796b79-5wnz8   1/1     Running   1          4d4h
```
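
The manual check above can also be scripted. The following is a minimal sketch, assuming the default `kubectl get pod` table layout shown above (a header row, READY in the second column, STATUS in the third). `all_pods_ready` is a hypothetical helper name, not part of `kubectl` or `redskyctl`.

```bash
# Hypothetical helper: reads `kubectl get pod` table output on stdin and
# succeeds only if at least one pod is listed and every pod is Running
# with all of its containers ready (READY column of the form n/n).
all_pods_ready() {
  awk '
    NR == 1 { next }                      # skip the header row
    {
      seen = 1
      split($2, ready, "/")               # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Usage (requires cluster access):
#   kubectl get pod -n redsky-system | all_pods_ready && echo "controller is ready"
```

Because the helper only parses text, it does not contact the cluster itself; it fails if the namespace contains no pods, which catches the case where `redskyctl init` did not deploy anything.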

6\. Authorize the Kubernetes cluster on which the application is running and where you want to run the experiment.

```bash
redskyctl authorize-cluster
```

```bash
secret/redsky-manager configured
deployment.apps/redsky-controller-manager patched
```

#### Install Demo Application and Configure T4K Resources

1\. An application must be running in the namespace. The StormForge experiment backs up this application.

```bash
kubectl get pods | grep k8s-demo-app
```

```bash
k8s-demo-app-frontend-7c4bdbf9b-bbz2z                            1/1     Running     1          4d4h
k8s-demo-app-frontend-7c4bdbf9b-c84m7                            1/1     Running     1          4d4h
k8s-demo-app-frontend-7c4bdbf9b-z8jhx                            1/1     Running     1          4d4h
k8s-demo-app-mysql-754f46dbd7-v85mh                              1/1     Running     2          4d4h
```

2\. Trilio for Kubernetes must already be installed on the K8s cluster.

```bash
kubectl get pods | grep k8s-triliovault
```

```bash
k8s-triliovault-admission-webhook-7c5b454ff4-85t92               1/1     Running     1          4d3h
k8s-triliovault-control-plane-54594f5796-vpz4r                   2/2     Running     2          4d3h
k8s-triliovault-exporter-86d94c9967-bx56s                        1/1     Running     1          4d3h
k8s-triliovault-ingress-gateway-6bdfc75c9b-l5qvb                 1/1     Running     1          4d3h
k8s-triliovault-web-76d84fdcc5-lj52v                             1/1     Running     1          4d3h
k8s-triliovault-web-backend-7fb8784588-8v865                     1/1     Running     1          4d3h
triliovault-operator-k8s-triliovault-operator-66b6d9d96f-k8fjs   1/1     Running     1          4d4h
```

3\. Verify that a T4K `License` has been applied. A license is required to initiate a backup.

```bash
kubectl get license
```

```bash
NAME             STATUS   MESSAGE                                   CURRENT NODE COUNT   GRACE PERIOD END TIME   EDITION   CAPACITY   EXPIRATION TIME
test-license-1   Active   Cluster License Activated successfully.   9                                            Basic     10         2022-04-30T00:0
```

4\. Make sure that the `Target` used to store the backup is configured and in the `Available` state.

```bash
kubectl get target
```

```bash
NAME             TYPE          THRESHOLD CAPACITY   VENDOR   STATUS      BROWSING ENABLED
demo-s3-target   ObjectStore   100Gi                AWS      Available   
```

5\. Create a `BackupPlan` for the application or namespace.

```bash
kubectl apply -f app/mysql-sample-backupplan.yaml
kubectl apply -f app/mysql-sample-backupplan-1.yaml
```

6\. Verify that the backup plans were created correctly.

```bash
kubectl get backupplan
```

```bash
NAME                       TARGET           RETENTION POLICY   INCREMENTAL SCHEDULE   FULL BACKUP SCHEDULE   STATUS
mysql-label-backupplan     demo-s3-target                                                                    Available
mysql-label-backupplan-1   demo-s3-target                                                                    Available
```
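
The License, Target, and BackupPlan checks in this section can be combined into a scripted preflight. This is a minimal sketch that parses the `kubectl get` tables shown above rather than querying any T4K API field directly. `row_has_status` is a hypothetical helper name, and the match is a plain whole-word comparison against every column after the name.

```bash
# Hypothetical helper: succeeds if the row for resource "$1" in the
# `kubectl get` table read from stdin contains the status word "$2".
row_has_status() {
  awk -v name="$1" -v want="$2" '
    $1 == name { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage (requires cluster access; names taken from the output above):
#   kubectl get license    | row_has_status test-license-1 Active
#   kubectl get target     | row_has_status demo-s3-target Available
#   kubectl get backupplan | row_has_status mysql-label-backupplan Available
```

Chaining the three calls with `&&` gives a single exit status that a wrapper script can test before starting the experiment.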

#### Configure and Monitor StormForge Experiment

1. Create ConfigMaps for the backup definitions, the bash scripts that monitor the backups, and the tvk-manager.

```bash
kubectl create configmap bashscript --from-file=configmaps/bash-script.sh
kubectl create configmap bashscript1 --from-file=configmaps/bash-script-1.sh

kubectl create configmap backup --from-file=configmaps/mysql-sample-backup.yaml
kubectl create configmap backup1 --from-file=configmaps/mysql-sample-backup-1.yaml

kubectl create configmap tvkmanager --from-file=configmaps/tvk-manager.yaml
```
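
`kubectl create configmap` fails with an `AlreadyExists` error when the ConfigMap is already present, for example when the setup is re-run after editing a script. A common workaround, sketched here, is to render the manifest client-side and pipe it to `kubectl apply`. `apply_configmap` is a hypothetical wrapper name, and `--dry-run=client` assumes kubectl 1.18 or later.

```bash
# Hypothetical helper: create the ConfigMap "$1" from file "$2", or update
# it in place if it already exists. The manifest is generated locally
# (--dry-run=client -o yaml) and then applied, so re-runs do not fail.
apply_configmap() {
  kubectl create configmap "$1" --from-file="$2" --dry-run=client -o yaml |
    kubectl apply -f -
}

# Usage (requires cluster access; same names and files as above):
#   apply_configmap bashscript configmaps/bash-script.sh
#   apply_configmap backup     configmaps/mysql-sample-backup.yaml
#   apply_configmap tvkmanager configmaps/tvk-manager.yaml
```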

2\. With all required entities in place, start the experiment.

```bash
kubectl apply -f experiment/tvk-scale-2-backup-delete-parallel-prod.yaml
```

3\. Monitor the experiment while it runs the backups.

```bash
kubectl get experiment
```

```bash
NAME                                        STATUS
tvk-scale-400-backup-delete-parallel-prod   Running
```

4\. Check the new `Trial` started by the experiment.

```bash
kubectl get trial
```

```bash
NAME                                            STATUS       ASSIGNMENTS                                                                                            VALUES
tvk-scale-400-backup-delete-parallel-prod-000   Setting up   deploymentMemory=512, deploymentCpu=250, metaCpu=500, metaMemory=512, dataMemory=1536, dataCpu=1200    
```
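
For experiments with many trials, a per-status count is easier to read than the full table. The sketch below assumes the `kubectl get trial` layout shown above, with columns separated by at least two spaces. `trial_status_counts` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl get trial` table output on stdin and
# prints one "<count> <status>" line per status, sorted by status name.
# Columns are split on runs of two or more spaces so that a multi-word
# status such as "Setting up" stays in one field.
trial_status_counts() {
  awk -F'  +' 'NR > 1 && NF >= 2 { count[$2]++ }
               END { for (s in count) print count[s], s }' | sort -k2
}

# Usage (requires cluster access):
#   kubectl get trial | trial_status_counts
```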

5\. Verify the jobs created for the trial.

```bash
kubectl get jobs
```

```bash
NAME                                                 COMPLETIONS   DURATION   AGE
tvk-scale-2-backup-delete-parallel-prod-001          0/1           4d3h       4d3h
tvk-scale-2-backup-delete-parallel-prod-001-create   1/1           51s        4d3h
tvk-scale-2-backup-delete-parallel-prod-001-delete   1/1           6s         4d3h
```

#### Monitor the Experiment from StormForge UI

1. Once the experiment is in the `Running` state, you can view its progress in the [StormForge](https://app.stormforge.io) UI using the login you configured.

![StormForge Experiment Trial Run Graph](https://content.gitbook.com/content/9sDjF5HJP1bf8TtLcgkk/blobs/z0DVwvPDREh5BU6pq7Z8/StormForge%20Experiment%20Graph%20for%20Trial%20Run.PNG)

2\. Once the experiment run is complete, the StormForge UI shows the recommended configuration with resource details.

![StormForge Recommended Resource Configuration](https://content.gitbook.com/content/9sDjF5HJP1bf8TtLcgkk/blobs/lIY1tdxtX7hzBZY832nF/StormForge%20Experiment%20Recommended%20Configuration.PNG)

#### Delete / Remove StormForge Experiment

1. After the experiment run is complete, remove the experiment with the following commands:

* Delete the experiment and all of its associated resources from the Kubernetes cluster

```bash
kubectl delete -f experiment/tvk-scale-2-backup-delete-parallel-prod.yaml
```

* Delete the experiment from the redsky tenant

```bash
redskyctl delete exp tvk-scale-2-backup-delete-parallel-prod
```

> Note: Deleting the experiment from the redsky tenant also removes it from the StormForge UI.

2\. If you run into issues while running the experiment, check the logs of the redsky-controller-manager.

```bash
kubectl logs redsky-controller-manager-6cbb796b79-5wnz8 -n redsky-system
```
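
The pod name in the command above changes whenever the pod is recreated. As a sketch, the logs can instead be read through the `redsky-controller-manager` Deployment, which `kubectl logs` resolves to one of its pods. `controller_errors` is a hypothetical helper name, and the `error` match is a plain case-insensitive text filter, not a documented log format.

```bash
# Hypothetical helper: print recent controller log lines that mention
# "error". Reading via the Deployment keeps the command valid after the
# pod is rescheduled and gets a new name. "$1" is the number of log
# lines to scan (default 200).
controller_errors() {
  kubectl logs deployment/redsky-controller-manager -n redsky-system --tail="${1:-200}" |
    grep -i 'error'
}

# Usage (requires cluster access):
#   controller_errors        # scan the last 200 log lines
#   controller_errors 1000   # scan the last 1000 log lines
```

The function returns a non-zero status when no matching line is found, because that is how `grep` reports an empty result.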

### Conclusion

With StormForge, users can define experiments and run them on a Kubernetes cluster against running applications to exercise operations such as backup and restore. The StormForge UI presents the analysis as a graph that ranks the resource combinations used in each trial from best to worst. The optimized combinations can then be applied to production clusters to help achieve the desired RPO/RTO.
