Troubleshooting Guide

This troubleshooting guide describes the different phases of the backup and restore processes and which logs to check when troubleshooting issues manually.

Deprecated Documentation

This document is deprecated and no longer supported. For accurate, up-to-date information, please refer to the documentation for the latest version of Trilio.

Troubleshooting the Trilio for Kubernetes (T4K) application is no different from troubleshooting any other Kubernetes application. Your best friends are kubectl for Kubernetes and oc for OpenShift; the commands are the same for both tools.
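Because the commands are identical, a small helper can pick whichever CLI is installed so the same troubleshooting snippets work on both platforms. This is a minimal sketch; the function name `k8s_cli` is hypothetical, and preferring `oc` over `kubectl` is an arbitrary choice.

```shell
# Hypothetical helper: print the name of the available Kubernetes CLI.
# Prefers oc (OpenShift) when it is available, otherwise falls back to kubectl.
k8s_cli() {
  if command -v oc >/dev/null 2>&1; then
    echo oc
  elif command -v kubectl >/dev/null 2>&1; then
    echo kubectl
  else
    echo "neither oc nor kubectl was found" >&2
    return 1
  fi
}
```

For example, `$(k8s_cli) get pods -A | grep trilio` runs the pod listing with whichever tool is present.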

Successful Deployment

The following command lists the T4K pods in a successful deployment. The Control Plane pod hosts the controllers, including the Target, BackupPlan, Backup, and Restore controllers. The Executor pod includes the job controllers that the backup and restore controllers create.

$ kubectl get pods -A | grep trilio
openshift-operators      k8s-triliovault-admission-webhook-68494db64c-drsb4     1/1     Running    0  3d9h
openshift-operators      k8s-triliovault-control-plane-cdd4864c4-p2crc          1/1     Running    0  3d9h
openshift-operators      k8s-triliovault-exporter-8598b65b56-rjzlf              1/1     Running    0  3d9h
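Reading the STATUS and READY columns by eye works for three pods; the check can also be scripted. The sketch below is a hypothetical helper (`trilio_unhealthy_pods` is not part of T4K) that assumes the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, ...) and skips pods in the Completed state, such as finished job pods.

```shell
# Hypothetical helper: read `kubectl get pods -A` output on stdin and report
# any pod whose STATUS is not Running or whose READY count is not n/n.
# Assumes the default -A column layout: NAMESPACE NAME READY STATUS ...
trilio_unhealthy_pods() {
  awk '
    $1 == "NAMESPACE" { next }            # skip the header row, if present
    $4 == "Completed" { next }            # finished job pods are not failures
    NF >= 4 {
      split($3, r, "/")                   # READY looks like "1/1"
      if ($4 != "Running" || r[1] != r[2]) {
        print $2 " (" $1 "): READY=" $3 " STATUS=" $4
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }            # non-zero exit if any pod is unhealthy
  '
}
```

Usage: `kubectl get pods -A | grep trilio | trilio_unhealthy_pods && echo "all T4K pods look healthy"`.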

Also make sure the other artifacts of the Trilio deployment, such as its Custom Resource Definitions (CRDs), are in good shape.

$ oc get crds | grep trilio
backupplans.triliovault.trilio.io      2020-04-30T20:07:38Z
backups.triliovault.trilio.io          2020-04-30T20:07:38Z
hooks.triliovault.trilio.io            2020-04-30T20:07:38Z
policies.triliovault.trilio.io         2020-04-30T20:07:38Z
restores.triliovault.trilio.io         2020-04-30T20:07:38Z
targets.triliovault.trilio.io          2020-04-30T20:07:38Z
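A quick way to confirm that none of these CRDs is missing is to compare the listing against the expected names. This is a sketch: the helper name `trilio_missing_crds` is hypothetical, and the expected list is copied from the sample output above, so other T4K versions may install additional CRDs.

```shell
# Hypothetical helper: read `oc get crds` (or `kubectl get crds`) output on
# stdin and print any expected T4K CRD that is missing. Returns 1 if any is.
# The expected list mirrors the sample output; adjust it for your T4K version.
trilio_missing_crds() {
  local crds kind name missing=0
  crds=$(awk '{ print $1 }')              # first column is the CRD name
  for kind in backupplans backups hooks policies restores targets; do
    name="$kind.triliovault.trilio.io"
    if ! printf '%s\n' "$crds" | grep -qxF "$name"; then
      echo "missing CRD: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `oc get crds | trilio_missing_crds && echo "all expected T4K CRDs present"`.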

Backup and Restore Phases

It helps to understand the different phases of backup and restore operations and where to find the logs for each phase of an operation.

Backup Phases

Snapshot: In this phase, T4K snapshots the Persistent Volumes (PVs) using the CSI driver's snapshot functionality. If the backup fails at this step, check the logs of the T4K pod listed below and manually verify that CSI snapshots work in the cluster.

Upload: In this phase, T4K uploads the data and metadata to the target. T4K creates multiple pods dynamically, depending on the number of PVs associated with the application.

Retention: This is the last phase of the backup process. T4K evaluates the retention policy and, if the policy triggers purging of older backups, performs a merge operation on the backup.

If the backup fails, the logs of the following pod provide more details:

k8s-triliovault-control-plane-xxxxxxxx
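Since the Snapshot phase relies on the cluster's CSI snapshot support, testing a snapshot by hand, outside T4K, helps separate a CSI problem from a T4K problem. The sketch below prints a standard `VolumeSnapshot` manifest for one PVC; the helper name `csi_test_snapshot` and the `-manual-test` suffix are hypothetical, and it assumes the cluster serves the GA `snapshot.storage.k8s.io/v1` API (the describe output in this guide shows the older `v1alpha1` API, whose manifest format differs).

```shell
# Hypothetical helper: print a VolumeSnapshot manifest that snapshots one PVC,
# so the CSI snapshot path can be tested independently of T4K.
# Arguments: PVC name, namespace, VolumeSnapshotClass name.
# Assumes the snapshot.storage.k8s.io/v1 API is available in the cluster.
csi_test_snapshot() {
  local pvc="$1" ns="$2" class="$3"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${pvc}-manual-test
  namespace: ${ns}
spec:
  volumeSnapshotClassName: ${class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}
```

For example, find a class with `kubectl get volumesnapshotclass`, run `csi_test_snapshot mysql-pv-claim default "$SNAPSHOT_CLASS" | kubectl apply -f -` (with `SNAPSHOT_CLASS` set to that class), and check `kubectl get volumesnapshot mysql-pv-claim-manual-test -n default` until its `status.readyToUse` is `true`. Delete the test snapshot afterwards with `kubectl delete volumesnapshot mysql-pv-claim-manual-test -n default`.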

Restore Phases

Validation: In this phase, T4K validates the resources in the namespace specified for the restore operation. If any existing resource has the same name as a resource being restored, the restore fails.

Data Restore: In this phase, T4K creates the PV and then copies the data from the target into the PV, which is then attached to the pods.

Metadata Restore: In this phase, T4K restores all the resources that were backed up, such as pods, secrets, and services.

If the restore fails at any step, the logs of the following pod provide more details:

k8s-triliovault-control-plane-xxxxxxxx
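Because the Validation phase fails when a resource with the same name already exists in the target namespace, it can save a failed attempt to check for collisions before starting a restore. This is a minimal sketch with a hypothetical helper name; note that it treats any `kubectl get` failure, including a permission error, as "not present".

```shell
# Hypothetical pre-restore check: report which TYPE/NAME objects already exist
# in the target namespace, since name collisions fail the Validation phase.
# Usage: restore_conflicts NAMESPACE TYPE/NAME [TYPE/NAME ...]
# Returns 1 if any conflict is found, 0 otherwise.
restore_conflicts() {
  local ns="$1" obj conflict=0
  shift
  for obj in "$@"; do
    # kubectl get exits non-zero when the object cannot be fetched.
    if kubectl get "$obj" -n "$ns" >/dev/null 2>&1; then
      echo "conflict: $obj already exists in namespace $ns"
      conflict=1
    fi
  done
  return "$conflict"
}
```

For the demo application in this guide, that might be `restore_conflicts default secret/mysql-pass service/k8s-demo-app-frontend deployment/k8s-demo-app-mysql`.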

Once the restore operation has completed successfully, list all components of the application (pods, PVs, and the other resources) to confirm that the application was restored.
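A restored application usually cannot start until its volumes are bound, so checking the PVCs is a useful first step when listing the restored components (for example with `kubectl get pods,pvc,svc -n default`). The helper below is a hypothetical sketch that assumes the default `kubectl get pvc` layout, where STATUS is the second column.

```shell
# Hypothetical post-restore check: read `kubectl get pvc -n NAMESPACE` output
# on stdin and report any PVC that is not Bound (STATUS is the 2nd column).
# Returns 1 if any PVC is not Bound.
unbound_pvcs() {
  awk '
    $1 == "NAME" { next }                 # skip the header row
    NF >= 2 && $2 != "Bound" { print $1 " is " $2; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage: `kubectl get pvc -n default | unbound_pvcs && echo "all PVCs are bound"`.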

Troubleshooting through Logs

To troubleshoot a backup or restore issue, start by listing the backups with the following commands.

The BACKUP PHASE column shows which phase each backup is currently in.

master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS       START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          InProgress   2s           Snapshot
master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS       START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          InProgress   39s          Upload         30
master $ kubectl describe backup demo-full-backup
Name:         demo-full-backup
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"triliovault.trilio.io/v1alpha1","kind":"Backup","metadata":{"annotations":{},"name":"demo-full-backup"},"spec":{"applicatio...
API Version:  triliovault.trilio.io/v1alpha1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-03-24T19:15:43Z
  Generation:          1
  Owner References:
    API Version:     triliovault.trilio.io/v1alpha1
    Kind:            Application
    Name:            backup-job-k8s-demo-app
    UID:             6b4f60dd-17cd-4413-85fb-e30952c6cf19
  Resource Version:  1417
  Self Link:         /apis/triliovault.trilio.io/v1alpha1/backups/demo-full-backup
  UID:               50264019-e235-4054-8510-8c5b4947c0f6
Spec:
  Application:
    API Version:       triliovault.trilio.io/v1alpha1
    Kind:              Application
    Name:              backup-job-k8s-demo-app
    Resource Version:  1327
    UID:               6b4f60dd-17cd-4413-85fb-e30952c6cf19
  Schedule Type:       Periodic
  Type:                Full
Status:
  Percentage Completion:  30
  Phase:                  Upload
  Phase Status:           InProgress
  Size:                   0
  Snapshot Content:
    Custom:
      Component:
        Group Version Kind:
          Kind:     Secret
          Version:  v1
        Metadata:
          {"apiVersion":"v1","data":{"password":"dHJpbGlvcGFzcwo="},"kind":"Secret","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"mysql-pass","namespace":"default"},"type":"Opaque"}

        Group Version Kind:
          Kind:     Service
          Version:  v1
        Metadata:
          {"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"ports":[{"name":"web","port":80,"protocol":"TCP","targetPort":80}],"selector":{"app":"k8s-demo-app","tier":"frontend"},"sessionAffinity":"None","type":"ClusterIP"}}

          {"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"ports":[{"port":3306,"protocol":"TCP","targetPort":3306}],"selector":{"app":"k8s-demo-app","tier":"mysql"},"sessionAffinity":"None","type":"ClusterIP"}}

        Group Version Kind:
          Group:    apps
          Kind:     Deployment
          Version:  v1
        Metadata:
          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":3,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"frontend"}},"strategy":{"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}

          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":1,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"mysql"}},"strategy":{"type":"Recreate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}

        Group Version Kind:
          Group:    apps
          Kind:     ReplicaSet
          Version:  v1
        Metadata:
          {"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"3","deployment.kubernetes.io/max-replicas":"4","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"},"name":"k8s-demo-app-frontend-6544df7845","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-frontend","uid":"63c2334c-9e48-4119-b333-4bd47a3f824e"}]},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}

          {"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"1","deployment.kubernetes.io/max-replicas":"1","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"},"name":"k8s-demo-app-mysql-765495d764","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-mysql","uid":"8a5e4484-8b5a-4a96-9957-14dd502cb5b6"}]},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}

      Data Snapshot:
        Persistent Volume Claim Metadata:  {"kind":"PersistentVolumeClaim","apiVersion":"v1","metadata":{"name":"mysql-pv-claim","namespace":"default","selfLink":"/api/v1/namespaces/default/persistentvolumeclaims/mysql-pv-claim","uid":"05a835dc-045d-422c-8940-0a4b4917fa44","resourceVersion":"1100","creationTimestamp":"2020-03-24T19:13:38Z","labels":{"app":"k8s-demo-app","tier":"mysql"},"annotations":{"pv.kubernetes.io/bind-completed":"yes","pv.kubernetes.io/bound-by-controller":"yes","volume.beta.kubernetes.io/storage-provisioner":"hostpath.csi.k8s.io"},"finalizers":["kubernetes.io/pvc-protection"]},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"volumeName":"pvc-05a835dc-045d-422c-8940-0a4b4917fa44","storageClassName":"csi-hostpath-sc","volumeMode":"Filesystem"},"status":{"phase":"Bound","accessModes":["ReadWriteOnce"],"capacity":{"storage":"5Gi"}}}
        Persistent Volume Claim Name:      mysql-pv-claim
        Pod Containers Map:
          Containers:
            mysql
          Pod Name:     k8s-demo-app-mysql-765495d764-56f6b
        Size:           0
        Snapshot Size:  0
        Volume Snapshot:
          Retry Count:  1
          Status:       Completed
          Volume Snapshot:
            API Version:       snapshot.storage.k8s.io/v1alpha1
            Kind:              VolumeSnapshot
            Name:              mysql-pv-claim-f727ec86-f3a1-4eff-83b9-99a2e30d331e
            Namespace:         default
            Resource Version:  1352
            UID:               10bc3f16-3ad5-4534-86e4-f556e8d932be
  Start Timestamp:             2020-03-24T19:15:43Z
  Status:                      InProgress
Events:
  Type     Reason              Age   From               Message
  ----     ------              ----  ----               -------
  Warning  BackupUpdateFailed  68s   backup-controller  Updating Backup: demo-full-backup, Failed%!(EXTRA string=)
master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS      START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          Completed   119s         Retention      100
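When many backups are listed, it can help to reduce the table to the columns used for troubleshooting. This hypothetical sketch assumes the column layout shown above and that every column up to BACKUP PHASE is populated; an empty column earlier in the row would shift the fields.

```shell
# Hypothetical helper: read `kubectl get backup` output on stdin and print
# NAME, STATUS, BACKUP PHASE, and PERCENTAGE COMPLETED for each backup.
# Assumes the layout shown above: NAME APPLICATION TYPE STATUS START PHASE [PCT].
backup_phases() {
  awk '
    $1 == "NAME" { next }                 # skip the header row
    NF >= 6 {
      pct = (NF >= 7) ? ($7 "%") : "-"    # PERCENTAGE COMPLETED may be blank
      printf "%s status=%s phase=%s completed=%s\n", $1, $4, $6, pct
    }
  '
}
```

Usage: `kubectl get backup | backup_phases`. To follow phase changes live instead, `kubectl get backup -w` streams updates as they happen.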

If the backup or restore phase shows Validation, check the Control Plane pod logs for the root cause.
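The control-plane logs can be fetched without looking up the generated pod suffix by pointing `kubectl logs` at the Deployment, which picks one of its pods. This is a sketch under two assumptions: the Deployment is named `k8s-triliovault-control-plane` (inferred from the pod name in the sample output) and lives in `openshift-operators` unless another namespace is passed. The helper name is hypothetical.

```shell
# Hypothetical helper: fetch recent logs from the T4K control-plane pod.
# Assumes the Deployment k8s-triliovault-control-plane (inferred from the
# sample pod name) in namespace openshift-operators unless $1 overrides it.
# $2 sets how many recent lines to show (default 500).
trilio_cp_logs() {
  local ns="${1:-openshift-operators}" lines="${2:-500}"
  # kubectl logs accepts deployment/NAME and selects one of its pods.
  kubectl logs -n "$ns" "deployment/k8s-triliovault-control-plane" --tail="$lines"
}
```

For example, `trilio_cp_logs openshift-operators 2000 | grep demo-full-backup` narrows the output to one backup. If the pod has restarted, adding `--previous` to the `kubectl logs` command shows the logs of the earlier container.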

Log collector

Use the log collector to gather logs and send them to the Trilio team for further analysis. The script creates a triliovault-<date-time>.zip file containing cluster debugging information.

Prerequisite: Python >= 3.6
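The prerequisite can be checked before installing. A plain string comparison would wrongly rank 3.10 below 3.6, so this sketch compares each numeric component in turn. Both function names are hypothetical, and it assumes the interpreter is invoked as `python3`.

```shell
# Return 0 if dotted version $1 is >= dotted version $2, comparing each
# numeric component in turn (so 3.10 correctly ranks above 3.6).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0     # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                               # versions are equal
  }'
}

# Check the log collector prerequisite (Python >= 3.6) on this machine.
log_collector_python_ok() {
  local v
  v=$(python3 -c 'import platform; print(platform.python_version())') || return 1
  version_ge "$v" 3.6
}
```

Usage: `log_collector_python_ok || echo "Python 3.6 or newer is required"`.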

Install the collector with pip:

pip3 install k8s-triliovault-logcollector --extra-index-url https://pypi.fury.io/k8s-triliovault/

Then run it:

log_collector.py

Optional arguments:
