This troubleshooting guide describes the different phases of the backup and recovery process and which logs to check when troubleshooting issues manually.
Troubleshooting the Trilio for Kubernetes (T4K) application is no different from troubleshooting any other Kubernetes application. Your primary tools are kubectl for Kubernetes and oc for OpenShift. The commands are the same for both tools.
Successful Deployment
In a successful deployment, listing the Pods in the namespace where T4K is installed shows the T4K Pods. The Control Plane Pod hosts the controllers, including Target, BackupPlan, Backup and Restore. The Executor Pod hosts the job controllers that the backup and restore controllers create.
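A minimal sketch of such a listing, assuming T4K is installed in a namespace that you pass in. The helper name `t4k_pods` and the namespace `trilio-system` in the usage line are placeholders for illustration, not part of T4K. On OpenShift, `oc` can be substituted for `kubectl`.

```shell
# Hypothetical helper: list the Pods in the namespace where T4K is installed.
# The namespace depends on your installation, so it is passed as an argument.
t4k_pods() {
  kubectl get pods --namespace "$1"
}
```

Usage: `t4k_pods trilio-system`. In a healthy deployment, the Control Plane and Executor Pods would be expected to appear in this list.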
It helps to understand the phases of the backup and restore operations and where to find the logs for each phase.
Backup Phases
Snapshot: In this phase, T4K takes a snapshot of the Persistent Volume (PV) using the CSI driver functionality. If the backup fails at this step, check the logs of the T4K pods and verify manually that CSI snapshots are working.
Upload: In this phase, T4K uploads the data and metadata to the target. T4K creates multiple pods dynamically, depending on the number of PVs associated with the application.
Retention: This is the last phase of the backup process. T4K validates the retention policy and, if the policy triggers the purging of backups, performs a merge operation on the backup.
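For the Snapshot phase, the manual CSI check can be sketched as follows. This assumes the CSI snapshot CRDs (VolumeSnapshot and VolumeSnapshotClass) are installed in the cluster. The helper name `csi_snapshot_check` is hypothetical, and the application namespace is supplied by you.

```shell
# Hypothetical helper: inspect CSI snapshot objects for an application.
# $1 = namespace of the application being backed up.
csi_snapshot_check() {
  # VolumeSnapshots are namespaced; look for ones that never become ready.
  kubectl get volumesnapshot --namespace "$1"
  # VolumeSnapshotClasses are cluster-scoped; confirm one exists for the CSI driver.
  kubectl get volumesnapshotclass
}
```

Usage: `csi_snapshot_check default`. If no VolumeSnapshotClass is listed, or a VolumeSnapshot stays unready, the problem likely lies with the CSI driver rather than with T4K.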
Restore Phases
Validation: In this phase, T4K validates the resources in the namespace where the restore operation is specified. If a resource with the same name as one being restored already exists, the restore fails.
Data Restore: In this phase, T4K creates the PV and then copies the data from the target into the PV, which will be attached to the pods.
Metadata Restore: In this phase, T4K restores all the resources that were backed up. These can be Pods, Secrets, Services, and so on.
Once the restore operation has completed successfully, you can list all components of the application (Pods, PVs, and all other resources) to make sure the application is restored.
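One way to do that listing is sketched below. The helper name `restored_resources` is hypothetical, and the resource types shown are only the common ones; extend the list to match what your application contains. The same listing, run before a restore, can also help spot existing resources whose names would collide during the Validation phase.

```shell
# Hypothetical helper: list common application resources after a restore.
# $1 = namespace the application was restored into.
restored_resources() {
  kubectl get pods,services,secrets,persistentvolumeclaims --namespace "$1"
  # PersistentVolumes are cluster-scoped, so they are listed separately.
  kubectl get persistentvolumes
}
```

Usage: `restored_resources default`.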
Troubleshooting through Logs
To troubleshoot a backup or restore issue, start by displaying the backups with the following commands.
See the BACKUP PHASE column for more details.
master $ kubectl get backup
NAME APPLICATION BACKUP TYPE STATUS START TIME BACKUP PHASE PERCENTAGE COMPLETED
demo-full-backup backup-job-k8s-demo-app Full InProgress 2s Snapshot
master $ kubectl get backup
NAME APPLICATION BACKUP TYPE STATUS START TIME BACKUP PHASE PERCENTAGE COMPLETED
demo-full-backup backup-job-k8s-demo-app Full InProgress 39s Upload 30
master $ kubectl describe backup demo-full-backup
Name: demo-full-backup
Namespace:
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"triliovault.trilio.io/v1alpha1","kind":"Backup","metadata":{"annotations":{},"name":"demo-full-backup"},"spec":{"applicatio...
API Version: triliovault.trilio.io/v1alpha1
Kind: Backup
Metadata:
Creation Timestamp: 2020-03-24T19:15:43Z
Generation: 1
Owner References:
API Version: triliovault.trilio.io/v1alpha1
Kind: Application
Name: backup-job-k8s-demo-app
UID: 6b4f60dd-17cd-4413-85fb-e30952c6cf19
Resource Version: 1417
Self Link: /apis/triliovault.trilio.io/v1alpha1/backups/demo-full-backup
UID: 50264019-e235-4054-8510-8c5b4947c0f6
Spec:
Application:
API Version: triliovault.trilio.io/v1alpha1
Kind: Application
Name: backup-job-k8s-demo-app
Resource Version: 1327
UID: 6b4f60dd-17cd-4413-85fb-e30952c6cf19
Schedule Type: Periodic
Type: Full
Status:
Percentage Completion: 30
Phase: Upload
Phase Status: InProgress
Size: 0
Snapshot Content:
Custom:
Component:
Group Version Kind:
Kind: Secret
Version: v1
Metadata:
{"apiVersion":"v1","data":{"password":"dHJpbGlvcGFzcwo="},"kind":"Secret","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"mysql-pass","namespace":"default"},"type":"Opaque"}
Group Version Kind:
Kind: Service
Version: v1
Metadata:
{"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"ports":[{"name":"web","port":80,"protocol":"TCP","targetPort":80}],"selector":{"app":"k8s-demo-app","tier":"frontend"},"sessionAffinity":"None","type":"ClusterIP"}}
{"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"ports":[{"port":3306,"protocol":"TCP","targetPort":3306}],"selector":{"app":"k8s-demo-app","tier":"mysql"},"sessionAffinity":"None","type":"ClusterIP"}}
Group Version Kind:
Group: apps
Kind: Deployment
Version: v1
Metadata:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":3,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"frontend"}},"strategy":{"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":1,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"mysql"}},"strategy":{"type":"Recreate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}
Group Version Kind:
Group: apps
Kind: ReplicaSet
Version: v1
Metadata:
{"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"3","deployment.kubernetes.io/max-replicas":"4","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"},"name":"k8s-demo-app-frontend-6544df7845","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-frontend","uid":"63c2334c-9e48-4119-b333-4bd47a3f824e"}]},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}
{"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"1","deployment.kubernetes.io/max-replicas":"1","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"},"name":"k8s-demo-app-mysql-765495d764","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-mysql","uid":"8a5e4484-8b5a-4a96-9957-14dd502cb5b6"}]},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}
Data Snapshot:
Persistent Volume Claim Metadata: {"kind":"PersistentVolumeClaim","apiVersion":"v1","metadata":{"name":"mysql-pv-claim","namespace":"default","selfLink":"/api/v1/namespaces/default/persistentvolumeclaims/mysql-pv-claim","uid":"05a835dc-045d-422c-8940-0a4b4917fa44","resourceVersion":"1100","creationTimestamp":"2020-03-24T19:13:38Z","labels":{"app":"k8s-demo-app","tier":"mysql"},"annotations":{"pv.kubernetes.io/bind-completed":"yes","pv.kubernetes.io/bound-by-controller":"yes","volume.beta.kubernetes.io/storage-provisioner":"hostpath.csi.k8s.io"},"finalizers":["kubernetes.io/pvc-protection"]},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"volumeName":"pvc-05a835dc-045d-422c-8940-0a4b4917fa44","storageClassName":"csi-hostpath-sc","volumeMode":"Filesystem"},"status":{"phase":"Bound","accessModes":["ReadWriteOnce"],"capacity":{"storage":"5Gi"}}}
Persistent Volume Claim Name: mysql-pv-claim
Pod Containers Map:
Containers:
mysql
Pod Name: k8s-demo-app-mysql-765495d764-56f6b
Size: 0
Snapshot Size: 0
Volume Snapshot:
Retry Count: 1
Status: Completed
Volume Snapshot:
API Version: snapshot.storage.k8s.io/v1alpha1
Kind: VolumeSnapshot
Name: mysql-pv-claim-f727ec86-f3a1-4eff-83b9-99a2e30d331e
Namespace: default
Resource Version: 1352
UID: 10bc3f16-3ad5-4534-86e4-f556e8d932be
Start Timestamp: 2020-03-24T19:15:43Z
Status: InProgress
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackupUpdateFailed 68s backup-controller Updating Backup: demo-full-backup, Failed%!(EXTRA string=)
master $ kubectl get backup
NAME APPLICATION BACKUP TYPE STATUS START TIME BACKUP PHASE PERCENTAGE COMPLETED
demo-full-backup backup-job-k8s-demo-app Full Completed 119s Retention 100
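To pull just the status and phase out of that table, the output can be piped through a small filter. This is a sketch that assumes the column order shown in the sample output above (name, application, type, status, start time, phase) and that no column before the phase is empty. The helper name `backup_phases` is hypothetical.

```shell
# Hypothetical filter: print NAME, STATUS and BACKUP PHASE for each backup.
# Reads `kubectl get backup` output on stdin and skips the header row.
backup_phases() {
  awk 'NR > 1 { print $1, $4, $6 }'
}
```

Usage: `kubectl get backup | backup_phases`. Because the filter depends on column positions, treat it as a convenience for quick checks and fall back to `kubectl describe backup` when the output looks off.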
If the backup or restore phase is Validation, look into the Control Plane Pod for the root cause.
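A minimal sketch for reading those logs, assuming you already know the T4K namespace and the Control Plane Pod name from your own cluster (for example from `kubectl get pods`). The helper name `control_plane_logs` is hypothetical, and the 200-line tail is an arbitrary choice.

```shell
# Hypothetical helper: show recent log lines from the Control Plane Pod.
# $1 = namespace where T4K is installed, $2 = Control Plane Pod name.
control_plane_logs() {
  # --all-containers covers the case where the Pod runs more than one container.
  kubectl logs "$2" --namespace "$1" --all-containers=true --tail=200
}
```

Usage: `control_plane_logs <t4k-namespace> <control-plane-pod-name>`, with both placeholders replaced by values from your cluster.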
Log collector
You can refer to the Log Collection page, collect the logs, and send them to the Trilio team for further analysis of the issue.