Troubleshooting Guide
This troubleshooting guide describes the different phases of the backup and recovery process and which logs to check when troubleshooting issues manually.
Troubleshooting the TrilioVault for Kubernetes (TVK) application is no different from troubleshooting any other Kubernetes application. Your main tool is kubectl on Kubernetes and oc on OpenShift. The commands are the same for both tools.

Successful Deployment

The following command lists the TVK Pods in a successful deployment. The Control Plane Pod hosts the controllers, including Target, BackupPlan, Backup, and Restore. The Executor Pod includes the job controllers that the backup and restore controllers create.
$ kubectl get pods -A | grep trilio
openshift-operators   k8s-triliovault-admission-webhook-68494db64c-drsb4   1/1   Running   0   3d9h
openshift-operators   k8s-triliovault-control-plane-cdd4864c4-p2crc        1/1   Running   0   3d9h
openshift-operators   k8s-triliovault-exporter-8598b65b56-rjzlf            1/1   Running   0   3d9h
Also make sure that the other artifacts of the TrilioVault deployment, such as its custom resource definitions (CRDs), are in good shape.
$ oc get crds | grep trilio
backupplans.triliovault.trilio.io   2020-04-30T20:07:38Z
backups.triliovault.trilio.io       2020-04-30T20:07:38Z
hooks.triliovault.trilio.io         2020-04-30T20:07:38Z
policies.triliovault.trilio.io      2020-04-30T20:07:38Z
restores.triliovault.trilio.io      2020-04-30T20:07:38Z
targets.triliovault.trilio.io       2020-04-30T20:07:38Z
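The CRD check above can be scripted. The following is a minimal sketch, assuming the six CRD names from the listing are the complete set for your TVK version (other versions may ship a different set). The function name missing_tvk_crds is a hypothetical helper, not part of TVK.

```shell
#!/bin/sh
# Sketch: report which of the TVK CRDs from the listing above are missing.
# missing_tvk_crds is a hypothetical helper name, not a TVK command.
# It reads `kubectl get crds` (or `oc get crds`) output on stdin.
missing_tvk_crds() {
  # Expected CRD kinds, taken from the listing above (assumption:
  # this set matches the installed TVK version).
  expected="backupplans backups hooks policies restores targets"
  # Keep only the first column (the CRD name) of the input.
  found=$(awk '{print $1}')
  for kind in $expected; do
    name="${kind}.triliovault.trilio.io"
    # -x matches the whole line, -F treats the name as a fixed string.
    printf '%s\n' "$found" | grep -qxF "$name" || printf '%s\n' "$name"
  done
}

# Usage against a live cluster (prints nothing when all CRDs exist):
#   kubectl get crds | missing_tvk_crds
```

Any name the function prints is a CRD that was expected but not found, which usually points to an incomplete installation.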

Backup and Restore Phases

It helps to understand the phases of backup and restore operations and where to find the logs for each phase.

Backup Phases

Snapshot: In this phase, TVK takes a snapshot of the Persistent Volume (PV) using the CSI driver functionality. If the backup fails at this step, check the logs of the TVK pod listed below, and verify manually that CSI snapshots work.
Upload: In this phase, TVK uploads the data and metadata to the target. TVK creates multiple pods dynamically, depending on the number of PVs associated with the application.
Retention: This is the last phase of the backup process. TVK validates the retention policy and, if the policy triggers purging of backups, performs a merge operation on the backup.
If the backup fails, the log of the following pod provides more details:
k8s-triliovault-control-plane-xxxxxxxx
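Because the control plane pod name ends in a generated suffix, it can be convenient to look it up before fetching its logs. The following is a minimal sketch, assuming the pod name starts with k8s-triliovault-control-plane- as in the listing earlier in this guide. The function name control_plane_pod is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: find the TVK control plane pod in `kubectl get pods -A` output.
# control_plane_pod is a hypothetical helper name, not a TVK command.
# It reads the pod listing on stdin and prints "<namespace> <pod-name>"
# for the first pod whose name starts with the control plane prefix.
control_plane_pod() {
  awk '$2 ~ /^k8s-triliovault-control-plane-/ { print $1, $2; exit }'
}

# Usage against a live cluster:
#   set -- $(kubectl get pods -A | control_plane_pod)
#   kubectl logs -n "$1" "$2" --tail=200
```

The same lookup works with oc in place of kubectl.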

Restore Phases

Validation: In this phase, TVK validates the resources in the namespace targeted by the restore operation. If a resource with the same name as one being restored already exists there, the restore fails.
Data Restore: In this phase, TVK creates the PV and then copies the data from the target into the PV, which is then attached to the pods.
Metadata Restore: In this phase, TVK restores all the resources that were backed up. These can be pods, secrets, services, and so on.
If the restore fails at any step, the logs from the following pod can provide more detail:
k8s-triliovault-control-plane-xxxxxxxx
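Since the Validation phase fails on name collisions, you may want to check for them before starting a restore. The following is a rough sketch of such a pre-check, not TVK's own validation logic. It assumes you have written the resources you expect to be restored into a file, one per line, in the format printed by kubectl get -o name. The function name restore_conflicts and the file name wanted.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: list resources that already exist in the target namespace and
# would collide with a restore. This is NOT TVK's validation code.
# restore_conflicts is a hypothetical helper name.
#   $1    : file with the resources to be restored, one per line, in
#           `kubectl get -o name` format (for example secret/mysql-pass)
#   stdin : resources that currently exist, in the same format
restore_conflicts() {
  # -F fixed strings, -x whole-line match, -f read patterns from file.
  # grep exits non-zero when nothing matches; treat that as "no conflicts".
  grep -Fxf "$1" || true
}

# Usage against a live cluster (<namespace> is a placeholder):
#   kubectl get deployments,services,secrets -n <namespace> -o name \
#     | restore_conflicts wanted.txt
```

Empty output means none of the listed names is already taken in the namespace.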
Once the restore operation has completed successfully, list all components of the application (pods, PVs, and all other resources) to make sure the application is restored.

Troubleshooting through Logs

To troubleshoot a backup or restore issue, start by displaying the backups with the following commands.
See the BACKUP PHASE column for more details.
master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS       START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          InProgress   2s           Snapshot

master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS       START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          InProgress   39s          Upload         30

master $ kubectl describe backup demo-full-backup
Name: demo-full-backup
Namespace:
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"triliovault.trilio.io/v1alpha1","kind":"Backup","metadata":{"annotations":{},"name":"demo-full-backup"},"spec":{"applicatio...
API Version: triliovault.trilio.io/v1alpha1
Kind: Backup
Metadata:
Creation Timestamp: 2020-03-24T19:15:43Z
Generation: 1
Owner References:
API Version: triliovault.trilio.io/v1alpha1
Kind: Application
Name: backup-job-k8s-demo-app
UID: 6b4f60dd-17cd-4413-85fb-e30952c6cf19
Resource Version: 1417
Self Link: /apis/triliovault.trilio.io/v1alpha1/backups/demo-full-backup
UID: 50264019-e235-4054-8510-8c5b4947c0f6
Spec:
Application:
API Version: triliovault.trilio.io/v1alpha1
Kind: Application
Name: backup-job-k8s-demo-app
Resource Version: 1327
UID: 6b4f60dd-17cd-4413-85fb-e30952c6cf19
Schedule Type: Periodic
Type: Full
Status:
Percentage Completion: 30
Phase: Upload
Phase Status: InProgress
Size: 0
Snapshot Content:
Custom:
Component:
Group Version Kind:
Kind: Secret
Version: v1
Metadata:
{"apiVersion":"v1","data":{"password":"dHJpbGlvcGFzcwo="},"kind":"Secret","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"mysql-pass","namespace":"default"},"type":"Opaque"}
Group Version Kind:
Kind: Service
Version: v1
Metadata:
{"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"ports":[{"name":"web","port":80,"protocol":"TCP","targetPort":80}],"selector":{"app":"k8s-demo-app","tier":"frontend"},"sessionAffinity":"None","type":"ClusterIP"}}
{"apiVersion":"v1","kind":"Service","metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"ports":[{"port":3306,"protocol":"TCP","targetPort":3306}],"selector":{"app":"k8s-demo-app","tier":"mysql"},"sessionAffinity":"None","type":"ClusterIP"}}
Group Version Kind:
Group: apps
Kind: Deployment
Version: v1
Metadata:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"frontend"},"name":"k8s-demo-app-frontend","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":3,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"frontend"}},"strategy":{"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","tier":"mysql"},"name":"k8s-demo-app-mysql","namespace":"default"},"spec":{"progressDeadlineSeconds":600,"replicas":1,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"k8s-demo-app","tier":"mysql"}},"strategy":{"type":"Recreate"},"template":{"metadata":{"labels":{"app":"k8s-demo-app","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}
Group Version Kind:
Group: apps
Kind: ReplicaSet
Version: v1
Metadata:
{"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"3","deployment.kubernetes.io/max-replicas":"4","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"},"name":"k8s-demo-app-frontend-6544df7845","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-frontend","uid":"63c2334c-9e48-4119-b333-4bd47a3f824e"}]},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"6544df7845","tier":"frontend"}},"spec":{"containers":[{"image":"docker.io/trilio/k8s-demo-app:v1","imagePullPolicy":"IfNotPresent","name":"demoapp-frontend","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30}}}}
{"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{"deployment.kubernetes.io/desired-replicas":"1","deployment.kubernetes.io/max-replicas":"1","deployment.kubernetes.io/revision":"1"},"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"},"name":"k8s-demo-app-mysql-765495d764","namespace":"default","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"Deployment","name":"k8s-demo-app-mysql","uid":"8a5e4484-8b5a-4a96-9957-14dd502cb5b6"}]},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"template":{"metadata":{"labels":{"app":"k8s-demo-app","pod-template-hash":"765495d764","tier":"mysql"}},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-pass"}}}],"image":"mysql:5.6","imagePullPolicy":"IfNotPresent","name":"mysql","ports":[{"containerPort":3306,"name":"mysql","protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/lib/mysql","name":"mysql-persistent-storage"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"terminationGracePeriodSeconds":30,"volumes":[{"name":"mysql-persistent-storage","persistentVolumeClaim":{"claimName":"mysql-pv-claim"}}]}}}}
Data Snapshot:
Persistent Volume Claim Metadata: {"kind":"PersistentVolumeClaim","apiVersion":"v1","metadata":{"name":"mysql-pv-claim","namespace":"default","selfLink":"/api/v1/namespaces/default/persistentvolumeclaims/mysql-pv-claim","uid":"05a835dc-045d-422c-8940-0a4b4917fa44","resourceVersion":"1100","creationTimestamp":"2020-03-24T19:13:38Z","labels":{"app":"k8s-demo-app","tier":"mysql"},"annotations":{"pv.kubernetes.io/bind-completed":"yes","pv.kubernetes.io/bound-by-controller":"yes","volume.beta.kubernetes.io/storage-provisioner":"hostpath.csi.k8s.io"},"finalizers":["kubernetes.io/pvc-protection"]},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"volumeName":"pvc-05a835dc-045d-422c-8940-0a4b4917fa44","storageClassName":"csi-hostpath-sc","volumeMode":"Filesystem"},"status":{"phase":"Bound","accessModes":["ReadWriteOnce"],"capacity":{"storage":"5Gi"}}}
Persistent Volume Claim Name: mysql-pv-claim
Pod Containers Map:
Containers:
mysql
Pod Name: k8s-demo-app-mysql-765495d764-56f6b
Size: 0
Snapshot Size: 0
Volume Snapshot:
Retry Count: 1
Status: Completed
Volume Snapshot:
API Version: snapshot.storage.k8s.io/v1alpha1
Kind: VolumeSnapshot
Name: mysql-pv-claim-f727ec86-f3a1-4eff-83b9-99a2e30d331e
Namespace: default
Resource Version: 1352
UID: 10bc3f16-3ad5-4534-86e4-f556e8d932be
Start Timestamp: 2020-03-24T19:15:43Z
Status: InProgress
Events:
Type      Reason               Age   From                Message
----      ------               ----  ----                -------
Warning   BackupUpdateFailed   68s   backup-controller   Updating Backup: demo-full-backup, Failed%!(EXTRA string=)

master $ kubectl get backup
NAME               APPLICATION               BACKUP TYPE   STATUS       START TIME   BACKUP PHASE   PERCENTAGE COMPLETED
demo-full-backup   backup-job-k8s-demo-app   Full          Completed    119s         Retention      100
If the backup or restore phase is Validation, look at the Control Plane Pod logs to find the root cause.
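The kubectl describe backup output is long, and the fields that matter most for troubleshooting (Phase, Phase Status, and Percentage Completion) are easy to miss. The following is a minimal sketch that extracts just those three fields, assuming the field labels appear exactly as in the sample output in this guide. The function name backup_progress is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: summarize progress fields from `kubectl describe backup` output.
# backup_progress is a hypothetical helper name, not a TVK command.
# It reads the describe output on stdin and prints one summary line.
backup_progress() {
  # Split each line at the first colon plus any following spaces, and
  # match the label exactly so that nested "Status:" lines are ignored.
  # Only the first occurrence of each label is kept.
  awk -F': *' '
    $1 ~ /^ *Percentage Completion$/ && pct == ""    { pct = $2 }
    $1 ~ /^ *Phase$/                 && phase == ""  { phase = $2 }
    $1 ~ /^ *Phase Status$/          && status == "" { status = $2 }
    END { printf "phase=%s status=%s percent=%s\n", phase, status, pct }
  '
}

# Usage against a live cluster:
#   kubectl describe backup demo-full-backup | backup_progress
```

Parsing the human-readable describe output keeps the sketch tied to what this guide shows. For scripting, reading the same fields from kubectl get backup -o json would be more robust, but the exact JSON field names are not shown in this guide.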

Log collector

Refer to the Log Collection page to collect logs, then send them to the Trilio team for further analysis of the issue.