Troubleshooting Guide

This troubleshooting guide describes the different phases of the backup and recovery process and which logs to check when troubleshooting issues manually.

Troubleshooting Trilio for Kubernetes (T4K) is no different from troubleshooting any other Kubernetes application. Your primary tool is kubectl on Kubernetes or oc on OpenShift; the commands are the same for both tools.

Successful Deployment

The following command lists the T4K Pods in a successful deployment. The Control Plane Pod hosts the controllers, including Target, BackupPlan, Backup and Restore. The Executor Pod includes the controllers for the jobs that the backup and restore controllers create.

$ kubectl get pods -A | grep trilio
trilio-system                                      k8s-triliovault-admission-webhook-59bf44976-bvm4v                                          1/1     Running     0               25h
trilio-system                                      k8s-triliovault-control-plane-5769c9c965-k2jd6                                             2/2     Running     0               25h
trilio-system                                      k8s-triliovault-dex-586bcc8f9-td8gq                                                        1/1     Running     0               25h
trilio-system                                      k8s-triliovault-exporter-77dc69f795-l8crn                                                  1/1     Running     0               25h
trilio-system                                      k8s-triliovault-operator-5cbc888d4c-7ddg5                                                  1/1     Running     0               7m48s
trilio-system                                      k8s-triliovault-resource-cleaner-28293630-nzmqz                                            0/1     Completed   0               16m
trilio-system                                      k8s-triliovault-web-678c48864b-9gnjj                                                       1/1     Running     0               25h
trilio-system                                      k8s-triliovault-web-backend-d4dbddb4f-wsjrz                                                1/1     Running     0               25h
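The listing above can also be checked mechanically. The following is a minimal sketch, not part of T4K: the function name `unhealthy_trilio_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS). It prints any Trilio pod that is neither Completed nor Running with all of its containers ready.

```shell
# Reads `kubectl get pods -A` output on stdin and prints every line that
# mentions "trilio" and is neither Completed nor fully ready and Running.
# Hypothetical helper; assumes the default kubectl column order.
unhealthy_trilio_pods() {
  awk '
    /trilio/ {
      split($3, ready, "/")            # READY column, e.g. "2/2"
      if ($4 == "Completed") next      # finished job pods are expected
      if ($4 == "Running" && ready[1] == ready[2]) next
      print $1 "/" $2 " " $3 " " $4    # namespace/pod READY STATUS
    }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -A | unhealthy_trilio_pods
```

An empty result corresponds to the healthy state shown in the listing; any line printed names a pod worth inspecting first.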

Make sure the other artifacts of the Trilio deployment, such as its custom resource definitions (CRDs), are in good shape as well.

$ oc get crds | grep trilio
backupplans.triliovault.trilio.io                                 2023-10-16T07:00:21Z
backups.triliovault.trilio.io                                     2023-10-16T07:00:21Z
clusterbackupplans.triliovault.trilio.io                          2023-10-16T07:00:21Z
clusterbackups.triliovault.trilio.io                              2023-10-16T07:00:21Z
clusterrestores.triliovault.trilio.io                             2023-10-16T07:00:21Z
consistentsets.triliovault.trilio.io                              2023-10-16T07:00:21Z
continuousrestoreplans.triliovault.trilio.io                      2023-10-16T07:00:22Z
hooks.triliovault.trilio.io                                       2023-10-16T07:00:22Z
licenses.triliovault.trilio.io                                    2023-10-16T07:00:22Z
policies.triliovault.trilio.io                                    2023-10-16T07:00:22Z
restores.triliovault.trilio.io                                    2023-10-16T07:00:22Z
targets.triliovault.trilio.io                                     2023-10-16T07:00:22Z
triliovaultmanagers.triliovault.trilio.io                         2023-10-16T06:57:15Z
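To compare a cluster against the list above, a small check along these lines may help. It is a sketch under assumptions: the function name `missing_trilio_crds` is hypothetical, and the expected CRD names are copied from the sample output on this page, so other T4K versions may ship a different set.

```shell
# Reads `kubectl get crds` (or `oc get crds`) output on stdin and prints
# each expected T4K CRD that does not appear in it.
# Hypothetical helper; the expected list mirrors the sample output above.
missing_trilio_crds() {
  expected="backupplans backups clusterbackupplans clusterbackups clusterrestores consistentsets continuousrestoreplans hooks licenses policies restores targets triliovaultmanagers"
  found=$(awk '{ print $1 }')          # first column holds the CRD name
  for name in $expected; do
    crd="$name.triliovault.trilio.io"
    # -x matches whole lines, -F treats the dots literally
    printf '%s\n' "$found" | grep -qxF "$crd" || echo "$crd"
  done
}

# Usage against a live cluster (requires kubectl or oc access):
#   kubectl get crds | missing_trilio_crds
```

No output means every CRD from the list was found.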

Troubleshooting through Logs

It helps to know the different phases of backup and restore operations and where to find the logs for each phase.

Broadly, a backup operation has the following phases: MetaSnapshot, HookTargetIdentification, Quiesce, ImageBackup, DataSnapshot, Unquiesce, DataUpload, MetadataUpload, Retention and Cleanup.

Similarly, a restore operation has the following phases: TargetValidation, Validation, PrimitiveMetadataRestore, DataRestore, DataOwnerUpdate, Unquiesce, MetadataRestore, RestoreCleanup, AddProtection, ImageRestore and HookTargetIdentification.

If a backup or restore fails during any of these phases, first make sure that all the other T4K and cluster workloads are running properly and that the CSI snapshot controller is working properly.

To troubleshoot a backup or restore issue, start by listing the backups with the following commands. The STATUS column shows the state of each backup, and kubectl describe shows the details of each phase.

master $ kubectl get backup
NAME               BACKUPPLAN            BACKUP TYPE   STATUS       DATA SIZE   CREATION TIME          START TIME             END TIME           PERCENTAGE COMPLETED   BACKUP SCOPE   DURATION
demo-backup        demo-backupplan       Full          InProgress   7077888     2023-10-17T06:30:05Z   2023-10-17T06:30:05Z                      20                     Namespace      1m13s
master $ kubectl get backup
NAME               BACKUPPLAN            BACKUP TYPE   STATUS       DATA SIZE   CREATION TIME          START TIME             END TIME               PERCENTAGE COMPLETED   BACKUP SCOPE   DURATION
demo-backup        demo-backupplan       Full          Failed       7077888     2023-10-17T23:00:03Z   2023-10-17T23:00:04Z   2023-10-17T23:02:24Z   31                     Namespace      1m13s
master $ kubectl describe backup demo-backup
Name:         demo-backup
Namespace:    default
Labels:       app.kubernetes.io/managed-by=k8s-triliovault-ui
              app.kubernetes.io/name=k8s-triliovault
              app.kubernetes.io/part-of=k8s-triliovault
Annotations:  triliovault.trilio.io/creator: system:serviceaccount:default:k8s-triliovault
              triliovault.trilio.io/instance-id: 3c23759c-c6bc-4431-a948-c8a9b83a8d2a
              triliovault.trilio.io/updater:
                [{"username":"system:serviceaccount:default:k8s-triliovault","lastUpdatedTimestamp":"2023-10-17T23:00:04.048729325Z"}]
API Version:  triliovault.trilio.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2023-10-17T23:00:03Z
  Finalizers:
    backup-cleanup-finalizer
  Generation:  1
  Resource Version:        29232936
  UID:                     8185000b-9253-40f8-8744-552c62893fc3
Spec:
  Backup Plan:
    API Version:       triliovault.trilio.io/v1
    Kind:              BackupPlan
    Name:              demo-backupplan
    Namespace:         default
    Resource Version:  28572742
    UID:               984a46f2-d225-4008-8a11-469644a0d837
  Type:                Full
Status:
  Backup Scope:          Namespace
  Completion Timestamp:  2023-10-17T23:02:24Z
  Condition:
    Phase:                MetaSnapshot
    Reason:               MetaSnapshot InProgress
    Status:               InProgress
    Timestamp:            2023-10-17T23:00:04Z
    Phase:                MetaSnapshot
    Reason:               MetaSnapshot Completed
    Status:               Completed
    Timestamp:            2023-10-17T23:01:17Z
    Phase:                HookTargetIdentification
    Reason:               HookTargetIdentification Failed
    Status:               Failed
    Timestamp:            2023-10-17T23:01:17Z
    Phase:                MetadataUpload
    Reason:               MetadataUpload InProgress
    Status:               InProgress
    Timestamp:            2023-10-17T23:01:17Z
    Phase:                MetadataUpload
    Reason:               MetadataUpload Completed
    Status:               Completed
    Timestamp:            2023-10-17T23:02:24Z
  Duration:               2m20s
  Encryption Enabled:     false
  Expiration Timestamp:   2023-10-22T23:00:00Z
  Location:               984a46f2-d225-4008-8a11-469644a0d837/8185000b-9253-40f8-8744-552c62893fc3
  Metadata Size:          6557696
  Percentage Completion:  31
  Phase:                  MetadataUpload
  Phase Status:           Completed
  Size:                   6557696
  Snapshot:
    Custom:
      Resources:
        Group Version Kind:
          Kind:     ConfigMap
          Version:  v1
        Objects:
          kube-root-ca.crt
        Group Version Kind:
          Kind:     ServiceAccount
          Version:  v1
        Objects:
          default
  Start Timestamp:  2023-10-17T23:00:04Z
  Stats:
    Hook Exists:  true
    Target Info:
      Target:
        API Version:       triliovault.trilio.io/v1
        Kind:              Target
        Name:              demo-target
        Namespace:         default
        Resource Version:  28338697
        UID:               15a18f3f-9bfb-4c81-9407-738a7cc484ca
      Type:                NFS
      Vendor:              Other
  Status:                  Failed
  Type:                    Full
Events:                    <none>
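In a long describe output, the failed phase is easy to miss among the completed ones. The sketch below pulls it out; the function name `failed_phases` is hypothetical, and it assumes the "Condition:" layout shown in the sample output above (repeated Phase, Reason, Status and Timestamp lines).

```shell
# Reads `kubectl describe backup <name>` output on stdin and prints the
# phase of every entry under "Condition:" whose status is Failed.
# Hypothetical helper; assumes the describe layout shown above.
failed_phases() {
  awk '
    $1 == "Condition:"         { in_cond = 1; next }
    in_cond && $1 == "Phase:"  { phase = $2; next }
    in_cond && $1 == "Status:" { if ($2 == "Failed") print phase; next }
    in_cond && ($1 == "Reason:" || $1 == "Timestamp:") { next }
    { in_cond = 0 }            # any other line ends the Condition block
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe backup demo-backup | failed_phases
```

Restricting the match to the Condition block matters because the overall Phase and Status fields further down report the last phase and the final result, not the phase that failed.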

The phase at which the failure occurred can be found in the Status section of the output of the above command. If the status doesn't give a clear reason for the failure, check the logs of the pods for the phase of the backup or restore that failed; these pods are generally in an Error state.
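One way to find those pods and their logs is sketched below. The function name `error_pod_log_commands` is hypothetical, and it assumes the default column layout of `kubectl get pods -A`. It only prints the `kubectl logs` commands so that they can be reviewed before being run.

```shell
# Reads `kubectl get pods -A` output on stdin and prints one
# `kubectl logs` command per pod whose STATUS column is Error.
# Hypothetical helper; assumes the default kubectl column order.
error_pod_log_commands() {
  awk '$4 == "Error" { print "kubectl logs -n " $1 " " $2 " --all-containers" }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -A | error_pod_log_commands
```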

If there are no such pods in an Error state either, and none of the above steps help, check the T4K control plane logs, which can be collected with the log collector tool described below.

Log collector

Refer to the Log Collection page, collect the logs and send them to the Trilio team for further analysis of the issue.