Known Issues and Workarounds

This page describes known issues you may encounter and workarounds for them.


Management Console not accessible using NodePort

If you have completed all the configuration steps described in the instructions and still cannot access the UI, check the firewall rules for the Kubernetes cluster nodes.

Here is an example to check firewall rules on the Google GKE cluster:

  1. Search for the VPC Network on the GCP web console.

  2. Select the `Firewall` option from the left pane.

  3. Search the rules using your Kubernetes cluster name.

  4. In the list of firewall rules, verify:

    1. Filters column - shows the source IPs that can access the T4K Management Console hostname and NodePort.

    2. Protocols/Ports column - shows the ports that are accessible on the cluster nodes.

  5. Verify that the NodePort assigned to the service k8s-triliovault-ingress-gateway is included in the firewall rule's Protocols and Ports.
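
Alternatively, you can inspect the rules from the command line with the gcloud CLI. A minimal sketch; the gke-<cluster-name> prefix assumes GKE's default firewall-rule naming:

# List firewall rules whose names contain your cluster name
gcloud compute firewall-rules list --filter="name~gke-<cluster-name>"

# Show source ranges and allowed protocols/ports for a specific rule
gcloud compute firewall-rules describe <rule-name>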

Target Creation Failure

kubectl apply -f <backup_target.yaml> completes, but the Target resource goes into the Unavailable state.

The most common reasons for a Target being marked Unavailable are insufficient permissions on the storage or a lack of network connectivity. Creating a Target resource launches a validation pod that checks whether Trilio can connect and write to the Target. Trilio deletes this pod after validation completes, which also removes its logs. To capture them, repeat the same operation and actively check the logs of the validation pod while it runs.

Kubernetes

kubectl get pods -A | grep validator
kubectl logs <validator-pod> -n <namespace>

OpenShift

oc get pods -A | grep validator
oc logs <validator-pod> -n <namespace>
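
In addition to the validator logs, the Target resource's own status and events usually record the failure reason. A minimal sketch; the exact CRD and resource name are assumptions and may vary by T4K version, so confirm them first:

# Confirm the exact CRD name for Trilio targets
kubectl get crds | grep -i target

# Inspect the Target's status and recent events
kubectl describe target <target-name> -n <namespace>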

Backup Failures

Follow these steps to analyze backup failures. Start by listing the backup jobs:

kubectl get jobs
oc get jobs

To troubleshoot backup issues, it is important to know the different phases of the triliovault jobs.

  1. <backupname>-snapshot: Trilio leverages CSI snapshot functionality to take PV snapshots. If the backup fails in this step, try a manual snapshot of the PV (a sketch follows the log commands below) to make sure all required drivers are present and that this operation works without Trilio.

  2. <backupname>-upload: In this phase, Trilio uploads metadata and data to the Target. If the backup fails in this phase, check the following logs for errors.

Kubernetes

kubectl get pods -A | grep k8s-triliovault-control-plane
kubectl logs <controller POD>
kubectl logs <failed or errored datamover pod>

OpenShift

oc get pods -A | grep k8s-triliovault-control-plane
oc logs <controller POD>
oc logs <failed or errored datamover pod>
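
To verify the snapshot step independently of Trilio (as suggested for the <backupname>-snapshot phase above), create a VolumeSnapshot by hand using the standard snapshot.storage.k8s.io API. A minimal sketch; the class name csi-snapclass and PVC name my-pvc are assumptions:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: manual-test-snapshot
  namespace: <app-namespace>
spec:
  volumeSnapshotClassName: csi-snapclass    # list yours with 'kubectl get volumesnapshotclass'
  source:
    persistentVolumeClaimName: my-pvc       # the PVC backing the failing backup

Apply it and confirm that READYTOUSE becomes true:

kubectl apply -f manual-test-snapshot.yaml
kubectl get volumesnapshot manual-test-snapshot -n <app-namespace>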

Restore Failures

The most common causes of restore failures are resource conflicts. For example, restoring a backup into the namespace that still contains the original application produces name conflicts between the original and restored resources. Check the logs to find the root cause of the issue.

Kubernetes

kubectl get pods -A | grep k8s-triliovault-control-plane
kubectl logs <controller POD>
kubectl logs <failed or errored metamover pod>

OpenShift

oc get pods -A | grep k8s-triliovault-control-plane
oc logs <controller POD>
oc logs <failed or errored metamover pod>

Webhooks are installed

Problem Description: A Trilio operation fails with an error such as Internal error occurred: failed calling webhook or service "k8s-triliovault-webhook-service" not found.

Error from server (InternalError): error when creating "targetmigration.yaml": Internal error occurred: failed calling webhook "mapplication.kb.io": Post https://k8s-triliovault-webhook-service.default.svc:443/mutate?timeout=30s: service "k8s-triliovault-webhook-service" not found

Solution: This can happen when T4K was installed in multiple namespaces. Even after those installations are cleaned up, the webhook instances may not be removed. Run the following commands to list all Trilio webhook instances.

kubectl get validatingwebhookconfigurations -A | grep trilio
kubectl get mutatingwebhookconfigurations -A | grep trilio

Delete any duplicate entries.
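
For example, to remove a stale configuration (the name below is hypothetical):

kubectl delete validatingwebhookconfiguration <stale-trilio-webhook>
kubectl delete mutatingwebhookconfiguration <stale-trilio-webhook>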

Restore Failed

Problem Description: You are unable to restore an application into a different namespace (OpenShift).

Error:
Permission denied: could not bind to address 0.0.0.0:80

Solution: The user may not have sufficient permissions to restore the application into the namespace. Grant the anyuid SCC to the namespace's default service account:

oc adm policy add-scc-to-user anyuid -z default -n <namespace>
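
Granting anyuid broadens what workloads in the namespace may run as, so once the restore is verified you may want to revert it:

oc adm policy remove-scc-from-user anyuid -z default -n <namespace>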

OOMKilled Issues

Sometimes users may notice Trilio pods going into an OOMKilled state. This can happen in environments with a large number of objects, where additional processing power and memory are needed. In such scenarios, the memory and CPU limits can be raised by running the following commands:

Kubernetes

To increase resource limits for Trilio pods deployed via the upstream Helm-based Operator, please follow the API documentation.
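
As an illustration only, the limits can also be patched directly on the Deployment. The Deployment name below is an assumption (confirm it first), and the Operator may reconcile a manual patch away, so prefer the documented Helm-values route:

# Confirm the web-backend Deployment name first
kubectl get deployments -n <trilio-namespace> | grep web-backend

# Raise the memory limit on its first container
kubectl patch deployment k8s-triliovault-web-backend -n <trilio-namespace> --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value":"2048Mi"}]'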

OCP

Within OCP, the memory and CPU values for Trilio are managed by the Operator ClusterServiceVersion (CSV). Users can either edit the Operator CSV YAML directly in the OCP console or patch the pods using the snippets below.

The following snippet shows how to increase the memory limit to 2048Mi for the web-backend pod:

export VERSION=2.6.0
export NAMESPACE=openshift-operators

oc patch csv k8s-triliovault-stable.$VERSION --type='json'   -p='[{"op": "replace", "path": "/spec/install/spec/deployments/4/spec/template/spec/containers/0/resources/limits/memory", "value":"2048Mi"}]'

kubectl delete pod -l app=k8s-triliovault-web-backend -n $NAMESPACE

The following snippet shows how to increase the CPU limit to 800m for the web-backend pod:

export VERSION=2.6.0
export NAMESPACE=openshift-operators

oc patch csv k8s-triliovault-stable.$VERSION --type='json'   -p='[{"op": "replace", "path": "/spec/install/spec/deployments/4/spec/template/spec/containers/0/resources/limits/cpu", "value":"800m"}]'

kubectl delete pod -l app=k8s-triliovault-web-backend -n $NAMESPACE

After performing this workaround you will see two k8s-triliovault-web-backend ReplicaSets, because Trilio sets the revisionHistoryLimit to 1 and any change to the CSV and/or Deployment leaves behind an extra ReplicaSet. This can be resolved by manually deleting the extra ReplicaSet (see the sketch below) or by setting revisionHistoryLimit to 0.
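
A sketch of the manual cleanup: delete the ReplicaSet that shows 0 desired replicas.

# Find the old ReplicaSet (DESIRED column shows 0)
kubectl get replicasets -n $NAMESPACE | grep web-backend

# Delete it
kubectl delete replicaset <old-replicaset-name> -n $NAMESPACE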

Uninstall Trilio

To uninstall Trilio and start fresh, delete all Trilio CRDs.

On Kubernetes

kubectl delete crds $(kubectl get crds | grep trilio | awk '{print $1}')

On OpenShift

oc delete crds $(oc get crds | grep trilio | awk '{print $1}')

Coexisting with service mesh

A service mesh assigned to a namespace can cause Trilio containers to fail to exit because of the injected sidecar containers. Trilio operations, including the creation of targets, backups, and restores, may run indefinitely or fail with timeouts.

Run the following commands to determine whether a non-Trilio container is running inside a pod:

kubectl get pods -n NAMESPACE

kubectl describe pod PODNAME -n NAMESPACE
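
A quicker check is to print just the container names; anything besides the expected Trilio containers (for example istio-proxy or linkerd-proxy) indicates an injected sidecar:

kubectl get pod PODNAME -n NAMESPACE -o jsonpath='{.spec.containers[*].name}'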

Trilio can coexist with a service mesh if the mesh supports excluding sidecar and init containers from pods based on labels or annotations. Additionally, Trilio offers optional flags that users can set as needed.

Edit the Trilio "tvm" CR by running the command below.

The namespace depends on the Kubernetes distribution and the Trilio version.

kubectl edit tvm -n NAMESPACE

Add or edit the helmValues section as shown below for your service mesh. Follow the service mesh software's best practices for assigning the label or annotation. Below is an example for the Linkerd service mesh.

spec:
  applicationScope: Cluster
  helmValues:
    PodAnnotations:
      linkerd.io/inject: disabled

Note: Trilio for Kubernetes automatically performs this exclusion for Istio and Portshift service mesh sidecars; no further action is required. There is a related enhancement in Kubernetes version 1.27.

