In-Place Restore
Overview
In-Place Restore allows you to restore an application to a previous known-good state by performing a complete cleanup of existing resources before restoring them from a backup. Unlike a standard restore — which creates resources alongside existing ones — In-Place Restore first removes all current application resources (both metadata and data) and then recreates everything from the backup.
This gives you a clean, consistent application state that exactly matches the backup, with no leftover, drifted, or corrupted resources.
When to Use In-Place Restore
In-Place Restore is designed for scenarios where you need your application returned to an exact previous state:
Application drift — Your application has been modified (scaled, reconfigured, patched) and you want to roll it back to a known-good backup.
Corruption recovery — Application resources are in a broken or inconsistent state and need to be fully reset.
Environment reset — You want to return a development, staging, or test environment to a clean baseline captured in a backup.
Failed upgrade rollback — An upgrade went wrong and you want to restore the application exactly as it was before the upgrade.
Downtime expected: In-Place Restore deletes all existing application resources before restoring them. Your application will be unavailable during the cleanup and restore phases. Plan accordingly and inform stakeholders before running an In-Place Restore on production workloads.
How It Works
In-Place Restore integrates into Trilio's existing restore workflow. When you enable the cleanupConfig on a Restore CR, the following happens:
Step 1: Validation
Before anything is modified, Trilio runs a validation job that performs the following checks:
Ensures the target namespace is not a critical system namespace (see Blocked Namespaces)
Validates that data components (PVCs) in the backup are not owned by workloads that were created after the backup was taken (see Data Component Ownership)
Performs standard restore validations (dry-run resource checks, transformation validation, storage provisioner compatibility)
If any validation check fails, the restore is rejected and no resources are modified.
Step 2: Data Preparation
Trilio creates the data components (PVCs) in an internal install namespace and copies backup data into them. This step happens before any cleanup, so your existing application remains untouched until data is ready.
Step 3: Cleanup
Trilio determines the appropriate cleanup strategy automatically based on your backup type:
Namespace backups → Namespace-level cleanup (removes all resources in the namespace)
Application-scoped backups (Helm, Operator, Custom) → Application-level cleanup (removes only the application's resources)
During cleanup, every deletion follows a two-phase approach:
First, a graceful deletion is attempted (standard Kubernetes delete).
If the resource is not deleted within the configured timeout (default: 60 seconds), Trilio automatically falls back to force deletion by removing finalizers and owner references.
This guarantees that cleanup always completes, even when resources are stuck.
Step 4: Restore
After cleanup, Trilio proceeds with its standard metadata restore process — recreating all application resources from the backup.
Step 5: Post-Restore Operations
Standard Trilio post-restore operations run as usual (unquiesce, add protection, cleanup restore artifacts).
Configuration
In-Place Restore is configured through the cleanupConfig field in the Restore CR spec.
Fields
cleanupConfig.enabled (boolean, required)
Set to true to enable In-Place Restore.
cleanupConfig.gracefulDeletionTimeoutSeconds (integer, optional, default: 60)
Seconds to wait for graceful deletion before falling back to force deletion. Range: 60–1800.
Basic Example
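A minimal Restore CR with In-Place Restore enabled might look like the sketch below. Only cleanupConfig.enabled is documented above; the surrounding fields (apiVersion, the source block, and all names) are illustrative assumptions and should be checked against your Trilio version:

```yaml
apiVersion: triliovault.trilio.io/v1   # assumed API group/version
kind: Restore
metadata:
  name: inplace-restore-demo
  namespace: demo-app                  # target namespace of the restore
spec:
  source:                              # assumed standard Restore source block
    type: Backup
    backup:
      name: demo-app-backup            # hypothetical backup name
  cleanupConfig:
    enabled: true                      # enables In-Place Restore (cleanup before restore)
```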
Example with Custom Timeout
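To give resources with slow finalizers more time before force deletion kicks in, raise gracefulDeletionTimeoutSeconds. As above, everything outside cleanupConfig is an assumed sketch of the Restore CR shape:

```yaml
apiVersion: triliovault.trilio.io/v1   # assumed API group/version
kind: Restore
metadata:
  name: inplace-restore-slow-finalizers
  namespace: demo-app
spec:
  source:
    type: Backup
    backup:
      name: demo-app-backup            # hypothetical backup name
  cleanupConfig:
    enabled: true
    gracefulDeletionTimeoutSeconds: 300   # wait 5 minutes before force deletion (valid range: 60-1800)
```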
Example with Transformations
You can combine In-Place Restore with resource transformations. For example, to change the storage class of PVCs during restore:
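A sketch of such a combination is shown below. The transformComponents block is an assumption about the transformation schema (its exact field names are not defined in this page); consult the transformation documentation for the authoritative layout:

```yaml
apiVersion: triliovault.trilio.io/v1   # assumed API group/version
kind: Restore
metadata:
  name: inplace-restore-transform
  namespace: demo-app
spec:
  source:
    type: Backup
    backup:
      name: demo-app-backup            # hypothetical backup name
  cleanupConfig:
    enabled: true
  transformComponents:                 # assumed schema; see the transformation docs
    custom:
      - transformName: change-pvc-storage-class
        resources:
          groupVersionKind:
            group: ""
            version: v1
            kind: PersistentVolumeClaim
        jsonPatches:
          - op: replace
            path: /spec/storageClassName
            value: fast-ssd            # hypothetical target storage class
```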
Cluster-Scoped Restore Example
For cluster-scoped restores (multi-namespace), cleanupConfig can be set at the global level or per-namespace in the RestoreConfig:
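A possible layout is sketched below. The per-namespace override structure is an assumption (the exact RestoreConfig schema is not shown on this page) and should be verified against your Trilio version:

```yaml
apiVersion: triliovault.trilio.io/v1   # assumed API group/version
kind: Restore
metadata:
  name: cluster-inplace-restore
spec:
  source:
    type: Backup
    backup:
      name: cluster-backup             # hypothetical cluster-scoped backup
  cleanupConfig:                       # global default applied to every namespace
    enabled: true
  restoreConfig:                       # assumed per-namespace override layout
    - namespace: team-a
      cleanupConfig:
        enabled: true
        gracefulDeletionTimeoutSeconds: 600   # this namespace has slow finalizers
    - namespace: team-b
      cleanupConfig:
        enabled: false                 # standard (non-cleanup) restore here
```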
Cleanup Strategies
Trilio automatically selects the cleanup strategy based on the backup type. You do not need to configure this — it is determined at runtime and recorded in the Restore CR status.
Namespace Cleanup
Used when the backup is a namespace-scoped backup.
Trilio cleans up all resources captured in the backup using a combination of application-specific handlers:
Helm releases are uninstalled using native helm uninstall
Custom and label-selected resources are cleaned up using categorical ordering
All remaining resources are cleaned up in a second pass
This approach ensures comprehensive cleanup even when a namespace contains a mix of Helm, Operator, and Custom applications.
Application Cleanup
Used when the backup is an application-scoped backup (Helm, Operator, or Custom).
Trilio cleans up only the specific application's resources as identified in the backup:
Helm: Native helm uninstall, followed by categorical cleanup of any remaining release resources, then data cleanup.
Operator: Custom Resources deleted first (concurrently, with retry logic), then application resources, then operator infrastructure, then data.
OLM Operator: Same as Operator, but operator infrastructure (Deployments and ServiceAccounts managed by OLM) is preserved to avoid impacting other namespaces.
Virtual Machine: VMPool → VM → DataVolume deletion in controller-safe order, then categorical cleanup for remaining resources.
Custom (Label/GVK): VM resources first (if any), then Custom Resources, then categorical cleanup, then data.
Resource Handling
Resources That Are Automatically Excluded from Cleanup
Certain resources are never deleted during cleanup to prevent damage to shared cluster infrastructure:
CustomResourceDefinition (CRD): Deleting a CRD removes all custom resources of that type across the entire cluster.
StorageClass: Cluster-scoped infrastructure shared by multiple applications.
OperatorGroup: Shared by multiple operators in the same namespace.
CatalogSource: Shared OLM infrastructure for operator catalogs.
Subscription: OLM-managed; deleting it disrupts operator lifecycle management.
ClusterServiceVersion (CSV): OLM-managed operator definition; removing it impacts all consumers.
InstallPlan: OLM-managed; deleting it disrupts reconciliation.
Namespace: Deleting the namespace would destroy the entire restore scope.
These resources are automatically skipped during cleanup. If they still exist on the cluster after cleanup, they are also skipped during the restore phase.
Platform Default Services
The kubernetes and openshift default services are always skipped during cleanup. These services are managed by the Kubernetes and OpenShift control planes, and deleting them would disrupt communication to the API server.
Controller-Managed Shared Resources
Some resources are automatically recreated by Kubernetes or external controllers after deletion (for example, the kube-root-ca.crt ConfigMap or the default ServiceAccount). When Trilio deletes and immediately tries to recreate such a resource from backup, the controller may have already recreated it, causing a conflict.
Default behavior: The resource is skipped and a warning is recorded in the Restore CR status.
With patchIfAlreadyExists: true: The existing resource is patched with the backed-up state using a 3-way merge, ensuring it matches the backup while preserving any fields added by the controller.
For more details on patchIfAlreadyExists, see the Restore Flags Guide.
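Combining the flag with cleanup might look like the fragment below. Its exact position in the Restore spec is an assumption here; confirm the placement in the Restore Flags Guide:

```yaml
spec:
  cleanupConfig:
    enabled: true
  patchIfAlreadyExists: true   # assumed top-level placement; 3-way merges backed-up state onto controller-recreated resources
```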
Two-Phase Deletion
Every resource deletion during cleanup follows a two-phase approach:
Graceful deletion: A standard Kubernetes delete request is sent. Trilio waits up to the configured timeout (default: 60 seconds) for the resource to be removed.
Force deletion: If the resource still exists after the timeout, Trilio removes all finalizers and owner references from the resource and deletes it again. This ensures cleanup completes even when resources are stuck due to finalizers, webhook issues, or controller conflicts.
All force deletion actions are recorded as warnings in the Restore CR status (see Monitoring Cleanup Status), giving you full visibility into what happened.
Configuring the Timeout
The gracefulDeletionTimeoutSeconds field controls how long Trilio waits for graceful deletion before escalating to force deletion.
Default: 60 seconds
Minimum: 60 seconds
Maximum: 1800 seconds (30 minutes)
For most applications, the default of 60 seconds is sufficient. Increase this value if your application has resources with long-running finalizers or complex cleanup logic that needs more time.
Important Constraints
Blocked Namespaces
In-Place Restore with cleanup is not allowed in critical system namespaces. If you attempt to create a restore with cleanup enabled in any of these namespaces, the restore will be rejected during validation.
Kubernetes: kube-system, kube-public, kube-node-lease
Google Kubernetes Engine (GKE): gke-system, gke-gmp-system, gke-managed-system, istio-system, config-management-system
Amazon EKS: amazon-cloudwatch, aws-observability, aws-load-balancer-controller, external-dns, karpenter
Azure AKS: gatekeeper-system, azure-arc, azure-monitor, calico-system, tigera-operator
OpenShift:
openshift, openshift-apiserver, openshift-apiserver-operator, openshift-authentication, openshift-authentication-operator, openshift-cloud-controller-manager, openshift-cloud-network-config-controller, openshift-cluster-node-tuning-operator, openshift-cluster-storage-operator, openshift-config, openshift-config-managed, openshift-console, openshift-console-operator, openshift-controller-manager, openshift-controller-manager-operator, openshift-dns, openshift-dns-operator, openshift-etcd, openshift-image-registry, openshift-ingress, openshift-ingress-operator, openshift-kube-apiserver, openshift-kube-apiserver-operator, openshift-kube-controller-manager, openshift-kube-controller-manager-operator, openshift-kube-scheduler, openshift-kube-scheduler-operator, openshift-machine-api, openshift-machine-config-operator, openshift-monitoring, openshift-multus, openshift-network-diagnostics, openshift-network-operator, openshift-node, openshift-operator-lifecycle-manager, openshift-operators, openshift-ovn-kubernetes, openshift-service-ca, openshift-storage
Data Component Ownership Validation
When cleanup is enabled, Trilio validates the ownership of data components (PVCs) before proceeding. Specifically, for every PVC that exists both in the backup and on the cluster, Trilio checks the owner chain (the workload that mounts or owns the PVC). If any owner in the chain is a workload that was created after the backup was taken (i.e., it is not present in the backup), the restore will fail during validation.
This prevents Trilio from accidentally deleting PVCs that are now used by new workloads not covered by the backup.
Example: You took a backup of your namespace. Later, a new StatefulSet analytics-db was created that mounts an existing PVC shared-data (which is in the backup). If you attempt an In-Place Restore, validation will fail because analytics-db is not in the backup, and deleting shared-data would cause data loss for analytics-db.
Mutual Exclusivity with Resource Exclusions
In-Place Restore cannot be used together with resource exclusion selectors (excludeResources) in the same Restore CR. When cleanup is enabled, the entire backup scope is restored. If you need selective resource restoration, use a standard restore without cleanup.
Monitoring Cleanup Status
After a restore with cleanup completes (or fails), you can inspect the cleanup details in the Restore CR status.
Viewing Cleanup Status
Status Fields
The cleanupStatus section in the Restore CR status contains:
cleanupStrategy: The strategy used, Namespace or Application.
forceCleanupWarnings: List of resources that required force deletion, with details.
cleanupSummary.totalResources: Total number of resources processed during cleanup.
cleanupSummary.gracefulCleanups: Number of resources deleted gracefully.
cleanupSummary.forceCleanups: Number of resources that required force deletion.
startTime: When the cleanup phase started.
completionTime: When the cleanup phase completed.
Example Status
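Using the fields above (retrievable with, for example, kubectl get restore &lt;name&gt; -o yaml), a cleanupStatus section might look like this. The resource names, timestamps, and the exact shape of the warning entries are illustrative assumptions:

```yaml
status:
  cleanupStatus:
    cleanupStrategy: Application
    startTime: "2025-01-15T10:02:11Z"         # illustrative timestamps
    completionTime: "2025-01-15T10:04:38Z"
    cleanupSummary:
      totalResources: 25
      gracefulCleanups: 23
      forceCleanups: 2
    forceCleanupWarnings:                      # entry shape is an assumption
      - resource: Deployment/web-frontend
        reason: graceful deletion timed out; finalizers and owner references removed
      - resource: PersistentVolumeClaim/web-data
        reason: graceful deletion timed out; finalizers and owner references removed
```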
In this example, 25 resources were cleaned up. 23 were deleted gracefully, and 2 required force deletion. The two force-deleted resources are listed with their names, types, and reasons.
Application Type Details
Helm Applications
When cleaning up a Helm application, Trilio first attempts a native helm uninstall to remove the release. If the uninstall fails or some resources remain, Trilio falls back to deleting individual resources in dependency order. After helm uninstall has already attempted graceful cleanup at the release level, individual remaining resources are force-deleted directly without waiting for a graceful timeout.
Data resources (PVCs and PVs) associated with the Helm release are cleaned up separately using the standard two-phase approach.
Operator Applications
Operator cleanup follows a structured, ordered approach: Custom Resources are deleted first (concurrently, with retry logic), followed by the application resources owned by those CRs, then the operator infrastructure, and finally the data resources.
OLM Operators
For operators managed by the Operator Lifecycle Manager (OLM), Trilio takes a conservative approach:
Cleaned: Custom Resources, application resources owned by CRs, and data resources
Preserved: OperatorGroup, CatalogSource, Subscription, ClusterServiceVersion, InstallPlan, and operator Deployments/ServiceAccounts
This is because OLM operators often serve multiple namespaces. Deleting operator infrastructure would impact all consumers of that operator, not just the application being restored.
Virtual Machine Applications
Virtual Machine resources are cleaned up in a controller-safe order (VMPool, then VirtualMachine, then DataVolume) to prevent controllers from recreating resources that are being deleted.
Other VM-related resources (InstanceTypes, Preferences, ConfigMaps, Secrets) are handled through the standard categorical cleanup.
Best Practices
Schedule a maintenance window. In-Place Restore deletes all existing application resources before restoring them, so plan for downtime and inform stakeholders before running it on production workloads.
Review the backup contents first. There is no dry-run mode for cleanup, so inspect the backup to understand exactly which resources will be deleted and recreated.
Tune the graceful deletion timeout. If your application has long-running finalizers, raise gracefulDeletionTimeoutSeconds (up to 1800 seconds) to avoid unnecessary force deletions.
Check the cleanup status afterwards. Review forceCleanupWarnings in the Restore CR status to see which resources required force deletion and why.
Frequently Asked Questions
Q: Will In-Place Restore delete resources that were created after the backup?
A: No. Regardless of whether the backup is namespace-scoped or application-scoped, In-Place Restore only cleans up resources that are present in the backup AND currently exist on the cluster. Resources that were created after the backup was taken are not touched during cleanup.
Q: What happens if cleanup fails partway through?
A: Infrastructure resources (ServiceAccounts, Secrets, ConfigMaps, etc.) use an atomic approach — they are deleted and immediately recreated from backup, so they are never left in a deleted state. For other resources, the restore will record the error in its status. Resources that were already deleted will need to be manually restored or a new restore attempt can be made.
Q: Does cleanup delete my CRDs?
A: No. CRDs are always excluded from cleanup because deleting a CRD would remove all custom resources of that type across the entire cluster, potentially impacting other applications.
Q: Can I preview what will be cleaned up before running In-Place Restore?
A: The validation phase checks for potential issues (blocked namespaces, data component ownership) and will reject the restore if problems are found. However, there is no dry-run mode for cleanup itself. You can review the backup contents to understand what resources will be affected.
Q: What is the difference between cleanupOnFailure and cleanupConfig?
A: These are different features:
cleanupOnFailure: Cleans up partially restored resources when a restore operation fails, reverting the cluster to its pre-restore state.
cleanupConfig: Cleans up existing resources before restoring, enabling a complete application reset to the backup state. This is the In-Place Restore feature.