# Modifying Default T4K Configuration

## Main Configuration Types

T4K provides several types of configuration that help tune it to your requirements and cluster constraints. The available configurations are:

* InstanceName
* Log Configuration
* Pause Scheduled Backups and Snapshots at T4K Level
* Job Resource Requirements
* Scheduling Configuration
* Application Scope
* Ingress Configuration
* CSI Configuration
* Event Target Configuration
* S3Fuse WorkerPool Configuration

### InstanceName Configuration

Provide the **InstanceName** for the T4K installation.

### Log Configuration

Log configuration encapsulates the configuration of all the different logging components.\
T4K supports the following logging configurations:

* **logLevel**: Used to configure the logging level across the product.
* **datamoverLogLevel:** Used to configure the logging level for Datamover jobs. Datamover jobs are responsible for performing backup and restore of the data part of an application. Refer to the backup and restore details [here](/kubernetes/concepts/backup-and-restore-process.md).
* The available log levels for the above two fields are as follows:
  1. `Panic`
  2. `Fatal`
  3. `Error`
  4. `Warn`
  5. `Info` (default value)
  6. `Debug`
  7. `Trace`
* **enableDualOutputLog (default: false):** Used to toggle the dual output logging stack for T4K control plane components.\
  \
  Because of the huge volume of logs generated by the control plane components, the default Kubernetes pod logs are rotated quickly and lost. Enabling this feature has the following properties:
  * Keeps the default logging behavior on standard output/standard error, which is handled by the container runtime on the node.
  * Additionally stores the component logs separately in an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume in the control plane pod at the path `/var/log/triliovault`. This helps retain logs for longer, which is useful when debugging issues. There are three internal components in the control plane, namely `analyzer`, `webhook-server`, and `controller`.
  * Each component's log files have a configurable maximum size in MB, which applies equally to every component's log file.
  * The live log file of each component is rotated and compressed automatically once the configured size limit is reached.
  * The total number of log files to retain is configurable and applies equally to the log files of each component.
* **maxLogFileSizeMB (default: 10):** Used to configure the maximum size in MB of each component's log file. If not provided, the default value of 10 MB per log file applies.
* **maxLogFiles (default: 5):** Used to configure the maximum number of log files to retain for each component, after which the oldest one is deleted during rotation. If not provided, the default value of 5 log files per component applies.

#### Advanced Configuration for the Dual Output Log Stack

These are advanced configuration flags that a user can configure only if the `enableDualOutputLog` flag is enabled. In most cases, the default values are sufficient, but they can be adjusted as per the user's requirements.

* **bufferSize (default: 20000):** Used to configure the buffer capacity for holding logs in memory during high-burst scenarios, such as the control plane pod starting right after an upgrade. Assuming an average log size of 200 bytes, the default capacity of 20000 entries means roughly \~4 MB of additional memory utilization.
* **flushDurationMilliSeconds (default: 300):** Used to configure the interval at which logs are flushed from the in-memory buffer into the log files. By default, logs are flushed every 300 milliseconds.
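
Putting these options together, a `logConfig` section of the TVM CR with the dual output log stack enabled might look like the sketch below. The placement of the advanced fields under `logConfig` is an assumption based on the related fields shown in the full example later on this page, and the values are illustrative, not recommendations:

```yaml
logConfig:
  logLevel: Info                   # product-wide log level
  datamoverLogLevel: Debug         # more verbose logging for Datamover jobs
  enableDualOutputLog: true        # also retain logs under /var/log/triliovault
  maxLogFileSizeMB: 20             # rotate each component's log file at 20 MB (assumed field placement)
  maxLogFiles: 3                   # keep at most 3 rotated files per component (assumed field placement)
  bufferSize: 20000                # in-memory buffer capacity for bursts (assumed field placement)
  flushDurationMilliSeconds: 300   # flush buffer to files every 300 ms (assumed field placement)
```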

{% hint style="warning" %}
Important points to consider if the dual output log feature is enabled:

* The captured logs occupy space on the node itself. Users should be wary of the storage constraints of the node where the control plane pod is scheduled, and should set this feature's configuration options accordingly to limit the space used by the captured logs.
* The logs captured for the control plane components by the dual output logging stack are tied to the lifecycle of the control plane pod itself and will be lost if the pod gets deleted.
{% endhint %}

### Pause Scheduled Backups and Snapshots

This flag pauses scheduled backups and snapshots at the global level. Enabling it pauses all backups and snapshots for the given T4K instance. This gives users a straightforward way to manage scheduled backups and snapshots at the T4K level, and allows pausing everything during upgrades and maintenance.

### Job Resource Requirements

Job Resource Requirements is used to modify the default resource requirements, such as CPU and memory, for all the pods created as part of product installation as well as backup and restore operations. There are different fields for setting resource requirements for different types of pods.

* **metadataJobResources**: Specifies the resource requirements for all metadata-related and target-mounting jobs such as target-validator, meta-snapshot, pre-restore-validation, meta-restore, etc.
* **dataJobResources**: Specifies the resource requirements for Datamover jobs.
* **targetBrowserResources**: Specifies the resource requirements for Target Browser jobs.
* **deploymentLimits**: Specifies limits for Helm chart deployments. Not applicable for OCP.
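
For example, to give Datamover jobs more headroom while keeping metadata jobs lightweight, these fields can be set in the TVM CR spec using standard Kubernetes `requests`/`limits` semantics. The shape of `targetBrowserResources` is assumed to mirror the other two fields, and all values are illustrative:

```yaml
dataJobResources:            # Datamover (data backup/restore) jobs
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: 1500m
    memory: 5Gi
metadataJobResources:        # metadata and target-mounting jobs
  requests:
    cpu: 10m
    memory: 10Mi
  limits:
    cpu: 500m
    memory: 1Gi
targetBrowserResources:      # Target Browser jobs (assumed same shape)
  limits:
    cpu: 500m
    memory: 512Mi
```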

### Scheduling Configuration

T4K provides a way to put constraints on the pods it creates and schedule those pods on a specific set of nodes in the cluster. T4K leverages the existing Kubernetes scheduling mechanisms to schedule its pods.

> We have enhanced scheduling flexibility by separating the control plane and worker node scheduling configurations. Previously, all pods shared a single scheduling configuration. Now, the configuration is divided into two distinct parts: one for the control plane and another for worker jobs.

#### Control-Plane Pods Scheduling Configuration

Control-Plane Pods include the pods of the following workloads that T4K creates when installed:

1. Control Plane Deployment
2. Exporter Deployment
3. Ingress Nginx Controller Deployment
4. Resource Cleaner CronJob
5. Web-backend Deployment
6. Web Deployment

There are three fields to put scheduling constraints on control-plane pods of T4K, they are:

1. `NodeSelector`**:** User can specify matching node-labels to schedule pods on a particular node or set of nodes. Refer to the official Kubernetes documentation on [Node Selection](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) for more information.
2. `Affinity`**:** Affinity is used when a pod **should** schedule or **prefer to** schedule on a particular set of nodes. Affinity comes in two types: *Node Affinity* and *Pod Affinity*. On the other hand, *Pod Anti-Affinity* is the opposite of *Pod Affinity*, used when a pod **should not** or **prefer not to** schedule on a particular set of nodes. Users can specify both Affinity and Anti-Affinity. Refer to the official Kubernetes documentation on [Affinity and anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) for more information.
3. `Toleration`**:** Kubernetes allows nodes to add **taints** so that nodes can repel a set of pods which do not tolerate the given taints. To tolerate a taint, a pod must specify **toleration**. User can specify the tolerations with matching taints, so that T4K pods will be able to schedule themselves in a cluster with tainted nodes. Refer to the official Kubernetes documentation on [Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) for more information.

#### WorkerJob Pods Scheduling Configuration

WorkerJob Pods are job pods triggered during actions such as backup, snapshot, or restore. These jobs have a similar scheduling configuration, with the same three fields as the control-plane pods.

* `WorkerJobsSchedulingConfig`: This config takes `NodeSelector`, `Toleration`, and `Affinity`, which set the scheduling configuration for worker job pods.
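
As a minimal sketch, worker job pods could be pinned to Linux nodes and allowed onto tainted backup nodes like this (the `dedicated=backup` taint and its values are hypothetical):

```yaml
workerJobsSchedulingConfig:
  nodeSelector:
    kubernetes.io/os: linux      # standard Kubernetes node label
  tolerations:
  - key: "dedicated"             # hypothetical taint applied to backup nodes
    operator: "Equal"
    value: "backup"
    effect: "NoSchedule"
```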

### Application Scope

Application Scope denotes T4K installation scope. Installation scope can be of two types - **Cluster** and **Namespaced**. When T4K is installed in Namespaced scope, the cluster scoped CRDs will not be installed. The cluster scoped CRDs are:

1. ClusterBackupplan
2. ClusterBackup
3. ClusterRestore
4. ContinuousRestorePlan
5. ConsistentSet

In a namespaced scope installation, cluster-scoped features such as multi-namespace backup and restore and continuous restore won't be enabled for use. The working of T4K is restricted to its installation namespace only.

### Ingress Configuration

Users can configure external access to T4K services using Ingress over HTTP or HTTPS. There are four configurable fields related to Ingress:

1. `IngressClass`: Name of the IngressClass resource which contains additional information regarding Ingress' parameters and the name of the controller that implements this particular IngressClass.
2. `Annotations`: Extra annotations to be added on the required Ingress resource.
3. `Host`: Name based virtual host against the IP address on which the external traffic will be received.
4. `TLSSecretName`: Name of the TLS secret that contains a private key and certificate. When TLS secret name is specified, the external traffic to Ingress will use HTTPS protocol (port 443). The TLS secret should be present in the namespace where T4K is to be installed.

The `IngressClass` and `Annotations` fields should be empty when the `componentConfiguration.ingress-controller` map contains the key-value `enabled: "true"`.

{% hint style="warning" %}
If the ingress controller service is exposed using `type: LoadBalancer`, confirm your load balancer registers only nodes that actually run the ingress controller pod when using `ExternalTrafficPolicy: Local`. Some providers register all nodes. With `Local`, traffic sent to a node without an ingress pod will be dropped, leading to slow or failed UI access. Setting `ExternalTrafficPolicy: Cluster` avoids this by allowing traffic to be forwarded to healthy pods on other nodes.

Generic guidance:

* Use `ExternalTrafficPolicy: Cluster` unless your LB backends are node-local to ingress.
* Validate LB backends/endpoints vs. nodes running the controller before opting into `Local`.
{% endhint %}

**Expose T4K UI on a Custom Path**

1. `urlPath`: Path on which the UI should be accessible. The default value is `/`.
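
A minimal `ingressConfig` sketch for an external ingress controller follows; the class name, host, secret name, annotation, and path are placeholders:

```yaml
ingressConfig:
  ingressClass: nginx                 # leave empty when using Trilio's bundled ingress controller
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "0"   # placeholder annotation
  host: "t4k.example.com"             # virtual host for external traffic
  tlsSecretName: "t4k-tls"            # switches external traffic to HTTPS (port 443)
helmValues:
  urlPath: "/t4k"                     # serve the UI at /t4k instead of /
```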

### CSI Configuration

CSI configuration is used to configure CSI provisioners that do not support the Volume Snapshot functionality. For full details, refer to [T4K For Volumes with Generic Storage](/kubernetes/appendix/storage/generic-storage-volumes.md). This configuration consists of three lists:

* `default` CSI Provisioner list: Known list of CSI provisioners which do not support Volume Snapshot. Maintained by T4K. This list will be updated as and when new non-snapshot CSI provisioners are discovered.
* `include` CSI Provisioner list: User-provided list of CSI provisioners which do not support Volume Snapshot.
* `exclude` CSI Provisioner list: User-provided list of CSI provisioners to ignore from the default list.
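
For example, to mark a custom provisioner as non-snapshot while re-enabling one that appears in the default list (both provisioner names are placeholders):

```yaml
csiConfig:
  include:                         # provisioners without Volume Snapshot support
  - my-org.example.csi
  exclude:                         # entries to ignore from T4K's default list
  - some-default.provisioner.csi
```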

### Event Target Configuration

The Event Target configuration enables you to customize the deployment settings for the event-target component in T4K. Specifically, you can adjust the liveness probe parameters to fine-tune the health checks for the event-target service. Modifying these settings can be helpful in environments with high network latency between the cluster where T4K is deployed and the target.

The configurable parameters for event-target include:

* `initialDelaySeconds`: Number of seconds after the container has started before liveness probe is initiated
* `timeoutSeconds`: Number of seconds after which the probe times out
* `periodSeconds`: How often (in seconds) to perform the probe
* `successThreshold`: Minimum consecutive successes for the probe to be considered successful after having failed
* `failureThreshold`: Number of times Kubernetes will retry a failed probe before giving up
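
For a target reachable only over a high-latency link, the probe can be relaxed as in this sketch (the values are illustrative and match the full example later on this page):

```yaml
componentConfiguration:
  event-target:
    livenessProbe:
      initialDelaySeconds: 300   # wait 5 minutes before the first probe
      timeoutSeconds: 120        # tolerate slow responses
      periodSeconds: 240         # probe every 4 minutes
      successThreshold: 1
      failureThreshold: 3
```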

## Changing the Configuration

Input for the aforementioned T4K configuration is provided through the TVM CR. The following section explains how to modify the T4K configuration:

```yaml
   apiVersion: triliovault.trilio.io/v1
   kind: TrilioVaultManager
   metadata:
     labels:
       triliovault: k8s
     name: sample-triliovaultmanager
   spec:
     helmValues:
       schedulePolicyTimezone: Etc/UTC
       urlPath: "/"
       jobSpec:
         activeDeadlineSeconds: 43200
         pendingDeadlineSeconds: 3600
     componentConfiguration:
       event-target:
         livenessProbe:
           initialDelaySeconds: 300
           timeoutSeconds: 120
           periodSeconds: 240
           successThreshold: 1
           failureThreshold: 3
     applicationScope: Namespaced
     tvkInstanceName: "tvk"
     logConfig:
       logLevel: Info
       datamoverLogLevel: Info
       enableDualOutputLog: false
     dataJobResources:
       limits:
         cpu: 1500m
         memory: 5Gi
       requests:
         cpu: '1'
         memory: '2Gi'
     metadataJobResources:
       limits:
         cpu: 500m
         memory: 1024Mi
       requests:
         cpu: 10m
         memory: 10Mi
     ingressConfig:
       ingressClass: alb # specify `ingressClass` only when you are not using trilio's ingress controller
       annotations:
         alb.ingress.kubernetes.io/load-balancer-name: trilio-load-balancer
       host: "trilio.com"
       tlsSecretName: "tls-secret"
     nodeSelector:
       host: Linux
       arch: x86
     affinity:
       nodeAffinity:
         preferredDuringSchedulingIgnoredDuringExecution:
         - weight: 1
           preference:
             matchExpressions:
             - key: arch
               operator: In
               values:
               - x86
       podAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
             matchExpressions:
             - key: host
               operator: In
               values:
               - Linux
           topologyKey: topology.kubernetes.io/zone
     tolerations:
     - key: "key1"
       operator: "Equal"
       value: "value1"
     - key: "key2"
       operator: "Exists"
     workerJobsSchedulingConfig:
       topologySpreadConstraints:
       - maxSkew: 1
         topologyKey: "kubernetes.io/hostname"
         whenUnsatisfiable: DoNotSchedule
         labelSelector:
           matchLabels:
             operation: trilio-datamover
       affinity:
         nodeAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
           - weight: 1
             preference:
               matchExpressions:
               - key: arch
                 operator: In
                 values:
                 - x86
         podAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
               - key: host
                 operator: In
                 values:
                 - Linux
             topologyKey: topology.kubernetes.io/zone
     csiConfig:
       include:
         - example.provisioner.csi
       exclude:
         - test.provisioner.csi
```

**Job Resource Requirements:**

* Set `spec.dataJobResources` for data job resource requirements
* Set `spec.metadataJobResources` for meta job resource requirements
* Set `spec.targetBrowserResources` for target browser pod resource requirements

**T4K Configuration:**

* Set `spec.logConfig.logLevel` for the log level. Default is `Info`
* Set `spec.logConfig.datamoverLogLevel` for the Datamover jobs' log level. Default is `Info`
* Set `spec.tvkInstanceName` for T4K instance name

**Scheduling Configuration**

* Set `spec.nodeSelector` for running control-plane pods on a particular set of nodes
* Set `spec.affinity.nodeAffinity` for scheduling control-plane pods with node affinity
* Set `spec.affinity.podAffinity` for scheduling control-plane pods with pod affinity
* Set `spec.affinity.podAntiAffinity` for scheduling control-plane pods with anti affinity
* Set `spec.tolerations` to make control-plane pods tolerant of the taints mentioned on nodes
* Set `spec.workerJobsSchedulingConfig.affinity.nodeAffinity` for scheduling worker job pods with node affinity
* Set `spec.workerJobsSchedulingConfig.affinity.podAffinity` for scheduling worker job pods with pod affinity
* Set `spec.workerJobsSchedulingConfig.affinity.podAntiAffinity` for scheduling worker job pods with pod anti affinity
* Set `spec.workerJobsSchedulingConfig.tolerations` to make worker job pods tolerant of the taints mentioned on nodes
* Set `spec.workerJobsSchedulingConfig.nodeSelector` for running worker job pods on a particular set of nodes
* Set `spec.workerJobsSchedulingConfig.topologySpreadConstraints` for evenly distributing worker pods across nodes, zones, or regions in your Kubernetes cluster

For more details on `workerJobsSchedulingConfig`'s `topologySpreadConstraints`, see [Topology Spread Constraints](https://github.com/trilioData/docs-public-k8s/blob/5.2.x/advanced-configuration/topology-spread-constraints.md).

**Application Scope**

* Set `spec.applicationScope` to set the scope of T4K installation. Default is `Namespaced`

**Ingress Configuration**

* Set `spec.ingressConfig.ingressClass` to set the IngressClass resource
* Set `spec.ingressConfig.host` to set the host name to route external HTTP(S) traffic
* Set `spec.ingressConfig.annotations` to add extra annotations on the Ingress resource
* Set `spec.ingressConfig.tlsSecretName` to set the TLS secret to use the TLS port 443 for external Ingress traffic.

**Custom Path Configuration:**

* Set `spec.helmValues.urlPath` to set a custom path.

**CSI Configuration:**

* Set `spec.csiConfig.include` list for including the CSI provisioners in the non-snapshot functionality category
* Set `spec.csiConfig.exclude` list for excluding the CSI provisioners from the non-snapshot functionality category.

**Event Target Configuration:**

* Set `spec.componentConfiguration.event-target.livenessProbe.initialDelaySeconds` to configure the initial delay before liveness probe starts
* Set `spec.componentConfiguration.event-target.livenessProbe.timeoutSeconds` to configure the timeout for liveness probe
* Set `spec.componentConfiguration.event-target.livenessProbe.periodSeconds` to configure how often the liveness probe runs
* Set `spec.componentConfiguration.event-target.livenessProbe.successThreshold` to configure minimum consecutive successes for the probe to be considered successful
* Set `spec.componentConfiguration.event-target.livenessProbe.failureThreshold` to configure number of retries before giving up on failed probe

**Worker Job Deadline Configuration:**

* Configure `spec.helmValues.jobSpec.activeDeadlineSeconds` to use [k8s jobs' activeDeadlineSeconds](https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup) feature for T4K worker jobs.
* Set `spec.helmValues.jobSpec.pendingDeadlineSeconds` to set the pending deadline value for T4K worker jobs. It is a custom field introduced by T4K to limit the time a Job can stay in the pending state before it is forcefully terminated and marked failed.
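
Both deadlines sit under `spec.helmValues.jobSpec`, as in this fragment (the values are illustrative):

```yaml
helmValues:
  jobSpec:
    activeDeadlineSeconds: 43200    # terminate a running worker job after 12 hours
    pendingDeadlineSeconds: 3600    # fail a worker job stuck in Pending after 1 hour
```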

{% hint style="info" %}
The deadline configuration set during job creation applies specifically to that job.
{% endhint %}

**S3Fuse WorkerPool Configuration**

* Set `spec.helmValues.s3fuse.workerPoolSize` to set the number of S3Fuse worker threads used for S3 upload and download during backup and restore.
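
As a fragment of the TVM CR spec (the value is illustrative):

```yaml
helmValues:
  s3fuse:
    workerPoolSize: 20    # number of S3 upload/download worker threads
```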

**SchedulePolicyTimezone:**

* Set `spec.helmValues.schedulePolicyTimezone` to configure the timezone in which the scheduled BackupPlan/ClusterBackupPlan CronJobs will be triggered. By default, this field is set to `Etc/UTC`. The value can be any valid TZ identifier from [this list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List).

