# Online upgrade Trilio Appliance

{% hint style="info" %}
This describes the upgrade process from Trilio 4.0 or Trilio 4.0SP1 to Trilio 4.1 GA or its hotfix releases.
{% endhint %}

{% hint style="warning" %}
Kolla Ansible Openstack only: The mount point for the Trilio Backup Target has changed in Trilio 4.1. A reconfiguration after the upgrade is required.
{% endhint %}

## **Generic Pre-requisites**

{% hint style="info" %}
The prerequisites should already be fulfilled from upgrading the Trilio components on the Controller and Compute nodes.
{% endhint %}

* Please complete the upgrade of all Trilio components on the Openstack controller & compute nodes before starting the rolling upgrade of the TVM.
* The mentioned Gemfury repository should be accessible from the TVault VM.
* Please ensure the following points before starting the upgrade process:
  * No snapshot or restore is running.
  * The Global Job Scheduler is disabled (see the example after this list).
  * wlm-cron is disabled and any lingering process has been killed.
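
The Global Job Scheduler can be checked and disabled from the Trilio appliance with the workloadmgr CLI. The snippet below is a minimal sketch: it assumes the `get-global-job-scheduler` and `disable-global-job-scheduler` subcommands of your release and that the OpenStack admin credentials have been sourced.

```shell
# Check whether the Global Job Scheduler is currently enabled
workloadmgr get-global-job-scheduler

# Disable it for the duration of the upgrade (it is re-enabled in the post-upgrade steps)
workloadmgr disable-global-job-scheduler
```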

### Deactivating the wlm-cron service

The following sets of commands will disable the wlm-cron service and verify that it has been completely shut down.

```
pcs resource disable wlm-cron
```

Verify that the service is shut down with the below commands and expected output:

```
[root@TVM2 ~]# systemctl status wlm-cron
● wlm-cron.service - workload's scheduler cron service
   Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jun 11 08:27:06 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:06 - INFO - 1...t
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 140686268624368 Child 11389 ki...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - 1...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Shutting down thread pool
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...l
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Stopping the threads
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...s
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: All threads are stopped succes...y
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - A...y
Jun 11 08:27:09 TVM2 systemd[1]: Stopped workload's scheduler cron service.
Hint: Some lines were ellipsized, use -l to show in full.
[root@TVM2 ~]# pcs resource show wlm-cron
 Resource: wlm-cron (class=systemd type=wlm-cron)
  Meta Attrs: target-role=Stopped
  Operations: monitor interval=30s on-fail=restart timeout=300s (wlm-cron-monitor-interval-30s)
              start interval=0s on-fail=restart timeout=300s (wlm-cron-start-interval-0s)
              stop interval=0s timeout=300s (wlm-cron-stop-interval-0s)
[root@TVM2 ~]# ps -ef | grep -i workloadmgr-cron
root     15379 14383  0 08:27 pts/0    00:00:00 grep --color=auto -i workloadmgr 
```

### Backup old configuration data

Take a backup of the conf files on all TVM nodes.

```
tar -czvf tvault_backup.tar.gz /etc/tvault /etc/tvault-config /etc/workloadmgr
cp tvault_backup.tar.gz /root/ 
```
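
Optionally, list the archive contents to confirm that the expected configuration directories were captured:

```shell
# The listing should include etc/tvault, etc/tvault-config and etc/workloadmgr
tar -tzf /root/tvault_backup.tar.gz | head
```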

### Setup Python3.8 virtual environment <a href="#hardbreak-setup-python3.8-virtual-environment" id="hardbreak-setup-python3.8-virtual-environment"></a>

Check if the Python 3.8 virtual environment exists on the T4O nodes:

```
ls -al /home/stack/myansible_3.8
```

If the virtual environment does not exist, perform the steps below on the T4O nodes:

```
yum-config-manager --disable bintray-rabbitmq-server
yum-config-manager --disable mariadb
yum -y groupinstall "Development Tools"
yum -y install openssl-devel bzip2-devel libffi-devel xz-devel 
wget https://www.python.org/ftp/python/3.8.12/Python-3.8.12.tgz 
tar xvf Python-3.8.12.tgz
cd Python-3.8*/
./configure --enable-optimizations
sudo make altinstall
# Create the Python3.8 virtual env
cd /home/stack/
virtualenv -p /usr/local/bin/python3.8 myansible_3.8 --system-site-packages
source /home/stack/myansible_3.8/bin/activate
pip3 install pip --upgrade
pip3 install setuptools --upgrade
pip3 install jinja2 'ansible>=2.9.0' configobj pbr
```
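
As an optional sanity check, confirm that the new environment resolves to Python 3.8 and that Ansible is available inside it:

```shell
source /home/stack/myansible_3.8/bin/activate
python3 --version            # expected to report Python 3.8.x
ansible --version | head -1  # expected to report ansible >= 2.9
deactivate
```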

### Setup Python3.6 virtual environment <a href="#setup-python3.6-virtual-environment" id="setup-python3.6-virtual-environment"></a>

Activate the Python 3.6 virtual environment on all T4O nodes for the wlm services upgrade:

```
source /home/stack/myansible/bin/activate
```

#### \[T4O 4.0 to T4O 4.1 only] Uninstall Ansible

Ansible does not support a direct upgrade from previous versions to the latest one (2.10.4) and therefore needs to be uninstalled first.

```
pip3 uninstall ansible
```

#### **Upgrade pip package**

Run the following command on all TVM nodes to upgrade the pip package

```
pip3 install --upgrade pip
```

## Set pip package repository env variable

```
export PIP_EXTRA_INDEX_URL=https://pypi.fury.io/triliodata-4-1/
```

## **Upgrade s3fuse/tvault-object-store**

#### Major Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse and its dependent packages.

```shell
source /home/stack/myansible/bin/activate 
systemctl stop tvault-object-store
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL s3fuse --upgrade --no-cache-dir
rm -rf /var/triliovault/*
```

#### Hotfix Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse packages only.

```shell
source /home/stack/myansible/bin/activate 
systemctl stop tvault-object-store
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL s3fuse --upgrade --no-cache-dir --no-deps
rm -rf /var/triliovault/*
```

## **Upgrade tvault-configurator**

{% hint style="info" %}
Post upgrade, the password for the T4O configurator will be reset to the default value, i.e. 'password' for the user 'admin'.\
Reset the T4O configurator password after the upgrade.
{% endhint %}

{% hint style="info" %}
Make sure the correct virtual environment (myansible\_3.8) has been activated.
{% endhint %}

#### Major Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator and its dependent packages.

```shell
source /home/stack/myansible_3.8/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL tvault-configurator --upgrade --no-cache-dir
```

#### Hotfix Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator packages only.

```shell
source /home/stack/myansible_3.8/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL tvault-configurator --upgrade --no-cache-dir --no-deps
```

{% hint style="warning" %}
During the update of the tvault-configurator the following error might be shown:

```
ERROR: Command errored out with exit status 1:
command: /home/stack/myansible/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-pdd9x77v
cwd: /tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/
```

This error can be ignored.
{% endhint %}

## **Upgrade workloadmgr**

#### Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr and its dependent packages.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgr --upgrade --no-cache-dir
```

#### Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr packages only.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgr --upgrade --no-cache-dir --no-deps
```

## Upgrade workloadmgrclient

#### Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient and its dependent packages.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgrclient --upgrade --no-cache-dir
```

#### Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient packages only.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgrclient --upgrade --no-cache-dir --no-deps
```

## Upgrade contegoclient

#### Major Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient and its dependent packages.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL contegoclient --upgrade --no-cache-dir
```

#### Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient packages only.

```shell
source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL contegoclient --upgrade --no-cache-dir --no-deps
```

## Set oslo.messaging version

Using the latest available oslo.messaging version can lead to stuck RPC and API calls.

It is therefore required to pin the oslo.messaging version on the TVM:

```shell
source /home/stack/myansible/bin/activate
pip3 install oslo.messaging==12.1.6 --no-deps
```
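
To confirm that the intended version is installed, query pip inside the same virtual environment, for example:

```shell
source /home/stack/myansible/bin/activate
pip3 show oslo.messaging | grep -i '^version'   # expected to report 12.1.6
```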

## **Post Upgrade Steps**

#### Restore the backed-up config files

```
cd /root 
tar -xzvf tvault_backup.tar.gz -C /
```

#### \[Major Upgrade 4.0 to 4.1 only] Delete wlm-scheduler pcs resource

Delete the wlm-scheduler pcs resource, as it is no longer part of the pcs configuration in 4.1:

```
pcs resource delete wlm-scheduler
```

#### Restart services

Restart the following services on all nodes using the commands below.

{% hint style="info" %}
*tvault-object-store restart required only if Trilio is configured with S3 backend storage*
{% endhint %}

```shell
systemctl restart tvault-object-store
systemctl restart wlm-api 
systemctl restart wlm-scheduler
systemctl restart wlm-workloads 
systemctl restart tvault-config
```

**Enable Global Job Scheduler**\
Restart the pcs resources ***only on the primary node***

```shell
pcs resource enable wlm-cron
pcs resource restart wlm-cron
```
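
The Global Job Scheduler that was disabled in the prerequisites also needs to be enabled again. The snippet below is a minimal sketch using the workloadmgr CLI; it assumes the same subcommands as in the prerequisites section and sourced admin credentials.

```shell
# Re-enable the Global Job Scheduler and confirm its status
workloadmgr enable-global-job-scheduler
workloadmgr get-global-job-scheduler
```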

#### Verify the status of the services

{% hint style="info" %}
tvault-object-store will run only if TVault is configured with S3 backend storage
{% endhint %}

```shell
systemctl status wlm-api wlm-scheduler wlm-workloads tvault-config tvault-object-store | grep -E 'Active|loaded'
pcs status
```

Additional check for wlm-cron on the ***primary node***

```shell
systemctl status wlm-cron
ps -ef | grep [w]orkloadmgr-cron
```

The above command should show only two processes running; sample output below:

```
[root@tvm6 ~]# ps -ef | grep [w]orkloadmgr-cron
nova      8841     1  2 Jul28 ?        00:40:44 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
nova      8898  8841  0 Jul28 ?        00:07:03 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
```

Check the mount point using the ***df -h*** command.
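
For example, the backup target mount can be filtered from the output (the exact mount path depends on the configured backend):

```shell
# The Trilio backup target should be listed; the path differs between NFS and S3 (tvault-object-store) setups
df -h | grep -i trilio
```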

## \[Upgrade to HF1 and higher only] Reconfigure the Trilio Appliance

Trilio for Openstack 4.1 HF1 introduces several new configuration parameters, which are set automatically when the Trilio Appliance is reconfigured. Reconfigure the appliance after the upgrade to apply them.

## \[RHOSP and Kolla only] Reconfigure the Trilio Appliance

Trilio for Openstack 4.1 changes the Trilio mount point as follows:

RHOSP 13 & 16.0 & 16.1: `/var/lib/nova/triliovault-mounts`\
Kolla Ansible Ussuri: `/var/trilio/triliovault-mounts`

Reconfiguring the Trilio Appliance will automatically handle this change.

## \[RHOSP and Kolla only] Create the mount bind to the old Trilio Mountpoint

Trilio for Openstack 4.1 changes the Trilio mount point as follows:

RHOSP 13 & 16.0 & 16.1: `/var/lib/nova/triliovault-mounts`\
Kolla Ansible Ussuri: `/var/trilio/triliovault-mounts`

After reconfiguration of the Trilio Appliance, it is necessary to create a mount bind between the old and new mount points to provide full access to the old Trilio backups.

For RHOSP:

<pre class="language-shell"><code class="lang-shell"><strong>mount --bind /var/lib/nova/triliovault-mounts /var/triliovault-mounts
</strong></code></pre>

For Kolla:

```shell
mount --bind /var/trilio/triliovault-mounts /var/triliovault-mounts
```

To make this change persistent, it is recommended to update /etc/fstab accordingly:

For RHOSP:

```shell
echo "/var/lib/nova/triliovault-mounts /var/triliovault-mounts    none    bind    0 0" >> /etc/fstab
```

For Kolla:

```shell
echo "/var/trilio/triliovault-mounts /var/triliovault-mounts	none bind	0 0" >> /etc/fstab
```
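
To verify the fstab entry and the active bind mount, for example:

```shell
# Confirm the fstab entry and that the bind mount is in place
grep triliovault-mounts /etc/fstab
findmnt /var/triliovault-mounts
```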

## \[RHOSP and Kolla only] Verify nova UID/GID for nova user on the Appliance

Red Hat OpenStack and Kolla Ansible Openstack use the nova UID/GID 42436 inside their containers instead of 162:162, which is the standard in other Openstack environments.

Please verify that the nova UID/GID on the Trilio Appliance is still 42436:

```
[root@TVM1 ~]# id nova
uid=42436(nova) gid=42436(nova) groups=42436(nova),990(libvirt),36(kvm)
```

If the UID/GID has changed back to 162:162, follow these steps to set it back to 42436:42436.

1. Download the shell script that will change the user id
2. Assign executable permissions
3. Execute the script
4. Verify that `nova` user and group ids have changed to '42436'

```
## Download the shell script
$ curl -O https://raw.githubusercontent.com/trilioData/triliovault-cfg-scripts/master/common/nova_userid.sh

## Assign executable permissions
$ chmod +x nova_userid.sh

## Execute the shell script to change 'nova' user and group id to '42436'
$ ./nova_userid.sh

## Ignore any errors and verify that 'nova' user and group id has changed to '42436'
$ id nova
   uid=42436(nova) gid=42436(nova) groups=42436(nova),990(libvirt),36(kvm)
```
