Online upgrade Trilio Appliance

This section describes the upgrade process from Trilio 4.0 or Trilio 4.0SP1 to Trilio 4.1 GA or one of its hotfix releases.

Kolla Ansible Openstack only: The mount point for the Trilio Backup Target has changed in Trilio 4.1. A reconfiguration after the upgrade is required.

Generic Pre-requisites

The prerequisites should already be fulfilled from upgrading the Trilio components on the Controller and Compute nodes.

  • Please ensure that the upgrade of all Trilio components on the OpenStack controller and compute nodes is complete before starting the rolling upgrade of the TVM.

  • The mentioned Gemfury repository should be accessible from the TVault VM.

  • Please ensure the following points before starting the upgrade process (example verification commands follow this list):

    • No snapshot or restore should be running.

    • Global job-scheduler should be disabled.

    • wlm-cron should be disabled and any lingering process should be killed.
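The checks above can also be run from the appliance. A minimal verification sketch, assuming the workloadmgr CLI is available in the existing Python 3.6 environment and that OpenStack admin credentials are sourced from an rc file (the /root/openrc path below is only a placeholder); the exact sub-commands and status values may vary between releases:

# Activate the existing environment and load admin credentials (placeholder path)
source /home/stack/myansible/bin/activate
source /root/openrc
# Look for snapshots or restores that are still in progress
workloadmgr snapshot-list | grep -iE 'running|executing'
workloadmgr restore-list | grep -iE 'running|executing'
# Disable the global job scheduler and confirm its state
workloadmgr disable-global-job-scheduler
workloadmgr get-global-job-scheduler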

Deactivating the wlm-cron service

The following commands disable the wlm-cron service and verify that it has shut down completely.

pcs resource disable wlm-cron

Verify that the service is shut down using the commands below; the expected output is shown:

[root@TVM2 ~]# systemctl status wlm-cron
● wlm-cron.service - workload's scheduler cron service
   Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jun 11 08:27:06 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:06 - INFO - 1...t
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 140686268624368 Child 11389 ki...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - 1...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Shutting down thread pool
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...l
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Stopping the threads
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...s
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: All threads are stopped succes...y
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - A...y
Jun 11 08:27:09 TVM2 systemd[1]: Stopped workload's scheduler cron service.
Hint: Some lines were ellipsized, use -l to show in full.
[root@TVM2 ~]# pcs resource show wlm-cron
 Resource: wlm-cron (class=systemd type=wlm-cron)
  Meta Attrs: target-role=Stopped
  Operations: monitor interval=30s on-fail=restart timeout=300s (wlm-cron-monitor-interval-30s)
              start interval=0s on-fail=restart timeout=300s (wlm-cron-start-interval-0s)
              stop interval=0s timeout=300s (wlm-cron-stop-interval-0s)
[root@TVM2 ~]# ps -ef | grep -i workloadmgr-cron
root     15379 14383  0 08:27 pts/0    00:00:00 grep --color=auto -i workloadmgr 

Backup old configuration data

Take a backup of the configuration files on all TVM nodes.

tar -czvf tvault_backup.tar.gz /etc/tvault /etc/tvault-config /etc/workloadmgr
cp tvault_backup.tar.gz /root/ 
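Optionally, the archive contents can be checked before proceeding; a quick sanity check:

# Confirm the configuration directories are present in the backup archive
tar -tzf /root/tvault_backup.tar.gz | head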

Setup Python3.8 virtual environment

Check whether the Python 3.8 virtual environment exists on the T4O nodes:

ls -al /home/stack/myansible_3.8

If the virtual environment does not exist, perform the following steps on the T4O nodes:

yum-config-manager --disable bintray-rabbitmq-server
yum-config-manager --disable mariadb
yum -y groupinstall "Development Tools"
yum -y install openssl-devel bzip2-devel libffi-devel xz-devel 
wget https://www.python.org/ftp/python/3.8.12/Python-3.8.12.tgz 
tar xvf Python-3.8.12.tgz
cd Python-3.8*/
./configure --enable-optimizations
sudo make altinstall
# Create the Python3.8 virtual env
cd /home/stack/
virtualenv -p /usr/local/bin/python3.8 myansible_3.8 --system-site-packages
source /home/stack/myansible_3.8/bin/activate
pip3 install pip --upgrade
pip3 install setuptools --upgrade
pip3 install jinja2 "ansible>=2.9.0" configobj pbr
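A short sanity check of the new environment can be run afterwards (illustrative only; the environment is still active from the previous step):

# Confirm the interpreter version and key packages inside the Python 3.8 environment
python3 --version
ansible --version
pip3 show ansible configobj pbr | grep -E '^(Name|Version)'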

Setup Python3.6 virtual environment

Activate the Python 3.6 virtual environment on all T4O nodes for the wlm services upgrade:

source /home/stack/myansible/bin/activate

[T4O 4.0 to T4O 4.1 only] Uninstall Ansible

Ansible does not support upgrading from previous versions to the latest one (2.10.4) and therefore needs to be uninstalled first.

pip3 uninstall ansible
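To confirm the removal, pip can be queried for the package; it should report that the package is not found (or produce no output, depending on the pip version):

pip3 show ansible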

Upgrade pip package

Run the following command on all TVM nodes to upgrade the pip package

pip3 install --upgrade pip

Set pip package repository env variable

export PIP_EXTRA_INDEX_URL=https://pypi.fury.io/triliodata-4-1/
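As the Gemfury repository must be reachable from the TVault VM (see the prerequisites), an optional connectivity check can be run first; a sketch using curl, where any HTTP status line indicates network reachability:

# Confirm the variable is set and the repository answers over HTTPS
echo $PIP_EXTRA_INDEX_URL
curl -sI https://pypi.fury.io/triliodata-4-1/ | head -n 1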

Upgrade s3fuse/tvault-object-store

Major Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse and its dependent packages.

source /home/stack/myansible/bin/activate 
systemctl stop tvault-object-store
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL s3fuse --upgrade --no-cache-dir
rm -rf /var/triliovault/*

Hotfix Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse packages only.

source /home/stack/myansible/bin/activate 
systemctl stop tvault-object-store
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL s3fuse --upgrade --no-cache-dir --no-deps
rm -rf /var/triliovault/*

Upgrade tvault-configurator

After the upgrade, the T4O configurator password for the user 'admin' is reset to the default value 'password'. Reset the T4O configurator password after the upgrade.

Make sure the correct virtual environment (myansible_3.8) has been activated.

Major Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator and its dependent packages.

source /home/stack/myansible_3.8/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL tvault-configurator --upgrade --no-cache-dir

Hotfix Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator packages only.

source /home/stack/myansible_3.8/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL tvault-configurator --upgrade --no-cache-dir

During the update of the tvault-configurator the following error might be shown:

ERROR: Command errored out with exit status 1:
command: /home/stack/myansible/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-pdd9x77v
cwd: /tmp/pip-install-crie5qno/ansible_086eb28a1523443f802ab202398d361e/

This error can be ignored.

Upgrade workloadmgr

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr and its dependent packages.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgr --upgrade --no-cache-dir

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr packages only.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgr --upgrade --no-cache-dir --no-deps

Upgrade workloadmgrclient

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient and its dependent packages.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgrclient --upgrade --no-cache-dir

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient packages only.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL workloadmgrclient --upgrade --no-cache-dir --no-deps

Upgrade contegoclient

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient and its dependent packages.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL contegoclient --upgrade --no-cache-dir

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient packages only.

source /home/stack/myansible/bin/activate
pip3 install --extra-index-url $PIP_EXTRA_INDEX_URL contegoclient --upgrade --no-cache-dir --no-deps

Set oslo.messaging version

Using the latest available oslo.messaging version can lead to stuck RPC and API calls.

It is therefore required to pin the oslo.messaging version on the TVM.

source /home/stack/myansible/bin/activate
pip3 install oslo.messaging==12.1.6 --no-deps
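With all packages upgraded and oslo.messaging pinned, the installed versions can be confirmed in both virtual environments; a verification sketch using the package names from the steps above:

# Packages installed in the Python 3.6 environment
source /home/stack/myansible/bin/activate
pip3 freeze | grep -iE '^(s3fuse|workloadmgr|workloadmgrclient|contegoclient|oslo\.messaging)=='
# tvault-configurator was installed in the Python 3.8 environment
source /home/stack/myansible_3.8/bin/activate
pip3 freeze | grep -i '^tvault-configurator=='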

Post Upgrade Steps

Restore the backed-up config files

cd /root 
tar -xzvf tvault_backup.tar.gz -C /
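A quick check that the directories were restored as expected:

ls -ld /etc/tvault /etc/tvault-config /etc/workloadmgr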

[Major Upgrade 4.0 to 4.1 only] Delete wlm-scheduler pcs resource

Delete the wlm-scheduler pcs resource, as it is no longer part of the pcs cluster in 4.1.

pcs resource delete wlm-scheduler

Restart services

Restart the following services on all nodes using the respective commands.

The tvault-object-store restart is required only if Trilio is configured with S3 backend storage.

systemctl restart tvault-object-store
systemctl restart wlm-api 
systemctl restart wlm-scheduler
systemctl restart wlm-workloads 
systemctl restart tvault-config

Enable the Global Job Scheduler again (see the sketch after the commands below).

Restart the pcs resources only on the primary node:

pcs resource enable wlm-cron
pcs resource restart wlm-cron
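The Global Job Scheduler that was disabled before the upgrade can be re-enabled either from the Trilio dashboard or via the workloadmgr CLI. A hedged sketch, assuming the CLI sub-commands below are available in your release and that admin credentials are sourced (the /root/openrc path is only a placeholder):

source /home/stack/myansible/bin/activate
source /root/openrc
workloadmgr enable-global-job-scheduler
# Confirm the scheduler state
workloadmgr get-global-job-scheduler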

Verify the status of the services

tvault-object-store will run only if TVault is configured with S3 backend storage.

systemctl status wlm-api wlm-scheduler wlm-workloads tvault-config tvault-object-store | grep -E 'Active|loaded'
pcs status

Additional check for wlm-cron on the primary node

systemctl status wlm-cron
ps -ef | grep [w]orkloadmgr-cron

The above command should show only 2 processes running; a sample is shown below:

[root@tvm6 ~]# ps -ef | grep [w]orkloadmgr-cron
nova      8841     1  2 Jul28 ?        00:40:44 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
nova      8898  8841  0 Jul28 ?        00:07:03 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf

Check the mount point using the "df -h" command.
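For example (the grep pattern is only a convenience; the exact mount path depends on the configured backup target):

df -h | grep -i trilio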

[Upgrade to HF1 and higher only] Reconfigure the Trilio Appliance

Trilio for OpenStack 4.1 HF1 introduces several new configuration parameters, which are set automatically upon reconfiguration.

[RHOSP and Kolla only] Reconfigure the Trilio Appliance

Trilio for Openstack 4.1 is changing the Trilio mount point as follows:

RHOSP 13, 16.0 & 16.1: /var/lib/nova/triliovault-mounts
Kolla Ansible Ussuri: /var/trilio/triliovault-mounts

Reconfiguring the Trilio Appliance will automatically handle this change.

[RHOSP and Kolla only] Create the mount bind to the old Trilio Mountpoint

Trilio for Openstack 4.1 is changing the Trilio mount point as follows:

RHOSP 13, 16.0 & 16.1: /var/lib/nova/triliovault-mounts
Kolla Ansible Ussuri: /var/trilio/triliovault-mounts

After reconfiguration of the Trilio Appliance, it is necessary to create a mount bind between the old and new mount points to provide full access to the old Trilio backups.

For RHOSP:

mount --bind /var/lib/nova/triliovault-mounts /var/triliovault-mounts

For Kolla:

mount --bind /var/trilio/triliovault-mounts /var/triliovault-mounts

To make this change persistent, it is recommended to update /etc/fstab accordingly:

For RHOSP:

echo "/var/lib/nova/triliovault-mounts /var/triliovault-mounts    none    bind    0 0" >> /etc/fstab

For Kolla:

echo "/var/trilio/triliovault-mounts /var/triliovault-mounts	none bind	0 0" >> /etc/fstab
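After updating /etc/fstab, the entry can be validated without a reboot; a short sketch:

# Mount everything listed in fstab and confirm the bind mount is active
mount -a
findmnt /var/triliovault-mounts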

[RHOSP and Kolla only] Verify nova UID/GID for nova user on the Appliance

Red Hat OpenStack and Kolla Ansible OpenStack use the nova UID/GID 42436 inside their containers instead of 162:162, which is the standard in other OpenStack environments.

Please verify that the nova UID/GID on the Trilio Appliance is still 42436:

[root@TVM1 ~]# id nova
uid=42436(nova) gid=42436(nova) groups=42436(nova),990(libvirt),36(kvm)

In case the UID/GID has been changed back to 162:162, follow these steps to set it back to 42436:42436.

  1. Download the shell script that will change the user id

  2. Assign executable permissions

  3. Execute the script

  4. Verify that nova user and group ids have changed to '42436'

## Download the shell script
$ curl -O https://raw.githubusercontent.com/trilioData/triliovault-cfg-scripts/master/common/nova_userid.sh

## Assign executable permissions
$ chmod +x nova_userid.sh

## Execute the shell script to change 'nova' user and group id to '42436'
$ ./nova_userid.sh

## Ignore any errors and verify that 'nova' user and group id has changed to '42436'
$ id nova
   uid=42436(nova) gid=42436(nova) groups=42436(nova),990(libvirt),36(kvm)
