Online upgrade Trilio Appliance

This page describes the upgrade process from Trilio 4.0 or Trilio 4.0SP1 to Trilio 4.1 GA or one of its hotfix releases.

Generic Pre-requisites

The prerequisites should already be fulfilled from upgrading the Trilio components on the Controller and Compute nodes.

  • Ensure that the upgrade of all Trilio components on the OpenStack controller and compute nodes has been completed before starting the rolling upgrade of the TVM.

  • The mentioned Gemfury repository should be accessible from the TVault VM.

  • Please ensure the following points before starting the upgrade process:

    • No snapshot or restore should be running.

    • Global job-scheduler should be disabled.

    • wlm-cron should be disabled and any lingering process should be killed.

Deactivating the wlm-cron service

The following commands disable the wlm-cron service and verify that it has been completely shut down.

pcs resource disable wlm-cron

Verify that the service is shut down with the below commands and the expected output:

[root@TVM2 ~]# systemctl status wlm-cron
● wlm-cron.service - workload's scheduler cron service
   Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jun 11 08:27:06 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:06 - INFO - 1...t
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 140686268624368 Child 11389 ki...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - 1...5
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Shutting down thread pool
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...l
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: Stopping the threads
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - S...s
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: All threads are stopped succes...y
Jun 11 08:27:07 TVM2 workloadmgr-cron[11115]: 11-06-2021 08:27:07 - INFO - A...y
Jun 11 08:27:09 TVM2 systemd[1]: Stopped workload's scheduler cron service.
Hint: Some lines were ellipsized, use -l to show in full.
[root@TVM2 ~]# pcs resource show wlm-cron
 Resource: wlm-cron (class=systemd type=wlm-cron)
  Meta Attrs: target-role=Stopped
  Operations: monitor interval=30s on-fail=restart timeout=300s (wlm-cron-monitor-interval-30s)
              start interval=0s on-fail=restart timeout=300s (wlm-cron-start-interval-0s)
              stop interval=0s timeout=300s (wlm-cron-stop-interval-0s)
[root@TVM2 ~]# ps -ef | grep -i workloadmgr-cron
root     15379 14383  0 08:27 pts/0    00:00:00 grep --color=auto -i workloadmgr 

Back up old configuration data

Take a backup of the conf files on all TVM nodes.
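A minimal sketch of such a backup, assuming the configuration files live under /etc/workloadmgr and /etc/tvault-config (adjust the paths and backup location to your deployment):

# Run on every TVM node; copies the Trilio configuration directories to a backup location
mkdir -p /root/trilio-conf-backup
cp -r /etc/workloadmgr /root/trilio-conf-backup/
cp -r /etc/tvault-config /root/trilio-conf-backup/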

Setup Python3.8 virtual environment

Check if Python 3.8 virtual environment exists on the T4O nodes

If the virtual environment does not exist, perform the below steps on the T4O nodes

Setup Python3.6 virtual environment

Activate the Python3.6 virtual environment on all T4O nodes for wlm services upgrade
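A hedged sketch of checking, creating, and activating the environment, assuming it lives under /home/stack and is named myansible_3.8; the directory name and Python minor version may differ on your appliance:

# Check whether the virtual environment already exists
ls -d /home/stack/myansible_3.8

# Create it only if it is missing (requires the matching Python interpreter to be installed)
python3.8 -m venv /home/stack/myansible_3.8

# Activate the virtual environment before running the wlm upgrade steps
source /home/stack/myansible_3.8/bin/activate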

[T4O 4.0 to T4O 4.1 only] Uninstall Ansible

Ansible does not support upgrading from previous versions to the latest one (2.10.4) and therefore needs to be uninstalled first.
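For example, from within the activated virtual environment:

# Remove the previously installed Ansible so a clean 2.10.4 can be installed later
pip uninstall -y ansible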

Upgrade pip package

Run the following command on all TVM nodes to upgrade the pip package
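For example:

# Upgrade pip itself inside the active virtual environment
pip install --upgrade pip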

Set pip package repository env variable
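The exact repository URL is not reproduced here; a hedged sketch using a placeholder for the Gemfury index URL:

# Point pip at the Trilio Gemfury repository (replace the placeholder with the actual URL)
export PIP_INDEX_URL=<gemfury-repository-url>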

Upgrade s3fuse/tvault-object-store

Major Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse and its dependent packages.
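A hedged example, assuming the component is published on the configured Gemfury index under the name used here:

# Upgrade s3fuse together with its dependent packages
pip install --upgrade --upgrade-strategy eager s3fuse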

Hotfix Upgrade

Run the following commands on all TVM nodes to upgrade s3fuse packages only.
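A hedged example for the hotfix-only upgrade, under the same naming assumption:

# Upgrade only the s3fuse package, leaving its dependencies untouched
pip install --upgrade --no-deps s3fuse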

Upgrade tvault-configurator

After the upgrade, the password for the T4O configurator is reset to the default value 'password' for the user 'admin'. Reset the T4O configurator password after the upgrade.

Make sure the correct virtual environment (myansible_3.8) has been activated.

Major Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator and its dependent packages.
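A hedged example, following the same pattern and naming assumption as for s3fuse above:

# Upgrade tvault-configurator together with its dependent packages
pip install --upgrade --upgrade-strategy eager tvault-configurator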

Hotfix Upgrade

Run the following command on all TVM nodes to upgrade tvault-configurator packages only.
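A hedged example for the hotfix-only upgrade, under the same assumption:

# Upgrade only the tvault-configurator package
pip install --upgrade --no-deps tvault-configurator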

Upgrade workloadmgr

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr and its dependent packages.
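A hedged example (package name assumed):

# Upgrade workloadmgr together with its dependent packages
pip install --upgrade --upgrade-strategy eager workloadmgr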

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgr packages only.
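A hedged example for the hotfix-only upgrade:

# Upgrade only the workloadmgr package
pip install --upgrade --no-deps workloadmgr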

Upgrade workloadmgrclient

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient and its dependent packages.
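A hedged example (package name assumed):

# Upgrade workloadmgrclient together with its dependent packages
pip install --upgrade --upgrade-strategy eager workloadmgrclient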

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade workloadmgrclient packages only.
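A hedged example for the hotfix-only upgrade:

# Upgrade only the workloadmgrclient package
pip install --upgrade --no-deps workloadmgrclient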

Upgrade contegoclient

Major Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient and its dependent packages.
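A hedged example (package name assumed):

# Upgrade contegoclient together with its dependent packages
pip install --upgrade --upgrade-strategy eager contegoclient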

Hotfix Upgrade

Run the upgrade command on all TVM nodes to upgrade contegoclient packages only.
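A hedged example for the hotfix-only upgrade:

# Upgrade only the contegoclient package
pip install --upgrade --no-deps contegoclient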

Set oslo.messaging version

Using the latest available oslo.messaging version can lead to stuck RPC and API calls.

It is therefore required to fix the oslo.messaging version on the TVM.
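The pinned version is not reproduced on this page; a hedged sketch with a placeholder:

# Pin oslo.messaging to the version required by this release (replace the placeholder)
pip install oslo.messaging==<required-version>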

Post Upgrade Steps

Restore the backed-up config files
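A minimal sketch mirroring the backup example above (same assumed paths):

# Restore the previously backed-up Trilio configuration directories
cp -r /root/trilio-conf-backup/workloadmgr/. /etc/workloadmgr/
cp -r /root/trilio-conf-backup/tvault-config/. /etc/tvault-config/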

[Major Upgrade 4.0 to 4.1 only] Delete wlm-scheduler pcs resource

Delete the wlm-scheduler pcs resource, as it is no longer a part of pcs in 4.1.
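For example, on one node of the cluster:

# Remove the obsolete wlm-scheduler resource from the pacemaker cluster
pcs resource delete wlm-scheduler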

Restart services

Restart the following services on all nodes using the respective commands.

A tvault-object-store restart is required only if Trilio is configured with S3 backend storage.

Enable the Global Job Scheduler.

Restart pcs resources only on the primary node.
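A hedged sketch of the restarts, assuming the systemd service names wlm-api, wlm-scheduler, wlm-workloads and tvault-object-store, and that wlm-cron remains managed by pcs:

# On every TVM node: restart the workloadmgr services (assumed service names)
systemctl restart wlm-api wlm-scheduler wlm-workloads

# Only if Trilio is configured with an S3 backend
systemctl restart tvault-object-store

# Only on the primary node: re-enable and restart the pcs-managed wlm-cron resource
pcs resource enable wlm-cron
pcs resource restart wlm-cron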

Verify the status of the services
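A hedged sketch of the status checks, under the same service-name assumptions as above:

# Systemd-managed services on every TVM node
systemctl status wlm-api wlm-scheduler wlm-workloads tvault-object-store

# pcs-managed wlm-cron resource (primary node)
pcs resource show wlm-cron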

tvault-object-store will run only if TVault is configured with S3 backend storage.

Additional check for wlm-cron on the primary node
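For example, using the same check as during the wlm-cron shutdown verification earlier:

ps -ef | grep -i workloadmgr-cron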

The above command should show only two processes running.

Check the mount point using the “df -h” command.

[Upgrade to HF1 and higher only] Reconfigure the Trilio Appliance

Trilio for OpenStack 4.1 HF1 introduces several new configuration parameters, which are automatically set upon reconfiguration.

[RHOSP and Kolla only] Reconfigure the Trilio Appliance

Trilio for OpenStack 4.1 changes the Trilio mount point as follows:

  • RHOSP 13 & 16.0 & 16.1: /var/lib/nova/triliovault-mounts
  • Kolla Ansible Ussuri: /var/trilio/triliovault-mounts

Reconfiguring the Trilio Appliance will automatically handle this change.

[RHOSP and Kolla only] Create the mount bind to the old Trilio Mountpoint

Trilio for OpenStack 4.1 changes the Trilio mount point as follows:

  • RHOSP 13 & 16.0 & 16.1: /var/lib/nova/triliovault-mounts
  • Kolla Ansible Ussuri: /var/trilio/triliovault-mounts

After reconfiguration of the Trilio Appliance, it is necessary to create a mount bind between the old and new mount points to provide full access to the old Trilio backups.
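The exact commands are not preserved on this page; the sketches below assume the pre-4.1 mount point on the appliance was /var/triliovault-mounts and bind the new mount point onto it.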

For RHOSP:
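# Assumed paths: new RHOSP mount point bound onto the pre-4.1 mount point
mount --bind /var/lib/nova/triliovault-mounts /var/triliovault-mounts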

For Kolla:
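# Assumed paths: new Kolla mount point bound onto the pre-4.1 mount point
mount --bind /var/trilio/triliovault-mounts /var/triliovault-mounts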

To make this change persistent, it is recommended to update the fstab accordingly:
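Corresponding fstab entries, under the same path assumptions as above: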

For RHOSP:
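# Assumed paths
/var/lib/nova/triliovault-mounts /var/triliovault-mounts none bind 0 0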

For Kolla:
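# Assumed paths
/var/trilio/triliovault-mounts /var/triliovault-mounts none bind 0 0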

[RHOSP and Kolla only] Verify nova UID/GID for nova user on the Appliance

Red Hat OpenStack and Kolla Ansible OpenStack use the nova UID/GID 42436 inside their containers instead of 162:162, which is the standard in other OpenStack environments.

Please verify that the nova UID/GID on the Trilio Appliance is still 42436.

If the UID/GID has been changed back to 162:162, follow these steps to set it back to 42436:42436.

  1. Download the shell script that will change the user id

  2. Assign executable permissions

  3. Execute the script

  4. Verify that the nova user and group IDs have changed to '42436' (see the sketch below)
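A hedged sketch of steps 2 through 4; the script name is a placeholder, as the download location is not reproduced here:

# Assign executable permissions to the downloaded script (placeholder name)
chmod +x <nova_uid_script>.sh

# Execute the script
./<nova_uid_script>.sh

# Verify that the nova user and group IDs are now 42436
id nova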
