Post-Installation Health Check

After Trilio for OpenStack has been installed and configured successfully, the following steps can be used to verify that the Trilio installation is healthy.

On the Controller node

Make sure the containers below are in a running state. In a multi-controller setup, triliovault-wlm-cron runs on only one of the controllers.

  • triliovault_datamover_api

  • triliovault_wlm_api

  • triliovault_wlm_scheduler

  • triliovault_wlm_workloads

  • triliovault-wlm-cron

If any of these containers is in a restarting state or is not listed by the following command, the deployment did not complete correctly. Please recheck that you followed the complete documentation.

[root@overcloudtrain5-controller-0 /]# podman ps  | grep trilio-
76511a257278  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-horizon-plugin:<CONTAINER-TAG-VERSION>-rhosp16.2          kolla_start           12 days ago   Up 12 days ago           horizon
5c5acec33392  cluster.common.tag/trilio-wlm:pcmklatest                                                          /bin/bash /usr/lo...  7 days ago    Up 7 days ago            triliovault-wlm-cron-podman-0
8dc61a674a7f  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-datamover-api:<CONTAINER-TAG-VERSION>-rhosp16.2           kolla_start           7 days ago    Up 7 days ago            triliovault_datamover_api
a945fbf80554  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-wlm:<CONTAINER-TAG-VERSION>-rhosp16.2                     kolla_start           7 days ago    Up 7 days ago            triliovault_wlm_scheduler
402c9fdb3647  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-wlm:<CONTAINER-TAG-VERSION>-rhosp16.2                     kolla_start           7 days ago    Up 6 days ago            triliovault_wlm_workloads
f9452e4b3d14  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-wlm:<CONTAINER-TAG-VERSION>-rhosp16.2                     kolla_start           7 days ago    Up 6 days ago            triliovault_wlm_api

After a successful deployment, the triliovault-wlm-cron service is added to the pcs cluster as a cluster resource. You can verify this with the pcs status command.

[root@overcloudtrain5-controller-0 /]# pcs status
Cluster name: tripleo_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: overcloudtrain5-controller-0 (version 2.0.5-9.el8_4.3-ba59be7122) - partition with quorum
  * Last updated: Mon Jul 24 11:19:05 2023
  * Last change:  Mon Jul 17 10:38:45 2023 by root via cibadmin on overcloudtrain5-controller-0
  * 4 nodes configured
  * 14 resource instances configured

Node List:
  * Online: [ overcloudtrain5-controller-0 ]
  * GuestOnline: [ galera-bundle-0@overcloudtrain5-controller-0 rabbitmq-bundle-0@overcloudtrain5-controller-0 redis-bundle-0@overcloudtrain5-controller-0 ]

Full List of Resources:
  * ip-172.30.6.27      (ocf::heartbeat:IPaddr2):        Started overcloudtrain5-controller-0
  * ip-172.30.6.16      (ocf::heartbeat:IPaddr2):        Started overcloudtrain5-controller-0
  * Container bundle: haproxy-bundle [cluster.common.tag/openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0   (ocf::heartbeat:podman):         Started overcloudtrain5-controller-0
  * Container bundle: galera-bundle [cluster.common.tag/openstack-mariadb:pcmklatest]:
    * galera-bundle-0   (ocf::heartbeat:galera):         Master overcloudtrain5-controller-0
  * Container bundle: rabbitmq-bundle [cluster.common.tag/openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster):       Started overcloudtrain5-controller-0
  * Container bundle: redis-bundle [cluster.common.tag/openstack-redis:pcmklatest]:
    * redis-bundle-0    (ocf::heartbeat:redis):  Master overcloudtrain5-controller-0
  * Container bundle: openstack-cinder-volume [cluster.common.tag/openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0  (ocf::heartbeat:podman):         Started overcloudtrain5-controller-0
  * Container bundle: triliovault-wlm-cron [cluster.common.tag/trilio-wlm:pcmklatest]:
    * triliovault-wlm-cron-podman-0     (ocf::heartbeat:podman):         Started overcloudtrain5-controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
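
In a multi-controller deployment, a quick way to confirm that the cron resource is started on exactly one controller is to filter the same output for the resource name shown above (a minimal sketch based on the example output):

[root@overcloudtrain5-controller-0 /]# pcs status | grep -A1 triliovault-wlm-cron
  * Container bundle: triliovault-wlm-cron [cluster.common.tag/trilio-wlm:pcmklatest]:
    * triliovault-wlm-cron-podman-0     (ocf::heartbeat:podman):         Started overcloudtrain5-controller-0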

Verify the HAproxy configuration under:

/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
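
For example, you can confirm that HAProxy is aware of the Trilio endpoints by searching the generated configuration for Trilio entries (a minimal sketch; the exact listener and section names depend on your deployment):

[root@overcloudtrain5-controller-0 /]# grep -i trilio /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg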

On the Compute node

Make sure the Trilio Datamover container is in a running state and that no other Trilio container is deployed on the compute nodes.

[root@overcloudtrain5-novacompute-0 heat-admin]# podman  ps | grep -i datamover
c750a8d0471f  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-datamover:<CONTAINER-TAG-VERSION>-rhosp16.2              kolla_start  7 days ago   Up 7 days ago           triliovault_datamover

Check that the provided backup target is mounted correctly on the Compute host.

[root@overcloudtrain5-novacompute-0 heat-admin]# df -h  | grep triliovault-mounts
172.30.1.9:/mnt/rhosptargetnfs  7.0T  5.1T  2.0T  72% /var/lib/nova/triliovault-mounts/L21udC9yaG9zcHRhcmdldG5mcw==
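
The directory name under /var/lib/nova/triliovault-mounts/ is the base64 encoding of the backup target path, so you can decode it to confirm which share is mounted (values taken from the example output above):

[root@overcloudtrain5-novacompute-0 heat-admin]# echo 'L21udC9yaG9zcHRhcmdldG5mcw==' | base64 -d
/mnt/rhosptargetnfs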

On the node with the Horizon service

Make sure the Horizon container is in a running state. Please note that the stock Horizon container is replaced with Trilio's Horizon container, which contains the latest OpenStack Horizon plus the Trilio Horizon plugin.

[root@overcloudtrain5-controller-0 heat-admin]# podman ps  | grep horizon
76511a257278  undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-horizon-plugin:<CONTAINER-TAG-VERSION>-rhosp16.2          kolla_start           12 days ago   Up 12 days ago           horizon
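
To double-check that the running Horizon container was created from Trilio's Horizon image rather than the stock one, you can inspect its image name (a minimal sketch; the registry and tag will match your environment rather than this example):

[root@overcloudtrain5-controller-0 heat-admin]# podman inspect horizon --format '{{.ImageName}}'
undercloudqa162.ctlplane.trilio.local:8787/trilio/trilio-horizon-plugin:<CONTAINER-TAG-VERSION>-rhosp16.2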

If the Trilio Horizon container is in a restarting state on RHOSP 16.1.8/RHOSP 16.2.4, use one of the workarounds below.

Either of the below workarounds should be performed on all controller nodes where the issue occurs for the Horizon container.

Option 1: Restart the memcached service on the controller using systemctl (command: systemctl restart tripleo_memcached.service).

Option 2: Restart the memcached container (command: podman restart memcached).
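
For reference, these are the corresponding commands as they would be run on an affected controller (run only the option you choose):

# Option 1: restart the memcached service via systemd
[root@overcloudtrain5-controller-0 /]# systemctl restart tripleo_memcached.service

# Option 2: restart the memcached container
[root@overcloudtrain5-controller-0 /]# podman restart memcached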
