Post-Installation Health Check

After Trilio for OpenStack has been installed and configured, the following steps can be used to verify that the Trilio installation is healthy.

Verify the Trilio Appliance services are up

Trilio runs three main services on the Trilio Appliance:

  • wlm-api

  • wlm-scheduler

  • wlm-workloads

Each of these services should be up and running, which can be verified with the systemctl status command.

systemctl status wlm-api
######
 wlm-api.service - Cluster Controlled wlm-api
   Loaded: loaded (/etc/systemd/system/wlm-api.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-api.service.d
           └─50-pacemaker.conf
   Active: active (running) since Wed 2020-04-22 09:17:05 UTC; 1 day 2h ago
 Main PID: 21265 (python)
    Tasks: 1
   CGroup: /system.slice/wlm-api.service
           └─21265 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-api --config-file=/etc/workloadmgr/workloadmgr.conf
systemctl status wlm-scheduler
######
 wlm-scheduler.service - Cluster Controlled wlm-scheduler
   Loaded: loaded (/etc/systemd/system/wlm-scheduler.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-scheduler.service.d
           └─50-pacemaker.conf
   Active: active (running) since Wed 2020-04-22 09:17:17 UTC; 1 day 2h ago
 Main PID: 21512 (python)
    Tasks: 1
   CGroup: /system.slice/wlm-scheduler.service
           └─21512 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-scheduler --config-file=/etc/workloadmgr/workloadmgr.conf
systemctl status wlm-workloads
######
 wlm-workloads.service - workloadmanager workloads service
   Loaded: loaded (/etc/systemd/system/wlm-workloads.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-04-22 09:15:43 UTC; 1 day 2h ago
 Main PID: 20079 (python)
    Tasks: 33
   CGroup: /system.slice/wlm-workloads.service
           ├─20079 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─20180 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           [...]
           ├─20181 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─20233 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─20236 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           └─20237 /home/rhv/myansible/bin/python /usr/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
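
For a quick scripted check of all three services at once, a small loop such as the following can be used. This is a minimal sketch that relies only on standard systemctl behaviour:

for svc in wlm-api wlm-scheduler wlm-workloads; do
    # systemctl is-active prints "active" and returns 0 for a healthy service
    printf '%s: ' "$svc"; systemctl is-active "$svc"
done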

Check the Trilio pacemaker and nginx cluster

The second check of the Trilio Appliance's health covers the nginx and pacemaker cluster.

pcs status
######
Cluster name: triliovault

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: om_tvm (version 1.1.19-8.el7_6.1-c3c624ea3d) - partition with quorum
Last updated: Wed Dec 5 12:25:02 2018
Last change: Wed Dec 5 09:20:08 2018 by root via cibadmin on om_tvm
1 node configured
4 resources configured

Online: [ om_tvm ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started om_tvm
wlm-api (systemd:wlm-api): Started om_tvm
wlm-scheduler (systemd:wlm-scheduler): Started om_tvm
Clone Set: lb_nginx-clone [lb_nginx]
Started: [ om_tvm ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
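
To spot unhealthy cluster resources without reading the full output, the pcs status output can be filtered for anything that is not started. This is a simple sketch using only standard grep:

# An empty grep result means no resource is reported as Stopped or FAILED.
pcs status | grep -iE 'stopped|failed' || echo "all cluster resources are started"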

Verify API connectivity of the Trilio Appliance

Checking the availability of the Trilio API on the chosen endpoints is recommended.

The following example curl command lists the available workload-types and verifies that the connection is available and working:

curl http://10.10.2.34:8780/v1/8e16700ae3614da4ba80a4e57d60cdb9/workload_types/detail -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-workloadmgrclient" -H "Accept: application/json" -H "X-Auth-Token: gAAAAABe40NVFEtJeePpk1F9QGGh1LiGnHJVLlgZx9t0HRrK9rC5vqKZJRkpAcW1oPH6Q9K9peuHiQrBHEs1-g75Na4xOEESR0LmQJUZP6n37fLfDL_D-hlnjHJZ68iNisIP1fkm9FGSyoyt6IqjO9E7_YVRCTCqNLJ67ZkqHuJh1CXwShvjvjw"

Please check the API guide for more commands and how to generate the X-Auth-Token.
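
As a sketch of how such a request can be put together, the token and project ID can be taken from the openstack CLI and reused against the endpoint shown above. The IP address and port are the ones from the example; adjust them to your own endpoint:

# Fetch a fresh token and the matching project ID via the openstack CLI.
TOKEN=$(openstack token issue -f value -c id)
PROJECT_ID=$(openstack token issue -f value -c project_id)

# Call the workload_types endpoint of the example deployment.
curl -s "http://10.10.2.34:8780/v1/${PROJECT_ID}/workload_types/detail" \
     -H "Accept: application/json" \
     -H "X-Auth-Token: ${TOKEN}"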

Verify the tvault-contego services are up and running

The tvault-contego service is the Data Mover that is installed on all compute nodes. It is recommended to check its status after the installation.

openstack compute service list
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
| ID | Binary               | Host               | Zone     | Status   | State | Updated At                 |
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
|  7 | nova-conductor       | upstreamcontroller | internal | enabled  | up    | 2020-06-12T09:13:55.000000 |
|  8 | nova-scheduler       | upstreamcontroller | internal | enabled  | up    | 2020-06-12T09:13:54.000000 |
|  9 | nova-consoleauth     | upstreamcontroller | internal | enabled  | up    | 2020-06-12T09:13:52.000000 |
| 10 | nova-compute         | upstreamcompute1   | US-East  | enabled  | up    | 2020-06-12T09:13:50.000000 |
| 11 | nova-compute         | upstreamcompute2   | US-West  | enabled  | up    | 2020-06-12T09:13:51.000000 |
| 12 | nova-contego_3.0.174 | upstreamcompute2   | internal | enabled  | up    | 2020-06-12T09:13:51.000000 |
| 13 | nova-contego_3.0.174 | upstreamcompute1   | internal | enabled  | up    | 2020-06-12T09:13:47.000000 |
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
[root@upstreamcompute1 ~]# systemctl status tvault-contego.service
 tvault-contego.service - Tvault contego
   Loaded: loaded (/etc/systemd/system/tvault-contego.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 10:07:28 EDT; 1 day 19h ago
 Main PID: 10384 (python)
    Tasks: 21
   CGroup: /system.slice/tvault-contego.service
           └─10384 /usr/bin/python /usr/bin/tvault-contego --config-file=/etc...

Jun 12 03:15:33 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 03:15:33 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 03:16:11 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 03:16:31 upstreamcompute1 sudo[13977]:     nova : TTY=unknown ; PWD=/...n
Jun 12 03:16:33 upstreamcompute1 sudo[14004]:     nova : TTY=unknown ; PWD=/ ...
Jun 12 05:15:33 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 05:15:33 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 05:16:11 upstreamcompute1 python[10384]: libvirt: QEMU Driver error :...d
Jun 12 05:16:29 upstreamcompute1 sudo[23356]:     nova : TTY=unknown ; PWD=/...n
Jun 12 05:16:32 upstreamcompute1 sudo[23422]:     nova : TTY=unknown ; PWD=/ ...
Hint: Some lines were ellipsized, use -l to show in full.
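
To confirm the Data Mover across all compute nodes in one step, the service list can be filtered for the contego binary and the local service checked on each node. This is a sketch built from the commands already shown above:

# List only the contego services; all of them should report State "up".
openstack compute service list | grep contego

# On each compute node, confirm the local service is active.
systemctl is-active tvault-contego.service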

Verify the NFS Volume is correctly mounted

Trilio mounts the NFS Backup Target to the Trilio Appliance and compute nodes.

To verify that these are correctly mounted, it is recommended to do the following checks.

First, run df -h and look for /var/triliovault-mounts/<hash-value>

df -h
######
Filesystem                                      Size  Used Avail Use% Mounted on
devtmpfs                                         63G     0   63G   0% /dev
tmpfs                                            63G   16K   63G   1% /dev/shm
tmpfs                                            63G   35M   63G   1% /run
tmpfs                                            63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/rhvh-rhvh--4.3.8.1--0.20200126.0+1  7.1T  3.7G  6.8T   1% /
/dev/sda2                                       976M  198M  712M  22% /boot
/dev/mapper/rhvh-var                             15G  1.9G   12G  14% /var
/dev/mapper/rhvh-home                           976M  2.6M  907M   1% /home
/dev/mapper/rhvh-tmp                            976M  2.6M  907M   1% /tmp
/dev/mapper/rhvh-var_log                        7.8G  230M  7.2G   4% /var/log
/dev/mapper/rhvh-var_log_audit                  2.0G   17M  1.8G   1% /var/log/audit
/dev/mapper/rhvh-var_crash                      9.8G   37M  9.2G   1% /var/crash
30.30.1.4:/rhv_backup                           2.0T  5.3G  1.9T   1% /var/triliovault-mounts/MzAuMzAuMS40Oi9yaHZfYmFja3Vw
30.30.1.4:/rhv_data                             2.0T   37G  2.0T   2% /rhev/data-center/mnt/30.30.1.4:_rhv__data
tmpfs                                            13G     0   13G   0% /run/user/0
30.30.1.4:/rhv_iso                              2.0T   37G  2.0T   2% /rhev/data-center/mnt/30.30.1.4:_rhv__iso
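
A quick way to confirm the backup target is mounted on a node is to filter the mount list for the Trilio mount path. This is a minimal sketch; the base path is the one shown in the output above:

# An empty result means the NFS backup target is not mounted on this node.
df -h | grep /var/triliovault-mounts
mount | grep /var/triliovault-mounts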

Second, do a read/write/delete test as the user nova:nova (uid = 36 / gid = 36) from the Trilio Appliance and the RHV-Host.

su nova
######
[nova@tvm MTAuMTAuMi4yMDovdXBzdHJlYW0=]$ touch foo
[nova@tvm MTAuMTAuMi4yMDovdXBzdHJlYW0=]$ ll
total 24
drwxr-xr-x  3 nova nova 4096 Apr  2 17:27 contego_tasks
-rw-r--r--  1 nova nova    0 Apr 23 12:25 foo
drwxr-xr-x  2 nova nova 4096 Apr  2 15:38 test-cloud-id
drwxr-xr-x 10 nova nova 4096 Apr 22 11:00 workload_1540698c-8e22-4dd1-a898-8f49cd1a898c
drwxr-xr-x  9 nova nova 4096 Apr  8 15:21 workload_51517816-6d5a-4fce-9ac7-46ee1e09052c
drwxr-xr-x  6 nova nova 4096 Apr 22 11:30 workload_77fb42d2-8d34-4b8d-bfd5-4263397b636c
drwxr-xr-x  5 nova nova 4096 Apr 23 06:15 workload_85bf16ed-d4fd-49a6-a753-98c5ca6e906b
[nova@tvm MTAuMTAuMi4yMDovdXBzdHJlYW0=]$ rm foo
[nova@tvm MTAuMTAuMi4yMDovdXBzdHJlYW0=]$ ll
total 24
drwxr-xr-x  3 nova nova 4096 Apr  2 17:27 contego_tasks
drwxr-xr-x  2 nova nova 4096 Apr  2 15:38 test-cloud-id
drwxr-xr-x 10 nova nova 4096 Apr 22 11:00 workload_1540698c-8e22-4dd1-a898-8f49cd1a898c
drwxr-xr-x  9 nova nova 4096 Apr  8 15:21 workload_51517816-6d5a-4fce-9ac7-46ee1e09052c
drwxr-xr-x  6 nova nova 4096 Apr 22 11:30 workload_77fb42d2-8d34-4b8d-bfd5-4263397b636c
drwxr-xr-x  5 nova nova 4096 Apr 23 06:15 workload_85bf16ed-d4fd-49a6-a753-98c5ca6e906b
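
The same test can be scripted non-interactively with sudo instead of switching to the nova user. The following sketch assumes the mount path shown in the df -h output above; adjust the hash directory to your own backup target:

# Write, list, and delete a temporary file as the nova user inside the NFS mount.
MOUNT=/var/triliovault-mounts/MzAuMzAuMS40Oi9yaHZfYmFja3Vw
sudo -u nova touch "${MOUNT}/healthcheck_test" \
  && sudo -u nova ls -l "${MOUNT}/healthcheck_test" \
  && sudo -u nova rm "${MOUNT}/healthcheck_test" \
  && echo "read/write/delete as nova OK"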
