Healthcheck of Trilio

Trilio is composed of multiple services, each of which can be checked when errors occur.

On the Trilio Cluster


The wlm-workloads service runs and is active on every Trilio node.

[root@Upstream ~]# systemctl status wlm-workloads
● wlm-workloads.service - workloadmanager workloads service
   Loaded: loaded (/etc/systemd/system/wlm-workloads.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 13:42:42 UTC; 1 weeks 4 days ago
 Main PID: 12779 (workloadmgr-wor)
    Tasks: 17
   CGroup: /system.slice/wlm-workloads.service
           ├─12779 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12982 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12983 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12984 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
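Reading the full `systemctl status` output by eye gets tedious across several nodes. The following is a minimal sketch, not a Trilio tool: the helper names `parse_status` and `status_text` are hypothetical, and the parser assumes the output layout shown above. It extracts the state from the `Active:` line and counts the CGroup processes whose command line mentions a given program.

```python
import re
import subprocess

# Matches the state line, e.g. "Active: active (running) since Wed 2020-06-10 ..."
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)\s+\((\w+)\)", re.MULTILINE)
# Matches a PID in the CGroup tree; systemd draws the tree with "├─"/"└─"
# in UTF-8 locales and with ASCII equivalents ("|-"/"`-") otherwise.
PID_RE = re.compile(r"(?:[├└]─|\|-|`-)(\d+) ")


def parse_status(text: str, process_name: str) -> dict:
    """Return the unit state and the number of CGroup processes running process_name.

    Hypothetical helper; it relies on the `systemctl status` layout shown above.
    """
    match = ACTIVE_RE.search(text)
    state, substate = match.groups() if match else ("unknown", "unknown")
    processes = sum(
        1 for line in text.splitlines()
        if PID_RE.search(line) and process_name in line
    )
    return {"state": state, "substate": substate, "processes": processes}


def status_text(unit: str) -> str:
    """Fetch `systemctl status` output; the exit code is non-zero for inactive units."""
    try:
        result = subprocess.run(
            ["systemctl", "status", "--no-pager", unit],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return ""  # systemctl is not available on this host
    return result.stdout
```

On a healthy node, `parse_status(status_text("wlm-workloads"), "workloadmgr-workloads")` should report the state `active` with the substate `running`.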


The wlm-api service runs on the Master Node of the Trilio Cluster.

[root@Upstream ~]# systemctl status wlm-api
● wlm-api.service - Cluster Controlled wlm-api
   Loaded: loaded (/etc/systemd/system/wlm-api.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-api.service.d
   Active: active (running) since Thu 2020-04-16 22:30:11 UTC; 2 months 5 days ago
 Main PID: 11815 (workloadmgr-api)
    Tasks: 1
   CGroup: /system.slice/wlm-api.service
           └─11815 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-api --config-file=/etc/workloadmgr/workloadmgr.conf


The wlm-scheduler service runs on the Master Node of the Trilio Cluster.

[root@Upstream ~]# systemctl status wlm-scheduler
● wlm-scheduler.service - Cluster Controlled wlm-scheduler
   Loaded: loaded (/etc/systemd/system/wlm-scheduler.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-scheduler.service.d
   Active: active (running) since Thu 2020-04-02 13:49:22 UTC; 2 months 20 days ago
 Main PID: 29439 (workloadmgr-sch)
    Tasks: 1
   CGroup: /system.slice/wlm-scheduler.service
           └─29439 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-scheduler --config-file=/etc/workloadmgr/workloadmgr.conf
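Because wlm-workloads must be active on every node while wlm-api and wlm-scheduler only need to run on the Master Node, the expected result depends on the node's role. The `disabled` state in the Loaded lines of wlm-api and wlm-scheduler is consistent with these units being started by the cluster rather than at boot. The sketch below queries `systemctl is-active` and compares the result with that expectation; the names `unit_state` and `find_problems` are hypothetical, and how the node learns whether it is the master (here a plain `is_master` flag, which could for example be derived from `pcs status`) is an assumption.

```python
import subprocess

# Units expected on every Trilio node vs. only on the Master Node (per the text above).
ALL_NODES = ["wlm-workloads"]
MASTER_ONLY = ["wlm-api", "wlm-scheduler"]


def unit_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active` (e.g. "active", "inactive")."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit], capture_output=True, text=True
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not available on this host
    return result.stdout.strip() or "unknown"


def find_problems(states: dict, is_master: bool) -> list:
    """List every unit that should be active on this node's role but is not."""
    expected = ALL_NODES + (MASTER_ONLY if is_master else [])
    return [
        f"{unit} is {states.get(unit, 'unknown')}, expected active"
        for unit in expected
        if states.get(unit) != "active"
    ]
```

For example, `find_problems({u: unit_state(u) for u in ALL_NODES + MASTER_ONLY}, is_master=True)` returns an empty list on a healthy Master Node.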

Pacemaker Cluster Status

The Pacemaker cluster controls and monitors the virtual IPs (VIPs) of the Trilio Cluster. It also determines on which node the wlm-api and wlm-scheduler services run.

[root@Upstream ~]# pcs status
Cluster name: triliovault

Stack: corosync
Current DC: Upstream2 (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
Last updated: Mon Jun 22 10:52:19 2020
Last change: Thu Apr 16 22:30:11 2020 by root via cibadmin on Upstream

3 nodes configured
9 resources configured

Online: [ Upstream Upstream2 Upstream3 ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started Upstream
 virtual_ip_public      (ocf::heartbeat:IPaddr2):       Started Upstream
 virtual_ip_admin       (ocf::heartbeat:IPaddr2):       Started Upstream
 virtual_ip_internal    (ocf::heartbeat:IPaddr2):       Started Upstream
 wlm-api        (systemd:wlm-api):      Started Upstream
 wlm-scheduler  (systemd:wlm-scheduler):        Started Upstream
 Clone Set: lb_nginx-clone [lb_nginx]
     Started: [ Upstream ]
     Stopped: [ Upstream2 Upstream3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
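To spot stopped resources at a glance, the `pcs status` output can be parsed. This is a minimal sketch assuming the layout shown above, with one primitive resource per line in the form `name (agent): State Node`; the optional `* ` prefix is only there in case a Pacemaker version prints bulleted lines, which is an assumption. Clone sets such as `lb_nginx-clone` are skipped. The function names are hypothetical.

```python
import re

# Primitive resource lines, e.g. " wlm-api        (systemd:wlm-api):      Started Upstream"
RESOURCE_RE = re.compile(r"^\s*(?:\*\s+)?(\S+)\s+\(([^)]+)\):\s+(\w+)\s*(\S*)\s*$")
# Node list, e.g. "Online: [ Upstream Upstream2 Upstream3 ]"
ONLINE_RE = re.compile(r"^\s*(?:\*\s+)?Online:\s*\[\s*(.*?)\s*\]", re.MULTILINE)


def parse_pcs_status(text: str) -> dict:
    """Collect online nodes and the (state, node) of each primitive resource."""
    match = ONLINE_RE.search(text)
    online = match.group(1).split() if match else []
    resources = {}
    for line in text.splitlines():
        m = RESOURCE_RE.match(line)
        if m:
            name, _agent, state, node = m.groups()
            resources[name] = (state, node or None)  # no node for stopped resources
    return {"online": online, "resources": resources}


def stopped_resources(parsed: dict) -> list:
    """Names of primitive resources that are not reported as Started."""
    return [name for name, (state, _) in parsed["resources"].items() if state != "Started"]
```

Fed with the stdout of `pcs status`, an empty list from `stopped_resources` means every primitive resource, including the VIPs, wlm-api and wlm-scheduler, is reported as Started.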

Mount availability

The Trilio Cluster needs access to the Backup Target, so the corresponding mount must be present at all times.

[root@Upstream ~]# df -h
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               3.8G     0  3.8G   0% /dev
tmpfs                  3.8G   38M  3.8G   1% /dev/shm
tmpfs                  3.8G  427M  3.4G  12% /run
tmpfs                  3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/vda1               40G  8.8G   32G  22% /
tmpfs                  773M     0  773M   0% /run/user/996
tmpfs                  773M     0  773M   0% /run/user/0
                      1008G  704G  254G  74% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=
                       483G   22G  462G   5% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0y
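The directory names under /var/triliovault-mounts in the df output appear to be base64-encoded share names; for example, `MTAuMTAuMi4yMDovdXBzdHJlYW0=` decodes to `10.10.2.20:/upstream`. Assuming that encoding holds, the sketch below decodes each directory name and checks whether the directory is currently a mount point. The helper names are hypothetical.

```python
import base64
import os

MOUNT_ROOT = "/var/triliovault-mounts"


def decode_mount_dir(name: str) -> str:
    """Decode a mount directory name, assuming it is the base64-encoded share."""
    return base64.b64decode(name, validate=True).decode("utf-8")


def list_mounts(root: str = MOUNT_ROOT) -> dict:
    """Map each directory under root to (decoded share or None, is a mount point)."""
    if not os.path.isdir(root):
        return {}
    result = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        try:
            share = decode_mount_dir(entry)
        except (ValueError, UnicodeDecodeError):
            share = None  # name is not valid base64 text; leave it undecoded
        result[path] = (share, os.path.ismount(path))
    return result
```

A directory reported with `False` as its mount flag points to a Backup Target that is currently not mounted on this node.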

The dmapi service

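The dmapi service has its own Keystone endpoints, which should be checked in addition to the status of the service itself. An unauthenticated curl request against the endpoint is expected to return HTTP 401 Unauthorized; this confirms that the API is reachable and responding.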

[root@upstreamcontroller ~(keystone_admin)]# openstack endpoint list | grep dmapi
| 47918c8df8854ed49c082e398a9572be | RegionOne | dmapi          | datamover    | True    | public    |                    |
| cca52aff6b2a4f47bcc84b34647fba71 | RegionOne | dmapi          | datamover    | True    | internal  |                    |
| e9aa6630bfb74a9bb7562d4161f4e07d | RegionOne | dmapi          | datamover    | True    | admin     |                    |

[root@upstreamcontroller ~(keystone_admin)]# curl
{"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}
[root@upstreamcontroller ~(keystone_admin)]# systemctl status tvault-datamover-api.service
● tvault-datamover-api.service - TrilioData DataMover API service
   Loaded: loaded (/etc/systemd/system/tvault-datamover-api.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-12 12:31:11 EDT; 2 months 9 days ago
 Main PID: 11252 (python)
    Tasks: 2
   CGroup: /system.slice/tvault-datamover-api.service
           ├─11252 /usr/bin/python /usr/bin/dmapi-api
           └─11280 /usr/bin/python /usr/bin/dmapi-api
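The unauthenticated curl check can also be scripted. The sketch below treats a 401 answer as "reachable", following the expectation described above; the names `classify_status` and `probe` are hypothetical, and the endpoint URL must be taken from the `openstack endpoint list` output since it is not reproduced here.

```python
import urllib.error
import urllib.request


def classify_status(code: int) -> str:
    """Interpret the HTTP status of an unauthenticated request to the dmapi endpoint."""
    if code == 401:
        return "reachable (authentication required, as expected)"
    if 200 <= code < 400:
        return "reachable"
    return f"unexpected status {code}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send an unauthenticated GET request and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; the status code is still meaningful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout, ...
        return f"unreachable: {err}"
```

Only the transport-level outcome is checked here; a token-authenticated request would be needed to verify that the API actually serves data.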

The datamover service

The datamover service runs on each compute node and is integrated as a nova compute service; it appears as nova-contego in the compute service list.

[root@upstreamcontroller ~(keystone_admin)]# openstack compute service list
| ID | Binary               | Host               | Zone     | Status   | State | Updated At                 |
|  7 | nova-conductor       | upstreamcontroller | internal | enabled  | up    | 2020-06-22T11:01:05.000000 |
|  8 | nova-scheduler       | upstreamcontroller | internal | enabled  | up    | 2020-06-22T11:01:04.000000 |
|  9 | nova-consoleauth     | upstreamcontroller | internal | enabled  | up    | 2020-06-22T11:01:01.000000 |
| 10 | nova-compute         | upstreamcompute1   | US-East  | enabled  | up    | 2020-06-22T11:01:09.000000 |
| 11 | nova-compute         | upstreamcompute2   | US-West  | enabled  | up    | 2020-06-22T11:01:09.000000 |
| 16 | nova-contego_3.0.172 | upstreamcompute2   | internal | enabled  | up    | 2020-06-22T11:01:07.000000 |
| 17 | nova-contego_3.0.172 | upstreamcompute1   | internal | enabled  | up    | 2020-06-22T11:01:02.000000 |
[root@upstreamcompute1 ~]# systemctl status tvault-contego
● tvault-contego.service - Tvault contego
   Loaded: loaded (/etc/systemd/system/tvault-contego.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 10:07:28 EDT; 1 weeks 4 days ago
 Main PID: 10384 (python)
    Tasks: 21
   CGroup: /system.slice/tvault-contego.service
           └─10384 /usr/bin/python /usr/bin/tvault-contego --config-file=/etc/nova/nova.conf --config-file=/etc/tvault-contego/tvault-contego.conf
