Healthcheck of Trilio
Trilio is composed of multiple services, each of which can be checked in case of errors.
On the Trilio Cluster
wlm-workloads
This service runs on every Trilio node and should always be active.
[root@Upstream ~]# systemctl status wlm-workloads
● wlm-workloads.service - workloadmanager workloads service
Loaded: loaded (/etc/systemd/system/wlm-workloads.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-06-10 13:42:42 UTC; 1 weeks 4 days ago
Main PID: 12779 (workloadmgr-wor)
Tasks: 17
CGroup: /system.slice/wlm-workloads.service
├─12779 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
├─12982 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
├─12983 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
├─12984 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
[...]
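Should the service report anything other than active (running), the recent unit logs usually point to the cause. A minimal check, assuming the unit logs to the systemd journal (the time window is arbitrary):

[root@Upstream ~]# journalctl -u wlm-workloads --since "1 hour ago" --no-pager
[root@Upstream ~]# systemctl restart wlm-workloads
[root@Upstream ~]# systemctl is-active wlm-workloads
active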
wlm-api
This service runs on the Master Node of the Trilio Cluster.
[root@Upstream ~]# systemctl status wlm-api
● wlm-api.service - Cluster Controlled wlm-api
Loaded: loaded (/etc/systemd/system/wlm-api.service; disabled; vendor preset: disabled)
Drop-In: /run/systemd/system/wlm-api.service.d
└─50-pacemaker.conf
Active: active (running) since Thu 2020-04-16 22:30:11 UTC; 2 months 5 days ago
Main PID: 11815 (workloadmgr-api)
Tasks: 1
CGroup: /system.slice/wlm-api.service
└─11815 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-api --config-file=/etc/workloadmgr/workloadmgr.conf
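Note that the unit is disabled and carries a Pacemaker drop-in: the service is started by the cluster, not directly by systemd. If a restart is required, going through Pacemaker keeps the cluster state consistent; a sketch:

[root@Upstream ~]# pcs resource restart wlm-api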
wlm-scheduler
This service runs on the Master Node of the Trilio Cluster.
[root@Upstream ~]# systemctl status wlm-scheduler
● wlm-scheduler.service - Cluster Controlled wlm-scheduler
Loaded: loaded (/etc/systemd/system/wlm-scheduler.service; disabled; vendor preset: disabled)
Drop-In: /run/systemd/system/wlm-scheduler.service.d
└─50-pacemaker.conf
Active: active (running) since Thu 2020-04-02 13:49:22 UTC; 2 months 20 days ago
Main PID: 29439 (workloadmgr-sch)
Tasks: 1
CGroup: /system.slice/wlm-scheduler.service
└─29439 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-scheduler --config-file=/etc/workloadmgr/workloadmgr.conf
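The same applies to wlm-scheduler. Which node currently acts as the Master Node can be verified through the cluster resource status; both resources should report Started on the same node:

[root@Upstream ~]# pcs status resources | grep wlm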
Pacemaker Cluster Status
The Pacemaker cluster controls and monitors the VIPs of the Trilio Cluster. It also determines the node on which the wlm-api and wlm-scheduler services run.
[root@Upstream ~]# pcs status
Cluster name: triliovault
Stack: corosync
Current DC: Upstream2 (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
Last updated: Mon Jun 22 10:52:19 2020
Last change: Thu Apr 16 22:30:11 2020 by root via cibadmin on Upstream
3 nodes configured
9 resources configured
Online: [ Upstream Upstream2 Upstream3 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started Upstream
virtual_ip_public (ocf::heartbeat:IPaddr2): Started Upstream
virtual_ip_admin (ocf::heartbeat:IPaddr2): Started Upstream
virtual_ip_internal (ocf::heartbeat:IPaddr2): Started Upstream
wlm-api (systemd:wlm-api): Started Upstream
wlm-scheduler (systemd:wlm-scheduler): Started Upstream
Clone Set: lb_nginx-clone [lb_nginx]
Started: [ Upstream ]
Stopped: [ Upstream2 Upstream3 ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
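If wlm-api or wlm-scheduler shows as Stopped or failed on the expected node, clearing the resource failure history and letting Pacemaker retry the start is often sufficient; a sketch using standard pcs commands:

[root@Upstream ~]# pcs resource cleanup wlm-api
[root@Upstream ~]# pcs resource cleanup wlm-scheduler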
Mount availability
The Trilio Cluster needs access to the Backup Target and must have the correct mounts available at all times.
[root@Upstream ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.8G 38M 3.8G 1% /dev/shm
tmpfs 3.8G 427M 3.4G 12% /run
tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup
/dev/vda1 40G 8.8G 32G 22% /
tmpfs 773M 0 773M 0% /run/user/996
tmpfs 773M 0 773M 0% /run/user/0
10.10.2.20:/upstream 1008G 704G 254G 74% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=
10.10.2.20:/upstream2 483G 22G 462G 5% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0y
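The directory name below /var/triliovault-mounts is the Base64 encoding of the Backup Target share, so each mount can be matched to its NFS export directly from the shell:

[root@Upstream ~]# echo MTAuMTAuMi4yMDovdXBzdHJlYW0= | base64 -d
10.10.2.20:/upstream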
The dmapi service
The dmapi service has its own Keystone endpoints, which should be checked in addition to the actual service status.
[root@upstreamcontroller ~(keystone_admin)]# openstack endpoint list | grep dmapi
| 47918c8df8854ed49c082e398a9572be | RegionOne | dmapi | datamover | True | public | http://10.10.2.10:8784/v2 |
| cca52aff6b2a4f47bcc84b34647fba71 | RegionOne | dmapi | datamover | True | internal | http://10.10.2.10:8784/v2 |
| e9aa6630bfb74a9bb7562d4161f4e07d | RegionOne | dmapi | datamover | True | admin | http://10.10.2.10:8784/v2 |
[root@upstreamcontroller ~(keystone_admin)]# curl http://10.10.2.10:8784/v2
{"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}
[root@upstreamcontroller ~(keystone_admin)]# systemctl status tvault-datamover-api.service
● tvault-datamover-api.service - TrilioData DataMover API service
Loaded: loaded (/etc/systemd/system/tvault-datamover-api.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2020-04-12 12:31:11 EDT; 2 months 9 days ago
Main PID: 11252 (python)
Tasks: 2
CGroup: /system.slice/tvault-datamover-api.service
├─11252 /usr/bin/python /usr/bin/dmapi-api
└─11280 /usr/bin/python /usr/bin/dmapi-api
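To verify the endpoint beyond the unauthenticated 401, a Keystone token can be passed along with the request; a minimal sketch using the credentials already sourced in the shell (the TOKEN variable is only for illustration):

[root@upstreamcontroller ~(keystone_admin)]# TOKEN=$(openstack token issue -f value -c id)
[root@upstreamcontroller ~(keystone_admin)]# curl -H "X-Auth-Token: $TOKEN" http://10.10.2.10:8784/v2

Any response other than 401 indicates that the dmapi service accepted the token.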
The datamover service
The datamover service runs on every compute node and is integrated into OpenStack as a nova compute service; it appears as nova-contego in the compute service list.
[root@upstreamcontroller ~(keystone_admin)]# openstack compute service list
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
| 7 | nova-conductor | upstreamcontroller | internal | enabled | up | 2020-06-22T11:01:05.000000 |
| 8 | nova-scheduler | upstreamcontroller | internal | enabled | up | 2020-06-22T11:01:04.000000 |
| 9 | nova-consoleauth | upstreamcontroller | internal | enabled | up | 2020-06-22T11:01:01.000000 |
| 10 | nova-compute | upstreamcompute1 | US-East | enabled | up | 2020-06-22T11:01:09.000000 |
| 11 | nova-compute | upstreamcompute2 | US-West | enabled | up | 2020-06-22T11:01:09.000000 |
| 16 | nova-contego_3.0.172 | upstreamcompute2 | internal | enabled | up | 2020-06-22T11:01:07.000000 |
| 17 | nova-contego_3.0.172 | upstreamcompute1 | internal | enabled | up | 2020-06-22T11:01:02.000000 |
+----+----------------------+--------------------+----------+----------+-------+----------------------------+
[root@upstreamcompute1 ~]# systemctl status tvault-contego
● tvault-contego.service - Tvault contego
Loaded: loaded (/etc/systemd/system/tvault-contego.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-06-10 10:07:28 EDT; 1 weeks 4 days ago
Main PID: 10384 (python)
Tasks: 21
CGroup: /system.slice/tvault-contego.service
└─10384 /usr/bin/python /usr/bin/tvault-contego --config-file=/etc/nova/nova.conf --config-file=/etc/tvault-contego/tvault-contego.conf
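If nova-contego is reported as down in the compute service list, the unit journal on the affected compute node is the first place to look; a sketch (the time window is arbitrary):

[root@upstreamcompute1 ~]# journalctl -u tvault-contego --since "1 hour ago" --no-pager
[root@upstreamcompute1 ~]# systemctl restart tvault-contego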