Healthcheck of TrilioVault
TrilioVault is composed of multiple services, each of which can be checked individually in case of errors.

On the TrilioVault Cluster

wlm-workloads

This service runs and is active on every TrilioVault node.
[[email protected] ~]# systemctl status wlm-workloads
● wlm-workloads.service - workloadmanager workloads service
   Loaded: loaded (/etc/systemd/system/wlm-workloads.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 13:42:42 UTC; 1 weeks 4 days ago
 Main PID: 12779 (workloadmgr-wor)
    Tasks: 17
   CGroup: /system.slice/wlm-workloads.service
           ├─12779 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12982 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12983 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
           ├─12984 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
[...]
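To get a quick overview of all wlm services covered in this section, the individual checks can be combined into a small loop. This is only a convenience sketch using systemctl is-active; the detailed status output remains the authoritative check, and wlm-cron is expected to be inactive on every node except the one currently holding it.

for svc in wlm-workloads wlm-api wlm-scheduler wlm-cron; do
    # Print each unit name together with its current activation state
    printf '%-15s %s\n' "$svc" "$(systemctl is-active "$svc")"
done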

wlm-api

This service runs and is active on every TrilioVault node.
[[email protected] ~]# systemctl status wlm-api
● wlm-api.service - Cluster Controlled wlm-api
   Loaded: loaded (/etc/systemd/system/wlm-api.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-api.service.d
           └─50-pacemaker.conf
   Active: active (running) since Thu 2020-04-16 22:30:11 UTC; 2 months 5 days ago
 Main PID: 11815 (workloadmgr-api)
    Tasks: 1
   CGroup: /system.slice/wlm-api.service
           └─11815 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-api --config-file=/etc/workloadmgr/workloadmgr.conf
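In addition to the unit status, it can help to confirm that the API process is actually listening for requests. The following is a minimal sketch; the port 8780 used in the filter is only an assumed default and may differ in your deployment.

# List listening TCP sockets and filter for the workloadmgr API
# (8780 is assumed to be the default wlm-api port)
ss -tlnp | grep -E 'workloadmgr|8780'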

wlm-scheduler

This service runs and is active on every TrilioVault node.
[[email protected] ~]# systemctl status wlm-scheduler
● wlm-scheduler.service - Cluster Controlled wlm-scheduler
   Loaded: loaded (/etc/systemd/system/wlm-scheduler.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-scheduler.service.d
           └─50-pacemaker.conf
   Active: active (running) since Thu 2020-04-02 13:49:22 UTC; 2 months 20 days ago
 Main PID: 29439 (workloadmgr-sch)
    Tasks: 1
   CGroup: /system.slice/wlm-scheduler.service
           └─29439 /home/stack/myansible/bin/python /home/stack/myansible/bin/workloadmgr-scheduler --config-file=/etc/workloadmgr/workloadmgr.conf

wlm-cron

This service is controlled by Pacemaker and runs only on the master node.
[[email protected] ~]# systemctl status wlm-cron
● wlm-cron.service - Cluster Controlled wlm-cron
   Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-cron.service.d
           └─50-pacemaker.conf
   Active: active (running) since Wed 2021-01-27 19:59:26 UTC; 6 days ago
 Main PID: 23071 (workloadmgr-cro)
   CGroup: /system.slice/wlm-cron.service
           ├─23071 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
           └─23248 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf

Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: ● wlm-cron.service - Cluster Controlled wlm-cron
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: Drop-In: /run/systemd/system/wlm-cron.service.d
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: └─50-pacemaker.conf
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: Active: active (running) since Wed 2021-01-27 19:59:26 UTC; 6 days ago
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: Main PID: 23071 (workloadmgr-cro)
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: CGroup: /system.slice/wlm-cron.service
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: ├─23071 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: ├─23248 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
Feb 03 19:28:43 tvm1-ansible-ussuri-ubuntu18-vagrant workloadmgr-cron[23071]: └─27145 /usr/bin/systemctl status wlm-cron
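Since wlm-cron is only started on one node at a time, checking it on any other node will show the service as inactive. To quickly see which node currently runs it, the Pacemaker status can be filtered for the wlm-cron resource, as shown in the full pcs status output of the next section.

# Show only the wlm-cron resource line of the cluster status
pcs status | grep wlm-cron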

Pacemaker Cluster Status

The Pacemaker cluster controls and monitors the VIP of the TrilioVault Cluster. It also controls on which node the wlm-api and wlm-scheduler services run.
[[email protected] ~]# pcs status
Cluster name: triliovault

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: tvm1-ansible-ussuri-ubuntu18-vagrant (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Feb 3 19:20:02 2021
Last change: Wed Jan 27 20:00:12 2021 by root via crm_resource on tvm1-ansible-ussuri-ubuntu18-vagrant

1 node configured
6 resource instances configured

Online: [ tvm1-ansible-ussuri-ubuntu18-vagrant ]

Full list of resources:

 virtual_ip           (ocf::heartbeat:IPaddr2):  Started tvm1-ansible-ussuri-ubuntu18-vagrant
 virtual_ip_public    (ocf::heartbeat:IPaddr2):  Started tvm1-ansible-ussuri-ubuntu18-vagrant
 virtual_ip_admin     (ocf::heartbeat:IPaddr2):  Started tvm1-ansible-ussuri-ubuntu18-vagrant
 virtual_ip_internal  (ocf::heartbeat:IPaddr2):  Started tvm1-ansible-ussuri-ubuntu18-vagrant
 wlm-cron             (systemd:wlm-cron):        Started tvm1-ansible-ussuri-ubuntu18-vagrant
 Clone Set: lb_nginx-clone [lb_nginx]
     Started: [ tvm1-ansible-ussuri-ubuntu18-vagrant ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
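In a healthy cluster every node is listed as Online and every resource is in the Started state. The following sketch uses standard pcs subcommands to narrow the output down to the resources and to highlight anything that is stopped or failed.

# Compact view of the configured resources only
pcs status resources

# Highlight resources that are stopped or have failed actions
pcs status | grep -iE 'stopped|failed'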

Mount availability

The TrilioVault Cluster needs access to the Backup Target and should have the correct mounts available at all times. They can be verified with df -h:
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               3.8G     0  3.8G   0% /dev
tmpfs                  3.8G   38M  3.8G   1% /dev/shm
tmpfs                  3.8G  427M  3.4G  12% /run
tmpfs                  3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/vda1               40G  8.8G   32G  22% /
tmpfs                  773M     0  773M   0% /run/user/996
tmpfs                  773M     0  773M   0% /run/user/0
10.10.2.20:/upstream  1008G  704G  254G  74% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0=
10.10.2.20:/upstream2  483G   22G  462G   5% /var/triliovault-mounts/MTAuMTAuMi4yMDovdXBzdHJlYW0y
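The directory name below /var/triliovault-mounts is the Base64 encoding of the Backup Target share, as can be verified from the output above. Decoding it shows which share a mount point belongs to:

# Decode the mount directory name back into the NFS share path
echo MTAuMTAuMi4yMDovdXBzdHJlYW0= | base64 -d
10.10.2.20:/upstream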

The dmapi service

The dmapi service has its own Keystone endpoints, which should be checked in addition to the actual service status.
[[email protected] ~(keystone_admin)]# openstack endpoint list | grep dmapi
| 47918c8df8854ed49c082e398a9572be | RegionOne | dmapi | datamover | True | public   | http://10.10.2.10:8784/v2 |
| cca52aff6b2a4f47bcc84b34647fba71 | RegionOne | dmapi | datamover | True | internal | http://10.10.2.10:8784/v2 |
| e9aa6630bfb74a9bb7562d4161f4e07d | RegionOne | dmapi | datamover | True | admin    | http://10.10.2.10:8784/v2 |

[[email protected] ~(keystone_admin)]# curl http://10.10.2.10:8784/v2
{"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}
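The 401 Unauthorized response is expected for an unauthenticated request and confirms that the endpoint is reachable and answering. To probe all registered dmapi endpoints in one go, the URLs can be pulled from Keystone and checked with curl; the following is only a convenience sketch using standard openstack and curl options.

# Print the HTTP status code returned by each registered dmapi endpoint
for url in $(openstack endpoint list --service dmapi -f value -c URL); do
    printf '%s %s\n' "$(curl -s -o /dev/null -w '%{http_code}' "$url")" "$url"
done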
[[email protected] ~(keystone_admin)]# systemctl status tvault-datamover-api.service
● tvault-datamover-api.service - TrilioData DataMover API service
   Loaded: loaded (/etc/systemd/system/tvault-datamover-api.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-12 12:31:11 EDT; 2 months 9 days ago
 Main PID: 11252 (python)
    Tasks: 2
   CGroup: /system.slice/tvault-datamover-api.service
           ├─11252 /usr/bin/python /usr/bin/dmapi-api
           └─11280 /usr/bin/python /usr/bin/dmapi-api
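If the unit is active but requests still fail, the recent service logs usually give the quickest pointer. A minimal example using the systemd journal (adjust the time window as needed):

# Show errors logged by the dmapi service within the last hour
journalctl -u tvault-datamover-api.service --since "1 hour ago" | grep -iE 'error|traceback'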

The datamover service

The datamover service runs on every compute node and is integrated as a nova compute service.
[[email protected] ~(keystone_admin)]# openstack compute service list
+----+----------------------+--------------------+----------+---------+-------+----------------------------+
| ID | Binary               | Host               | Zone     | Status  | State | Updated At                 |
+----+----------------------+--------------------+----------+---------+-------+----------------------------+
|  7 | nova-conductor       | upstreamcontroller | internal | enabled | up    | 2020-06-22T11:01:05.000000 |
|  8 | nova-scheduler       | upstreamcontroller | internal | enabled | up    | 2020-06-22T11:01:04.000000 |
|  9 | nova-consoleauth     | upstreamcontroller | internal | enabled | up    | 2020-06-22T11:01:01.000000 |
| 10 | nova-compute         | upstreamcompute1   | US-East  | enabled | up    | 2020-06-22T11:01:09.000000 |
| 11 | nova-compute         | upstreamcompute2   | US-West  | enabled | up    | 2020-06-22T11:01:09.000000 |
| 16 | nova-contego_3.0.172 | upstreamcompute2   | internal | enabled | up    | 2020-06-22T11:01:07.000000 |
| 17 | nova-contego_3.0.172 | upstreamcompute1   | internal | enabled | up    | 2020-06-22T11:01:02.000000 |
+----+----------------------+--------------------+----------+---------+-------+----------------------------+
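The relevant entries are the nova-contego binaries, one per compute node; each of them should be enabled and up. They can be filtered out directly:

# Show only the datamover (contego) entries of the compute service list
openstack compute service list | grep contego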
[[email protected] ~]# systemctl status tvault-contego
● tvault-contego.service - Tvault contego
   Loaded: loaded (/etc/systemd/system/tvault-contego.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 10:07:28 EDT; 1 weeks 4 days ago
 Main PID: 10384 (python)
    Tasks: 21
   CGroup: /system.slice/tvault-contego.service
           └─10384 /usr/bin/python /usr/bin/tvault-contego --config-file=/etc/nova/nova.conf --config-file=/etc/tvault-contego/tvault-contego.conf
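The datamover also needs access to the Backup Target from each compute node. Assuming the same /var/triliovault-mounts layout as on the TrilioVault Cluster, the mount can be spot-checked with df as well:

# Verify that the Backup Target is mounted on the compute node
df -h | grep triliovault-mounts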