Healthcheck of TrilioVault
TrilioVault is composed of multiple services, each of which can be checked individually when troubleshooting errors.

Verify the TrilioVault Appliance

Verify the services are up

TrilioVault runs four main services on the TrilioVault Appliance:
  • wlm-api
  • wlm-scheduler
  • wlm-workloads
  • wlm-cron
systemctl | grep wlm
wlm-api.service          loaded active running   workloadmanager api service
wlm-cron.service         loaded active running   Cluster Controlled wlm-cron
wlm-scheduler.service    loaded active running   Cluster Controlled wlm-scheduler
wlm-workloads.service    loaded active running   workloadmanager workloads service
Each of these services can be verified to be up and running using the systemctl status command.
systemctl status wlm-api
######
● wlm-api.service - workloadmanager api service
   Loaded: loaded (/etc/systemd/system/wlm-api.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-11-02 19:41:19 UTC; 2 months 21 days ago
 Main PID: 4688 (workloadmgr-api)
   CGroup: /system.slice/wlm-api.service
           ├─4688 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-api --config-file=/etc/workloadmgr/workloadmgr.conf
systemctl status wlm-scheduler
######
● wlm-scheduler.service - Cluster Controlled wlm-scheduler
   Loaded: loaded (/etc/systemd/system/wlm-scheduler.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-scheduler.service.d
           └─50-pacemaker.conf
   Active: active (running) since Sat 2022-01-22 13:49:28 UTC; 1 day 23h ago
 Main PID: 9342 (workloadmgr-sch)
   CGroup: /system.slice/wlm-scheduler.service
           └─9342 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-scheduler --config-file=/etc/workloadmgr/workloadmgr.conf
systemctl status wlm-workloads
######
● wlm-workloads.service - workloadmanager workloads service
   Loaded: loaded (/etc/systemd/system/wlm-workloads.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-11-02 19:51:05 UTC; 2 months 21 days ago
 Main PID: 606 (workloadmgr-wor)
   CGroup: /system.slice/wlm-workloads.service
           ├─606 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-workloads --config-file=/etc/workloadmgr/workloadmgr.conf
systemctl status wlm-cron
######
● wlm-cron.service - Cluster Controlled wlm-cron
   Loaded: loaded (/etc/systemd/system/wlm-cron.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/wlm-cron.service.d
           └─50-pacemaker.conf
   Active: active (running) since Sat 2022-01-22 13:49:28 UTC; 1 day 23h ago
 Main PID: 9209 (workloadmgr-cro)
   CGroup: /system.slice/wlm-cron.service
           ├─9209 /home/stack/myansible/bin/python3 /home/stack/myansible/bin/workloadmgr-cron --config-file=/etc/workloadmgr/workloadmgr.conf
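To spot a failed service at a glance, all four units can be checked in one pass. A minimal sketch using systemctl is-active, with the service names taken from the list above:

for svc in wlm-api wlm-scheduler wlm-workloads wlm-cron; do
    # prints e.g. "wlm-api: active" or "wlm-cron: inactive"
    echo "$svc: $(systemctl is-active $svc)"
done

Note that wlm-cron and wlm-scheduler are cluster-controlled and run on only one node at a time, so they may legitimately report inactive on the other appliance nodes.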

Check the TrilioVault pacemaker and nginx cluster

The second component of the TrilioVault Appliance's health to check is the nginx and pacemaker cluster.
pcs status
######
Cluster name: triliovault

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: TVM1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jan 24 13:42:01 2022
Last change: Tue Nov  2 19:07:04 2021 by root via crm_resource on TVM2

3 nodes configured
9 resources configured

Online: [ TVM1 TVM2 TVM3 ]

Full list of resources:

 virtual_ip           (ocf::heartbeat:IPaddr2):   Started TVM2
 virtual_ip_public    (ocf::heartbeat:IPaddr2):   Started TVM2
 virtual_ip_admin     (ocf::heartbeat:IPaddr2):   Started TVM2
 virtual_ip_internal  (ocf::heartbeat:IPaddr2):   Started TVM2
 wlm-cron             (systemd:wlm-cron):         Started TVM2
 wlm-scheduler        (systemd:wlm-scheduler):    Started TVM2
 Clone Set: lb_nginx-clone [lb_nginx]
     Started: [ TVM2 ]
     Stopped: [ TVM1 TVM3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
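If a resource shows as Stopped on the node expected to hold it, the resource-level view helps narrow things down. A hedged sketch (pcs syntax differs slightly between versions):

pcs status resources     # per-resource state, including the lb_nginx clone
pcs resource cleanup     # clear failed-action history so failed resources can recover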

Verify API connectivity of the TrilioVault Appliance

Checking the availability of the TrilioVault API on the chosen endpoints is recommended.
The following example curl command lists the available workload-types and verifies that the connection is available and working:
curl http://10.10.2.34:8780/v1/8e16700ae3614da4ba80a4e57d60cdb9/workload_types/detail -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-workloadmgrclient" -H "Accept: application/json" -H "X-Auth-Token: gAAAAABe40NVFEtJeePpk1F9QGGh1LiGnHJVLlgZx9t0HRrK9rC5vqKZJRkpAcW1oPH6Q9K9peuHiQrBHEs1-g75Na4xOEESR0LmQJUZP6n37fLfDL_D-hlnjHJZ68iNisIP1fkm9FGSyoyt6IqjO9E7_YVRCTCqNLJ67ZkqHuJh1CXwShvjvjw"
Please check the API guide for more commands and how to generate the X-Auth-Token.
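As a quick sketch of token generation (assuming a Keystone v3 endpoint at http://<keystone-ip>:5000 and admin credentials; substitute the values for your cloud), the token is returned in the X-Subject-Token response header:

# request a project-scoped token and extract it from the response headers
curl -si http://<keystone-ip>:5000/v3/auth/tokens -H "Content-Type: application/json" -d '{"auth": {"identity": {"methods": ["password"], "password": {"user": {"name": "admin", "domain": {"id": "default"}, "password": "<password>"}}}, "scope": {"project": {"name": "admin", "domain": {"id": "default"}}}}' | grep -i X-Subject-Token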

Verify TrilioVault components on OpenStack

On OpenStack Ansible

Datamover-API service (dmapi)

The dmapi service has its own Keystone endpoints, which should be checked in addition to the actual service status.
root@controller:~# openstack endpoint list | grep dmapi
| 190db2ce033e44f89de73abcbf12804e | US-WEST-2 | dmapi | datamover | True | public   | https://osa-victoria-ubuntu20-2.triliodata.demo:8784/v2 |
| dec1a323791b49f0ac7901a2dc806ee2 | US-WEST-2 | dmapi | datamover | True | admin    | http://10.10.10.154:8784/v2 |
| f8c4162c9c1246ffb0190d0d093c48af | US-WEST-2 | dmapi | datamover | True | internal | http://10.10.10.154:8784/v2 |

root@controller:~# curl http://10.10.10.154:8784
{"versions": [{"id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": "", "updated": "2011-01-21T11:33:21Z", "links": [{"rel": "self", "href": "h
To check the dmapi service, attach to the dmapi container residing on the controller nodes and run the command below:
lxc-attach -n <dmapi-container-name>     # attach to the dmapi container

root@controller-dmapi-container-08df1e06:~# systemctl status tvault-datamover-api.service
● tvault-datamover-api.service - TrilioData DataMover API service
     Loaded: loaded (/lib/systemd/system/tvault-datamover-api.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-01-12 11:53:39 UTC; 1 day 17h ago
   Main PID: 23888 (dmapi-api)
      Tasks: 289 (limit: 57729)
     Memory: 607.7M
     CGroup: /system.slice/tvault-datamover-api.service
             ├─23888 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23893 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23894 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23895 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23896 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23897 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23898 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23899 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23900 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23901 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23902 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23903 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23904 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23905 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23906 /usr/bin/python3 /usr/bin/dmapi-api
             ├─23907 /usr/bin/python3 /usr/bin/dmapi-api
             └─23908 /usr/bin/python3 /usr/bin/dmapi-api

Jan 12 11:53:39 controller-dmapi-container-08df1e06 systemd[1]: Started TrilioData DataMover API service.
Jan 12 11:53:40 controller-dmapi-container-08df1e06 dmapi-api[23888]: Could not load
Jan 12 11:53:40 controller-dmapi-container-08df1e06 dmapi-api[23888]: Could not load
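If the service is inactive or restarting, its journal (assuming journald logging inside the container) usually shows the reason:

journalctl -u tvault-datamover-api.service --since "1 hour ago"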

Datamover service (tvault-contego)

The datamover service runs on each compute node. Log in to a compute node and run the command below:
root@compute:~# systemctl status tvault-contego
● tvault-contego.service - Tvault contego
     Loaded: loaded (/etc/systemd/system/tvault-contego.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-01-14 05:45:19 UTC; 2s ago
   Main PID: 1489651 (python3)
      Tasks: 19 (limit: 67404)
     Memory: 6.7G (max: 10.0G)
     CGroup: /system.slice/tvault-contego.service
             ├─ 998543 /bin/qemu-nbd -c /dev/nbd45 --object secret,id=sec0,data=payload-1234 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,file>
             ├─ 998772 /bin/qemu-nbd -c /dev/nbd73 --object secret,id=sec0,data=payload-1234 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,file>
             ├─ 998931 /bin/qemu-nbd -c /dev/nbd100 --object secret,id=sec0,data=payload-1234 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,fil>
             ├─ 999147 /bin/qemu-nbd -c /dev/nbd35 --object secret,id=sec0,data=payload-1234 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,file>
             ├─1371322 /bin/qemu-nbd -c /dev/nbd63 --object secret,id=sec0,data=payload-test1 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,fil>
             ├─1371524 /bin/qemu-nbd -c /dev/nbd91 --object secret,id=sec0,data=payload-test1 --image-opts driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0,fil>
             └─1489651 /openstack/venvs/nova-22.3.1/bin/python3 /usr/bin/tvault-contego --config-file=/etc/nova/nova.conf --config-file=/etc/tvault-contego/tvault-cont>

Jan 14 05:45:19 compute systemd[1]: Started Tvault contego.
Jan 14 05:45:20 compute sudo[1489653]: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/openstack/venvs/nova-22.3.1/bin/nova-rootwrap /etc/nova/rootwrap.conf umou>
Jan 14 05:45:20 compute sudo[1489653]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 14 05:45:21 compute python3[1489655]: umount: /var/triliovault-mounts/VHJpbGlvVmF1bHQ=: no mount point specified.
Jan 14 05:45:21 compute sudo[1489653]: pam_unix(sudo:session): session closed for user root
Jan 14 05:45:21 compute tvault-contego[1489651]: 2022-01-14 05:45:21.499 1489651 INFO __main__ [req-48c32a39-38d0-45b9-9852-931e989133c6 - - - - -] CPU Control group m>
Jan 14 05:45:21 compute tvault-contego[1489651]: 2022-01-14 05:45:21.499 1489651 INFO __main__ [req-48c32a39-38d0-45b9-9852-931e989133c6 - - - - -] I/O Control Group m>
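The journal excerpt above references the backup target under /var/triliovault-mounts. When backups fail on a single compute node, it is worth confirming the backup target is actually mounted there (a sketch, assuming an NFS backup target):

mount | grep triliovault-mounts    # the backup target share should be listed here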

On Kolla Ansible OpenStack

Datamover-API service (dmapi)

The dmapi service has its own Keystone endpoints, which should be checked in addition to the actual service status.
root@controller:~# openstack endpoint list | grep dmapi
| 190db2ce033e44f89de73abcbf12804e | US-WEST-2 | dmapi | datamover | True | public   | https://osa-victoria-ubuntu20-2.triliodata.demo:8784/v2 |
| dec1a323791b49f0ac7901a2dc806ee2 | US-WEST-2 | dmapi | datamover | True | admin    | http://10.10.10.154:8784/v2 |
| f8c4162c9c1246ffb0190d0d093c48af | US-WEST-2 | dmapi | datamover | True | internal | http://10.10.10.154:8784/v2 |

root@controller:~# curl http://10.10.10.154:8784
{"versions": [{"id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": "", "updated": "2011-01-21T11:33:21Z", "links": [{"rel": "self", "href": "h
Run the following command on the "nova-api" nodes and make sure the "triliovault_datamover_api" container is in a started state.
[root@controller ~]# docker ps | grep triliovault_datamover_api
3f979c15cedc   trilio/centos-binary-trilio-datamover-api:4.2.50-victoria   "dumb-init --single-…"   3 days ago   Up 3 days   triliovault_datamover_api
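If the container is restarting instead, its logs typically point at the cause:

docker logs --tail 50 triliovault_datamover_api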

Datamover service (tvault-contego)

Run the following command on "nova-compute" nodes and make sure the container is in a started state.
[root@compute ~]# docker ps | grep triliovault_datamover
2f1ece820a59   trilio/centos-binary-trilio-datamover:4.2.50-victoria   "dumb-init --single-…"   3 days ago   Up 3 days   triliovault_datamover

Trilio Horizon integration

Run the following command on horizon nodes and make sure the container is in a started state.
[root@horizon ~]# docker ps | grep horizon
4a004c786d47   trilio/centos-binary-trilio-horizon-plugin:4.2.50-victoria   "dumb-init --single-…"   3 days ago   Up 3 days (unhealthy)   horizon
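A status of (unhealthy), as in the sample output above, comes from the container's built-in healthcheck; the current result can be read directly:

docker inspect --format '{{.State.Health.Status}}' horizon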

On Canonical OpenStack

Run the following command on the MAAS node and make sure all Trilio units, such as trilio-data-mover, trilio-dm-api, trilio-horizon-plugin, and trilio-wlm, are in active state.
root@maas:~# juju status | grep trilio
trilio-data-mover          4.2.51   active   3   trilio-data-mover       jujucharms   9   ubuntu
trilio-dm-api              4.2.51   active   1   trilio-dm-api           jujucharms   7   ubuntu
trilio-horizon-plugin      4.2.51   active   1   trilio-horizon-plugin   jujucharms   6   ubuntu
trilio-wlm                 4.2.51   active   1   trilio-wlm              jujucharms   9   ubuntu
trilio-data-mover/8        active   idle             172.17.1.5                       Unit is ready
trilio-data-mover/6        active   idle             172.17.1.6                       Unit is ready
trilio-data-mover/7*       active   idle             172.17.1.7                       Unit is ready
trilio-horizon-plugin/2*   active   idle             172.17.1.16                      Unit is ready
trilio-dm-api/2*           active   idle   1/lxd/4   172.17.1.27    8784/tcp          Unit is ready
trilio-wlm/2*              active   idle   7         172.17.1.28    8780/tcp          Unit is ready
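For more detail on a single unit than the grep above shows, juju can report it directly (a sketch; application and unit numbers differ per deployment):

juju status trilio-wlm            # full status of one application
juju show-status-log trilio-wlm/2 # recent status history of one unit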

On Red Hat OpenStack and TripleO

On controller node

Make sure the Trilio dmapi and horizon containers (shown below) are in a running state and that no other Trilio container is deployed on the controller nodes. If the containers are in a restarting state or are not listed by the following command, the deployment was not done correctly. Please note that the 'Horizon' container is replaced with the Trilio Horizon container, which carries the latest OpenStack Horizon plus TrilioVault's Horizon plugin.
On rhosp13:                      docker ps | grep trilio-
On rhosp16 onwards and TripleO:  podman ps | grep trilio-

[root@controller heat-admin]# podman ps | grep trilio-
e3530d6f7bec  ucqa161.ctlplane.trilio.local:8787/trilio/trilio-datamover-api:4.2.47-rhosp16.1   kolla_start   2 weeks ago   Up 2 weeks ago   trilio_dmapi
f93f7019f934  ucqa161.ctlplane.trilio.local:8787/trilio/trilio-horizon-plugin:4.2.47-rhosp16.1  kolla_start   2 weeks ago   Up 2 weeks ago   horizon
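Should trilio_dmapi be restarting, its container logs are the first place to look (podman from rhosp16 onwards, docker on rhosp13):

podman logs --tail 50 trilio_dmapi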

On compute node

Make sure the Trilio datamover container (shown below) is in a running state and that no other Trilio container is deployed on the compute nodes. If the container is in a restarting state or is not listed by the following command, the deployment was not done correctly.
On rhosp13:                      docker ps | grep trilio-
On rhosp16 onwards and TripleO:  podman ps | grep trilio-

[root@compute heat-admin]# podman ps | grep trilio-
4419b02e075c  undercloud162.ctlplane.trilio.local:8787/trilio/trilio-datamover:dev-osp16.2-1-rhosp16.2  kolla_start   2 days ago   Up 27 seconds ago   trilio_datamover

On overcloud

Please check the dmapi endpoints registered for the overcloud.
(overcloudtrain1) [stack@undercloud ~]$ openstack endpoint list | grep datamover
| 218b2f92569a4d259839fa3ea4d6103a | regionOne | dmapi | datamover | True | internal | https://overcloudtrain1internalapi.trilio.local:8784/v2 |
| 4702c51aa5c24bed853e736499e194e2 | regionOne | dmapi | datamover | True | public   | https://overcloudtrain1.trilio.local:13784/v2 |
| c8169025eb1e4954ab98c7abdb0f53f6 | regionOne | dmapi | datamover | True | admin    | https://overcloudtrain1internalapi.trilio.local:8784/v2 |
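As on the other distributions, a plain request against the dmapi endpoint confirms the service answers. A sketch using the internal endpoint listed above (-k skips TLS verification and is only appropriate in a test environment):

curl -k https://overcloudtrain1internalapi.trilio.local:8784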