Restart TrilioVault Services
In complex environments it is sometimes necessary to restart a single service or the complete solution. Restarting the complete node where a service is running is rarely possible, and rarely the ideal solution.
This page describes the services run by TrilioVault and how to restart them.

TrilioVault Appliance Services

The TrilioVault Appliance is the controller of TrilioVault. Most services on the Appliance are running in a High Availability mode on a 3-node cluster.

wlm-api

The wlm-api service handles the API calls made against the TrilioVault Appliance. It is running in active-active mode on all nodes of the TrilioVault cluster.
To restart the wlm-api service run on each TrilioVault node:
systemctl restart wlm-api
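To verify that the service came back up cleanly, a standard systemd check can be used; the same check applies to the other wlm services as well:
# confirm the service is active and look at its most recent log messages
systemctl status wlm-api
journalctl -u wlm-api --since "10 minutes ago"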

wlm-scheduler

The wlm-scheduler service takes job requests and identifies which TrilioVault node should handle them. It is running in active-active mode on all nodes of the TrilioVault cluster.
To restart the wlm-scheduler service run on each TrilioVault node:
systemctl restart wlm-scheduler

wlm-workloads

The wlm-workloads service is the task worker of TrilioVault, executing all jobs given to the TrilioVault node. It is running in active-active mode on all nodes of the TrilioVault cluster.
To restart the wlm-workloads service run on each TrilioVault node:
systemctl restart wlm-workloads

wlm-cron

The wlm-cron service is responsible for starting scheduled backups according to the configuration of the Tenant Workloads. It is running in active-passive mode and is controlled by the pacemaker cluster.
To restart the wlm-cron service run on the TrilioVault node with the VIP assigned:
pcs resource restart wlm-cron
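If it is unclear which node currently holds the VIP, the pacemaker status output shows on which node the virtual IP and the wlm-cron resource are running; these are generic pacemaker and iproute2 checks, not TrilioVault-specific commands:
# list all pacemaker resources together with the node each one runs on
pcs status
# confirm on a given node that the virtual IP is configured locally
ip addr show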

VIP resources

The TrilioVault appliance is running 1 to 4 virtual IPs (VIPs) on the TrilioVault cluster. These are controlled by the pacemaker cluster and provided through NGINX.
To restart these resources, restart the pacemaker NGINX resource:
pcs resource restart lb_nginx-clone
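After the restart the clone resource should be reported as Started on all nodes; a quick way to confirm this is, for example:
# the lb_nginx clone should show as Started on every node of the cluster
pcs status | grep -A 3 lb_nginx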

RabbitMQ

The TrilioVault cluster is using RabbitMQ as messaging service. It is running in active-active mode on all nodes of the TrilioVault cluster.
RabbitMQ is a complex system in itself. This guide will only provide the basic commands to do a restart of a node and check the health of the cluster afterward. For complete documentation of how to restart RabbitMQ, please follow the official RabbitMQ documentation.
To restart a RabbitMQ node run on each TrilioVault node:
It is recommended to wait for the node to rejoin and sync with the cluster before restarting another RabbitMQ node.
[root@<node> ~]# rabbitmqctl stop
Stopping and halting node rabbit@<node> ...
[root@<node> ~]# rabbitmq-server -detached
Warning: PID file not written; -detached was passed.
[root@<node> ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@<node> ...
[...]
 {cluster_name,<<"rabbit@<node>">>},
[...]
When the complete cluster is stopped and restarted, it is important to keep the order of the nodes in mind, as sketched below. The last node to be stopped needs to be the first node to be started.
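As an illustration only, with hypothetical node names node1, node2 and node3, a full cluster stop and start would look like this; note that node3, stopped last, is started first:
# stop the cluster node by node; node3 is the last node to be stopped
rabbitmqctl stop          # on node1
rabbitmqctl stop          # on node2
rabbitmqctl stop          # on node3
# start the cluster in reverse order; node3 is the first node to be started
rabbitmq-server -detached # on node3
rabbitmq-server -detached # on node2
rabbitmq-server -detached # on node1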

Galera Cluster (MariaDB)

The Galera Cluster is managing the TrilioVault MariaDB database. It is running in active-active mode on all nodes of the TrilioVault cluster.
Galera Cluster is a complex system in itself. This guide will only provide the basic commands to do a restart of a node and check the health of the cluster afterward. For complete documentation of how to restart Galera clusters, please follow the official Galera documentation.
When restarting Galera two different use-cases need to be considered:
    Restarting a single node
    Restarting the whole cluster

Restarting a single node

A single node can be restarted without any issues. It will automatically rejoin the cluster and sync against the remaining nodes.
The following commands gracefully stop and restart the mysqld service.
After the restart the node will automatically start the syncing process. Do not restart node after node to achieve a complete cluster restart; follow the procedure for restarting the complete cluster instead.
systemctl stop mysqld
systemctl start mysqld
Check the cluster health after the restart.
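A quick way to confirm that the node has rejoined is to query the wsrep state directly from the shell; the full set of checks is described in the verification section below:
# "Synced" indicates that the node has finished syncing with the cluster
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"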

Restarting the complete cluster

Restarting a complete cluster requires some additional steps, as the Galera cluster is effectively destroyed once all nodes have been shut down. It needs to be rebuilt afterwards.
First gracefully shutdown the Galera cluster on all nodes:
systemctl stop mysqld
The second step is to identify the Galera node with the latest dataset. This can be achieved by reading the grastate.dat file on the TrilioVault nodes.
When this documentation is followed, the last mysqld service that was shut down will be the one with the latest dataset.
cat /var/lib/mysql/grastate.dat

# GALERA saved state
version: 2.1
uuid: 353e129f-11f2-11eb-b3f7-76f39b7b455d
seqno: 213576545367
safe_to_bootstrap: 1
The value to check is the seqno.
The node with the highest seqno is the node that contains the latest data. This node will also show safe_to_bootstrap: 1 to indicate that the Galera cluster can be rebuilt from this node.
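To compare the seqno values without logging in to every node by hand, a small loop over the cluster members can be used; the hostnames tvm1, tvm2 and tvm3 are placeholders for the actual TrilioVault node names:
# print the seqno recorded in grastate.dat on each node (hostnames are examples)
for node in tvm1 tvm2 tvm3; do
  echo -n "$node: "
  ssh $node "grep seqno /var/lib/mysql/grastate.dat"
done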
On the identified node, the new cluster is generated with the following command:
galera_new_cluster
Running galera_new_cluster on the wrong node will lead to data loss, as this command makes the node it is issued on the first node of the new cluster. All nodes that join afterwards will sync against the data of this first node.
After the command has been issued, the mysqld service is running on this node. Now the other nodes can be restarted one by one. The started nodes will automatically rejoin the cluster and sync against the first node. Once a synced status has been reached, each node is a primary node in the cluster.
systemctl start mysqld
Check the cluster health after all services are up again.

Verify Health of the Galera Cluster

Verify the cluster health by running the following commands inside the MariaDB shell on each TrilioVault node. The values returned by these statements have to be the same on every node.
MariaDB [(none)]> show status like 'wsrep_incoming_addresses';
+--------------------------+-------------------------------------------------+
| Variable_name            | Value                                           |
+--------------------------+-------------------------------------------------+
| wsrep_incoming_addresses | 10.10.2.13:3306,10.10.2.14:3306,10.10.2.12:3306 |
+--------------------------+-------------------------------------------------+
1 row in set (0.01 sec)

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> show status like 'wsrep_cluster_state_uuid';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_state_uuid | 353e129f-11f2-11eb-b3f7-76f39b7b455d |
+--------------------------+--------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> show status like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
1 row in set (0.01 sec)

Canonical workloadmgr container services

Canonical Openstack does not use the TrilioVault Appliance. In Canonical environments the TrilioVault controller unit is part of the JuJu deployment as the workloadmgr container.
To restart the services inside this container, issue the following commands.

Single Node deployment

juju ssh <workloadmgr unit name>/<unit-number>
systemctl restart wlm-api wlm-scheduler wlm-workloads wlm-cron
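For illustration, assuming the workloadmgr unit is called trilio-wlm and this is unit 0 (the actual unit name and number can be taken from the juju status output), the calls would look like this:
# unit name and number are examples; take the real values from "juju status"
juju ssh trilio-wlm/0
# depending on the login user, the restart may need to be prefixed with sudo
systemctl restart wlm-api wlm-scheduler wlm-workloads wlm-cron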

HA deployment

On all nodes:
juju ssh <workloadmgr unit name>/<unit-number>
systemctl restart wlm-api wlm-scheduler wlm-workloads
On a single node:
juju ssh <workloadmgr unit name>/<unit-number>
crm_resource --restart -r res_trilio_wlm_wlm_cron

TrilioVault dmapi service

The TrilioVault dmapi service is running on the Openstack controller nodes. Depending on the Openstack distribution that TrilioVault is installed on, different commands are issued to restart the dmapi service.

RHOSP13

RHOSP13 is running the TrilioVault services as docker containers. The dmapi service can be restarted by issuing the following command on the host running the dmapi service.
docker restart trilio_dmapi
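To confirm the container came back up, the standard docker tooling can be used, for example:
# check that the dmapi container is running again and inspect its latest logs
docker ps | grep trilio_dmapi
docker logs --tail 20 trilio_dmapi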

RHOSP16

RHOSP16 is running the TrilioVault services as podman containers. The dmapi service can be restarted by issuing the following command on the host running the dmapi service.
podman restart trilio_dmapi

Canonical

Canonical is running the TrilioVault services in JuJu controlled LXD containers. The dmapi service can be restarted by issuing the following command from the MAAS node.
juju ssh <trilio-dm-api unit name>/<unit-number>
sudo systemctl restart tvault-datamover-api

Kolla-Ansible Openstack

Kolla-Ansible Openstack is running the TrilioVault services as docker containers. The dmapi service can be restarted by issuing the following command on the host running the dmapi service.
docker restart triliovault_datamover_api

Ansible Openstack

Ansible Openstack is running the TrilioVault services as LXC containers. The dmapi service can be restarted by issuing the following commands on the host running the dmapi service.
lxc-stop -n <dmapi container name>
lxc-start -n <dmapi container name>
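The state of the container can be verified with the standard LXC tooling; the container name is the same placeholder as above:
# verify the dmapi container reports the RUNNING state again
lxc-info -n <dmapi container name>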

TrilioVault datamover service (tvault-contego)

The TrilioVault datamover service is running on the Openstack compute nodes. Depending on the Openstack distribution that TrilioVault is installed on, different commands are issued to restart the datamover service.

RHOSP13

RHOSP13 is running the TrilioVault services as docker containers. The datamover service can be restarted by issuing the following command on the compute node.
docker restart trilio_datamover

RHOSP16

RHOSP16 is running the TrilioVault services as podman containers. The datamover service can be restarted by issuing the following command on the compute node.
podman restart trilio_datamover

Canonical

Canonical is running the TrilioVault services in JuJu controlled LXD containers. The datamover service can be restarted by issuing the following command from the MAAS node.
juju ssh <trilio-data-mover unit name>/<unit-number>
sudo systemctl restart tvault-contego

Kolla-Ansible Openstack

Kolla-Ansible Openstack is running the TrilioVault services as docker containers. The datamover service can be restarted by issuing the following command on the compute node.
docker restart triliovault_datamover

Ansible Openstack

Ansible Openstack is running the TrilioVault datamover service directly on the compute node. The datamover service can be restarted by issuing the following command on the compute node.
service tvault-contego restart
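A status check confirms that the restart was successful, for example:
# confirm the datamover service is running again on the compute node
service tvault-contego status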