Shutdown/Restart the Trilio cluster
To gracefully shut down or restart the Trilio cluster, the following steps are recommended.
Verify no snapshots or restores are running
It is recommended to verify that no snapshots or restores are running on the Trilio Cluster.
Stopping or restarting the Trilio cluster will cancel all actively running backup or restore jobs. These jobs will be marked as errored after the system has come up again.
This can be verified using the following two commands:
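The two checks can be sketched as follows. This is a minimal sketch, assuming the workloadmgr CLI is installed and OpenStack credentials are loaded in the shell. The `--all` option, the table layout, and the use of `available` and `error` as the only finished states are assumptions and should be checked against the CLI help. The function name `list_active_jobs` is hypothetical.

```shell
# Print snapshot and restore table rows that are neither "available" nor "error".
# Assumptions: workloadmgr prints a table whose rows start with "|", and
# finished jobs carry the status "available" or "error".
list_active_jobs() {
  workloadmgr snapshot-list --all True | grep -E '^\|' | grep -Eiv 'available|error'
  workloadmgr restore-list | grep -E '^\|' | grep -Eiv 'available|error'
  # grep exits non-zero when nothing matches; that is the desired "nothing running" case.
  return 0
}
```

Calling `list_active_jobs` prints the table header rows plus any job that is still in progress. If only header rows are printed, no snapshot or restore is running.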
Identify the master node for the VIP(s) and wlm-cron service
The Trilio cluster uses the Pacemaker service to set the VIP(s) of the cluster and to control the active node for the wlm-cron service. If the whole cluster is shut down, the identified node will be the last to shut down.
This can be checked using the following command:
In the following example, the master node is tvm1.
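As a sketch of this check, `pcs status` lists the Pacemaker resources together with the node each one is running on. The helper below extracts the node that runs wlm-cron. It assumes a resource whose name contains `wlm-cron` and a status line ending in `Started <node>`. Both the helper name `find_master_node` and the line format are assumptions and may need adjusting to the actual output.

```shell
# Print the node on which Pacemaker reports the wlm-cron resource as started.
# Assumption: the matching pcs status line ends with "Started <node>".
find_master_node() {
  pcs status | awk '/wlm-cron/ && /Started/ { print $NF; exit }'
}
```

Running `find_master_node` on any cluster node prints the name of the current master node, or nothing if the resource is not started.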
Shutdown/Restart of a single node in the cluster
A single node in the cluster can be shut down or restarted without issues. All services will come up again, and the RabbitMQ and Galera services will rejoin the remaining cluster.
When the master node is shut down or restarted, the VIP(s) and the wlm-cron service will switch to one of the remaining cluster nodes.
Stop the services on the node
To speed up the shutdown/restart process, it is recommended to stop the Trilio services, the RabbitMQ service, and the MariaDB service on the node.
The wlm-cron service and the VIP(s) do not need to be stopped when only the master node is rebooted or shut down. Pacemaker will automatically move the wlm-cron service and the VIP(s) to one of the remaining nodes.
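Stopping the services can be sketched as below. The unit names `wlm-api`, `wlm-scheduler`, `wlm-workloads`, and `rabbitmq-server` are assumptions, as is the MariaDB unit name, which may be `mariadb` or `mysqld` depending on packaging. The function name `stop_node_services` and the `MARIADB_UNIT` variable are hypothetical.

```shell
# Stop the Trilio, RabbitMQ and MariaDB services on the local node, in that order.
# Assumption: unit names as listed; override the MariaDB unit via MARIADB_UNIT.
stop_node_services() {
  for svc in wlm-api wlm-scheduler wlm-workloads rabbitmq-server "${MARIADB_UNIT:-mariadb}"; do
    # Report a failure but keep going so the remaining services are still stopped.
    systemctl stop "$svc" || echo "could not stop $svc" >&2
  done
}
```

The function is meant to be run as root on the node that is about to be restarted, for example `MARIADB_UNIT=mysqld stop_node_services`.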
Shutdown/Restart the node
After the services have been stopped, the node can be restarted or shut down using standard Linux commands.
Restarting the complete cluster node by node
Restarting the whole cluster node by node follows the same procedure as restarting a single node, with the difference that each restarted node needs to be fully started again before the next node can be restarted.
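One way to confirm that a restarted node is fully up before moving on to the next one is to poll its services. The following is a minimal sketch to be run on the restarted node. The helper name `wait_for_services`, the 10-second polling interval, and the unit names in the usage line are assumptions.

```shell
# Poll until every listed systemd unit is active.
# Usage: wait_for_services <max_polls> <unit> [<unit> ...]
# Returns 0 once all units are active, 1 if max_polls is exhausted.
wait_for_services() {
  tries="$1"
  shift
  while [ "$tries" -gt 0 ]; do
    pending=0
    for svc in "$@"; do
      systemctl is-active --quiet "$svc" || pending=1
    done
    [ "$pending" -eq 0 ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```

For example, `wait_for_services 30 wlm-api wlm-scheduler wlm-workloads rabbitmq-server mariadb` waits up to roughly five minutes. Active services alone do not prove that Galera and RabbitMQ have rejoined their clusters, so a cluster-level check is still advisable before restarting the next node.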
Shutdown/Restart the complete cluster as a whole
When the complete cluster needs to be stopped and restarted at the same time, the following procedure needs to be completed.
The procedure on a high level is:
Shutdown the two slave nodes
Shutdown the master node
Start the master node
Enable the Galera cluster
Start the two slave nodes
Shutdown the two slave nodes
Before shutting down the two slave nodes, it is recommended to stop the running Trilio services, the RabbitMQ server, and MariaDB on the nodes.
Afterward, the nodes can be shut down.
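Both steps can be combined per node, for example over SSH from an administration host. This is a sketch under assumptions: the function name `shutdown_nodes` is hypothetical, the unit names (including `mariadb`) are assumed, and the SSH user must be allowed to stop services and power off the node.

```shell
# Stop the services on each given node over SSH, then power the node off.
# Assumption: unit names as listed; adjust them to the actual installation.
shutdown_nodes() {
  for node in "$@"; do
    ssh "$node" 'systemctl stop wlm-api wlm-scheduler wlm-workloads rabbitmq-server mariadb; shutdown -h now' \
      || echo "ssh to $node returned non-zero (possible when the connection drops on power-off)" >&2
  done
}
```

With hypothetical node names, `shutdown_nodes tvm2 tvm3` shuts down the two slave nodes one after the other.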
Shutdown the master node
Before shutting down the master node, it is recommended to stop the running Trilio services, the RabbitMQ server, MariaDB, and the wlm-cron and VIP(s) resources in Pacemaker.
Afterward, the node can be shut down.
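The preparation on the master node can be sketched as follows. The systemd unit names and the function name `stop_master_node` are assumptions. The Pacemaker resource IDs are passed in as arguments because they differ between installations and can be looked up with `pcs status`.

```shell
# Stop the local services, then disable the given Pacemaker resources.
# Usage: stop_master_node <resource_id> [<resource_id> ...]
# Assumption: unit names as listed; resource IDs come from `pcs status`.
stop_master_node() {
  systemctl stop wlm-api wlm-scheduler wlm-workloads rabbitmq-server mariadb
  for res in "$@"; do
    pcs resource disable "$res"
  done
}
```

An invocation might look like `stop_master_node wlm-cron virtual_ip`, where `virtual_ip` is a hypothetical VIP resource ID. A resource disabled with `pcs resource disable` stays stopped until `pcs resource enable <resource_id>` is run, so the resources may need to be enabled again once the cluster is back up.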
Start the master node
The first server to be booted will become the master node. It is highly recommended to boot the old master node first.
Not booting the old master node first can lead to data loss when the Galera cluster is restarted.
Enable the Galera cluster
Log in to the freshly started master node and run the following command. This will restart the Galera cluster with this node as master.
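A hedged sketch of this step, assuming a MariaDB installation that ships the `galera_new_cluster` helper and keeps its Galera state in `/var/lib/mysql/grastate.dat`. The wrapper name `bootstrap_galera` and the safety check are additions for illustration. The check relies on Galera marking the node that was shut down last with `safe_to_bootstrap: 1`.

```shell
# Bootstrap a new Galera cluster from this node, but only if the saved
# Galera state marks the node as safe to bootstrap from.
# Usage: bootstrap_galera [path-to-grastate.dat]
bootstrap_galera() {
  grastate="${1:-/var/lib/mysql/grastate.dat}"
  if grep -q '^safe_to_bootstrap: 1' "$grastate"; then
    galera_new_cluster
  else
    echo "safe_to_bootstrap is not 1 in $grastate; not bootstrapping" >&2
    return 1
  fi
}
```

If the check fails on the old master node, the node was probably not the last one to stop MariaDB, and the state files on the other nodes should be compared before forcing a bootstrap.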
Start the slave nodes
After the master node has been booted and the Galera cluster has been started, the remaining nodes can be started and will automatically rejoin the Trilio cluster.
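Whether all nodes have rejoined can be checked through the Galera status variable `wsrep_cluster_size`. The sketch below assumes that the `mysql` client can connect without further options on the node where it runs. The function names are hypothetical, and the expected size of 3 corresponds to one master and two slave nodes.

```shell
# Print the number of nodes currently in the Galera cluster.
galera_cluster_size() {
  # -N suppresses column names, -B gives tab-separated batch output.
  mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{ print $2 }'
}

# Succeed only if the cluster has the expected number of nodes (default 3).
cluster_complete() {
  expected="${1:-3}"
  [ "$(galera_cluster_size)" = "$expected" ]
}
```

For example, `cluster_complete 3 && echo "all nodes joined"` reports success once both slave nodes are back. The RabbitMQ side can be inspected separately with `rabbitmqctl cluster_status`.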