Shutdown/Restart the Trilio cluster

To gracefully shut down or restart the Trilio cluster, the following steps are recommended.

Verify no snapshots or restores are running

It is recommended to verify that no snapshots or restores are running on the Trilio Cluster.

Stopping or restarting the Trilio cluster will cancel all actively running backup or restore jobs. These jobs will be marked as errored after the system comes up again.

This can be verified using the following two commands:

workloadmgr snapshot-list --all=True
workloadmgr restore-list
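As a quick sketch, the two lists can also be filtered for in-progress entries; the exact state names (for example "executing" or "uploading") may differ between T4O versions, so treat the filter below as an assumption to adjust:

# Hedged filter: any output means a job is still in progress and the
# shutdown/restart should be postponed. State names may vary per version.
workloadmgr snapshot-list --all=True | grep -iE 'executing|uploading'
workloadmgr restore-list | grep -iE 'executing|restoring'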

Identify the master node for the VIP(s) and wlm-cron service

The Trilio cluster uses the Pacemaker service to set the VIP(s) of the cluster and to control which node actively runs the wlm-cron service. When the whole cluster gets shut down, this node must be the last one to be shut down.

This can be checked using the following command:

pcs status

In the following example, tvm1 is the master node:

pcs status
Cluster name: triliovault

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: tvm3 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 26 12:10:32 2021
Last change: Thu Aug 26 08:02:51 2021 by root via crm_resource on tvm1

3 nodes configured
8 resource instances configured

Online: [ tvm1 tvm2 tvm3 ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started tvm1
 virtual_ip_public      (ocf::heartbeat:IPaddr2):       Started tvm1
 virtual_ip_admin       (ocf::heartbeat:IPaddr2):       Started tvm1
 virtual_ip_internal    (ocf::heartbeat:IPaddr2):       Started tvm1
 wlm-cron       (systemd:wlm-cron):     Started tvm1
 Clone Set: lb_nginx-clone [lb_nginx]
     Started: [ tvm1 ]
     Stopped: [ tvm2 tvm3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
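As a convenience, the master node can also be read directly from the same output; the node hosting the wlm-cron resource is the master node:

# The node listed as "Started" for the wlm-cron resource is the master node (tvm1 here).
pcs status | grep wlm-cron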

Shutdown/Restart of a single node in the cluster

A single node in the cluster can be shut down or restarted without issues. All services will come up again, and the RabbitMQ and Galera services will rejoin the remaining cluster.

When the master node gets shut down or restarted, the VIP(s) and the wlm-cron service will switch to one of the remaining cluster nodes.

Stop the services on the node

To speed up the shutdown/restart process, it is recommended to stop the Trilio services, the RabbitMQ service, and the MariaDB service on the node.

The wlm-cron service and the VIP(s) do not get stopped when only the master node gets rebooted or shut down; Pacemaker will automatically move the wlm-cron service and the VIP(s) to one of the remaining nodes.

systemctl stop wlm-api
systemctl stop wlm-scheduler
systemctl stop wlm-workloads
systemctl stop mysqld
rabbitmqctl stop
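Optionally, verify that the services are down before proceeding; this is a simple sanity check, assuming the services are managed as the systemd units listed above:

# Each unit should report "inactive" (or "failed") once stopped.
systemctl is-active wlm-api wlm-scheduler wlm-workloads mysqld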

Shutdown/Restart the node

After the services have been stopped, the node can be restarted or shut down using standard Linux commands.

reboot
shutdown

Restarting the complete cluster node by node

Restarting the whole cluster node by node follows the same procedure as restarting a single node, with the difference that each restarted node needs to be fully started again before the next node can be restarted.
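A hedged way to confirm that a restarted node has fully rejoined before moving on to the next one (the MariaDB command assumes local root access to the database is configured on the appliance):

pcs status                                          # node must be listed as Online again
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"   # should report the full cluster size (3)
rabbitmqctl cluster_status                          # all cluster nodes listed as running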

Shutdown/Restart the complete cluster as a whole

When the complete cluster needs to be stopped and restarted as a whole, the following procedure needs to be completed.

At a high level, the procedure is:

  • Shutdown the two slave nodes

  • Shutdown the master node

  • Start the master node

  • Enable the Galera cluster

  • Start the two slave nodes

Shutdown the two slave nodes

Before shutting down the two slave nodes, it is recommended to stop the running Trilio services, the RabbitMQ server, and the MariaDB service on the nodes.

systemctl stop wlm-api
systemctl stop wlm-scheduler
systemctl stop wlm-workloads
systemctl stop mysqld
rabbitmqctl stop

Afterward, the nodes can be shut down.

shutdown

Shutdown the master node

Before shutting down the master node, it is recommended to stop the running Trilio services, the RabbitMQ server, the MariaDB service, and the wlm-cron and VIP(s) resources in Pacemaker.

systemctl stop wlm-api
systemctl stop wlm-scheduler
systemctl stop wlm-workloads
systemctl stop mysqld
rabbitmqctl stop
pcs resource stop wlm-cron
pcs resource stop lb_nginx-clone
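Optionally, confirm that Pacemaker has stopped the resources before powering off:

# The wlm-cron resource and the lb_nginx clone set should now be reported as Stopped.
pcs status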

Afterward, the node can be shut down.

shutdown

Start the master node

The first server that gets booted will become the master node. It is highly recommended to boot the old master node first.

Not booting the old master node first can lead to data loss when the Galera cluster is restarted.
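If it is unclear which node held the most recent database state, the Galera state file can be inspected on each node before the cluster is bootstrapped in the next step; this assumes the default MariaDB data directory on the appliance:

# The node used to bootstrap Galera should hold the most recent state
# (highest seqno, or safe_to_bootstrap: 1 where supported).
cat /var/lib/mysql/grastate.dat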

Enable the Galera cluster

Log in to the freshly started master node and run the following command. This will restart the Galera cluster with this node as the master.

galera_new_cluster

Start the slave nodes

After the master node has been booted and the Galera cluster has been started, the remaining nodes can be started and will automatically rejoin the Trilio cluster.
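Once all nodes are back up, the overall cluster health can be verified with the same checks used for the node-by-node restart; the MariaDB command again assumes local root access to the database:

pcs status                                          # all nodes Online, all resources Started
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"   # should report 3
rabbitmqctl cluster_status                          # all cluster nodes listed as running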
