TrilioVault 4.1 HF1 Release Notes

Changelog

Enhancements

Support for passwordless SMTP servers

TrilioVault for Openstack supports sending notification emails when backup or restore jobs succeed or fail. Previously, the SMTP server configuration required a password for the SMTP user.

A password is no longer required when the SMTP server does not require one.
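The behavior can be illustrated with Python's standard `smtplib`. The sketch below is a hypothetical notification helper, not TrilioVault's implementation: it authenticates only when a user and password are configured and otherwise sends without calling `login()`. The function name, its parameters, and the `smtp_factory` hook (which allows a test double to be swapped in) are assumptions.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, Optional, Sequence


def send_notification(
    host: str,
    port: int,
    sender: str,
    recipients: Sequence[str],
    subject: str,
    body: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
    smtp_factory: Callable[..., smtplib.SMTP] = smtplib.SMTP,
) -> None:
    """Hypothetical helper: send a job notification e-mail.

    Authentication is attempted only when both a user and a password are set,
    so a passwordless SMTP server is contacted without an AUTH step.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)

    with smtp_factory(host, port) as smtp:
        if user and password:
            smtp.login(user, password)
        smtp.send_message(msg)
```

Leaving `password` unset is what makes the helper usable against a relay that accepts unauthenticated mail.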

Support for Multi-Attach Volumes

Cinder supports a multi-attach Volume Type, which allows the same Volume to be attached to multiple instances simultaneously. Backups and restores of this Volume Type previously failed. TrilioVault for Openstack now provides basic support for this Volume Type.

Cinder Boot Volumes with Multi-Attach activated are not yet supported.

This support only covers the backup and restore of Multi-Attach Volumes. For now, TrilioVault handles such a Volume like any single-attached Volume. For example, a Multi-Attach Volume connected to 2 VMs is backed up and restored twice.
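A small model makes the current behavior concrete. The sketch assumes that a backup iterates over VMs and copies every Volume attached to each one; because a Multi-Attach Volume appears in the attachment list of every VM it is connected to, it is copied once per VM. The data layout and function name are hypothetical and only illustrate the counting.

```python
from collections import Counter
from typing import Dict, List


def plan_volume_backups(attachments: Dict[str, List[str]]) -> Counter:
    """Count how often each Volume is copied when every VM is processed independently.

    `attachments` maps a (hypothetical) VM name to the IDs of its attached Volumes.
    No de-duplication is done, mirroring the per-VM handling described above.
    """
    copies: Counter = Counter()
    for volumes in attachments.values():
        for volume_id in volumes:
            copies[volume_id] += 1
    return copies


plan = plan_volume_backups({
    "vm-a": ["vol-root-a", "vol-shared"],
    "vm-b": ["vol-root-b", "vol-shared"],
})
print(plan["vol-shared"])  # 2
```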

Increased the timeout for data transfer

TrilioVault for Openstack tracks the progress of backups and restores using a tracking file. If this file is not updated within a defined timeframe, the TrilioVault data transfer fails. This timeframe has been extended from 10 minutes to 20 minutes.

This value will become configurable in TVO 4.1 SP1.
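A staleness check of this kind can be sketched with the file's modification time. The sketch assumes that updating the tracking file refreshes its mtime; the function name is hypothetical, and only the 20-minute threshold comes from these notes.

```python
import os
import time
from typing import Optional

# Timeframe from these release notes: 20 minutes (previously 10 minutes).
PROGRESS_TIMEOUT_SECONDS = 20 * 60


def transfer_is_stalled(tracking_file: str, now: Optional[float] = None,
                        timeout: float = PROGRESS_TIMEOUT_SECONDS) -> bool:
    """Return True if the (hypothetical) tracking file was not updated within `timeout` seconds."""
    current = time.time() if now is None else now
    last_update = os.path.getmtime(tracking_file)
    return current - last_update > timeout
```

Passing `now` explicitly keeps the check deterministic, which is convenient for testing.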

Misleading and expected errors are no longer shown in logs by default

TrilioVault for Openstack logged many system error messages for conditions that were expected and handled internally without affecting the functionality of the solution.

These messages were misleading during normal troubleshooting. They are no longer logged by default and can be re-enabled by activating debug logging.

Image upload timeout window has been increased and made configurable

The upload of TrilioVault backups is time-limited to prevent a stuck upload process from stalling workloads.

The default timeout window has been increased from 10 hours to 48 hours and is configurable in the workloadmgr config file.

[DEFAULT]
max_wait_for_upload = 48

Restart the wlm-workloads service after setting this value.
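A minimal sketch of reading this setting with Python's `configparser` and turning it into an upload deadline. It assumes the value is expressed in hours (as the 48h default suggests) and falls back to 48 when the option is missing; the helper name is hypothetical and this is not workloadmgr's actual code.

```python
import configparser
import time
from typing import Optional

DEFAULT_MAX_WAIT_HOURS = 48  # default stated in these release notes


def upload_deadline(config_text: str, started_at: Optional[float] = None) -> float:
    """Return the epoch time after which a stuck upload would be abandoned (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Assumption: max_wait_for_upload is given in hours.
    hours = parser.getint("DEFAULT", "max_wait_for_upload",
                          fallback=DEFAULT_MAX_WAIT_HOURS)
    start = time.time() if started_at is None else started_at
    return start + hours * 3600
```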

Grace time to retrigger a Snapshot when the Global Job Scheduler is deactivated

When the Global Job Scheduler is deactivated, no backups are triggered. The Global Job Scheduler applies a grace time to missed Snapshots: all Snapshots that were scheduled within this grace time before the Global Job Scheduler is activated again are retriggered.

This grace period is now configurable in the workloadmgr config file.

[global_job_scheduler]
misfire_grace_time = 600

After setting this value, restart the wlm-cron service.
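The retrigger rule can be sketched as a filter over missed run times. The sketch assumes `misfire_grace_time` is measured in seconds (APScheduler exposes an option of the same name in seconds; whether workloadmgr relies on it is an assumption here) and that a missed run is retriggered if it was scheduled no more than that many seconds before the scheduler is activated again. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def snapshots_to_retrigger(missed_runs: Iterable[datetime], reactivated_at: datetime,
                           misfire_grace_time: int = 600) -> List[datetime]:
    """Return the missed run times that fall inside the grace window before reactivation.

    Assumes `misfire_grace_time` is expressed in seconds.
    """
    window_start = reactivated_at - timedelta(seconds=misfire_grace_time)
    return [run for run in missed_runs if window_start <= run <= reactivated_at]
```

With the default of 600, a run missed 5 minutes before reactivation is retriggered, while one missed 30 minutes earlier is skipped.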

Increased the number of dmapi workers and made it configurable

In heavily used environments, the dmapi workers were identified as a potential bottleneck. The default number of workers used by the dmapi service has been increased to 16 and is configurable in the dmapi config file.

[DEFAULT]
dmapi_workers = 16

Restart the dmapi service after setting the configuration manually.

The RHOSP and Kolla Ansible upgrade processes set this value automatically.

Added new settings in haproxy configuration for dmapi

In heavily used environments, response times can be slow enough that dmapi connections through haproxy time out. The haproxy default values are not always suitable in that case.

The haproxy configuration for the dmapi service has been extended with the following settings:

retries 5
timeout http-request 10m
timeout queue 10m
timeout connect 10m
timeout client 10m
timeout server 10m
timeout check 10m
balance roundrobin
maxconn 50000

Restart the haproxy service after setting the configuration manually.

The RHOSP and Kolla Ansible upgrade processes set these values automatically.

Added retries for Neutron API calls

In some environments, the Openstack Neutron service is so heavily loaded that API calls time out. TrilioVault backups and restores failed whenever a Neutron API call timed out.

TrilioVault now retries failed Neutron API calls three times before failing the backup or restore.
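A generic retry wrapper illustrates the idea. This sketch assumes that "three times" means up to three additional attempts after the first call, treats `TimeoutError` as the only retriable error, and waits a fixed delay between attempts; TrilioVault's actual exception types and back-off are not documented here, and the function name is hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], retries: int = 3, delay: float = 1.0,
                      retriable: Tuple[Type[BaseException], ...] = (TimeoutError,)) -> T:
    """Invoke `call`, retrying up to `retries` more times on a retriable error.

    Once all attempts are used up, the last error is re-raised; this is the point
    at which a backup or restore would fail.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except retriable:
            if attempt == retries:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable: retries must be >= 0")
```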

Added retries and rescans for temporary Volumes on Cinder storage with multipathing activated

In multipath environments, backups were observed to fail occasionally due to errors with the temporary Cinder Volumes during the following actions:

  • Create Cinder Volume out of Cinder Snapshot

  • Mount Cinder Volume to Compute Node

  • Unmount Cinder Volume from Compute Node

  • Delete Cinder Volume

During these operations in multipath environments, errors are now handled by rescanning the connected devices and retrying the internal commands.

The number of retries is configurable in the tvault-contego config file.

[cinder]
http_retries = 10

Restart the tvault-contego service after manually setting the value.

The RHOSP and Kolla Ansible upgrade processes set this value automatically.
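The retry-with-rescan pattern can be sketched as follows. The `operation` and `rescan` callables are hypothetical stand-ins for a Volume action and a rescan of the connected devices; `OSError` as the failure type is an assumption, and the default of 10 only mirrors `http_retries = 10`, since how TrilioVault maps that option onto these operations is not described here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_rescan(operation: Callable[[], T], rescan: Callable[[], None],
                    retries: int = 10) -> T:
    """Run `operation`; on failure, rescan the connected devices and try again.

    Up to `retries` additional attempts are made; the last error is re-raised.
    """
    last_error: Exception = RuntimeError("operation was never attempted")
    for attempt in range(retries + 1):
        try:
            return operation()
        except OSError as err:  # assumed error type for device/attach failures
            last_error = err
            if attempt < retries:
                rescan()
    raise last_error
```

Rescanning before each retry, rather than only retrying, gives the multipath layer a chance to pick up device paths that were not yet visible on the previous attempt.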

Enhanced logging of TrilioVault for Openstack GUI

The TrilioVault for Openstack GUI uses the admin account to secure access to the features and functionality located on the TrilioVault appliance. The TrilioVault appliance now logs the following events:

  • Login attempts

  • Logout events

  • Password changes for the admin user

Added text banner to TrilioVault for Openstack GUI login page

The TrilioVault for Openstack login page can now display a text banner. The banner is configured on the TrilioVault appliance by editing the banner YAML file located at:

/etc/tvault-config/banner.yaml

The content of the file looks as follows:

header:
header_color: blue
body_text_color: "#DC143C"
body_text:
header_font_size: 25px
body_text_font_size: 22px

Restart the tvault-config service after changing the banner to activate it.

Fixed Bugs and issues

Local job scheduler stays disabled after workload creation with enabled job scheduler

Fixed an issue in which, on rare occasions, the local job scheduler of a workload was shown as disabled even though the workload had been created with the job scheduler enabled.

Global Job Scheduler is shown as active even when wlm-cron service is down

Fixed an issue in which the Global Job Scheduler status was reported as enabled or disabled even when the wlm-cron service was deactivated. In this scenario, the status is now an error message showing the wlm-cron service status.

Outdated documentation link inside the TrilioVault Appliance

The documentation link inside the TrilioVault Appliance still pointed to the old, outdated documentation webpage. The link has been updated to point to the correct documentation.

Restoring VMs into a different Availability Zone failed

Fixed an issue that prevented VMs from being restored into a different Availability Zone when the original Availability Zone was no longer available.

Stopping workloadmgr service and all ongoing worker tasks

Fixed an issue that left stale service jobs behind when the workloadmgr services were restarted.

Mounting of LVM configured disks into the FRM failed

Fixed an issue that prevented Volumes partitioned and configured with LVM from being mounted and accessed correctly in the FRM.

Upload of Backups failed intermittently using S3

Fixed a race condition between upload threads in the S3 fuse plugin that caused backups to fail during the upload phase.

Multipathing not enabled by default in the Data-Mover container used in RHOSP and Kolla Ansible Openstack

Fixed an issue that left multipathing disabled in the Data-Mover container used by RHOSP and Kolla Ansible.

Upgrading to 4.1 HF1 will automatically activate multipathing where feasible.

An inaccessible S3 mount point was created when the S3 endpoint was not available during deployment or configuration

Fixed an issue that caused a TrilioVault mount point to be created even when the provided S3 backup target was not reachable during deployment or configuration.

The deployment still succeeds, but the tvault-object-store service is in a failed state.

Trustee role not correctly inherited from user groups to users

Fixed an issue that prevented the TrilioVault trustee role from being detected for users who inherited the role from a user group.
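Conceptually, the fix means checking a user's effective roles: direct role assignments plus the roles inherited through group membership. The sketch below uses plain dictionaries in place of Keystone role assignments; the role name `trustee` and the data layout are hypothetical. Keystone's role assignments API can also return effective assignments directly.

```python
from typing import Dict, Iterable, Set


def effective_roles(user: str,
                    user_roles: Dict[str, Set[str]],
                    group_roles: Dict[str, Set[str]],
                    memberships: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of the user's direct roles and the roles of every group the user belongs to."""
    roles = set(user_roles.get(user, set()))
    for group in memberships.get(user, ()):
        roles |= group_roles.get(group, set())
    return roles


def has_trustee_role(user: str,
                     user_roles: Dict[str, Set[str]],
                     group_roles: Dict[str, Set[str]],
                     memberships: Dict[str, Iterable[str]],
                     trustee_role: str = "trustee") -> bool:
    """Check the (hypothetical) trustee role against effective, not only direct, roles."""
    return trustee_role in effective_roles(user, user_roles, group_roles, memberships)
```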

The configurator always tried to use the Keystone internal endpoint

Fixed an issue in which the chosen endpoint type was not honored for Keystone and the configurator always contacted the Keystone internal endpoint.

The following value can be set in the api-paste ini file located at:

/etc/workloadmgr/api-paste.ini

[filter:authtoken]
interface = internal

Afterward, restart the wlm-workloads service.

Reconfiguring the appliance is recommended to activate the fix.
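How an endpoint type is honored can be illustrated with a Keystone v3 service catalog, in which each endpoint carries an `interface` of `public`, `internal`, or `admin`. The sketch picks the URL that matches the configured interface instead of always using `internal`. The helper name and the catalog contents are illustrative, not taken from the configurator.

```python
from typing import Any, Dict, List, Optional


def select_endpoint(catalog: List[Dict[str, Any]], service_type: str,
                    interface: str = "internal", region: Optional[str] = None) -> str:
    """Return the URL of `service_type` for the requested interface (public/internal/admin)."""
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") != interface:
                continue
            if region is not None and endpoint.get("region") != region:
                continue
            return endpoint["url"]
    raise LookupError(f"no {interface} endpoint for service type {service_type!r}")
```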

Disk integrity check failing with a false positive

An issue was identified that causes the disk integrity check to fail even though no data has been lost.

As an interim measure, Snapshots no longer fail when the disk integrity check fails; instead, a warning about the failed check is written to the log files.

A complete fix of the root cause is planned for 4.1 SP1.
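The interim behavior, warning instead of failing, can be sketched with a checksum comparison. The sketch assumes the integrity check compares SHA-256 digests of the source and copied disk data; the check TrilioVault actually performs, the function names, and the logger name are assumptions.

```python
import hashlib
import logging

LOG = logging.getLogger("integrity-check-sketch")  # hypothetical logger name


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_disk_integrity(source: str, copy: str) -> bool:
    """Compare two disk files; on mismatch, log a warning instead of failing the Snapshot."""
    matches = sha256_of(source) == sha256_of(copy)
    if not matches:
        LOG.warning("Disk integrity check failed for %s; Snapshot is kept", copy)
    return matches
```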

Workload creation and Backup creation failed due to Latin characters like á

An issue was identified in which Latin characters such as á caused Workload creation or backups to fail.

This hotfix implements support for Latin characters in the following:

  • Calendar shown and used during workload creation for the job scheduler

  • Name and description of security groups

Full support for Latin characters is planned for 4.1 SP1.
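The general class of failure can be reproduced in a few lines of Python. The sketch only illustrates why names containing characters such as á must be handled as UTF-8 end to end; it does not show where the original bug was located in TrilioVault, and `encode_name` is a hypothetical helper.

```python
def encode_name(name: str, encoding: str = "utf-8") -> bytes:
    """Encode a workload or security group name for transport or storage."""
    return name.encode(encoding)


name = "Catálogo"
utf8 = encode_name(name)            # á is encoded as two bytes in UTF-8
assert utf8.decode("utf-8") == name  # the name round-trips unchanged

try:
    encode_name(name, "ascii")      # ASCII cannot represent á
except UnicodeEncodeError as err:
    print("ascii failed:", err.reason)  # prints "ascii failed: ordinal not in range(128)"
```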

Backups for multipath environments using the FC protocol failed

Fixed an issue that prevented successful backups in environments using multipathing with the FC (Fibre Channel) storage protocol.