TrilioVault for Openstack can send notification emails when backup or restore jobs succeed or fail. Previously, the SMTP server configuration required a password for the SMTP user.
A password is no longer necessary when the SMTP server does not require one.
Cinder supports a Volume Type that allows attaching the same Volume to multiple instances simultaneously. Previously, backups and restores of Volumes of this type failed. TrilioVault for Openstack now provides basic support for this Volume Type.
TrilioVault for Openstack tracks the progress of backups and restores using a tracking file. If this file is not updated within a defined timeframe, the TrilioVault data transfer fails. This timeframe has been extended from 10 minutes to 20 minutes.
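The tracking-file check described above can be pictured with a minimal sketch. It assumes that "updated" means the file's modification time changed; the function name and the way TrilioVault actually performs the check are not taken from the product and are hypothetical.

```python
import os
import time

# Timeout taken from the release note: 20 minutes without an update.
TRACKING_TIMEOUT_SECONDS = 20 * 60


def is_transfer_stalled(tracking_file, timeout=TRACKING_TIMEOUT_SECONDS, now=None):
    """Return True if the tracking file was not modified within `timeout` seconds.

    Assumption: an "update" is detected through the file's modification time.
    """
    if now is None:
        now = time.time()
    last_update = os.path.getmtime(tracking_file)
    return (now - last_update) > timeout
```

With the previous 10-minute window, a transfer that paused for 15 minutes would have been treated as stalled; with the 20-minute window it is not.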
TrilioVault for Openstack logged many system error messages that were expected and handled internally, without any impact on the functions of the solution.
These error messages were misleading during normal troubleshooting and are no longer logged by default. They can be reactivated by enabling the debug mode for logging.
Image upload timeout window has been increased and made configurable
The upload of TrilioVault backups is limited in time to prevent a stuck upload process from stalling workloads.
This timeout window has been increased from 10 hours to 48 hours by default and is configurable in the workloadmgr config file.
[DEFAULT]
max_wait_for_upload = 48
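As an illustration of how this option behaves, the following sketch reads the value with Python's standard `configparser` and falls back to 48 hours when the option is absent. It is only a stand-in: the helper name is hypothetical, the sample text mirrors the fragment above, and the workloadmgr service itself may load its configuration differently.

```python
import configparser

# Sample content mirroring the fragment above; a real file contains more options.
SAMPLE_CONFIG = """
[DEFAULT]
max_wait_for_upload = 48
"""


def upload_timeout_hours(config_text, default=48):
    """Return max_wait_for_upload (in hours) from [DEFAULT], or `default` if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("DEFAULT", "max_wait_for_upload", fallback=default)
```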
When the Global Job Scheduler is deactivated, no backups are triggered. The Global Job Scheduler applies a grace period to missed Snapshots: all Snapshots that were scheduled within this grace period before the Global Job Scheduler was activated are retriggered.
This grace period is now configurable in the workloadmgr config file.
[global_job_scheduler]
misfire_grace_time = 600
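The grace-period rule can be sketched as follows. This is an illustration only: it assumes the value 600 is given in seconds (the release note does not state the unit), and the function name is hypothetical rather than part of workloadmgr.

```python
from datetime import datetime, timedelta

# Assumption: misfire_grace_time = 600 is interpreted as 600 seconds.
MISFIRE_GRACE_TIME = timedelta(seconds=600)


def snapshots_to_retrigger(missed_run_times, activated_at, grace=MISFIRE_GRACE_TIME):
    """Return the missed run times that lie within the grace period before activation."""
    return [
        run_time
        for run_time in missed_run_times
        if timedelta(0) <= activated_at - run_time <= grace
    ]
```

For a scheduler activated at 12:00, a Snapshot missed at 11:55 would be retriggered under this reading, while one missed at 11:45 would not.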
In heavily used environments, the dmapi workers were identified as a potential bottleneck. The default number of workers used by a dmapi service has been increased to 16 and is configurable in the dmapi config file.
[DEFAULT]
dmapi_workers = 16
Response times in heavily used environments might be slow, causing the haproxy connection to the dmapi service to time out. The default values of haproxy are not always suitable in that case.
The haproxy configuration for the dmapi service now uses the following values.
retries 5
timeout http-request 10m
timeout queue 10m
timeout connect 10m
timeout client 10m
timeout server 10m
timeout check 10m
balance roundrobin
maxconn 50000
In some environments, the Openstack Neutron service is under such heavy load that API calls time out. Previously, TrilioVault backups and restores failed when any Neutron API call timed out.
TrilioVault now retries Neutron API calls three times before failing a backup or restore.
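The retry behavior can be pictured with a small sketch. It reads "retry three times" as one initial call plus up to three retries, which is an interpretation; `ApiTimeout` and `call_with_retries` are hypothetical names and not part of TrilioVault or the Neutron client.

```python
import time


class ApiTimeout(Exception):
    """Hypothetical stand-in for a timeout raised by a Neutron API call."""


def call_with_retries(api_call, retries=3, delay_seconds=0.0):
    """Run `api_call`, retrying up to `retries` times after a timeout."""
    attempts = retries + 1  # the initial call plus the retries
    for attempt in range(1, attempts + 1):
        try:
            return api_call()
        except ApiTimeout:
            if attempt == attempts:
                raise  # retries exhausted: the backup or restore fails
            time.sleep(delay_seconds)
```

Only timeouts are retried in this sketch; any other exception propagates immediately.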
It was observed in multipathing environments that sometimes backups failed due to errors with the temporary Cinder Volumes during the following actions:
Create Cinder Volume out of Cinder Snapshot
Mount Cinder Volume to Compute Node
Unmount Cinder Volume from Compute Node
Delete Cinder Volume
During these operations in multipath environments, errors are now handled by rescanning the connected devices and retrying the internal commands.
The number of retries is configurable in the tvault-contego config file.
[cinder]
http_retries = 10
The TrilioVault for Openstack GUI uses the admin account to secure access to the features and functionalities located on the TrilioVault appliance. The following events are now logged by the TrilioVault appliance:
Password changes for the admin user
Added text banner to TrilioVault for Openstack GUI login page
The TrilioVault for Openstack login page can now be extended with a text banner. This text banner is configurable on the TrilioVault appliance by editing the banner yaml file located under:
The content of the file looks as follows:
header:
header_color: blue
body_text_color: "#DC143C"
body_text:
header_font_size: 25px
body_text_font_size: 22px
An issue was fixed in which, on rare occasions, the local job scheduler of a single workload was shown as disabled although the workload had been created with an enabled job scheduler.
An issue was fixed in which the Global Job Scheduler returned enabled or disabled even when the wlm-cron service was deactivated. The status returned in this scenario is now an error message showing the wlm-cron service status.
The documentation link available inside the TrilioVault Appliance still pointed to the outdated documentation webpage. The link has been updated to point to the correct documentation.
An issue was fixed that prevented the restoration of VMs into a different Availability Zone when the original Availability Zone was no longer available.
An issue was fixed that led to stale service jobs being left behind after a restart of the workloadmgr services.
An issue was fixed that prevented Volumes partitioned and configured by LVM from being mounted and accessed correctly.
An issue was fixed that caused a race condition between upload threads in the S3 fuse plugin, which made backups fail during the upload phase.
An issue was fixed that led to multipathing not being enabled in the Data-Mover container used by RHOSP and Kolla Ansible.
An issue was fixed that led to the creation of a TrilioVault mount point even when the provided S3 backup target was not reachable during deployment or configuration.
The deployment will still succeed, but the tvault-object-store service will be in a failed state.
An issue was fixed that prevented the detection of the TrilioVault trustee role for a user who had inherited this role from a user group.
An issue was fixed in which the chosen endpoint type was not honored for Keystone and the configurator always contacted the Keystone internal endpoint.
The following value can be set in the api-paste ini file located under:
[filter:authtoken]
interface = internal
Afterward, the wlm-workloads service needs to be restarted.
An issue was identified that causes the disk integrity check to fail although no data has been lost.
For now, Snapshots with a failed disk integrity check no longer fail; instead, a warning about the failed disk integrity check is written to the log files.
An issue was identified in which Latin characters such as á caused a Workload not to be created or a backup to fail.
This hotfix implements support for Latin characters in the following:
Calendar shown and used during workload creation for the job scheduler
Name and description of security groups
An issue was fixed that prevented successful backups in environments using multipathing with the FC storage protocol.