T4K S3 FUSE Plugin performance
This page describes how to measure the overhead of the T4K S3 FUSE plugin.
Measuring Trilio S3 FUSE Plugin overhead
All things being equal, there needs to be a way to measure the overhead of T4K's FUSE implementation. Since T4K is a software-only solution, we cannot measure the FUSE implementation's overhead in absolute numbers; the overhead depends on various factors, so we first need to establish a baseline.
In our experiment, we provisioned two identical servers, both running CentOS. We name these servers `compute1` and `compute2`. A dedicated 1 GbE network connects the two servers to minimize interference from other traffic. `compute1` is the client and `compute2` is the server. A server in our context runs an S3 endpoint and an NFS endpoint. MinIO, perhaps the easiest-to-use S3 implementation, can be launched with a single command, so we use `minio` as our object store.
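For reference, a standalone MinIO server can be started like this; the data directory matches the layout described below, while the console port is an illustrative choice rather than a value from the experiment:

```bash
# Start a standalone MinIO server, storing all objects under
# /mnt/miniodata (the console port here is an illustrative choice).
minio server /mnt/miniodata --console-address ":9001"
```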
We provision two directories on the same disk on `compute2`: `/mnt/nfs_share` is exported as the NFS share, and `/mnt/miniodata` is used by the `minio` service to store all objects. `compute1` mounts `/mnt/nfs_share` and also runs the S3 FUSE plugin, which provides the mount point `/mnt/miniomnt`. Our objective is to measure the S3 FUSE overhead relative to NFS and the AWS S3 API.
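A minimal sketch of this setup follows; the NFS export options are assumptions, not the exact options used in the experiment:

```bash
# On compute2: export the share to compute1, then reload exports.
echo "/mnt/nfs_share compute1(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -ra

# On compute1: mount the NFS share over the dedicated network.
mount -t nfs compute2:/mnt/nfs_share /mnt/nfs_share
```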
As part of the experiment, we create a 100 GB file. 100 GB is large enough to smooth out the fixed costs associated with the protocols, and since most backup images tend to be large, it is a representative sample for this experiment.
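One way to create such a file is shown below; the source device and path are our assumptions (random data avoids any compression in the stack skewing the results):

```bash
# Create a 100 GB test file of random data (the path is illustrative).
dd if=/dev/urandom of=/data/testfile bs=1M count=102400
```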
First, we copy the 100 GB file to the NFS share. The throughput we achieved is 2 GB/min.
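This can be measured, for example, by timing a plain copy (the paths are the hypothetical ones from the sketch above):

```bash
# Time the copy to the NFS mount; throughput is the file size
# divided by the elapsed time reported by `time`.
time cp /data/testfile /mnt/nfs_share/
```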
When measuring AWS S3 performance, we should avoid multipart upload. Multipart upload performs well for large files, but the T4K FUSE plugin chunks files itself and manages these chunks as objects, so a fair comparison uploads objects of a similar size. We therefore split the 100 GB file into 32 MB chunks using the Linux `split` command, copy the individual segments into a `segments` directory, and upload that directory with the `aws s3 cp` command as shown below. The throughput we achieved is 1.42 GB/min, roughly a 30% overhead compared to the NFS result.
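The commands below reconstruct this step; the bucket name, endpoint URL, and file paths are illustrative. Raising the CLI's multipart threshold above 32 MB ensures each segment is uploaded as a single PUT:

```bash
# Upload each 32 MB segment as a single PUT, not a multipart upload.
aws configure set default.s3.multipart_threshold 64MB

# Split the 100 GB file into 32 MB pieces, mirroring the chunk size
# the T4K FUSE plugin uses for its objects.
mkdir segments
split -b 32M /data/testfile segments/segment_

# Upload the whole segments directory and time it (bucket name and
# endpoint are illustrative).
time aws s3 cp segments/ s3://perf-test/segments/ \
    --recursive --endpoint-url http://compute2:9000
```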
Next, we copy the 100 GB file to the S3 FUSE mount. The throughput achieved is very similar to that of the `aws s3 cp` command, so we can confidently conclude that the S3 FUSE implementation adds little to no overhead compared to the pure AWS S3 API.
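This measurement is a plain timed copy through the FUSE mount point; the plugin chunks the file and performs the S3 calls internally:

```bash
# Copy the same file through the S3 FUSE mount and time it.
time cp /data/testfile /mnt/miniomnt/
```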
Trilio backup images are in QCOW2 format, and the process of creating a QCOW2 image differs slightly from copying a file to the S3 FUSE mount, so it is prudent to also measure `qemu-img convert` performance on the S3 FUSE mount. The throughput for generating a QCOW2 image is almost the same as that of copying a file to the S3 FUSE mount.
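A sketch of that measurement, assuming a raw source image (the source format and paths are our assumptions):

```bash
# Convert the source file to a QCOW2 image written directly onto the
# S3 FUSE mount; -f is the source format, -O the output format.
time qemu-img convert -f raw -O qcow2 /data/testfile /mnt/miniomnt/testfile.qcow2
```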
In conclusion, AWS S3 API performance varies from one object-store implementation to another; it depends on various factors, including the number of disks, the types of disks, raw processing power, the replication factor, consistency guarantees, and so on. The Trilio S3 FUSE implementation adds little to no overhead compared to raw AWS S3 API calls. As a result, Trilio performs backups at wire speed.