By Glenn Wright
In recognition of the growing importance of cloud computing to KX technology users, we have evaluated a number of cloud storage solutions with kdb+. Future papers in this series will look at migration to other cloud vendors.
If you are currently migrating, or considering migrating, an historical kdb+ database (HDB) to the Cloud, you will want to read this white paper on the kdb+ developers’ site, which looks at popular storage solutions available within the Amazon Web Services (AWS) Cloud. The paper also compares the performance of kdb+ on EC2 instances with its performance on physical hardware.
Three key areas which should be considered when migrating a kdb+ HDB and analytics workloads to the Amazon Elastic Compute Cloud (EC2) are:
1) Performance and functionality attributes expected from using kdb+, and the associated HDB, in EC2.
2) The capabilities of available storage solutions working in the EC2 environment.
3) Performance attributes of EC2, and benchmark results.
This paper shows that kdb+ can be successfully migrated to AWS EC2. It will also be of interest to those who are looking at using kdb+ on demand, or who are starting out on kdb+ for the first time and are choosing to host it in the Cloud.
Here are some observations from running benchmark tests on a number of storage solutions available in AWS EC2:
1) Elastic Block Store (EBS)
EBS can be used to store HDB data, and is fully compliant with kdb+. It supports all of the POSIX semantics required.
2) EFS (NFS)
EFS (NFS) can be used to store HDB data, and is fully compliant with kdb+. Consider constraining any use of EFS to temporary storage rather than runtime data access.
3) Amazon Storage Gateway (File mode)
Amazon Storage Gateway (File mode) can be used to store HDB data, and is fully compliant with kdb+. The gateway exhibits high operational latency and may not be suitable for runtime data access.
4) MapR-FS
MapR includes MapR-FS, which can be used to store HDB data, and is fully compliant with kdb+.
The throughput of MapR-FS is highly scalable. The operational latency of this solution is significantly lower than that seen with EFS and Storage Gateway, but is higher than that of the other solutions evaluated.
5) Goofys
Open source Goofys can be used to store HDB data, and is fully compliant with kdb+. Operational latency is high. Metadata latency figures are in the order of 100-200× higher than EBS.
6) S3FS
Open source S3FS can be used to store HDB data, and is fully compliant with kdb+. Operational latency is high. Metadata latency figures are in the order of 100-200× higher than EBS.
7) S3QL
S3QL is written in Python and does not pass our compliance tests.
8) ObjectiveFS
ObjectiveFS can be used to store HDB data, and is fully compliant with kdb+. It is a simple and elegant solution for the retention of old data on a slower, lower-cost S3 archive, which can be replicated by AWS, geographically or within availability zones.
9) WekaIO Matrix
WekaIO Matrix can be used to store HDB data, and is fully compliant with kdb+. Metadata operational latency, whilst noticeably worse than EBS, is one or two orders of magnitude better than EFS, Storage Gateway, and all of the open-source products. The other elements that distinguish this solution from the others are block-like low operational latencies for some metadata functions, and good aggregate throughput for small random reads with kdb+.
10) Quobyte
Quobyte can be used to store HDB data, and is fully compliant with kdb+. Quobyte offers a shared namespace solution based on either locally-provisioned or EBS-style storage.
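Several of the observations above turn on metadata latency. As a rough illustration of what such a figure measures, the following Python sketch times file create, stat, and unlink operations in a directory and reports the median for each. It is not the benchmark used in the white paper; the function name `time_metadata_ops`, the probe file names, and the choice of operations are assumptions made for this example. Pointing it at a directory on each mounted file system would give a crude basis for comparison, although client-side caching can flatter the stat numbers.

```python
import os
import statistics
import tempfile
import time


def time_metadata_ops(directory, n_files=200):
    """Return the median latency, in microseconds, of create, stat and
    unlink calls on empty files in `directory`.

    Hypothetical helper for illustration only; not the paper's benchmark.
    """
    timings = {"create": [], "stat": [], "unlink": []}
    paths = [os.path.join(directory, f"probe_{i}.tmp") for i in range(n_files)]

    # Create: open and close an empty file (no data is written).
    for path in paths:
        start = time.perf_counter()
        with open(path, "wb"):
            pass
        timings["create"].append(time.perf_counter() - start)

    # Stat: fetch file attributes. Results may be served from a client cache.
    for path in paths:
        start = time.perf_counter()
        os.stat(path)
        timings["stat"].append(time.perf_counter() - start)

    # Unlink: remove each probe file, leaving the directory as it was found.
    for path in paths:
        start = time.perf_counter()
        os.unlink(path)
        timings["unlink"].append(time.perf_counter() - start)

    return {op: statistics.median(vals) * 1e6 for op, vals in timings.items()}


if __name__ == "__main__":
    # A local temporary directory stands in for the file system under test;
    # substitute a directory on the mount you want to examine.
    with tempfile.TemporaryDirectory() as scratch:
        for op, micros in time_metadata_ops(scratch).items():
            print(f"{op}: {micros:.1f} us (median)")
```

Medians are used rather than means so that a handful of slow outliers, common on network-backed storage, do not dominate the result.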