By Ferenc Bodon
DBOps is a public benchmark that compares the performance of a number of open-source database tools and technologies. It defines a set of reproducible tests that measure how quickly each one executes the typical queries a data scientist would run. The technologies it compares include:
- Python solutions: Dask, Modin, Pandas, Pydata and Polars (Arrow)
- R-based solutions: Datatable, Dplyr, DuckDB and H2O
- Other solutions: ClickHouse, JuliaDF and Spark
KX Insights, an integrated data management and streaming analytics platform for real-time decision-making, is underpinned by a powerful time series database, whose performance we evaluated against the DBOps benchmark. The tests covered mathematical and statistical calculations, group-by operations, joins, and advanced queries involving complex calculations such as “top two”, regression and median. In all cases the data was held entirely in memory, and the tests were run over a range of data sets to show performance under different profiles, as illustrated below:
The tests were conducted in June 2021 on a GCP n2-standard-32 VM (128 GB RAM, 32 cores, Intel(R) Xeon(R) CPU @ 2800 MHz). The implementation for each compared solution was written independently by subject matter experts in that technology, to ensure a fair comparison and to avoid handicapping any solution with suboptimal code.
The summary results show KX excelling across all query types in the DBOps benchmark:
- First in 17 out of 18 categories
- First in 66% of all queries
- Average speed one or more orders of magnitude higher than that of several other solutions
The tests produced some particularly interesting results against individual technologies. For instance, KX was up to 30 times faster than Pandas across all categories. The distribution of wins is depicted below:
The DBOps benchmark focuses on in-memory query speed. It is important to note, however, that KX also excels at ingesting and querying both high-velocity streaming data and high-volume data at rest. See the STAC-M3 test results for details of its world records in query speed on large volumes of data.
For more information, or to discuss the results of the DBOps benchmark in further detail, please contact email@example.com