
Kx 1.1 billion taxi ride benchmark highlights advantages of kdb+ architecture

25 Jan 2017

By Glenn Wright

Stellar performance in third-party benchmarks is a tradition at Kx, and now we can add a new one to the list: the taxi ride benchmark developed by Mark Litwintschik.

This latest benchmark (available here) queries a dataset of 1.1 billion New York City taxi rides. The dataset captures fares, weather, pickup and drop-off locations and times, among other things. It covers trip data from 2009 to the present day and is made publicly available by the New York City Taxi & Limousine Commission. One of the first individuals to visualize this data was Todd W. Schneider (available here).
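To give a rough sense of the kind of aggregation such a benchmark exercises, here is a minimal Python sketch of a grouped count and a grouped average over a few made-up rows. The field names (cab_type, passenger_count, total_amount), the sample values and the helper functions are assumptions for illustration only; they are not taken from the benchmark, and this is not kdb+/q code.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; field names and values are illustrative only.
trips = [
    {"cab_type": "yellow", "passenger_count": 1, "total_amount": 10.0},
    {"cab_type": "yellow", "passenger_count": 2, "total_amount": 20.0},
    {"cab_type": "green", "passenger_count": 1, "total_amount": 14.0},
    {"cab_type": "yellow", "passenger_count": 1, "total_amount": 12.0},
]


def count_by(rows, key):
    """Count rows per distinct value of `key` (a GROUP BY ... COUNT)."""
    return dict(Counter(row[key] for row in rows))


def mean_by(rows, key, value):
    """Average `value` per distinct value of `key` (a GROUP BY ... AVG)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


rides_per_cab_type = count_by(trips, "cab_type")
avg_amount_per_party_size = mean_by(trips, "passenger_count", "total_amount")
```

On a real dataset of this size the same two shapes of query would scan over a billion rows, which is why storage layout and parallelism dominate the timings.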

In January, Mark tested Kx’s kdb+ database, with its built-in programming language q. Kx’s results were impressive. They clearly demonstrated the advantages of kdb+/q’s simple, scalable Lambda architecture, designed to manage massive quantities of real-time, streaming and historical data at record speeds. To the best of our knowledge, kdb+/q is the only technology that takes advantage of large-scale batch and stream-processing methods in one tool.

Mark has previously tested other software using the same dataset and queries, including Spark, MapD, Presto, PostgreSQL, RedShift and ElasticSearch. To compare these results with kdb+/q’s, it is important to weigh each system’s strengths as a computing engine for complex analytics and the types of hardware configuration it is typically used with.

Our results were over four orders of magnitude faster than any other CPU technology and comparable to GPU-based code. CPUs are still the mainstream choice for most data scientists who need to quickly load and analyze large datasets. Some are turning to GPUs for speed, but they often find that GPUs bring more coding complexity, which complicates their analytics and increases the resources required. It is therefore difficult to make an apples-to-apples comparison between kdb+/q on CPUs and other technologies on GPUs.

In the financial services industry, where kdb+/q has been battle-tested on multi-petabyte datasets and machines, the applications differ from those currently adopting GPUs: they are more compute-intensive and demand exceptional data ingest and manipulation speed. Financial firms currently work almost exclusively with CPUs.

Here are some of Mark’s kdb+/q highlights from the benchmark:

Refreshing not to have a large configuration overhead.

A binary that works well right off the bat and can be tweaked with a couple of flags — a welcome relief from hours of tuning disparate configuration files spread over clusters.

Fastest query times of any CPU-based system, and a record set on one query.

Advantages when data locality is as optimized as it is in kdb+/q. It did an amazing job of breaking up the workload amongst the 1,024 CPU threads in the cluster — opening up a whole new world of optimizations.
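The idea of breaking a workload up across many threads can be illustrated with a small Python sketch: split the data into partitions, aggregate each partition independently, then merge the partial results. This shows only the general pattern, not how kdb+/q implements it internally; the function names, the worker count and the sample data are all assumptions, and Python threads here demonstrate the structure rather than real CPU speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_partitions):
    """Split a list into n_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), n_partitions)
    partitions, start = [], 0
    for i in range(n_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(values[start:end])
        start = end
    return partitions


def count_partition(partition):
    """Aggregate one partition on its own; no shared state is touched."""
    return Counter(partition)


def parallel_count(values, n_workers=4):
    """Count values per key by aggregating partitions concurrently, then merging."""
    partitions = split_into_partitions(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Hypothetical sample data standing in for one column of the dataset.
cab_types = ["yellow", "green", "yellow", "yellow", "green", "yellow", "yellow"]
totals = parallel_count(cab_types, n_workers=3)
```

Because each partition is aggregated without touching shared state, the merge step is the only point of coordination, which is what lets this pattern scale to large thread counts when the data is laid out close to the workers.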

To learn more about the Kx taxi demo, check out a video from the Intel Discovery Zone at SC’16 from InsideHPC (available here).

© 2018 Kx Systems
Kx® and kdb+ are registered trademarks of Kx Systems, Inc., a subsidiary of First Derivatives plc.
