kdb+/q taxi demo benchmark

Kx 1.1 billion taxi ride benchmark highlights advantages of kdb+ architecture

25 Jan 2017
By Glenn Wright

Stellar performance in third-party benchmarks is a tradition at Kx, and now we can add a new benchmark to the list: the taxi ride benchmark developed by Mark Litwintschik.

This latest benchmark (available here) queries a dataset of 1.1 billion New York City taxi rides. The dataset captures fares, weather, pick-up and drop-off locations and times, among other things. It covers trip data from 2009 to the present day and is made publicly available by the New York City Taxi & Limousine Commission. One of the first individuals to visualize this data was Todd W. Schneider (available here).
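To give a feel for the kind of question such a benchmark asks of the data, here is a minimal Python sketch of two group-by aggregations over a handful of toy records. The field names (`cab_type`, `passenger_count`, `total_amount`) and the sample values are assumptions made for illustration; they are not taken from the benchmark itself, and this is not how kdb+/q expresses or executes a query.

```python
from collections import defaultdict

# Hypothetical toy records. Field names and values are illustrative
# assumptions, not the real dataset's schema or contents.
trips = [
    {"cab_type": "yellow", "passenger_count": 1, "total_amount": 10.0},
    {"cab_type": "yellow", "passenger_count": 2, "total_amount": 20.0},
    {"cab_type": "green", "passenger_count": 1, "total_amount": 14.0},
]


def count_by(rows, key):
    """Count how many rows fall into each distinct value of `key`."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


def mean_by(rows, key, value):
    """Average the `value` field within each distinct value of `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


rides_per_cab_type = count_by(trips, "cab_type")
avg_amount_by_passengers = mean_by(trips, "passenger_count", "total_amount")
```

At 1.1 billion rows the same logic is unchanged; what a benchmark measures is how quickly each system can scan and group the data.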

In January, Mark tested Kx’s kdb+ database, with its built-in programming language q. Kx’s results were impressive. They clearly demonstrated the advantages of kdb+/q’s simple, scalable Lambda architecture, designed to manage massive quantities of real-time, streaming and historical data at record speeds. To the best of our knowledge, kdb+/q is the only technology that takes advantage of large-scale batch and stream-processing methods in one tool.

Mark has previously tested other software using the same dataset and queries, including Spark, MapD, Presto, PostgreSQL, Redshift and Elasticsearch. To compare these results with kdb+/q’s, it is important to weigh each product’s strengths as a computing engine for complex analytics and the type of hardware configuration it is typically used with.

Our results were over four orders of magnitude faster than any other CPU technology and comparable to GPU-based code. CPUs are still the mainstream choice for most data scientists who need to quickly load and analyze large datasets. Some are turning to GPUs for speed, but they often find that GPUs add coding complexity, which complicates their analytics and increases the resources required. It is therefore impossible to make an apples-to-apples comparison between kdb+/q on CPUs and other technologies on GPUs.

In the financial services industry, where kdb+/q has been battle-tested on multi-petabyte datasets and machines, the applications differ from those currently adopting GPUs: they are more compute-intensive and demand exceptional data ingest and manipulation speed. Financial firms currently work almost exclusively with CPUs.

Here are some of Mark’s kdb+/q highlights from the benchmark:

Refreshing not to have a large configuration overhead.

A binary that works well right off the bat and can be tweaked with a couple of flags — a welcome relief from hours of tuning disparate configuration files spread over clusters.

Fastest query times of any CPU-based system, and a record set on one query.

Advantages when data locality is as optimized as it is in kdb+/q. It did an amazing job of breaking up the workload amongst the 1,024 CPU threads in the cluster — opening up a whole new world of optimizations.
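The idea of breaking a workload up across many threads can be sketched generically: each worker aggregates only its own partition of the data, and the small partial results are merged at the end. The Python below is a structural illustration under stated assumptions, not a description of kdb+/q internals. The function names, the partition layout and the `cab_type` field are hypothetical, and Python threads show only the shape of the computation, not the speed-up a native engine would achieve.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_counts(partition):
    """Count rides per cab type within a single partition."""
    return Counter(row["cab_type"] for row in partition)


def parallel_count(partitions, workers=4):
    """Aggregate each partition independently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_counts, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Hypothetical partitions, e.g. one per date range or per disk.
partitions = [
    [{"cab_type": "yellow"}, {"cab_type": "green"}],
    [{"cab_type": "yellow"}],
    [{"cab_type": "yellow"}, {"cab_type": "green"}],
]

totals = parallel_count(partitions)
```

Because each worker touches only its own partition, keeping a partition's data close to the core that processes it matters, which is where data locality comes in.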

To learn more about the Kx taxi demo, check out a video from the Intel Discovery Zone at SC’16 from InsideHPC (available here).

© 2018 Kx Systems
Kx® and kdb+ are registered trademarks of Kx Systems, Inc., a subsidiary of First Derivatives plc.

