Benchmarking kdb+ on Raspberry Pi

24 May 2017

By Hugh Hyndman

A couple of weeks ago, a business associate generously gave me a Raspberry Pi 3. It came with a 4-core processor, 1GB of memory, and 32GB Micro SDHC card (i.e., the card used in phones and cameras).

Given that kdb+’s memory footprint is a little over 500KB, it seemed like a perfect opportunity to experiment with how well kdb+ runs on this small box. So, I installed Raspbian, a Debian-based OS, on the Raspberry Pi, as well as the 32-bit evaluation version of kdb+ 3.4.

I wanted to solve a problem that would be in kdb+’s sweet spot, that is, something to do with low-latency, high-performance time-series processing and analytics. I recalled a benchmark produced by InfluxData, where they had provided comparisons of InfluxDB versus their competition, such as Cassandra, ElasticSearch, MongoDB, and OpenTSDB. Fortunately, they made public (on GitHub) some Go-based code that generates test data, performs ingestion, and invokes pre-defined queries. It seemed like a good starting point to see how kdb+ on a Raspberry Pi compares to other products running on servers.

Data Ingestion

InfluxData’s data generator simulates operational metrics coming from a server farm, and results in a number of time series, organized by various infrastructure categories, such as CPU, disk, kernel, and network performance statistics. Using their tools, I generated data for 1,000 servers, with a 10-second measurement over a period of a day. I mapped each one of these categories to a separate kdb+ table.

The following is a summary of the columns and sample rows for one of the tables: cpu. As generated, this table is completely denormalized and consumes more space than necessary. Given more time, I would have taken the repeated values in columns such as region and datacenter, placed them in a separate hosts table, and then used table joins.
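That normalization could be sketched in q as follows (the column names region and datacenter come from the generated schema; the hosts table and the join are my assumption of how it would be done, not code from the benchmark):

```
/ factor repeated per-host metadata out of cpu into a separate hosts table
hosts:select distinct hostname, region, datacenter from cpu
cpu:delete region, datacenter from cpu

/ recover the denormalized view on demand with a left join keyed on hostname
cpu lj `hostname xkey hosts
```

Because the metadata repeats for every 10-second sample, storing it once per host rather than once per row saves considerable space.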

I used kdb+ on my Mac for the initial tests and generated roughly 77 million rows spread evenly across 9 tables, resulting in 8.64 million records per table. That is small by kdb+ standards, but given that I was going to run some tests on the Raspberry Pi, I figured this size would be sufficient. I chose the cpu table as my focus for ingestion and queries.

I started a kdb+ process on the Raspberry Pi, opened a single socket connection to it from my Mac, and proceeded to load the data. Over my office LAN, running 100 Mbps, the ingestion rate was roughly 60,000 rows per second into the kdb+ real-time database. The write-down rate to the Raspberry Pi’s Micro SDHC storage was 15,000 rows per second.
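The loading step might look something like the sketch below (the hostname, port, and chunking are assumptions for illustration; the post does not show the actual load script):

```
/ open a socket from the Mac to the q process on the Pi
h:hopen `:raspberrypi:5000       / host and port are assumptions

/ push rows across in chunks, inserting into the cpu table on the Pi
h(insert;`cpu;chunk)             / chunk: a table of cpu rows read locally

hclose h                         / close the connection when done
```

Sending data in batches rather than row-by-row is what makes a 60,000 rows/second ingestion rate achievable over a 100 Mbps LAN.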

I didn’t bother to compress the data with kdb+’s table-specific or column-specific features, but the speed of writing to the Micro SDHC would have been much improved if I had compressed the data to what I estimate would be 15% of the original size.
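For reference, kdb+ compression can be enabled per file via `set`, or process-wide via `.z.zd`; a minimal sketch (the paths are assumptions, and the parameters shown are the standard triple of logical block size as a power of 2, algorithm, and level):

```
/ write one column file compressed: 2^17-byte blocks, gzip (alg 2), level 6
(`:db/cpu/usage_user;17;2;6) set cpu`usage_user

/ or set process-wide defaults so every subsequent write is compressed
.z.zd:17 2 6
`:db/cpu/ set .Q.en[`:db;cpu]   / splayed write, symbols enumerated first
```

With the repetitive server-metrics data in this benchmark, compressing on write trades a little CPU for far fewer bytes pushed through the Micro SDHC card, which was the bottleneck.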

Preparing the Queries

The published benchmark against Cassandra documents three queries. These queries are all quite similar in functionality, each building on the previous one to increase the volume of data to be processed. They are described as follows:

  1. For a random host, determine the maximum usage_user across random 1-hour intervals, grouped by minute,
  2. Same as 1, except across 12-hour intervals,
  3. Same as 2, except for 8 hosts.

Since all of the benchmark queries narrowed the search to a limited set of hosts, I placed a parted-attribute on the hostname column.
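Applying the parted attribute requires the column to be sorted so that equal values are adjacent; a sketch of both the in-memory and on-disk forms (the on-disk path is an assumption):

```
/ in memory: sort on hostname, then apply the parted attribute
cpu:update `p#hostname from `hostname xasc cpu

/ on a splayed table on disk, the attribute is set by amending the column
@[`:db/cpu/;`hostname;`p#]
```

With `` `p# `` in place, lookups such as `hostname in hosts` become index reads rather than full column scans, which is exactly what these host-restricted queries need.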

In hindsight, I needed to ingest only the three columns required by the queries (hostname, ts, and usage_user) instead of the 20 that were generated. Either way, query performance is unaffected: kdb+ is a column-store database, so columns not referenced by a query are never read.

I implemented all three queries in a single kdb+ function, named run_query, which runs q-sql against the cpu table. The following is basically what is invoked inside of that function. As you can see, there isn’t much of a leap required for a developer to understand the query.

select max usage_user by ts.minute, hostname from cpu where hostname in hosts, ts within tsrange
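Wrapped as a function taking one row of the parameter table (which arrives as a dictionary when iterated with `each`), run_query might look something like this sketch; the exact definition is not shown in the post:

```
run_query:{[p]
    hosts:p`hosts;               / list of host symbols for this run
    tsrange:p`range;             / pair of start/end timestamps
    select max usage_user by ts.minute, hostname from cpu
        where hostname in hosts, ts within tsrange }
```

The `where` clause narrows on the parted hostname column first, then on the timestamp range, before the per-minute aggregation runs.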

Instead of going through the pain of putting together some test harnesses to simulate query load from multiple clients, I thought I would run the queries from the kdb+ console to get a true indication of database performance – not cluttered with network latency.

To do this, I needed to generate a list of random query parameters to apply to my query function. I wrote a function called gen_parms, whose listing is below.

MAXHOSTS:1000 // Number of hosts in generated data
HOSTNAMES:`$"host_",/:string til MAXHOSTS; // Symbols enumerating all hosts in table
EPOCH:2016.01.01D0 // Data starts at this timestamp

// @desc Generates random set of host and timestamp range values to be used as query parameters
// @param nparms {long} Specifies the number of parameter sets to generate
// @param aggdur {ts}   Specifies a timespan of interest for aggregation
// @param nhosts {long} Specifies the number of random host numbers to generate
// @return {table}      Returns a table with the following columns:
//      hosts - list of host names of length nhosts
//      range - list of timestamp range pairs
gen_parms:{[nparms;aggdur;nhosts]
    hostnums:(nparms,nhosts)#(nparms*nhosts)?MAXHOSTS; // 2D array of random host numbers
    hosts:HOSTNAMES[hostnums]; // Map to host names resulting in array of names
    startts:EPOCH+nparms?1D0-aggdur; // Generate random start times
    endts:startts+aggdur-1; // Add the aggregation duration (less 1 nanosecond)
    range:startts,'endts; // Pair each start & end ts to make a list of timestamp ranges
    ([] hosts;range) } // Return table with hostnames and timestamp range as columns

The above code is explained by the comments, except for perhaps the last line. Tables, whether they are in-memory or on-disk, are first class objects in kdb+. Here, I simply combined two lists and made them columns in a table. Below, I generate the parameters for 2,500 queries.

q)parms:gen_parms[2500; 0D01:00:00; 1] // Generate query parms for 1-hour window and 1 host

q)3#parms // Display the first three rows of the parms table
hosts    range
host_908 2016.01.01D12:54:05.797607023 2016.01.01D13:54:05.797607022
host_360 2016.01.01D00:59:02.973200418 2016.01.01D01:59:02.973200417
host_522 2016.01.01D01:21:18.532082308 2016.01.01D02:21:18.532082307

Running the Queries

Below, I run the queries from the kdb+ console. The “\t” at the beginning of the command asks for the total execution time (in milliseconds) to be reported.

q)\t run_query each parms // Iterate through each parameter row invoking the query

The total elapsed time taken to run the 2,500 queries was 1.532 seconds, so each query took an average of 0.613 milliseconds (or 613 microseconds).

In order to get parallelism and exploit the Pi’s 4 cores, I made use of kdb+’s built-in map-reduce capability via the peach function (parallel each).
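Note that peach only distributes work when the q process has secondary threads available; on the Pi that means starting q something like this (the exact invocation here is an assumption):

```
$ q -s 4    / launch q with 4 secondary threads, one per Pi core
q)\s        / verify the secondary-thread count from the console
```

Without `-s`, peach falls back to serial execution and the timings would match the plain `each` run.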

q)\t run_query peach parms; // Invoke the query across CPU cores

A definite improvement in throughput: the total elapsed time taken is 0.452 seconds, resulting in a throughput rate of 5,531 queries/second. The run below changes the parameters a bit, generating 2,500 query parameters with random 12-hour timestamp windows and random host names.

q)parms:gen_parms[2500; 0D12; 1] // Parameters for 2500 runs, 12-hour window, one host
q)\t run_query peach parms; // Invoke the query across CPU cores

In this case, it took 2.377 seconds to complete 2,500 queries, resulting in a throughput rate of 1,052 queries per second.

Query Results

To complete the tests, I ran all three queries making use of 1, 2, and 4 cores. The results are summarized in the chart below. Note that the y-axis indicates the number of queries that can be processed in one second – query throughput.

I ran the same queries on a server running kdb+ on CentOS Linux release 7.3.1611, running on 4 cores of a single E5-2667 v3 Xeon processor (20MB cache, 3.2 GHz), 64GB of PC4-17000P DDR4-2133 RAM, and a SAS 10K 300GB disk. These results provide some perspective on the performance differences between a Raspberry Pi and a bigger box running kdb+. I placed a scale break in the Y-axis so as not to dwarf the Raspberry Pi performance numbers.

As I expected, there is some degradation of performance as the volume of data increases, but all-in-all, I had to remind myself that I was running all these tests on a computer designed for hobbyists and students.

I also included the InfluxData results to tie things back to their published Cassandra comparison document, although, in fairness, their query throughput included network latency for communications between the test harness clients and the database server (I ran both the client and kdb+ server on one box). The configuration they specified was: Ubuntu 16.04 LTS, with a single E5-1271v3 Xeon processor (quad-core, 8MB cache, 3.6GHz), 32GB of PC3-12800 1600MHz DDR3 RAM, and a single 1.2TB Intel 750 NVMe SSD.


In summary, kdb+ is well-known as the industry’s best performing time-series database on the workstation, in the data center, and in the cloud. It powers low-latency, high-volume solutions for 19 of the world’s top 20 investment banks.

This little experiment clearly demonstrates that kdb+ technology is also a perfect fit for edge computing, running on small appliances, including Industrial IoT gateways, hubs, equipment data acquisition units, and even on-board smart sensors. Its capabilities for event streaming, data filtering, and complex event processing round out the Kx technology offering, positioning it to fit vertically and horizontally across all computing platforms.

Hugh Hyndman is the Director of IoT Solutions at Kx Systems, based out of Toronto. Hugh has been involved with high-performance big data computing for most of his career. His current focus is to help companies supercharge their software systems and products by injecting Kx technologies into their stack. If you are interested in OEM or partnership opportunities, please contact Hugh through .