Tech Talk: Thoughts on Tables in kdb+

29 Nov 2016

by Paul Kerrigan

Kx engineer Paul Kerrigan has been working with kdb+ for the past year, since he joined Kx. He has experience implementing kdb+ in risk management, pharmaceutical manufacturing and predictive modeling for retail analytics. In this blog post about programming in kdb+, Paul tackles the topic of tables.

The existence of tables as a native type in kdb+ provides an elegant, powerful framework for working with vast amounts of diverse data. Paul’s blog post is a good way to get a better understanding of how tables work in kdb+.  

Editor’s note: All code snippets shown were created in the Kx Analyst development environment.

At its simplest, a table is a convenient way of expressing relational data in a human-readable format. Indeed, in many languages, this is all a table implies: a series of rows of data stored in memory or on disk for retrieval.

In kdb+, however, a table is an interesting and highly malleable data structure. The implementation is elegant: a table is a collection of named columns, implemented as a dictionary of vectors. This representation is the source of the t`c notation for retrieving all values in column `c of a table t. To the experienced kdb+ developer, it also gives rise to the idea of simple table arithmetic: if all non-key columns of a table are numeric, any arithmetic operation that works on vectors can be applied to the table as a whole, acting on every value in its rows and columns at once.
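The column view and simple table arithmetic can be sketched in q roughly as follows. This is a minimal sketch rather than a reproduction of the original snippets: the tables t and kt and their columns are hypothetical, and the column-sum normalisation is only one possible example of an operation on whole columns.

```q
/ hypothetical table with two numeric columns
t:([] a:1 2 3f; b:10 20 30f)

/ flipping a table exposes its underlying dictionary of column vectors
flip[t] ~ `a`b!(1 2 3f;10 20 30f)    / 1b

/ t`c returns every value in column c
t`b                                  / 10 20 30f

/ arithmetic applies to every cell, and the result is still a table
t*2

/ one possible normalisation: scale each column so that it sums to 1
flip {x % sum x} each flip t

/ on a keyed table, arithmetic acts on the non-key columns only
kt:([k:`x`y] v:1 2f)
kt*10
```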

A table without a key column can also be thought of as a list (or vector) of dictionaries whose keys are the table's column names. This view is reinforced by the fact that an individual row can be retrieved with ordinary kdb+ list indexing: t[0] returns the first row of the table as a dictionary, just as L[0] returns the first item of any list L. A keyed table can be thought of in much the same way, except that there is an explicit I/O mapping between the values of the key column and the values of the other columns. (An I/O mapping here means a reference between a pair of lists: for example, a dictionary defined as d:`x`y!5 11 means “in the context of d, x gets 5 and y gets 11”.) A keyed table is hence a dictionary of dictionaries, mapping each key record to a record of the remaining columns; in q it is stored as a dictionary whose key and value are both tables.
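A small q sketch of both views, assuming a hypothetical table t of prices (sym and px are illustrative column names, not from the original post). Indexing an unkeyed table by position yields a dictionary, and keying the same table with xkey yields a dictionary whose key and value are both tables:

```q
/ hypothetical unkeyed table
t:([] sym:`ibm`msft; px:101.5 42.25)

/ a row retrieved by index is a dictionary keyed by the column names
t 0                         / equal to `sym`px!(`ibm;101.5)
type t 0                    / 99h, the dictionary type

/ the dictionary from the text: x gets 5 and y gets 11
d:`x`y!5 11
d`y                         / 11

/ key the table on sym: each key record now maps to a value record
kt:`sym xkey t
key kt                      / a one-column table of the keys
value kt                    / a one-column table of the prices
kt`msft                     / the msft value record, as a dictionary
```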

This is a useful frame of mind to work from, mainly because any function written to operate on a dictionary can be applied to the rows of a table using adverbs, while vector operations on its columns remain valid. Just as an adverb such as each extends a function written for a single atom so that it applies across a whole list, it can extend a dictionary operation so that it applies to a vector of dictionaries, that is, to a table.
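A minimal q sketch, assuming a hypothetical function notional written for a single dictionary (one trade record) and a hypothetical table t. The each adverb extends it from one row to every row; and because r[`px] on a whole table returns a column, the same function also runs as a vector operation without any adverb:

```q
/ hypothetical function written for one dictionary (a single row)
notional:{[r] r[`px]*r[`qty]}

t:([] sym:`ibm`msft; px:101.5 42.25; qty:100 200)

/ applied to one row, it behaves as written
notional t 0                / 10150f

/ the each adverb extends it to every row of the table
notional each t             / 10150 8450f

/ applied to the whole table, r[`px] and r[`qty] are columns, so this is a vector operation
notional t                  / 10150 8450f
```

The row-wise and vector forms give the same result here; the vector form avoids calling the function once per row.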

A table can also be considered a vehicle for instructions: instead of filling the cells with simple numbers and characters, they can be filled with references to functions or even other tables. This idea can be used to create a function which successively runs the instructions contained in a table, applying the functions stored in it to the arguments stored alongside them.
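One way to sketch this in q, under stated assumptions: the tables instr and steps and the function run are hypothetical, and the original snippets may have been structured differently. Each row stores a function alongside its arguments; run applies the function with . (apply), each runs every row independently, and over (/) runs the rows in sequence, feeding each result into the next instruction:

```q
/ hypothetical instruction table: a function and its argument list per row
instr:([] fn:(+;*;{x-y}); args:(1 2;3 4;10 3))

/ run one instruction by applying the row's function to its arguments
run:{[r] r[`fn] . r`args}

run each instr                          / 3 12 7

/ hypothetical sequential program: each step combines the running value with arg
steps:([] fn:(+;*;-); arg:5 3 4)

/ over threads the accumulator through the rows: ((0+5)*3)-4
{[acc;r] r[`fn][acc;r`arg]}/[0;steps]   / 11
```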

As we have seen, when using tables the line between code and data can be blurred, as so often happens in functional programming, a topic we will discuss in a follow-up post in this series.

 

 
