Kdb+ interface to Kafka

20 Jun 2017


By Sergey Vidyuk

Kx recently open-sourced a Kafka interface to kdb+ on GitHub under the Apache2 license that eases integration of applications using Kafka with kdb+. Here is a description of the interface.

Kdb+ + Kafka = kfk library

Libkfk is a comprehensive interface to Apache Kafka for kdb+. It is built on the librdkafka library, like the 20+ bindings for other languages.

This interface provides yet another way of combining the ingest, real-time and historical data processing capabilities of kdb+ with external data streams and consumers.

Notable features include:
* Multiple clients, publishers and subscribers in the same process
* Minimal library overhead to utilize the multi-threaded C API
* Message data decoded and delivered to a q callback
* Metadata, statistics and error list exposed for full production control and monitoring

The kfk library strives to expose as much information as possible, giving users control over and insight into the data flow within and beyond the application. When looking for optimization opportunities, remember that events within your application are themselves a data feed worth storing and analyzing.

What is Kafka?

Tibco RV, RabbitMQ, MQ, 29West LBM (UMS), JMS… If any of these sound familiar, you can append another one to the list: Kafka.

At a high level, Kafka is just another messaging system that uses brokers to decouple producers and consumers of data. Such environments have existed in the financial world for a long time. Kafka differs in that it delivers a production-quality, performant system as free, open-source code.

[Figure: kdb+ interface to Kafka (illustration by Sergey Vidyuk)]


Kafka Terminology

If you are unfamiliar with enterprise messaging systems, it is useful to review a summary of the main concepts. This should make the library easier to use, since its API maps almost directly onto Kafka's main concepts.

  • Maintains feeds of messages in categories called topics
  • Has processes, called producers, that publish messages to topics
  • Has processes, called consumers, that subscribe to topics and process the feed of published messages
  • Runs as a cluster of one or more servers, each of which is called a broker
  • Topics are broken up into ordered commit logs called partitions; ordering is guaranteed only within a partition of a topic
  • Multiple consumers can subscribe to a topic; each is responsible for managing its own offset within a partition
  • Consumers can be organized into consumer groups
  • Delivery is at least once
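
As a rough sketch of how these concepts surface in the library, the `.kfk.Metadata` call returns a dictionary describing the cluster: the brokers it runs on and the topics they hold, each broken into partitions. The broker address below is an assumption for a local test setup.

```q
\l kfk.q
// minimal client pointed at a local test broker (address is illustrative)
client:.kfk.Consumer[enlist[`metadata.broker.list]!enlist`localhost:9092];
meta:.kfk.Metadata[client];
show meta`brokers;   / the cluster: one or more servers, each a broker
show meta`topics;    / topics, each broken into ordered partitions
```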


Here are a few minimalistic examples of a producer and a consumer, and the kind of messages a user can expect in their application.

Simple Producer

\l kfk.q

// setup config dictionary
kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092));   // mandatory

producer:.kfk.Producer[kfk_cfg]               // create producer
test_topic:.kfk.Topic[producer;`test;()!()]   // create topic for producer

show "Publishing on topic: ",string .kfk.TopicName test_topic;
.kfk.Pub[test_topic;.kfk.PARTITION_UA;string .z.p;""];
show "Published 1 message";

producer_meta:.kfk.Metadata[producer];        // query broker metadata
show producer_meta`topics;

Simple Consumer

\l kfk.q

// setup config dictionary
kfk_cfg:(!) . flip(
    (`metadata.broker.list;`localhost:9092);  // broker connection
    (`group.id;`0));                          // consumer group

client:.kfk.Consumer[kfk_cfg];                // create consumer

// override the data callback (empty function by default)
// your dispatch and processing logic goes here (and runs on the main thread)
data:();
.kfk.consumecb:{[msg]
    data,::enlist msg;}

// subscribe to topic `test with automatic partitioning
.kfk.Sub[client;`test;enlist .kfk.PARTITION_UA];

client_meta:.kfk.Metadata[client];            // query broker metadata
show client_meta`topics;

Normal data message

mtype    | `
topic    | `test
partition| 0i
offset   | 1065
msgtime  | 0Np
data     | "2017.06.07D16:08:51.805544000"
key      | `byte$()
rcvtime  | 2017.06.07D16:08:51.866241000

End of batch message

mtype    | `_PARTITION_EOF
topic    | `test
partition| 0i
offset   | 1076
msgtime  | 0Np
data     | ""
key      | `byte$()
rcvtime  | 2017.06.07D16:08:52.881488000
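
Given the two message shapes above, a consumer callback typically needs to separate real data from end-of-batch markers. A minimal sketch, assuming the default callback name used in the consumer example and keying off the `mtype` field shown above:

```q
data:();
.kfk.consumecb:{[msg]
    / skip end-of-batch markers; keep only normal data messages
    if[`_PARTITION_EOF~msg`mtype; :(::)];
    data,::enlist msg;}
```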


In this blog post you have learned some of the main features of libkfk for interfacing with Kafka, a little about what Kafka is, and the main terminology needed to feel comfortable using the library. The minimalistic producer and consumer examples are a good starting point for trying libkfk in your own setup. We encourage you to do so, and to file issues and suggest changes on GitHub. It is easy to get started with a local test Kafka broker by following the “setup testing instance” instructions.
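
For reference, a local test broker can be stood up with the scripts shipped in the standard Kafka distribution; the release version below is illustrative, not taken from the linked instructions:

```shell
# unpack a Kafka release (version illustrative)
tar -xzf kafka_2.11-0.11.0.0.tgz && cd kafka_2.11-0.11.0.0
# start ZooKeeper, then a single broker
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
# create the `test` topic used by the examples above
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test
```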

Code improvements, examples, documentation and steps to make setup easier are very welcome!



Sergey Vidyuk is a senior software developer and expert in data management. He works in the Kx R&D team on the next generation of kdb+/q. Prior to joining Kx, Sergey was the CTO at Saxo Bank in London and previously worked for many years as a senior developer in other financial institutions.

