Kdb+ interface to Kafka

20 Jun 2017 | , , ,
Share on:


By Sergey Vidyuk

Kx recently open-sourced a Kafka interface to kdb+ on GitHub under the Apache2 license that eases integration of applications using Kafka with kdb+. Here is a description of the interface.

Kdb+kafka = kfk library

Libkfk is a comprehensive interface to Apache Kafka for kdb+ based on the librdkafka library – similar to 20+ other language bindings.

This interface provides yet another way of combining the ingest, real-time and historical data processing capabilities of kdb+ with external data streams and consumers.

Notable features include:
* Multiple clients, publishers, and subscribers in the same process
* Minimal library overhead to utilize the multi-threaded C API
* Message data decoded and delivered
* Metadata, statistics and error list exposed for full production control and monitoring

Kfk library strives to expose the maximum amount of information to the user to give them control and understanding of the data flow within and beyond the application. To spot optimization possibilities remember that events within your application are a data feed in themselves for storing and facilitating further analysis.

What is Kafka?

Tibco RV, RabbitMQ, MQ, 29West LBM(UMS), JMS….If any of these sound familiar, you can just append another one to the list: Kafka.

At a high level, Kafka is just another messaging system that uses brokers to decouple producers and consumers of data. Such environments have existed in the financial world for a long time. Kafka differs in that it provides production quality, performant systems with free, open-source code.

kdb+ interface to Kafka

illustration by Sergey Vidyuk


Kafka Terminology

If you are unfamiliar with enterprise messaging systems, it is useful to review a summary of the main concepts. This might make library usage easier due to almost direct mapping from the API to the main concepts in Kafka. 

  • Maintains feeds of messages in topics
  • Contains processes that publish messages to topics which are called producers
  • Uses processes called consumers that subscribe to topics and process the feed of published messages  
  • Runs as a cluster of a lot more servers each of which is called a broker
  • Topics are broken up into ordered commit logs called partitions. Ordering is only guaranteed within the partition for a topic
  • Multiple consumers can subscribe to the topic and are responsible for managing their own offset within the partition
  • Consumers can be organized into consumer groups
  • Delivery is at least once


Here are a few minimalistic examples of producer/consumer and the kind of messages that a user would expect in their application. 

Simple Producer

\l kfk.q
// setup config dictionary
kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);		// mandatory
producer:.kfk.Producer[kfk_cfg]			 // create producer
test_topic:.kfk.Topic[producer;`test;()!()]	 // create topic for producer

show "Publishing on topic:",string .kfk.TopicName test_topic;
.kfk.Pub[test_topic;.kfk.PARTITION_UA;string .z.p;""];
show "Published 1 message";
show producer_meta`topics;

Simple Consumer

\l kfk.q

kfk_cfg:(!) . flip(
    	(`metadata.broker.list;`localhost:9092);         // broker connection
        (`group.id;`0);				        // consumer group

// create consumer
// override data callback. Empty function by default.
// your despatch and processing logic goes here(and runs on main thread)
    data,::enlist msg;}
// subscribe to a topic “test” with automatic partitioning
.kfk.Sub[client;`test;enlist .kfk.PARTITION_UA];

show client_meta`topics;

Normal data message

mtype    | `
topic    | `test
partition| 0i
offset   | 1065
msgtime  | 0Np
data     | "2017.06.07D16:08:51.805544000"
key      | `byte$()
rcvtime  | 2017.06.07D16:08:51.866241000

End of batch message

mtype    | `_PARTITION_EOF
topic    | `test
partition| 0i
offset   | 1076
msgtime  | 0Np
data     | ""
key      | `byte$()
rcvtime  | 2017.06.07D16:08:52.881488000


In this blog post, you have learned some of the main features of libkfk to interface with Kafka, a tiny bit about what Kafka is, and the main terminology to feel comfortable using the library. Minimalistic examples of producer and consumer would be a good starting point to try libkfk in your setup. We encourage you to do that and both file issues and suggest changes on GitHub. It is easy to get started with a local test Kafka broker by following these “setup testing instance” instructions.

Code improvements, examples, documentation, and steps to make setup easier are very welcome!



Sergey Vidyuk is a senior software developer and expert in data management. He works in the Kx R&D team on the next generation of kdb+/q. Prior to joining Kx, Sergey was the CTO at Saxo Bank in London and previously worked for many years as a senior developer in other financial institutions.


Fusion: New Interfaces

14 Jul 2020 | , ,

Fusion is a critical component of the Kx streaming analytics platform. As well as an overview of the new functionality in Fusion this blog gives an outline of the motivation behind the initiative and why Kx will continue to support it.