Kdb+ interface to Kafka

20 Jun 2017
By Sergey Vidyuk

Kx recently open-sourced a Kafka interface to kdb+ on GitHub under the Apache 2 license, easing the integration of applications that use Kafka with kdb+. Here is a description of the interface.

kdb+ + Kafka = kfk library

Libkfk is a comprehensive interface to Apache Kafka for kdb+, built on the librdkafka library that also underpins 20+ other language bindings.

This interface provides yet another way of combining the ingest, real-time and historical data processing capabilities of kdb+ with external data streams and consumers.

Notable features include:
* Multiple clients, publishers and subscribers in the same process
* Minimal library overhead to utilize the multi-threaded C API
* Message data decoded and delivered to q callbacks as native dictionaries
* Metadata, statistics and error list exposed for full production control and monitoring

The kfk library strives to expose as much information as possible, giving users control over, and understanding of, the data flow within and beyond the application. When looking for optimization opportunities, remember that events within your application are themselves a data feed worth storing for further analysis.

What is Kafka?

Tibco RV, RabbitMQ, MQ, 29West LBM (UMS), JMS… If any of these sound familiar, you can simply append another one to the list: Kafka.

At a high level, Kafka is just another messaging system which uses brokers to decouple producers and consumers of data. Such environments have existed in the financial world for a long time. Kafka differs in that it provides a production-quality, performant system as free, open-source code.

[Figure: kdb+ interface to Kafka. Illustration by Sergey Vidyuk]

Kafka Terminology

If you are unfamiliar with enterprise messaging systems, it is useful to review a summary of the main concepts. The library is then easy to pick up, because the API maps almost directly onto the main concepts in Kafka. In brief, Kafka:

  • Maintains feeds of messages in topics
  • Contains processes that publish messages to topics which are called producers
  • Has processes called consumers that subscribe to topics and process the feed of published messages  
  • Runs as a cluster of one or more servers, each of which is called a broker
  • Topics are broken up into ordered commit logs called partitions. Ordering is only guaranteed within the partition for a topic
  • Multiple consumers can subscribe to the topic and are responsible for managing their own offset within the partition
  • Consumers can be organized into consumer groups
  • Delivery is at least once
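These concepts map almost one-to-one onto the kfk API; the calls below are the ones used in the full examples later in this post, with comments tying each call back to the terminology above (a minimal sketch, assuming a broker on localhost:9092):

```q
\l kfk.q
// brokers: clients connect to the cluster via a config dictionary
cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);  // one or more brokers
  (`group.id;`0));                          // consumer group (used by consumers only)
producer:.kfk.Producer[cfg];                // a producer publishes messages...
topic:.kfk.Topic[producer;`test;()!()];     // ...to a topic (a named feed of messages)
// PARTITION_UA: let Kafka assign the partition within the topic
.kfk.Pub[topic;.kfk.PARTITION_UA;"hello";""];
consumer:.kfk.Consumer[cfg];                // a consumer subscribes to topics...
.kfk.Sub[consumer;`test;enlist .kfk.PARTITION_UA]; // ...and manages its own offsets
```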

Examples

Here are a few minimal examples of a producer and a consumer, along with the kind of messages a user would expect to see in their application.

Simple Producer

\l kfk.q
// set up config dictionary
kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);  // mandatory
  (`statistics.interval.ms;`10000);
  (`queue.buffering.max.ms;`1);
  (`fetch.wait.max.ms;`10)
  );
producer:.kfk.Producer[kfk_cfg];             // create producer
test_topic:.kfk.Topic[producer;`test;()!()]; // create topic for producer

show "Publishing on topic:",string .kfk.TopicName test_topic;
.kfk.Pub[test_topic;.kfk.PARTITION_UA;string .z.p;""];
show "Published 1 message";
producer_meta:.kfk.Metadata[producer];
show producer_meta`topics;
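As a variation on the example above, several keyed messages can be published in a loop. This is a hedged sketch: it assumes, as the example suggests, that the fourth argument of `.kfk.Pub` is the message key (the example passes an empty key `""` there):

```q
// publish 5 timestamped messages, keyed by sequence number;
// Kafka uses the key to choose a partition when PARTITION_UA is given
{[t;i].kfk.Pub[t;.kfk.PARTITION_UA;string .z.p;string i]}[test_topic;] each til 5;
```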

Simple Consumer

\l kfk.q

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);  // broker connection
  (`group.id;`0);                           // consumer group
  (`queue.buffering.max.ms;`1);
  (`fetch.wait.max.ms;`10);
  (`statistics.interval.ms;`10000)
  );

// create consumer
client:.kfk.Consumer[kfk_cfg];
data:();
// override the data callback (an empty function by default);
// your dispatch and processing logic goes here (and runs on the main thread)
.kfk.consumecb:{[msg]
    msg[`data]:"c"$msg[`data];
    msg[`rcvtime]:.z.p;
    data,::enlist msg;}
// subscribe to the topic "test" with automatic partitioning
.kfk.Sub[client;`test;enlist .kfk.PARTITION_UA];

client_meta:.kfk.Metadata[client];
show client_meta`topics;
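Because each callback invocation appends a dictionary with the same keys to `data`, the accumulated messages form a table in q and can be queried directly. A usage sketch, once a few messages have arrived:

```q
// a list of conforming dictionaries is a table in q
show select count i by topic from data;                   // messages per topic
show select last offset, last rcvtime by partition from data; // progress per partition
```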

Message
Normal data message

mtype    | `
topic    | `test
partition| 0i
offset   | 1065
msgtime  | 0Np
data     | "2017.06.07D16:08:51.805544000"
key      | `byte$()
rcvtime  | 2017.06.07D16:08:51.866241000

End of batch message

mtype    | `_PARTITION_EOF
topic    | `test
partition| 0i
offset   | 1076
msgtime  | 0Np
data     | ""
key      | `byte$()
rcvtime  | 2017.06.07D16:08:52.881488000
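The `mtype` field distinguishes data messages (empty symbol) from control messages such as the end-of-batch marker above. A hedged sketch of a callback that ignores the batch-end markers, based on the callback from the consumer example:

```q
.kfk.consumecb:{[msg]
  // skip end-of-batch control messages; keep only normal data messages
  if[`_PARTITION_EOF~msg`mtype; :()];
  msg[`data]:"c"$msg[`data];
  msg[`rcvtime]:.z.p;
  data,::enlist msg;}
```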

Summary

In this blog post you have learned some of the main features of libkfk for interfacing with Kafka, a little about what Kafka is, and the main terminology needed to feel comfortable using the library. The minimal producer and consumer examples are a good starting point for trying libkfk in your own setup. We encourage you to do that, and to both file issues and suggest changes on GitHub. It is easy to get started with a local test Kafka broker by following the “setup testing instance” instructions.

Code improvements, examples, documentation and steps to make setup easier are very welcome!

Sources
https://github.com/KxSystems/kafka
https://github.com/edenhill/librdkafka
https://kafka.apache.org/
https://www.slideshare.net/ConfluentInc/deep-dive-into-apache-kafka-66821186
https://www.slideshare.net/jhols1/kafka-atlmeetuppublicv2

 

Sergey Vidyuk is a senior software developer and expert in data management. He works in the Kx R&D team on the next generation of kdb+/q. Prior to joining Kx, Sergey was the CTO at Saxo Bank in London and previously worked for many years as a senior developer in other financial institutions.

© 2017 Kx Systems
Kx® and kdb+ are registered trademarks of Kx Systems, Inc., a subsidiary of First Derivatives plc.
