Market data magic with kdb Insights SDK

Author

Ryan Siegler

Data Scientist

Key points

  1. Enable seamless analytics across real-time and historical time-series data using a single engine.
  2. Modular components like microservices, REST APIs, and CI/CD-ready packaging empower developers to build, deploy, and manage custom analytics pipelines and applications.
  3. Built for modern DevOps environments, it supports containerized deployment and Kubernetes orchestration across AWS, GCP, and Azure.

Imagine building a real-time analytics platform that ingests, processes, and analyzes billions of events per day, without wrestling with a tangled web of open-source tools. No more late nights debugging Kafka-Spark-Redis pipelines. No more glue code. Just pure, high-performance streaming, all in one place.

In this blog, I will introduce kdb Insights SDK, a comprehensive solution that enables you to deploy, scale, and manage real-time data pipelines with a single, unified technology stack. We’ll explore its architecture, walk through a hands-on demo, and see why it’s a game-changer for anyone building mission-critical, real-time applications.

kdb Insights SDK architecture

At the heart of kdb Insights SDK lies a modern, microservices-driven architecture designed to make real-time data engineering both powerful and approachable. Unlike traditional setups that require integrating a patchwork of open-source tools, kdb Insights SDK delivers a unified platform where every core capability is purpose-built and seamlessly connected.

Key components

Stream processor (SP)

The stream processor (SP) performs real-time ingestion of live data, often from sources like Kafka, REST APIs, or direct feeds, and applies high-speed transformations and analytics using the q language. It enables developers to perform complex computations, enrichments, or filtering to ensure only relevant, actionable data flows downstream.
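
To make this concrete, here is a minimal, illustrative sketch of a pipeline written against the stream processor’s q API (.qsp). The callback name, the size column, and the filtering logic are assumptions for illustration, not taken from the runbook repository:

Q (kdb+ database)
/ Illustrative pipeline: ingest batches pushed to a callback, filter them, and
/ write the result to the console (a real deployment would typically write to
/ RT or the database instead)
.qsp.run
  .qsp.read.fromCallback[`publish]            / ingest table batches sent to the `publish callback
  .qsp.map[{select from x where size>0}]      / keep only rows with a positive size (assumed column)
  .qsp.write.toConsole[]                      / emit transformed batches to the console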

Reliable transport and tickerplant

  • Reliable transport (RT) provides a fault-tolerant messaging backbone and should be chosen when reliability and replay are critical. It uses consensus protocols (such as Raft) to guarantee data delivery, even to slow or disconnected consumers
  • The kdb tickerplant (TP, tick.q) offers ultra-low latency for high-frequency environments that demand the absolute minimum overhead. Use it as an alternative to RT when you can manage failover and recovery yourself (a minimal publishing sketch follows below)
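
For reference, publishing to a vanilla tick.q tickerplant uses the standard .u.upd and .u.sub interface. The sketch below is illustrative only and assumes a tickerplant listening on port 5010 with the usual trade schema (time, sym, price, size):

Q (kdb+ database)
h:hopen `:localhost:5010                      / connect to the tickerplant
h(".u.upd";`trade;(.z.n;`AAPL;189.25;100j))   / publish one trade row: time, sym, price, size
/ a subscriber registers for trade updates with:
/ h".u.sub[`trade;`]"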

Storage manager (SM)

The storage manager (SM) orchestrates the lifecycle of data, from hot in-memory storage to cold, cost-efficient archives. It ensures that real-time and historical data are always available, automatically managing data migration, compaction, and tiering. It manages three tiers:

  • The real-time database (RDB), which stores the most recent, real-time data in memory
  • The intra-day database (IDB), which acts as an intermediate, on-disk tier for recent data within the current day
  • The historical database (HDB), which stores historical data, typically partitioned by date to disk or object storage for cost efficiency

Data access process (DAP)

The data access process (DAP) provides a federated, read-only interface to all data, regardless of where it resides (RDB, IDB, HDB). Whether data is streaming in real time or archived on disk, DAP lets you access it through a unified API using q, SQL, or Python (via PyKX).
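
For example, a single gateway call can retrieve the last hour of trades regardless of which tier currently holds them. The sketch below assumes the standard kdb Insights getData API (.kxi.getData) with table/startTS/endTS parameters, and reuses the gateway call pattern shown in the demo later in this post; the time window is illustrative:

Q (kdb+ database)
gw:hopen `:localhost:5050                                 / connect to the service gateway
args:`table`startTS`endTS!(`trade;.z.p-0D01:00:00;.z.p)   / last hour of trades, across RDB/IDB/HDB
gw(`.kxi.getData;args;`;(0#`)!())                         / same call shape as the .kxi.sql example in the demo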

Service gateway (SG), resource coordinator (RC), and aggregator

  • The service gateway (SG) acts as a single entry point for all client queries and API requests, routing them to the appropriate microservices
  • The resource coordinator (RC) manages routing and orchestrates query execution by determining which DAPs are best suited to fulfill each request
  • The aggregator collects and merges partial results from multiple DAPs into a single, unified response, enabling seamless federation of data across distributed sources

Typical workflow

  1. Data is ingested via the stream processor, which performs real-time transformations before passing data to either the reliable transport or tickerplant for sequencing and delivery.
  2. The storage manager ensures that data is efficiently persisted and tiered for both immediate and long-term access.
  3. When a query or API request is made, the service gateway receives the request and collaborates with the resource coordinator to identify the relevant data access process.
  4. Each data access process processes its portion of the data before the aggregator merges the partial results into a single response for clients.

This architecture enables unified, high-performance access to both real-time and historical data, all orchestrated within a single tool, without the integration complexities of traditional multi-tool stacks.

Example deployment

Let’s explore deploying, operating, and extending a real-time kdb Insights SDK architecture using the runbook-kdb-insights repository.

To begin, we will clone the repository and prepare the environment:

Bash
git clone https://github.com/RyanSieglerKX/runbook-kdb-insights.git
cd runbook-kdb-insights
mkdir -p data/db data/logs lic
chmod 777 -R data

We will also need to copy our free kdb license into the lic directory:

Bash
cp /path/to/k[4,c,x].lic lic/

Next, we will install the KXI CLI and configure ~/.insights/cli-config:

INI
[default]
usage = microservices
hostname = http://localhost:8080

We need to ensure that Docker is installed and running, with the WSL integration setting enabled if running on Windows. We also need to authenticate with the KX Docker registry:

Bash
docker login portal.dl.kx.com -u <user> -p <bearer token>

If you would like to query and call APIs with q, you will also need to install kdb+/q and add it to your PATH:

Bash
export QHOME=~/q
export PATH=~/q/l64/:$PATH

Next, we will build the microservices stack, launching the core architecture configured in compose.yaml. This includes the reliable transport (RT), storage manager (SM), data access (DA), and supporting services:

Bash
docker compose up

With our microservices running, we can visualize system health by monitoring with Grafana:

Bash
docker compose -f compose-metrics.yaml up

By opening localhost:3000 in a browser, we can view the Grafana dashboards, which provide real-time visibility into throughput, latency, and resource usage.

Data ingestion

Once deployed, we can begin publishing data directly into the system using reliable transport (RT). In this instance, we will publish a simple CSV file named trade.csv:

Bash
kxi publish --mode rt --file-format csv --table trade --data config/trade.csv --endpoint :localhost:5002

We can also start a synthetic Kafka feed via the stream processor (SP), which will ingest sample trade and quote data and apply real-time q transformations before passing the results to downstream services:

Bash
docker compose -f compose-stream.yaml up

Data query

Let’s now explore some basic querying. To begin, we will use SQL via the CLI:

Bash
kxi query --sql 'SELECT * FROM trade'
kxi query --sql 'SELECT count(*) FROM quote'

We can also query using q:

Q (kdb+ database)
q) gw:hopen `:localhost:5050
q) gw(`.kxi.sql;enlist[`query]!enlist"SELECT * FROM trade WHERE (sym = 'AAPL')";`;(0#`)!())

One of the more powerful features is the ability to deploy custom analytics as microservices, exposing q functions as RESTful APIs.

There are two custom APIs defined in the ./custom/1.0.0/ directory.

The first, .example.daAPI, provides a simple function that multiplies a specified column in a given table:

Bash
curl -X POST http://localhost:8080/example/daAPI \
  -H 'Content-Type: application/json' \
  -d '{
    "table": "trade",
    "column": "price",
    "multiplier": 10
  }'
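
For context, the q side of such an API can be as small as a single function. The definition below is only a hypothetical sketch of what a column-multiplying function might look like, assuming the JSON payload arrives as a q dictionary; the actual implementation and its registration live in ./custom/1.0.0/ in the repository:

Q (kdb+ database)
/ Hypothetical sketch (not the repository's actual code)
.example.daAPI:{[args]
  t:value args[`table];                                 / resolve the target table by name
  c:args[`column];                                      / column to scale
  ![t;();0b;enlist[c]!enlist(*;args[`multiplier];c)]    / return the table with the column multiplied
  }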

The second, .custom.aj, performs an aj (as-of join) between the trade and quote tables for a given symbol:

Bash
curl -X POST http://localhost:8080/custom/aj \
  -H 'Content-Type: application/json' \
  -d '{
    "tradesTable": "trade",
    "quotesTable": "quote",
    "sym": "AAPL"
  }'
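
Again purely for illustration, a minimal as-of-join implementation could look like the hypothetical sketch below, assuming the usual trade and quote schemas with sym and time columns (the real definition is in ./custom/1.0.0/):

Q (kdb+ database)
/ Hypothetical sketch (not the repository's actual code)
.custom.aj:{[args]
  s:args[`sym];                                         / symbol to join on
  trades:value args[`tradesTable];                      / resolve the trade table by name
  quotes:value args[`quotesTable];                      / resolve the quote table by name
  aj[`sym`time;select from trades where sym=s;select from quotes where sym=s]
  }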

You can also call these APIs from q:

Q (kdb+ database)
q) gw(`.custom.aj;(`tradesTable;`quotesTable;`sym)!(`trade;`quote;`AAPL);`;(0#`)!())

Finally, we can view the real-time logs for each microservice:

Bash
docker compose logs -f kxi-rt    # Reliable Transport
docker compose logs -f kxi-sm    # Storage Manager
docker compose logs -f kxi-da    # Data Access
docker compose logs -f sp-worker # Stream Processor

System cleanup

Bash
docker compose down --remove-orphans
./RESET_DB.sh

kdb Insights SDK offers a range of advantages that distinguish it from traditional, multi-tool data stacks, particularly for teams developing real-time, high-performance analytics applications at scale. Built on the kdb+ engine, it delivers a single, integrated platform for ingesting, processing, storing, and querying time-series data with native support for languages including q, SQL, and Python.

The result is a faster time-to-market for new analytics applications, more reliable and actionable insights, and the agility to adapt as business requirements change, all while simplifying your data stack and reducing the total cost of ownership.

Learn more about kdb Insights SDK and read my other blogs on kx.com.

 
