Market data magic with kdb Insights SDK

Author

Ryan Siegler

Data Scientist

Key Takeaways

  1. Enable seamless analytics across real-time and historical time-series data using a single engine.
  2. Modular components like microservices, REST APIs, and CI/CD-ready packaging empower developers to build, deploy, and manage custom analytics pipelines and applications.
  3. Built for modern DevOps environments, it supports containerized deployment and Kubernetes orchestration across AWS, GCP, and Azure.

Imagine building a real-time analytics platform that ingests, processes, and analyzes billions of events per day, without wrestling with a tangled web of open-source tools. No more late nights debugging Kafka-Spark-Redis pipelines. No more glue code. Just pure, high-performance streaming, all in one place.

In this blog, I will introduce kdb Insights SDK, a comprehensive solution that enables you to deploy, scale, and manage real-time data pipelines with a single, unified technology stack. We’ll explore its architecture, walk through a hands-on demo, and see why it’s a game-changer for anyone building mission-critical, real-time applications.

kdb Insights SDK architecture

At the heart of kdb Insights SDK lies a modern, microservices-driven architecture designed to make real-time data engineering both powerful and approachable. Unlike traditional setups that require integrating a patchwork of open-source tools, kdb Insights SDK delivers a unified platform where every core capability is purpose-built and seamlessly connected.

Key components

Stream processor (SP)

The stream processor (SP) performs real-time ingestion of live data, often from sources like Kafka, REST APIs, or direct feeds, and applies high-speed transformations and analytics using the q language. It enables developers to perform complex computations, enrichments, or filtering to ensure only relevant, actionable data flows downstream.
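To give a flavor of what an SP pipeline looks like, here is a minimal sketch using the stream processor’s q API; the topic name, filter logic, and writer choice are illustrative assumptions, not the demo’s actual pipeline:

Q (kdb+ database)
/ Minimal SP pipeline sketch: read from Kafka, filter, write onward.
/ Topic and transformation are illustrative; see the SP docs for reader options.
.qsp.run
  .qsp.read.fromKafka[`trades]             / ingest messages from a Kafka topic
  .qsp.map[{select from x where price>0}]  / keep only rows with a valid price
  .qsp.write.toConsole[]                   / swap for .qsp.write.toStream downstream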

Reliable transport and tickerplant

  • Reliable transport (RT) provides a fault-tolerant messaging backbone and should be chosen when reliability and replay are critical. It utilizes consensus protocols (such as Raft) to ensure data delivery even for slow or disconnected consumers
  • The kdb tickerplant (TP) (tick.q) offers ultra-low latency for high-frequency environments that demand the absolute minimum overhead. Use it as an alternative to RT when you can manage failover and recovery yourself (a minimal subscription sketch follows below)
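For context, subscribing to a classic tick.q tickerplant from a q process is a one-liner; the port is illustrative, and the subscriber defines an upd function to receive updates:

Q (kdb+ database)
q) h:hopen `:localhost:5010      / connect to the tickerplant (port illustrative)
q) h(".u.sub";`trade;`)          / subscribe to trade updates for all symbols
q) upd:{[t;x] .debug.last:(t;x)} / tick.q calls upd[table;data] on each tick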

Storage manager (SM)

The storage manager (SM) orchestrates the lifecycle of data, from hot in-memory storage to cold, cost-efficient archives. It ensures that real-time and historical data are always available, automatically managing data migration, compaction, and tiering. It manages three tiers: the real-time database (RDB), which stores the most recent data in memory; the intra-day database (IDB), which acts as an intermediate, on-disk tier for recent data within the current day; and the historical database (HDB), which stores historical data, typically partitioned by date on disk or object storage for cost efficiency.
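To make the HDB tier concrete, the snippet below shows the canonical on-disk layout of a date-partitioned kdb+ database and how q maps it; the paths and dates are illustrative:

Q (kdb+ database)
/ Illustrative date-partitioned layout on disk:
/   db/2024.06.01/trade/   db/2024.06.01/quote/
/   db/2024.06.02/trade/   ...
\l db                                / map the database root
select count i by date from trade    / partitions are mapped lazily, not loaded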

Data access process (DAP)

The data access process (DAP) provides a federated, read-only interface to all data, regardless of where it resides (RDB, IDB, HDB). Whether data is streaming in real time or archived on disk, DAP lets you access it through a unified API using q, SQL, or Python (via PyKX).
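For example, a unified read through the service gateway with the getData API can be issued from q as below; the port matches the demo later in this post, and the time window is an illustrative assumption:

Q (kdb+ database)
q) gw:hopen `:localhost:5050
q) args:`table`startTS`endTS!(`trade;2024.06.01D00:00:00;2024.06.02D00:00:00)
q) gw(`.kxi.getData;args;`;(0#`)!())   / same (API;args;callback;options) pattern as the examples below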

Service gateway (SG), resource coordinator (RC), and aggregator

  • The service gateway (SG) acts as a single entry point for all client queries and API requests, routing them to the appropriate microservices
  • The resource coordinator (RC) manages routing and orchestrates query execution by determining which DAPs are best suited to fulfill each request
  • The aggregator collects and merges partial results from multiple DAPs into a single, unified response, enabling seamless federation of data across distributed sources

Typical workflow

  1. Data is ingested via the stream processor, which performs real-time transformations before passing it to either the reliable transport or the tickerplant for sequencing and delivery.
  2. The storage manager ensures that data is efficiently persisted and tiered for both immediate and long-term access.
  3. When a query or API request is made, the service gateway receives the request and collaborates with the resource coordinator to identify the relevant data access process.
  4. Each data access process serves its portion of the data before the aggregator merges the partial results into a single response for clients.

This architecture enables unified, high-performance access to both real-time and historical data, all orchestrated within a single tool, without the integration complexities of traditional multi-tool stacks.

Example deployment

Let’s explore deploying, operating, and extending a real-time kdb Insights SDK architecture using the runbook-kdb-insights repository.

To begin, we will clone the repository and prepare the environment:

Bash
git clone https://github.com/RyanSieglerKX/runbook-kdb-insights.git
cd runbook-kdb-insights
mkdir -p data/db data/logs lic
chmod 777 -R data

We will also need to copy our free kdb+ license into the lic directory:

Bash
cp /path/to/k[4,c,x].lic lic/

Next, we will install the KXI CLI and configure ‘~/.insights/cli-config’:

Bash
[default]
usage = microservices
hostname = http://localhost:8080

We need to ensure that Docker is installed and running with the WSL integration setting enabled. We also need to authenticate with the KX Docker registry:

Bash
docker login portal.dl.kx.com -u <user> -p <bearer token>

If you would like to query and call APIs with q, you will also need to install kdb+/q and set the environment variables:

Bash
export QHOME=~/q
export PATH=~/q/l64/:$PATH

Next, we will build the microservices stack, launching the core architecture configured in compose.yaml. This includes the reliable transport (RT), storage manager (SM), data access (DA), and supporting services:

Bash
docker compose up

With our microservices running, we can monitor system health with Grafana:

Bash
docker compose -f compose-metrics.yaml up

By opening localhost:3000 in a browser, we can view the Grafana dashboards, which provide real-time visibility into throughput, latency, and resource usage.

Data ingestion

Once deployed, we can begin publishing data directly into the system using reliable transport (RT). In this instance, we publish a simple CSV file named trade.csv:

Bash
kxi publish --mode rt --file-format csv --table trade --data config/trade.csv --endpoint :localhost:5002

We can also start a synthetic Kafka feed via the stream processor (SP), which will ingest sample trade and quote data and apply real-time q transformations before passing the results to downstream services:

Bash
docker compose -f compose-stream.yaml up

Data query

Let’s now explore some basic querying. To begin, we will use SQL via the CLI:

Bash
kxi query --sql 'SELECT * FROM trade'
kxi query --sql 'SELECT count(*) FROM quote'

We can also query using q:

Q (kdb+ database)
q) gw:hopen `:localhost:5050   / connect to the service gateway
q) gw(`.kxi.sql;enlist[`query]!enlist"SELECT * FROM trade WHERE (sym = 'AAPL')";`;(0#`)!())   / (API;args;callback;options)

One of the more powerful features is the ability to deploy custom analytics as microservices, exposing q functions as RESTful APIs.

There are two custom APIs defined in the directory ./custom/1.0.0/:

The first, .example.daAPI, provides a simple function that multiplies a specified column in a given table:

Bash
curl -X POST http://localhost:8080/example/daAPI \
  -H 'Content-Type: application/json' \
  -d '{
    "table": "trade",
    "column": "price",
    "multiplier": 10
  }'
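Under the hood, the q implementing such an API can be very small. The sketch below is hypothetical (the function name, table lookup, and parameter handling are assumptions, not the repository’s exact code):

Q (kdb+ database)
/ Hypothetical sketch of a column-multiplying API in q
daAPI:{[tbl;col;mult]
  t:get tbl;                                   / look up the table by name
  ![t;();0b;(enlist col)!enlist(*;col;mult)]   / functional update: col*mult
  }
/ Example: daAPI[`trade;`price;10]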

The second, .custom.aj, performs an aj (as-of join) between the trade and quote tables for a given symbol:

Bash
curl -X POST http://localhost:8080/custom/aj \
  -H 'Content-Type: application/json' \
  -d '{
    "tradesTable": "trade",
    "quotesTable": "quote",
    "sym": "AAPL"
  }'
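Conceptually, this wraps q’s built-in aj, which matches each trade with the prevailing quote as of the trade’s timestamp (assuming both tables share sym and time columns):

Q (kdb+ database)
q) aj[`sym`time;select from trade where sym=`AAPL;quote]   / prevailing quote per AAPL trade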

You can also call these APIs from q:

Q (kdb+ database)
q) gw(`.custom.aj;(`tradesTable;`quotesTable;`sym)!(`trade;`quote;`AAPL);`;(0#`)!())

Finally, we can view the real-time logs for each microservice:

Bash
docker compose logs -f kxi-rt    # Reliable Transport
docker compose logs -f kxi-sm    # Storage Manager
docker compose logs -f kxi-da    # Data Access
docker compose logs -f sp-worker # Stream Processor

System cleanup

When finished, we can tear down the stack and reset the database:

Bash
docker compose down --remove-orphans
./RESET_DB.sh

kdb Insights SDK offers a range of advantages that distinguish it from traditional, multi-tool data stacks, particularly for teams developing real-time, high-performance analytics applications at scale. Built on the kdb+ engine, it delivers a single, integrated platform for ingesting, processing, storing, and querying time-series data with native support for languages including q, SQL, and Python.

The result is a faster time-to-market for new analytics applications, more reliable and actionable insights, and the agility to adapt as business requirements change, all while simplifying your data stack and reducing the total cost of ownership.

Learn more about kdb Insights SDK and read my other blogs on kx.com.
