Democratizing analytics: Natural language querying with kdb+

Author

Michael Ryaboy

Developer Advocate

Key Takeaways

  1. kdb+ is engineered for ultra-low latency and high-throughput environments.
  2. Combining kdb+ with natural language capability democratizes complex query creation.
  3. Automating visualizations can be achieved via a simple function or with tools such as pandas-ai.

In finance, billion-row time series datasets are common, which is why financial institutions have long trusted kdb+, the world’s fastest time series database for mission-critical workloads. Unlike traditional solutions, which move data to external tools for processing, kdb+ executes queries directly using its vectorized language, q. This minimizes latency and avoids the overhead of data movement, but can require careful planning and specialist skills when crafting complex queries.
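
To make "executing directly in q" concrete, here is a minimal PyKX sketch. The table and column names below are illustrative, not the dataset used later in this blog: the point is that the aggregation runs inside q, and only the small result crosses back into Python.

Python
import pykx as kx

# Build a tiny illustrative table in q (these names are made up)
kx.q['quotes'] = kx.Table(data={
    'sym': ['AAPL', 'AAPL', 'TSLA'],
    'px': [189.1, 189.4, 251.2],
})

# The vectorized aggregation runs inside q, not in Python
print(kx.q('select avgPx: avg px by sym from quotes'))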

In this blog, I will demonstrate how to democratize this process, combining kdb+ and PyKX (KX’s Python interface for kdb+) with natural language understanding to accelerate time to value.

I’ll show you how to:

  • Import millions of rows into a kdb+ table
  • Generate queries via an LLM
  • Convert returned data frames to pandas
  • Create a plot function

If you’d like to follow along, you can do so via my Colab notebook.

Let’s begin.

Data ingestion

To begin, we will install PyKX (KX’s Python interface for kdb+) under its free personal license. Then, for simplicity and convenience, we will ingest a sample stock dataset containing 3 million minute-level time series records.

Python
import pandas as pd
import pykx as kx

# Load a sample CSV and push it into q as 'trades'
pandas_tbl = pd.read_csv(
    'https://huggingface.co/datasets/opensporks/stocks/resolve/main/stocks.csv',
    parse_dates=['timestamp']
)
pandas_tbl = pandas_tbl.rename(columns={"close": "closePrice"})
tbl = kx.toq(pandas_tbl)         # DataFrame → q Table
kx.q['trades'] = tbl             # assign to the q global 'trades'

Let’s examine the results:

Python
# Get the table's column datatypes and keep them in a variable
table_desc = tbl.dtypes
print(table_desc)

# Get the earliest and latest timestamps
# (min/max rather than first/last, so this works even if the table isn't time-sorted)
min_date = kx.q('min trades`timestamp').pd()
max_date = kx.q('max trades`timestamp').pd()

print(f"\nMinimum timestamp: {min_date}")
print(f"Maximum timestamp: {max_date}")

[Screenshot: table datatypes output]

As the output shows, the table contains ticker symbols, floats, and timestamps.

Natural language translation

Next, we’ll create a translator that converts plain English questions into qSQL queries. I’m using this method because LLMs are not yet fully proficient in writing q, kdb+’s native query language. However, because qSQL is broadly ANSI-compliant, we can bridge the gap by prompting our LLM with the subtle differences.

Python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = 'Create a valid sql query based on this table schema and natural language query'

class TranslateResponse(BaseModel):
    qsql: str

def translate_to_qsql(nl_query: str, table: str, schema: str) -> str:
    user = (
        f"Table: {table}\n"
        f"Schema: {schema}\n"
        f"Question: {nl_query}"
    )
    resp = client.beta.chat.completions.parse(
        model="o3",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",   "content": user},
        ],
        response_format=TranslateResponse,
    )
    return resp.choices[0].message.parsed.qsql
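
As a quick sanity check, you can call the translator directly. The question below is just an example, and the returned query will vary from run to run.

Python
# Illustrative call; the exact query returned will differ between runs
schema_text = str(tbl.dtypes)
print(translate_to_qsql("average closePrice for each sym", "trades", schema_text))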

In the full notebook, the system prompt contains detailed instructions about qSQL syntax and best practices, which helps ensure valid and efficient queries for kdb+. This is very much a work in progress, so I welcome any feedback or suggestions from community members.
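
For illustration, that guidance might look something like the sketch below. These rules are my paraphrase and assumptions, not the notebook’s exact prompt.

Python
# An illustrative sketch of richer prompt guidance; the notebook's actual
# prompt differs, and these rules are assumptions rather than a spec.
SYSTEM_PROMPT = (
    "Create a valid SQL query for kdb+ based on this table schema and "
    "natural language question.\n"
    "- Return a single SELECT statement with no comments or DDL.\n"
    "- Use only the columns listed in the schema.\n"
    "- Quote string and symbol literals with single quotes, e.g. sym = 'AAPL'.\n"
    "- Prefer aggregating in the query rather than returning raw rows."
)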

Automated visualization

When queries are processed and returned, we’ll want AI to generate the appropriate visualization based on their associated data structures. In this instance, I have created my own visualization function; however, you might also consider using pandas-ai, which is a more robust and feature-rich solution.
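
If you go the pandas-ai route instead, usage might look roughly like this. This is a sketch based on its v2-era SmartDataframe interface; check the current docs, as the API changes between releases.

Python
from pandasai import SmartDataframe
from pandasai.llm import OpenAI as PandasAILLM

# df is any pandas DataFrame; the api_token value is a placeholder
sdf = SmartDataframe(df, config={"llm": PandasAILLM(api_token="YOUR_KEY")})
sdf.chat("Plot the average closePrice per sym as a bar chart")

Here is my own function: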

Python
import json
import plotly.express as px

def visualize_df_with_plotly(df, model="gpt-4o", sample_size=20, temperature=0):
    """
    Given a pandas DataFrame df and an OpenAI-compatible client,
    infers a schema and sample, asks the model to generate Plotly Express code,
    and then executes and displays the resulting chart.
    """
    # 1. Build schema and sample, converting timestamps to strings
    sample_df = df.head(sample_size).copy()
    for col, dtype in sample_df.dtypes.items():
        if "datetime" in str(dtype).lower() or "timestamp" in str(dtype).lower():
            sample_df[col] = sample_df[col].astype(str)
    sample = sample_df.to_dict(orient="list")
    schema_auto = {col: str(dtype) for col, dtype in df.dtypes.items()}

    # 2. Prepare prompts
    system = (
        "You're an expert Python data-visualizer. "
        "Given a DataFrame schema and sample data, "
        "write self-contained Plotly Express code that builds a figure "
        "from the DataFrame and assigns it to a variable named fig, "
        "but do not call fig.show(); I will handle displaying. "
        "You may compute and print supporting aggregations, "
        "as long as the figure is created. "
        "GIVE ME ONLY CODE AND NOTHING ELSE. "
        "Do not rewrite the values in the df; assume it exists globally as df. "
        "Think about what the user wants in the query and provide it. "
        "Never hardcode values."
    )
    user = (
        f"Schema: {json.dumps(schema_auto)}\n\n"
        f"Sample data: {json.dumps(sample, default=str)}"
    )
    # 3. Call the chat completion API
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user",   "content": user}
        ],
        temperature=temperature,
    )
    # 4. Extract the code from the response, stripping markdown fences if present
    chart_code = resp.choices[0].message.content
    if chart_code.startswith("```"):
        chart_code = chart_code.split("\n", 1)[1].rsplit("\n```", 1)[0]
    print(chart_code)
    # 5. Execute the generated code with df and px in scope
    local_vars = {"df": df, "px": px}
    exec(chart_code, {}, local_vars)
    # 6. Display the figure
    fig = local_vars.get("fig")
    if fig is not None:
        fig.show()
    else:
        raise RuntimeError("The generated code did not produce a fig object.")
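
To try the function on its own, you can feed it any DataFrame; the toy frame below is made up purely for illustration. Note that the generated code runs via exec, so in a production setting you would want to review or sandbox it first.

Python
import pandas as pd

# Made-up data purely to exercise the function
demo_df = pd.DataFrame({
    'sym': ['AAPL', 'TSLA', 'MSFT'],
    'avgPrice': [189.2, 250.4, 402.7],
})
visualize_df_with_plotly(demo_df)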

Plotting

Finally, we will combine everything into a powerful new function named query_and_plot, which will enable plot outputs to be generated from a natural language query.

Python
def query_and_plot(nl_query: str, table: str = "trades") -> None:
    """
    Given a natural-language question about the kdb+ table,
    translates it to qSQL, runs it, and renders an appropriate chart.
    """
    # --- translate to qSQL ---
    # derive the schema from the table in q
    q_tbl = kx.q[table]
    schema = str(q_tbl.dtypes)
    print(schema)

    # call the existing translator (assumes translate_to_qsql is defined globally)
    qsql = translate_to_qsql(nl_query, table, schema)
    print("🔧 Generated qSQL:\n", qsql)
    # --- execute the query via PyKX's SQL interface ---
    result = kx.q.sql(qsql)
    df = result.pd()
    print("Resulting df after AI query")
    display(df.head())
    print("\n── Resulting df description ──")
    display(df.describe(include="all"))
    visualize_df_with_plotly(df)

Real-world testing

Let’s now test the solution with some real-world examples. Remember, LLM outputs can be inconsistent, so your results may differ.

Example 1: Cumulative returns

To begin, I will ask the agent to chart the cumulative returns for both Tesla (TSLA) and Apple (AAPL).

Python
query_and_plot("Chart the cumulative returns for TSLA and AAPL")

[Chart: cumulative returns for TSLA and AAPL]

What this generates:

  • A q-SQL query that calculates cumulative returns
  • A multi-line chart showing performance over time

It also automatically handles date parsing and percentage calculations.

Example 2: Statistical analysis

Next, I will ask the agent to calculate the average price of each stock in our dataset.

Python
query_and_plot("what is the average price for each stock?"

[Chart: statistical analysis of average prices]

We can also ask which date had the highest closing price for each.

Python
query_and_plot("Which day had the highest closing price for each stock?")

[Chart: average price by sym]

What these queries generate:

  • Grouping queries that aggregate by stock symbol
  • An appropriate bar chart or table visualization
  • Clear labelling and formatting

By combining kdb+’s performance with AI’s natural language capabilities, we not only create a powerful tool for time series analysis but also open complex query building to anyone who can communicate in plain English. Both serve to enhance decision-making and accelerate new possibilities for innovation and growth.

If you enjoyed this blog, please check out my others on kx.com and connect with me on Slack. You can also find all of the code used in this blog in my Colab notebook.
