PyKX 3.0: Easier to use and more powerful than ever

Author

Daniel Baker

Head of Builder Content

Published

12 November, 2024

Since its initial release in February 2022, PyKX has allowed customers to extend the power of kdb+ to Python developers. While this has been successful in dramatically changing the ways that many organizations interact with our technology, there have been several requests to expand the library.

With the release of PyKX 3.0, we have made significant strides in addressing these requests. In this blog, I’ll outline the features included in the release and showcase some examples for the larger updates.

What’s new with PyKX?

This release is the culmination of 6 months of development by the PyKX team. The enhancements and updates to the library are wide-ranging, but the two headline features are as follows:

A significant upgrade to the PyKX query API to support Python-first syntax, increasing the number of users who can develop analytics that query kdb+ on-disk databases and in-memory tables.
The addition of a streaming module allows users to develop high-performance streaming applications for high-velocity data ingestion and persistence.

In addition to these headline features, updates have been made in the following areas:

Migration of all beta features introduced in PyKX 2.x to full production support (see below)
Enhancements to the IPC reconnection logic allow users to control how reconnection attempts are made at a more granular level
The addition of Python-first functionality for:
- Reordering columns
- Detecting invalid column names in kdb+ tables
- Constructing temporal kdb+ objects
- Calling single-character operators in the q language
- Generating/modifying enumeration vectors and atoms
The addition of functionality to allow q-first development within a Python Jupyter Kernel, giving q developers more flexibility in where they can develop code
The addition of support for the Python help command on all PyKX keywords
Improvements to the workflow for installing PyKX licenses, allowing users to point to an already downloaded license
Addition of the function ‘kx.util.install_q’ to allow users to download the q binary and required library to a location of choice

View the full release notes.

Query API upgrades

Querying data with PyKX prior to version 3.0 provided users with a few options:

Use the Pandas-like API to query in-memory datasets Pythonically.
Query in-memory/on-disk databases using SQL.
Adopt and learn some basic q to allow analytic queries to be generated.

The upgrades to PyKX provided with the 3.0 release allow a full Python experience for users looking to query massive on-disk databases or in-memory tables. This significantly improves the ease of use of PyKX.

A few examples of this updated syntax are as follows:

Calculate by symbol the daily open, high, low, close, and volume information for minutely data saved to disk in a table named “minutely”.

daily_ohlc = db.minutely.select(
    columns = kx.Column('open').first() &
              kx.Column('close').last() &
              kx.Column('high').max() &
              kx.Column('low').min() &
              kx.Column('volume').sum(),
    by = kx.Column('date') & kx.Column('sym')
    ).sort_values('date').reset_index()
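To make the aggregation explicit, here is a minimal sketch of the same grouping in plain Python. It does not use PyKX; the sample rows, the tuple layout, and the helper name `daily_ohlc_plain` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical minutely bars: (date, sym, open, high, low, close, volume),
# assumed to be in time order within each day.
minutely = [
    ("2024-11-11", "eur_usd", 1.070, 1.072, 1.069, 1.071, 100),
    ("2024-11-11", "eur_usd", 1.071, 1.075, 1.070, 1.074, 150),
    ("2024-11-11", "gbp_usd", 1.290, 1.291, 1.288, 1.289, 80),
    ("2024-11-12", "eur_usd", 1.074, 1.074, 1.066, 1.068, 200),
]

def daily_ohlc_plain(rows):
    """Group rows by (date, sym) and aggregate each group into one daily bar."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)

    result = {}
    for key in sorted(groups):  # sorted by date, then sym
        bars = groups[key]
        result[key] = {
            "open": bars[0][2],               # first open of the day
            "close": bars[-1][5],             # last close of the day
            "high": max(b[3] for b in bars),  # highest high
            "low": min(b[4] for b in bars),   # lowest low
            "volume": sum(b[6] for b in bars),
        }
    return result

summary = daily_ohlc_plain(minutely)
for key, bar in summary.items():
    print(key, bar)
```

The PyKX query expresses the same first/last/max/min/sum logic declaratively, leaving the grouping to kdb+ rather than to a Python loop.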

Calculate the volatility of daily prices by symbol from the queried data above.

daily_ohlc.select(
    kx.Column('close').ratios().drop(1).log().dev().name('volatility'),
    by = kx.Column('sym')
    ).sort_values('volatility')
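The chained calls above read left to right: take the ratio of each close to the previous one, drop the first item (which has no predecessor), take the natural log, then compute the deviation. As a rough plain-Python sketch of that calculation, assuming `dev` behaves like q's population standard deviation; the sample prices and the helper name `volatility` are hypothetical:

```python
import math
import statistics

def volatility(closes):
    """Population standard deviation of log returns for one symbol."""
    # Ratio of each close to its predecessor. Pairing closes with closes[1:]
    # skips the first item, mirroring ratios() followed by drop(1).
    ratios = [current / previous for previous, current in zip(closes, closes[1:])]
    log_returns = [math.log(r) for r in ratios]
    # pstdev is the population standard deviation.
    return statistics.pstdev(log_returns)

# Hypothetical daily closes per symbol.
closes_by_sym = {
    "eur_usd": [1.0, 1.1, 1.21],  # steady 10% moves, so almost no dispersion
    "gbp_usd": [1.0, 1.5, 1.0],   # up 50% then back down
}

vol = {sym: volatility(closes) for sym, closes in closes_by_sym.items()}
for sym in sorted(vol, key=vol.get):  # ascending, like sort_values('volatility')
    print(sym, vol[sym])
```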

Delete from the in-memory table any rows where the trade volume for eur_usd exceeds the average volume.

daily_ohlc.delete(
    where = [
            (kx.Column('sym') == 'eur_usd'),
            (kx.Column('volume') > kx.Column('volume').avg())
            ])
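One detail worth spelling out: in q, multiple where conditions are applied in order, so each condition only sees the rows that passed the previous one. Assuming the list passed to `where` follows the same rule, the average in the second condition is taken over eur_usd rows only, not over the whole table. The plain-Python sketch below illustrates that cascade; the sample rows and the helper name `delete_where` are hypothetical.

```python
from statistics import mean

rows = [
    {"sym": "eur_usd", "volume": 100},
    {"sym": "eur_usd", "volume": 300},
    {"sym": "gbp_usd", "volume": 1000},
    {"sym": "gbp_usd", "volume": 50},
]

def delete_where(table):
    """Delete eur_usd rows whose volume exceeds the average eur_usd volume."""
    # Condition 1: restrict to eur_usd rows.
    candidates = [i for i, row in enumerate(table) if row["sym"] == "eur_usd"]
    if not candidates:
        return list(table)
    # Condition 2: evaluated only over the rows that survived condition 1.
    threshold = mean(table[i]["volume"] for i in candidates)
    to_delete = {i for i in candidates if table[i]["volume"] > threshold}
    return [row for i, row in enumerate(table) if i not in to_delete]

remaining = delete_where(rows)
print(remaining)
```

With these sample rows the eur_usd average is 200, so the 300-volume row is deleted. Had the average been taken across the whole table (362.5), no eur_usd row would have qualified.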

The above examples show a few of the key behaviors facilitated by this API:

Analytics can be applied to and chained off ‘kx.Column’ objects to generate complex analytics
Comparisons between columns and values are supported using familiar Python syntax
Queries of type select, exec, update, and delete are all supported with this new syntax

For documentation on this API and an outline of some more complex examples, see our technical documentation.

Streaming with PyKX

One of the most powerful aspects of kdb+/q is its ability to combine high-velocity real-time data with historical data in streaming applications; in most literature relating to kdb+, this is referred to as a tickerplant infrastructure. Prior to PyKX 3.0, users could enhance their existing infrastructures by deploying PyKX as a q extension; however, this process was not formalized or standardized.

PyKX 3.0 introduces a simple syntax and standardized approach for the creation of streaming workflows orchestrated from Python. Contained within the TICK module, this allows users to complete the following at the most basic level:

Capture and log raw ingested data to facilitate data replay in failure scenarios.
Generate a real-time database that persists data at the end of the day.
Query historical data.
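The log-then-replay idea in the first point can be sketched without any kdb+ components. The toy below is not the PyKX tick module and does not reflect its log format. The class name `LoggedFeed`, the JSON-lines log, and the sample trades are all assumptions chosen to keep the illustration self-contained.

```python
import json
import os
import tempfile

class LoggedFeed:
    """Toy stand-in for a tickerplant: log every message, then publish it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.subscribers = []

    def publish(self, table, row):
        # Write to the log first so the message survives a downstream failure.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([table, row]) + "\n")
        for callback in self.subscribers:
            callback(table, row)

def replay(log_path, callback):
    """Feed every logged message back through callback; return the count."""
    count = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            table, row = json.loads(line)
            callback(table, row)
            count += 1
    return count

log_path = os.path.join(tempfile.mkdtemp(), "tick.log")
feed = LoggedFeed(log_path)

rdb = []  # in-memory state of a live subscriber
feed.subscribers.append(lambda table, row: rdb.append((table, row)))
feed.publish("trade", {"sym": "eur_usd", "price": 1.07, "volume": 100})
feed.publish("trade", {"sym": "gbp_usd", "price": 1.29, "volume": 80})

# Simulate a subscriber that restarted and lost its in-memory state.
recovered = []
replayed = replay(log_path, lambda table, row: recovered.append((table, row)))
print(replayed, recovered == rdb)
```

Because every message is written before it is published, a restarted subscriber can rebuild its state from the log alone.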

Once users are happy that they can capture and persist their data, more complex operations can be added.

Include real-time stream analytics to collect insights from data or alert on issues with mission-critical use cases.
Add complex query APIs to processes, allowing for analysis on fast or vast data.
Generate analytics that span real-time and historical data by adding a query gateway.

The examples below show some of the basic operations that can be completed. For a more detailed worked example, see our breakdown of building your first real-time ingestion infrastructure.

Generate a tickerplant, real-time database, and historical database using the “basic” command

>>> trade = kx.schema.builder({
...     'time': kx.TimespanAtom,
...     'sym': kx.SymbolAtom,
...     'price': kx.FloatAtom,
...     'volume': kx.LongAtom
...     })
>>> basic = kx.tick.BASIC(tables={'trade': trade}, database='db')
>>> basic.start()

Add a query API to the historical database (HDB) generated above to get the count of trades for a supplied symbol

>>> def symbol_count(symbol):
...     data = kx.q['trade'].exec(
...         columns = kx.Column('sym').count(),
...         where = kx.Column('sym') == symbol
...     )
...     return data
>>> basic.hdb.register_api('symbol_count', symbol_count)

Add a real-time processor calculating derived analytics for your real-time data

>>> def postprocessor(table, data):
...     kx.q['agg'] = kx.q[table].select(
...         columns = kx.Column('price').min().name('min_px') &
...                   kx.Column('price').max().name('max_px'),
...         by = kx.Column('sym'))
>>> rtp = kx.tick.RTP(port=5014,
...                   subscriptions = ['trade'],
...                   libraries={'kx': 'pykx'},
...                   post_processor=postprocessor,
...                   vanilla=False)
>>> rtp.start({'tickerplant': 'localhost:5010'})

For documentation on this API and an outline of some more complex examples, see our technical documentation.

From the outset of this release, we aimed to give everyone who uses the library, from newcomers to those supporting its development, enhancements that fit into their day-to-day operations while also enabling new use cases and making it easier to onboard new users.

As always, this release has been entirely driven by interactions with our clients and discussions with users at Python and KX events. Over the coming weeks, we will be releasing deep-dive blogs on the Query and Streaming APIs alongside a more general deep dive into some of the smaller but equally powerful updates.

If you wish to discuss anything PyKX-related, you can contact the PyKX development team through:

GitHub Issues/Discussions on the PyKX repository
Joining the KX Community Slack and contacting us in the #pykx channel
Opening a forum discussion on the KX Learning Hub
