PyKX 3.0: Easier to use and more powerful than ever before

Conor McCarthy, VP Data Science
12 November 2024 | 6 minutes

Since its initial release in February 2022, PyKX has allowed customers to extend the power of kdb+ to Python developers. While this has been successful in dramatically changing the ways that many organizations interact with our technology, there have been several requests to expand the library.

With the release of PyKX 3.0, we have made significant strides in addressing these requests. In this blog, I’ll outline the features included in the release and showcase some examples for the larger updates.

What’s new with PyKX?

This release is the culmination of 6 months of development by the PyKX team. The enhancements and updates to the library are wide-ranging, but the two headline features are as follows:

  1. A significant upgrade to the PyKX query API to support Python-first syntax, increasing the number of users who can develop analytics to query kdb+ on-disk databases and in-memory tables.
  2. The addition of a streaming module, allowing users to develop high-performance streaming applications for high-velocity data ingestion and persistence.

In addition to these headline features, updates have been made in the following areas:

  • Enhancements to the IPC reconnection logic, allowing users to control at a more granular level how reconnection attempts are made
  • The addition of Python-first functionality for:
    • Reordering columns
    • Detecting invalid column names in kdb+ tables
    • Constructing temporal kdb+ objects
    • Calling single-character operators in the q language
    • Generating/modifying enumeration vectors and atoms
  • The addition of functionality to allow q-first development within a Python Jupyter kernel, giving q developers more flexibility in where they can develop code
  • The addition of support for the Python help command on all PyKX keywords
  • Improvements to the workflow for installing PyKX licenses, allowing users to point to an already downloaded license
  • The addition of the function ‘kx.util.install_q’, allowing users to download the q binary and required libraries to a location of their choice

View the full release notes.

Query API upgrades

Querying data with PyKX prior to version 3.0 provided users with a few options:

  1. Use the Pandas Like API to query in-memory datasets Pythonically.
  2. Query in-memory/on-disk databases using SQL.
  3. Adopt and learn some basic q to allow analytic queries to be generated.

The upgrades to PyKX provided with the 3.0 release allow a fully Pythonic experience for users looking to query massive on-disk databases or in-memory tables. This significantly improves the ease of use of PyKX.

A few examples of this updated syntax are as follows:

  1. Calculate by symbol the daily open, high, low, close, and volume information for minutely data saved to disk in a table named “minutely”.


daily_ohlc = db.minutely.select(
    columns = kx.Column('open').first() &
              kx.Column('close').last() &
              kx.Column('high').max() &
              kx.Column('low').min() &
              kx.Column('volume').sum(),
    by = kx.Column('date') & kx.Column('sym')
    ).sort_values('date').reset_index()

  2. Calculate the volatility of daily prices by symbol from the queried data above.


daily_ohlc.select(
    kx.Column('close').ratios().drop(1).log().dev().name('volatility'),
    by = kx.Column('sym')
    ).sort_values('volatility')

  3. Delete from an in-memory table any rows where the minutely trade volume for eur_usd exceeds the average volume.


daily_ohlc.delete(
    where = [
            (kx.Column('sym') == 'eur_usd'),
            (kx.Column('volume') > kx.Column('volume').avg())
            ])
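For intuition, the grouped aggregation in example 1 corresponds to the following plain-Python sketch over minutely bars (the sample bars here are hypothetical, and bars are assumed to arrive in time order within each day):

```python
from collections import defaultdict

def daily_ohlc(bars):
    """Mirror example 1: per (date, sym), take the first open, last close,
    max high, min low, and summed volume of the minutely bars."""
    grouped = defaultdict(list)
    for bar in bars:
        grouped[(bar['date'], bar['sym'])].append(bar)
    out = {}
    for key, rows in grouped.items():
        out[key] = {
            'open': rows[0]['open'],       # first() per group
            'close': rows[-1]['close'],    # last() per group
            'high': max(r['high'] for r in rows),
            'low': min(r['low'] for r in rows),
            'volume': sum(r['volume'] for r in rows),
        }
    return out

bars = [  # hypothetical minutely bars for one day and one symbol
    {'date': '2024.11.12', 'sym': 'eur_usd', 'open': 1.080, 'close': 1.081,
     'high': 1.082, 'low': 1.079, 'volume': 100},
    {'date': '2024.11.12', 'sym': 'eur_usd', 'open': 1.081, 'close': 1.079,
     'high': 1.083, 'low': 1.078, 'volume': 150},
]
print(daily_ohlc(bars)[('2024.11.12', 'eur_usd')])
```

In the PyKX version, the `&` operator between `kx.Column` aggregations builds up the select clause, and the `by` argument plays the role of the grouping key above.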

The above examples show a few of the key behaviors facilitated by this API:

  • Analytics can be applied to and chained off ‘kx.Column’ objects to generate complex analytics
  • Comparisons between columns and values are supported using familiar Python syntax
  • Queries of type select, exec, update, and delete are all supported with this new syntax

For documentation on this API and an outline of some more complex examples, see our technical documentation.
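As a cross-check on the chained analytic in example 2, here is a plain-Python sketch of what the ratios/drop/log/dev pipeline computes for a single symbol (this assumes q's ‘ratios’ yields consecutive price ratios once the raw first element is dropped, and that ‘dev’ is the population standard deviation; the sample closes are hypothetical):

```python
import math

def volatility(prices):
    """Mirror kx.Column('close').ratios().drop(1).log().dev():
    consecutive price ratios, logs of those ratios (log returns),
    then the population standard deviation."""
    ratios = [b / a for a, b in zip(prices, prices[1:])]  # ratios().drop(1)
    log_returns = [math.log(r) for r in ratios]           # .log()
    mean = sum(log_returns) / len(log_returns)
    var = sum((x - mean) ** 2 for x in log_returns) / len(log_returns)
    return math.sqrt(var)                                 # .dev()

closes = [100.0, 101.0, 100.5, 102.0]  # hypothetical daily closes
vol = volatility(closes)
print(vol)
```

A sequence with constant log returns, such as prices doubling each day, gives zero volatility under this definition, which is a quick sanity check on the chain.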

Streaming with PyKX

One of the most powerful aspects of kdb+/q is its ability to combine high-velocity real-time data with historical data in streaming applications; in most literature relating to kdb+, this is referred to as a tickerplant infrastructure. Prior to PyKX 3.0, users could enhance their existing infrastructures by deploying PyKX as a q extension; however, this process was not formalized or standardized.

PyKX 3.0 introduces a simple syntax and standardized approach for the creation of streaming workflows orchestrated from Python. Contained within the ‘kx.tick’ module, this allows users to complete the following at the most basic level:

  • Capture and log raw ingested data to facilitate data replay in failure scenarios.
  • Generate a real-time database that persists data at the end of the day.
  • Query historical data.

Once users are happy that they can capture and persist their data, more complex operations can be added.

  • Include real-time stream analytics to collect insights from data or alert on issues with mission-critical use cases.
  • Add complex query APIs to processes, allowing for analysis on fast or vast data.
  • Generate analytics that spans real-time and historical data by adding a query gateway.

The examples below show some of the basic operations that can be completed; for a more detailed worked example, see our breakdown of building your first real-time ingestion infrastructure.

  1. Generate a tickerplant, real-time database, and historical database using the ‘BASIC’ class


>>> trade = kx.schema.builder({
...     'time': kx.TimespanAtom,
...     'sym': kx.SymbolAtom,
...     'price': kx.FloatAtom,
...     'volume': kx.LongAtom
...     })
>>> basic = kx.tick.BASIC(tables={'trade': trade}, database='db')
>>> basic.start()

  2. Add a query API to the historical database (HDB) generated above to get the count of trades for a supplied symbol


>>> def symbol_count(symbol):
...     data = kx.q['trade'].exec(
...         columns = kx.Column('sym').count(),
...         where = kx.Column('sym') == symbol)
...     return data
>>> basic.hdb.register_api('symbol_count', symbol_count)

  3. Add a real-time processor calculating derived analytics for your real-time data


>>> def postprocessor(table, data):
...     kx.q['agg'] = kx.q[table].select(
...         columns = kx.Column('price').min().name('min_px') &
...                   kx.Column('price').max().name('max_px'),
...         by = kx.Column('sym'))
>>> rtp = kx.tick.RTP(port=5014,
...                   subscriptions = ['trade'],
...                   libraries={'kx': 'pykx'},
...                   vanilla=False)
>>> rtp.start({'tickerplant': 'localhost:5010'})
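To illustrate what the postprocessor above computes, here is a plain-Python sketch of the same per-symbol min/max price aggregation applied to a batch of (sym, price) records (the sample trades are hypothetical):

```python
def aggregate(trades):
    """Mirror the RTP postprocessor: per-symbol minimum and maximum price,
    as in: select min_px: min price, max_px: max price by sym from trade."""
    agg = {}
    for sym, price in trades:
        lo, hi = agg.get(sym, (price, price))
        agg[sym] = (min(lo, price), max(hi, price))
    return agg

trades = [  # hypothetical trade records: (sym, price)
    ('eur_usd', 1.08),
    ('gbp_usd', 1.27),
    ('eur_usd', 1.09),
    ('eur_usd', 1.07),
]
print(aggregate(trades))
```

In the streaming setup, this aggregation runs inside the real-time processor on each incoming batch, with the result stored in the ‘agg’ table rather than returned.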


For documentation on this API and an outline of some more complex examples, see our technical documentation.

At the outset of this release, we aimed to provide all library users, from newcomers to those supporting its development, with enhancements that fit into their day-to-day operations while equally allowing new use cases and new users to be onboarded.

As always, this release has been entirely driven by interactions with our clients and discussions with users at Python and KX events. Over the coming weeks, we will be releasing deep-dive blogs on the Query and Streaming APIs alongside a more general deep dive into some of the smaller but equally powerful updates.

If you wish to discuss anything PyKX-related, you can contact the PyKX development team.
