For over a decade, integrating Python and q has been crucial to the infrastructures of many of our largest clients. Combining these technologies pairs the world’s fastest time-series database and the efficiency of its programming language, q, with Python’s extensive open-source community and the many new developers proficient in it.
In this blog, I’ll explore the historical integration of Python and q, identify potential issues with these integrations, and discuss how onboarding PyKX can enhance current infrastructures and create new opportunities for those integrating these languages.
Before PyKX
Before the release of PyKX, there were three principal interfaces for integrating kdb+/q with Python. In the following section, I’ll briefly describe each and highlight some common challenges encountered while using them in real-world applications.
qPython: An IPC interface between kdb+ servers and Python users, qPython offers a Python-first way to query and retrieve data from existing kdb+ infrastructures. Because it operates as a pure Python library, users can integrate kdb+ data into their analytic workflows and retrieve tabular data as Pandas DataFrames or vectors as NumPy arrays.
Despite its strengths, qPython faces several challenges that can complicate its use in production environments beyond specific scenarios:
- Data serialization and deserialization are handled by logic written in Python, which does not leverage the C APIs of kdb+ or Python to accelerate processing
- All analytic processing must be completed on the server. This is not uncommon in production environments with strict rules on data egress; however, in cases where users wish to analyze or retrieve large data volumes, data transfer time can be significant
- qPython is no longer supported; created by Exxeleron, the library is officially in maintenance mode
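To make the first point in the list above concrete, here is a minimal sketch of what deserializing a kdb+ IPC message in pure Python involves. It is not qPython's code: the function name `decode_long_vector` is hypothetical, and the bytes are assumed to be what q's `-8!1 2 3` produces for a little-endian long vector. Every header field and every value is unpacked by interpreted Python, which is the work a C-level implementation can avoid.

```python
import struct

# Assumed serialized form of the q long vector 1 2 3 (as from `-8!1 2 3`).
MESSAGE = bytes.fromhex(
    "0100000026000000"  # header: little-endian flag, message type, padding, total length (38)
    "07"                # type byte: 7 = long vector
    "00"                # attribute byte: none
    "03000000"          # element count: 3
    "0100000000000000"  # 1
    "0200000000000000"  # 2
    "0300000000000000"  # 3
)


def decode_long_vector(message):
    """Decode a serialized q long vector into a Python list (hypothetical helper)."""
    # Byte 0 of the header states the byte order of everything that follows.
    order = "<" if message[0] == 1 else ">"

    # Bytes 4-7 hold the total message length, header included.
    (total_length,) = struct.unpack_from(order + "i", message, 4)
    if total_length != len(message):
        raise ValueError("message length does not match header")

    # The payload starts at byte 8: type, attribute, then a 4-byte count.
    q_type, _attribute = struct.unpack_from(order + "bB", message, 8)
    if q_type != 7:
        raise ValueError("this sketch only handles long vectors")
    (count,) = struct.unpack_from(order + "i", message, 10)

    # Unpack `count` 8-byte signed integers into Python int objects.
    return list(struct.unpack_from(f"{order}{count}q", message, 14))


print(decode_long_vector(MESSAGE))  # [1, 2, 3]
```

A full deserializer repeats this kind of branching for every q type, for nested lists, and for tables, so the per-message cost in Python grows with the size and complexity of the result.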
EmbedPy: For users who require access to Python functionality from a q session, EmbedPy provides the ability to deploy machine learning, statistical analysis tools, and plotting functionality.
The API provided by EmbedPy was built specifically for q developers to integrate Python into their workflows as the secondary language. EmbedPy is extremely stable and provides huge value to consumers, but in more complex workflows it suffers from two issues that can prove limiting:
- EmbedPy only supports conversions between NumPy/Python types and kdb+. Direct conversion of tables to Pandas DataFrames is not supported, which makes onboarding tabular data more complex and less efficient
- EmbedPy does not allow users to directly query kdb+ data from the Python analytics they develop. Users who apply analytics via EmbedPy must manipulate the data in q before applying their Python analytics
PyQ: Described as a Python-first interface to kdb+, PyQ enables the integration of Python and q code within a single application. Installable via PyPI and Anaconda, it was considered the most Python-friendly tool for integrating Python and q code, allowing objects to share the same memory space.
PyQ, however, suffers from a fundamental issue that limits its usage to large organizations with existing kdb+ infrastructures:
- PyQ is not a Python library but an executable running atop q, which presents itself as a Python process. This difference is subtle but means Python applications can’t rely on PyQ as a dependency within their code. Any code using PyQ must be run either in a q process or via the PyQ binary
In addition, the API for interacting with kdb+ data is limited in scope; it provides access to q keywords and operators but has few Python-first capabilities.
Introducing PyKX
Released to clients initially in 2021 and made available on PyPI in June 2023, PyKX replaces each of the above interfaces with enhanced functionality.
Replacing qPython: PyKX provides an equivalent IPC-only API to qPython for retrieving and presenting data to users querying kdb+ servers. Independent benchmarking indicates an 8-10x performance increase when compared to qPython.
These figures are for the kinds of queries typically performed with qPython, returning data as a Pandas DataFrame from existing infrastructure; the speed-up allows users to process more queries and return larger datasets.
Much of this performance improvement comes from using the q C API for data deserialization and conversion to Pythonic types. In addition, queries now return PyKX objects rather than being forced into a Pandas DataFrame (as qPython does for tabular types), so users can choose to convert the data to a NumPy recarray or PyArrow Table if their use case requires it.
Replacing EmbedPy: PyKX provides a modality that replaces EmbedPy called PyKX under q. Unlike qPython, where performance upgrades have been the primary motivation for users adopting PyKX, in the case of EmbedPy, there have been three primary drivers of use:
- As PyKX provides the ability to convert to/from Python, NumPy, Pandas, and PyArrow data formats for all kdb+ types, users can onboard a significantly more diverse range of data formats than EmbedPy supports. This is exemplified by a client working in an investment bank who is using PyKX in this form to consume XML and Excel data within a real-time processing engine
- At its core, EmbedPy allowed users to call Python libraries from within a q session. While PyKX also provides this ability, it’s worth remembering that PyKX is itself a Python library that can execute q code. As we’ll see later, this allows for some complex workflows that provide real value to users and, importantly, removes the restriction that Python analytics need to be passed to kdb+ via q directly. This is how the Python-first integrations with KX Dashboards and the Insights Enterprise Stream Processor work at their core
- PyKX provides more flexibility in how users can develop at the boundary between Python and q. The addition of .pykx.console, for example, allows flexible workflows that move between the two languages (see below):
q)\l pykx.q
q).pykx.console[]
>>> import numpy as np
>>> nparray = np.array([1, 2, 3])
>>> quit()
q).pykx.get[`nparray]`
1 2 3
Replacing PyQ: PyQ isn’t replaced by PyKX in the same way as the previous libraries, where flexibility in data format management and performance enhancements are the key drivers. Instead, PyKX changes how users who want to run analytics on kdb+ data in a Python-first environment can operate. This extends to the wider variety of analyses that can be performed and to where they can be run.
Instead of a q process emulating Python, PyKX embeds the q programming language into Python via a shared object. This change means that PyKX can be deployed on Linux, macOS, or Windows, anywhere that Python can run.
It provides a similar syntax to PyQ for executing q keywords or running q code in a Python-first way, but in all other regards, it is significantly more consumable by Python-first users:
- The Pandas-like API lets users interrogate their kdb+ data using familiar syntax
- Tight integration with NumPy lets users call NumPy analytics directly on data stored or queried in kdb+ format
- The availability of Database Maintenance functionality (currently in beta but moving to production in November) provides the ability to create and modify a kdb+ Partitioned Database without expert q knowledge
- Users with existing q analytics can access their code in a Python-first manner using the context interface
Where does PyKX extend the use cases for Python and q integration?
The biggest change that has been facilitated by PyKX is how q and Python interact within a q executable. When operating this way, users can generate Python functions that send queries to kdb+ or call q functions that run Python code.
We use PyKX under q to enhance the q processes in both cases.
To illustrate how we’ve used this functionality to improve the flexibility of customers’ existing systems and upgrade the KX product suite to allow for the execution of Python code, we’ll discuss two cases, both using similar techniques to extend existing q processes:
- A user has an existing sandbox environment enabling quant developers familiar with q to access all historical data and run analysis. Instead of rearchitecting their solution, the client wants to open this infrastructure to a wider team of developers proficient in Python. This is facilitated through access to the remote function execution module of PyKX. This is currently provided as a beta feature but will be available as a fully supported feature with the release of PyKX 3.0 later this year
- Before v2.2.0, KX Dashboards did not support using Python analytics as a data source. To add this functionality, we extended PyKX with some helper functions, enabling Dashboards to treat Python as a first-class language
In both cases, the backend operations that occur on the q processes can be described in the following steps:
- The user passes a Python function, taking N arguments, as a string
- This Python code is passed to an internal function that uses the Python library ast2json to retrieve the name of the first function defined in the string. Check out our Dashboards integration to learn more here
- The Python code defining the user’s function is executed, and PyKX is used to “get” the function via .pykx.get
- A q process can then use this retrieved function to run user-defined Python code. This wrapped execution can be seen here
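The steps above can be sketched in plain Python. This is a minimal illustration rather than the PyKX implementation: it swaps in the standard-library `ast` module where the original uses ast2json, and a plain dictionary lookup stands in for `.pykx.get`. The helper names `first_function_name` and `load_user_function` are hypothetical.

```python
import ast


def first_function_name(source):
    """Return the name of the first top-level function defined in `source`."""
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            return node.name
    raise ValueError("no function definition found in source")


def load_user_function(source):
    """Execute `source` and return the first function it defines."""
    name = first_function_name(source)
    namespace = {}
    # Run the definition in its own namespace; only do this with trusted code.
    exec(source, namespace)
    # Stand-in for retrieving the function by name via .pykx.get on the q side.
    return namespace[name]


# The user supplies their analytic as a string.
USER_CODE = """
def user_function(x, y):
    return x + y
"""

func = load_user_function(USER_CODE)
print(func(10, 20))  # 30
```

Parsing the string first means the caller never has to state the function's name separately; the process receiving the code discovers it and can then wrap the retrieved function for repeated calls.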
Fundamentally, both the integration with Dashboards and the remote function execution logic operate in this way. The primary difference is in how the user writes the Python code, which is passed to the underlying q process:
- In the integration with Dashboards Direct, the function is passed back to the q process as part of an HTTP call, which defines a q function that is called on every invocation of the Dashboards widget
- For the remote function execution example, a user defines a Python function in their local development environment using a decorator to signal that the function defined should be run on a remote q process. Any use of this function after definition will result in the logic being run on the remote server
For example, running the logic on a session via port 5050:
>>> import os
>>> os.environ['PYKX_BETA_FEATURES'] = 'True'
>>> import pykx as kx
>>> session = kx.remote.session()
>>> session.create(port=5050)
>>> @kx.remote.function(session)
... def user_function(x, y):
...     return x+y
>>> user_function(10, 20)
pykx.LongAtom(pykx.q('30'))
Learn more about PyKX
For years, users have had several methods to integrate Python with q/kdb+. Now, PyKX offers a unified library that addresses the issues of older interfaces and enhances both performance and flexibility in solving business problems.
If you would like to learn more about PyKX, we have a consolidated list of the articles, videos, and blogs relating to the library available here.