Common Python practices for analytical development mirror what has made q an industry leader in solving high-performance, computationally intensive vector-oriented analytical problems. The modern paradigm for most data scientists and engineers is to use dynamically typed languages like Python and incorporate vector operations from the likes of NumPy and table types from libraries like Pandas and PyArrow. This has always been q’s approach to solving problems, allowing developers to effectively express their ideas and make their analytics more efficient.
PyKX maintains the functionality and performance of q while offering developers a Pythonic medium to design their vector-oriented analytics more elegantly and efficiently. When using PyKX for creating functions and analytics developers can leverage several different feature APIs that harness the power of q. In this blog, we will explore how data can be queried and transformed using both the qSQL and SQL PyKX APIs as well as native q functions applied using the context interface and Pandas-like API.
Post-trade analytics walkthrough
To highlight the features listed above we will develop an analytic regularly used across capital markets, a calculation of slippage.
Slippage is an important metric used when doing post-trade analysis, it can be used to track order and execution performance against the market. To do this we usually compare the execution price or the average executed price to a benchmark price. In this example, we will use the midpoint as the benchmark price and compare that to the average price (avgPX) immediately before or at the same time of the trade.
To begin, we will load a historical database containing quote and execution data using the PyKX database API.
>>> import os
>>> os.environ['PYKX_BETA_FEATURES']='True'
>>> import pykx as kx
>>> db=kx.DB(path='database')
To complete our final slippage calculation later we first need to join our data from both the executions table and the quote table, as well as calculate the midpoint.
There are various ways that data can be queried within PyKX, for this example we will use the SQL and qSQL APIs.
>>> e = kx.q.sql("SELECT time,sym,side,avgPrice from execsUS WHERE date=2023.11.10 and sym = 'AAPL'")
>>> e.head(3)
pykx.Table(pykx.q('
time sym side avgPrice
------------------------------------------------
2023.11.10D14:30:03.462500000 AAPL 2 183.99
2023.11.10D14:30:03.470000000 AAPL 2 183.99
2023.11.10D14:30:03.470000000 AAPL 2 183.99
'))
>>> q = kx.q.qsql.select(db.quoteUS,
... columns = {'time': 'time',
... 'sym' : 'sym',
... 'midpoint': 'fills (askPrice + bidPrice)%2'},
... where = ['date=2023.11.10',
... 'sym=`AAPL',
... 'time>=14:30'])
>>> q.head(3)
pykx.Table(pykx.q('
time sym midpoint
-------------------------------------------
2023.11.10D14:30:00.000400000 AAPL 183.68
2023.11.10D14:30:00.000400000 AAPL 183.595
2023.11.10D14:30:00.007600000 AAPL 183.875
'))
Now that we have both the execution and quote tables for a specified date and symbol we can join the data temporally using an asof join. In the cell below using the q native aj function via the context interface.
>>> res = kx.q.aj('time', e, q)
>>> res.head(3)
pykx.Table(pykx.q('
time sym side avgPrice midpoint
---------------------------------------------------------
2023.11.10D14:30:03.462500000 AAPL SELL 183.99 183.965
2023.11.10D14:30:03.470000000 AAPL SELL 183.99 183.965
2023.11.10D14:30:03.470000000 AAPL SELL 183.99 183.965
Alternatively, if you’re more familiar with Python, you can use the pandas-like API within PyKX to run the asof join using merge_asof. This API allows you to apply pandas-like syntax to PyKX objects and reap the performance benefits of kdb+
>>> res = e.merge_asof(q, on="time")
We can then use some further elements of the pandas-like API to calculate slippage, this is done in the following steps:
- Add a new dummy column ‘slippage’ to contain data after final analysis
- Using iloc select buy and sell data subsets
- Update the content of the ‘slippage’ column with the calculated difference between midpoint and average price depending on side
- a. avgPrice – midpoint for all BUY executions
- b. midpoint – avgPrice for all SELL executions
- Calculate the result in basis points as 10000*(calculated difference / midpoint)
>>> res['slippage'] = kx.FloatAtom.null
>>> buytab = res.iloc[res['side'] == 'BUY']
>>> selltab = res.iloc[res['side'] == 'SELL']
>>> res.iloc[res['side'] == 'BUY', 'slippage'] = buytab['avgPrice']-buytab['midpoint']
>>> res.iloc[res['side'] == 'SELL', 'slippage'] = selltab['midpoint'] - selltab['avgPrice']
>>> res['slippage'] = 10000 * res['slippage'] / res['midpoint']
By combining the above analytics, we create a new analytic and use a variable input for analysis of any security.
>>> def calculateSlippage(symbol):
... e = kx.q.sql("SELECT time,sym,side,avgPrice from execsUS WHERE date=2023.11.10 and sym = $1", symbol)
... kx.q["s"] = symbol
... q = kx.q.qsql.select(db.quoteUS,
... columns = {'time': 'time',
... 'sym': 'sym',
... 'midpoint' : 'fills (askPrice + bidPrice)%2'},
... where = ['date=2023.11.10', 'sym=s', 'time>=14:30'])
... res = kx.q.aj('time', e, q)
... res['slippage'] = kx.FloatAtom.null
... buytab = res.iloc[res['side'] == 'BUY']
... selltab = res.iloc[res['side'] == 'SELL']
... res.iloc[res['side'] == 'BUY', 'slippage'] = buytab['avgPrice']-buytab['midpoint']
... res.iloc[res['side'] == 'SELL', 'slippage'] = selltab['midpoint'] - selltab['avgPrice']
... res['slippage'] = 10000 * res['slippage'] / res['midpoint']
... return res
>>> calculateSlippage('AAPL')
PyKX combines the syntactical quality and familiarity of Python with the performance and efficiency of kdb+ and q. Many users of PyKX will want to blend kdb+ with the Python libraries that they already use, and PyKX makes this integration seamless by providing a Python-first approach to all aspects of q’s functional expressive programming language.
To learn more, please visit https://code.kx.com/pykx/