Mastering TAQ data analysis with kdb+

作者

Michaela Woods

Developer Advocate

Processing and analyzing large financial datasets can be challenging, especially when dealing with Trade and Quote (TAQ) data, which provides quants and analysts with rich information regarding market activity. In this blog, I will demonstrate how kdb+, the world’s fastest analytical database, can mitigate these challenges and walk through common calculations such as VWAP and OHLC.

kdb+ offers several benefits for quants, from its small footprint and high-performant scalable architecture to the elegance and simplicity of its vector programming language q.

Let’s begin

Configure the environment

Step 1: Prerequisites

If you wish to follow along, you’ll need to download and install kdb+ on your machine. You can do so for free here.

Step 2: Download TAQ data

With our prerequisites met, we can now download a daily TAQ file from the NYSE. (Note: These files can be large, so downloading may take some time).

Bash
wget https://ftp.nyse.com/Historical%20Data%20Samples/DAILY%20TAQ/EQY_US_ALL_TRADE_20240702.gz

Step 3: Clone the repository

Next, we will clone our kdb-taq repository, including the scripts and utilities required to process our data.

Bash
git clone https://github.com/KxSystems/kdb-taq.git
cd kdb-taq

This repository contains a script (tq.q) that helps process and load the data.

Step 4: Data preparation

With the repository cloned, we will now decompress our TAQ data and organize it into a usable directory for kdb+.

Bash
mkdir SRC
mv /path/to/EQY_US_ALL_TRADE_20240702.gz SRC/
gzip -d SRC/*

Step 5: Process the data

We will now use the tq.q script from the repository to process the raw TAQ data. We will also use the flags -s to specify the number of processing threads and SRC to define the file directory.

Q(kdb+データベース)
q tq.q -s 8 SRC

Processing will read and parse the raw data into a format suitable for querying.

Step 5: Load data into kdb+

Loading data into kdb+ is done using the \l command:

Q(kdb+データベース)
q)\l tq

Query the data

Now that our data has been successfully ingested into kdb+, we can run a few simple queries. Let’s begin by viewing the schema and structure of our trade table.

We will use the meta command to return column names and data types.

Q(kdb+データベース)
q)meta trade

c                                 | t f a
----------------------------------| -----
date                              | d    
Time                              | n    
Exchange                          | c    
Symbol                            | s   p
SaleCondition                     | s    
TradeVolume                       | i    
TradePrice                        | e    
TradeStopStockIndicator           | b    
TradeCorrectionIndicator          | h    
SequenceNumber                    | i    
TradeId                           | C    
SourceofTrade                     | c    
TradeReportingFacility            | b    
ParticipantTimestamp              | n    
TradeReportingFacilityTRFTimestamp| n    
TradeThroughExemptIndicator       | b

Example: Count the number of trades by stock

In our first example, we will explore the number of trades executed against each symbol and return the highest price.

Q(kdb+データベース)
q)select numTrade:count i, maxPrice:max TradePrice by Symbol from trade

Symbol | numTrade maxPrice
-------| -----------------
A      | 29655    133.03  
AA     | 36623    41.15   
AAA    | 51       25.07   
AAAU   | 2786     23.13   
AACG   | 108      0.86    
AACI   | 6        11.45   
AACT   | 1033     11.38   
AACT.U | 1        10.745  
AACT.WS| 24       0.1365  
....

Query structures can affect performance. We can, however, use the function \t before our q expression to retrieve query duration.

Q(kdb+データベース)
q)\t select numTrade:count i, maxPrice:max TradePrice by Symbol from trade

113

Advanced analysis

Now that we understand how to load and query our TAQ data let’s explore more advanced calculations.

Example: Volume profile

Volume profiling helps traders understand the concentration of trades at various price levels throughout the day. By doing so, they can ascertain volume over time and make more informed decisions on entry and exit markers.

Using xbar, we can group the size of orders into 5-minute intervals:

Q(kdb+データベース)
q)select sum TradeVolume by 5 xbar time.minute from trade where symbol = `AAPL

minute| TradeVolume
------| -----------
04:00 | 10805      
04:05 | 2257       
04:10 | 1078       
04:15 | 2216       
04:20 | 1365       
04:25 | 4717       
...

We can also highlight the rolling sums for a symbol throughout the day.

Q(kdb+データベース)
q)select time, sums TradeVolume from trade where symbol = `AAPL

Time                 TradeVolume
--------------------------------
0D04:00:00.017646926 1          
0D04:00:00.020101346 21         
0D04:00:00.022979538 22         
0D04:00:00.022987522 42         
0D04:00:00.023776991 67         
0D04:00:00.023920135 69 
....

We can also visualize these events using tools such as KX Analyst or KX Developer (available for free).

As you can see, the steep increases in the curve highlight periods of significant trading activity, reflecting market volatility or a heightened interest in a particular symbol. This is also helpful for analyzing intraday trends and periods of concentrated trading volume.

Example: Volume-weighted average

Volume-weighted averages reflect price movement more accurately by incorporating trading volume at different price levels. This is particularly useful in scenarios where traders wish to understand if movement is supported by strong market participation or a few solitary trades.

Let’s calculate the weighted average over time intervals using the function wavg.

 

Q(kdb+データベース)
q)select TradeVolume wavg TradePrice by Symbol from trade

Symbol | TradePrice
-------| ----------
A      | 126.0225  
AA     | 40.73105  
AAA    | 25.03718  
AAAU   | 23.01037  
AACG   | 0.8178756 
AACI   | 11.37241  
...

We can also query the weighted average over different time intervals.

Q(kdb+データベース)
q)select LastPrice:last TradePrice, 
         WeightedPrice:TradeVolume wavg TradePrice
   by 15 xbar Time.minute 
   from trade 
   where Symbol = `IBM

minute| LastPrice WeightedPrice
------| -----------------------
04:00 | 175.02    175.02       
04:30 | 174.99    174.99       
05:15 | 174.98    174.98       
05:30 | 174.93    174.8051     
05:45 | 174.96    174.91       
06:00 | 174.96    174.9558     
...

In this instance, we can see IBM’s daily price movements, with the “LastPrice” and “WeightedPrice” remaining closely aligned.

This indicates stable pricing with minimal influence from the trade volume, suggesting a moderate upward trend and positive market sentiment for IBM’s stock price.

Example: Open-high-low-close

Analysts often use methods such as OHLC to track securities’ short-term trends. Let’s create a simple query that calculates OHLC for a particular date, sym, and vwap over each 5-minute interval.

Q(kdb+データベース)
q)select low:min TradePrice,
         open:first TradePrice,
         close:last TradePrice,
         high:max TradePrice,
         volume:sum TradeVolume,
         vwap:TradeVolume wavg TradePrice 
   by 5 xbar Time.minute  
   from trade where Symbol=`AAPL

minute| low      open     close    high     volume  vwap    
------| ----------------------------------------------------
04:00 | 215.89   216.6    216.11   216.6    10805   216.2432
04:05 | 216.1    216.18   216.27   216.27   2257    216.151 
04:10 | 216.2    216.28   216.35   216.35   1078    216.2547
04:15 | 216.28   216.3    216.39   216.46   2216    216.3775
04:20 | 216.24   216.38   216.26   216.42   1365    216.3638
04:25 | 216.21   216.28   216.4    216.43   4717    216.323 
.....

We can also extract the query as a function, enabling analysts to retrieve the results for a particular stock.

Q(kdb+データベース)
q)ohlcLookup:{[symbol] 
    select low:min TradePrice,
           open:first TradePrice,
           close:last TradePrice,
           high:max TradePrice,
           volume:sum TradeVolume,
           vwap:TradeVolume wavg TradePrice 
        by 5 xbar Time.minute  
        from trade where Symbol=symbol}

q)ohlcLookup[`MSFT]

minute| low      open     close    high     volume  vwap    
------| ----------------------------------------------------
04:00 | 455.29   456.2    455.61   456.7    5742    455.8347
04:05 | 455.11   455.5    455.32   455.67   1352    455.4394
04:10 | 455.07   455.33   455.37   455.47   1879    455.238 
...

q)ohlcLookup[`GOOG]
minute| low      open     close    high     volume vwap    
------| ---------------------------------------------------
04:00 | 183.94   184.01   184.14   184.38   2315   184.0851
04:05 | 183.97   184.07   184.02   184.07   373    184.0005
04:10 | 184.06   184.06   184.1    184.23   239    184.1581
...

By leveraging kdb+’s high-performance architecture and the simplicity of its vector programming language, q, analysts can efficiently perform various calculations such as volume profiling, volume-weighted averages, and open-high-low-close (OHLC) analysis to gain meaningful insights from massive datasets quickly and effectively.

If you have any questions or would like to know more, why not join our Slack community or begin your certification journey with our free curriculum on the KX Academy

Customer Stories

Discover richer, actionable insights for faster, better informed decision making

Capital Markets

As a customer of KX for 10+ years, they knew they could rely on KX’s team and its real-time database to easily migrate into the cloud.

詳細を読む 概要 Japanese Bank


AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

当社のチームが以下の実現をサポートします:

  • ストリーミング、リアルタイム、および過去データに最適化された設計
  • エンタープライズ向けのスケーラビリティ、耐障害性、統合性、そして高度な分析機能
  • 幅広い開発言語との統合に対応する充実したツール群

専門担当者によるデモをリクエスト

*」は必須フィールドを示します

本フォームを送信いただくと、KXの製品・サービス、お知らせ、イベントに関する営業・マーケティング情報をお受け取りいただけます。プライバシーポリシーからお手続きいただくことで購読解除も可能です。当社の個人情報の収集・使用に関する詳しい情報については、プライバシーポリシーをご覧ください。

このフィールドは入力チェック用です。変更しないでください。

タイムシリーズ分野におけるG2認定リーダー