Mastering TAQ data analysis with kdb+

Author: Michaela Woods, Developer Advocate

Processing and analyzing large financial datasets can be challenging, especially when dealing with Trade and Quote (TAQ) data, which provides quants and analysts with rich information regarding market activity. In this blog, I will demonstrate how kdb+, the world’s fastest analytical database, can mitigate these challenges and walk through common calculations such as VWAP and OHLC.

kdb+ offers several benefits for quants, from its small footprint and high-performance, scalable architecture to the elegance and simplicity of its vector programming language, q.

Let’s begin

Configure the environment

Step 1: Prerequisites

If you wish to follow along, you’ll need to download and install kdb+ on your machine. You can do so for free from the KX website.

Step 2: Download TAQ data

With our prerequisites met, we can now download a daily TAQ file from the NYSE. (Note: These files can be large, so downloading may take some time).

Bash
wget https://ftp.nyse.com/Historical%20Data%20Samples/DAILY%20TAQ/EQY_US_ALL_TRADE_20240702.gz

Step 3: Clone the repository

Next, we will clone the kdb-taq repository, which contains the scripts and utilities required to process our data.

Bash
git clone https://github.com/KxSystems/kdb-taq.git
cd kdb-taq

This repository contains a script (tq.q) that helps process and load the data.

Step 4: Data preparation

With the repository cloned, we will now decompress our TAQ data and organize it into a usable directory for kdb+.

Bash
mkdir SRC
mv /path/to/EQY_US_ALL_TRADE_20240702.gz SRC/
gzip -d SRC/*

Step 5: Process the data

We will now use the tq.q script from the repository to process the raw TAQ data, passing the -s flag to set the number of secondary threads used for parallel processing and SRC as the argument naming the source directory.

Bash
q tq.q -s 8 SRC

Processing will read and parse the raw data into a format suitable for querying.

Step 6: Load data into kdb+

The processed database can now be loaded into a q session using the \l command:

Q (kdb+ database)
q)\l tq
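
Once loaded, a few quick checks confirm what the session can see. Output will depend on which TAQ files you processed, so treat the commands below as a minimal sketch:

Q (kdb+ database)
q)tables[]      / tables available in the loaded database
q)date          / partition dates present on disk
q)count trade   / total number of trade records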

Query the data

Now that our data has been successfully ingested into kdb+, we can run a few simple queries. Let’s begin by viewing the schema and structure of our trade table.

We will use the meta keyword to return column names, types, and attributes.

Q (kdb+ database)
q)meta trade

c                                 | t f a
----------------------------------| -----
date                              | d    
Time                              | n    
Exchange                          | c    
Symbol                            | s   p
SaleCondition                     | s    
TradeVolume                       | i    
TradePrice                        | e    
TradeStopStockIndicator           | b    
TradeCorrectionIndicator          | h    
SequenceNumber                    | i    
TradeId                           | C    
SourceofTrade                     | c    
TradeReportingFacility            | b    
ParticipantTimestamp              | n    
TradeReportingFacilityTRFTimestamp| n    
TradeThroughExemptIndicator       | b
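
Note the p under the a (attribute) column: the trade table is parted on Symbol, so queries that filter on Symbol, as most of the examples below do, are handled efficiently. To eyeball a handful of raw records for one symbol before aggregating (an illustrative peek; values will differ with your file):

Q (kdb+ database)
q)5#select from trade where Symbol=`AAPL    / first five AAPL trade records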

Example: Count the number of trades by stock

In our first example, we will count the number of trades executed for each symbol and return its highest traded price.

Q (kdb+ database)
q)select numTrade:count i, maxPrice:max TradePrice by Symbol from trade

Symbol | numTrade maxPrice
-------| -----------------
A      | 29655    133.03  
AA     | 36623    41.15   
AAA    | 51       25.07   
AAAU   | 2786     23.13   
AACG   | 108      0.86    
AACI   | 6        11.45   
AACT   | 1033     11.38   
AACT.U | 1        10.745  
AACT.WS| 24       0.1365  
....
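
To rank symbols by activity, we could sort the same result in descending order of trade count; a small extension of the query above:

Q (kdb+ database)
/ 0! unkeys the result so xdesc can sort it by numTrade
q)`numTrade xdesc 0!select numTrade:count i, maxPrice:max TradePrice by Symbol from trade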

Query structure can affect performance. To measure it, we can prefix our q expression with the \t system command, which returns the elapsed execution time in milliseconds.

Q (kdb+ database)
q)\t select numTrade:count i, maxPrice:max TradePrice by Symbol from trade

113
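
The related \ts system command reports both elapsed time (milliseconds) and memory used (bytes), which is handy when comparing alternative formulations of the same query:

Q (kdb+ database)
q)\ts select numTrade:count i, maxPrice:max TradePrice by Symbol from trade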

Advanced analysis

Now that we understand how to load and query our TAQ data, let’s explore some more advanced calculations.

Example: Volume profile

Volume profiling helps traders understand the concentration of trading at various price levels and times throughout the day. With it, they can see how volume evolves over the session and make more informed decisions on entry and exit points.

Using xbar, we can sum traded volume in 5-minute intervals:

Q (kdb+ database)
q)select sum TradeVolume by 5 xbar Time.minute from trade where Symbol = `AAPL

minute| TradeVolume
------| -----------
04:00 | 10805      
04:05 | 2257       
04:10 | 1078       
04:15 | 2216       
04:20 | 1365       
04:25 | 4717       
...
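
To express each bucket as a share of the day’s total volume, we could normalize the same result; an illustrative extension (the pctOfDay column name is our own):

Q (kdb+ database)
q)update pctOfDay:100*TradeVolume%sum TradeVolume from
    select sum TradeVolume by 5 xbar Time.minute from trade where Symbol=`AAPL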

We can also compute the running (cumulative) volume for a symbol throughout the day using sums.

Q (kdb+ database)
q)select Time, sums TradeVolume from trade where Symbol = `AAPL

Time                 TradeVolume
--------------------------------
0D04:00:00.017646926 1          
0D04:00:00.020101346 21         
0D04:00:00.022979538 22         
0D04:00:00.022987522 42         
0D04:00:00.023776991 67         
0D04:00:00.023920135 69 
....

We can also visualize these events using tools such as KX Analyst or KX Developer (available for free).

When plotted, the steep increases in the cumulative-volume curve highlight periods of significant trading activity, reflecting market volatility or heightened interest in a particular symbol. This view is also helpful for analyzing intraday trends and periods of concentrated trading volume.

Example: Volume-weighted average

Volume-weighted averages reflect price movement more accurately by incorporating trading volume at different price levels. This is particularly useful in scenarios where traders wish to understand if movement is supported by strong market participation or a few solitary trades.

Let’s calculate the volume-weighted average price per symbol using the wavg function.

Q (kdb+ database)
q)select TradeVolume wavg TradePrice by Symbol from trade

Symbol | TradePrice
-------| ----------
A      | 126.0225  
AA     | 40.73105  
AAA    | 25.03718  
AAAU   | 23.01037  
AACG   | 0.8178756 
AACI   | 11.37241  
...
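
To see how much the volume weighting actually matters, it can be instructive to put the simple average and the VWAP side by side; an illustrative comparison:

Q (kdb+ database)
q)select simpleAvg:avg TradePrice, vwap:TradeVolume wavg TradePrice by Symbol from trade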

We can also compute the weighted average over time intervals, here 15-minute buckets for a single symbol.

Q (kdb+ database)
q)select LastPrice:last TradePrice, 
         WeightedPrice:TradeVolume wavg TradePrice
   by 15 xbar Time.minute 
   from trade 
   where Symbol = `IBM

minute| LastPrice WeightedPrice
------| -----------------------
04:00 | 175.02    175.02       
04:30 | 174.99    174.99       
05:15 | 174.98    174.98       
05:30 | 174.93    174.8051     
05:45 | 174.96    174.91       
06:00 | 174.96    174.9558     
...

In this instance, we can see IBM’s daily price movements, with the “LastPrice” and “WeightedPrice” remaining closely aligned.

This indicates stable pricing with minimal influence from individual large trades, since the volume-weighted price tracks the last traded price closely throughout the session.
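
To put a number on “closely aligned”, we could measure the largest gap between the two series across all 15-minute buckets; a sketch building on the query above:

Q (kdb+ database)
q)select maxGap:max abs LastPrice-WeightedPrice from
    select LastPrice:last TradePrice, WeightedPrice:TradeVolume wavg TradePrice
    by 15 xbar Time.minute from trade where Symbol=`IBM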

Example: Open-high-low-close

Analysts often use methods such as OHLC to track securities’ short-term trends. Let’s create a simple query that calculates OHLC, total volume, and VWAP for a particular symbol over each 5-minute interval.

Q (kdb+ database)
q)select low:min TradePrice,
         open:first TradePrice,
         close:last TradePrice,
         high:max TradePrice,
         volume:sum TradeVolume,
         vwap:TradeVolume wavg TradePrice 
   by 5 xbar Time.minute  
   from trade where Symbol=`AAPL

minute| low      open     close    high     volume  vwap    
------| ----------------------------------------------------
04:00 | 215.89   216.6    216.11   216.6    10805   216.2432
04:05 | 216.1    216.18   216.27   216.27   2257    216.151 
04:10 | 216.2    216.28   216.35   216.35   1078    216.2547
04:15 | 216.28   216.3    216.39   216.46   2216    216.3775
04:20 | 216.24   216.38   216.26   216.42   1365    216.3638
04:25 | 216.21   216.28   216.4    216.43   4717    216.323 
.....

We can also wrap the query in a function, enabling analysts to retrieve the same results for any stock.

Q (kdb+ database)
q)ohlcLookup:{[symbol] 
    select low:min TradePrice,
           open:first TradePrice,
           close:last TradePrice,
           high:max TradePrice,
           volume:sum TradeVolume,
           vwap:TradeVolume wavg TradePrice 
        by 5 xbar Time.minute  
        from trade where Symbol=symbol}

q)ohlcLookup[`MSFT]

minute| low      open     close    high     volume  vwap    
------| ----------------------------------------------------
04:00 | 455.29   456.2    455.61   456.7    5742    455.8347
04:05 | 455.11   455.5    455.32   455.67   1352    455.4394
04:10 | 455.07   455.33   455.37   455.47   1879    455.238 
...

q)ohlcLookup[`GOOG]
minute| low      open     close    high     volume vwap    
------| ---------------------------------------------------
04:00 | 183.94   184.01   184.14   184.38   2315   184.0851
04:05 | 183.97   184.07   184.02   184.07   373    184.0005
04:10 | 184.06   184.06   184.1    184.23   239    184.1581
...
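
Because ohlcLookup is an ordinary q function, it composes naturally with iterators. For example, we could build a single table covering several symbols at once; an illustrative usage:

Q (kdb+ database)
q)raze {update Symbol:x from 0!ohlcLookup x} each `AAPL`MSFT`GOOG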

By leveraging kdb+’s high-performance architecture and the simplicity of its vector programming language q, analysts can perform calculations such as volume profiling, volume-weighted averages, and open-high-low-close (OHLC) analysis to extract meaningful insights from massive datasets quickly and efficiently.

If you have any questions or would like to know more, why not join our Slack community or begin your certification journey with our free curriculum on the KX Academy?
