Detect crypto patterns with KDB-X Temporal Similarity Search


Author: Samuel Bruce, kdb+ developer

Key takeaways

  1. TSS enables analysts to detect recurring shapes and motifs in time-series data. This is especially valuable in volatile crypto markets, where past behaviors often repeat under similar conditions.
  2. TSS helps uncover signals by comparing historical and current data "shapes" to identify potential inflection points.
  3. KDB-X can handle both overlapping data windows and multi-day patterns, providing deeper insights into long-term trends.

Digital assets, such as Bitcoin (BTC), are well-known for their dramatic price fluctuations. In just a matter of hours, the market can pivot from euphoria to panic, with sharp rallies followed by swift corrections. For many, this volatility is a source of risk, but it can also be seen as an opportunity, especially when historical patterns begin to repeat.

Unlike traditional markets, which are regulated, mature, and relatively liquid, the crypto market is still evolving. Trading takes place 24/7 across fragmented exchanges with lower liquidity, making it susceptible to outsized movements from relatively small trades. When coupled with high leverage and real-time reactions to news and social media sentiment, price action in digital assets often seems chaotic and unpredictable.

Yet despite this chaos, market behavior often falls into familiar rhythms. Rallies build in stages, corrections follow predictable patterns, and “shapes” of movement tend to precede key inflection points.

In this blog, I will demonstrate how Temporal Similarity Search (TSS) in KDB-X can help identify these “shapes” of time series data, uncovering similar patterns and recurring motifs, and how dynamic querying can enable analysts to better understand market behavior in real time.

Figure: Time series windows in KDB-X

If you would like to follow along, you can do so by downloading and installing the KDB-X Community Edition public preview from https://kdb-x.kx.com and a sample Bitcoin dataset from Kaggle. You can also follow the full tutorial on GitHub.

Load and prepare data

From a q session, we load the ai-libs initialization script:

q
\l ai-libs/init.q

Next, we load three one-minute datasets (BTC, ETH, and LTC) into a single in-memory table:

q
tab:(" *SFFFFF";enlist ",") 0: `$":gemini_BTCUSD_2020_1min.csv";
tab,:(" *SFFFFF";enlist ",") 0: `$":gemini_ETHUSD_2020_1min.csv";
tab,:(" *SFFFFF";enlist ",") 0: `$":gemini_LTCUSD_2020_1min.csv";
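The format string passed to 0: decides which columns are kept and how they are typed. As a rough illustration, the sketch below parses a hypothetical two-line CSV held in memory. The header names and the data row are assumptions modelled on the sample record shown later in this post, so check them against your own download.

```q
/ Hypothetical CSV content: a header row plus one data row (assumed layout)
csv:("Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume";
  "1577836800000,2020-01-01 00:00:00,BTCUSD,7165.9,7170.79,7163.3,7163.3,0.00793095");
/ " " skips the first column, "*" keeps Date as a string,
/ "S" reads Symbol as a symbol, and each "F" reads a float column
sample:(" *SFFFFF";enlist ",") 0: csv;
cols sample    / `Date`Symbol`Open`High`Low`Close`Volume
meta sample    / shows the type of each parsed column
```

Because the first column is skipped, seven columns remain, which is why the later rename lists seven names.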

Once done, we parse and clean the time column, then reorder and sort the table so that we can save our data to disk, partitioning by date:

q
time:sum@/:(("D";"N")$'/:" " vs/: tab`Date);
tab:`time xcols update time:time from delete Date from tab;
tab:`time xasc `time`sym`open`high`low`close`volume xcol tab;
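To see what the time-parsing line does, it can help to walk through a single value. This is a minimal sketch on one hypothetical Date string; the "YYYY-MM-DD HH:MM:SS" layout is an assumption about the file and may differ in your copy.

```q
s:"2020-01-01 00:01:00";     / hypothetical Date string (assumed format)
parts:" " vs s;              / split on the space: ("2020-01-01";"00:01:00")
dn:("D";"N")$'parts;         / parse the halves as a date and a timespan
ts:sum dn;                   / date + timespan gives a timestamp
ts                           / 2020.01.01D00:01:00.000000000
```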
q
dts:asc exec distinct `date$time from tab;
{[dt;t]
    (hsym `$"cryptodb/",string[dt],"/trade/") set .Q.en[`:cryptodb] `time xasc select from t where dt=`date$time;
    }[;tab] each dts;
.Q.chk[`:cryptodb];
delete tab from `.;
.Q.gc[];

Finally, we’ll load the on-disk data back into our process:

q
.Q.lo[`:cryptodb;0;0];

We can validate our work by performing a simple lookup:

q
first select from trade where date=first date, i=0
date  | 2020.01.01
time  | 2020.01.01D00:00:00.000000000
sym   | `sym$`BTCUSD
open  | 7165.9
high  | 7170.79
low   | 7163.3
close | 7163.3
volume| 0.00793095

As we can see, the first record is a one-minute bar of BTC/USD trading on January 1st, 2020, with its timestamp, price range (open, high, low, close), and traded volume.

Perform TSS

With our data partitioned, we can begin searching for patterns of interest. As a query, we define a V-shaped vector of length 64, a steady decline followed by a steady recovery, and set k, the maximum number of matches to return:

q
q:abs neg[32]+til 64;
k:10000;
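A quick inspection confirms the shape of the pattern. The sketch below repeats the same definition so that it runs on its own. Note that, as written, the vector holds long integers; whether your version of the library expects floats is something to verify, and the cast on the last line is only an assumption about how you might convert it.

```q
q:abs neg[32]+til 64;       / same definition as above
count q                     / 64
(first q;min q;last q)      / 32 0 31: falls from 32 to 0, then rises to 31
q?0                         / 32: index of the trough
qf:"f"$q;                   / optional float copy (assumed, only if floats are required)
```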

For data that is non-continuous in time across date partitions, such as the NYSE, which doesn’t have 24-hour trading days, analysts are not typically interested in pattern matches that cross the date boundary. However, for markets that trade continuously across midnight, such as BTC, they may wish to find patterns that span multiple partitions, which means they may need to account for patterns in the overlap.

We will demonstrate both cases.

Query data

Let’s run a series of queries. First, we perform TSS on each daily partition, finding where the query pattern best matches that day’s close prices:

q
t:select {a:.ai.tss.tss[x;q;k;`ignoreErrors`returnMatches!11b];a@\:iasc a[1]} close by date from trade where sym = `BTCUSD;

We then use the matched indices to select the corresponding rows from the trade table for each date:

q
res:select from trade where sym=`BTCUSD, {[x;y;z] a:x[z;`close;1]; $[all null a;y#0b;@[y#0b;a;:;1b]]}[t;count i;first date];

The data is then flattened using ‘ungroup’, with two new columns, ‘dist’ and ‘match’, being populated with the distances and matching values from the search. Finally, the data is filtered to retrieve only the rows with the smallest distance values, which are considered the best matches:

q
d:(0!t)`close;
res:res,' ungroup ([] dist:d[;0]; match:d[;2]);
res:`dist xasc select from res where i in k#iasc dist;
    date       time                          sym    open     high     low      close    volume      dist     match       ..
    ---------------------------------------------------------------------------------------------------------------------..
    2020.12.08 2020.12.08D12:11:00.000000000 BTCUSD 18842.04 18842.04 18841.2  18841.2  0.05202422  2.227005 18841.2  188..
    2021.01.28 2021.01.28D20:14:00.000000000 BTCUSD 32925    32961.79 32925    32956.85 6.052956    2.298247 32956.85 329..
    2020.12.08 2020.12.08D12:10:00.000000000 BTCUSD 18848.79 18848.79 18841.29 18842.04 0.1091669   2.319017 18842.04 188..
    2020.11.04 2020.11.04D02:13:00.000000000 BTCUSD 13857.96 13891.4  13857.96 13891.4  0.4292499   2.341793 13891.4  138..
    2020.02.23 2020.02.23D21:33:00.000000000 BTCUSD 9905.39  9905.39  9905.39  9905.39  0           2.353222 9905.39  990..
    2020.02.23 2020.02.23D21:34:00.000000000 BTCUSD 9905.39  9905.39  9902.59  9902.66  0.08768819  2.437255 9902.66  990..
    2020.12.08 2020.12.08D12:12:00.000000000 BTCUSD 18841.2  18841.2  18840    18840    0.01363459  2.4405   18840    188..
    2021.01.13 2021.01.13D09:19:00.000000000 BTCUSD 34930.64 34987.76 34930.64 34987.76 0.001133045 2.45972  34987.76 350..
    2021.01.28 2021.01.28D20:13:00.000000000 BTCUSD 32908.97 32925    32882.39 32925    7.625257    2.470088 32925    329..
    2021.01.28 2021.01.28D20:15:00.000000000 BTCUSD 32956.85 32976.23 32949.08 32949.09 11.53418    2.470146 32949.09 329..

Figure: Z-normalized matches. Plotting the z-normalized results reveals our top 30 closest matches.
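Z-normalization is what lets a pattern match at any price level: each window is shifted to mean zero and scaled to unit deviation before it is compared. The following is a minimal brute-force sketch of that idea on a hypothetical toy series. It assumes a z-normalized Euclidean distance and is not a description of how .ai.tss.tss is implemented; the helper names znorm, windows, and zdist are made up for illustration.

```q
/ z-normalize: subtract the mean, divide by the population standard deviation
znorm:{(x-avg x)%dev x};
/ all contiguous windows of length n taken from vector x
windows:{[n;x] x@(til 1+count[x]-n)+\:til n};
/ Euclidean distance between two vectors after z-normalizing both
zdist:{[p;w] sqrt sum d*d:znorm[p]-znorm w};

series:5 5 9 7 5 7 9 5 5f;     / hypothetical prices with a small "V" starting at index 2
pattern:2 1 0 1 2f;            / hypothetical V-shaped query
dists:zdist[pattern] each windows[count pattern;series];
dists?min dists                / 2: the window 9 7 5 7 9 is the closest match
```

A flat window has zero deviation, so znorm would divide by zero; a real search needs to handle that case, which this sketch does not.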

 

We can also search across the overlap of dates to see if TSS can detect patterns that may begin in one partition and continue into the next:

q
ovl:(0N;2*count[q])#count[q]_select from trade where sym=`BTCUSD, (i in count[q]#i) | (i in neg[count[q]]#i);
ovltss:.ai.tss.tss[;q;k;`ignoreErrors`returnMatches!11b] each ovl[;`close];
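The reshaping in the first line is easier to follow on a toy example. In this sketch, three hypothetical "days" of five prices stand in for the date partitions and n plays the role of count[q]; it mirrors the take, drop, and reshape steps on plain lists rather than on the trade table.

```q
n:2;                                           / toy pattern length (hypothetical)
days:(1 2 3 4 5;6 7 8 9 10;11 12 13 14 15);    / three toy days of five prices each
/ keep the first n and last n values of every day, as the where clause does per partition
edges:raze {(n#x),neg[n]#x} each days;
/ drop day one's opening values, then cut the rest into chunks of 2n
straddles:(0N;2*n)#n _ edges;
straddles                                      / (4 5 6 7;9 10 11 12;14 15)
```

Each full chunk joins the end of one day to the start of the next. The trailing chunk is shorter because it holds only the last day's closing values.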

Next, we consolidate the two searches, keeping only the overlap matches whose distance is smaller than the largest distance already in res:

q
maxTopK:max res`dist;
better:where@'ovltss[;0]<maxTopK;
betterOverlap:raze ovl@'ovltss[;1]@'better;
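The filter works list by list: for each overlap chunk it finds the positions whose distance beats the cutoff, then uses those positions to pick the matching entries. A minimal sketch with hypothetical distances and indices (none of these values come from the dataset):

```q
dists:(0.5 3.1 2.0;4.2 1.1);        / hypothetical distances, one list per chunk
idxs:(10 20 30;5 15);               / hypothetical match indices, same shape
cutoff:2.5;                         / plays the role of maxTopK
better:where each dists<cutoff;     / (0 2;enlist 1)
picked:idxs@'better;                / each-both index: (10 30;enlist 15)
raze picked                         / 10 30 15
```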

The matches and distances for those rows are flattened into two lists, match and dist, and joined to betterOverlap as extra columns to form a new table, betterOverlapFull:

q
match:raze ovltss[;2]@'better;
dist:raze ovltss[;0]@'better;
betterOverlapFull:betterOverlap,'([] dist:dist; match:match);

Figure: Overlap misses. Matches missed when overlap is not considered.

The last step merges the per-day results with the overlap results, sorts them by distance, and keeps the first k rows, so that res holds the top k closest matches across both searches:

q
res:k#`dist xasc res,betterOverlapFull;

Working with time-series data, especially in crypto, demands more than just storing and retrieving records. You need the ability to uncover patterns, trends, and behaviors hidden across massive datasets. In this tutorial, we explored how to do exactly that using Temporal Similarity Search (TSS). Whether you’re looking for trends within a single day or across partition boundaries, the techniques shown here, including overlap handling and symbol filtering, ensure you won’t miss critical insights.

If you enjoyed this blog and would like to explore other examples, you can visit our GitHub repository. You can also begin your journey with KDB-X by signing up for the KDB-X Community Edition Public Preview, where you can test, experiment, and build high-performance data-intensive applications with exclusive access to continuous feature updates.
