Fuzzy Filters For Symbol Changes With KDB X AI Libs

Tutorial: Fuzzy filters for symbol changes with KDB-X AI-Libs

作者

Laura Kerr

Community Management Lead

ポイント

  1. Fuzzy filters in KDB-X allow for resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.
  2. By using appropriate distance metrics like Levenshtein distance, you can retrieve accurate results despite these variations
  3. By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

Real-world data often contains minor variations in symbol/string values, which can be a time-consuming effort to normalize, especially across large volumes. Fuzzy filters using KDB-X AI libs helps align data with small differences, without the need to cleanse your data before querying.

The KDB-X AI libs module provides fuzzy filters with simple functions, such as .ai.fuzzy.dist and .ai.fuzzy.search. In this tutorial, we will walk through examples of how to use both. You can also explore the entire notebook via GitHub.

Let’s begin by loading the AI libraries module and exploring the dataset:

q
.ai:use`kx.ai
q
first ohlc
date      | 2025.02.12
sym       | `ADD
company   | "Color Star Technology Co Ltd"
close     | 1.23
volume    | 495439
open      | 1.24
high      | 1.28
low       | 1.12

This table provides daily OHLC data for multiple stock symbols. Each row contains the opening price (open), the closing price (close), the maximum price (high), and the minimum price (low) for each day.

Example: Handling international spelling differences

Suppose we want to retrieve rows for Color Star Technology Co Ltd. The company name in the dataset uses US English spelling, but what if we query using the UK spelling “Colour”?

q
select from ohlc where company like "Colour Star Technology Co Ltd"

This query fails, returning no rows. To solve this, we can apply fuzzy filters.

Step 1: Get distinct company names

q
comps:exec distinct company from ohlc

Step 2: Compute fuzzy distances

q
.ai.fuzzy.dist[comps;"Colour Star Technology Co Ltd";`levenshtein]
1 26 24 27 24 26 20 27f

Here, we use Levenshtein distance, which counts the number of character edits needed to transform one string into another.

Let’s view all available distance metrics:

q
.ai.fuzzy.utils.fuzzyDistances
`levenshtein`levenshteinNorm`indel`hamming`jaro`jaroWinkler`lcs`damerau`osa`prefix`postfix

Step 3: Search with a threshold

q
res:.ai.fuzzy.search[comps;"Colour Star Technology Co Ltd";1;`levenshtein]
select from ohlc where company like raze res[2]

This successfully returns all rows for Color Star Technology Co Ltd, even though the query used “Colour”.

Example 2: Tracking stock symbol changes

Stock symbols change over time. Let’s see how fuzzy filters help handle this automatically.

Step 1: Count expected rows

q
select count distinct date from ohlc
date
----
124

The dataset spans ~6 months, so we expect 124 rows per symbol.

Step 2: Query symbol `HSHP`

q
count select from ohlc where sym=`HSHP
47

Only 47 rows appear. Why?

On June 3rd, 2025, Himalaya Shipping Ltd changed its symbol from HSHP to HSHIP:

Stock Symbol Change History

Source: https://www.nasdaq.com/market-activity/stocks/symbol-change-history

Step 3: Use fuzzy distance

q
syms:exec distinct sym from ohlc;
.ai.fuzzy.dist[syms;`HSHP;`levenshtein]
4 4 4 3 1 4 4 4 0f

One match has distance 0 (HSHP itself), and another only 1 character away.

Step 4: Search for close matches

q
res:.ai.fuzzy.search[syms;`HSHP;2;`levenshtein]
res
0    1
8    4
HSHP HSHIP

The filter identifies HSHP and HSHIP.

Step 5: Query both symbols

q
select from ohlc where sym in res[2]
date       sym   company                 close volume open high   low
------------------------------------------------------------------------
2025.02.12 HSHIP "Himalaya Shipping Ltd" 4.83  205194 4.75 4.85   4.738
2025.02.13 HSHIP "Himalaya Shipping Ltd" 4.94  89261  4.87 4.98   4.8501
2025.02.14 HSHIP "Himalaya Shipping Ltd" 4.99  87297  4.99 5.055  4.95
2025.02.18 HSHIP "Himalaya Shipping Ltd" 5.23  230474 5.13 5.3    5.13
2025.02.19 HSHIP "Himalaya Shipping Ltd" 5.27  279502 5.3  5.37   5.14
2025.02.20 HSHIP "Himalaya Shipping Ltd" 5.5   170577 5.45 5.68   5.44
2025.02.21 HSHIP "Himalaya Shipping Ltd" 5.34  239442 5.6  5.6    5.32
2025.02.24 HSHIP "Himalaya Shipping Ltd" 5.18  278755 5.2  5.345  5.11
2025.02.25 HSHIP "Himalaya Shipping Ltd" 5.42  258100 5.42 5.59   5.34
2025.02.26 HSHIP "Himalaya Shipping Ltd" 5.68  198694 5.58 5.75   5.58
2025.02.27 HSHIP "Himalaya Shipping Ltd" 5.31  198515 5.4  5.45   5.28
2025.02.28 HSHIP "Himalaya Shipping Ltd" 5.45  174216 5.48 5.54   5.36
2025.03.03 HSHIP "Himalaya Shipping Ltd" 5.34  309573 5.55 5.58   5.335
2025.03.04 HSHIP "Himalaya Shipping Ltd" 5.25  147806 5.13 5.31   5.04
2025.03.05 HSHIP "Himalaya Shipping Ltd" 5.44  216607 5.67 5.67   5.41
2025.03.06 HSHIP "Himalaya Shipping Ltd" 5.52  189787 5.51 5.61   5.46
2025.03.07 HSHIP "Himalaya Shipping Ltd" 5.65  202489 5.71 5.75   5.625
..

This retrieves all rows across both symbols.

Step 6: Verify full coverage

q
count select from ohlc where sym in res[2]
124

We now have the complete dataset for Himalaya Shipping Ltd, regardless of its symbol change.

Fuzzy Filters in KDB-X allow resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.

  • Example 1 demonstrated how fuzzy matching overcomes spelling variations (e.g., Color vs. Colour)
  • Example 2 demonstrated how to handle stock symbol changes seamlessly

By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

If you enjoyed this blog and would like to explore other examples, you can visit our GitHub repository. You can also begin your journey with KDB-X by signing up for the KDB-X Community Edition Public Preview.

Customer Stories

Discover richer, actionable insights for faster, better informed decision making

資本市場

AxiはKXを使用して、ストリーミング・データをリアルタイムかつ大規模に取り込み、分析し、可視化しています。

詳細を読む 概要 Axi
資本市場

ADSSはKXリアルタイムデータプラットフォームを活用し、変革的成長戦略を加速させます。

詳細を読む 概要 ADSS
資本市場

10年以上にわたってKXの顧客である同社は、KXのチームとリアルタイムデータベースを信頼し、技術的な観点からクラウドへの移行を容易にできる確信がありました。

詳細を読む 概要 日本の投資銀行


AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

当社のチームが以下の実現をサポートします:

  • ストリーミング、リアルタイム、および過去データに最適化された設計
  • エンタープライズ向けのスケーラビリティ、耐障害性、統合性、そして高度な分析機能
  • 幅広い開発言語との統合に対応する充実したツール群

専門担当者によるデモをリクエスト

*」は必須フィールドを示します

このフィールドは入力チェック用です。変更しないでください。

本フォームを送信いただくと、KXの製品・サービス、お知らせ、イベントに関する営業・マーケティング情報をお受け取りいただけます。プライバシーポリシーからお手続きいただくことで購読解除も可能です。当社の個人情報の収集・使用に関する詳しい情報については、プライバシーポリシーをご覧ください。

タイムシリーズ分野におけるG2認定リーダー

// social // social