Tutorial: Fuzzy filters for symbol changes with KDB-X AI-Libs

Tutorial: Fuzzy filters for symbol changes with KDB-X AI-Libs

作者

Laura Kerr

Community Management Lead

投稿済み

17 9月, 2025

読む時間

ポイント

Fuzzy filters in KDB-X allow for resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.
By using appropriate distance metrics like Levenshtein distance, you can retrieve accurate results despite these variations
By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

Real-world data often contains minor variations in symbol/string values, which can be a time-consuming effort to normalize, especially across large volumes. Fuzzy filters using KDB-X AI libs helps align data with small differences, without the need to cleanse your data before querying.

The KDB-X AI libs module provides fuzzy filters with simple functions, such as .ai.fuzzy.dist and .ai.fuzzy.search. In this tutorial, we will walk through examples of how to use both. You can also explore the entire notebook via GitHub.

Let’s begin by loading the AI libraries module and exploring the dataset:

.ai:use`kx.ai

first ohlc

date      | 2025.02.12
sym       | `ADD
company   | "Color Star Technology Co Ltd"
close     | 1.23
volume    | 495439
open      | 1.24
high      | 1.28
low       | 1.12

This table provides daily OHLC data for multiple stock symbols. Each row contains the opening price (open), the closing price (close), the maximum price (high), and the minimum price (low) for each day.

Example: Handling international spelling differences

Suppose we want to retrieve rows for Color Star Technology Co Ltd. The company name in the dataset uses US English spelling, but what if we query using the UK spelling “Colour”?

select from ohlc where company like "Colour Star Technology Co Ltd"

This query fails, returning no rows. To solve this, we can apply fuzzy filters.

Step 1: Get distinct company names

comps:exec distinct company from ohlc

Step 2: Compute fuzzy distances

.ai.fuzzy.dist[comps;"Colour Star Technology Co Ltd";`levenshtein]

1 26 24 27 24 26 20 27f

Here, we use Levenshtein distance, which counts the number of character edits needed to transform one string into another.

Let’s view all available distance metrics:

.ai.fuzzy.utils.fuzzyDistances

`levenshtein`levenshteinNorm`indel`hamming`jaro`jaroWinkler`lcs`damerau`osa`prefix`postfix

Step 3: Search with a threshold

res:.ai.fuzzy.search[comps;"Colour Star Technology Co Ltd";1;`levenshtein]

select from ohlc where company like raze res[2]

This successfully returns all rows for Color Star Technology Co Ltd, even though the query used “Colour”.

Example 2: Tracking stock symbol changes

Stock symbols change over time. Let’s see how fuzzy filters help handle this automatically.

Step 1: Count expected rows

select count distinct date from ohlc

date
----
124

The dataset spans ~6 months, so we expect 124 rows per symbol.

Step 2: Query symbol `HSHP`

count select from ohlc where sym=`HSHP

Only 47 rows appear. Why?

On June 3rd, 2025, Himalaya Shipping Ltd changed its symbol from HSHP to HSHIP:

Source: https://www.nasdaq.com/market-activity/stocks/symbol-change-history

Step 3: Use fuzzy distance

syms:exec distinct sym from ohlc;
.ai.fuzzy.dist[syms;`HSHP;`levenshtein]

4 4 4 3 1 4 4 4 0f

One match has distance 0 (HSHP itself), and another only 1 character away.

Step 4: Search for close matches

res:.ai.fuzzy.search[syms;`HSHP;2;`levenshtein]

res
0    1
8    4
HSHP HSHIP

The filter identifies HSHP and HSHIP.

Step 5: Query both symbols

select from ohlc where sym in res[2]

date       sym   company                 close volume open high   low
------------------------------------------------------------------------
2025.02.12 HSHIP "Himalaya Shipping Ltd" 4.83  205194 4.75 4.85   4.738
2025.02.13 HSHIP "Himalaya Shipping Ltd" 4.94  89261  4.87 4.98   4.8501
2025.02.14 HSHIP "Himalaya Shipping Ltd" 4.99  87297  4.99 5.055  4.95
2025.02.18 HSHIP "Himalaya Shipping Ltd" 5.23  230474 5.13 5.3    5.13
2025.02.19 HSHIP "Himalaya Shipping Ltd" 5.27  279502 5.3  5.37   5.14
2025.02.20 HSHIP "Himalaya Shipping Ltd" 5.5   170577 5.45 5.68   5.44
2025.02.21 HSHIP "Himalaya Shipping Ltd" 5.34  239442 5.6  5.6    5.32
2025.02.24 HSHIP "Himalaya Shipping Ltd" 5.18  278755 5.2  5.345  5.11
2025.02.25 HSHIP "Himalaya Shipping Ltd" 5.42  258100 5.42 5.59   5.34
2025.02.26 HSHIP "Himalaya Shipping Ltd" 5.68  198694 5.58 5.75   5.58
2025.02.27 HSHIP "Himalaya Shipping Ltd" 5.31  198515 5.4  5.45   5.28
2025.02.28 HSHIP "Himalaya Shipping Ltd" 5.45  174216 5.48 5.54   5.36
2025.03.03 HSHIP "Himalaya Shipping Ltd" 5.34  309573 5.55 5.58   5.335
2025.03.04 HSHIP "Himalaya Shipping Ltd" 5.25  147806 5.13 5.31   5.04
2025.03.05 HSHIP "Himalaya Shipping Ltd" 5.44  216607 5.67 5.67   5.41
2025.03.06 HSHIP "Himalaya Shipping Ltd" 5.52  189787 5.51 5.61   5.46
2025.03.07 HSHIP "Himalaya Shipping Ltd" 5.65  202489 5.71 5.75   5.625
..

This retrieves all rows across both symbols.

Step 6: Verify full coverage

count select from ohlc where sym in res[2]

We now have the complete dataset for Himalaya Shipping Ltd, regardless of its symbol change.

Fuzzy Filters in KDB-X allow resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.

Example 1 demonstrated how fuzzy matching overcomes spelling variations (e.g., Color vs. Colour)
Example 2 demonstrated how to handle stock symbol changes seamlessly

By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

If you enjoyed this blog and would like to explore other examples, you can visit our GitHub repository. You can also begin your journey with KDB-X by signing up for the KDB-X Community Edition Public Preview.

AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

当社のチームが以下の実現をサポートします：

ストリーミング、リアルタイム、および過去データに最適化された設計

エンタープライズ向けのスケーラビリティ、耐障害性、統合性、そして高度な分析機能

幅広い開発言語との統合に対応する充実したツール群

Tutorial: Fuzzy filters for symbol changes with KDB-X AI-Libs

Developer

ポイント

Example: Handling international spelling differences

Step 1: Get distinct company names

Step 2: Compute fuzzy distances

Step 3: Search with a threshold

Example 2: Tracking stock symbol changes

Step 1: Count expected rows

Step 2: Query symbol `HSHP`

Step 3: Use fuzzy distance

Step 4: Search for close matches

Step 5: Query both symbols

Step 6: Verify full coverage

Customer Stories

AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

専門担当者によるデモをリクエスト

タイムシリーズ分野におけるG2認定リーダー

Modernizing infrastructures that mix Python and q

The ultimate guide to choosing embedding models for AI applications

7つの革新的トレーディングアプリ (および実践できる7つのベストプラクティス)

11 insights to help quants break through data and analytics barriers

PyKX 3.0: Easier to use and more powerful than ever

リアルタイム分析でクオンツパフォーマンスを飛躍的に向上

Webinar: Six best practices for optimizing trade execution

Developer

ポイント

Example: Handling international spelling differences

Step 1: Get distinct company names

Step 2: Compute fuzzy distances

Step 3: Search with a threshold

Example 2: Tracking stock symbol changes

Step 1: Count expected rows

Step 2: Query symbol `HSHP`

Step 3: Use fuzzy distance

Step 4: Search for close matches

Step 5: Query both symbols

Step 6: Verify full coverage

Solving the longitude problem in capital markets: Why AI needs temporal precision

KDB.AI is now a service inside KDB-X: Vector search where your data already lives

Teaching Claude Code to speak KX: Open-source plugins for q, KDB-X, pykx, and KDB.AI

Customer Stories

AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

専門担当者によるデモをリクエスト

タイムシリーズ分野におけるG2認定リーダー