Fuzzy Filters For Symbol Changes With KDB X AI Libs

Tutorial: Fuzzy filters for symbol changes with KDB-X AI-Libs

Author

Laura Kerr

Community Management Lead

Key Takeaways

  1. Fuzzy filters in KDB-X allow for resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.
  2. By using appropriate distance metrics like Levenshtein distance, you can retrieve accurate results despite these variations
  3. By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

Real-world data often contains minor variations in symbol/string values, which can be a time-consuming effort to normalize, especially across large volumes. Fuzzy filters using KDB-X AI libs helps align data with small differences, without the need to cleanse your data before querying.

The KDB-X AI libs module provides fuzzy filters with simple functions, such as .ai.fuzzy.dist and .ai.fuzzy.search. In this tutorial, we will walk through examples of how to use both. You can also explore the entire notebook via GitHub.

Let’s begin by loading the AI libraries module and exploring the dataset:

q
.ai:use`kx.ai
q
first ohlc
date      | 2025.02.12
sym       | `ADD
company   | "Color Star Technology Co Ltd"
close     | 1.23
volume    | 495439
open      | 1.24
high      | 1.28
low       | 1.12

This table provides daily OHLC data for multiple stock symbols. Each row contains the opening price (open), the closing price (close), the maximum price (high), and the minimum price (low) for each day.

Example: Handling international spelling differences

Suppose we want to retrieve rows for Color Star Technology Co Ltd. The company name in the dataset uses US English spelling, but what if we query using the UK spelling “Colour”?

q
select from ohlc where company like "Colour Star Technology Co Ltd"

This query fails, returning no rows. To solve this, we can apply fuzzy filters.

Step 1: Get distinct company names

q
comps:exec distinct company from ohlc

Step 2: Compute fuzzy distances

q
.ai.fuzzy.dist[comps;"Colour Star Technology Co Ltd";`levenshtein]
1 26 24 27 24 26 20 27f

Here, we use Levenshtein distance, which counts the number of character edits needed to transform one string into another.

Let’s view all available distance metrics:

q
.ai.fuzzy.utils.fuzzyDistances
`levenshtein`levenshteinNorm`indel`hamming`jaro`jaroWinkler`lcs`damerau`osa`prefix`postfix

Step 3: Search with a threshold

q
res:.ai.fuzzy.search[comps;"Colour Star Technology Co Ltd";1;`levenshtein]
select from ohlc where company like raze res[2]

This successfully returns all rows for Color Star Technology Co Ltd, even though the query used “Colour”.

Example 2: Tracking stock symbol changes

Stock symbols change over time. Let’s see how fuzzy filters help handle this automatically.

Step 1: Count expected rows

q
select count distinct date from ohlc
date
----
124

The dataset spans ~6 months, so we expect 124 rows per symbol.

Step 2: Query symbol `HSHP`

q
count select from ohlc where sym=`HSHP
47

Only 47 rows appear. Why?

On June 3rd, 2025, Himalaya Shipping Ltd changed its symbol from HSHP to HSHIP:

Stock Symbol Change History

Source: https://www.nasdaq.com/market-activity/stocks/symbol-change-history

Step 3: Use fuzzy distance

q
syms:exec distinct sym from ohlc;
.ai.fuzzy.dist[syms;`HSHP;`levenshtein]
4 4 4 3 1 4 4 4 0f

One match has distance 0 (HSHP itself), and another only 1 character away.

Step 4: Search for close matches

q
res:.ai.fuzzy.search[syms;`HSHP;2;`levenshtein]
res
0    1
8    4
HSHP HSHIP

The filter identifies HSHP and HSHIP.

Step 5: Query both symbols

q
select from ohlc where sym in res[2]
date       sym   company                 close volume open high   low
------------------------------------------------------------------------
2025.02.12 HSHIP "Himalaya Shipping Ltd" 4.83  205194 4.75 4.85   4.738
2025.02.13 HSHIP "Himalaya Shipping Ltd" 4.94  89261  4.87 4.98   4.8501
2025.02.14 HSHIP "Himalaya Shipping Ltd" 4.99  87297  4.99 5.055  4.95
2025.02.18 HSHIP "Himalaya Shipping Ltd" 5.23  230474 5.13 5.3    5.13
2025.02.19 HSHIP "Himalaya Shipping Ltd" 5.27  279502 5.3  5.37   5.14
2025.02.20 HSHIP "Himalaya Shipping Ltd" 5.5   170577 5.45 5.68   5.44
2025.02.21 HSHIP "Himalaya Shipping Ltd" 5.34  239442 5.6  5.6    5.32
2025.02.24 HSHIP "Himalaya Shipping Ltd" 5.18  278755 5.2  5.345  5.11
2025.02.25 HSHIP "Himalaya Shipping Ltd" 5.42  258100 5.42 5.59   5.34
2025.02.26 HSHIP "Himalaya Shipping Ltd" 5.68  198694 5.58 5.75   5.58
2025.02.27 HSHIP "Himalaya Shipping Ltd" 5.31  198515 5.4  5.45   5.28
2025.02.28 HSHIP "Himalaya Shipping Ltd" 5.45  174216 5.48 5.54   5.36
2025.03.03 HSHIP "Himalaya Shipping Ltd" 5.34  309573 5.55 5.58   5.335
2025.03.04 HSHIP "Himalaya Shipping Ltd" 5.25  147806 5.13 5.31   5.04
2025.03.05 HSHIP "Himalaya Shipping Ltd" 5.44  216607 5.67 5.67   5.41
2025.03.06 HSHIP "Himalaya Shipping Ltd" 5.52  189787 5.51 5.61   5.46
2025.03.07 HSHIP "Himalaya Shipping Ltd" 5.65  202489 5.71 5.75   5.625
..

This retrieves all rows across both symbols.

Step 6: Verify full coverage

q
count select from ohlc where sym in res[2]
124

We now have the complete dataset for Himalaya Shipping Ltd, regardless of its symbol change.

Fuzzy Filters in KDB-X allow resilient queries against real-world data, where typos, spelling differences, or symbol changes are common.

  • Example 1 demonstrated how fuzzy matching overcomes spelling variations (e.g., Color vs. Colour)
  • Example 2 demonstrated how to handle stock symbol changes seamlessly

By selecting appropriate distance metrics and thresholds, you can tailor fuzzy filters for your specific use case, whether that’s financial data, customer records, or any domain where identifiers evolve.

If you enjoyed this blog and would like to explore other examples, you can visit our GitHub repository. You can also begin your journey with KDB-X by signing up for the KDB-X Community Edition Public Preview.

Customer Stories

Discover richer, actionable insights for faster, better informed decision making

ADSS Logo
Capital Markets

ADSS leverages KX real-time data platform to accelerate its transformational growth strategy.

Read More About ADSS
Axi logo
Capital Markets

Axi uses KX to capture, analyze, and visualize streaming data in real-time and at scale.

Read More About Axi


Demo the world’s fastest database for vector, time-series, and real-time analytics

Start your journey to becoming an AI-first enterprise with 100x* more performant data and MLOps pipelines.

  • Process data at unmatched speed and scale
  • Build high-performance data-driven applications
  • Turbocharge analytics tools in the cloud, on premise, or at the edge

*Based on time-series queries running in real-world use cases on customer environments.

Book a demo with an expert

"*" indicates required fields

This field is for validation purposes and should be left unchanged.

By submitting this form, you will also receive sales and/or marketing communications on KX products, services, news and events. You can unsubscribe from receiving communications by visiting our Privacy Policy. You can find further information on how we collect and use your personal data in our Privacy Policy.

// social // social