From ticks to tweets: Combining structured and unstructured financial data with KDB-X

Author

Ryan Siegler

Data Scientist

Key takeaways

  1. KDB-X provides advanced querying capabilities that allow users to perform complex analysis across both structured and unstructured data.
  2. Temporal Similarity Search detects patterns in time series data, allowing users to explore the narrative behind numerical trends.
  3. By combining vector-based text search with time series pattern detection, KDB-X supports sophisticated hypothesis testing.

Capital markets have always excelled at structured data analytics, but the most potent insights often lie hidden within vast pools of unstructured data. Structured data provides precision and speed, making it ideal for tasks such as quantitative modeling, risk management, and real-time trade execution. In contrast, unstructured data adds invaluable context, offering sentiment analysis, event-driven insights, and a deeper understanding of market drivers from news articles, analyst reports, earnings transcripts, and more.

In this tutorial, we will explore a real-world financial analysis scenario, using KDB-X to discover the relationship between corporate news announcements (unstructured text) and stock price movements (structured time series data). We will utilize SENS (Stock Exchange News Service) announcements from the Johannesburg Stock Exchange, along with corresponding market data obtained from Yahoo Finance.

If you would like to build this solution, you can do so by following the full tutorial on GitHub.

Let’s begin

Step 1: Data preparation and enrichment

To begin, we will load both structured market data and unstructured news announcements into KDB-X tables, ensuring that each announcement is made “queryable” by converting its body of text into numerical representations (vector embeddings). We will use a lightweight transformer model from Python’s sentence-transformers library, which we will call directly from within KDB-X, along with an embedding function to convert raw text into meaningful vectors.

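q
/ Assumes PyKX is loaded and ST refers to Python's
/ sentence_transformers.SentenceTransformer class

/ Load the pre-trained transformer model using Python
model:ST[`$"paraphrase-MiniLM-L3-v2";`device pykw `cpu];

/ Define an 'embed' function that can be called from q:
/ the model is fixed as the first argument, and the trailing `
/ converts the Python result back into a q object
embed:{x[`:encode][y]`}[model;];

/ Create embeddings for each announcement body
sens:update embedding:embed[`$body] from sens;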

To make searching near instantaneous, we will load our embeddings into a Hierarchical Navigable Small World (HNSW) index, designed for fast similarity search.
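To make concrete what the index speeds up, here is a minimal sketch of exhaustive similarity search in plain q. It is not the KDB-X HNSW API: `cosSim`, `bruteSearch`, and the toy vectors are hypothetical names and values introduced for illustration. An HNSW index approximates this ranking without comparing the query against every stored vector.

```q
/ Cosine similarity between two numeric vectors
cosSim:{[a;b] (a wsum b)%sqrt (a wsum a)*b wsum b}

/ Exhaustive search: positions of the k vectors in vecs most similar to qv
bruteSearch:{[vecs;qv;k] k sublist idesc cosSim[;qv] each vecs}

/ Toy 3-dimensional "embeddings" (invented values, not real model output)
vecs:(1 0 0f;0.9 0.1 0f;0 1 0f;0 0 1f)

/ The two stored vectors closest to the query are at positions 0 and 1
bruteSearch[vecs;1 0.05 0f;2]
```

The exhaustive version scans every row, so its cost grows linearly with the number of announcements. That is the cost a graph-based index is designed to avoid.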

Step 2: Explore the data

With the data prepared, we will now explore the relationship between the announcements and market performance from multiple angles.

Example: From text to trend

In our first example, we will search for a specific phrase within the news announcements, such as “interim results,” to determine if it serves as a leading indicator for market movements.

  1. We embed our search phrase into a vector.
  2. We use the HNSW index to find the 30 announcements most similar to our phrase.
  3. We then pull the stock performance for each of those companies for the 14 days following the announcement.

The results show that, on average, these announcements were followed by a 3.8% increase in stock price over the next 14 days.
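The third step can be sketched in plain q as follows. The table layout (`sym`, `date`, `close`), the `fwdRet` helper, and all values are assumptions made for illustration, not the tutorial's actual schema or data. The sketch also counts 14 rows (trading days) forward, which may differ from the 14 calendar days used in the analysis.

```q
/ Toy daily closes for one hypothetical ticker (invented values)
px:([] sym:`ABC; date:2024.01.01+til 20; close:100f+til 20)

/ Percentage change in close from announcement date d to n rows later
/ (returns null when fewer than n+1 rows are available)
fwdRet:{[t;s;d;n] c:exec close from t where sym=s,date>=d; 100*(c[n]-first c)%first c}

/ Hypothetical search hits: the symbol and date of each matched announcement
hits:([] sym:`ABC`ABC; date:2024.01.03 2024.01.05)

/ Average 14-row forward return across all hits
avg fwdRet[px;;;14]'[hits`sym;hits`date]
```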

[Figure: Normalized price movements with average]

Example: From trend to text

We can also reverse this process: start from a specific pattern in the structured data, then look up the corresponding text to identify a likely cause. In this instance, the pattern is a “V-shaped” dip and recovery: a five-day price fall followed by a ten-day recovery.

Using the KDB-X Temporal Similarity Search (TSS), we can identify stocks that followed this V-shaped pattern. The announcements around those matches reveal a clear theme, with headlines such as “unaudited result” and “interim result”.
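The idea behind this kind of pattern search can be sketched in plain q: z-normalize the query and every sliding window of the series, then rank windows by Euclidean distance. This is a naive illustration, not the KDB-X TSS implementation or its API. The names `znorm`, `wins`, `zdist`, and `shapeSearch`, and all values, are hypothetical.

```q
/ Z-normalize: zero mean, unit standard deviation
/ (a flat window has zero deviation and would need special handling)
znorm:{(x-avg x)%dev x}

/ All contiguous windows of length w from vector x
wins:{[x;w] x (til 1+count[x]-w)+\:til w}

/ Euclidean distance between two z-normalized vectors
zdist:{[p;s] sqrt sum d*d:znorm[p]-znorm s}

/ Start positions of the k windows closest in shape to pattern p
shapeSearch:{[series;p;k] k sublist iasc zdist[p] each wins[series;count p]}

/ V-shaped query: five steps down, then ten steps up (invented values)
v:(5-til 6),1+til 10

/ Toy price series with a scaled copy of the V starting at position 5
series:100 101 102 103 104,(50+2*v),60 61 62

/ The best match starts at position 5
shapeSearch[series;v;1]
```

Because both sides are z-normalized, the match depends on shape rather than price level, which is why the scaled and shifted copy of the V is still found.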

[Figure: Z-normalized TSS matches]

In one specific case, we find that a positive earnings announcement (“Headline earnings per share up 17%”) was released at the bottom of the price dip, immediately preceding the recovery, which suggests a link between the news event and the subsequent market activity.

[Figure: Announcements matched over time]

Example: Combined searches

Finally, we combine both techniques to backtest a hypothesis: Does a positive earnings announcement reliably cause a stock to recover after a downturn?

We designed a combined query to find instances where:

  1. A stock’s price trended downwards for 5-10 days (via a TSS search).
  2. This was immediately followed by an announcement with text similar to “Headline earnings per share up” (via an HNSW vector search).

Using a window join, we were able to find all occurrences of this combined event.
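
As a rough sketch of how a window join can line up announcements with the price action before them, the example below uses q's built-in `wj1` on invented data. The tables, the `netMove` helper, and the five-day look-back are assumptions made for illustration. In the tutorial, the downtrend and the headline match come from the TSS and HNSW searches rather than from the simple filter shown here.

```q
/ Hypothetical daily closes, sorted by sym and date with sym parted for the join
px:([] sym:`ABC; date:2024.01.01+til 12; close:110 108 107 105 104 103 104 105 106 107 108 109f)
px:update `p#sym from `sym`date xasc px

/ Hypothetical announcements (headlines invented for illustration)
ann:([] sym:`ABC`ABC; date:2024.01.06 2024.01.11; headline:("Headline earnings per share up";"Change to the board"))

/ Net price change across the rows that fall inside a window
netMove:{last[x]-first x}

/ One window per announcement: the five days before its date
w:(ann[`date]-5;ann[`date]-1)

/ Each announcement row gains a close column holding the net move in its window
pre:wj1[w;`sym`date;ann;(px;(netMove;`close))]

/ Keep only announcements that followed a price decline
select sym,date,headline,preMove:close from pre where close<0
```

`wj1` considers only rows that fall inside each window, whereas `wj` also includes the value prevailing at the window start. With one row per day the two should agree here, but the distinction matters on sparser data.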

By analyzing the subsequent price action, we observed that while these positive announcements halted the bearish movement, the sharp “V-shaped” recovery identified from our previous example was less common, with stock prices increasing by a more modest 2%.

[Figure: Normalized price pattern]

Combined market response following a positive announcement:

[Figure: Normalized price pattern following a positive announcement]

This tutorial demonstrates the analytical power that comes from breaking down the silos between structured and unstructured data. Using the integrated AI libraries within KDB-X, we were able to:

  • Search by meaning: Starting with a simple text phrase (“interim results”), we found related announcements using vector search, and then analyzed the subsequent structured market data to measure the average stock price response
  • Search by pattern: We inverted the process by first identifying a specific time series pattern (a V-shaped price dip and recovery) in market data, and then, by joining back to the unstructured text, found the news that likely caused the movement
  • Backtest a hypothesis: By combining both techniques, we were able to test a specific theory, searching for a downward price trend (a structured pattern) followed by a positive earnings announcement (unstructured text) to see if a recovery occurred

This ability to seamlessly query, join, and analyze data based on both temporal patterns and semantic meaning in a single environment unlocks a far deeper and more nuanced understanding of market dynamics.

If you enjoyed this blog and would like to explore other examples, you can visit our GitHub repository. You can also begin your journey with KDB-X by signing up for the KDB-X Community Edition Public Preview, where you can test, experiment, and build high-performance data-intensive applications with exclusive access to continuous feature updates.
