A developers guide to JSON parsing in kdb+

作者

Rian Ó Cuinneagáin

Principal Software Developer

With its ability to hold complex structures, JSON has long offered kdb+ developers significant advantages over traditional CSV. From data serialization, deserialization, and API integration to easier debugging, data exchange, and unstructured data querying. However, ingesting JSON can also introduce complexities, leading to performance and scalability issues if not done correctly.

In this blog, we will explore some practical examples to help developers navigate these issues and optimize data ingestion.

Understanding datatypes

It’s important to understand that kdb+ can extract the following data types from JSON

  • String: Sequence of characters, typically used to represent text
  • Float: Real numbers that require fractional precision
  • Boolean: Binary states such as yes/no, on/off, or 1/0

This means that in addition to parsing the data from JSON, developers may wish to cast to something more suitable.

To demonstrate, let’s convert a long in kdb+ to JSON and then parse back using .j.j and .j.k

Q(kdb+データベース)
6~.j.k .j.j 6

0b

As you can see, the roundtrip fails because the input is no longer equal to the output due to the numerics being converted to floats.

 

Q(kdb+データベース)
.j.k .j.j 6

6f

JSON table encoding

Let’s now create a table in kdb+ and explore the cast and tok operators.

  • The cast operator is used to convert data from one type to another. This is particularly useful when you need to change the data type of a value to match the expected type for a specific operation or function. For example, if you have a string representing a number and you need to perform arithmetic operations on it, you can cast it to an integer or float
  • The tok operator interprets a string as a different data type. This is useful when working with data that comes in as a string (e.g., from a file or API) and must be converted to a more helpful type for analysis or processing. For example, you might receive a date as a string and need to convert it to a date type to perform date calculations

 

Q(kdb+データベース)
//Create a sample table

tab:([] longCol:1 2;
        floatCol:4 5f;
        symbolCol:`b`h;
        stringCol:("bb";"dd");
        dateCol:2018.11.23 2018.11.23;
        timeCol:00:01:00.000 00:01:00.003)
tab

longCol floatCol symbolCol stringCol dateCol    timeCol     
------------------------------------------------------------ 
1       4        b         "bb"      2018.11.23 00:01:00.000 
2       5        h         "dd"      2018.11.23 00:01:00.003


meta tab

c        | t f a
---------| -----
longCol  | j   
floatCol | f    
symbolCol| s   
stringCol| C   
dateCol  | d   
timeCol  | t

Let’s now record the round trip.

Q(kdb+データベース)
q).j.k .j.j tab
longCol floatCol symbolCol stringCol dateCol      timeCol      
----------------------------------------------------------------
1       4        ,"b"      "bb"      "2018-11-23" "00:01:00.000"
2       5        ,"h"      "dd"      "2018-11-23" "00:01:00.003"

q)meta .j.k .j.j tab
c        | t f a
---------| -----
longCol  | f   
floatCol | f   
symbolCol| C   
stringCol| C   
dateCol  | C   
timeCol  | C

Now, let’s use lowercase casts on numerics and capital case tok on our string datatypes.

Q(kdb+データベース)
//* will leave a column untouched

flip "j*S*DT"$flip .j.k .j.j tab
tab~flip "j*S*DT"$flip .j.k .j.j tab
longCol floatCol symbolCol stringCol dateCol    timeCol    
------------------------------------------------------------
1       4        b         "bb"      2018.11.23 00:01:00.000
2       5        h         "dd"      2018.11.23 00:01:00.003


1b

Instead of using flip and specifying * to leave a column untouched, we can write a simple helper function.

To begin, we will pass it a dictionary containing the rules we need to perform.

 

Q(kdb+データベース)
helper:{[t;d] ![t;();0b;key[d]!{($;x;y)}'[value d;key d]]}

castRules:`longCol`symbolCol`dateCol`timeCol!"jSDT"

tab~helper[;castRules] .j.k .j.j tab
1b

Similarly, instead of forcing $, we can make a more general helper based on a monodic function per column.

Q(kdb+データベース)
generalHelper:{[t;d] ![t;();0b;key[d]!{(x;y)}'[value d;key d]]}

castRules:`longCol`symbolCol`dateCol`timeCol!({neg "j"$ x};{`$upper x};"D"$;"T"$)

generalHelper[;castRules] .j.k .j.j tab
longCol floatCol symbolCol stringCol dateCol    timeCol    
------------------------------------------------------------
-1      4        B         "bb"      2018.11.23 00:01:00.000
-2      5        H         "dd"      2018.11.23 00:01:00.003

Field-based JSON encoding

One common use of JSON is objects (key/value pairs) that parse in kdb+ as dictionaries. These are useful for storing sparse datasets in which it does not make sense to have each key as a new column.

Q(kdb+データベース)
q)\c 25 200
q)read0 `:sample.json
"{\"data\":\"26cd02c57f9db87b1df9f2e7bb20cc7b\",\"expiry\":1527796725,\"requestID\":[\"b4a566eb-2529-5cf4-1327-857e3d73653e\"]}\"
"{"result":"success","message":"success","receipt":[123154,4646646],"requestID":["b4a566eb-2529-5cf4-1327-857e3d73653e"]}"
"{"receipt":[12345678,98751466],"requestID":["b4a566eb-2529-5cf4-1327-857e3d73653e"]}"
"{"data":"26cd02c57f9db87b1df9f2e7bb20cc7b","requestID":["b4a566eb-2529-5cf4-1327-857e3d73653e"]}"
"{"receipt":[12345678,98751466],"requestID":["b4a566eb-2529-5cf4-1327-857e3d73653e"]}"
"{"listSize":2,"list":"lzplogjxokyetaeflilquziatzpjagsginnajfpbkomfancdmhmumxhazblddhcc"}"
"{"requestID":["b4a566eb-2529-5cf4-1327-857e3d73653e"]}"

A potential way to manage these items may be to create a utility to cast dictionaries using keys to control casting rules, thus allowing more complex parsing rules for each field.

Q(kdb+データベース)
//Converts JSON to q with rules per key
decode:{[j]k:.j.k j;(key k)!j2k[key k]@'value k}

//Converts q to JSON with rules per key
encode:{[k].j.j (key k)!k2j[key k]@'value k}

//Rules for JSON to q conversion
j2k:(enlist `)!enlist (::);

j2k[`expiry]:{0D00:00:01*`long$x};
j2k[`result]:`$;
j2k[`receipt]:`long$;
j2k[`id]:{"G"$first x};
j2k[`listSize]:`long$;
j2k[`data]:cut[32];
j2k[`blockCount]:`long$;
j2k[`blocks]:raze;

//Rules for q to JSON conversion
k2j:(enlist `)!enlist (::);

k2j[`expiry]:{`long$%[x;0D00:00:01]};
k2j[`result]:(::);
k2j[`receipt]:(::);
k2j[`id]:enlist;
k2j[`listSize]:(::);
k2j[`data]:raze;
k2j[`blocks]:(::);


//Using default .j.k our structures are not transferred as we wish
{show .j.k x} each read0 `:sample.json;
data     | "26cd02c57f9db87b1df9f2e7bb20cc7b"
expiry   | 1.527797e+009
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
result   | "success"
message  | "success"
receipt  | 123154 4646646f
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
receipt  | 1.234568e+007 9.875147e+007
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
data     | "26cd02c57f9db87b1df9f2e7bb20cc7b"
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
receipt  | 1.234568e+007 9.875147e+007
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
listSize| 2f
list    | "lzplogjxokyetaeflilquziatzpjagsginnajfpbkomfancdmhmumxhazblddhcc"
requestID| "b4a566eb-2529-5cf4-1327-857e3d73653e"


//Using decode utility captures complex structures
{show decode x} each read0 `:sample.json;
data     | ,"26cd02c57f9db87b1df9f2e7bb20cc7b"
expiry   | 17682D19:58:45.000000000
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
result   | `success
message  | "success"
receipt  | 123154 4646646
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
receipt  | 12345678 98751466
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
data     | "26cd02c57f9db87b1df9f2e7bb20cc7b"   
requestID| "b4a566eb-2529-5cf4-1327-857e3d73653e"
receipt  | 12345678 98751466
requestID| ,"b4a566eb-2529-5cf4-1327-857e3d73653e"
listSize| 2
list    | "lzplogjxokyetaeflilquziatzpjagsginnajfpbkomfancdmhmumxhazblddhcc"
requestID| "b4a566eb-2529-5cf4-1327-857e3d73653e"


//The encode utility allows us to round trip
{sample~{encode decode x} each sample:read0 x}`:sample.json

1b

Querying unstructured data

It has become much easier to manage unstructured data via Anymap. However, some considerations must be made.

Q(kdb+データベース)
sample:([] data:decode each read0 `:sample.json)
sample
data                                                                                                                         
-----------------------------------------------------------------------------------------------------------------------------
`data`expiry`requestID!(,"26cd02c57f9db87b1df9f2e7bb20cc7b";17682D19:58:45.000000000;,"b4a566eb-2529-5cf4-1327-857e3d73653e")
`result`message`receipt`requestID!(`success;"success";123154 4646646;,"b4a566eb-2529-5cf4-1327-857e3d73653e")               
`receipt`requestID!(12345678 98751466;,"b4a566eb-2529-5cf4-1327-857e3d73653e")                                              
`data`requestID!(,"26cd02c57f9db87b1df9f2e7bb20cc7b";,"b4a566eb-2529-5cf4-1327-857e3d73653e")                               
`receipt`requestID!(12345678 98751466;,"b4a566eb-2529-5cf4-1327-857e3d73653e")                                              
`listSize`list!(2;"lzplogjxokyetaeflilquziatzpjagsginnajfpbkomfancdmhmumxhazblddhcc")                                       
(,`requestID)!,,"b4a566eb-2529-5cf4-1327-857e3d73653e"

Indexing at depth allows the sparse data within the dictionaries to be queried easily.

Q(kdb+データベース)
select data[;`requestID] from sample
x                                     
---------------------------------------
,"b4a566eb-2529-5cf4-1327-857e3d73653e"
,"b4a566eb-2529-5cf4-1327-857e3d73653e"
,"b4a566eb-2529-5cf4-1327-857e3d73653e"
,"b4a566eb-2529-5cf4-1327-857e3d73653e"
,"b4a566eb-2529-5cf4-1327-857e3d73653e"
0N                                    
,"b4a566eb-2529-5cf4-1327-857e3d73653e"

Notice that when a key is missing from a dictionary, kdb+ returns a null value. The type of this null is determined by the first key type within the dictionary.

Q(kdb+データベース)
//Many different nulls are returned
select data[;`expiry] from sample
x                      
------------------------
17682D19:58:45.000000000
`                      
`long$()               
,""                    
`long$()               
0N                     
,""

//Succeeds on first 2 rows as by chance only null returned in a atom null
select from (2#sample) where null data[;`expiry]

//Fails once moving to 3 rows as there is an empty list null
select from (3#sample) where null data[;`expiry]

data                                                                                                         
-------------------------------------------------------------------------------------------------------------
`result`message`receipt`requestID!(`success;"success";123154 4646646;,"b4a566eb-2529-5cf4-1327-857e3d73653e")


evaluation error:

type

  [0]  select from (3#sample) where null data[;`expiry]
       ^

Checking if a given key is in the dictionary will only return rows that do not have the key.

Q(kdb+データベース)
select from sample where `expiry in/:key each data, not null data[;`expiry]
data                                                                                                                        
-----------------------------------------------------------------------------------------------------------------------------
`data`expiry`requestID!(,"26cd02c57f9db87b1df9f2e7bb20cc7b";17682D19:58:45.000000000;,"b4a566eb-2529-5cf4-1327-857e3d73653e")

However, if we prepend each dictionary with the null symbol key “and generic null value(::)), we can query more freely.

Q(kdb+データベース)
updatedata:(enlist[`]!enlist(::))(,)/:datafrom`sample;
sample

data
---------------------------------------------------------------------------------------------------------------------------------
``data`expiry`requestID!(::;,"26cd02c57f9db87b1df9f2e7bb20cc7b";17682D19:58:45.000000000;,"b4a566eb-2529-5cf4-1327-857e3d73653e")
``result`message`receipt`requestID!(::;`success;"success";1231544646646;,"b4a566eb-2529-5cf4-1327-857e3d73653e")
``receipt`requestID!(::;1234567898751466;,"b4a566eb-2529-5cf4-1327-857e3d73653e")
``data`requestID!(::;,"26cd02c57f9db87b1df9f2e7bb20cc7b";,"b4a566eb-2529-5cf4-1327-857e3d73653e")
``receipt`requestID!(::;1234567898751466;,"b4a566eb-2529-5cf4-1327-857e3d73653e")
``listSize`list!(::;2;"lzplogjxokyetaeflilquziatzpjagsginnajfpbkomfancdmhmumxhazblddhcc")
``requestID!(::;,"b4a566eb-2529-5cf4-1327-857e3d73653e")

As you can see, nulls with missing keys are now represented as (::), meaning the previously failed query can now be executed.

Q(kdb+データベース)
select expiry:data[;`expiry] from sample
expiry                 
------------------------
17682D19:58:45.000000000
::                      
::                     
::                     
::                     
::                     
::

select from sample where not null data[;`expiry]
data                                                                                                                            
---------------------------------------------------------------------------------------------------------------------------------
``data`expiry`requestID!(::;,"26cd02c57f9db87b1df9f2e7bb20cc7b";17682D19:58:45.000000000;,"b4a566eb-2529-5cf4-1327-857e3d73653e")

We can also easily replace (::) values, in this case with an infinity value.

Q(kdb+データベース)
fill:{@[y;where null y;:;x]}
select expiry:fill[0Wn]data[;`expiry] from sample
expiry                 
------------------------
17682D19:58:45.000000000
0W                     
0W                     
0W                     
0W                     
0W                     
0W

To learn more about the commands featured in this blog, visit our comprehensive documentation on code.kx. You can also begin your journey with kdb+ by downloading our personal edition, or via one of our many courses on the KX Academy.

Customer Stories

Discover richer, actionable insights for faster, better informed decision making

Capital Markets

As a customer of KX for 10+ years, they knew they could rely on KX’s team and its real-time database to easily migrate into the cloud.

詳細を読む 概要 Japanese Bank


AIによるイノベーションを加速する、KXのデモをお客様に合わせてご提供します。

当社のチームが以下の実現をサポートします:

  • ストリーミング、リアルタイム、および過去データに最適化された設計
  • エンタープライズ向けのスケーラビリティ、耐障害性、統合性、そして高度な分析機能
  • 幅広い開発言語との統合に対応する充実したツール群

専門担当者によるデモをリクエスト

*」は必須フィールドを示します

本フォームを送信いただくと、KXの製品・サービス、お知らせ、イベントに関する営業・マーケティング情報をお受け取りいただけます。プライバシーポリシーからお手続きいただくことで購読解除も可能です。当社の個人情報の収集・使用に関する詳しい情報については、プライバシーポリシーをご覧ください。

このフィールドは入力チェック用です。変更しないでください。

タイムシリーズ分野におけるG2認定リーダー