KxCon2016 Puzzle Challenge

25 May 2016

By Nick Psaris. Inspired by Andrew Ng’s Machine Learning Coursera Class.

KxCon2016 was a success, especially for the brave programmers who took on the KxCon2016 programming challenge. Try your hand and we will post the solutions next week.

The KxCon2016 programming challenge was chosen because it can be implemented quickly but inefficiently, and then optimized considerably. This is not a toy problem – the resulting function is used to load datasets for machine learning. Finally, to make the problem more interesting, an existing q operator has been extended (“reshape extended to >2 dimensions…”) in kdb+ 3.4t that can make your solution even shorter.



A popular application of machine learning is character recognition. If we assume a handwritten digit can be digitized into a vector of pixels, logistic regression (among many other techniques) can be used to assign a weight to each pixel. These learned weights can then be combined with a new image to make a prediction of which digit it represents.
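As a rough illustration of how learned weights combine with a new image, here is a minimal Python sketch of one-vs-all logistic regression. The function names, the bias term and the weight layout (one weight vector per digit) are assumptions made for this example; they are not taken from the challenge or from any particular library.

```python
import math

def predict_probability(pixels, weights, bias=0.0):
    """Logistic regression: squash the weighted pixel sum into (0, 1)."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def predict_digit(pixels, weights_per_digit):
    """One-vs-all: score the image against each digit's weights, keep the best."""
    scores = [predict_probability(pixels, w) for w in weights_per_digit]
    return max(range(len(scores)), key=scores.__getitem__)
```

With ten weight vectors, one per digit, `predict_digit` returns the index of the digit whose classifier is most confident. The sketch is not numerically hardened: very large negative sums would overflow `math.exp`.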

The MNIST database holds a collection of handwritten digits that have been normalized for use in testing machine learning and pattern recognition techniques.


Figure 1: The first image in the MNIST training file representing the number 5.

To process these images of handwritten digits, we must load the data from files stored in the custom MNIST binary format. Your challenge is to write a function to read this data and return the resulting n-dimensional array. Lucky for you, this format has been well documented on the MNIST site.

The site specifies the exact dimension and numerical type of each dataset. This would allow you to write a custom loader for each file. The file format, however, is self-describing. You are required, therefore, to write a general loader that works with datasets of all dimensions and types. While you are waiting for the dataset to download, you can begin testing your implementation against the unit tests below.
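To make "self-describing" concrete, the sketch below reads only the header of an IDX byte string. It is written in Python purely for illustration and is not the q solution. It assumes the layout documented on the MNIST site: two zero bytes, one type-code byte, one byte holding the number of dimensions, then one 4-byte big-endian integer per dimension. The name `parse_idx_header` and the `IDX_TYPES` table are hypothetical helpers introduced for this example.

```python
import struct

# Type code (third byte of the magic number) -> (struct format, size in bytes)
IDX_TYPES = {
    0x08: ("B", 1),  # unsigned byte
    0x09: ("b", 1),  # signed byte
    0x0B: ("h", 2),  # short
    0x0C: ("i", 4),  # int
    0x0D: ("f", 4),  # float
    0x0E: ("d", 8),  # double
}

def parse_idx_header(buf):
    """Return (type_code, dims, data_offset) for an IDX byte string."""
    if buf[0] != 0 or buf[1] != 0:
        raise ValueError("the first two bytes of an IDX file must be zero")
    type_code, ndims = buf[2], buf[3]
    if type_code not in IDX_TYPES:
        raise ValueError("unknown IDX type code: 0x%02x" % type_code)
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack_from(">%dI" % ndims, buf, 4)
    return type_code, list(dims), 4 + 4 * ndims
```

For example, `parse_idx_header(bytes.fromhex("00000b010000000200010002"))` reports a one-dimensional array of two shorts whose data starts at byte 8.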



Your function will be applied to the MNIST training dataset. To make the function more flexible, it should accept a byte vector instead of a file name. The function can then be applied to unit tests to confirm proper behavior. To be accepted, your function, named ldidx, should produce the following results (signed and unsigned bytes should both be returned as type “x”). NOTE: ignore any extra trailing bytes.

Figure 2: The last image in the MNIST training file representing the number 8.

q)ldidx 0x0000080100000000
q)ldidx 0x000008010000000100
,0x00
q)0N!ldidx 0x0000080200000002000000020001020304;
q)0N!ldidx 0x00000803000000020000000200000002000102030405060708;
q)ldidx 0x00000b010000000200010002
1 2h
q)ldidx 0x00000c01000000020000000100000002
1 2i
q)ldidx 0x00000d01000000023f80000040000000
1 2e
q)ldidx 0x00000e01000000023ff00000000000004000000000000000
1 2f
q)md5 raze over string X:ldidx b:read1 `$"train-images-idx3-ubyte"
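For readers who want to cross-check the unit tests outside q, the following is a reference sketch in Python, not a competition entry. It assumes the IDX layout documented on the MNIST site and mirrors the rules above: both byte type codes are decoded as raw unsigned bytes, and trailing bytes are ignored. The name `load_idx` is hypothetical, nested lists stand in for q's nested vectors, and inner dimensions are assumed to be non-zero.

```python
import struct

# Type code -> struct format. Signed and unsigned bytes are both read as raw
# unsigned bytes, mirroring the requirement that both come back as type "x".
FORMATS = {0x08: "B", 0x09: "B", 0x0B: "h", 0x0C: "i", 0x0D: "f", 0x0E: "d"}

def load_idx(buf):
    """Decode an IDX byte string into nested Python lists."""
    type_code, ndims = buf[2], buf[3]
    dims = struct.unpack_from(">%dI" % ndims, buf, 4)
    count = 1
    for d in dims:
        count *= d
    # Read exactly `count` big-endian values; any trailing bytes are ignored.
    fmt = ">%d%s" % (count, FORMATS[type_code])
    flat = list(struct.unpack_from(fmt, buf, 4 + 4 * ndims))
    # Fold the flat list into nested lists, innermost dimension first.
    # Assumes every dimension after the first is non-zero.
    for d in reversed(dims[1:]):
        flat = [flat[i:i + d] for i in range(0, len(flat), d)]
    return flat
```

Applied to the 2×2 test vector, `load_idx(bytes.fromhex("0000080200000002000000020001020304"))` yields two rows of two bytes each and drops the final `04`.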


Email your function to as soon as it produces valid results. Email it again when you’ve optimized the code. No external user-defined functions or data structures can be used. Only the first and last submission by an individual will be accepted for the competition. All submissions must be made prior to 00:00 EST on 22 May 2016. The 32 bit free version of q available on 20 May 2016 will be used to test each submission.


One point will be awarded for each of the following categories.

  1. Fastest valid submission measured in milliseconds elapsed – q)\t:10 ldidx b
  2. Smallest valid submission measured in allocated bytes – q)\ts ldidx b
  3. Shortest valid submission measured in bytes – q)count first get ldidx

In case of a tie, the submitter who provided the first valid submission (irrespective of performance) will win.

UPDATE: The solution is here.
