KxCon2016 Puzzle Challenge

25 May 2016

By Nick Psaris. Inspired by Andrew Ng’s Machine Learning Coursera Class.

KxCon2016 was a success, especially for the brave programmers who took on the KxCon2016 programming challenge. Try your hand and we will post the solutions next week.

The KxCon2016 programming challenge was chosen because it can be quickly implemented inefficiently and then considerably optimized. This is not a toy problem – the resulting function is used to load datasets for machine learning. Finally, to make the problem more interesting, an existing q operator has been extended (“reshape extended to >2 dimensions…”) in kdb+ 3.4t that can make your solution even shorter.

MACHINE LEARNING

Background

A popular application of machine learning is character recognition. If we assume a handwritten digit can be digitized into a vector of pixels, logistic regression (among many other techniques) can be used to assign a weight to each pixel. These learned weights can then be combined with a new image to make a prediction of which digit it represents.
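As a rough illustration of the prediction step described above, the Python sketch below scores a pixel vector against one weight vector per digit and picks the digit with the highest logistic score. The `predict_digit` helper, the weights and the biases are all invented for this example; nothing here comes from a trained model or from the course material.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_digit(pixels, weights, biases):
    """One-vs-all prediction (hypothetical helper, not from the source).

    pixels  -- pixel intensities for one image
    weights -- one weight list per candidate class, same length as pixels
    biases  -- one bias term per candidate class
    """
    scores = [
        sigmoid(b + sum(w * p for w, p in zip(ws, pixels)))
        for ws, b in zip(weights, biases)
    ]
    # The index of the largest score is the predicted class.
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: 3 "pixels" and 2 classes. All numbers are made up.
toy_weights = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]]
toy_biases = [0.0, 0.0]
print(predict_digit([0.0, 1.0, 1.0], toy_weights, toy_biases))  # prints 1
```

For MNIST the pixel vector would have 784 entries (28 x 28) and there would be ten weight vectors, one per digit.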

The MNIST database holds a collection of handwritten digits that have been normalized for use in testing machine learning and pattern recognition techniques.

Challenge

Figure 1: The first image in the MNIST training file representing the number 5.

To process these images of handwritten digits, we must load the data from files stored in the custom MNIST binary format. Your challenge is to write a function to read this data and return the resulting n-dimensional array. Lucky for you, this format has been well documented on the MNIST site.

The site specifies the exact dimension and numerical type of each dataset. This would allow you to write a custom loader for each file. The file format, however, is self-describing. You are required, therefore, to write a general loader that works with datasets of all dimensions and types. While you are waiting for the dataset to download, you can begin testing your implementation against the unit tests below.
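To make "self-describing" concrete: an IDX buffer starts with two zero bytes, a one-byte type code and a one-byte dimension count, followed by one 4-byte big-endian size per dimension. The Python sketch below decodes only that header. It is written in Python rather than q so that it illustrates the format without being a contest entry, and the name `parse_idx_header` and its return layout are assumptions made for this example.

```python
import struct

# Type codes found in the third header byte, with a readable name and
# the size of one item in bytes.
IDX_TYPES = {
    0x08: ("unsigned byte", 1),
    0x09: ("signed byte", 1),
    0x0B: ("short", 2),
    0x0C: ("int", 4),
    0x0D: ("float", 4),
    0x0E: ("double", 8),
}

def parse_idx_header(buf):
    """Return (type_name, item_size, dims, data_offset) for an IDX buffer."""
    zero, type_code, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or type_code not in IDX_TYPES:
        raise ValueError("not an IDX buffer")
    # One big-endian unsigned 32-bit size per dimension.
    dims = struct.unpack_from(">%dI" % ndim, buf, 4)
    name, size = IDX_TYPES[type_code]
    return name, size, dims, 4 + 4 * ndim

header = bytes.fromhex("00000803" "0000ea60" "0000001c" "0000001c")
print(parse_idx_header(header))
# prints ('unsigned byte', 1, (60000, 28, 28), 16)
```

The example header is the one carried by the MNIST training image file: 60000 images of 28 by 28 unsigned bytes, with the pixel data starting at byte offset 16.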

RULES

Interface

Your function will be applied to the MNIST training dataset. To make the function more flexible, it should accept a byte vector instead of a file name. The function can then be applied to unit tests to confirm proper behavior. To be accepted, your function, named ldidx, should produce the following results (signed and unsigned bytes should both be returned as type “x”). NOTE: ignore any extra trailing bytes.

Figure 2: The last image in the MNIST training file representing the number 8.

q)ldidx 0x0000080100000000
`byte$()
q)ldidx 0x000008010000000100
,0x00
q)0N!ldidx 0x0000080200000002000000020001020304;
(0x0001;0x0203)
q)0N!ldidx 0x00000803000000020000000200000002000102030405060708;
((0x0001;0x0203);(0x0405;0x0607))
q)ldidx 0x00000b010000000200010002
1 2h
q)ldidx 0x00000c01000000020000000100000002
1 2i
q)ldidx 0x00000d01000000023f80000040000000
1 2e
q)ldidx 0x00000e01000000023ff00000000000004000000000000000
1 2f
q)md5 raze over string X:ldidx b:read1 `$"train-images-idx3-ubyte"
0x6a5cde79f049959f93df34292c599c1b
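For readers who want to cross-check the expected results outside q, here is a minimal reference sketch in Python. It is not a solution to the challenge and makes no attempt at speed or brevity. The names `load_idx` and `reshape` are invented for this example, and nested Python lists stand in for q's nested vectors. As in the rules above, signed and unsigned bytes are decoded the same way and trailing bytes are ignored.

```python
import struct

# struct format character per IDX type code. Following the challenge
# rules, 0x08 and 0x09 are both read as plain (unsigned) bytes.
_FORMATS = {0x08: "B", 0x09: "B", 0x0B: "h", 0x0C: "i", 0x0D: "f", 0x0E: "d"}

def reshape(flat, dims):
    """Fold a flat list into nested lists, last dimension varying fastest."""
    for d in reversed(dims[1:]):
        flat = [flat[i:i + d] for i in range(0, len(flat), d)]
    return flat

def load_idx(buf):
    """Decode an IDX byte buffer into nested lists (hypothetical helper)."""
    type_code, ndim = buf[2], buf[3]
    dims = struct.unpack_from(">%dI" % ndim, buf, 4)
    count = 1
    for d in dims:
        count *= d
    # unpack_from reads exactly `count` items, so extra bytes are ignored.
    fmt = ">%d%s" % (count, _FORMATS[type_code])
    flat = list(struct.unpack_from(fmt, buf, 4 + 4 * ndim))
    return reshape(flat, dims)

print(load_idx(bytes.fromhex("0000080200000002000000020001020304")))
# prints [[0, 1], [2, 3]]
print(load_idx(bytes.fromhex("00000d01000000023f80000040000000")))
# prints [1.0, 2.0]
```

Folding from the last dimension inward mirrors the row-major layout of the data, which is also what a reshape over more than two dimensions has to do.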

Submission

Email your function as soon as it produces valid results, and email it again once you have optimized the code. No external user-defined functions or data structures can be used. Only the first and last submission by an individual will be accepted for the competition. All submissions must be made prior to 00:00 EST on 22 May 2016. The 32-bit free version of q available on 20 May 2016 will be used to test each submission.

Scoring

One point will be awarded for each of the following categories.

  1. Fastest valid submission measured in milliseconds elapsed – q)\t:10 ldidx b
  2. Smallest valid submission measured in allocated bytes – q)\ts ldidx b
  3. Shortest valid submission measured in bytes – q)count first get ldidx

In case of a tie, the submitter who provided the first valid submission (irrespective of performance) will win.

UPDATE: The solution is here.

© 2017 Kx Systems
Kx® and kdb+ are registered trademarks of Kx Systems, Inc., a subsidiary of First Derivatives plc.
