Machine Learning in kdb+: k-NN classification and pattern recognition

31 Aug 2017

By Emanuele Melis

As a powerful array-processing technology, kdb+ can be used to great effect in machine learning algorithms. This latest Kx whitepaper covers k-Nearest Neighbors (k-NN) classification in kdb+, a non-parametric statistical method commonly used for pattern recognition.

k-NN assumes data points lie in a metric space and are represented as n-dimensional vectors, between which distances can be computed. This makes it one of the easiest machine learning algorithms to implement, but impractical in some industry settings due to the computational complexity and cost of: (1) computing distance metrics; (2) feature extraction; (3) classification.
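The core idea can be sketched briefly. This is an illustrative Python sketch, not the paper's q implementation; the function names here are hypothetical:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two n-dimensional feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(sample, training, k=3):
    """Classify a sample by majority vote among its k nearest neighbours.

    training is a list of (label, feature_vector) pairs."""
    nearest = sorted(training, key=lambda lf: euclidean(sample, lf[1]))[:k]
    labels = [label for label, _ in nearest]
    return max(set(labels), key=labels.count)
```

Note that the distance function is pluggable: Manhattan distance, for example, drops the square and square root, trading accuracy characteristics for cheaper computation.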

The paper further examines implementation strategies in kdb+, and the performance of a k-NN classifier used to predict digits in a dataset of handwritten samples normalized into arrays of 8 (x,y) coordinate pairs. The training set, loaded into kdb+ as label-to-feature-array mappings, is represented as a table keyed on the label, and distance metrics are calculated by applying distance functions to it. A validation set is then used to measure the classifier's prediction accuracy, leveraging q-SQL syntax.
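The validation step amounts to running the classifier over labelled samples it has not seen and counting matches. A minimal Python sketch of that accuracy measurement, with a trivial 1-nearest-neighbour predictor over a toy training set (all names hypothetical, not the paper's q code):

```python
def validate(predict, validation_set):
    """Measure prediction accuracy over a labelled validation set.

    predict: function mapping a feature vector to a label.
    validation_set: list of (true_label, feature_vector) pairs."""
    correct = sum(predict(features) == label
                  for label, features in validation_set)
    return correct / len(validation_set)

# Toy training set: two labelled 2-d feature vectors.
training = [("0", [0.0, 0.0]), ("1", [1.0, 1.0])]

def predict(features):
    # Label of the closest training sample, by Manhattan distance.
    return min(training,
               key=lambda lf: sum(abs(a - b) for a, b in zip(features, lf[1])))[0]
```

In the paper's setting, each handwritten sample's 8 (x,y) pairs form a 16-dimensional feature vector, and the equivalent bookkeeping is expressed in q-SQL over the keyed training table.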

Adopting kdb+ to implement a k-NN classifier brings the benefits of a high-performance array-processing language with easy-to-read q-SQL syntax, allowing a performant and elegant implementation without external libraries.

The code used in this white paper is available on the Kx GitHub.


Emanuele Melis is an expert kdb+/q software engineer currently based in Glasgow, Scotland.
