Decision Trees in kdb+

5 Jul 2018

The open-source notebook outlined in this blog describes the use of a common machine learning technique: decision trees. We focus here on a decision tree that classifies whether a tumor is malignant or benign. The notebook shows the use of both q and Python, leveraging their respective strengths in data manipulation and visualization.
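To give a flavour of the approach, the sketch below fits a decision-tree classifier through embedPy. It is not the notebook's own code: the Wisconsin breast-cancer dataset bundled with scikit-learn and the default DecisionTreeClassifier settings are assumptions chosen for illustration, and it requires embedPy (p.q) and scikit-learn to be installed in the attached Python environment.

\l p.q                                   / load embedPy

/ import scikit-learn modules (assumed installed in the Python environment)
datasets:.p.import`sklearn.datasets
tree:.p.import`sklearn.tree

/ illustrative data: the Wisconsin breast-cancer dataset shipped with scikit-learn
data:datasets[`:load_breast_cancer][]
X:data[`:data]`                          / feature matrix as a q list of rows
y:data[`:target]`                        / labels: 0 malignant, 1 benign

/ fit a decision-tree classifier and report accuracy on the training data
clf:tree[`:DecisionTreeClassifier][]
clf[`:fit][X;y];
clf[`:score][X;y]`

In practice the fitted model would be scored on held-out data rather than the training set; the final line simply confirms the model has been trained.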

Classification using K-Nearest Neighbors in kdb+

21 Jun 2018

As part of Kx25, the international kdb+ user conference held May 18th in New York City, a series of seven JupyterQ notebooks was released and is now available at https://code.kx.com/q/ml/. Each notebook demonstrates how to implement a different machine learning technique in kdb+, primarily using embedPy, to solve a range of machine learning problems, from feature extraction to the fitting and testing of a model.
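The short sketch below gives a flavour of the kind of workflow the notebooks cover, using a k-nearest-neighbours classifier through embedPy. It is illustrative only, not taken from the notebooks: the iris dataset, the 80/20 split and k=3 are assumptions, and embedPy (p.q) plus scikit-learn are required.

\l p.q                                   / load embedPy

/ import scikit-learn modules (assumed installed in the Python environment)
datasets:.p.import`sklearn.datasets
nbrs:.p.import`sklearn.neighbors

/ illustrative data: the iris dataset shipped with scikit-learn
data:datasets[`:load_iris][]
X:data[`:data]`                          / feature matrix as a q list of rows
y:data[`:target]`                        / class labels

/ random 80/20 train-test split done in q
i:neg[n]?n:count y                       / random permutation of row indices
k:floor 0.8*n                            / number of training rows
Xtrn:X k#i; ytrn:y k#i                   / first 80% of the shuffled rows
Xtst:X k _ i; ytst:y k _ i               / remaining 20% held out for testing

/ fit a 3-nearest-neighbour classifier and report accuracy on the test set
knn:nbrs[`:KNeighborsClassifier][`n_neighbors pykw 3]
knn[`:fit][Xtrn;ytrn];
knn[`:score][Xtst;ytst]`

Doing the split in q and the fitting in Python reflects the division of labour the notebooks illustrate: q for data manipulation, embedPy for access to Python's machine learning libraries.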

Dimensionality Reduction in kdb+

14 Jun 2018

Dimensionality reduction methods have been the focus of much interest within the statistics and machine learning communities for a range of applications, and they have a long history as data pre-processing techniques. Dimensionality reduction maps data to a lower-dimensional space so that uninformative variance in the data is discarded. By doing this we hope to retain only the structure that is meaningful to our machine learning problem. In addition, by finding a lower-dimensional representation of a dataset we hope to improve the efficiency and accuracy of machine learning models.
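As one concrete example of such a mapping, the sketch below applies principal component analysis, a common dimensionality-reduction technique, through embedPy. It is an illustration under stated assumptions rather than the notebook's method: the random 100x10 dataset and the choice of three components are arbitrary, and embedPy (p.q) with scikit-learn is assumed to be available.

\l p.q                                   / load embedPy

/ import scikit-learn's decomposition module (assumed installed)
dec:.p.import`sklearn.decomposition

/ arbitrary illustrative data: 100 samples of 10 features
X:100 10#1000?1f

/ project the data onto its first 3 principal components
pca:dec[`:PCA][`n_components pykw 3]
Xr:pca[`:fit_transform][X]`

count each (first X;first Xr)            / 10 columns in, 3 columns out
pca[`:explained_variance_ratio_]`        / fraction of variance captured by each component

The explained-variance ratios indicate how much of the original variance each retained component preserves, which is the usual guide to choosing how far a dataset can be reduced before meaningful information is lost.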