Software engineer and kdb+ programmer Juan Lasheras recently added a kdb+/q machine learning project to GitHub.
Juan’s ml.q repository aims to be a multi-purpose machine learning toolkit for kdb+/q, comparable to the scikit-learn toolkit for Python. It provides methods that practitioners can use for data analysis and predictive modeling. The project currently implements the following three algorithms:
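K nearest neighbors: The user specifies a point and a number k, and the algorithm finds the k points in the dataset that are closest to it.
K-means clustering: This partitions a dataset into k clusters, assigning each point to the cluster whose mean is nearest. This is particularly useful because points that land in the same cluster are similar to one another, which can indicate a relationship between them.
Decision Tree (ID3): This scans a labeled dataset and constructs a tree of questions about its attributes, asking at each step the question that yields the most information gain. The resulting tree can then be used to classify future data points.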
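For readers who have not met these algorithms before, the core idea of each can be sketched in a few lines. The sketch below is in plain Python rather than q, and it is not taken from Juan’s repository: the function names (nearest_neighbors, k_means, entropy, information_gain), the use of Euclidean distance and the fixed iteration count are all assumptions made for illustration. For ID3 it shows only the information-gain calculation used to choose a question, not the full tree construction.

```python
import math
from collections import Counter


def nearest_neighbors(points, query, k):
    """Return the k points closest to query (Euclidean distance assumed)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting rows on one attribute.
    ID3 asks about the attribute with the highest gain at each node."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder
```

Given a handful of 2-D tuples, nearest_neighbors(points, (0, 0.2), 2) returns the two tuples nearest that query, and k_means(points, [(0, 0), (10, 10)]) returns the final centroids together with the points assigned to each. information_gain expects each row to be a dictionary keyed by attribute name.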
You can see Juan’s project on GitHub.
To learn more about the scikit-learn toolkit for Python, see http://scikit-learn.org.