Machine learning with kdb+ has been a theme of the KX blog over the past couple of months, following the release of a series of JupyterQ notebooks on the KX ML GitHub. As a wider range of developers works with ML techniques, the uses for kdb+ in ML applications are growing. A catalyst for this trend has been the release of embedPy, which loads Python into kdb+ so that Python variables and objects become q variables and either language can act upon them. With embedPy, Python code and files can be embedded within q code, and Python functions can be called as q functions.
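As a minimal sketch of this interface (assuming embedPy is installed and loaded via p.q, and that NumPy is available in the underlying Python environment), the snippet below shares a variable between q and Python and exposes a Python function as a q function:

```q
\l p.q                          / load embedPy into the q session

p)import numpy as np            / lines prefixed with p) run as Python

.p.set[`xs;til 10]              / a q list becomes the Python variable xs
p)ys=[float(v)*2 for v in xs]   / act on it from the Python side
.p.get[`ys]`                    / pull ys back into q: 0 2 4 .. 18f

np:.p.import`numpy              / import a Python module as a q object
sqrt:np[`:sqrt;<]               / expose numpy.sqrt as a q function (< returns q data)
sqrt 16                         / 4f
```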
Building on these capabilities, the KX ML team has created a number of JupyterQ notebooks and continues to develop more. Each notebook demonstrates how to implement a different machine learning technique in kdb+, primarily using embedPy, to solve problems ranging from feature extraction to fitting and testing a model. These notebooks act as a foundation for our users, allowing them to manipulate the code and explore the exciting world of machine learning within KX.
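To illustrate the kind of workflow the notebooks cover (this sketch is not taken from the notebooks themselves, and assumes scikit-learn is installed in the underlying Python environment), a model can be fitted and tested from q via embedPy along these lines:

```q
\l p.q
lm:.p.import`sklearn.linear_model       / import the sklearn submodule
x:enlist each til 10                    / 10x1 feature matrix as a nested q list
y:2*til 10                              / target values: y = 2x
model:lm[`:LinearRegression][]          / instantiate a model object
model[`:fit;x;y];                       / fit on q data, converted for Python
model[`:predict;enlist each 10 11]`     / predictions returned to q: 20 22f
```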
Our current ML notebooks are described in the following KX blogs:
- Neural Networks in kdb+ by Esperanza López Aguilera
- Natural Language Processing in kdb+ by Fionnuala Carr
- Dimensionality Reduction in kdb+ by Conor McCarthy
- Classification Using k-Nearest Neighbors in kdb+ by Fionnuala Carr
- Feature Engineering in kdb+ by Fionnuala Carr
- Decision Trees in kdb+ by Conor McCarthy
- Random Forests in kdb+ by Esperanza López Aguilera
If you would like to investigate the uses of embedPy and machine learning algorithms in KX further, keep checking back to the ML notebooks on GitHub. To set up your machine learning environment, you can use Anaconda, which integrates with your Python installation, or you can build your own environment by downloading kdb+, embedPy and JupyterQ individually. You can find the installation steps in the ML section of the kdb+ Developers’ site.