By: Nicolle Eagan
At KX25, the international kdb+ user group conference held on May 18th, I gave a presentation demonstrating how to use a JupyterQ notebook to perform analytics on solar storm data.
My experience working with solar storm data began last year when I was a visiting data scientist at the NASA Frontier Development Lab (FDL) in Mountain View, California. The FDL is hosted by the Search for Extraterrestrial Intelligence (SETI) Institute, a not-for-profit research organization founded in 1984 whose purpose is to drive research in the space sciences. Within SETI, FDL is an applied artificial intelligence research accelerator established to make the most of new AI technologies by applying them to challenges in the space sciences.
I worked with the solar storm team at FDL on the development of a tool called FlareNet. FlareNet is a deep learning framework that uses a neural network to predict the strength of a potential flare based on the current state of the sun. Solar flares can render technology such as cell phones and GPS useless, with dire consequences. According to scientists at FDL, kdb+ was essential for analyzing the data and training FlareNet, particularly because of its speed and accuracy with time-series data.
At KX25, I used JupyterQ, KX’s newly productionized Jupyter notebook kernel for kdb+, to show different types of analytics on solar storm data. For the demo I needed only Anaconda, q, and embedPy. I also decided to do the work on Google Cloud, using eight Intel Skylake cores and a standard remote spinning disk. Everything was quick to install thanks to the addition of kdb+ to the Google Cloud Launcher and to Anaconda’s distribution platform. Within minutes, I had a notebook up and running, ready to query my data.
With q to query and aggregate the data in milliseconds, and Python's many plotting libraries to visualize the results, I was able to build my database and draw a number of interesting insights from it. The data was provided by the United States Geological Survey, whose 14 data centers in the US record solar readings from four different orientations every minute of every day. This may not sound like a lot, but over many years it certainly adds up!
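To put a rough number on "adds up", here is a back-of-the-envelope sketch in Python. It assumes exactly one reading per orientation per minute at each of the 14 data centers, with no gaps and a 365-day year. Those are simplifying assumptions of mine, not figures from the USGS.

```python
# Rough data-volume estimate, assuming one reading per orientation
# per minute at every data center (an assumption, not a USGS figure).
STATIONS = 14
ORIENTATIONS = 4
MINUTES_PER_DAY = 24 * 60

readings_per_day = STATIONS * ORIENTATIONS * MINUTES_PER_DAY
readings_per_year = readings_per_day * 365  # non-leap year, no missing data

print(f"{readings_per_day:,} readings per day")
print(f"{readings_per_year:,} readings per year")
```

Under those assumptions the total comes to tens of thousands of readings per day and tens of millions per year.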
Once I had created my database in an optimal form for q, I used several Python visualization libraries to find interesting features of the data. For instance, using aggregate queries in q and the Basemap library in Python, I discovered that data centers farther from the equator produced stronger solar readings than those close to it, which was quite an interesting find.
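The aggregation behind that kind of comparison can be sketched in plain Python. This is not the q query from my notebook. It is a minimal illustration of grouping readings by station, averaging them, and ordering the stations by distance from the equator. The station names, latitudes, and readings are hypothetical placeholders invented to exercise the code; they are not USGS data and do not represent any result.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder rows: (station, latitude in degrees, reading).
# Invented values for illustration only; NOT real USGS measurements.
rows = [
    ("station_a", 10.0, 1.0),
    ("station_a", 10.0, 3.0),
    ("station_b", 45.0, 4.0),
    ("station_b", 45.0, 6.0),
    ("station_c", 70.0, 9.0),
]

def mean_by_station(rows):
    """Average the readings per (station, latitude) group."""
    groups = defaultdict(list)
    for station, lat, reading in rows:
        groups[(station, lat)].append(reading)
    # Order by absolute latitude so any trend with distance
    # from the equator is easy to scan.
    return sorted(
        ((station, lat, mean(vals)) for (station, lat), vals in groups.items()),
        key=lambda r: abs(r[1]),
    )

summary = mean_by_station(rows)
for station, lat, avg in summary:
    print(f"{station}  lat={lat:5.1f}  mean={avg:.2f}")
```

A per-station summary like this is the kind of table that can then be handed to a mapping library for plotting.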
My notebook, which includes many varied examples of my analysis of the data, can be found on my GitHub page. You can also watch my presentation here. Feel free to contact me directly on GitHub with any questions or comments.