White Paper on Detection of Exoplanets at NASA FDL with kdb+

13 Dec 2018

By Esperanza López Aguilera

As part of the Kx and NASA Frontier Development Lab (FDL) partnership, I recently had the privilege of serving as a visiting data scientist at the FDL in Mountain View, California. This program brings commercial and private partners together with researchers to solve challenges in the space-science community using new AI technologies.

NASA FDL 2018 focused on four areas of research – Space Resources, Exoplanets, Space Weather and Astrobiology – each with its own separate challenges. I was given the chance to join the Exoplanets team, which aimed to improve the accuracy of exoplanet detection using machine-learning models.

Data for the Exoplanets challenge comes from the Transiting Exoplanet Survey Satellite (TESS), launched in April 2018 with the objective of discovering new exoplanets in orbit around the brightest stars in the solar neighborhood. The first public data from TESS was made available in September, after the close of the NASA FDL 2018 program, so this year we used simulated data to develop a research plan for applying the methods to actual TESS data in 2019.

In its two-year mission, TESS is monitoring 26 sectors for 27 days each, covering 85% of the sky. It is spending the first year exploring the 13 sectors that cover the Southern Hemisphere before rotating to explore the Northern Hemisphere during year two. A key element of FDL’s research is the sequence of images taken by TESS at a set cadence, which forms a Satellite Image Time Series (SITS) for each sector.

Once collected, SITS are passed through a highly complex data-processing pipeline developed by NASA. In the first stage, calibration finds the optimal set of pixels representing each target star. Aggregate brightness is then extracted, frame by frame, from the pixels associated with each star to create a brightness time series for that target. Finally, the raw series is processed to remove noise, trends and other effects introduced by the satellite itself. The result is the corrected flux from the star, referred to from now on as a light curve.
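The extraction step above can be sketched in miniature. This is not NASA's actual pipeline; the aperture mask, running-median window and function names below are illustrative assumptions, showing only the idea of summing an aperture's pixels per frame and dividing out a slow trend.

```python
import numpy as np

def extract_light_curve(frames, aperture_mask):
    """Sum the flux inside a boolean aperture mask for each frame in a SITS cube."""
    # frames: (n_times, ny, nx) image cube; aperture_mask: boolean (ny, nx)
    return frames[:, aperture_mask].sum(axis=1)

def detrend(flux, window=25):
    """Remove slow instrumental trends by dividing by a running median."""
    half = window // 2
    padded = np.pad(flux, half, mode="edge")
    trend = np.array([np.median(padded[i:i + window]) for i in range(flux.size)])
    return flux / trend  # corrected, relative flux

# Toy example: a stable star with one transit-like 1% dip
rng = np.random.default_rng(0)
frames = rng.normal(100.0, 1.0, size=(200, 5, 5))
frames[90:100] *= 0.99           # 1% drop in brightness for 10 frames
mask = np.zeros((5, 5), bool)
mask[1:4, 1:4] = True            # 3x3-pixel aperture around the target
lc = detrend(extract_light_curve(frames, mask))
```

The corrected flux `lc` is close to 1.0 away from the transit and dips to roughly 0.99 during it, which is the signature the later stages of the pipeline look for.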

Studying light curves is essential for research that attempts to detect exoplanets. Variations in the brightness of the target stars may indicate the presence of a transiting planet. The preprocessing pipeline searches the light curves for signals consistent with transiting planets in order to identify planet candidates, known as Threshold Crossing Events (TCEs). However, the list of TCEs will likely contain a large number of false positives, caused by eclipsing binary systems, background eclipsing binaries or simple noise.
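As a toy illustration of what a transit search does (the real pipeline uses far more sophisticated algorithms), one can slide a box across the light curve and flag windows whose mean flux drops significantly below the baseline. The function name, box width and threshold below are assumptions for illustration only.

```python
import numpy as np

def find_tces(flux, width=10, threshold=5.0):
    """Flag candidate transit events: windows whose mean flux falls more than
    `threshold` sigma (scaled for the window size) below the overall level."""
    flux = np.asarray(flux, float)
    base, noise = np.median(flux), np.std(flux)
    events = []
    for start in range(flux.size - width):
        depth = base - flux[start:start + width].mean()
        if depth > threshold * noise / np.sqrt(width):
            events.append((start, depth))   # candidate TCE: (start index, depth)
    return events

# Toy light curve with one injected transit
rng = np.random.default_rng(1)
flux = rng.normal(1.0, 0.001, 500)
flux[200:210] -= 0.005        # 0.5%-deep, 10-sample transit
tces = find_tces(flux)
```

Any window overlapping the injected dip strongly enough is flagged, which mirrors how a TCE list can also pick up spurious dips from binaries or noise, hence the need for a classifier downstream.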

At this stage, machine learning (ML) comes into play. We proposed training a Bayesian neural network to classify the extracted TCEs as real planets or false positives. For this purpose we took advantage of the strengths of kdb+/q for handling time series, processing data and performing analytics, while embedPy was used to import the necessary Python ML libraries.
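The actual model was built with Python deep-learning libraries via embedPy; as a self-contained sketch of the idea behind a Bayesian treatment, here is a minimal Monte-Carlo-dropout classifier in NumPy. Repeating stochastic forward passes with dropout active at prediction time yields both a mean probability and an uncertainty estimate for each TCE. The weights and feature vector are random stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in weights for a 2-layer classifier: features -> hidden -> P(planet)
W1, b1 = rng.normal(0, 1, (8, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 1, (16, 1)), np.zeros(1)

def mc_dropout_predict(x, n_samples=100, p_drop=0.5):
    """Run stochastic forward passes with dropout kept on at test time;
    the spread of the outputs approximates Bayesian predictive uncertainty."""
    preds = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1 + b1, 0)            # ReLU hidden layer
        h = h * (rng.random(h.shape) > p_drop)    # random dropout mask
        logits = (h / (1 - p_drop)) @ W2 + b2     # rescale surviving units
        preds.append(1 / (1 + np.exp(-logits)))   # sigmoid -> probability
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)  # mean and uncertainty

# Predict for one hypothetical TCE feature vector
mean, std = mc_dropout_predict(rng.normal(size=(1, 8)))
```

A high predictive standard deviation flags TCEs the network is unsure about, which is exactly the information a vetting workflow wants when deciding which candidates merit follow-up.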

The full paper and JupyterQ notebooks explaining the data origins, data preprocessing, feature engineering and the ML models used can be found on the Kx Developer’s site.

Additional information about Kx at NASA FDL is below:

The Exploration of Space Weather at NASA FDL with kdb+

Case study: Kdb+ Used at NASA Frontier Development Lab in Predictive AI tool

The Exploration of Solar Storm Data Using JupyterQ

VIDEO: The Exploration of Solar Storms at NASA FDL

Esperanza and Kx gratefully acknowledge the Exoplanet team at FDL, Chedy Raissi, Jeff Smith, Megan Ansdell, Yani Ioannou, Hugh Osborn and Michele Sasdelli, for their contributions and support.
