
Machine Learning and AI: The Future Possibilities in Business

20 Jun 2018

At Kx25, the international kdb+ user conference held in New York City on May 18th, Alan Rozet, from a Kx partner, gave a presentation (available on the Kx YouTube channel) about how machine learning (ML) is being used in business today. Alan is a Principal Machine Learning Engineer at Capital One with wide-ranging experience in ML, including in the financial, health care and consumer sectors.

Brainpool is an academic collective of over 250 experienced data scientists, holding doctorates and master's degrees from top universities around the world, whose goal is to bridge the gap between academia and industry. They are partnering with Kx on a number of ML consulting projects and are involved in creating internal machine learning training material.

Businesses' need for ML talent is far outstripping the rate at which trained specialists are entering the workforce. In his talk, Alan cites figures from the McKinsey Global Institute, which estimates that there will be a 50% to 60% gap between the number of trained data scientists and the demand for them by 2018. At the same time, the artificial intelligence market is estimated to grow from $640M in 2016 to $37B in 2025, at a compound annual growth rate of 50%.
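The compound annual growth rate (CAGR) is the constant yearly rate r that satisfies start × (1 + r)^years = end. A short Python check, using the rounded market figures quoted above and taking 2016 to 2025 as nine years, is sketched below; the function name `cagr` is only illustrative. With these endpoints the implied rate comes out at roughly 57% a year, somewhat higher than the 50% quoted.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate r such that
    start_value * (1 + r) ** years == end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Rounded figures quoted in the talk: $640M (2016) growing to $37B (2025).
rate = cagr(640e6, 37e9, 2025 - 2016)
print(f"implied CAGR: {rate:.1%}")
```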

In his presentation, Alan described several real-world business uses for ML, including credit card spend forecasting, where ML was used to manage risk and prevent credit card fraud. To begin, the developers of the application asked themselves: ‘Given a customer with a historical transaction record, how does one predict their aggregate spend month to month?’

With a single customer making perhaps 50 to 100 transactions per month, and that volume aggregated across millions or tens of millions of customers and across entire organizations, this quickly became a Big Data problem. A large part of the project focused on using technologies for fast batch processing, along with other back-end resources for model training, that allowed the data scientists to train on many instances, or servers, at once. This supervised learning model predicted aggregate monthly spend per customer.

Alan also looked at the same problem from a slightly different angle: anomalous credit card spend detection. He pointed out that anomaly detection is very much a problem that requires close interplay between humans and the machine learning model. The data scientist needs to consider what kinds of signals are needed to anticipate an anomalous event and, once one has happened, who should be alerted and how that decision might change depending on the consumer segment.

For example, if an enormously popular new consumer item, like a ‘Tickle Me Elmo’ toy, comes out and suddenly huge numbers of consumers are buying it, there will be widespread new, random $100 charges. Is that truly an anomaly that should generate an alert? Compare that with a small business customer who typically spends $10K a month and suddenly has charges that jump to $100K or $1M. What should alerting look like in that case? Which department should be notified, and how should the credit card company let the customer know?
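One simple way to make the contrast in this example concrete is to score each customer's spend against that customer's own history. The sketch below uses a per-customer z-score (how many standard deviations this month's spend sits above the historical mean) with a threshold of 3. The z-score rule, the threshold and the spend histories are illustrative assumptions, not the method from the talk, and the question of who gets alerted is left out entirely.

```python
import math
from statistics import mean, stdev


def spend_zscore(history, current):
    """How many standard deviations `current` sits above the customer's
    own historical monthly spend (needs at least two months of history)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Flat history: any change is infinitely far from the baseline.
        if current == mu:
            return 0.0
        return math.inf if current > mu else -math.inf
    return (current - mu) / sigma


def is_anomalous(history, current, threshold=3.0):
    """Flag only upward spikes beyond `threshold` standard deviations."""
    return spend_zscore(history, current) > threshold


# Hypothetical monthly spend histories for two customer segments.
consumer = [820, 900, 760, 880, 840]                 # individual cardholder
small_business = [10_000, 9_500, 10_500, 10_200, 9_800]

# A consumer whose usual spend rises by one $100 toy purchase.
print(is_anomalous(consumer, 940))             # False: z is roughly 1.8
# A small business jumping from about $10K to $100K in a month.
print(is_anomalous(small_business, 100_000))   # True: z is roughly 236
```

Because the baseline is per customer, a $100 toy purchase barely moves a typical consumer's score while the tenfold jump for the small business is flagged. A real system would also need segment- or population-level baselines, so that a spike shared by many customers at once (the ‘Tickle Me Elmo’ case) can be recognized as a trend rather than a flood of individual alerts.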

In addition to showing how forecasting and anomaly detection can support proactive credit card fraud detection with ML, Alan described a number of ML use cases in other industries. For further insights into how businesses are using ML, watch Alan’s full Kx25 presentation on the Kx YouTube channel.

For more information about ML at Kx, please write to us at



Random Forests in kdb+

12 Jul 2018

The Random Forest algorithm is an ensemble method, commonly used for both classification and regression problems, that combines multiple decision trees and outputs their average prediction (or, for classification, their majority vote). Because it is a collection of decision trees (a forest), it offers the same advantages as an individual tree: it can handle a mix of continuous, discrete and categorical variables; it requires neither data normalization nor other pre-processing; it is not complicated to interpret; and it automatically performs feature selection and detects interactions between variables. In addition, random forests address some of the weaknesses of single decision trees: they reduce variance and overfitting and provide more accurate and stable predictions. This is achieved by combining two techniques: bagging (bootstrap aggregation), where each tree is trained on a bootstrap sample of the data, and random feature selection, where each split considers only a random subset of the features.
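To make the two sources of randomness concrete, here is a minimal, dependency-free Python sketch of a random forest regressor: each tree is grown on a bootstrap sample of the rows (bagging), each split considers only a random subset of the features, and the forest averages the trees' outputs. This is an illustrative assumption, not the kdb+ code from the Kx article; all function names (`fit_forest`, `predict_forest` and so on) are hypothetical, and in practice a library such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean: the split criterion for regression."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, max_depth, min_leaf, n_features, rng):
    """Grow a regression tree. A leaf is a number (mean target); an internal
    node is a tuple (feature, threshold, left_subtree, right_subtree)."""
    if max_depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (score, feature, threshold)
    # Random feature selection: only a random subset of features is tried here.
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split among the sampled features
        return mean(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_depth - 1, min_leaf, n_features, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, min_leaf=1, n_features=None, seed=0):
    """Bagging: each tree is trained on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    if n_features is None:
        n_features = max(1, len(X[0]) // 3)  # p/3 is a commonly cited choice for regression
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, min_leaf, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Regression output: the average of the individual trees' predictions."""
    return mean([predict_tree(tree, x) for tree in forest])


# Hypothetical toy data: feature 0 carries the signal, feature 1 is noise.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X, y, n_features=1)
low, high = predict_forest(forest, X[2]), predict_forest(forest, X[17])
```

Setting `n_features` to the total number of features turns this into plain bagged trees; the per-split feature sampling is what decorrelates the trees and gives random forests their extra variance reduction over bagging alone.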

Kx and NASA FDL: Space Weather, GNSS and Exoplanets

10 Jul 2018

By Robert Hill. Kx is delighted to once more be partnering with the NASA Frontier Development Laboratory (NASA FDL) team on two exciting challenges facing the space sector. This follows from last year’s successful solar activity detection work, which resulted in the ‘FlareNet’ tool (supported by Kx and Lockheed Martin) that demonstrated the potential for […]

Kx Insights: Machine learning subject matter experts in semiconductor manufacturing

9 Jul 2018

Subject matter experts are needed for ML projects because generalist data scientists cannot be expected to be fully conversant with the context, details, and specifics of problems across all industries. The challenges are often domain-specific and require considerable industry background to fully contextualize and address. For that reason, successful projects are typically those that take a team approach, bringing together the strengths of data scientists and subject matter experts. Where data scientists bring generic analytics and coding capabilities, subject matter experts provide specialized insights in three crucial areas: identifying the right problem, using the right data, and getting the right answers.