by James Corcoran
The data problem has not appeared overnight. It has been building for years, and not just in technology. When Thomas Jefferson was president, he received around 150 letters per month. One hundred years later, Theodore Roosevelt needed a dedicated staff of around 50 to handle the increased volume. By Harry Truman’s time, mail was arriving at a rate of three truckloads per day. Current volumes are on the order of 65,000 letters (yes, still letters) per week, not to mention the emails, tweets, and social media posts that go along with them. That is the data volume challenge that everyone faces.
But as well as increasing in volume, data has become more valuable, and the decisions based on it are more critical. Just think of backtesting and trading model calibration, predictive healthcare, anomaly detection, predictive maintenance and operational equipment efficiency, and pre- and post-trade analytics. And that’s before you consider what machine learning may reveal in terms of trends or patterns, which, depending on how quickly you find them, could either yield opportunity or spell disaster.
But many companies across all industry sectors are not actually gaining those insights. They are more focused on solving technical problems around their data than on uncovering valuable information with it. The main reason is that their time series database and real-time analytics software aren’t up to the demands being placed on them. Here are five must-haves for any modern real-time analytics engine:
- Optimized for time series data
Most data today is time series based, generated by processes and machines rather than humans. Any analytics database should be optimized for the specific characteristics of that data: it is append-only, fast-arriving, and time-stamped. The database should be able to quickly correlate diverse data sets (asof joins, which match each record with the most recent record from another data set at or before its timestamp) and perform in-line calculations (VWAPs and TWAPs, the volume-weighted and time-weighted average prices), as well as execute fast reads and provide efficient storage.
- Openness and Connectivity
The data landscape of most large, modern enterprises is broad. This means that any analytics engine has to interface with a wide variety of messaging protocols (e.g. Kafka, MQ, Solace) and support a range of data formats (e.g. CSV, JSON, FIX), along with IPC (interprocess communication) and REST APIs for quick, easy connectivity to multiple sources. It should also cater to reference data, like sensor or bond IDs, which adds context and meaning to streaming data sets, so that they can be combined in advanced analytics and shared as actionable insights across the enterprise.
- Real-time and Historical Data
By combining real-time data for immediacy with historical data for context, companies can make faster and better in-the-moment responses to events as they happen and eliminate the development and maintenance overhead of replicated queries and analytics on separate systems. This ability to rapidly process vast quantities of data using fewer computing resources is also well suited for machine learning initiatives, not to mention reducing TCO and helping businesses to hit sustainability targets.
- Easy Adoption
Look for analytics software built with microservices that enable developers and data scientists to quickly ingest, transform, and publish valuable insights on datasets without the need to develop complex access, tracking, and location mechanisms. Complications like data tiering, aging, archiving, and migration can take up valuable time and resources that could be better used to concentrate on extracting actionable insights. Native integration with major cloud vendors and availability as a fully managed service should also be important considerations for easy adoption.
- Proven in production
While time series databases have been around for a long time, the ever-growing volume, velocity, and variety of data, and the need to generate rapid insights and actions from it, mean that many technologies are not proven in the field. Look for software with robust use cases and clear examples of ROI.
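To make the first must-have more concrete, here is a minimal sketch in plain Python of an asof join and of VWAP and TWAP calculations. It is illustrative only: the sample trades and quotes, the tuple layout, and the function names `asof_join`, `vwap`, and `twap` are assumptions made for this example, not the interface of any particular product. A time series engine of the kind described above would be expected to offer such operations natively and far more efficiently.

```python
from bisect import bisect_right

# Hypothetical sample data, sorted by time.
# Trades are (timestamp_seconds, price, size); quotes are (timestamp_seconds, bid, ask).
trades = [(1.0, 100.0, 200), (2.5, 101.0, 100), (4.0, 99.0, 300)]
quotes = [(0.5, 99.5, 100.5), (2.0, 100.5, 101.5), (3.9, 98.5, 99.5)]


def asof_join(left, right):
    """Attach to each left row the latest right row at or before its timestamp."""
    right_times = [row[0] for row in right]
    width = len(right[0]) - 1  # number of right-hand columns, excluding the timestamp
    joined = []
    for row in left:
        # Index of the last right row whose timestamp is <= the left timestamp.
        idx = bisect_right(right_times, row[0]) - 1
        match = right[idx][1:] if idx >= 0 else (None,) * width
        joined.append(row + match)
    return joined


def vwap(rows):
    """Volume-weighted average price: sum(price * size) / sum(size)."""
    total_size = sum(size for _, _, size in rows)
    return sum(price * size for _, price, size in rows) / total_size


def twap(rows, end_time):
    """Time-weighted average price: each price is weighted by the time
    until the next trade (or until end_time for the last trade)."""
    times = [t for t, _, _ in rows] + [end_time]
    weighted = sum(
        price * (times[i + 1] - times[i]) for i, (_, price, _) in enumerate(rows)
    )
    return weighted / (end_time - times[0])


if __name__ == "__main__":
    for row in asof_join(trades, quotes):
        print(row)  # each trade with the bid and ask prevailing at that moment
    print("VWAP:", vwap(trades))
    print("TWAP:", twap(trades, end_time=5.0))
```

The binary search keeps each lookup logarithmic in the number of quotes, which is only possible because the data is time-ordered and append-only, the same characteristics the first must-have asks an engine to exploit.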
Data has evolved. It is now an asset (it has C-level owners), it automates decisions (in trading, production, and networks), it has value (you pay, handsomely, for it), it is temporal (today’s confidential information may be tomorrow’s public news), it is structured, it is unstructured, it is complicated. But, as everyone knows, there are many benefits that businesses can reap from continuous, context-rich, analytics-driven insights.
Real-time analytics using time series data can deliver better business decisions, enable enterprises to react faster to market changes, increase customer satisfaction, and improve their bottom line, provided they have the right technology in place.