Glenn Wright, Solutions Architect at KX, took part in a panel discussion on “Research in Real Time” at Global STAC Live on May 18th, along with Jeffrey Ryan, owner at QuantAtLarge, and David Cohen, Senior Principal Engineer at Intel.
As a prelude to the discussion, Glenn was asked about client needs and expectations in this area. He outlined a growing appetite for continuous, real-time analytics driven by streaming data but also incorporating historical data for context. Moreover, because clients apply these analytics in areas where milliseconds matter, they expect low-latency responses whether the workload runs on-prem or in the cloud. Those requirements were among the drivers for the recent release of KX Insights, a cloud-first solution offering REST access both to the analytics and to the object storage holding the data they are based on.
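To make that access pattern concrete, the sketch below shows what a time-ranged REST query for trade data might look like. The base URL, endpoint path, and parameter names are placeholders invented for illustration; they are not the actual KX Insights API.

```python
import requests

# Hypothetical endpoint and parameters, used only to illustrate the pattern
# of pulling time-ranged data (live plus object-storage-backed history)
# over REST. This is not the actual KX Insights API.
BASE_URL = "https://insights.example.com/api"

resp = requests.get(
    f"{BASE_URL}/query",
    params={
        "table": "trades",
        "symbol": "ABC",
        "start": "2021-05-18T09:30:00",
        "end": "2021-05-18T16:00:00",
    },
    timeout=5,  # when milliseconds matter, fail fast rather than hang
)
resp.raise_for_status()
print(resp.json())
```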
The floor then moved to Jeffrey, who asked how technology can now accommodate streaming and historical data simultaneously. David noted that the two had traditionally been treated separately and processed in different systems. What brought them together was the recognition of three classes of time: real-time, historical, and an intermediate class of “past but frequently accessed” data, which technology now supports across different storage media. The key is to maintain an index that spans those three classes so that queries can be directed to the appropriate storage locations – something, he remarked, that KX does very well.
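As a rough illustration of that idea, the sketch below routes a time-range query across three tiers by checking which tier windows the query overlaps. The tier names, boundaries, and storage locations are assumptions made for the example, not a description of how kdb+ or KX Insights implements its index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Tier:
    name: str                        # e.g. "real-time", "recent", "historical"
    location: str                    # where that slice of the data lives
    newer_than: Optional[timedelta]  # None means "everything older"

# Hypothetical boundaries: one hour in memory, thirty days on fast media,
# the rest in object storage.
TIERS = [
    Tier("real-time",  "in-memory table",       timedelta(hours=1)),
    Tier("recent",     "fast local SSD / PMem", timedelta(days=30)),
    Tier("historical", "object storage",        None),
]

def tiers_for_range(start: datetime, end: datetime, now: datetime) -> List[Tier]:
    """Return every tier whose time window overlaps the query range."""
    hit, lower = [], now
    for tier in TIERS:
        upper = lower
        lower = now - tier.newer_than if tier.newer_than else datetime.min
        if start < upper and end >= lower:  # window overlap test
            hit.append(tier)
    return hit

if __name__ == "__main__":
    now = datetime.utcnow()
    for t in tiers_for_range(now - timedelta(days=2), now, now):
        print(f"read the {t.name} slice from {t.location}")
```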
Glenn supported the point, saying the goal has been to bring historical data close to real-time data, but noted that the architectural decisions made in doing so carry their own implementation consequences. For example, choosing horizontal scaling means the feed handler layer must be distributed too, which in turn affects how results arriving from the different sources are subsequently aggregated.
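A small, hypothetical example of that aggregation concern: if each feed-handler shard keeps its own partial sums, a cross-shard VWAP has to be built by combining those sums before dividing, rather than by averaging the per-shard results. The shard data below is invented for illustration.

```python
from collections import defaultdict

# Invented partial aggregates from two horizontally scaled feed-handler
# shards: symbol -> (sum of price * size, sum of size).
shard_a = {"ABC": (1_000_000.0, 10_000), "XYZ": (250_000.0, 5_000)}
shard_b = {"ABC": (2_020_000.0, 20_000)}

def merge_vwap(*shards):
    """Combine the partial sums first, then divide; averaging per-shard
    VWAPs directly would weight the shards incorrectly."""
    totals = defaultdict(lambda: [0.0, 0])
    for shard in shards:
        for sym, (notional, size) in shard.items():
            totals[sym][0] += notional
            totals[sym][1] += size
    return {sym: notional / size for sym, (notional, size) in totals.items()}

print(merge_vwap(shard_a, shard_b))  # {'ABC': 100.666..., 'XYZ': 50.0}
```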
Another point to consider was the structure of queries and analytics and the implications of running them against real-time and/or historical data sets. Some workloads need guaranteed response times whereas others can tolerate delays – pricing for trade execution versus pricing in backtesting is a prime example. In such cases, options include isolating the delay-tolerant workload so that it does not impinge on live performance, or increasing the I/O configuration to absorb the additional overhead. Another option, of course, is to configure for burst processing in the cloud. David mentioned a further option, the approach offered by Optane, of expanding the buffer between real-time and historical data and preloading as much data as required into readily accessible memory.
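One way to picture the isolation option is a simple router that keeps latency-sensitive queries on a dedicated live process and sends delay-tolerant research queries to a separate pool (or to cloud burst capacity). The pool names and addresses below are placeholders, not a prescribed KX configuration.

```python
from enum import Enum

class Workload(Enum):
    LIVE = "live"          # latency-sensitive, e.g. pricing for trade execution
    RESEARCH = "research"  # delay-tolerant, e.g. pricing for backtesting

# Placeholder endpoints: the point is the separation of workloads,
# not the specific names or ports.
POOLS = {
    Workload.LIVE: "rdb-live:5010",              # isolated real-time process
    Workload.RESEARCH: "hdb-research-pool:5020"  # replicas or cloud burst capacity
}

def route(workload: Workload) -> str:
    """Send each query class to its own pool so research load never
    contends with live, latency-sensitive queries."""
    return POOLS[workload]

print(route(Workload.LIVE))      # rdb-live:5010
print(route(Workload.RESEARCH))  # hdb-research-pool:5020
```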
Across all of these options, advances in cache management and synchronisation ensure that no data is lost. The problem therefore becomes as much a business one as a technical one: weighing the cost of fast access against the savings of cheaper storage for older data.
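That trade-off can be framed with some back-of-the-envelope arithmetic. The per-gigabyte prices below are invented placeholders, not vendor figures; the point is only that shifting older data onto cheaper tiers changes the monthly bill, while the tier placement determines how quickly that data can be reached.

```python
# Invented placeholder prices (USD per GB-month); not vendor figures.
COST_PER_GB_MONTH = {
    "memory": 3.00,
    "local NVMe / PMem": 0.20,
    "object storage": 0.02,
}

def monthly_cost(gb_by_tier):
    return sum(gb * COST_PER_GB_MONTH[tier] for tier, gb in gb_by_tier.items())

# Two hypothetical placements of the same 52.5 TB of data.
hot_heavy = {"memory": 500, "local NVMe / PMem": 2_000, "object storage": 50_000}
cold_heavy = {"memory": 100, "local NVMe / PMem": 500, "object storage": 51_900}

print(f"hot-heavy : ${monthly_cost(hot_heavy):,.2f}/month")   # $2,900.00/month
print(f"cold-heavy: ${monthly_cost(cold_heavy):,.2f}/month")  # $1,438.00/month
```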
The discussion closed with an interesting consideration of how research in real time could translate into discretionary as opposed to formulaic trading. All parties agreed that, given the memory and storage capacity now available for generating the additional insights, the problem may be resolved within the trading domain itself, in an area in which its practitioners already excel – managing risk in terms of confidence intervals and forming execution strategies on that basis. It was a consideration that brought an engaging discussion to a fitting close.