by Andrew Wilson
In Kx v4.0 we introduced some additional interfaces as part of the Fusion initiative providing open integration to and from other technologies. Full details are available on www.code.kx.com along with implementation guides and examples. Later in this blog we will give an overview of the new functionality and what it enables but first, as Fusion is a critical component of the Kx streaming analytics platform, we would like to outline the motivation behind the initiative and why we will continue to support it.
Big Data, Big Technologies. Big Problems.
One of the first things that became apparent at the dawn of Big Data was that traditional technologies could not scale to the new volumes they faced, or at least not easily or cost-effectively. Although CPUs were becoming cheaper and faster, and vertical scalability seemed to offer a cost-effective solution, even that threshold was quickly passed, and the alternative of horizontal scaling presented enormous challenges in terms of data consistency, resilience and availability. Another realization was that the variety of data had expanded to include everything from video and voice to networks and images, on top of which the real-time needs of data were becoming more demanding. So something new was needed, and the NoSQL era was quickly upon us.
Open Source and Specialization.
Two dominant themes prevailed in the new approach: open-source and specialization. They were jointly exemplified in the multitude of Apache initiatives that formed around Hadoop and the granularity of components within it like Flume, Pig and Avro for ingesting, parsing and serializing data. Add Sqoop, Hive and Yarn, and the list was still only beginning.
Open-source was (initially) attractive to developers and businesses alike as it seemed to offer the flexibility, immediacy and reuse of already written code, and all without incurring license costs. What wasn’t there to like? Well, the absence of support and maintenance for a start, as reliance on collective collaboration for timely remedies jarred with impending penalties for failing SLAs and production outages. Add to that the ever-changing components and newly emerging alternatives that made it difficult to discern what was fledgling and what was mature.
Meanwhile, on the specialization side, the myriad components, the multiple interfaces to connect them, the many machines to run them and the diverse skills to manage them soon proved to be anything but easy and nothing but expensive. The reports of failed projects can be extreme; some cite fewer than 20 successful implementations globally, while others like Gartner, who posited a failure rate of 85%, may be more conservative but are just as frightening. And those that succeeded would have won few prizes for the simplicity and elegance of their architectures.
Achieving Reuse and Best-of-Breed Deployment
Not all was bad about open-source and specialization, however. In fact, their combination is the underpinning of how Kx has forged its position as the leading technology for streaming analytics in Big Data solutions. Open-source has delivered many robust technologies, such as Kafka for messaging, but more importantly, the philosophy that underlies it enables organizations to reuse existing technologies while adopting new ones, and to preserve their investment in the skills and systems they already have. That was the motivation behind Fusion.
Using Fusion’s interfaces to languages like Python and R, for example, enables organizations to reuse their existing libraries in areas like machine learning and data science without necessarily having to replicate them in q. Similar interfaces enable Java, C/C#, and other languages to connect to kdb+, and allow data scientists to work in Jupyter Notebooks. Further interfaces facilitate the two-way communication of data with other systems for either storage (e.g. HDF5) or processing (e.g. GPUs). In short, the openness and interconnectivity of Fusion save development time and make implementation easier.
The value of specialization, on the other hand, is being able to focus on specific needs and target domains. Kdb+, for example, was designed explicitly for time-series data and as a result it was able to make assumptions and design decisions that optimized its performance in a way that traditional or multi-purpose databases could not match. That benefit of specialization has prevailed in many other areas too with the upshot that, in some cases, particular technologies provided best-of-breed functionality that became more appropriate to adopt than to replicate. This was particularly true in areas like machine learning and NLP where comprehensive and robust functionality is available.
Kx, therefore, focuses on what it provides as best-of-breed – a low-footprint, integrated, streaming analytics platform for both processing and visualizing real-time and historical data – and enables customers to augment with their existing technologies of choice, be they open source or standard commercial offerings, that excel in their specific areas. To that end, Fusion describes Kx’s efforts to connect kdb+/q with other technologies to enable reuse and continued best-of-breed implementation. The following interfaces are included in the latest release:
New Interfaces within Fusion
Solace provides messaging and event management solutions to a range of industries, including financial services, manufacturing, utilities and telecommunications. The Solace PubSub+ Event Broker can be used to efficiently stream events and information across cloud, on-premises and IoT environments.
The Fusion Solace interface enables easy and efficient communication between kdb+ and Solace PubSub+. Kdb+ users can quickly connect to Solace PubSub+ event brokers and begin subscribing to topics, consuming events and publishing direct to Solace.
Prometheus is an open-source systems monitoring toolkit, which facilitates metric gathering, querying and alerting for a range of different technologies.
The Fusion Prometheus Exporter enables kdb+ processes to report important, user-defined metrics to Prometheus. Thus, kdb+ can be monitored in real-time, alongside other technologies as part of a wider application.
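To make the idea of a user-defined metric concrete, here is a minimal Python sketch of the plain-text exposition format that Prometheus scrapes from a monitored process. It is not the Fusion exporter's code or API; the metric name `kdb_table_rows`, its help text, the `table` label and the value are hypothetical, chosen only to show what a reported gauge might look like on the wire.

```python
from typing import Mapping


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, value: float,
                 labels: Mapping[str, str]) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    label_str = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    # Labels are optional; a bare metric name is also a valid sample.
    sample = f"{name}{{{label_str}}}" if label_str else name
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{sample} {value}\n"
    )


# Hypothetical metric: the number of rows held in a table named "trade".
metrics_page = render_gauge(
    "kdb_table_rows",
    "Rows currently held in a table (hypothetical metric).",
    1500.0,
    {"table": "trade"},
)
print(metrics_page)
```

In a real deployment the exporter would serve text of this shape over HTTP, and Prometheus would poll it on a schedule alongside the metrics of other systems.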
Message Queuing Telemetry Transport (MQTT) is a machine-to-machine/IoT connectivity protocol. It is designed to be lightweight, offering a publish/subscribe messaging transport.
The Fusion MQTT interface allows users to connect, subscribe and publish to MQTT brokers. This enables kdb+ to operate effectively within the constrained environments and low bandwidth settings common in IoT, including on the edge.
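Subscribing in MQTT is driven by topic filters, which may contain the single-level wildcard `+` and the multi-level wildcard `#`. The following Python sketch illustrates how a filter selects topics. It is an illustration of the protocol's matching rules only, not part of the Fusion interface; in practice the broker performs this matching. The function name and the example topics are hypothetical, and the special handling of topics beginning with `$` is deliberately left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter selects the given topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level and everything beneath it.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[index]:
            # A literal level must match exactly; "+" matches any one level.
            return False
    # Without a trailing "#", both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics from a sensor deployment.
print(topic_matches("sensors/+/temperature", "sensors/kitchen/temperature"))
print(topic_matches("sensors/#", "sensors/kitchen/humidity"))
print(topic_matches("sensors/+", "sensors/kitchen/humidity"))
```

The first two calls match and the third does not, because `+` covers exactly one level while `#` covers any remaining levels.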
Hierarchical Data Format 5 (HDF5) is a file format designed to store and organize large amounts of data. It is used extensively within the data-science community.
The Fusion HDF5 interface lets users efficiently convert data between HDF5 and kdb+ formats. As well as reading and writing data to HDF5 files, users can access and modify the structure and attributes of these files.
Fusion Interfaces are designed to be used and understood by non-kdb+ programmers, with comprehensive documentation and useful examples. All interfaces are open source, released under the Apache 2 license, and free for all use cases. They will be maintained and supported by Kx, on a best-efforts basis, at no cost to customers.