Visualizing really big data, or ‘where Rumsfeldism & Kx collide’

7 Dec 2014


Technologist Peter Simpson, who is responsible for visual data analytics at Datawatch, has contributed a blog post to kxcommunity.com about the role kdb+ plays in his work. Read his post here, and if you are in the NYC area, stop by the next Kx Community NYC Meetup to hear him talk this Tuesday after work.

When I first think of Kx, I think of tick data, central market data teams and trading use cases. Under the covers, the data commonly analyzed is sparse time series; pretty similar to sensor data.

My recent project has been working with electricity smart-meter data, and given the volume and the analytics I would like to perform, Kx fits perfectly. As expected I’m looking at sparse time series, but unlike market/trading data, the time granularity is in minutes rather than nanoseconds. Monitoring usage load pushes things closer to real time, but still not below second granularity. On the other hand, we monitor and analyze a much larger collection of data series: rather than a few thousand symbols, I’m looking at hundreds of thousands of individual meters.
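To make the shape of the data concrete, here is a minimal sketch in q, assuming a hypothetical readings table; the table, column names and simulated values are illustrative, not the actual schema of the project.

/ hypothetical smart-meter readings table: one row per meter per reading
readings:([] time:`timestamp$(); meterId:`symbol$(); kwh:`float$())

/ simulate a day of minute-scale readings for illustration
n:10000
`readings insert (asc .z.p-n?1D; n?`m0001`m0002`m0003; n?5f)

/ a typical first cut: average usage per meter in 15-minute buckets
select avgKwh:avg kwh by meterId, 0D00:15:00 xbar time from readings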

As with financial data, I’m not just interested in the meter data itself, but in combining it with other datasets: other time series such as weather data, and standing data such as geolocation, residence type, heating and cooling characteristics, occupancy type, etc. I then look at absolute values and, more commonly, at relative differences and trends across time.
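Continuing the hypothetical readings table above, the enrichment might look something like this; the standing-data and weather tables are made up for illustration.

/ hypothetical standing data keyed by meter: location and building attributes
meters:([meterId:`m0001`m0002`m0003]
  region:`north`south`south;
  residenceType:`apartment`detached`detached;
  heating:`electric`gas`heatpump)

/ hypothetical weather observations per region, sorted for the as-of join
weather:`region`time xasc ([] time:.z.p-100?1D; region:100?`north`south; tempC:100?30f)

/ left-join the static attributes, then as-of join the latest weather reading
enriched:readings lj meters
enriched:aj[`region`time; enriched; weather]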

So very quickly I’m back in the big data world, but with the requirement to perform both monitoring and exploratory data analysis, which is where “Rumsfeldism” and Kx need to combine.

The “known knowns” I can work with easily; it is the “known unknowns” and “unknown unknowns” that require exploratory analysis. I’m not just looking for the top and bottom performers; I’m looking for the pattern, the exceptions to the pattern, and how they relate to their peers.

I use Datawatch to visually analyze datasets from Kx because of the speed, power and scalability it offers. The volumes of data are of course gigantic, so I cannot simply pull them into memory on my laptop. And when I render the interactive dashboards to an iPad, I cannot send the data over; it would take far too long. Instead I need to keep the data in the database and pull out only the data I need for display purposes.

For the “known knowns” I can use pre-defined dashboards and paths through the displays. Behind the scenes I can use parameterized selects or parameterized pre-defined functions, or subscribe to live streaming data from kdb+tick.
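A sketch of what such a server-side parameterized function could look like; the function name and parameters are hypothetical.

/ usage for one meter over a window, conflated to a caller-supplied bucket size
usageFor:{[m;s;e;bucket] select avgKwh:avg kwh by bucket xbar time from readings where meterId=m, time within (s;e)}

/ the dashboard calls it with parameters instead of pulling raw rows
usageFor[`m0001; .z.p-2D; .z.p; 0D01:00:00]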

For the rest, it is much harder to have pre-defined paths, as when I see the data pattern visually, I will want to drill down into that area of interest.

Consequently I use Datawatch to automatically write the q queries I need, based on what I do on screen (e.g. aggregate, conflate, filter); this applies to geospatial work, traditional BI (slicing and dicing of dimensions and measures) and more heavily statistical work. This functionality came from requests from multiple customers, who all wanted to perform more exploratory analysis of their huge transactional datasets.
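As an illustration (using the hypothetical enriched table from above), a drill-down on one region over the last six hours might generate a query along these lines:

/ aggregate, conflate and filter in one generated query
select totalKwh:sum kwh, activeMeters:count distinct meterId
  by 0D00:05:00 xbar time
  from enriched
  where region=`south, time within (.z.p-0D06:00:00; .z.p)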

Kx allows me to put the “interactive” into “interactive big data analysis”. As I screen my available data universe, I expect a near-instantaneous response. I cannot wait for a Hadoop batch job to run, and I cannot use something like Cassandra, because I need to aggregate, conflate and bin on the fly. Additionally, a key component of the data is its temporal aspect; here again Kx shines, in that I can standardize, conflate and fill time series, something I would struggle with in most other “big data” solutions. And of course, I can go from streaming data, to intra-day conflated history, to long-term historical storage with every underlying record.
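A sketch of that kind of time-series standardization against the hypothetical readings table: build a regular hourly grid per meter, join the conflated readings onto it, and forward-fill the gaps.

/ bounds of the window being displayed (here, the extent of the sample data)
st:0D01:00:00 xbar min readings`time
en:0D01:00:00 xbar max readings`time

/ one row per meter per hour, whether or not a reading arrived in that hour
grid:(select distinct meterId from readings) cross ([] time:st+0D01:00:00*til 1+`long$(en-st)%0D01:00:00)

/ conflate to hourly, join onto the grid and forward-fill missing hours per meter
hourly:select last kwh by meterId, time:0D01:00:00 xbar time from readings
filled:update fills kwh by meterId from grid lj hourly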

Now, our dynamic/automatic querying of Kx is evolving based on customer demand. The latest work has suggested that as we show more statistical displays (e.g. distribution curves), we need to be better at writing frequency-distribution queries. I find it interesting that I’m rarely looking at all the underlying data; there is just far too much of it. Instead I need to view sampled, aggregated views of the data universe, and dynamically filter based on areas of interest within the visuals themselves or through separate screening criteria. I only drop down into the underlying data when I’ve visually identified an area of interest.
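Frequency distributions and samples are cheap to express in q; a sketch over the hypothetical readings table:

/ bin usage into 0.5 kWh buckets and count in the database,
/ so only the bin counts travel to the display
select freq:count i by bucket:0.5 xbar kwh from readings

/ a random sample of rows for scatter-style displays, rather than every raw record
select from readings where i in neg[1000]?count readings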

The surprising result is how different our output workbooks are from traditional business intelligence reports. The purpose isn’t to tell you what you already know (“my biggest sales region is …”). Instead it is about gaining insight from the available data landscape, and that tends to mean highlighting the unusual against prior performance.

As we move beyond capital markets use cases, a whole new world of sensor data is appearing, and we’re trying to keep up with demand. The fun part is seeing the use cases in action, knowing that the data refers to something physical. Probably the most exciting area is location analytics, especially with logistics datasets, where we combine the real-time and historical power of Kx with geo-mapping to define the exact location of a problem, how we got there, and what to expect next. I spent the last few days talking to suppliers of parking sensors about how they can optimize parking revenue for a city; previously I looked at fracking, mining and trucking, and before that ATM transactions. The combination of sensors with geolocation seems to be a common trend, whether in logistics, energy, utilities or, in the case of ATMs, finance.

The Kx Community NYC Meetup talk is on Dec. 9 at 6:30 at GRIND. Feel free to stop by.
