Data Challenges in Industrial IoT – with Hugh Hyndman

2 May 2019 | 11 minutes

A conversation with Hugh Hyndman, Director of Industrial IoT Solutions at KX

Hugh Hyndman is Director of Industrial IoT Solutions at KX. He has over 30 years’ experience in the software industry, spanning engineering, marketing, sales, and management. Hugh has spent most of his career providing high-performance software solutions, with a particular focus on verticals with low-latency and big data requirements.

Q:  You have worked in many industries. What commonalities do you see across them?

A:  Despite all the changes we see today, there is one overarching business philosophy that still applies across pretty much all industries, and that is the old adage: “Increase revenue, cut costs”. What varies, however, is where and how both can be achieved and the language used to describe them.

In the semiconductor world, for example, these goals are achieved by a combination of optimizing the very complicated process of making wafers (one that may involve more than 1,000 steps and take over 100 days to complete), cutting production costs by maintaining and improving yield, minimizing scrap, and utilizing all the available tools in the supply chain.

In other industries it may be about energy and material costs, or about services and performance. The approaches may vary, but the overarching goals are broadly the same.

Q:  So how do you help companies in these different industries achieve their goals of increasing revenue and cutting costs?

A:  Data. What I am seeing across all industries is that data – lots of it – has become the means by which those diverse approaches are identified, implemented and, ultimately, how their success is measured.

Q:  What sort of data are we talking about in the manufacturing industry for example?

A:  There are two main types. On the one hand, we have operational data, or what is sometimes referred to as OT data. Here I’m talking about tool-level data, where we are seeing machines with thousands of sensors recording everything from temperature and pressure to vibrations and emissions. Add to that what is sometimes referred to as “under the floor” data: broadly the same type of readings and data volume, but emanating from valves, pumps and similar devices at an infrastructural level. Then, on the other hand, we have mountains of data at the IT level, typically in management systems that monitor things like orders, inventory, and payments.

The problem is that these systems are completely separate. Each is a treasure trove of information on its own, but the real opportunity comes from combining them and unlocking their value. That, however, is a struggle for many organizations.

Q:  What are the challenges in combining separate siloed systems?

A:  There are a few.

One is the volumes involved. I have met clients who estimate they may be using only 10% of the data now available from some of the highly advanced tooling they have invested in. A recent Economist article on the oil and gas industry echoed that view, but pitched it even lower – it cited a study commissioned by Google Cloud that said the oil and gas industry is using only 1-5% of its data.

Another challenge is the frequency of the data. Vibration sensors may be emitting data at a rate of 40 to 80 thousand readings per second – and it’s worth noting that’s 24 hours a day, 365 days a year. That works out at roughly 2.5 terabytes of data per year from just one sensor, which may be one of hundreds of sensors in a typical machine.
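
As a rough back-of-the-envelope illustration of that volume (the 40 kHz rate comes from the interview; the 2-byte sample size is an assumption made purely for this sketch):

```python
# Estimate the raw data produced by a single vibration sensor in one year.
# The sample size (2 bytes, e.g. a 16-bit ADC reading) is an assumption.
readings_per_second = 40_000                  # 40 kHz sampling rate
bytes_per_reading = 2
seconds_per_year = 60 * 60 * 24 * 365

bytes_per_year = readings_per_second * bytes_per_reading * seconds_per_year
print(f"{bytes_per_year / 1e12:.1f} TB per sensor per year")  # ~2.5 TB
```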

Then the data has to be stored – all of it, and without loss. Why, you may wonder? Well, one reason is the traditional need to be able to go back in time for root-cause analysis of problems that occurred in the field: to investigate what may have caused them, what conditions prevailed at the time, and what changes might have contributed. The more contemporary, and more demanding, need is to accumulate that data for machine learning (ML) initiatives because, as we know, ML relies on vast quantities of data.

Q:  Is managing and analyzing sensor data at this scale posing a significant technical challenge?

A:  Yes, it is, and it cripples existing technologies that are simply not designed for the volume and velocity of data I just mentioned. Many organizations have tried various Hadoop-based implementations as an alternative, but few seem to have been successful. Even if a company does manage to capture and store this information, that’s only part of the equation. The real value is in processing it, and processing it fast enough that it can be used for real-time decision making. That’s where low-latency processing comes into play, and where KX’s pedigree in the financial services world is an advantage.

Q:  How does the financial industry pedigree translate into an industry advantage?

A:  There is something strikingly similar between tracking massive streams of stock market data and executing high-frequency trading strategies on them, and capturing masses of sensor readings from multiple sources so that corrective action can be taken if a component fails or a quality reading breaches a threshold.

I’ll give you an example. We had a client, a healthcare product provider, who wanted to implement real-time quality control in their injection molding process. The requirement was to capture the data from over a thousand sensors, run it through a machine learning model, and decide whether or not the unit just produced passed their quality specification. But it had to be done within 100 milliseconds – that’s what low latency means. It was a perfect match for our KX for Sensors solution to capture, store and instantly process all of that data, all in one place.
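
A minimal sketch of what such a pass/fail check might look like, assuming a pre-trained model with a scikit-learn-style predict method and an in-memory snapshot of the sensor readings (the function and field names are hypothetical illustrations, not the KX for Sensors API):

```python
import time

LATENCY_BUDGET_S = 0.100  # the 100 ms decision window

def quality_check(sensor_snapshot: dict, model) -> bool:
    """Score one produced unit against the quality model within the latency budget."""
    start = time.perf_counter()
    # Flatten the >1,000 sensor readings into a feature vector in a stable order.
    features = [sensor_snapshot[name] for name in sorted(sensor_snapshot)]
    passed = bool(model.predict([features])[0])  # hypothetical pre-trained classifier
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_BUDGET_S:
        raise RuntimeError(f"decision took {elapsed * 1e3:.1f} ms, over the 100 ms budget")
    return passed
```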

The next step, for those units not meeting the initial quality criteria, is adding a “feed-forward control” feature to apply corrections at a later point in the production line, so that the components can be used rather than rejected. That’s both a reduction of scrap and an improvement of yield in one corrective action.

Q:  Can KX for Sensors integrate across the organization?

A:  Yes, and that’s very important. Ease of integration is crucial, but to achieve it at the process level just mentioned, you need to achieve it first at the technical and interface levels. It has to be remembered that organizations have already made significant investments in their technology stacks, so it’s critical that any new solution fits into the existing infrastructure as something that can augment, rather than replace, existing functionality.

Sometimes we refer to this as “turbo charging your legacy system,” and it’s something I try to illustrate to potential clients by showing them a map of how a typical infrastructure (or better still, their exact infrastructure) can remain largely unchanged when adding our technology. This is done by adding KX as a data mart and analytics platform that complements rather than usurps the existing RDBMS role, but has all the advantages of scalability, processing capability and openness that enable it to integrate into the existing technical and process flow environments.

Q:  What are some examples of where KX for Sensors has been deployed?

A:  I’ll give you two.

One goes back to the semiconductor wafer production environment I mentioned earlier. The client had two identical machines, but one had a yield of 97% whereas the other yielded around 92%. If you recall, I talked earlier about the importance of improving yield. That 5% difference is significant when you are producing 50,000 wafers per month in a manufacturing facility: it translates into roughly 30,000 fewer wafers per year.
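
Spelling out that arithmetic:

```python
# Yield gap between the two nominally identical machines: 97% vs 92%.
wafers_per_month = 50_000
yield_gap = 0.05                                # 5 percentage points

lost_per_month = wafers_per_month * yield_gap   # 2,500 wafers
lost_per_year = lost_per_month * 12             # 30,000 wafers
print(f"{lost_per_year:,.0f} fewer wafers per year")
```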

The client was unable to identify why the second tool was performing so badly. Using KX for Sensors, we started capturing real-time data across the two devices and combined it with historical data they had accumulated but had been unable to process. We then ran a series of number-crunching algorithms and models over billions of data points, which enabled us to identify the problem – a mismatch in their electrical current discharge. That was a good example of where the data held the solution, but it was not something an engineer could have solved manually.
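
One simple way to frame that kind of two-machine comparison, sketched here with pandas and purely illustrative column names (the actual analysis ran on KX technology, not in Python):

```python
import pandas as pd

def rank_divergent_channels(tool_a: pd.DataFrame, tool_b: pd.DataFrame) -> pd.Series:
    """Rank sensor channels by how far the two machines' readings diverge.

    Each frame is assumed to hold one column per sensor channel,
    e.g. 'discharge_current', 'chamber_pressure', and so on.
    """
    # Difference in channel means, scaled by a pooled spread so that
    # channels measured in different units remain comparable.
    pooled_std = (tool_a.std() + tool_b.std()) / 2
    divergence = (tool_a.mean() - tool_b.mean()).abs() / pooled_std
    return divergence.sort_values(ascending=False)

# The top-ranked channel is a candidate root cause to investigate further –
# in a case like this, it would point toward the current discharge mismatch.
```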

In another example, from quite a different world, we had a utility client who had rolled out smart meters across its customer base. As a result, it was now ingesting billions of additional data readings, but it was not able to extract value from them. The incumbent system was effectively constrained to doing only what it was originally designed for – billing customers. It was not able to run models on the data to analyze consumption patterns or support end-customer queries, which was what the whole smart meter initiative was about.

To solve the problem we installed KX for Sensors alongside the existing system to provide a separate data mart that delivered, via web services, analytics and querying capabilities to over five million customers on trillions of data points spanning more than seven years of collection history. In short, the client was able to retain their incumbent systems, and thereby de-risk their project, yet still provide new services – an example of what I described earlier as “turbo charging your legacy system.”
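
As a rough sketch of the kind of web-service query that becomes possible once the meter readings sit in a separate analytics mart (the endpoint, table schema and SQLite stand-in are all hypothetical, chosen only to keep the example self-contained):

```python
from flask import Flask, jsonify, request
import sqlite3  # stand-in for the analytics data mart

app = Flask(__name__)

@app.route("/consumption/<meter_id>")
def consumption(meter_id: str):
    """Return monthly consumption totals for one meter over the requested number of years."""
    years = int(request.args.get("years", 7))
    con = sqlite3.connect("meters.db")
    rows = con.execute(
        "SELECT strftime('%Y-%m', ts) AS month, SUM(kwh) "
        "FROM readings WHERE meter_id = ? AND ts >= date('now', ?) "
        "GROUP BY month ORDER BY month",
        (meter_id, f"-{years} years"),
    ).fetchall()
    return jsonify({month: total for month, total in rows})
```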

Q:  In these examples, in addition to the functional benefits were there cost benefits?

A:  This is a question we are often asked, and interestingly the answer comes from two different angles: one relating to open source software alternatives and one to commercial software competitors.

On the open source side, especially in relation to Hadoop, there is a perception that because open source software incurs no direct license charges, the solutions are cheaper. But this ignores the other elements – software integration, maintenance, hardware and support staff – that all contribute much more to the Total Cost of Ownership (TCO) than an initial software license outlay. Crucially, our customers have found that the lower TCO of a solution like KX for Sensors does not mean compromising on performance. As an example, we have conducted tests where not only were we shown to be hundreds of times faster than a Hadoop cluster, we required one-twelfth of the infrastructure!

Similarly, when compared against commercial solutions, the same cost benefits can be realized. As an example, we worked with a client who wanted to support roughly 250 queries per second on real-time and historical data emanating from around 1,000 factory machines, at an ingestion rate of 5 trillion sensor readings per day. In comparison to the incumbent, well-known commercial solution, KX for Sensors significantly reduced the hardware footprint (from 28 servers down to two) and lowered license costs (from 480 cores down to fewer than 40). The client benefited from a 90% reduction in running costs across power, cooling, space and other overheads.
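
To put that ingestion rate in perspective, a quick calculation on the figures quoted:

```python
# 5 trillion readings per day expressed as a per-second ingestion rate.
readings_per_day = 5_000_000_000_000
seconds_per_day = 24 * 60 * 60

rate = readings_per_day / seconds_per_day
print(f"{rate / 1e6:.0f} million readings per second")  # ~58 million/s, sustained
```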

We are not being trite when we say we may be not only the world’s fastest time-series database but also the greenest!

Q: Finally, what do you see as the next great progression on how industry uses technology?  

A:  Over the past 25 years, KX has been a leader in the demanding world of low-latency algorithmic trading. Now, with the Industry 4.0 revolution, there is a “raising of the bar.” Going forward we will need to meet similar demands in processing, but on orders-of-magnitude larger volumes and velocities of sensor data than have traditionally been faced.

KX is constantly progressing its technology to meet these challenges. The five trillion (yes, 5 × 10¹²) sensor data points discussed in relation to the smart meter customer solution were handled on a single server, with a second server used for high availability. We are able to scale both horizontally and vertically, so that we can run our technology, and place this data, in the appropriate compute and data tiers, whether on the industrial device, on on-premises infrastructure, or in the cloud.

Manufacturing is on the cusp of a massive digital transformation where the enabler is technology that can support the volume and velocity of data it entails. It’s an exciting time to be involved in that transition with a technology that has such pedigree and capability.  In fact we are just starting a series of blogs covering different aspects of KX and how it can be deployed across a range of applications – so keep an eye on this section of the website!
