Yesterday was the opening day of the Structure Data Conference in San Francisco, and one of the first panelists was Neha Narkhede, co-founder of Confluent. She and the company are preaching the value of real-time data processing, and are deploying it themselves with Apache Kafka, a highly scalable messaging system.
We sat down with her after her panel to talk about what real-time data means for IoT.
First, in your mind, for IoT to get past the hype stage, what has to happen from a data perspective to make this mature?
Narkhede: I think the first thing that needs to happen, from a data point of view, is really two steps. First, how do we collect the data from all these different devices, and the second is how do you process it and analyze it. And I think right now we are still on the first step, which is how do you collect the data from all of these devices. And the technology exists, such as Kafka, to collect the data from the devices. However, all the connectors to all these different devices still need to be built out.
The community has to catch up and write all these connectors and we (then) can start getting all this data in. And the second phase for IoT will be “OK, we have all this data, now how do we build all these cool applications on top of it?” and that’s where stream processing comes in. I think it’s a fascinating area and yes, IoT is on the hype curve and it has a lot of different aspects such as security, data processing, and data movement – and we are still on the data collection problem. Let’s get this data in first.
So security is hard enough with static data. Walk us through the additional complexities of continuous streaming data.
Narkhede: You know, I think the problems with streaming data are all the same, whether it’s security or latency. In my experience stream data is very much like batch data, although the technology is a little different in how you process the data. The main concerns are all the same, whether it’s data at rest or data in motion. How do you allow users to set up all the rules correctly, how do you implement it in systems so you can lock down different streams. Those are all very well-studied areas in security. I think it just needs to mature to the point where these things get implemented and operationalized and stabilized.
And we are in the phase in streaming data where we are still stabilizing those new features that are built in – whereas data at rest has already gone through decades of research, as well as decades of stabilization. So it’s really about maturity and less about any new innovation when it comes to security.
Do you see the problems similar for B2C IoT as well as the B2B scenarios you describe?
Narkhede: I think a lot of the problems are the same, but the applications are very, very different. The ones we see are in industrial IoT because we are an enterprise company. It’s very cool, machines have sensors and you want to collect quality data in real time and that is happening with a lot of these personal devices. And car companies are collecting data from cars to ensure driver safety in real time, which otherwise would have happened in months.
And I think medical organizations are collecting patient data in real time, which I think is a really cool application of IoT. The consumer applications are somewhat cooler, if you will – Nike and a lot of the other health companies and devices that are collecting data and telling you if you are doing a good job exercising or not. I think there is a whole hype curve about how your refrigerator is going to talk to your toaster and I don’t know if that will happen, but that is what the consumer will see.
So I think the bigger impact will be in enterprises and industrial data and IoT and if we succeed there it will be an amazing thing to see.
This morning you mentioned IoT applications and you talked about the “shelf life of data.” Talk a little more about that.
Narkhede: When you look at IoT and the data coming from devices, it’s a natural stream, and it’s a new area that people understand has a continuous stream of data that devices are going to manage. So I think that data has a shelf life of value because a lot of the data – and the applications that people want to build on top of it – have to harness that data quickly. So the whole value proposition is around real time, and so we have to ingest all that data and react to it in real time. The current systems are more batch-oriented and react to events in a couple of hours. That is not going to cut it at this point. This is stream data, with low latencies, so let’s use these new platforms like Kafka and Spark and figure out how we can actually build out these applications in real time. A lot of the rest of the argument is around data at rest and (it) actually being in motion.
For example, in finance you have stock data, which is a stream. In retail you have sales and shipment data, which is a stream. Except right now, people don’t view it as a stream but it is absolutely a stream. And a lot of the data at rest will also move to data in motion as this technology matures. And that’s kind of the new trend in stream processing.
What do you think is the one application of this that no one has thought of?
Narkhede: I think better, faster machine learning. We have a lot of data, we know a lot about things, how about we assist humans in making better decisions, and making faster, higher quality decisions. My favorite is healthcare. If we had enough data processing and machine learning around what we could prevent instead of what we can fix, I think that would be a big deal. I think in the whole promise of real-time data and stream processing, if we get to that point, it will do a lot of things for all of us, as people. And that’s interesting to me.