Real-Time Data Streaming Gets Standardized

One of the advantages of open source is that it can accelerate standards adoption on a level playing field. If there is a big enough problem to solve, smart people can attract the best minds to work together, investigate and share the solution.

That said, standards bodies often become little more than a parlor game for incumbent vendors seeking to position the standard to their market advantage.

In other words, there’s lots of talk, but not much code.

In such a scenario, it’s easy to end up with implementations of a standard that each work differently because the specification is unclear or ambiguous. I recently sat down with Viktor Klang, Chief Architect at Typesafe and one of the lead organizers of reactivestreams.org, an open source attempt to standardize asynchronous stream-based processing on the Java Virtual Machine (JVM).

Klang and his group—along with developers from Twitter, Oracle, Pivotal, Red Hat, Applied Duality, Typesafe, Netflix, the spray.io team and Doug Lea—saw the future of computing was increasingly about stream-based processing for real-time, data-intensive applications, like those that stream video, handle transactions for millions of concurrent users, and a range of other scenarios with large-scale usage and low latency requirements.

The problem? Without backpressure, a step that produces data faster than the next step can consume it will eventually crash the entire system.

ReadWrite: What is driving this shift in computing to reactive streams today?

Viktor Klang: It’s not a new thing. Rather, it reached critical mass as more people started using Hadoop and other batch-based frameworks. They needed real-time online streaming. Once you need that, you don’t know up front how big your input is because it’s continuous. With batch, you know up front how big your batch is.

Once you have potentially infinite streams of data flowing through your systems, then you need a means to control the rate at which you consume that data. You need to have this back pressure in your system to make sure the producer of data doesn’t overwhelm the consumer of data. It’s a problem that becomes visible once you start going to real-time streaming from batch-based.
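To make the idea of a consumer controlling the rate concrete, here is a minimal sketch, not taken from the interview. It assumes JDK 9 or later, where the `java.util.concurrent.Flow` interfaces mirror the Reactive Streams interfaces. The class names `DemandDrivenSubscriber` and `OneAtATime` are hypothetical. The subscriber asks for one element, handles it, and only then asks for the next, so the publisher never hands over more than was requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Hypothetical example class; the names are illustrative only.
class DemandDrivenSubscriber {

    /** A subscriber that signals demand for one element at a time. */
    static final class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // ask for exactly one element to start
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);      // stand-in for real processing
            subscription.request(1); // only now ask for the next element
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes 1..count and returns what the subscriber received. */
    static List<Integer> consume(int count) {
        OneAtATime subscriber = new OneAtATime();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i); // blocks if the subscriber's buffer is full
            }
        } // close() signals onComplete once pending items are delivered
        try {
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(consume(5)); // prints [1, 2, 3, 4, 5]
    }
}
```

The important line is the `request(1)` call inside `onNext`: demand flows upstream from the consumer, which is what keeps a fast producer from overwhelming a slow one.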

Users have been asking for more “reactive” streams for a long time, for building their own network protocols or for their specific application needs. Any time you need to talk to a network device, you want to use this abstraction. Anything that has an IP address.

With reactivestreams.org, we’re trying to address a fundamental issue in a compatible, inclusive way, so that all these different things can be hooked together and work. Long-term, the plan is to build an ecosystem of implementations that can be connected to one another, and then have developers build more things on top of it. For example, connect Twitter’s streaming libraries with RxJava streaming libraries, and pipe into Reactor, Akka Streams, or other implementations on the JVM.

RW: Who are key members today?

VK: Certainly Typesafe jumped in early, since we have an open-source software platform that deals with a lot of what the industry calls “reactive application challenges.” We were thrilled to have Twitter join, the Reactor guys from Pivotal, and Erik Meijer from Applied Duality, as well as Ben Christensen and George Campbell who work at Netflix. Red Hat’s in there with Oracle, and we also have some critical individuals like Doug Lea, inventor of “java.util.concurrent,” driving all concurrency stuff in the JVM. One of the goals of the project is to create a JSR for a future Java version.

Everyone pulls their weight. It’s just really hard to get engineering time from people at this level.

RW: Standards don’t tend to be very popular with developers. How are you trying to approach this to attract more key people?

VK: You’re right, the average developer is about as interested in standards as cats are in water. Jokes aside, however, we start with open source. I think of this project as a non-standard standards thing. We are inverting the usual process. We have created a spec, a test suite that verifies the spec, and a description of why the spec is what it is and why it isn’t what it isn’t. We’re really creating solutions, picking them apart, confirming they do what they say they do, and using this process to create the best specification.

RW: It sounds like developers in this case are also addressing an ops or a dev ops problem?

VK: As a developer, you can make life really difficult for your ops guys. This is about getting it right so your ops guys don’t come over and mess you up. Previously they’d have to make sure you don’t feed the system more information than it can process, so you’re not blowing up resources, making sure the processing is always faster than the input. It’s really tricky to do that for variable loads.

RW: What are some examples that might inspire your core audience of Java developers?

VK: What’s a hard case for an enterprise Java developer? If you have a TCP connection with orders coming in and you need to perform some processing on them before passing them on to another connection, you need to make sure you aren’t pulling things off the inbound connection faster than you are able to send to the outbound connection. If you don’t, then you’ll risk blowing the JVM up with an OutOfMemoryError.
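One simple way to picture the inbound/outbound case is a bounded buffer between the two sides. The sketch below is an assumption-laden illustration, not the Reactive Streams approach itself: it uses a blocking `ArrayBlockingQueue` from the JDK, whereas Reactive Streams aims for the same effect without blocking threads. The class `BoundedRelay`, the `END` marker, and the in-memory lists standing in for TCP connections are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example class; lists stand in for the two TCP connections.
class BoundedRelay {
    // Assumed end-of-stream marker; real orders must never equal it.
    private static final String END = "<end>";

    /** Moves orders from inbound to outbound through a buffer of fixed capacity. */
    static List<String> relay(List<String> inboundOrders, int capacity) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        List<String> outbound = new ArrayList<>();

        // Reader side: pulls from the "inbound connection".
        Thread reader = new Thread(() -> {
            try {
                for (String order : inboundOrders) {
                    buffer.put(order); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Writer side: processes each order and "sends" it outbound.
        try {
            for (String order = buffer.take(); !order.equals(END); order = buffer.take()) {
                outbound.add(order.toUpperCase()); // stand-in for processing and sending
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outbound;
    }

    public static void main(String[] args) {
        System.out.println(relay(List.of("a", "b", "c"), 2)); // prints [A, B, C]
    }
}
```

Because `put` blocks when the buffer is full, the reader can never get more than `capacity` orders ahead of the writer, so memory use stays bounded no matter how fast orders arrive.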

For web developers, it could be streaming some input from a user and storing it on Amazon S3 without overloading the server, and without having to be aware of how many concurrent users you can have. That’s a challenging problem to solve now.

Image courtesy of Shutterstock
