Six years ago, Swedish programmer Jonas Bonér set about trying to crack some of the most challenging problems in distributed computing. These included scalability, so that a system as sprawling as the Internet of Things keeps working no matter how large it grows; elasticity, a way of matching computing workloads with the right hardware and software at the right time; and fault tolerance. And he wanted to make sure his system would work in a “concurrent” world in which zillions of calculations happen at once—and often interact with one another.
He may or may not have been listening to ABBA while doing so.
Bonér had built compilers, runtimes and open-source frameworks for distributed applications at vendors like BEA and Terracotta. He’d experienced the scale and resilience limitations of existing technologies—CORBA, RPC, XA, EJBs, SOA, and the various Web Services standards and abstraction techniques that Java developers have used to deal with these problems over the last 20 years.
He’d lost faith in those ways of doing things.
This time he looked outside of Java and classical enterprise computing for answers. He spent some time with concurrency-oriented programming languages like Oz and Erlang. Bonér liked how Erlang managed failure for services that simply could not go down—telecom switches handling emergency calls, for instance—and saw how principles from Erlang and Oz might also help solve concurrency and distributed-computing problems for mainstream enterprises.
In particular he saw a software concept called the actor model—which emphasizes loose coupling, dataflow concurrency and treating failure as a normal part of a software system’s life—as a bridge to the future.
After about three to four months of intense thinking and hacking, Bonér shared his vision for the Akka Actor Kernel (now simply “Akka”) on the Scala mailing list, and about a month later published Akka 0.5, the first public release, on GitHub.
Today Akka—celebrating the five-year anniversary of its first public release on July 12—is the open source middleware that major financial institutions use to handle billions of transactions, and that massively trafficked sites like Walmart and Gilt use to scale their services for peak usage.
I recently caught up with Bonér—now CTO and co-founder of Typesafe—to get his take on where Akka has seen traction, how it has evolved through the years and why its community views it as the open-source platform best poised to handle the back-end challenges of the Internet of Things, which is introducing a new order of complexity for distributed applications.
How To Manage Failure When Everything Happens At Once
ReadWrite: What is the problem set that initially leads developers to Akka?
Jonas Bonér: Akka abstracts concurrency, elasticity/scale-on-demand and resilience into a single unified programming model, by embracing share-nothing design and asynchronous message passing. This gives developers one thing to learn, use and maintain regardless of deployment model and runtime topology.
The typical problem set is that people want the ability to scale applications both up and out; i.e., to utilize both multicore and cloud computing architectures. The way I see it, these scenarios are essentially the same thing: it is all scale-out. Either you scale out on multicore, where you have multiple isolated CPUs communicating over a QPI link, or you scale out on multiple nodes, where you have multiple isolated nodes communicating over the network.
Understanding and accepting this fact by embracing share-nothing and message-driven architectures makes things so much simpler.
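A minimal sketch of what that looks like in practice, using Akka’s classic untyped Scala API (the `Counter` actor and its string messages are illustrative, not part of Akka itself): the two parties share no state and interact only through asynchronous sends.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Share-nothing: the counter's state is reachable only through messages,
// which its mailbox delivers to it one at a time.
class Counter extends Actor {
  private var count = 0
  def receive = {
    case "increment" => count += 1
    case "report"    => sender() ! count // replies are themselves messages
  }
}

object Demo extends App {
  val system  = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter], "counter")
  counter ! "increment" // asynchronous fire-and-forget: the caller never blocks
}
```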
The other main reason people turn towards Akka is that managing failure in an application is really hard. Unfortunately, to a large extent, failure management is something that historically has been either ignored or handled incorrectly.
Failing At Failure Management
The first problem is that the strong coupling between components in long, synchronous request chains raises the risk of cascading failures throughout the application. The second major problem is that the traditional way to represent failure in the programming model is through exceptions thrown in the user’s thread, which leads to defensive programming with the error handling (using try-catch) tangled with the business logic and scattered across the whole application.
Asynchronous message passing decouples components by adding an asynchronous communication boundary—allowing fine-grained, isolated error handling and recovery through compartmentalization. It also allows you to reify errors as messages sent through a dedicated error channel, managed outside of the user call chain rather than just thrown in the caller’s face.
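In Akka this reification shows up as supervision: a parent actor declares a recovery policy for its children, so the child’s business logic carries no try-catch at all. A rough sketch—the `Worker`/`Supervisor` names and the restart policy are illustrative choices, not prescribed by Akka:

```scala
import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.{Escalate, Restart}
import scala.concurrent.duration._

// Business logic only: failures propagate as thrown exceptions, no try-catch.
class Worker extends Actor {
  def receive = {
    case n: Int if n < 0 => throw new IllegalArgumentException(s"bad input: $n")
    case n: Int          => sender() ! n * 2
  }
}

// The parent owns the recovery policy, out of band from the caller's request flow.
class Supervisor extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Restart  // recover the child in isolation
      case _                           => Escalate // let this actor's own parent decide
    }

  private val worker: ActorRef = context.actorOf(Props[Worker], "worker")

  def receive = {
    case msg => worker.forward(msg) // replies go straight back to the original sender
  }
}
```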
The broad scenarios where Akka gets a lot of traction are those where there are a lot of users and unexpected peaks in visitors, environments where there are a lot of concurrently connected devices and use cases where there is just a ton of raw data or analytics that need to be crunched. Those are all domains where managing scale and failure is of critical importance, and where Akka quickly gained traction.
In The Actor’s Studio
RW: What is an “actor,” and why is the actor model that’s been around for more than 40 years seeing a renaissance?
JB: Actors are very lightweight components—you can easily run millions of live actors on commodity hardware—that help developers focus on communications and functions between services. An actor encapsulates state and behavior and communicates through its own dedicated message queue, called its “mailbox.” All communication between actors is message-driven, asynchronous and fire-and-forget.
Actors decouple the reference to the actor from the runtime actor instance by adding a level of indirection—the so-called ActorRef—through which all communication needs to take place. This enables the loose coupling that forms the basis for both location transparency—enabling true elasticity through an explicit model for distributed computing—and the failure model that I mentioned.
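To make that indirection concrete, here is a hedged sketch: the call site is identical whether the actor lives in-process or on another node, because callers only ever hold a ref or a path, never the instance. The host, port and actor names below are placeholders, and the remote lookup assumes Akka’s classic remoting is enabled in configuration.

```scala
import akka.actor.{Actor, ActorSystem, Props}

class Echo extends Actor {
  def receive = { case msg => println(s"got $msg") }
}

object LocationTransparency extends App {
  val system = ActorSystem("iot")

  // A local instance, reached through its ActorRef...
  val local = system.actorOf(Props[Echo], "echo")
  local ! "ping"

  // ...and a lookup by path, which could just as well point at a remote node;
  // the send looks exactly the same either way.
  val byPath = system.actorSelection("akka.tcp://iot@10.0.0.2:2552/user/echo")
  byPath ! "ping"
}
```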
The actor model provides a higher level of abstraction for writing concurrent and distributed systems—it frees the developer from having to deal with explicit locking and thread management, and makes it easier to write correct concurrent and parallel systems. Working with actors also gives you a very dynamic and flexible programming model that allows you to upgrade actors independently of each other and to move them around between nodes without changing the code—all driven through configuration or adaptively by the runtime behavior of the system.
As you say, actors are really nothing new. They were defined in a 1973 paper by Carl Hewitt, were discussed for inclusion in the original version of Smalltalk in 1976 and have been popularized by the Erlang programming language, which emerged in the mid-1980s. They have been used by Ericsson, for example, with great success to build highly concurrent and extremely reliable telecom systems—99.9999999% availability, equal to about 31 milliseconds of downtime per year.
The main reason that the actor model is growing in popularity is because it is a great way to implement reactive applications, making it easier to write systems [that] are highly concurrent, scalable, elastic, resilient and responsive. It was, like a lot of great technology, ahead of its time, but now the world has caught up and it can start delivering on its promises.
Scaling The Internet Of Things
RW: There is a lot of interest about Akka in the context of the Internet of Things (IoT). What’s your view of the scale challenges that are unique to IoT?
JB: The Internet of Things—with the explosion of sensors—adds a lot of challenges in dealing with all of these simultaneously connected devices producing lots of data to be retrieved, aggregated, analyzed and pushed back out to the devices while maintaining responsiveness. Challenges include managing huge bursts of traffic when receiving sensor data at peak times, processing these large amounts of data both in batch and in real time, and running massive simulations of real-world usage patterns. Some IoT deployments also require the back-end services to manage the devices, not just absorb the data sent from the devices.
The back-end systems managing all this need to be able to scale on demand and be fully resilient. This is a perfect fit for reactive architectures in general and Akka in particular.
When you are building services to be used by potentially millions of connected devices, you need a model for coping with information flow. You need abstractions for what happens when devices fail, when information is lost and when services fail. Actors have delivery guarantees and isolation properties that are perfect for the IoT world, making them a great tool for simulating millions of concurrently connected sensors producing real-time data.
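As a rough illustration of that lightness, this hypothetical sketch spawns a fleet of simulated sensor actors under one collector; the names, message types and fleet size are all illustrative. Each actor costs on the order of a few hundred bytes of heap, which is why fleets of this size fit on a single commodity machine.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.util.Random

// One lightweight actor per simulated sensor.
class Sensor extends Actor {
  def receive = {
    case "tick" => sender() ! Random.nextDouble() // emit one fake reading
  }
}

// A single parent that owns the fleet and aggregates its readings.
class Collector extends Actor {
  private val sensors =
    (1 to 100000).map(i => context.actorOf(Props[Sensor], s"sensor-$i"))

  override def preStart(): Unit = sensors.foreach(_ ! "tick")

  private var received = 0
  def receive = {
    case _: Double =>
      received += 1
      if (received == sensors.size) println(s"collected $received readings")
  }
}

object Simulation extends App {
  val system = ActorSystem("iot-sim")
  system.actorOf(Props[Collector], "collector")
}
```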
RW: Typesafe recently collaborated with a number of other vendors on the reactive streams specification, as well as introducing its own Akka Streams. What do the challenges look like for data streaming in an IoT world?
JB: If you have millions of sensors generating data and you can’t deal with the rate at which that data arrives—that’s one early problem set we’re seeing on the back end of IoT—you need a means of applying back-pressure whenever a receiver is not ready or lacks the capacity to accept more data. If you look at the end-to-end IoT system—with millions of devices, and the need to store data, cleanse it, process it and run analytics without any service interruption—the requirement for asynchronous, non-blocking, fully back-pressured streams is critical.
We see Akka Streams playing a really important role in keeping up with inbound rates and managing overflow, so that there are proper data bulkheads in IoT systems.
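A hedged sketch of how that plays out, using the Akka Streams Scala API as it later stabilized (the element count and artificial delay are illustrative): demand signals flow from the slow consumer back upstream, so the fast producer is throttled instead of flooding a buffer.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object BackPressureDemo extends App {
  implicit val system       = ActorSystem("streams")
  implicit val materializer = ActorMaterializer()

  // A fast producer standing in for a flood of inbound sensor readings.
  Source(1 to 1000000)
    .map(_ * 2) // stand-in for a cleansing/analytics stage
    .runWith(Sink.foreach { n =>
      Thread.sleep(1) // deliberately slow consumer: back-pressure propagates upstream
      println(n)
    })
}
```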