The cloud is a very different order of machine from the computer. It’s less like a telephone switchboard and more like one of the automat cafeterias that dotted New York City in the 1940s, such as Horn & Hardart: a long wall full of desirable resources, each accessible with a simple token. Now imagine that, instead of selecting already-prepared food from one of those walls, you had to schedule the ordering and the cooking as well, dish by dish.
You’d need something that the cafeteria, or rather the cloud, didn’t provide for itself. Java started the trend by incorporating a middleware layer. Python has Django, and Ruby is so thoroughly defined by Rails that HR managers think the language itself is called “Ruby on Rails.” But every middleware layer is vastly different, designed around the requirements of its language. For Scala, and for its new cloud environment of choice, Heroku, there’s Akka.
“The library for high reliability, asynchronous computation is called Akka,” says Scala’s creator, Prof. Martin Odersky, in an interview with RWW (continued from last Wednesday). “It takes a lot of its ideas from the Erlang language, but transplanted into Java and the JVM.”
Leave a message at the tone
Scala does not actually look or read like Erlang, whose distinctive syntax owes more to Prolog. But it does borrow Erlang’s concurrency model, which relies upon asynchronous messaging to let processes do what they will on their own time, and to let applications scale up when they can.
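To give a flavor of what that borrowed model looks like, here is a minimal sketch using Akka’s classic actor API; the Greeter example is ours, not Odersky’s, and the details vary between Akka versions.

import akka.actor.{Actor, ActorSystem, Props}

// A message is just an immutable value the actor knows how to handle.
case class Greet(name: String)

class Greeter extends Actor {
  def receive = {
    case Greet(name) => println(s"Hello, $name")  // processed whenever the actor gets to it
  }
}

object GreeterDemo extends App {
  val system  = ActorSystem("demo")
  val greeter = system.actorOf(Props[Greeter], "greeter")
  greeter ! Greet("world")   // '!' sends asynchronously; the sender does not wait
  Thread.sleep(500)          // crude pause so the demo prints before shutdown
  system.terminate()
}

The point is the exclamation mark: the sender fires the message and moves on, which is exactly the “on their own time” behavior borrowed from Erlang.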
“If you have a computation that is too compute-intensive for a single server or a single core, you want to make it elastic,” explains Prof. Odersky. “If your application is a Web app, then there’s a very easy way to essentially scale that. You replicate your database, and you have several servers that all work off of your connection pool. At first approximation, a Web app is just a bunch of parallel strands, where the database gets served into the browser of the user. The different strands for different users really don’t have a lot to do with each other. But not every app is a Web app; a lot of apps require much more communication between the different nodes… There are applications that are much more interconnected — let’s say, transaction processing or event-driven computing or high availability scenarios.”
A simulation is one example Odersky gives of a class of application that is the structural opposite of a Web app. Here, processes aren’t distributed through replication so much as through selective assignment across multiple nodes. Communication needs to take place between those nodes for the simulation agenda to proceed, and a control process may need to respond to node failure in such a way that the entire simulation doesn’t come down, domino-style, when a single node fails.
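Odersky didn’t walk us through code for such a simulation, but a hypothetical Akka sketch shows the shape of it: a controller actor supervises its worker nodes and restarts a failed one rather than letting the failure cascade. All of the names here are illustrative, not drawn from any real system.

import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

case class Step(n: Int)

// A worker node in the simulation; a bad step makes it throw.
class SimWorker extends Actor {
  def receive = {
    case Step(n) if n < 0 => throw new IllegalArgumentException(s"bad step $n")
    case Step(n)          => println(s"simulated step $n")
  }
}

// The control process: if one worker fails, restart just that worker
// instead of letting the failure take down the whole simulation.
class SimController extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Restart
    }

  private val workers = Vector.fill(4)(context.actorOf(Props[SimWorker]))

  def receive = {
    case step: Step => workers(math.abs(step.n) % workers.size) forward step
  }
}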
The way simulations were programmed decades ago, the sequence of events and the relative status of those events were often represented by flag variables, the kind that Scala eschews. In more formal models implemented since the 1990s (this 1997 academic research paper in PDF format shows one example), a static architecture replaces the old system of registers and flags: tasks are dispatched to discrete nodes, which report back only when there’s a result to deliver.
Models such as this are the breeding ground for Scala applications. But they also require a kind of interconnection layer through which the distribution may take place, especially if the Heroku cloud is being leveraged for hosting the worker nodes.
This is where Akka comes in. Just a few weeks ago, Scala became one of the Heroku PaaS platform’s “first-class” languages.
A fully formed Scala app (one that has shed its reliance on Java syntax) can make any number of functional statements that set distributed processes in motion. The app does not have to specify each process and dispatch each one individually. With compilers intended to produce object code for single computers, the nodes to which processes are dispatched typically reside on those same computers. A compiler can’t anticipate the varying conditions present in the cloud, so in Scala’s case the Akka middleware layer takes over.
The shift to implicitness
“There’s one other mode of thinking, which is called parallel collections. In Scala, it’s a mode of implicit parallelism,” Odersky tells us. “Essentially, you leave it to the system to just make use of multicores as they’re there. Implicit means that you have no control of how exactly your scheduling works, and also you have essentially no control of what happens in the case of a failure. It’s okay to leave it to the system as long as it doesn’t matter, but sometimes it matters and you need something more explicit, and at the same time much more flexible. So if you want to go beyond that, and say, ‘We need to deal with failure, and we need on-demand, fine-grained control,’ that’s why there’s Akka.”
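As a rough illustration of that implicit mode, Scala’s parallel collections let the same pipeline run across however many cores happen to be available (the .par method ships in the standard library through Scala 2.12; in 2.13 it moved to the separate scala-parallel-collections module):

// Implicit data parallelism: the runtime decides how to split the work across cores.
val words = Vector.fill(1000000)("scala")

val sequentialCount = words.count(_.startsWith("s"))      // runs on one core
val parallelCount   = words.par.count(_.startsWith("s"))  // .par: leave the scheduling to the system

// No scheduling knobs, no say in what happens on failure: the trade-off Odersky describes.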
When an entire multithreaded program relies upon mutable variables, or properties of objects, to report the state (status) of ongoing processes, the threads have to adopt some form of synchronization to ensure consistency: to make certain, for instance, that changes one thread makes to its view of the database take into account all pending changes from elsewhere, and don’t override changes being made by another thread. What’s more, multiple threads must often share the same resources, especially in Java, which means some threads must wait their turn for access to a shared resource. This necessitates callback functions, for instance, to signal when a resource becomes available. Imagine hiring an army to watch every parking space in a shopping mall, and waiting for the entire army to report back before deciding which open space to pull into, and you get the idea.
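To make that concrete, here is a toy sketch of shared-state concurrency in Scala: four threads funneling through one lock to update a single mutable counter. The names are ours, and the lambda-to-Runnable conversion assumes Scala 2.12 or later.

// Shared-state concurrency: all threads touch the same mutable variable,
// so every update has to pass through the same lock.
class SharedCounter {
  private var count = 0                                   // mutable state shared by all threads
  def increment(): Unit = synchronized { count += 1 }
  def value: Int = synchronized { count }
}

object SharedStateDemo extends App {
  val counter = new SharedCounter
  val threads = (1 to 4).map { _ =>
    new Thread(() => (1 to 10000).foreach(_ => counter.increment()))
  }
  threads.foreach(_.start())
  threads.foreach(_.join())     // wait for the whole "army" before reading the result
  println(counter.value)        // 40000, but only because every increment took the lock
}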
This type of task distribution has retroactively been given a name: shared-state concurrency. The opposite approach, the one Scala takes, is called shared-nothing or message-passing concurrency, and it’s becoming critical to the operation of NoSQL databases with their sharded, widely distributed partitions. This model treats every process like a vending machine, accessible only through its front panel, or endpoint. You give it arguments, you send it on its way, and it comes back with a result.
Message-passing concurrency, naturally, relies upon messaging, and that is what the Akka layer provides. With Akka comes a new layer of abstraction, letting developers focus on the individual processes themselves, however small or loosely defined, while trusting the messaging layer to do its work in the background.
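In the vending-machine spirit, here is a hedged sketch of Akka’s classic ask pattern: you send a message in, and a reply eventually comes back as a Future, with no shared state in between. The PriceDesk example is invented for illustration.

import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

case class Quote(item: String)
case class Price(item: String, cents: Int)

// The "vending machine": reachable only through messages, never through its internals.
class PriceDesk extends Actor {
  private val prices = Map("coffee" -> 150, "pie" -> 325)
  def receive = {
    case Quote(item) => sender() ! Price(item, prices.getOrElse(item, 0))
  }
}

object PriceDemo extends App {
  implicit val timeout: Timeout = Timeout(5.seconds)
  val system = ActorSystem("prices")
  val desk   = system.actorOf(Props[PriceDesk], "desk")

  val reply = desk ? Quote("pie")            // '?' sends and returns a Future
  println(Await.result(reply, 5.seconds))    // Price(pie,325)
  system.terminate()
}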
The end of von Neumann’s road
“Essentially, C succeeded brilliantly in abstracting the different hardware, and still letting you program very, very close to the metal. You’re close to the metal, but you’re not specific to the particular metal you’re working on. That’s the reason for its longevity and ubiquity. For Scala, it’s slightly different. We are on a much higher level. What we try to abstract [ourselves] from is the concept of a von Neumann machine itself, the idea that a machine is essentially something that always has a store, which has a bunch of variables that it can change, and that changing these variables means you pass single words from a process in and out. That’s the classical machine model, of which C is the paradigmatic abstraction.”
Then the Programming Methods Group professor offered us a history lesson: “A long time ago, John Backus gave a Turing Award lecture in 1978 where he said, ‘Can we overcome this von Neumann model?’ It was essentially the beginning of the wave of functional programming… [Backus’] idea was to say, if you want to overcome this model of single words, single variables in memory, then we need to have programs that manipulate large data structures as a whole, not word-by-word. Scala’s very much in that stream of thought. Its specific contribution is to show that the stream of thought is, after all, completely compatible with object-oriented programming.”
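To put the two styles side by side, here is a small, purely illustrative Scala contrast between the word-by-word updates Backus criticized and the whole-structure transformations Odersky is describing:

// Word-by-word, von Neumann style: mutate one memory cell at a time.
val readings = Array(3.1, 4.7, 2.2, 5.9)
var i = 0
while (i < readings.length) {
  readings(i) = readings(i) * 1.05
  i += 1
}

// Whole-structure, functional style: one expression over the data structure as a whole.
val calibrated = Vector(3.1, 4.7, 2.2, 5.9).map(_ * 1.05)

The second form is also the one that stays object-oriented: map is simply a method on the collection object, which is the compatibility Odersky says Scala set out to demonstrate.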