How Open Source Succeeds In The Cloud—It Trades Freedom For Simplicity

Those new to open source won’t remember just how much of the early code amounted to little more than crappy-but-free clones of popular proprietary products. Boy, how times have changed.

Open source, once a clumsy (but free!) imitator of proprietary innovation, is now taking the lead on industry innovation, with Big Data being the most obvious example. While this is a hugely positive industry shift, it also introduces complexities. Namely, with so much exceptional open source software contending to power your next Big Data project, how do you choose which to use?

Opening Up Innovation

Black Duck Software recently named its annual “Open Source Rookies of the Year,” drawing on data from thousands of projects, including project activity, pace of commits, project team attributes, and other factors. Spanning cloud and virtualization, mobile, social media and more, the winners reflect the ever-increasing scope of code that is successfully developed in the open, rather than behind closed doors.

Nowhere is this trend more evident than in Big Data.

As Cloudera co-founder Mike Olson declares, “No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form.” That’s a stunning assessment, but it’s absolutely true. Open source may have come to life as an imitator, but it’s innovating at a frenetic pace in Big Data land.

Which may be a problem.

Spoiled By Open Source Riches

Big Data projects are now being released at such a frenetic pace that developers struggle to keep up. In case you’re just getting your feet wet with Hadoop, for example, you now need to consider Spark, Samza or a variety of other oddly-named but increasingly important Big Data tools.

Importantly, these tools are largely being born within enterprises like LinkedIn that have serious Big Data needs that no commercial software can solve. Even the National Weather Service has jumped in, open sourcing the code that powers its global forecast system.

While most companies won’t need such niche code, they may want the sorts of things released by the big Web companies. Take, for instance, LinkedIn’s release of Apache Samza:

The LinkedIn-developed framework is designed to process complex real-time workloads that require special handling after ingestion. It embeds a local key-value store in every stream that makes it possible to store the kind of contextual information needed to carry out advanced operations such as merging datasets locally instead of having to query a remote system every time they’re needed.
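To make that description a little more concrete, here is a minimal sketch of the general idea in plain Python. It is not Samza's API (Samza itself is a JVM framework), and every name in it, including `LocalJoinTask`, the event shapes and the `run` helper, is hypothetical. An in-memory dictionary stands in for the embedded key-value store, and each incoming event is enriched with a local lookup rather than a query to a remote system.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shapes, invented for this illustration.
ProfileUpdate = Tuple[str, str]  # (member_id, job_title)
PageView = Tuple[str, str]       # (member_id, page)
EnrichedView = Tuple[str, str, str]  # (member_id, page, job_title)


class LocalJoinTask:
    """Toy stream task that keeps its own key-value store next to the data."""

    def __init__(self) -> None:
        # Stand-in for an embedded store; a real framework would persist this.
        self.store: Dict[str, str] = {}

    def on_profile_update(self, update: ProfileUpdate) -> None:
        # Contextual data is written into local state as it arrives.
        member_id, job_title = update
        self.store[member_id] = job_title

    def on_page_view(self, view: PageView) -> EnrichedView:
        # The join is a local lookup, not a call to a remote database.
        member_id, page = view
        job_title = self.store.get(member_id, "unknown")
        return (member_id, page, job_title)


def run(updates: Iterable[ProfileUpdate], views: Iterable[PageView]) -> List[EnrichedView]:
    """Feed all profile updates first, then enrich each page view."""
    task = LocalJoinTask()
    for update in updates:
        task.on_profile_update(update)
    return [task.on_page_view(view) for view in views]


if __name__ == "__main__":
    for row in run([("m1", "engineer")], [("m1", "/jobs"), ("m2", "/home")]):
        print(row)
```

The point of the sketch is only that the context needed to merge two streams sits beside the task processing them, so each event avoids a network round trip.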

This leads to fantastic performance. It also leads to the question: what should a developer use to tackle her organization’s data load?

On the database side, there are hundreds of options, ranging from NoSQL databases like MongoDB and Cassandra to relational mainstays like Oracle and MySQL. Should a developer choose the most popular database, picking from a list like DB-Engines’ ranking? That’s one approach, but you could easily end up with a big mismatch between the workload and the tool managing it.

If this seems like a trivial problem, it’s not. At all. I spent years working for Big Data infrastructure providers, and now work for a company trying to make sense of the deluge of open source Big Data tools. It’s hard to keep up, and very difficult to know which to use.

Closing Off Choices

One reason that Amazon Web Services (AWS) has become the go-to public cloud is that the company has managed to simultaneously offer a broad array of open source solutions to run (supported and unsupported) on its cloud, and a suite of proprietary services for everything from email to data warehousing.

Developers, anxious to “get stuff done,” can turn to AWS and know that they’ll have both a variety of options and the safety of a paved path.

Microsoft Azure has followed suit. Not content to roll out a Hadoop-based analytics service, for example, Microsoft is now close to releasing Cosmos, its parallel processing and storage service. Or take the company’s support for MongoDB, an open source document database, to appeal to those who want the popular NoSQL database. At the same time, Microsoft has rolled out its own document database as a service, for those who want a document database but may prefer Microsoft’s packaging of it.

Microsoft, in short, wants to provide choice to its customers, but curated and nicely packaged.

This looks like the future of open source infrastructure: free to download, but perhaps more useful rolled into a cloud service that removes complexity (and choice). It may not be what the open source crowd would prefer, but it may end up being the ideal way to turn open source Big Data innovation into solutions mainstream enterprises can actually use.

Photo by George Thomas
