A Twitter Storm Arrives: Storm Project Open Sourced

In August, Twitter acquired BackType, a social media analytics company. One of the things that Twitter picked up in that acquisition was Storm, the “Hadoop of realtime processing.”

At the time, Twitter said that it would open source Storm in September at the Strange Loop conference in St. Louis. Guess what? They did. As of this week, Storm is on GitHub under the Eclipse Public License (EPL).


Other than Nathan Marz’s announcement at Strange Loop and his announcement on Twitter, the official open sourcing of Storm hasn’t gotten much attention. That is a shame, because Storm looks to be a solution that many developers are going to want to check out. According to Storm’s documentation, data processing has seen a revolution in the past decade, but systems like Hadoop are not real-time, and services like Twitter need to process data in real time.

So what is Storm, anyway? Nathan Marz (formerly with BackType, and the guy maintaining Storm on GitHub) described it like so: “Storm is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that we consider it to be a fundamental new primitive for data processing.”

The use cases for Storm include stream processing (like processing tweets), continuous computation, and distributed real-time processing. In Twitter’s case, Storm is being used to do things like compute trending Twitter users and figure out the “reach” of a URL shared on Twitter, reach being the number of unique people who would see a tweet containing it. Says Marz, “To compute reach, you need to get all the people who tweeted the URL, get all the followers of all those people, unique that set of followers, and then count the number of uniques. It’s an intense computation that potentially involves thousands of database calls and tens of millions of follower records.”
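The four steps Marz lists can be sketched on a single machine in a few lines of Python. This is only an illustration of the computation, not Storm code: the two lookup tables and the helper names `get_tweeters` and `get_followers` are hypothetical stand-ins for the database calls he mentions.

```python
# Hypothetical in-memory stand-ins for the two database lookups.
TWEETERS_BY_URL = {
    "http://example.com/a": ["alice", "bob"],
}
FOLLOWERS_BY_USER = {
    "alice": ["carol", "dave", "erin"],
    "bob": ["dave", "erin", "frank"],
}


def get_tweeters(url):
    """Step 1: everyone who tweeted the URL (assumed lookup)."""
    return TWEETERS_BY_URL.get(url, [])


def get_followers(user):
    """Step 2: all followers of one user (assumed lookup)."""
    return FOLLOWERS_BY_USER.get(user, [])


def reach(url):
    """Count the unique followers of everyone who tweeted the URL."""
    unique_followers = set()
    for user in get_tweeters(url):
        # Step 3: the set drops followers shared by several tweeters.
        unique_followers.update(get_followers(user))
    # Step 4: count the uniques.
    return len(unique_followers)


print(reach("http://example.com/a"))
```

With real data, each call to `get_followers` would be a database query, which is where the thousands of calls and millions of records come from.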

That requires, well, a lot of computing, but Storm makes it much simpler. Marz says, “It can take minutes or worse to compute on a single machine. With Storm, you can do every step of the reach computation in parallel and compute reach for any URL in seconds (and less than a second for most URLs).”
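To give a rough feel for why parallelism helps, here is a small Python sketch that fans the follower lookups out across a thread pool and merges the results into one set. It is not Storm and does not use Storm's API; the data, the `get_followers` helper, and the `parallel_reach` function are all hypothetical, and a thread pool on one machine merely stands in for work that Storm would spread across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical follower data standing in for a follower database.
FOLLOWERS_BY_USER = {
    "alice": ["carol", "dave", "erin"],
    "bob": ["dave", "erin", "frank"],
}


def get_followers(user):
    """Assumed lookup of one user's followers."""
    return FOLLOWERS_BY_USER.get(user, [])


def parallel_reach(tweeters, max_workers=4):
    """Look up followers concurrently, then unique and count them."""
    unique_followers = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each lookup runs in its own worker; map yields results in order.
        for followers in pool.map(get_followers, tweeters):
            unique_followers.update(followers)
    return len(unique_followers)


print(parallel_reach(["alice", "bob"]))
```

The lookups are independent of one another, so they are the natural place to add workers; only the final merge into a single set has to see every result.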

[Embedded presentation: “Storm: distributed and fault-tolerant realtime computation,” from nathanmarz]

Storm is meant to deal with real-time computation, be scalable, guarantee no data loss, and be programming language agnostic.

Want to catch up on Storm? Check out Marz’s presentation from Strange Loop and the rationale page. We think it’s pretty interesting, and very much worth keeping an eye on.
