Last month Twitter acquired social media analytics company BackType. Much of BackType's technology (such as ElephantDB and Cascalog) is already open source, and this week Twitter announced that BackType's Storm will be open-sourced at the Strange Loop conference in September.
Storm is a Hadoop-like system, but instead of running MapReduce "jobs" that eventually end, Storm runs never-ending "topologies." It can be used for continuous computation, processing streams of data and similar workloads.
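To make the job-versus-topology distinction concrete, here is a minimal sketch in plain Python. It does not use Storm's API; the function names `batch_job` and `streaming_topology` are made up for illustration, and a word count stands in for whatever the real computation would be.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_job(records: Iterable[str]) -> Counter:
    """Job-style: consume a finite input, then return one final result."""
    counts = Counter()
    for record in records:
        counts.update(record.split())
    return counts


def streaming_topology(stream: Iterable[str]) -> Iterator[Counter]:
    """Topology-style: emit an updated result after every incoming record."""
    counts = Counter()
    for record in stream:  # with a live feed, this loop would never finish
        counts.update(record.split())
        yield Counter(counts)  # snapshot of the running totals so far
```

The batch version only has an answer once the input is exhausted, while the streaming version has a current answer after every record, which is the property that matters for an unbounded feed.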
Here's the rundown of the use-cases from the Twitter Engineering blog:
- Stream processing: Storm can be used to process a stream of new data and update databases in realtime. Unlike the standard approach of doing stream processing with a network of queues and workers, Storm is fault-tolerant and scalable.
- Continuous computation: Storm can do a continuous query and stream the results to clients in realtime. An example is streaming trending topics on Twitter into browsers. The browsers will have a realtime view on what the trending topics are as they happen.
- Distributed RPC: Storm can be used to parallelize an intense query on the fly. The idea is that your Storm topology is a distributed function that waits for invocation messages. When it receives an invocation, it computes the query and sends back the results. Examples of Distributed RPC are parallelizing search queries or doing set operations on large numbers of large sets.
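The distributed RPC idea in the last item can be sketched roughly as follows. This is an illustration in plain Python, not Storm's API: a thread pool stands in for a cluster of workers, and `SHARDS`, `search_shard` and `distributed_query` are hypothetical names with made-up sample data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each shard holds part of a document collection.
SHARDS = [
    ["storm runs topologies", "hadoop runs jobs"],
    ["twitter acquired backtype", "storm will be open source"],
]


def search_shard(shard, term):
    """Work done by one worker: scan its own shard for the term."""
    return [doc for doc in shard if term in doc.split()]


def distributed_query(term, shards=SHARDS):
    """Fan the query out to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # map() preserves shard order, so the merged result is deterministic.
        partials = list(pool.map(search_shard, shards, [term] * len(shards)))
    return [doc for partial in partials for doc in partial]
```

A caller invokes `distributed_query("storm")` like an ordinary function and gets back the merged matches; the fan-out to workers is hidden behind the call, which is the shape the blog post describes.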
Much more detail can be found in the blog post.
We're still not sure how Twitter will be using BackType's technology, but it's good to see that at least this part of it will be released. I'm always happy to see tech startups open-sourcing tools. I've made the case before that as companies come and go, open source leaves a legacy.
Twitter has explained its use of Hadoop in the past, and it does seem that Storm is well-suited for certain elements of Twitter's operation. The Storm announcement specifically mentioned streaming trending topics to the browser.