Yahoo! says it is the world's biggest Hadoop supporter. We say that's undoubtedly correct. Yahoo! supports community developer events throughout the world. In February, it supported the first Hadoop event in India. In June, it will host the Hadoop Summit.
Yahoo! is not always recognized for its cloud computing efforts, but its deep commitment to Hadoop shows how the company sees big data as a way to solve major technology problems such as spam.
Hadoop, according to Wikipedia, "is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data."
The developer conference featured discussions from the Hadoop community, including a presentation about using it to fight spam and a discussion led by a lead engineer from Facebook.
Vishwanath Ramarao is director of anti-spam engineering for Yahoo! Mail. According to the Yahoo! developer blog, Vish described the intricate cat-and-mouse games played with spammers, and how Yahoo! uses Hadoop to abstract away the complexity of large-scale data analysis and provide deep insight into spammer campaigns.
John Sichi, lead engineer for Facebook's data infrastructure team, provided an overview of Facebook's work using Hadoop to manage data that is growing 8x annually. In March 2008, traffic volume hit 200 GB per day. By the end of last year, traffic had climbed to 12 terabytes per day.
Companies like Yahoo! and Facebook use Hadoop to organize data and process it from multiple sources. For instance, Facebook might use it to organize how it deploys its ad network.
Yahoo! may be on to the most powerful use for cloud computing, or at least the most interesting. Its work with Hadoop also shows how the company is thinking about cloud computing and how that fits into its overall strategy.