Today Yahoo will release security and workflow products for Hadoop to the open-source community.
The products were built internally at Yahoo and are designed to give enterprises more incentive to use Hadoop for distributed data storage. They will be donated to the Apache Software Foundation as part of the Hadoop Summit, taking place today at Yahoo’s corporate headquarters.
Yahoo began using Hadoop in 2005, and now uses Hadoop in all aspects of its business: managing traffic, annotating content and personalizing the network for visitors to the home page. Yahoo has 170 petabytes of storage, and could grow to three times that amount in the next three months as the company introduces new products.
Since enterprises are motivated to protect their assets above all else, it is Hadoop’s apparent lack of security infrastructure that gives IT managers pause. Yahoo’s new security service virtualizes clusters and applies different levels of security based on each user’s permissions. It’s essentially a service for managing IT security.
Streamlining the process is Oozie, a server workflow engine designed to automate the complex processes required to set up and manage a Hadoop infrastructure. It lets people configure a set of stages, steps and decisions. The workflows manage the sometimes-customized methods people use when setting up Hadoop.
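To make the idea of stages, steps and decisions concrete, here is a minimal sketch of a workflow runner in Python. It is an illustration only and does not use Oozie's actual API or definition format; the `Workflow` class, the `build_example` helper and the step names (`ingest`, `check`, `aggregate`, `report_empty`) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration; none of this is Oozie's API.
Action = Callable[[dict], None]   # a step that does some work on shared state
Decision = Callable[[dict], str]  # a step that picks the name of the next step


class Workflow:
    """A tiny workflow: named action steps plus decision steps that branch."""

    def __init__(self, start: str) -> None:
        self.start = start
        self.actions: Dict[str, Tuple[Action, Optional[str]]] = {}
        self.decisions: Dict[str, Decision] = {}

    def add_action(self, name: str, run: Action,
                   next_step: Optional[str] = None) -> None:
        # next_step=None marks the end of the workflow.
        self.actions[name] = (run, next_step)

    def add_decision(self, name: str, choose: Decision) -> None:
        self.decisions[name] = choose

    def run(self, context: dict) -> List[str]:
        """Execute steps from the start and return the names visited, in order."""
        visited: List[str] = []
        step: Optional[str] = self.start
        while step is not None:
            visited.append(step)
            if step in self.decisions:
                # A decision does no work; it only selects the next step.
                step = self.decisions[step](context)
            else:
                run, next_step = self.actions[step]
                run(context)
                step = next_step
        return visited


def build_example() -> Workflow:
    """Assumed example: count input records, then branch on whether any exist."""
    wf = Workflow(start="ingest")
    wf.add_action(
        "ingest",
        lambda ctx: ctx.update(records=len(ctx.get("input", []))),
        next_step="check",
    )
    wf.add_decision(
        "check",
        lambda ctx: "aggregate" if ctx["records"] > 0 else "report_empty",
    )
    wf.add_action("aggregate", lambda ctx: ctx.update(total=ctx["records"]))
    wf.add_action("report_empty", lambda ctx: ctx.update(total=0))
    return wf


if __name__ == "__main__":
    state = {"input": ["a", "b", "c"]}
    print(build_example().run(state), state["total"])
```

In this sketch a decision step only chooses where to go next, while action steps do the work. Keeping that split is one plausible way to let a workflow absorb site-specific variations without changing the steps themselves.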
Yahoo is the leading contributor to and user of Apache Hadoop: The company currently runs the world’s largest Hadoop implementation, with 35,000 servers running in production.
The Yahoo Distribution of Hadoop with Security (beta) and Oozie, the server workflow engine for Hadoop, are available through the Yahoo! Developer Network.