Rumors have been circulating for the past few months that Yahoo would create its own Apache Hadoop commercialization company to compete with Cloudera. Today GigaOM's Derrick Harris reports that Yahoo will make an official announcement this week.

According to Harris, the spin-off will be called HortonWorks, a reference to the elephant-themed Dr. Seuss book Horton Hears a Who.

HortonWorks is expected to offer a set of high-level management tools on top of the Hadoop core, which will also be open source. Harris notes that Yahoo will try to maintain a good working relationship with Apache. However, Harris has reported that Yahoo's relationship with Apache has been strained in the past. Competing with Cloudera, which employs core Hadoop developers, could make some uncomfortable.

Yahoo has long been a major backer of Hadoop and has contributed the majority of the current code base. Hadoop creator Doug Cutting worked at Yahoo before moving on to Cloudera. Yahoo is moving into an increasingly crowded space that is also populated by companies like DataStax, EMC and IBM. In addition, LexisNexis just open-sourced a competing big data technology, HPCC. Harris writes, "HortonWorks will have to ensure it advances Hadoop development across industry lines and not just in a manner optimized for Yahoo's webscale needs if it wants to gain adoption."

It's still early in the game, but Harris also points out that Hadoop companies aren't producing much revenue yet. "Cloudera is leading the charge right now with what I've heard is a few million in annual revenue, but that's hardly enough to sustain the amount of investment in Hadoop," he writes. However, The Wall Street Journal reports that analysts believe commercial Hadoop could be a billion-dollar market.