Hadoop seems to be on everyone’s mind this year. It’s certainly a hot topic for Forrester’s James Kobielus, who has recently released several reports on Hadoop – including a best practices guide aimed at enterprises.
Hadoop is still pretty young, so Kobielus starts with the challenges facing early adopters. The big ones, according to Kobielus? An immature market, evolving core specifications, a need for custom coding and the lack of a widely adopted “stack” for Hadoop.
Does this sound at all familiar? If you followed the early days of Linux, you should be feeling a bit of déjà vu here, because this is exactly the sort of thing analysts were saying about Linux around 1999 to 2002 (give or take). The assessment isn’t wrong, per se, but it shouldn’t stand in the way of adoption.
Best Practices
That said, Kobielus does have some good advice on best practices for adopting Hadoop. For example, he says that companies need to build staff skills – even setting up a “center of excellence” that offers MapReduce training and has connections to the Hadoop community.
He also waves companies away from “science projects” with Hadoop, or adoption that lacks business value. In short – don’t jump on the Hadoop bandwagon just because it’s the Next Big Thing.
Another recommendation from Kobielus that companies should heed is to avoid “overbuilding” the Hadoop cluster if a smaller cluster does the job. And make sure that you can combine Hadoop “silos” at a later date. Says Kobielus, “most of these companies’ Hadoop clusters implement a common stack of Hadoop subprojects, from the storage layer on up. This architectural approach facilitates subsequent convergence of silos as well as easy promotion of MapReduce and other jobs between the silos. Yahoo, in particular, has architected its Hadoop clusters to minimize the interoperability glitches that might result from silos.”
And if you don’t have the in-house expertise, Kobielus guides companies towards cloud/Software-as-a-Service (SaaS) providers like Amazon and Appistry. If you want in-house control, but lack expertise, there’s also a slew of commercial options – but you’ll need to do plenty of research before settling on one.
No doubt, quite a few ReadWriteEnterprise readers have already had some hands-on experience with Hadoop in production. I’d be curious to hear what best practices you have to share.