Yelp’s mrJob: Powering Recommendations and now Open Source

Yelp has a few nifty features on its network that give it that special sauce. It’s what you see with most world-class social networks, where features provide context and allow for discovery. Features such as review highlights, autocomplete, spelling suggestions and top searches make the service simple to use.

“People Who Viewed this Also Viewed…” is one of its popular features. It shows you photos of businesses viewed by other people with similar viewing habits.

Take the King Burrito page on Yelp. It is a favorite Mexican spot in North Portland, Oregon. The food rocks. On Yelp, the sidebar shows what visitors to the King Burrito page are also viewing.

Yelp once used its own Hadoop cluster to power these types of services. But they had a few issues. Now they use what they call mrJob.

On Friday, Yelp released mrJob as open source for anyone to use.

According to the information on GitHub, mrJob “supports Amazon’s Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.”
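To give a feel for the kind of job a Hadoop cluster runs, here is a minimal sketch in plain Python of how “also viewed” counts could be expressed as a map step and a reduce step. It is an illustration only: it does not use mrJob’s API, it is not Yelp’s implementation, and the function names, page names and session data are all made up for the example.

```python
from collections import defaultdict
from itertools import permutations


def mapper(session_pages):
    """Map step: emit ((page, other_page), 1) for each ordered pair in one session."""
    # De-duplicate so a page viewed twice in a session is only counted once.
    for page, other in permutations(sorted(set(session_pages)), 2):
        yield (page, other), 1


def reducer(pair, counts):
    """Reduce step: sum the counts collected for one (page, other_page) pair."""
    yield pair, sum(counts)


def run_job(sessions):
    """Run the map, shuffle and reduce phases locally, in memory."""
    grouped = defaultdict(list)
    for session in sessions:
        for key, value in mapper(session):
            grouped[key].append(value)  # the "shuffle": group values by key

    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return results


def also_viewed(results, page, top_n=3):
    """Return the pages most often viewed in the same session as `page`."""
    related = [(other, n) for (p, other), n in results.items() if p == page]
    # Highest count first; ties are broken alphabetically for stable output.
    related.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in related[:top_n]]


# Hypothetical browsing sessions, invented purely for illustration.
SESSIONS = [
    ["king-burrito", "taqueria-a", "cafe-b"],
    ["king-burrito", "taqueria-a"],
    ["king-burrito", "cafe-b", "diner-c"],
    ["taqueria-a", "diner-c"],
]

if __name__ == "__main__":
    print(also_viewed(run_job(SESSIONS), "king-burrito"))
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the grouping in between. The local loop in run_job only stands in for that machinery.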

MrJob emerged after Yelp had issues with its Hadoop cluster. A large job would sometimes get in the way of other jobs. From the Yelp engineering blog:

“We had a dozen or so machines that we otherwise would have gotten rid of, and whenever we pushed our code to our webservers, we’d push it to the Hadoop machines.

This was kind of cool, in that our jobs could reference any other code in our code base.

It was also not so cool. You couldn’t really tell if a job was going to work at all until you pushed it to production. But the worst part was, most of the time our cluster would sit idle, and then every once in a while, a really beefy job would come along and tie up all of our nodes, and all the other jobs would have to wait.”

The Yelp team heard about EMR and decided to move its Hadoop processing to the AWS platform. It took some time to move the code base, but in May the team retired its Hadoop cluster and switched production to AWS. The framework built for that move became mrJob.

The Yelp team is encouraging developers to give mrJob a try. Details can be found on the Yelp Engineering blog.

Hadoop is built for big data, but a fixed in-house cluster offers little elasticity compared with what AWS can provide. Yelp’s move shows the value of a service like AWS when demand occasionally spikes well beyond what an enterprise data center can handle efficiently.
