Yelp’s mrJob: Powering Recommendations and now Open Source

Yelp has a few nifty features on its network that give it its special sauce. Most world-class social networks have features like these, which provide context and help users discover new things. They make the service simple to use, with features such as review highlights, autocomplete, spelling suggestions and top searches.

“People Who Viewed This Also Viewed…” is one of its popular features. It shows you businesses that other people with similar viewing habits also looked at.

Take the King Burrito page on Yelp. It is a favorite Mexican spot in North Portland, Oregon. The food rocks. On Yelp, the sidebar shows what visitors to the King Burrito page are also viewing.

Yelp once used its own Hadoop cluster to power these types of services, but the cluster had its problems. Now Yelp uses a framework it calls mrJob.

On Friday, Yelp open-sourced mrJob for anyone to use.

According to the information on GitHub, mrJob “supports Amazon’s Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.”
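Yelp doesn't describe how it computes “People Who Viewed This Also Viewed…”, but co-view counting is a natural fit for MapReduce, the model mrJob targets. The sketch below is a hypothetical illustration, not Yelp's code: it assumes made-up `(user_id, business_id)` view records and invented function names, and it simulates Hadoop's shuffle in plain Python so it runs without a cluster. With mrJob itself, the mapper and reducer functions would become methods on an `MRJob` subclass that could run on EMR or on your own Hadoop cluster.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical input: (user_id, business_id) page-view records.
VIEWS = [
    ("u1", "king-burrito"),
    ("u1", "taco-place"),
    ("u2", "king-burrito"),
    ("u2", "taco-place"),
    ("u2", "pho-shop"),
    ("u3", "king-burrito"),
    ("u3", "pho-shop"),
]

def map_views(record):
    """Step 1 mapper: key each view by user so one user's views meet in one reducer."""
    user_id, business_id = record
    yield user_id, business_id

def reduce_user(user_id, businesses):
    """Step 1 reducer: emit every ordered pair of distinct businesses this user viewed."""
    unique = sorted(set(businesses))
    for a, b in permutations(unique, 2):
        yield a, b

def reduce_pairs(business_id, co_viewed, top_n=3):
    """Step 2 reducer: count how often each other business was co-viewed, keep the top N."""
    counts = defaultdict(int)
    for other in co_viewed:
        counts[other] += 1
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    yield business_id, ranked[:top_n]

def shuffle(pairs):
    """Group (key, value) pairs by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def also_viewed(views):
    """Run both MapReduce steps in-process (step 2 uses an identity mapper)."""
    step1 = shuffle(kv for rec in views for kv in map_views(rec))
    pairs = [kv for user, biz in step1 for kv in reduce_user(user, biz)]
    step2 = shuffle(pairs)
    return dict(kv for biz, others in step2 for kv in reduce_pairs(biz, others))

if __name__ == "__main__":
    for biz, recs in also_viewed(VIEWS).items():
        print(biz, recs)
```

Splitting the work into two steps mirrors how such a job would scale: the first step only needs one user's views in memory at a time, and the second only needs the co-views for one business.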

mrJob grew out of the problems Yelp had running its own Hadoop cluster, where one large job could hold up all the others. From the Yelp engineering blog:

“We had a dozen or so machines that we otherwise would have gotten rid of, and whenever we pushed our code to our webservers, we’d push it to the Hadoop machines.

This was kind of cool, in that our jobs could reference any other code in our code base.

It was also not so cool. You couldn’t really tell if a job was going to work at all until you pushed it to production. But the worst part was, most of the time our cluster would sit idle, and then every once in a while, a really beefy job would come along and tie up all of our nodes, and all the other jobs would have to wait.”

The Yelp team heard about EMR and decided to move its Hadoop work to AWS. Moving the code base took some time, but in May the team retired its Hadoop cluster and switched its production jobs to AWS. The framework built for that move became mrJob.

The Yelp team is encouraging developers to give mrJob a try. Details can be found on the Yelp Engineering blog.

Hadoop is built for big data, but a fixed, in-house cluster offers little elasticity compared with what AWS can provide. Yelp's experience shows the value of a service like AWS when requirements go well beyond what an enterprise data center can handle efficiently.
