LinkedIn Opens DataFu: A Library for Working with Hadoop and Pig

LinkedIn makes heavy use of Apache Hadoop and Pig for its People You May Know and skills features (among others), and has accumulated a large set of User Defined Functions (UDFs) for Pig in the process.

On January 10th, LinkedIn’s Matthew Hayes announced the release of DataFu on the LinkedIn engineering blog. DataFu, a collection of UDFs that LinkedIn has developed for data mining and statistics, is available on GitHub under the Apache 2.0 license.

The DataFu library has been tested against Pig 0.9. The library provides a number of functions for running PageRank, performing operations on Pig data bags, filtering input data and more.

Hayes’ post walks through an example scenario that uses DataFu to compute quantiles from a fake data set, so interested developers can jump in and try the library out immediately. The project also includes a set of unit tests for each UDF.
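For readers who want a feel for what a quantile computation involves before trying the library, here is a minimal sketch in plain Python. It is not DataFu's API and not the code from Hayes’ post: the `quantiles` function, the rounding rule it uses, and the randomly generated data set are all assumptions made for illustration.

```python
import random


def quantiles(values, probs):
    """Return one quantile of `values` for each probability in `probs`.

    Hypothetical helper for illustration only: it sorts the data and picks
    the element whose index is closest to p * (n - 1).
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be in [0, 1]")
        # p = 0 maps to the minimum, p = 1 to the maximum.
        index = int(round(p * (n - 1)))
        result.append(ordered[index])
    return result


if __name__ == "__main__":
    # Fake data set: 1,000 draws from a normal distribution (an assumption).
    random.seed(0)
    fake_data = [random.gauss(50.0, 10.0) for _ in range(1000)]
    print(quantiles(fake_data, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

A Pig UDF would apply the same idea to a sorted bag of values inside a Hadoop job rather than to an in-memory list.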

It’s impressive to see just how much work is coming out of the Hadoop community these days. Any projects that you’re keeping an eye on?
