On January 10th, LinkedIn's Matthew Hayes announced the release of DataFu on the LinkedIn engineering blog. DataFu, available on GitHub under the Apache 2.0 license, is a collection of user-defined functions (UDFs) for Apache Pig that LinkedIn developed for data mining and statistics.
Hayes' post walks through an example scenario that uses DataFu to compute quantiles over a fake data set, so interested developers can try the library out right away. The project also ships with unit tests for each UDF.
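For readers new to the idea, here is a minimal Python sketch of what "computing quantiles" means. It is not DataFu's code (DataFu runs as Pig UDFs), and it assumes the nearest-rank definition of a quantile; DataFu's own rules may differ. The function name `quantiles` is hypothetical. The approach is to sort the data once, then pick the element at the rank that corresponds to each requested probability.

```python
import math
from typing import List, Sequence


def quantiles(values: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Hypothetical helper: nearest-rank quantiles of `values` for each p in `probs`.

    Assumption: nearest-rank definition, not necessarily DataFu's semantics.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)  # sort once, reuse for every probability
    n = len(ordered)
    result = []
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Nearest rank: smallest 1-based rank k with k / n >= p.
        # p = 0 would give k = 0, so clamp to 1 (the minimum).
        k = max(1, math.ceil(p * n))
        result.append(ordered[k - 1])
    return result
```

For example, `quantiles([5, 1, 4, 2, 3], [0.0, 0.5, 1.0])` returns the minimum, median, and maximum: `[1, 3, 5]`.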
It's impressive to see how much work is coming out of the Hadoop community these days. Are there any projects you're keeping an eye on?