Google Refine, formerly known as Freebase Gridworks, has been updated to version 2.1. Refine is an open source tool for cleaning up messy data sets before linking them into systems such as Freebase. The update includes new HTML parsing functions, the ability to import Google Fusion Tables and more.
Freebase Gridworks was one of the tools included in Google’s acquisition of Metaweb. We covered its previous major update in an earlier post.
New features include:
- HTML parsing functions (based on JSoup)
- Metaphone3 (American English) & Cologne Phonetic (German) coders & clustering
- Google Fusion Table import support
- Facet for exact duplicates
- Ability to star favorite expressions for reuse later
- Latest Apache POI library, including a number of Excel bug fixes
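The list mentions a facet for exact duplicates. Conceptually, such a facet marks each row whose value in a column appears more than once in that column, then lets you filter on true or false.

The snippet below is a minimal, hypothetical sketch of that idea in plain Python. It is not Refine's actual implementation. The function name `duplicates_facet` and the sample column are illustrative assumptions.

```python
from collections import Counter


def duplicates_facet(values):
    """Hypothetical sketch of an exact-duplicates facet (not Refine's code).

    Returns one boolean per row (True if that exact value occurs more than
    once in the column) plus the true/false counts a facet would display.
    """
    counts = Counter(values)  # exact, case-sensitive matching
    flags = [counts[v] > 1 for v in values]
    return flags, Counter(flags)


if __name__ == "__main__":
    column = ["Acme Corp", "acme corp", "Acme Corp", "Globex"]
    flags, summary = duplicates_facet(column)
    print(flags)    # [True, False, True, False]
    print(summary)  # Counter({True: 2, False: 2})
```

Because the matching is exact, "Acme Corp" and "acme corp" are not flagged as duplicates. Catching near-matches like these is the job of clustering features, such as the new phonetic coders.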
As we’ve noted before, Google Refine can be used with other Google tools to create a fairly powerful stack for working with big data.
Refine competes with other data cleaning tools such as DataWrangler.