When Google acquired Metaweb last summer it got Freebase Gridwork, a tool for cleaning up messy datasets, as part of the deal. Today Google released a new version of the tool, now called Google Refine. Like its predecessor, Google Refine is open source.
Google Refine is a tool for working with datasets, including but not limited to Freebase. According to Google, it can be used for: “cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases.”
Major new features include:
- New extension architecture.
- Generalized reconciliation framework that allows plugging in standard reconciliation services.
- Support for quality assurance (QA) on data loads into Freebase.
There are also many new commands and expressions; a complete list can be found here.
For more information on Freebase, see our overview.