You may have heard about the do-it-yourself structured data extraction and management service called Needlebase, which Google will acquire along with its parent company ITA Software if regulators approve the merger. ITA will power travel search results for Google, and I’m sure that will be great, but Needlebase is a much more interesting technology to me.
Needlebase is a point and click tool for scraping data from web pages, turning that data into a database, normalizing it, de-duplicating it, and reconciling data similarities. It requires absolutely no technical knowledge – but with a little imagination it can do incredible things. ReadWriteWeb is excited to offer 100 invites to the closed beta of Needlebase today. Read on for screenshots and info about how to get an account.
For my first Needlebase database, I looked at the community around our geolocation-specific Twitter account, @rwwgeo. I used Needlebase to extract the usernames, geographic locations and website URLs for all the people who have created a Twitter List that includes @rwwgeo. Geonerd list curators. Who are these people and where do they live?
It took me minutes to do and the possibilities are incredible. For one thing, I was disappointed to find that though 72 people have put @rwwgeo on a Twitter list, only 3 of them live in Colorado, the geotechnology capital of the US. I’ll run the same analysis on all the account’s Twitter followers before getting down on myself too much, but I suspect that means the account is getting more traction among Silicon Valley tech geeks than it is among Colorado geogeeks. That’s something I, as the manager of that account, would like to know. You can view my dataset here.
Twitter is just one type of website to scrape and turn into a database. All kinds of sites offer lots of possibilities. Location data can be grouped by metro area and entries can be merged and unmerged by drag and drop.
The service’s key differentiator is its ability to look at a big set of data and recommend which entries should be merged. If you tell it to look at a list of restaurants by name and location, it will recommend that “Bob’s Burgers, Tampa, Florida” be merged with “Bobs Burgers, Tampa, Florida” but not with “Bob’s Burgers, Portland, Oregon.” The interface displays a list of recommended merges, which can be managed in bulk.
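To make the merge-recommendation idea concrete, here is a minimal Python sketch of how such a suggestion could work. It is not Needlebase's actual method, which isn't public. The function names `normalize` and `recommend_merges`, the 0.9 similarity threshold, and the rule that locations must match exactly are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    # Lowercase and strip punctuation so "Bob's" and "Bobs" compare as equal.
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def recommend_merges(records, threshold=0.9):
    """Return index pairs of records that look like duplicates.

    Each record is a (name, city, state) tuple. Two records are suggested
    for merging when their locations match exactly and their normalized
    names are at least `threshold` similar (an assumed cutoff).
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        same_place = a[1:] == b[1:]
        similarity = SequenceMatcher(None, normalize(a[0]), normalize(b[0])).ratio()
        if same_place and similarity >= threshold:
            pairs.append((i, j))
    return pairs


restaurants = [
    ("Bob's Burgers", "Tampa", "Florida"),
    ("Bobs Burgers", "Tampa", "Florida"),
    ("Bob's Burgers", "Portland", "Oregon"),
]

print(recommend_merges(restaurants))  # [(0, 1)]
```

The two Tampa entries are paired, while the Portland entry is left alone because its location differs. A real tool would presumably also tolerate small differences in how a location is written, which this sketch does not attempt.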
It’s a really incredible tool. I want to take a week off work just to think up, scrape, sort and analyze databases from all kinds of different pages around the web. I really, really hope that Google doesn’t drown the tool if the ITA acquisition goes through.
The first 100 readers who email [email protected] with the subject line Needlebase, including their name, email and perhaps a little information about themselves, will receive invites from the company.