Data Scraping Comes of Age With ScraperWiki.com

A scrappy company that helps journalists dig into Big Data has come into its own in the past year, a rise capped by the requisite all-night hacking codeathon this week at the Investigative Reporters and Editors Computer-Assisted Reporting Conference in St. Louis. The company is called ScraperWiki.com and was started by Julian Todd and Aidan McGuire, two U.K.-based analysts who have long been involved in opening up government data to the public.

As one example of what you can do, take a look at the data that was mined from UN peacekeeping troop levels. It is really like the Wild West of data visualization. Todd says in one blog post about his own data scraping efforts, “Look, you have just got all this way starting from nothing, from finding something out in the world, to recognizing its potential, all the way to pulling in and transforming the original raw data and struggling for a way to analyze it.”

If you are interested in writing your own data scraping routines, you can watch several how-to screencasts on ScraperWiki. You can program in PHP, Python, or Ruby. Most of the time you are going to have to know some SQL to work your way around these data sets. At the St. Louis conference, work began on scraping various public data sets, such as those on US federal prisoners and FDA drug and food recalls.
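To give a feel for what such a routine involves, here is a minimal sketch in Python that pulls rows out of an HTML table, saves them to a SQLite database, and then queries them with SQL. It uses only the Python standard library rather than ScraperWiki's own tools, and everything in it is an assumption made for illustration: the sample page, the table and column names, and the country names and numbers are invented placeholders, not real peacekeeping figures.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page. The names and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Troops</th></tr>
  <tr><td>Exampleland</td><td>120</td></tr>
  <tr><td>Samplestan</td><td>45</td></tr>
  <tr><td>Testonia</td><td>300</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they end up empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape(html):
    """Return a list of (country, troops) tuples found in the HTML."""
    parser = TableScraper()
    parser.feed(html)
    return [(name, int(count)) for name, count in parser.rows]


def store(rows, conn):
    """Save the scraped rows into a SQLite table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS troops (country TEXT, troops INTEGER)")
    conn.executemany("INSERT INTO troops VALUES (?, ?)", rows)
    conn.commit()


def large_contingents(conn, minimum):
    """Use SQL to find countries with at least `minimum` troops, largest first."""
    cur = conn.execute(
        "SELECT country, troops FROM troops WHERE troops >= ? ORDER BY troops DESC",
        (minimum,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(scrape(SAMPLE_HTML), conn)
    print(large_contingents(conn, 100))
```

A real scraper would fetch the page over the network and cope with messier markup, but the shape is usually the same: parse, store, then ask questions in SQL.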

IRE.org also has a collection of databases, such as ones on environmental data and campaign spending, but these are available only to member journalists.

There are even bounties to be had (not much, a couple hundred bucks) if you write your own data scraping tool and make it available as part of the Open Corporates effort.

Clearly, as more data becomes available online, scraping apps abound. But part of the problem is that journalists don’t necessarily know SQL, let alone Ruby, or where to find these treasure troves. That is where the conference and this week’s codeathon come in handy: dozens of folks learned how to take a first stab at these visualizations as part of their reporting jobs. We’re glad to see this happening!
