Where to Find Open Data on the Web

Today, a story on Techmeme caught our eye. Entitled “We Need a Wikipedia for data,” the article, written by ex-Googler Bret Taylor, discussed how hard it is to find open data sets on the internet, even though such data could spur innovation by letting programmers build applications the likes of which have never been seen before. What was interesting about this story, besides the concept of a data wiki itself, was the insightful commentary around it, not just on the blog but all over the net, which led to the discovery of some pretty good data sources that are already available.

In his story, Bret mentioned some of the common data sources currently available, like the US Census Bureau’s map data and the Reuters corpus, but his commenters came up with a few more. (See? This is why blog comments matter.)

In addition, as CNet and Ryan Stewart’s blog spread the story, more people chimed in with suggestions. And of course, the Hacker News guys had some more ideas themselves.

So what did everyone come up with? As it turns out, a lot of data sources are already freely available on the net if you just know where to look. Here’s a summary. Do you have anything to add?

CKAN (Comprehensive Knowledge Archive Network)

The CKAN site is a registry of open knowledge packages and projects. Here, you can find open knowledge resources or register one of your own. What kind of stuff can you find at CKAN? They mention a set of Shakespeare’s works, a global population density database, the voting records of MPs, or 30 years of US patents as some examples, but they also point you to some useful URLs, like flickr’s Creative Commons page, where photos can be searched by license type.



Infochimps

This project aims to assemble and interconnect the world’s best repository of raw data, like a giant, free, open almanac. The best way to describe it comes from MetaFilter, where the project was spotted recently: “Just as Wikipedia will help you find out something about everything, infochimps.org will help you find out everything about something.” What can you find there? Every Wikipedia infobox (with each infobox type in its own table), 50 years of global hourly weather data, all the tables from the US Census Statistical Abstract, and, oh, 100,000 official crossword words too.



OpenStreetMap

Not a data set in the traditional sense, but definitely a useful tool, OpenStreetMap is a free, editable map of the world where you can view, edit, and use your own geographical data. The project was started because most maps actually have legal or technical restrictions on their use.



MusicBrainz

A user-maintained community metadatabase which collects music “metadata” like artist name, release title, list of tracks, etc. You can browse through the site, or you can use a client program, like their own taggers, to help identify music collections.



Jigsaw

Dismissed by the blogosphere as a bad idea, if not downright evil, Jigsaw, the marketplace that pays you to give up other people’s contact info, now boasts 7 million complete contacts for the taking.


DBpedia

This site is a community effort to extract structured info from Wikipedia and make that data publicly available on the web, essentially turning Wikipedia into a database you can query. Is this the beginning of the semantic web? Check out their downloads section for the datasets and then scroll to the bottom for even more links to data sources on the web.
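To make "a database you can query" a little more concrete, here is a minimal sketch of how a query against DBpedia might be put together. It assumes the project's public SPARQL endpoint at https://dbpedia.org/sparql, the `dbo:abstract` property, and a `format` request parameter; the function names are hypothetical, and the code only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint. Nothing is fetched here.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(resource: str, lang: str = "en") -> str:
    """Build a SPARQL query asking for the abstract of one DBpedia resource.

    `resource` is the last part of a DBpedia resource URI, e.g. "Berlin".
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def build_query_url(resource: str) -> str:
    """Return a GET URL that would submit the query and ask for JSON results."""
    params = {
        "query": build_abstract_query(resource),
        # Assumption: the endpoint accepts a `format` parameter like this one.
        "format": "application/sparql-results+json",
    }
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("Berlin"))
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should return the matching abstract if the endpoint behaves as assumed.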


flickr wrappr

Where DBpedia takes Wikipedia and makes it semantic, flickr wrappr extends DBpedia with RDF links to photos posted on flickr. This is pure geek hotness.


Freebase

Freebase, an open, shared database of the world’s knowledge, received a lot of mentions in the comments, so this must be a good one. Community built and maintained, it pulls from open data sources like Wikipedia, MusicBrainz, and the SEC archives to create structured information on many topics, including more popular ones like movies, music, people, and locations. The site, unlike some of the others in this list, is also easy to navigate and well-designed, which makes it that much better to use.



Perhaps one of the less interesting items due to its dry subject matter – financial data – it’s certainly worth a mention because a free database of real-time and historical market data for trading systems and platforms is the kind of thing that really floats some people’s boats.


ThingISBN

ThingISBN is LibraryThing’s first API, and even though its competitor became a paid service, ThingISBN is still free for non-commercial use. The API doesn’t just return the usual book data; it also offers “edition disambiguation,” meaning it returns a list of “related” ISBNs: other editions, other media, and translations.
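As a rough illustration of what "edition disambiguation" could look like in practice, the sketch below pulls related ISBNs out of an XML response. The URL pattern, the `<idlist>`/`<isbn>` response shape, and the ISBN values are all assumptions made for illustration (the ISBNs are placeholders, not real books), so check LibraryThing's own documentation before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Assumption: the API is addressed by ISBN with a URL of this shape.
THINGISBN_URL = "https://www.librarything.com/api/thingISBN/{isbn}"

# Hypothetical response body with placeholder ISBNs, used instead of a live call.
SAMPLE_RESPONSE = """<idlist>
  <isbn>0000000001</isbn>
  <isbn>0000000002</isbn>
  <isbn>0000000003</isbn>
</idlist>"""


def related_isbns(xml_text: str) -> list[str]:
    """Return every ISBN listed in a response, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("isbn") if el.text]


if __name__ == "__main__":
    # Show the URL that would be requested and the ISBNs parsed from the sample.
    print(THINGISBN_URL.format(isbn="0000000001"))
    for isbn in related_isbns(SAMPLE_RESPONSE):
        print(isbn)
```

Given one ISBN, an application could use a list like this to treat the hardcover, paperback, and translated editions of a book as the same work.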


Numbrary

As the name suggests, Numbrary is a library for numbers. This free service helps you find, use, and share numbers from public record data sets, like census data or the CIA World Factbook.



This site isn’t just a place to build or collect data sets, of which they have quite a nice list, but a place where you can interact with other number-lovin’ folks like yourself.


The Data Wrangling blog

This blog post lists a bunch, and I mean a bunch, of open datasets on the web, which just goes to show how cursory my own list really is.
