DBpedia is a community-driven effort that treats Wikipedia like a database, enabling people to run more sophisticated queries, distribute the open encyclopedia’s data across the Web and feed data back into Wikipedia to enrich it.
In a blog post this week, the community announced the latest version of the technology, showing again what makes the service a unique effort.
You can get into the weeds pretty quickly with DBpedia when seeking to better understand how it functions. We find it useful to think of it in terms of context. On Wikipedia, you can do keyword searches for the Rhine River in Germany. But you can’t ask it questions that require more context, such as which rivers that flow into the Rhine are longer than 50 kilometers. DBpedia is intended to create a new data layer on the Web and on Wikipedia that supplies this context by sharing more data and automatically mapping it across a network of distributed systems.
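To make the Rhine question concrete, here is a minimal sketch of how it might be phrased as a SPARQL query and sent to DBpedia from Python. Several details are assumptions rather than anything stated in the announcement: the endpoint URL, the property names `dbo:riverMouth` and `dbo:length`, and the idea that lengths are stored in meters. The helper names `build_rhine_query` and `run_query` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed location of DBpedia's public SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_rhine_query(min_length_km: float = 50) -> str:
    """Build a SPARQL query for rivers that flow into the Rhine.

    Assumptions (not confirmed by the article): rivers point to the Rhine
    via dbo:riverMouth, and dbo:length holds the length in meters.
    """
    min_length_m = int(min_length_km * 1000)
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?river ?length WHERE {{
  ?river dbo:riverMouth dbr:Rhine ;
         dbo:length ?length .
  FILTER (?length > {min_length_m})
}}
ORDER BY DESC(?length)
""".strip()


def run_query(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query over HTTP and return the list of result bindings."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results format.
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(build_rhine_query(50))` needs network access and a reachable endpoint, and what comes back depends on how the data is actually modeled. The point is that the length filter and the "flows into" relationship live in the query itself, which a keyword search cannot express.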
For example, DBpedia can return results for the rivers that flow into the Rhine.
This latest release from DBpedia is based on data extractions performed in October and November of 2010. The extracted data describes more than 3.5 million things, including:
- 364,000 people
- 420,000 places
- 99,000 music albums
- 54,000 films
- 16,500 video games
- 148,000 organizations
- 148,000 species
- 5,200 diseases
In total, 672 million pieces of information were extracted: 286 million from the English edition of Wikipedia and 386 million from other language editions and links to external data sets.
To understand the scope, it’s important to recognize the value of a database, which works best when the data in it is continually enriched. On the Web, there is a growing need for distributed data systems that can connect pieces of information held in different places. What we are seeing is the emergence of a structure, an ontology so to speak, that shows the connections between pieces of data and the paths to making that data more valuable and insightful.
DBpedia is a fascinating project, showing again how important it is for communities to develop distributed systems that enhance the Web and in turn enrich Wikipedia.