Creativity, they say, always builds on the past. So too do many wonderful things on the internet. What could be better to build the future on top of than our collective knowledge of the world as represented by Wikipedia? One startup technology company, recommendation service Hunch, announced today that it is dropping its use of Wikipedia data. Its stated reasons explain well why Wikipedia's incredible platform potential is not likely to be realized anytime soon.
Wikipedia will celebrate its 10th birthday tomorrow and while it's changed the world in incredible ways, it enters its second decade in the same position it began: as a destination website. In an era of web services and applications, Wikipedia could be so much more. Wikipedia as an organization would like it to be more. Unfortunately, it's not well positioned to realize its full potential yet - and the world isn't ready for it, either.
What it Could Be
Wikipedia could be a provider of canonical descriptions of all kinds of things, a platform that applications of all sorts could use to populate themselves with rich information on almost any topic, much as place databases are used by so many location-based technologies today. If you want to build an app that talks about different places around the world, you don't have to create a brand new map of places from scratch. That problem has been solved.
If Wikipedia entries could be used like this, software companies could focus on building fabulous experiences for their users, without having to worry about re-creating descriptions of all the topics they are discussing.
"There is a fair amount of structured data that is reasonably machine readable," Wikipedia co-founder Jimmy Wales told ReadWriteWeb this week in a press call. "I'm definitely supportive in general of initiatives to take information from Wikipedia and do interesting things with it."
Steven Walling, a former ReadWriteWeb writer now with a fellowship at the Wikimedia Foundation, considers Wikipedia as a platform "an open question" - which must be a wiki nerd's most enthusiastic endorsement. "Our core mission is broad," he told us. "It's not just to create a web based encyclopedia, it's explicitly designed for reuse. Right now, Wikipedia is a destination site, but it's an open question."
It seems that Wikipedians want the site to be used as a service, though it's not clear they really want it enthusiastically.
Right now, Wikipedia isn't being used as a data platform very much - and it appears unlikely that it will be. It's a real loss for everyone.
The Problem With Wikipedia
The Hunch blog this morning announced that the company would no longer use Wikipedia entries to populate the descriptions of the things, like foods, vacations, articles of clothing etc., that it recommended to its users.
The reasons will probably feel familiar to many people:
- Many of the descriptions from Wikipedia were either overly general, overly technical, or very out of context for the specific result on Hunch.
- Search engines tend to penalize sites that have a significant amount of what they consider "non-original" content. Thus, by having many results with Wikipedia-like descriptions, we may have been limiting the visibility of those results in search engines and thus reducing the number of people who could use a search engine to find relevant information on Hunch.
Right: Hunch, now describing something without using any of Wikipedia's words.
It's possible that the company has other concerns it's not discussing, regarding its commercial dealings with third parties, but both of the reasons above seem quite valid.
It's a real shame. As Creative Commons licensed content, the Wikipedia corpus seems like something that ought to commoditize high-quality knowledge, allowing software developers to build even more value on top of it. The Hunch team specializes in machine learning and recommendation technology. Why should it have to also recreate descriptions of the whole world it's drawing recommendations from? In theory, that's a solved problem - Wikipedia solved it.
Unfortunately, Wikipedia articles aren't written with reuse in mind. They are often far too technical and, for example, often don't put the most accessible content at the top.
The search engine problem is a whole other story. The duplicate content penalty clearly hasn't succeeded in weeding spam or low-quality content out of search results. It seems like a hold-over from the days when good actors online were creating all their own original content by hand and bad actors were the only ones who used machines to work with data in bulk.
What the duplicate content penalty has done is punish innovative startups like Hunch, who have a lot to add in terms of technology but who would do best to rely on someone else's text data and descriptions of things.
"That's a shame," Walling says, "that someone feels they are being punished for reuse, but we can't change the way search engines work."
"One of the biggest reuse projects right now is Facebook community pages," Walling told us. "They reuse a lot of Wikipedia content. Our business development people worked with them to set that up, but their content isn't indexed by search engines, so that isn't an issue for Facebook."
That is an issue for smaller companies, though. This isn't just a loss for Hunch; it's a loss for a whole ecosystem of startups that could serve their users with innovative new technology built on top of a commoditized standard of collaboratively edited descriptions of the world. That means it's a loss for us as users, too.