It's been more than ten years since Tim Berners-Lee first spoke about the semantic web and computers indexing all web-based data. He said, "The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize." Since then a handful of companies have attempted to tackle the issue of machine-based indexing and language interpretation. None of them are perfect. Below are 6 unique approaches to semantic data collection.
1. PowersetThis site was one of the first to publicly apply machine-based natural language processing to a consumer search engine. Nevertheless, because public expectations were so high, when Powerset launched a Wikipedia-only beta, reviewers were harsh. The site was acquired by Microsoft shortly after the initial launch and the team has been low key ever since. While Powerset is one of the definitive semantic engines in existence, Microsoft is currently concentrating on using Powerset's technology to index Wikipedia pages in Bing. Powerset's search result pages actually contain a "Try this on Bing Reference" note in the sidebar of the site.
2. CuilThis team touted its language processing product as being much faster to index pages than Google; however, consumers rarely covet speed over quality and the site was criticized right from the start. Expectations were not met as Cuil's claim to 120 billion pages indexed did not match up to the results on Google's reported 1 trillion unique URLs. However, what Cuil did right was separate related search results from regular web results. That being said, without any human intervention, the related results are often bizarre and irrelevant. For instance, my name produces the rankings of Ultimate Fighting Challenge Champions.
3. HakiaThis is a natural language search engine where sponsored results, regular web results and "credible" web results are broken down visually into separate categories. Similar to Wikipedia, Hakia employs a community monitoring system for credibility and "credible" results must be peer reviewed and seemingly free of corporate interest. One of the great features of Hakia is that users can tab over the site to show only images or news.
4. WorioWorio is considered a "discovery engine" as it is not technically a search engine destination site. While users are still required to visit the Worio destination, search is actually powered by Yahoo, Google or Windows Live search. Regular web results appear in the larger left-side column and natural language-based "discoveries" appear on the right. These discoveries are further refined by personal bookmarks and shared relevancy with Facebook friends.
6. SemantiFrom a consumer standpoint, Semanti sits somewhere on the spectrum between Worio and Ubiquity. ReadWriteWeb reviewed the product earlier this week and like Ubiquity it is a Firefox plug-in rather than a destination site. However, like Worio, it employs leading search engines, bookmarking and Facebook friends to produce results. Semanti's key difference is that it prompts users to choose from multiple definitions prior to completing the search. Decision-making is actually human-powered rather than machine-powered. CEO, Bruce Johnson, said, "I tried machine-based semantic tagging, but my priority has always been a faster search experience." While this is not the "use of intelligent agents" that Berners-Lee suggested, it is a "semantic" tool in that it helps the user distill meaning and relevancy from language.
If you've got more examples of semantic data collection tools, list them in the comments below.