Search and Rescue: 6 Approaches to Semantic Data Collection

It’s been more than ten years since Tim Berners-Lee first spoke about the semantic web and computers indexing all web-based data. He said, “The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.” Since then a handful of companies have attempted to tackle the issue of machine-based indexing and language interpretation. None of them are perfect. Below are 6 unique approaches to semantic data collection.

1. Powerset

This site was one of the first to publicly apply machine-based natural language processing to a consumer search engine. Nevertheless, because public expectations were so high, when Powerset launched a Wikipedia-only beta, reviewers were harsh.

The site was acquired by Microsoft shortly after the initial launch and the team has been low key ever since. While Powerset is one of the definitive semantic engines in existence, Microsoft is currently concentrating on using Powerset’s technology to index Wikipedia pages in Bing. Powerset’s search result pages actually contain a “Try this on Bing Reference” note in the sidebar of the site.

2. Cuil

This team touted its language processing product as being much faster to index pages than Google; however, consumers rarely covet speed over quality and the site was criticized right from the start. Expectations were not met, as Cuil’s claim of 120 billion pages indexed did not match up to Google’s reported 1 trillion unique URLs.

However, what Cuil did right was separate related search results from regular web results. That being said, without any human intervention, the related results are often bizarre and irrelevant. For instance, my name produces the rankings of Ultimate Fighting Championship champions.

3. Hakia

This is a natural language search engine where sponsored results, regular web results and “credible” web results are broken down visually into separate categories. Similar to Wikipedia, Hakia employs a community monitoring system for credibility, and “credible” results must be peer reviewed and seemingly free of corporate interest. One of the great features of Hakia is that users can tab over the site to show only images or news.

4. Worio

Worio is considered a “discovery engine” as it is not technically a search engine destination site. While users are still required to visit the Worio destination, search is actually powered by Yahoo, Google or Windows Live search. Regular web results appear in the larger left-side column and natural language-based “discoveries” appear on the right. These discoveries are further refined by personal bookmarks and shared relevancy with Facebook friends.
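
To make the idea of refining discoveries with bookmarks and friends a little more concrete, here is a minimal sketch in Python. It is not Worio's actual method, which this article does not describe. The function name `rank_discoveries`, the `friend_weight` parameter, the scoring rule and the example.com URLs are all assumptions made up for illustration.

```python
def rank_discoveries(candidates, my_bookmarks, friend_bookmarks, friend_weight=0.5):
    """Order candidate URLs by a simple, hypothetical personal-relevance score.

    candidates       -- list of URLs returned by some underlying search engine
    my_bookmarks     -- set of URLs the user has bookmarked
    friend_bookmarks -- dict mapping a friend's name to that friend's set of URLs
    friend_weight    -- assumed weight given to each friend who saved the URL
    """
    def score(url):
        # One full point if the user bookmarked the page themselves.
        own = 1.0 if url in my_bookmarks else 0.0
        # A partial point for every friend who also bookmarked it.
        friends = sum(1 for urls in friend_bookmarks.values() if url in urls)
        return own + friend_weight * friends

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    results = ["example.com/a", "example.com/b", "example.com/c"]
    mine = {"example.com/c"}
    friends = {"ann": {"example.com/b"}, "bob": {"example.com/b", "example.com/c"}}
    print(rank_discoveries(results, mine, friends))
```

In this toy version, a page the user bookmarked outranks one that two friends saved, and pages nobody saved stay in whatever order the search engine returned them.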

5. Ubiquity

Ubiquity is perhaps the opposite of a semantic web engine, but it serves a similar function for those looking to aggregate useful data. The Firefox plugin allows users to create command lines that incorporate natural language search with a series of mashups. Users can then combine relevant data from Craigslist, translation tools, maps, reviews and social networks for easy user visualization. While the end product is an extremely useful document, users may not be ready for the drastic behavioral change of using command lines for semantic data collection.

6. Semanti

From a consumer standpoint, Semanti sits somewhere on the spectrum between Worio and Ubiquity. ReadWriteWeb reviewed the product earlier this week and, like Ubiquity, it is a Firefox plug-in rather than a destination site. However, like Worio, it employs leading search engines, bookmarking and Facebook friends to produce results. Semanti’s key difference is that it prompts users to choose from multiple definitions prior to completing the search. Decision-making is actually human-powered rather than machine-powered. CEO Bruce Johnson said, “I tried machine-based semantic tagging, but my priority has always been a faster search experience.” While this is not the “use of intelligent agents” that Berners-Lee suggested, it is a “semantic” tool in that it helps the user distill meaning and relevancy from language.
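
The human-powered step described above, where the user picks a definition before the search runs, can be sketched roughly as follows. This is an illustration only and not Semanti's implementation. The `SENSES` table, the function names and the keywords are hypothetical, and a real tool would draw on a far larger source of definitions.

```python
# Hypothetical sense inventory: each ambiguous term maps to its possible
# meanings, and each meaning maps to keywords that narrow the query.
SENSES = {
    "jaguar": {
        "animal": ["big cat", "wildlife"],
        "car": ["automobile", "dealer"],
    },
}


def list_senses(term):
    """Return the candidate meanings to show the user, in a stable order."""
    return sorted(SENSES.get(term.lower(), {}))


def refine_query(term, chosen_sense):
    """Build a narrower query from the meaning the user picked."""
    keywords = SENSES.get(term.lower(), {}).get(chosen_sense)
    if keywords is None:
        # Unknown term or meaning: fall back to the raw query.
        return term
    return " ".join([term] + keywords)


if __name__ == "__main__":
    print(list_senses("jaguar"))
    print(refine_query("jaguar", "car"))
```

The point of the sketch is that no language processing happens on the machine side: the person supplies the meaning, and the software only turns that choice into a more specific query for the underlying search engine.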

If you’ve got more examples of semantic data collection tools, list them in the comments below.
