Search and Rescue: 6 Approaches to Semantic Data Collection

It’s been more than ten years since Tim Berners-Lee first spoke about the semantic web and computers indexing all web-based data. He said, “The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.” Since then a handful of companies have attempted to tackle the issue of machine-based indexing and language interpretation. None of them are perfect. Below are 6 unique approaches to semantic data collection.

1. Powerset

This site was one of the first to publicly apply machine-based natural language processing to a consumer search engine. Nevertheless, because public expectations were so high, when Powerset launched a Wikipedia-only beta,

reviewers were harsh.

The site was acquired by Microsoft shortly after the initial launch and the team has been low key ever since. While Powerset is one of the definitive semantic engines in existence, Microsoft is currently concentrating on using Powerset’s technology to index Wikipedia pages in Bing. Powerset’s search result pages actually contain a “Try this on Bing Reference” note in the sidebar of the site.

2. Cuil

This team touted its language processing product as being much faster to index pages than Google; however, consumers rarely covet speed over quality and the site

was criticized right from the start

. Expectations were not met as Cuil’s claim to 120 billion pages indexed did not match up to the results on

Google’s reported 1 trillion unique URLs.

However, what Cuil did right was separate related search results from regular web results. That being said, without any human intervention, the related results are often bizarre and irrelevant. For instance, my name produces the rankings of Ultimate Fighting Challenge Champions.

3. Hakia

This is a natural language search engine where sponsored results, regular web results and “credible” web results are broken down visually into separate categories. Similar to Wikipedia, Hakia

employs a community monitoring system for credibility

and “credible” results must be peer reviewed and seemingly free of corporate interest. One of the great features of Hakia is that users can tab over the site to show only images or news.

4. Worio

Worio is considered a “discovery engine” as it is not technically a search engine destination site. While users are still required to visit the

Worio destination

, search is actually powered by Yahoo, Google or Windows Live search. Regular web results appear in the larger left-side column and natural language-based “discoveries” appear on the right. These discoveries are further refined by personal bookmarks and shared relevancy with Facebook friends.

5. Ubiquity

Ubiquity for Firefox from Aza Raskin on Vimeo.

Ubiquity is perhaps the opposite of a semantic web engine, but it serves a similar function for those looking to aggregate useful data. The Firefox plugin allows users to create command lines that incorporate natural language search with a series of mashups. Users can then combine relevant data from Craigslist, translation tools, maps, reviews and social networks for easy user visualization. While the end product is an extremely useful document, users may not be ready for the drastic behavioral change of using command lines for semantic data collection.

6. Semanti

From a consumer standpoint, Semanti sits somewhere on the spectrum between Worio and Ubiquity. ReadWriteWeb

reviewed the product earlier this week

and like Ubiquity it is a Firefox plug-in rather than a destination site. However, like Worio, it employs leading search engines, bookmarking and Facebook friends to produce results. Semanti’s key difference is that it prompts users to choose from multiple definitions prior to completing the search. Decision-making is actually human-powered rather than machine-powered. CEO, Bruce Johnson, said, “I tried machine-based semantic tagging, but my priority has always been a faster search experience.” While this is not the “use of intelligent agents” that Berners-Lee suggested, it is a “semantic” tool in that it helps the user distill meaning and relevancy from language.

If you’ve got more examples of semantic data collection tools, list them in the comments below.

Search and Rescue: 6 Approaches to Semantic Data Collection

1. Powerset

2. Cuil

3. Hakia

4. Worio

5. Ubiquity

6. Semanti

Most Popular Gambling Stories

Latest News

House lawmakers advance bipartisan bill targeting federally regulated sports prediction markets

Evolution pays UK settlement after gambling games reached unlicensed British websites investigation

Polymarket challenges France block, Kalshi launches Midterms Hub, today in Prediction Market News LIVE

Tabcorp pays more than $1.9M after ACMA uncovers widespread breaches

Popular Topics

Search and Rescue: 6 Approaches to Semantic Data Collection

1. Powerset

2. Cuil

3. Hakia

4. Worio

5. Ubiquity

6. Semanti

About ReadWrite’s Editorial Process

Related News

Most Popular Gambling Stories

Latest News

House lawmakers advance bipartisan bill targeting federally regulated sports prediction markets

Popular Topics<img width="16" height="17" src="https://readwrite.com/wp-content/themes/twentytwentyone-child/images/Arrow-right.svg" alt="Arrow right.svg"/>

Popular Topics