ClearForest: a Top-Down Approach to Semantic Web

By Alex Iskold

We’ve been writing recently about the rise of the semantic web and how 2007 will bring
many interesting semantic technologies. The fundamental problem that all these
technologies need to solve is explaining the meaning of things to computers. There are
several approaches to this, all of which can work in principle.

Some companies and technologies are doing it bottom-up, by embedding semantic annotations
(metadata) directly into the data. The opposite camp is exploring the top-down approach,
which relies on analyzing existing information. The ultimate top-down solution would be a
full-blown natural language processor, able to understand text the way people do.

In this post, we are going to look at ClearForest, one of the companies in the top-down
camp. At first glance, you might not think much of the company’s web site, but a deeper
dive reveals that ClearForest is restructuring to apply its core natural language
processing technology to next-generation semantic applications. The fact that ClearForest
has released both a web service and a Firefox extension that uses the service’s API to
deliver an end-user application suggests that the company gets what the next-generation
web is all about.

Gnosis – Firefox extension for annotating web pages with semantics

The first ClearForest product that we looked at was the Firefox extension called Gnosis.
Here is how it is described on the Mozilla extensions page:

“With a single click, Gnosis will identify the people, companies,
organizations, geographies and products on the page you are viewing. Using the built-in
navigation sidebar you can gain immediate understanding of the page’s
contents.”

Downloading and installing Gnosis was as easy as with any Firefox add-on. We used the
Read/WriteWeb home page to try the extension. With one click from the menu, the page was
filled with various types of annotations. The current version of Gnosis recognized
Companies, Countries, Industry Terms, Organizations, People, Products and Technologies,
an impressive range of things. Each word that Gnosis recognized was colored according to
its category.

This was interesting, but overwhelming. A better approach would be to have the
coloring appear on a mouseover or another gesture. But this is a usability nuance that
will likely get polished in the next iteration of the product. Overall, I was impressed.
In an instant, the page was analyzed and annotated. It was not perfect (it thought that
all the Jasons on the page were Jason Briggs), but it was more accurate than I expected it
to be.

Next I turned my attention to the sidebar. The extension created a categorised tree of
all words and phrases that it found on the page. We could expand and collapse each
category to find the terms. It looked like vertical search for a single page. It was
interesting, and it could be very useful for blogs and lengthy pages.
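
To make the sidebar idea concrete, here is a minimal sketch of how recognized terms could
be grouped into a category tree. It is not Gnosis code: the function names and the sample
annotations are made up for illustration, and the (term, category) input format is an
assumption.

```python
from collections import defaultdict


def build_sidebar_tree(annotations):
    """Group (term, category) pairs into {category: sorted unique terms}.

    The pair format is an assumption for this sketch, not the extension's
    real data model.
    """
    tree = defaultdict(set)
    for term, category in annotations:
        tree[category].add(term)  # a set drops repeated mentions of a term
    # Sort categories and terms so the tree renders in a stable order.
    return {category: sorted(terms) for category, terms in sorted(tree.items())}


def render_tree(tree):
    """Render the tree as indented text, one category heading per group."""
    lines = []
    for category, terms in tree.items():
        lines.append(f"{category} ({len(terms)})")
        lines.extend(f"  {term}" for term in terms)
    return "\n".join(lines)


# Hypothetical annotations, as if extracted from a single page.
sample = [
    ("ClearForest", "Companies"),
    ("Firefox", "Products"),
    ("Alex Iskold", "People"),
    ("ClearForest", "Companies"),
    ("Mozilla", "Organizations"),
]

if __name__ == "__main__":
    print(render_tree(build_sidebar_tree(sample)))
```

Storing each category as a set means a term mentioned ten times on the page still shows up
once in the tree, which is what you want from a navigation aid.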

Again, the interface needs to evolve – but the idea that key terms and concepts on any
page can be identified and organized in such a way seems compelling. In addition to the
organization, the extension offered to search for any keyword on Google, Wikipedia or
Technorati. If you are interested in a keyword, you are likely to want to find more
related information. So the context search seems like a logical extension of
categorisation, as it makes this data further searchable.
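
The context search can be pictured as nothing more than a set of URL templates filled in
with the selected term. The sketch below is an illustration, not the extension’s code. The
Google and Wikipedia URL patterns are the public search forms; the Technorati entry is a
placeholder on a non-existent host, since I am not assuming its real query format.

```python
from urllib.parse import quote_plus

# URL templates per search target. The Technorati entry is a placeholder,
# not the site's real search URL.
SEARCH_TEMPLATES = {
    "Google": "https://www.google.com/search?q={query}",
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search={query}",
    "Technorati": "https://technorati.example.invalid/search?q={query}",
}


def context_search_links(term):
    """Return {site: search URL} for a term picked from the sidebar."""
    query = quote_plus(term)  # encodes spaces as '+' and escapes reserved characters
    return {site: template.format(query=query)
            for site, template in SEARCH_TEMPLATES.items()}


if __name__ == "__main__":
    for site, url in context_search_links("Richard MacManus").items():
        print(f"{site}: {url}")
```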

Overall, this seemed unpolished but intriguing. The question is, how does this work?
The Firefox page stated that this extension is based on a web service. So this is what I
want to explore next…

ClearForest’s Semantic Web Service (SWS)

Behind every great service there is an API. Modern web companies have rediscovered an
old piece of software engineering wisdom: interfaces are a powerful way to build complex
software. Today we are seeing the rise of the most complex software system yet, a
service-powered web. ClearForest has also recognized the value of building a product on
top of a service, since both can be monetized independently. Gnosis leverages the
service’s interface to offer powerful natural language processing to end users.

The Semantic Web Service (perhaps the name is a bit broad) offers a SOAP interface
for analyzing text, documents and web pages. The service returns categorization and
annotation information, which can be further leveraged by consumer-facing applications
(the company recommends building mashups). I am fairly certain that SWS is powered by a
web crawler, because it is able to recognize people like Richard MacManus, Jason Biggs
and Alex Iskold. My guess is that the crawler is used to build a giant index, which is
then used by the document parser to annotate the terms in the document.
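
To illustrate that guess, here is a minimal sketch of index-based annotation: a small
lookup table of known names (a stand-in for the giant index) is matched against the text,
and each hit is tagged with its category. This is only my speculation turned into code. It
does not use or reflect the SWS API, and the table contents and function name are
hypothetical.

```python
import re

# Hypothetical stand-in for a crawler-built index: known term -> category.
GAZETTEER = {
    "Richard MacManus": "People",
    "Alex Iskold": "People",
    "ClearForest": "Companies",
    "Firefox": "Products",
}


def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, term, category) for every indexed term found in text."""
    if not gazetteer:
        return []
    # Try longer names first so a long entry wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for start, end, term, category in annotate("Alex Iskold reviewed ClearForest."):
        print(f"{start}-{end}: {term} [{category}]")
```

A pure lookup like this would also explain the kind of mistake Gnosis made with first
names: without context, every ambiguous mention resolves to whatever the index happens to
contain.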

The service is currently free to try, but you need to contact ClearForest to use it
commercially. To encourage usage of the service, the company announced a mashup
contest. The contest was advertised on ProgrammableWeb and ended December 11th. It is not
clear to me that it was successful, as there are no announcements of winners and no
showcase, but it certainly seems like a creative way to promote the new API.

Conclusion

ClearForest might not have a glamorous/Ajaxy web site and might not have a polished
product yet. But it is a company that has been around and has been backed by top-tier VC
firms. Both the approach and the technology are worth attention and consideration. Their
natural language processing technology, first applied to business data mining, is able to
clearly distill useful information. Offering it as a service shows insight and an
understanding of the new market opportunities (think Amazon). And creating a Firefox
extension that showcases the technology demonstrates their desire and readiness to go
mainstream.

All these factors indicate that ClearForest is worth watching. And it is yet another
brick in support of the top-down semantic web approaches. Let us know what you think about
this company.
