Written by Alex Iskold and edited by Richard MacManus.
There has been a lot of talk lately
about 2007 being the year when we will see companies roll out Semantic Web technologies.
The wave started with
John Markoff’s article in the NY Times and got picked up by Dan Farber of ZDNet and other media. For
background on the Semantic Web in this era, check out our post entitled The Road to the
Semantic Web. Also, for a lengthy but very insightful primer on the Semantic Web, see Nova
Spivak’s recent article.
The media attention is not accidental. Because the Semantic Web promises to help solve
information overload problems and deliver major productivity gains, a huge
amount of resources, engineering and creativity is being thrown at it.
What is also interesting is that several different problems need to be solved
for things to fall into place. There needs to be a way to turn data into
metadata, either at the time of creation or via natural language processing. Then there needs
to be some intelligence, particularly inside the browser, to take advantage of the
generated metadata. There are many other interesting nuances and sub-problems that need
to be solved, so the Semantic Web marketplace is going to have a rich variety of
companies going after different pieces of the puzzle. We are planning to cover some of
the companies working in the Semantic Web space, so watch out for more coverage here on
Read/WriteWeb.
Hakia: how is it different from Google?
The first company we’ll cover is Hakia, which is a
“meaning-based” search engine startup getting a bit of buzz. It is a venture-backed
company with a multinational team, headquartered in New York – and curiously has former US
senator Bill Bradley as a board member. It launched its beta in early November this year,
and already ranks around 33K on Alexa – which is impressive. It is scheduled to go
live in 2007.
The user interface is similar to Google, but the engine prompts you to enter not just
keywords – but a question, a phrase, or a sentence. My first question was: What is
the population of China?
The results were spot on. I ran the same query on Google and got very
similar results, but sans flag. Looking carefully over the results in Hakia, I noticed
the message:
“Your query produced the Hakia gallery for China. What else do you want to know about
China?”
At first this seems like a value add. However, after thinking about it, I am not
sure. What seems to have happened is that instead of performing the search, Hakia
classified my question and pulled the results out of a particular cluster – i.e. China.
To verify this hypothesis, I ran another query: What is the capital of china?
The results again suggested a gallery for China, but did not produce the right answer.
Now to Hakia’s credit, it recovered nicely when I typed in:
Hakia experiments
Next I decided to try out some of the examples that the Hakia team suggests on its
homepage, along with some of my own. The first one was Why did the chicken cross the
road?, which is a Hakia example. The answers were fine, focusing on the ironic
nature of the question. Particularly funny was Hakia’s pick:
My next query was more pragmatic: Where is the Apple store in Soho? (another
example from Hakia). The answer was perfect. I then performed the same search on Google
and got a perfect result there too.
Then I searched for Why did Enron collapse? Again Hakia did well, but not
noticeably better than Google. However, I did see one very impressive thing in Hakia. In
its results was this statement: Enron’s collapse was not caused by overstated
resource reserves, but by another kind of overstatement. This is pretty witty,
but I am still not convinced that Hakia is doing semantic analysis. Here is why: that reply
was not composed by Hakia out of an understanding of the semantics of the
question. Instead, Hakia pulled the sentence out of a highly ranked document
that matches the Why did Enron collapse? query.
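To make the distinction concrete, here is a minimal Python sketch of this kind of extractive selection: it picks the sentence from a document that shares the most words with the query, with no understanding of meaning involved. This is not Hakia's algorithm, which is not public. The stopword list, the function names and the sample document (the quoted Enron sentence plus two filler sentences) are assumptions made purely for illustration.

```python
import re

# Assumed, deliberately tiny stopword list; a real engine would use a larger one.
STOPWORDS = {"a", "an", "the", "did", "do", "does", "is", "was", "why",
             "what", "where", "how", "of", "to", "in", "by", "but", "not", "s"}


def content_words(text):
    """Lowercase the text, split it into alphabetic words, drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_sentence(query, document):
    """Return the sentence that shares the most content words with the query."""
    query_words = content_words(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return max(sentences, key=lambda s: len(query_words & content_words(s)))


# Illustrative document: only the middle sentence comes from Hakia's results.
document = (
    "Enron was an energy company based in Houston. "
    "Enron's collapse was not caused by overstated resource reserves, "
    "but by another kind of overstatement. "
    "Many employees lost their savings."
)

print(best_sentence("Why did Enron collapse?", document))
```

The sentence about overstatement wins simply because it contains both "Enron" and "collapse". A witty-looking answer can come out of plain string overlap, which is why a single good reply is weak evidence of semantic analysis.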
In my final experiment, Hakia beat Google hands down. I asked Why did Martha
Stewart go to jail? – which is not one of Hakia’s homebrewed examples,
but it is fairly similar to their Enron example. Hakia produced perfect results for the
Martha question:
Hakia is impressive, but does it really understand meaning?
I have to say that Hakia leaves me intrigued, despite the fact that it could not
answer What does Hakia mean? and despite the fact that there isn’t yet sufficient
evidence that it really understands meaning.
It’s intriguing to think about the old idea of being able to type a question into a
computer and always get a meaningful answer (a la the Turing test). But right now I
am mainly interested in Hakia’s method for picking the top answer. That seems to be
Hakia’s secret sauce at this point, which is unique and works quite well for them.
Whatever heuristic they are using, it gives back meaningful results based on analysis of
strings – and it is impressive, at least at first.
Hakia and Google
Perhaps the more important question is: Will Hakia beat Google? Hakia itself
has no answer, but my answer at this point is no. The current version is not exciting
enough and the resulting set of search results is not obviously better. So it’s a long shot that
they’ll beat Google in search. I think if Hakia presented one single answer for each
query, with the ability to drill down, it might catch more attention. But again, this is
a long shot.
The final question is: Is semantic search fundamentally better than text
search? This is a complex question and requires deep theoretical expertise to
answer definitively. Here are a few hints….
Google’s string algorithm is very powerful – this is an undeniable fact. A
narrowly focused vertical search engine that makes a lot of assumptions about the
underlying search domain (e.g. Retrevo) does
a great job of finding relevant stuff. So the difficulty that Hakia has to overcome is to
quickly determine the domain and then to do a great job searching inside the domain. This
is an old and difficult problem related to the understanding of natural language and AI. We
know it’s hard, but we also know that it is possible.
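The two-step idea above (first determine the domain, then search only inside it) can be sketched in a few lines of Python. This is a toy illustration, not how Hakia or Retrevo work: the domains, keyword sets, documents and function names are all hypothetical, and the word-overlap scoring stands in for whatever real ranking a vertical engine would use.

```python
import re

# Hypothetical toy corpus grouped by domain; every entry here is made up.
DOMAINS = {
    "electronics": {
        "keywords": {"camera", "laptop", "tv", "battery"},
        "documents": ["Camera battery life compared",
                      "Choosing a laptop for travel"],
    },
    "geography": {
        "keywords": {"capital", "population", "country", "river"},
        "documents": ["Population figures by country",
                      "Capital cities of Asia"],
    },
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_domain(query):
    """Step 1: choose the domain whose keywords overlap the query the most."""
    q = tokens(query)
    return max(DOMAINS, key=lambda name: len(q & DOMAINS[name]["keywords"]))


def search(query):
    """Step 2: rank only the chosen domain's documents by word overlap."""
    q = tokens(query)
    domain = pick_domain(query)
    docs = DOMAINS[domain]["documents"]
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return domain, ranked


print(search("What is the capital of China?"))
```

Even in this toy version the hard part is visible: a query that matches no keyword set still gets routed somewhere, so everything depends on how well the first step guesses the domain.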
While we are waiting for all the answers, please give Hakia a try and let us know what
you think.