
Hakia - First Meaning-based Search Engine

December 7th, 2006

Written by Alex Iskold and edited by Richard MacManus.

There has been a lot of talk lately about 2007 being the year when we will see companies roll out Semantic Web technologies. The wave started with John Markoff's article in the New York Times and was picked up by Dan Farber of ZDNet and by other media outlets. For background on the Semantic Web in this era, check out our post entitled The Road to the Semantic Web. And for a lengthy but very insightful primer on the Semantic Web, see Nova Spivack's recent article.

The media attention is not accidental. Because the Semantic Web promises to help solve information overload and deliver major productivity gains, a huge amount of resources, engineering and creativity is being thrown at it.

What is also interesting is that several different problems need to be solved in order for things to fall into place. There needs to be a way to turn data into metadata, either at the time of creation or via natural language processing. Then there needs to be some intelligence, particularly inside the browser, to take advantage of the generated metadata. There are many other interesting nuances and sub-problems, so the Semantic Web marketplace is going to have a rich variety of companies going after different pieces of the puzzle. We are planning to cover some of the companies working in the Semantic Web space, so watch for more coverage here on Read/WriteWeb.

Hakia: how is it different from Google?

The first company we'll cover is Hakia, a "meaning-based" search engine startup that is getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York, and curiously it has former US senator Bill Bradley as a board member. It launched its beta in early November this year and already has an Alexa rank of around 33,000, which is impressive. It is scheduled to go live in 2007.

The user interface is similar to Google's, but the engine prompts you to enter not just keywords but a question, a phrase, or a sentence. My first question was: What is the population of China?

The results were spot on. I ran the same query on Google and got very similar results, minus the flag. Looking carefully over the results in Hakia, I noticed this message:

"Your query produced the Hakia gallery for China. What else do you want to know about China?"

At first this seems like added value. However, after thinking about it some more, I am not sure. What seems to have happened is that instead of performing a search, Hakia classified my question and pulled the results out of a particular cluster, i.e. China. To test this hypothesis, I ran another query: What is the capital of China? The results again suggested a gallery for China, but did not produce the right answer. To Hakia's credit, it recovered nicely when I typed in:

Hakia experiments

Next I decided to try out some of the examples that the Hakia team suggests on its homepage, along with some of my own. The first was a Hakia example: Why did the chicken cross the road? The answers were fine, focusing on the ironic nature of the question. Particularly funny was Hakia's pick:

My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too. 

Then I searched for Why did Enron collapse? Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia's results: the statement "Enron's collapse was not caused by overstated resource reserves, but by another kind of overstatement." This is pretty witty, but I am still not convinced that Hakia is doing semantic analysis. Here is why: Hakia did not compose that reply from an understanding of the question's meaning. Instead, it pulled the sentence out of a highly ranked document that matches the query Why did Enron collapse?
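To make the distinction concrete, here is a minimal, hypothetical sketch of how a "top answer" could be picked with no semantics at all: split a highly ranked document into sentences and return the one that shares the most terms with the query. This is not Hakia's method, which it has not published; the stopword list, function names and sample document are invented for illustration.

```python
import re

# Hypothetical stopword list: question words and function words that carry
# little topical signal for plain string matching.
STOPWORDS = {"why", "did", "does", "what", "is", "the", "a", "an",
             "of", "to", "in", "was", "by", "but", "not"}


def terms(text: str) -> set[str]:
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def best_sentence(query: str, document: str) -> str:
    """Return the sentence of `document` sharing the most terms with `query`.

    Pure term overlap, no understanding of meaning. max() keeps the first
    sentence among ties, so earlier sentences win ties.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    q = terms(query)
    return max(sentences, key=lambda s: len(q & terms(s)))


# Invented sample standing in for a highly ranked result page.
SAMPLE_DOC = ("Enron was an energy company based in Houston. "
              "Enron's collapse was not caused by overstated resource reserves, "
              "but by another kind of overstatement. "
              "The story was covered widely in the press.")

if __name__ == "__main__":
    # Prints the "Enron's collapse ..." sentence: it matches both
    # "enron" and "collapse", while the others match at most one term.
    print(best_sentence("Why did Enron collapse?", SAMPLE_DOC))
```

The sentence reads like a clever answer, yet it was chosen only because it repeats the query's two content words.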

In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? - which is not one of Hakia's homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

Hakia is impressive, but does it really understand meaning?

I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the lack, so far, of sufficient evidence that it really understands meaning.

It's intriguing to think about the old idea of being able to type a question into a computer and always get a meaningful answer (à la the Turing test). But right now I am mainly interested in Hakia's method for picking the top answer. That seems to be Hakia's secret sauce at this point; it is unique and works quite well for them. Whatever heuristic they are using, it gives back meaningful results based on analysis of strings, and it is impressive, at least at first.

Hakia and Google

Perhaps the more important question is: will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. The current version is not exciting enough and its result set is not obviously better, so it's a long shot that Hakia will beat Google in search. I think if Hakia presented a single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

The final question is: is semantic search fundamentally better than text search? This is a complex question, and answering it definitively requires deep theoretical expertise. Here are a few hints.

Google's string algorithm is very powerful; this is an undeniable fact. A narrowly focused vertical search engine that makes a lot of assumptions about the underlying search domain (e.g. Retrevo) does a great job of finding relevant results. So the challenge Hakia has to overcome is to quickly determine the domain of a query and then do a great job of searching inside that domain. This is an old and difficult problem related to natural language understanding and AI. We know it's hard, but we also know that it is possible.
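The two-step approach described above (first determine the domain, then search inside it) can be sketched as follows. This is a hypothetical toy, not how Hakia or Retrevo actually work: the domain keyword lists, the tiny document corpus and the function names are all invented, and both steps use simple keyword overlap.

```python
import re
from typing import Optional

# Hypothetical domain vocabularies; a real system would learn these
# rather than hard-code them.
DOMAIN_KEYWORDS = {
    "geography": {"capital", "population", "country", "city", "china"},
    "electronics": {"camera", "laptop", "tv", "price", "battery"},
}

# Hypothetical corpus: each document is tagged with its domain.
DOCUMENTS = [
    ("geography", "Beijing is the capital of China."),
    ("geography", "China has the largest population of any country."),
    ("electronics", "This laptop has a long battery life."),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens of `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(query: str) -> Optional[str]:
    """Step 1: guess the query's domain by keyword hits; None if nothing matches."""
    q = tokens(query)
    scores = {domain: len(q & kws) for domain, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def search(query: str) -> list[str]:
    """Step 2: rank only the documents in the guessed domain by term overlap."""
    domain = classify(query)
    if domain is None:
        return []  # a real engine would fall back to general text search here
    q = tokens(query)
    candidates = [text for d, text in DOCUMENTS if d == domain]
    # sorted() is stable, so documents with equal overlap keep corpus order.
    return sorted(candidates, key=lambda text: len(q & tokens(text)), reverse=True)


if __name__ == "__main__":
    print(classify("What is the capital of China?"))  # geography
    print(search("What is the capital of China?")[0])  # Beijing is the capital of China.
```

The weak point is step 1: if the classifier picks the wrong domain, step 2 never even looks at the documents that hold the right answer, which is why determining the domain correctly matters so much.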

While we are waiting for all the answers, please give Hakia a try and let us know what you think.
