Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.
Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!), as seen in the picture below.
Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.
Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:
Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians, who used beads to organize harvesting information. Organized information is not semantic information.
If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
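To show how little machinery these two tricks need, here is a minimal sketch in Python. It is not how any particular search engine works; the substitution list and the function names `normalize_query` and `looks_like_name` are made up for illustration.

```python
import re

# Hypothetical substitution list; the entries are illustrative only.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ten": "10"}


def normalize_query(query: str) -> str:
    """Lowercase the query and swap spelled-out numbers for digits."""
    tokens = re.findall(r"\w+", query.lower())
    return " ".join(NUMBER_WORDS.get(token, token) for token in tokens)


def looks_like_name(token: str) -> bool:
    """Guess that a token is a proper name if its first letter is capitalized."""
    return token[:1].isupper()
```

With this, "top ten" and "top 10" normalize to the same string, and "Fisher" is flagged as a name while "fisher" is not. Nothing here knows what "ten" or "fisher" means: it is a lookup table and a character check.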
A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.
An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.
Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on eggshells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.
These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.