Is Google a Semantic Search Engine?

Written by Phill Midwinter, a search engineer from the UK. This is a great
follow-up to our article last Friday, Hakia Takes On Google With Semantic
Technologies.

What is a Semantic Engine?

Semantics are said to be ‘the next big thing’ in search engine
technology. We technology bloggers routinely drum up articles about it and
sell it to you, the adoring masses, as a product that will change your web
experience forever. Problem is, we often forget to tell you exactly what
semantics are – we just get so excited. So let’s explore this…

Wikipedia says:

“Semantics (Greek semantikos, giving signs, significant, symptomatic, from
sema, sign) refers to the aspects of meaning that are expressed in a language,
code, or other form of representation. Semantics is contrasted with two other
aspects of meaningful expression, namely, syntax, the construction of complex
signs from simpler signs, and pragmatics, the practical use of signs by agents
or communities of interpretation in particular circumstances and contexts. By
the usual convention that calls a study or a theory by the name of its subject
matter, semantics may also denote the theoretical study of meaning in systems
of signs.”

…which is absolutely no help.

Semantics as it relates to our topic, search engines, actually covers a few
closely related fields. In this instance what we are trying to work out (as
a basic example) is whether a computer can discern if there is a link
between two words, such as cat and dog. You and I both know
that cats and dogs are common household pets, and can be categorized as such.
The human brain seems to comprehend this easily, but for a computer it is a much
more complex task and one I won’t go into in detail here – because it would
most likely bore you.
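
For readers who do want a taste of it, here is a minimal sketch of one common
statistical approach: treat two words as related when they tend to appear
alongside the same other words. The toy corpus and the function names
(context_vectors, relatedness) are invented for illustration; this is not a
description of how Google or any other engine actually does it.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration; a real engine would use its whole index.
DOCS = [
    "my cat sleeps on the sofa at home",
    "my dog sleeps on the sofa at home",
    "the cat is a popular household pet",
    "the dog is a popular household pet",
    "the bank raised its interest rate",
]

def context_vectors(docs):
    """Map each word to a count of the other words sharing a document with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        words = set(doc.split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors

def relatedness(vectors, a, b):
    """Cosine similarity between the context vectors of two words."""
    va, vb = vectors[a], vectors[b]
    dot = sum(va[w] * vb[w] for w in va if w in vb)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = context_vectors(DOCS)
cat_dog = relatedness(vectors, "cat", "dog")
cat_interest = relatedness(vectors, "cat", "interest")
print(cat_dog, cat_interest)
```

In this toy corpus cat and dog never appear in the same sentence, yet they
score as closely related because they keep the same company; cat and interest
share only the word "the" and score much lower.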

If we take as read then, that the search engine now has semantic
functionality, how does that enable it to refine its search
capability?

  • It can automatically place pages into dynamic categories, or tag them
    without human intervention. Knowing what topic a page relates to is
    invaluable for returning relevant results.
  • It can offer related topics and keywords to help you narrow your search
    successfully. With a keyword like sport, the engine would perhaps offer
    you a list of sports as well as sports-related news and blogs.
  • Instead of offering you the related keywords, the engine can directly
    incorporate them back into the search with less weight than the
    user-inputted ones. It’s still contested whether this will produce better
    results or just more varied ones.
  • If the engine uses statistical analysis to retrieve its semantic
    matches to a keyword (as Google is likely to do), then it’s likely that
    keywords currently associated with hot news topics will bring those in as
    well. For example, using my engine to search for the keyword police brought
    up peerages (relating to the UK’s recent cash for honors scandal).
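
The third point, feeding related keywords back into the search at a lower
weight, can be sketched roughly as follows. Everything here is an assumption
made for illustration: the RELATED table is hand-written, the 0.3 weight is
arbitrary, and the scoring is a plain weighted word count rather than any real
ranking algorithm.

```python
from collections import Counter

# Hypothetical related-keyword table; in practice this would come from the
# engine's own statistical analysis rather than a hand-written dict.
RELATED = {"sport": ["football", "tennis", "cricket"]}

def expand_query(user_terms, related=RELATED, related_weight=0.3):
    """Return {term: weight}: user terms at 1.0, related terms at a lower weight."""
    weights = {term: 1.0 for term in user_terms}
    for term in user_terms:
        for extra in related.get(term, []):
            # setdefault keeps the full weight if the user typed the term too.
            weights.setdefault(extra, related_weight)
    return weights

def score(document, weights):
    """Sum of weight * occurrence count for every query term in the document."""
    counts = Counter(document.lower().split())
    return sum(weight * counts[term] for term, weight in weights.items())

weights = expand_query(["sport"])
pages = [
    "sport news and sport results",
    "football and tennis fixtures this weekend",
    "interest rates rise again",
]
ranked = sorted(pages, key=lambda page: score(page, weights), reverse=True)
for page in ranked:
    print(round(score(page, weights), 2), page)
```

The page about football and tennis never mentions sport, so a plain keyword
match would score it zero; with the expanded query it ranks above the
unrelated page but still below the page that uses the word the user typed.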

So, according to me:

“A semantic search engine is a search engine that takes the sense
of a word as a factor in its ranking algorithm or offers the user a
choice as to the sense of a word or phrase.”

This is not in line with the purists of what is known as ‘The Semantic
Web’, who believe that for some reason we should spend all our time tagging
documents, pages and images to make them acceptable for a computer to read.
Well, I’m sorry but I’m not going to waste my time tagging when a computer
is able to derive context and do it for me. I may have offended Tim Berners-Lee
by saying this, but as the creator of the Web he should know better.

How does Google match up?

Until extremely recently, Google’s semantic technology (which they’ve had
now for quite a while) was limited to matching those AdSense blocks to your
website’s content. This is neat, and a good practical example of the
technology – but not relevant to their core search product. However, if you
make a single keyword search today, chances are you will spot a block like this
at the bottom of your results page:

This is more or less exactly what I was just writing about. They’re
offering you alternatives based upon your initial search, which in this case was
obviously for citizen. Citizen is a bank, a watchmaker and (if I’m not
mistaken) it means you’re a member of a country or something. This is the first
clear example of Google employing a semantic engine that works by analyzing the
context of words in their index and returning likely matches for sense.

Some of you may be wondering why they aren’t doing this for multiple
keyword phrases, which I can take a guess at from some of my own work. Analyzing
the context of a word statistically is intensive and slow; and if you try and
analyze two, you slow the process further and so on. It is likely they have
problems doing so for more than one keyword currently, and Google as ever is cautious
about changing their interface too radically too quickly. This
implementation of semantics gives hope that they haven’t adopted the purist
view of ‘The Semantic Web’ where everything is tagged and filed neatly into
nice little packages.

Google is all too aware of the following very large problems with that idea:

  • Users are stupid.
  • Users are lazy.
  • Redefining the way they’ve indexed what is assumed to be petabytes of
    data would require them to effectively start again.
  • It’s not as powerful or dynamic.

How Google can utilize Semantic technologies

It’s my belief that Google will increasingly tie this technology into their
core search experience as it improves in speed and reliability. It has some
phenomenally powerful uses and I’ve taken the liberty of laying out a few of
my suggestions on where they can go with this:

Self aware pages

  • Tagging pages with keywords has always been used on the internet to let
    search engines know what kind of content the page contains.
  • Using a Google API we can generate the necessary keywords on the fly as
    the page loads. This cuts out a large amount of work for SEO.
  • A Google API enabled engine wouldn’t even need to look at these
    keywords; it could generate them itself.
  • It’s not only pages that can be self aware these days; people tag
    everything – including links. The Google API could conceivably be used to
    tag every single word on a page, creating a page that covers every single
    keyword possibility. This is overkill – but a demonstration of the power
    available.
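
To make the idea of generating keywords on the fly a little more concrete,
here is a minimal sketch. It is not a Google API (the one described above is
my suggestion, not an existing service); it simply counts the most frequent
words on a page after dropping a few common stop words. The function name
generate_keywords and the stop-word list are made up for this example.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a production list would be far longer.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "in", "is", "it",
    "of", "on", "or", "the", "to", "with",
}

def generate_keywords(page_text, limit=3):
    """Return the most frequent non-stop-words on the page, most frequent first."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]

page = (
    "Cats are popular pets. Cats sleep a lot, and cats like fish. "
    "Dogs are popular pets too, and dogs like walks."
)
keywords = generate_keywords(page, limit=2)
print(keywords)
```

A semantic engine would go further and add related terms the page never
mentions, but even this crude count shows that the page itself already carries
most of the information a hand-written keyword tag would.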

Narrow Search

  • When you begin a search, you enter just one or two keywords in the topic
    you’re interested in.
  • Related keywords appear, which you can then select from to target your
    search and remove any doubts about dual meanings of a word for example.
  • This step repeats every time you search. Opinionated search is also
    possible.

Opinionated Search

  • Because of the way Google statistically finds the senses of keywords from
    the mass of pages in its index, what in fact it finds is the majority
    opinion from those pages of what the sense of a word is.
  • At the base level, you can select from the average opinion of related
    keywords and subjects from its entire index.
  • You can find the opinion at other levels as well though, and this is where
    the power comes in, in terms of really targeting what the user is looking
    for quickly and efficiently. Taken together, the following make this the
    first true example of social search:
  • Add the sites or web pages to your personal profile that you think most
    closely reflect your opinions. This data can then be taken into account in
    all future searches, returning greater personal relevancy.
  • Find the opinion over a range of dates, good for current events, modern
    history, changes in trends.
  • Find the opinion over areas of geography, or by domain extension (.co.uk,
    .com).
  • Find the opinion over a certain group of websites, or just one website
    in particular – compare that with another site.
  • Find the opinion not only over the above things but also subjects,
    topics, social and religious groups.
  • At the most ridiculous example level, you could even find what topics
    18-year-olds on MySpace living in Leeds most talk about – but that I could
    probably guess. The point is that this is targeting demographics on a
    really unprecedented level.
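
As a rough sketch of what finding the opinion at different levels could look
like, the example below computes the words most often seen alongside a keyword,
first restricted to one domain extension and then to another. The Page record,
the sample index and the majority_sense function are all invented for
illustration; a real engine would work over its full index with far better
statistics.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Page:
    domain: str
    year: int
    text: str

# Invented sample pages standing in for a real index.
INDEX = [
    Page("news.example.co.uk", 2006, "police question donors over peerages"),
    Page("paper.example.co.uk", 2006, "police inquiry into peerages widens"),
    Page("tv.example.com", 2006, "police drama returns to television"),
    Page("film.example.com", 2005, "police drama wins award"),
]

def majority_sense(index, keyword, keep=lambda page: True, limit=2):
    """Words most often seen alongside `keyword` in the pages selected by `keep`."""
    counts = Counter()
    for page in index:
        words = set(page.text.split())
        if keyword in words and keep(page):
            counts.update(words - {keyword})
    # Sort by count, then alphabetically, so ties come out in a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered[:limit]]

uk_view = majority_sense(
    INDEX, "police", keep=lambda p: p.domain.endswith(".co.uk"), limit=1
)
com_view = majority_sense(
    INDEX, "police", keep=lambda p: p.domain.endswith(".com"), limit=1
)
print(uk_view, com_view)
```

The same keep filter could just as easily select a range of years, a single
website or the pages in a user’s profile, which is all that the different
levels of opinion listed above amount to in this sketch.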

Conclusion

Google is using semantic technology, but is not yet a fully fledged semantic
search engine. It does not use NLP (Natural Language Processing), but this is not
a barrier to producing some truly web-changing technology with a bit of thought
and originality. NLP may well be (I hate myself for writing this) web 4.0 and
semantics is web 3.0 – they are in fact different enough to be classified as
such in my eyes and the technology Hakia is developing is certainly markedly
distinct from Google’s semantic efforts.

There are barriers that Google needs to overcome. Is it capable of becoming
fully semantic without modifying its index too drastically? Can Google
continue to keep the results simple and navigable for its varied user base?
Most importantly, does Google intend to become a fully semantic search engine,
and to do so within a timescale that won’t damage their position and
reputation? I like to think that although the dragon is sleeping, that doesn’t
mean it’s not dreaming!
