In what appears to us to be a new addition to many Google search results pages, queries about birth dates, family connections and other facts are now being answered with explicitly semantic, structured information. Who is Bill Clinton’s wife? What’s the capital city of Oregon? What is Britney Spears’ mother’s name? The answers to these and other factual questions are now displayed above natural search results in Google, and the information is structured in the traditional subject-predicate-object format, or “triples,” of semantic web parlance.
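For readers new to the idea, here is a minimal sketch of what subject-predicate-object triples look like as data. It is only an illustration: the predicate names (`spouse`, `capital`, `mother`) and the `answer` helper are our own inventions, not anything Google exposes.

```python
from collections import namedtuple

# A triple is simply (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Illustrative facts matching the example queries.
# The predicate names are hypothetical, chosen for readability.
FACTS = [
    Triple("Bill Clinton", "spouse", "Hillary Clinton"),
    Triple("Oregon", "capital", "Salem"),
    Triple("Britney Spears", "mother", "Lynne"),
]


def answer(subject, predicate, facts=FACTS):
    """Return every object stored for the given subject and predicate."""
    return [t.object for t in facts
            if t.subject == subject and t.predicate == predicate]


print(answer("Oregon", "capital"))  # prints ['Salem']
```

Once facts are stored this way, a question like “What’s the capital city of Oregon?” reduces to a lookup on two of the three fields.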
The answers aren’t found structured that way on the web pages they come from; Google appears to be parsing the semantic structure from semi-structured or unstructured data. That’s something Microsoft paid over $100 million to try to do this summer when it acquired Powerset. Check out these screen shots below.
We’re sure that Google’s been doing this analysis for some time behind the scenes, but for the company to expose the data in this structured way and to include a link to view other sources appears new to everyone we’ve asked about it so far. We’ve got inquiries in with some people who specialize in search, but our semantic web contacts say they’ve not seen it before. (Update: Some readers have said in comments that they’ve seen variations of this for some time, including a three-year-old Google program called “Direct Answers.” None of the coverage we’ve seen of that program offers the kind of examples we’re seeing here, but we’re not sure what to think! We’ll see how feedback goes.)
It appears that the feature isn’t being bucket tested, either; it is globally available. Could third parties make use of the data now that it’s available in a structured format? Possibly. The search results pages aren’t being marked up with RDF in the HTML, which is a shame.
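To give a sense of what machine-readable output could look like, here is a rough sketch that writes a triple in N-Triples, one of RDF’s plain-text serializations. This is not RDF embedded in HTML and it is not something Google publishes; the `example.org` namespace, the URI layout and the `to_ntriple` helper are all assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use its own URIs.
BASE = "http://example.org/"


def to_ntriple(subject, predicate, obj):
    """Serialize one fact as an N-Triples line with a plain string literal."""
    s = BASE + "entity/" + quote(subject.replace(" ", "_"))
    p = BASE + "property/" + quote(predicate)
    # Escape backslashes and double quotes inside the literal.
    literal = obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{literal}" .'


print(to_ntriple("Oregon", "capital", "Salem"))
# prints <http://example.org/entity/Oregon> <http://example.org/property/capital> "Salem" .
```

A third party that received lines like this could load them into any RDF tool without scraping the results page.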
Is Google Creating Structured Data Where There Was None Before?
Bruno Haid of Austrian enterprise semantic startup System One pointed all this out to us and offers the following:
What’s interesting is that while Justin Timberlake’s mother is being parsed, amongst others, from http://www.celebritywonder.com/html/justintimberlake.html , there is no structured source visible that holds “Lynne” as string for Britney Spears mother. So either Google utilizes a trusted source that is not listed in “more sources” or they really extract that information from the unstructured text at http://ububu.com/BritneySpears.html . Which would make this whole thing quite huge.
This is really the crux of the question. To conclude that there is semantic analysis going on just because some of the info displayed appears in subject-predicate-object format would be a mistake (an after the fact, therefore because of the fact fallacy). But if those connections were being discovered by Google automatically when they were not displayed in a structured or straightforward way before, then we could conclude there’s some semantic analysis going on. That appears to be the case, but we may be wrong! (Update: For what it’s worth, Google’s Matt Cutts, often the company’s public face when it comes to search algorithm changes, gave this very blog post a thumbs up on FriendFeed. On the other hand, ex-Googler Jonathan Betz says in comments that he led Direct Answers when he was at the company and believes we’re just seeing an expansion of that program.)
Yahoo, Ask.com and Live.com are all unable to answer these same questions so clearly.
Many of the data points are being pulled in from the structured part of Wikipedia entries, which is interesting. Other sources are wide-ranging, from a license plate website to Jason Calacanis’s Mahalo.
We’re not sure what to make of this – have readers seen it before? We think it’s new and we think it’s pretty interesting.
Why is This Important?
As we’ve said about the semantic web before: Once our software is capable of deriving meaning from web pages it looks at for us, there’s a whole lot of work that will already be done, allowing our human, creative minds to reach new heights. Structured data is a layer of standardized abstraction upon which new innovation can be created.
That’s why we’re interested to see what Google is doing.
The answers aren’t always accurate; try searching the birth date of Jesus Christ, for example. Yahoo! has far more clearly articulated what they intend to do with semantic data. Nonetheless, Google now appears to be doing something that no one else is doing. Maybe readers here search for “Britney Spears’ mother” all the time, though, and have already seen this. We believe this may be different from the kinds of info-tips that have been shown above search results in the past, however.
If this speculation, based on limited observation, is wrong and Google is not exposing semantic data in search results, then a logical question would be, why not? Creating structured data where there previously was none is much harder than you might think. We hope that’s what Google is doing!