Udi Manber, Google's VP of Engineering, gave a brief 15-minute presentation at Supernova today entitled "Search is a Hard Problem." He said that an audience like Supernova's probably understands to some extent how difficult search is, but that it is likely a harder problem than we appreciate. He laid out three reasons why this is the case:

  • Scale and diversity are almost beyond comprehension
  • Expectations and needs will continue to grow
  • 20 to 25% of the queries we see today, we have never seen before

I found the third point quite amazing. I would think with the number of queries that Google processes, they would have seen a much higher percentage of the queries before.

A Deeper Understanding

Next Udi explained that there are three levels involved in trying to deliver relevant information back to users:

  • Users and Queries
  • Models
  • Languages

Users and Queries

Udi gave some examples of Google's ability to understand the difference between two very similar queries. For example, Google understands that 'GM' stands for 'General Motors', while the 'GM' in 'GM foods' means 'genetically modified'. If you search for 'B&B AB', Google knows that means 'bed and breakfast in Alberta', while 'Ramstein AB' is 'Ramstein Airbase'.
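To make the idea concrete, here is a toy sketch of context-dependent abbreviation expansion. It is not how Google does it (the talk did not go into that); the `EXPANSIONS` table and the `expand` function are hypothetical, hand-written just to reproduce the examples above.

```python
# Toy illustration only: this table is hand-written for the examples in the
# talk. It is an assumption, not Google's data or method.
# Each abbreviation maps to (context words, meaning) rules, checked in order.
# An empty context set acts as the default meaning.
EXPANSIONS = {
    "gm": [({"food", "foods", "crops"}, "genetically modified"),
           (set(), "General Motors")],
    "ab": [({"b&b", "hotel"}, "Alberta"),
           ({"ramstein"}, "Airbase")],
}


def expand(query):
    """Expand known abbreviations based on the other words in the query."""
    tokens = query.lower().split()
    out = []
    for tok in tokens:
        rules = EXPANSIONS.get(tok)
        if rules is None:
            out.append(tok)  # not an abbreviation we know about
            continue
        others = set(tokens) - {tok}
        for context, meaning in rules:
            # A rule fires if it is a default or shares a word with the query.
            if not context or context & others:
                out.append(meaning)
                break
        else:
            out.append(tok)  # no rule matched, leave the token alone
    return " ".join(out)


print(expand("GM"))           # General Motors
print(expand("GM foods"))     # genetically modified foods
print(expand("B&B AB"))       # b&b Alberta
print(expand("Ramstein AB"))  # ramstein Airbase
```

A hand-written table like this obviously does not scale, which is rather the point of the talk: the real problem is doing this for queries nobody has seen before.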

Google will also recommend queries that may deliver better results. For example, if you search for 'Types of dogs' it returns results, but it also suggests 'breeds of dogs' as a better search.

He then explained that they still can't find all the answers. As a fun example, he said the query "Why Search is Hard" is actually a very difficult query for Google to parse.

Models

Next Udi reviewed some new Google search functionality that is not live yet but will be soon. Apparently, Google is going to start trying additional queries based on what the user typed. For example, the query "How much does it cost for an exhaust system" will pull up results for "cost of an exhaust system." Beyond just removing certain general words, they are also interpreting the question as part of the model; for example, the following two rewrites:

  • 'overhead view of bellagio pool' to 'bellagio pool pictures'
  • 'fedora 5 losing network connections' to 'fedora 5 network configuration'
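The simpler half of this, removing general words, can be sketched in a few lines. This is only an illustration under my own assumptions: the `FILLER` word list and the `simplify` function are made up for this example, and the sketch does nothing like the model-based rewrites in the two bullets above.

```python
import re

# Hypothetical list of "general words" to drop. A real system would
# presumably learn which words matter rather than hard-code them.
FILLER = {"how", "much", "does", "it", "for", "an", "a", "the", "of", "to"}


def simplify(query):
    """Lowercase the query and drop filler words, keeping the core terms."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(w for w in words if w not in FILLER)


long_form = simplify("How much does it cost for an exhaust system")
short_form = simplify("cost of an exhaust system")
print(long_form)                # cost exhaust system
print(long_form == short_form)  # True
```

Both versions of the exhaust system query reduce to the same core terms, so a search engine could treat them as one query. Getting from 'overhead view' to 'pictures' takes an actual understanding of the question, which a word list cannot provide.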

Different Queries for Different Locations

Finally, Udi talked about how results need to be different when the query is conducted in different locations. For example, the query 'government' needs to return results about your own country's government. I haven't tried this in other countries, but here in San Francisco the first result is for the US Government.

He also reviewed a tool at Google.com.eg that takes a query in another language, translates it to English, runs the query, and then returns the results translated back into the original language. You can even view the result pages in that language. There is a whole suite of language tools Google seems to be leveraging at: http://www.google.com/language_tools
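The flow described here (translate the query, search in English, translate the results back) can be sketched as a small pipeline. Everything below is a stand-in of my own: `toy_translate` is a word-for-word dictionary lookup, `toy_search` returns a canned result, and I use Spanish rather than Arabic purely to keep the example readable. None of it reflects Google's actual translation or search systems.

```python
# Toy word-for-word dictionaries, keyed by (source, target) language codes.
# These are assumptions for illustration, not real translation data.
TOY_DICT = {
    ("es", "en"): {"perros": "dogs"},
    ("en", "es"): {"dogs": "perros", "breeds": "razas", "of": "de"},
}


def toy_translate(text, src, dst):
    """Translate word by word, leaving unknown words unchanged."""
    table = TOY_DICT.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.split())


def toy_search(english_query):
    """Stand-in for an English-only search engine: returns one canned title."""
    return ["breeds of " + english_query]


def cross_language_search(query, user_lang, translate, search):
    """Translate the query to English, search, translate the results back."""
    english_query = translate(query, user_lang, "en")
    results = search(english_query)
    return [translate(result, "en", user_lang) for result in results]


print(cross_language_search("perros", "es", toy_translate, toy_search))
# ['razas de perros']
```

Passing `translate` and `search` in as parameters keeps the pipeline itself independent of whichever translation and search backends are plugged in.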

I'm surprised there aren't more copyright issues here, but I'm not a lawyer. For example, here is the Read/WriteWeb homepage translated into Arabic. Apparently, when Udi was demoing this for Larry Page, Page asked why the images weren't translating. Obviously, there is still work to be done, but it is quite amazing.

Conclusion

After listening to Udi's talk, I must agree that while I thought search was complex, I probably underestimated some of the areas of real difficulty. It is amazing to step back and think about how conceptually complex this is. It sheds new light on many of our challenging experiences around the AltSearchEngine's Day without Google.