Interview with Google's Matt Cutts about Next-Generation Search

Last week I had the pleasure
of interviewing the head of Google’s Webspam team, Matt Cutts. The topic of our conversation was
Next-Generation Search. In my pitch to get an interview with someone at Google, I
explained how Read/WriteWeb has been covering Next-Gen search a lot and so it
would (obviously) be great to get Google’s views on this topic!

Matt Cutts is a well-known Google identity, who apparently gets mobbed by fans at SEO
conferences. His Wikipedia page
states that he co-invented one of the most well-known patent filings from Google,
involving search engines and web spam. One note about the following interview: Google has a policy of not discussing competitors, so a few of my original questions had to be dropped or re-phrased.

Richard: When we write about ‘next-generation search’ on Read/WriteWeb, a
lot of times we position it as: how can a startup become the next Google? But obviously
Google is also hard at work on next-generation search technologies. Can you give us an
overview of what Google is working on with regard to next-gen search – e.g. personalized
search, AI, etc.?

Matt: I think personalization has a very high chance of being able to improve
search for the average user. One of the great things about it is, you don’t really have
to do a lot of work. Once you decide this is something you’re interested in, Google can
take care of a lot of the details. I recently saw a post online where somebody was
complaining about metadata and having to make metadata – and the nice thing about
personalization is that it’s free for the user. So as far as the next generation of
search, I think that is something that is very exciting.

Richard: Can you give us a couple of examples of how Google is implementing
personalization?

Matt: I think of localization as a type of personalization. If you type in a
query like “football”, that will give you different results in the US versus the UK. And
a query like “bank” done on Google – in New Zealand it will get New Zealand banks, in
Australia it will get Australian banks. And it makes a big difference to know those sorts
of things. So that’s just personalization at a country level, but it already shows the
sort of potential that you can reach.

Matt Cutts flanked by two fans; Photo by Chris
Pirillo, inspired by Jim Boykin.

Richard: Google also recently
implemented personalization with Google Accounts, so I believe personalization can now
happen through that, via the main Google search?

Matt: Absolutely, yes. It’s nice because the mental model that users have to
keep has been simplified. So now if you’re signed into Google search, we will be able to
help personalize your search results. And that’s a really nice win, because it’s much
easier for people to know. If I don’t want personalized search results, I can just click
in the top right and sign out. But if I am signed in, I can check that by just looking at
my email address in the top right – then I know that I’m benefiting from personalization
automatically.

Richard: What do you think about semantic technologies (for example,
Hakia)? How important is natural language understanding for search, and is Google
doing anything in this direction?

Matt: We do pay a lot of attention to a lot of different technologies, so I
would define Google’s approach as very pragmatic. And we keep an eye on the entire space
and we try to say, ‘ok what are the areas that are most promising for users?’
Historically it’s always interesting to view the progress of semantic technologies. For
example, if you do a search like ‘how many states are in America?’, some search engines
that claim to be semantic won’t do a good job in delivering the right results, whereas
Google can do a very good job – even if you think, ‘ok how can they handle natural
language, or how can they handle the semantics of that search.’ And I think what
Google benefits from is the sheer size of the Web and the sheer amount of data, and it
really does help us understand the meanings of words and synonyms. So we do have a
pragmatic approach and we don’t necessarily place all our bets on one particular way of
doing things. We are exploring a lot of different things all at once.

Richard: So you would say that Google is already doing that kind of semantic
technology, that it’s just integrated into the current service you provide?

Matt: Yeah, I would say there’s a lot of semantic technology already built in,
under the hood of Google.

Richard: One of the most popular posts this year on R/WW was one called The
Top 100 Alternative Search Engines. What are some of the “alternative” search engines
that have most impressed you lately? Or if you can’t mention names, what are some of the
technologies that impress you? The February
list had 32 changes and so it perhaps indicates the sheer speed of innovation in
search.

Matt: You also did a really good job in another
post, where you had a poll that asked what would be next [in search]. It was
interesting that 209 votes were for personalized search, and after that Artificial
Intelligence. I think a lot of those trends are very interesting. Having a lot of data,
we are able to try things as different as visualization, all the way up to things like
clustering, or query refinement. Sometimes at the bottom of our search results, if we
think it’s relevant, we’ll take the user’s query and suggest other related queries. And
that’s something that Google didn’t launch for a while, but we wanted to test it and get
the best possible result. It didn’t make sense to launch it until we found a combination
that we thought was very good for the user. But I do think that we watch a lot of those
different technologies and try to stay aware of what people are doing in the industry and
what people are trying.

Richard: SearchMash is an experimental
site from Google [introduced around Oct/Nov 2006], with some new Ajax-powered UI
ideas. Can we expect any of the SearchMash features to be implemented into the main
google.com UI any time soon?

Matt: There is a possibility, but not a guarantee that the features you see on
SearchMash will be seen on Google search. It’s always a trade-off and we have to consider
things like how well something might be supported by different browsers, how much users
like it, and also how much screen real estate or time to ramp up on a feature it might
take. For example there was an interesting feature on SearchMash where you could start
typing anywhere on the page and it would start filling in the search box for you. But
that wouldn’t work with every single browser. I think the big value in SearchMash is that
it lets us try a lot of very different user interfaces – things that might throw your
average user. And we can try out those really unusual interfaces and see how people
respond.

Richard: On our Alt
Search Engine list, there were some search engines with amazing UIs – e.g. one had a
talking avatar. So I guess you could, in future, experiment with that kind of UI on
SearchMash…

Matt: Yeah, it’s fun because once you step off the Google domain, you’ve got a
lot more freedom to try different things – including bringing in image results, results
from news, all sorts of fun things. So it’s a fun playground to have, and I’m glad that
we introduced it.

Richard: Google Base is essentially a
database of structured content and home for many different verticals currently (jobs,
vehicles, classifieds). There’s also GData and the Google
Base API. Can you explain how all these things fit together and what (if any) impact
it will have on search going forward? I presume that structured data will become very
useful for Google search over time, so perhaps you could help our readers understand that
some more…

Matt: It’s certainly the case that structured data is really interesting,
because once you have data in different fields, you can imagine doing different types of
searches over it. And GData is especially interesting, because it almost provides a way
to plug data into Google. Which throws up a lot of interesting possibilities. For
example, Google’s had a couple of other types of searches – we’ve had patent search, code
search, book search – and those are slightly different verticals, a little more
free-form. But you could certainly imagine being able to search over new verticals; and
having that fielded search, or the structured content (however you want to refer to it)
can definitely be really useful as far as letting people have more flexibility. So I’m
pretty excited about it, but it’s always hard to say how things will go in the future and
the direction things will go.

Richard: Do you have any plans for vertical search
beyond blogs, I mean the major verticals… for example Microsoft
bought a health search company recently. So is Google going to do anything in those
major verticals?

Matt: Well, there are two answers to that. Firstly things like patent search,
code search, book search – whether you want to call them vertical search is kind of up
for dispute. They search over different types of data. So for example with Google
Calendar, being able to search over calendar data or Gmail being able to search over
email, is an entirely different and new capability. And really, really interesting. I’ll
let you decide whether to call that vertical search or not.

My second answer though, is
that I think it’s really interesting that Google has taken a step back and looked at the
general issue of vertical search – and as a result has introduced Google Custom Search Engine (CSE). It’s built on
the power of Google Co-op, and the wonderful thing about it is that it lets anybody
define their own custom search engine. And not just something feeble, we’re talking about
the ability to add 5,000 URLs very easily – and not just to filter over them, but to be
able to boost for some sets of URLs, and detract or downgrade other sets of URLs.

So what’s really interesting to me is if you think about a new vertical, for example
podcasts, you could certainly have Google say: ‘well ok, how do we search over
podcasts?’ But if you go into Google Custom Search Engine, I think there have been
dozens of people who’ve actually made their own podcast search engines – by using the
power of CSE. For example, the other day I found a search engine for ‘engineering
podcasts’, so you could search for ‘Google’ and get all the podcasts about tech talk, etc.
I think that’s a really interesting approach. I’d certainly say that we want to return
the best results to users, so in some cases it might make sense for Google to look at
individual areas. But the general issue is often well addressed by giving the power to
the people, so to speak, and letting them build their own search engines. So it’s really
been fun to see just how many people have signed up for it, and how much growth the
custom search engine area is getting.

Richard: Your particular area of expertise is fighting spam. Can you tell us
the latest on how Google is trying to keep its results pure… what are some of the
trends in fighting spam?

Matt: We’ve done a lot of stuff to return better search results for users over
the last year, including on web spam. For example, we’ve got internal metrics that we
keep track of to show that we’re doing a much better job than even a couple of years ago,
to make sure that a user doesn’t randomly come across spam. One of the big trends last
year and continuing into this year is internationalization. It’s really important
for us to be able to offer spam-free search in any language, whether it’s French,
Italian, German, Chinese or Japanese. So a lot of what my team looks at is trying to make
sure that any new approach that we do, we are also able to do in a scalable and robust
way across many languages. So that’s probably the biggest trend.

Image courtesy of
stefan2904

Richard: With the acquisition last year of YouTube, together with Google
Video, being able to search and index video is obviously a key thing going forward. Not
to mention being able to insert advertising into video. What kind of things is Google
doing in the area of video search?

Matt: Video by itself is a lot more interesting and challenging to search,
because it’s got audio and visual components that are interesting and sometimes more
difficult to index than words alone. The nice thing is that by using a lot of different
information, Google can often return a very good set of search results. Even more than
you would expect sometimes, given how hard a type of content like video is to
index.

But it’s also fun because in the Web we have this notion of reputation – which is
PageRank, it’s how many people link to your site and it’s also the quality of that
incoming set of links. So it’s fun to think about things like reputation in video search
– whether it be for Google Video or YouTube – because you don’t have links necessarily.
You might have things that are somewhat similar to links, but you look at the quality of
the users, the quality of the ratings. I think in lots of ways it gives Google good
practice to think about the power of people, and the power of trust – and how to apply
that in a lot of different areas.

Conclusion: Our thanks to Matt for the illuminating interview about Google and
Next-Generation search! We would love to get people’s comments or questions on this topic,
so do leave a comment below.