This morning someone emailed me and asked me to send them a headshot for my bio. I went to Google image search and typed in alex iskold. The results that I got were a mix of photos of me, my diagrams, as well as photos and pictures of people and objects I had never seen before.
It was not surprising that Google Image Search did not know what I look like, but I was curious how and why images got labeled with my name. It was tempting to experiment. This post is the result of my short ego-surfing adventure across the image search engines.
How image search engines work
Most image search engines today classify images based on text - not on the content of the image. This means they do not run pattern recognition algorithms over images, but instead rely on the text associated with the image: its label, its tags and the text that surrounds it on the page.
So it is not reasonable to expect that an image search engine would be able to understand that I am searching for a picture of a person. But still, our expectations are that search results are correct and also consistent.
This is actually a big problem with the text-based approach to searching for rich media. There are ways to improve the matching by being more conservative, by presenting results differently, and by letting the users tag the images - or even vote with their clicks! Let's first take a look at my image ego-searching to see how various engines do. Then we'll discuss ways that text-based image search can evolve.
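To make the text-based approach concrete, here is a minimal sketch in Python. It is not how any real engine is implemented; the page structure and the `search_images` function are invented for illustration. It simply returns every image on any page whose text contains all of the query words, which is enough to reproduce the kind of mismatches described in this post.

```python
def search_images(pages, query):
    """Return images from every page whose text contains all query words.

    `pages` is a hypothetical list of dicts with "text" and "images" keys.
    Nothing here looks at the pixels of an image, only at the page text.
    """
    terms = query.lower().split()
    hits = []
    for page in pages:
        words = set(page["text"].lower().split())
        if terms and all(term in words for term in terms):
            # Every image on a matching page is returned, relevant or not.
            hits.extend(page["images"])
    return hits


pages = [
    {"text": "Alex Iskold writes about search",
     "images": ["alex.jpg", "diagram.png"]},
    {"text": "Party photos with a comment by Alex Iskold",
     "images": ["stranger.jpg"]},
    {"text": "A page about something else entirely",
     "images": ["unrelated.jpg"]},
]

matches = search_images(pages, "alex iskold")
```

In this toy example `stranger.jpg` is returned alongside `alex.jpg`, simply because the name appears somewhere on the same page.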
So what does Google think Alex Iskold looks like?
Here are some top results when searching for alex iskold on Google Image Search:
So there are certainly matches, but there are also a lot of mismatches. The two headshots of different people that came up (the two in the left column) are confusing. Someone searching for me would have no idea which one is the correct one - now that is a bad false positive! Another problem, admittedly a much smaller one, is that it pulled out the picture of Mike Arrington at the TechCrunch NY party. But I suppose that's flattering, so I can live with that :-).
Where do bad results come from?
The first problem that the Google algorithm ran into came from Amazon profile pages. In the picture below, you can see that a picture of an individual called Alexander Arvei and my name appear together - on a profile page of a third Amazon user. Curiously, there are multiple images of Alexander, multiple mentions of my name and also the string Alex is found inside the string Alexander. So it seems that a combination of all these things threw Google off.
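Whether Google actually matches on substrings is only my guess, but the Alex/Alexander overlap is easy to illustrate. The sketch below, with two hypothetical helper functions, contrasts a naive substring test with a whole-word test built on Python's `re` module and its `\b` word-boundary anchor.

```python
import re


def substring_match(term, text):
    """Naive test: True if `term` appears anywhere inside `text`."""
    return term.lower() in text.lower()


def whole_word_match(term, text):
    """Stricter test: True only if `term` appears as a complete word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


caption = "Alexander Arvei"

loose = substring_match("Alex", caption)    # "Alex" is inside "Alexander"
strict = whole_word_match("Alex", caption)  # no word boundary after "Alex"
own_name = whole_word_match("Alex", "Alex Iskold")
```

Here the naive test matches the caption while the whole-word test does not, and the whole-word test still matches my own name.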
The second match also came from an Amazon profile, confusing me with a person named Allan Heydon. The reason for this was much less clear, as there was only a single occurrence of his photo and my name on this Amazon profile page.
Next I investigated why I got confused with Michael Arrington. The culprit was this post on bub.licio.us about the recent TechCrunch party in New York. In this post there are two pictures - one above the other - with my name in between. So the results are wrong because the pictures are not clearly labeled.
Google Image Search generated much worse results when I tried to tweak the search query and control the images selected. For example, when I selected the Small Images Only option, none of the results returned had my picture. There were two RSS buttons, what appeared to be a naked body part, two book covers from my Amazon lists, as well as two pictures of Bijan Sabet - who simply commented in his blog on one of my R/WW posts. The same thing happened when I searched for "alex iskold" in quotes - all my pictures went away! Desperate, I decided to try other search engines.
Checking other search engines
In a nutshell, the other search engines I tested had the same mixed results - although fewer of them. Microsoft Live Search returned only 22 results:
Yahoo! Search returned only 9 results. To its credit, the first one was right. But it also returned a picture of a high school girl and a soccer player.
But the search results that really made my day came from Snap:
Alex Ferguson, the manager of Manchester United!
A lot of image searches are likely to be people searches. So a natural question is: can a text-based search engine do a better job than we get today? It seems like it has a long way to go before we get good results. So what can be done to make this type of image search better? To start with, it would be helpful to tell the engine we are looking for a person - this can be easily done with a simple checkbox. Next, it would be really helpful if the results were ordered by how close the matching text is to the picture. Just because my name is on the page does not mean that all pictures on that page should be displayed in the results!
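Ordering by proximity could look something like the following rough sketch. It assumes a page can be flattened into an ordered list of text blocks and images; that representation and the `rank_by_proximity` function are made up for this example and do not describe any real engine.

```python
def rank_by_proximity(elements, name):
    """Order a page's images by distance to the nearest mention of `name`.

    `elements` is a hypothetical flattened page: a list of
    ("text", content) or ("image", filename) tuples in page order.
    Distance is counted in elements, not pixels.
    """
    name = name.lower()
    mentions = [i for i, (kind, value) in enumerate(elements)
                if kind == "text" and name in value.lower()]
    scored = []
    for i, (kind, value) in enumerate(elements):
        if kind == "image" and mentions:
            distance = min(abs(i - m) for m in mentions)
            scored.append((distance, value))
    scored.sort()  # closest first; ties fall back to filename order
    return [image for _, image in scored]


page = [
    ("image", "banner.png"),
    ("text", "Welcome to the party recap"),
    ("text", "Lots of people came"),
    ("image", "alex.jpg"),
    ("text", "Alex Iskold at the party"),
    ("text", "More photos"),
    ("image", "other.jpg"),
]

ranked = rank_by_proximity(page, "alex iskold")
```

Proximity alone would not have saved the TechCrunch party post, where my name sat exactly between two pictures; in that case both images are equally close and the tie has to be broken some other way.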
And since we live in the social web era, it would make sense to let people correct the results. It seems that Google is already doing this with clicks. But it would be great if it would let us tag the images right there as well. Feeding people's corrections back into the search engine ranking would make a big impact on the overall results. And people would do it, because it's likely you want your picture to come up when someone searches for your name.
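As a rough illustration of feeding corrections back into the ranking, the sketch below adds a weighted vote count to each image's original score. The scores, the vote counts, the `weight` parameter and the `rerank` function are all invented for the example; a real engine would need to guard against spam and self-promotion.

```python
def rerank(results, votes, weight=0.5):
    """Re-order results by base score plus weighted user feedback.

    `results` is a list of (image, base_score) pairs and `votes` maps an
    image to its net votes (confirmations minus rejections). Both are
    hypothetical inputs for this sketch.
    """
    def adjusted(item):
        image, score = item
        return score + weight * votes.get(image, 0)

    return [image for image, _ in sorted(results, key=adjusted, reverse=True)]


results = [("stranger.jpg", 0.9), ("alex.jpg", 0.8), ("rss_button.png", 0.7)]
votes = {"stranger.jpg": -3, "alex.jpg": 5}

reranked = rerank(results, votes)
```

With these made-up numbers the wrong headshot drops from first place to last, and the picture people confirmed moves to the top.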
Finally, there is a question of whether it even makes sense to search for people's pictures on Google. Even if it does not, people will do it anyway - so it had better supply decent results. And if it can't, there will be an opportunity for specialized vertical search engines for people pictures.
In the meantime, please head over to your favorite image search engine and try searching for your name. And of course, share your findings with us!