We’ve all seen photos of ourselves in locations we can’t quite remember. Often they’re from exotic travels or from days long past. Regardless of the reason for your memory loss, IBM is working on a tool that can help. In collaboration with a European Union consortium, the company is testing SAPIR (Search in Audio-Visual Content Using Peer-to-Peer Information Retrieval). The image-matching search technology lets users pull results from large collections of audio-visual content without relying on tags. Instead, users upload images and match them against similar ones – perhaps even ones containing signage and labels. The system analyzes everything from digital photographs to sound files to video, then automatically indexes and ranks the media for retrieval.
A recent IDC white paper reports, “The digital universe is messy…95% of the data in the digital universe is unstructured, meaning its intrinsic meaning cannot be easily divined by simple computer programs. There are ways to imply meaning to unstructured data, and the semantic web project is promising to develop the tools to help us do that in the future.”
Two such “divining” projects are the CoPhIR (Content-based Photo Image Retrieval) test collection and MUFIN (Multi-Feature Indexing Network). These projects feed SAPIR’s back end by extracting data from the Flickr archive and indexing visual features such as scalable color, color structure, color layout, edge histograms and texture.
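SAPIR’s internals aren’t public, but the general feature-and-rank approach can be sketched. In the toy example below, a coarse color histogram stands in for the richer descriptors named above, and images in a collection are ranked by Euclidean distance from the query’s descriptor; all names and data are invented for illustration.

```python
import math

def color_histogram(pixels, bins=4):
    """Quantize RGB pixels into a coarse, normalized color histogram.

    A simplified stand-in for descriptors like scalable color;
    `pixels` is a list of (r, g, b) tuples with values 0-255.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]  # normalize so image size doesn't matter

def euclidean(a, b):
    """Distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_pixels, collection):
    """Rank a collection (name -> pixel list) by similarity to a query image."""
    q = color_histogram(query_pixels)
    scored = [(euclidean(q, color_histogram(p)), name)
              for name, p in collection.items()]
    return [name for _, name in sorted(scored)]

# Toy collection: one mostly-red image, one mostly-blue image.
red_img = [(250, 10, 10)] * 100
blue_img = [(10, 10, 250)] * 100
query = [(240, 20, 20)] * 100  # a reddish query image

print(rank_by_similarity(query, {"red": red_img, "blue": blue_img}))
# the reddish query ranks the red image first
```

A real system would use perceptually tuned descriptors and an index structure that avoids comparing the query against every item, but the ranking principle is the same.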
As shown in the video of Madrid’s Plaza de España, SAPIR identifies matching media much as humans derive meaning from visual and sensory cues. Users can also combine an image query with additional text to drill down further in the results. As with regular search, if you already know the city where your image was taken, you’re one step closer to finding your result. SAPIR can index sound and video files as well.
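Combining an image query with text can be pictured as a filter-then-rank pipeline: metadata narrows the candidates, and visual distance orders what remains. The sketch below assumes each indexed entry carries a descriptor vector and a text field; the field names and data are hypothetical.

```python
def search(query_hist, index, text=None):
    """Filter entries by an optional text term, then rank by visual distance.

    `index` is a list of dicts with hypothetical fields:
    'name', 'hist' (a descriptor vector), and 'text' (caption/metadata).
    """
    candidates = [e for e in index
                  if text is None or text.lower() in e["text"].lower()]
    # Sort surviving candidates by squared distance to the query descriptor.
    candidates.sort(key=lambda e: sum((q - h) ** 2
                                      for q, h in zip(query_hist, e["hist"])))
    return [e["name"] for e in candidates]

index = [
    {"name": "plaza1.jpg", "hist": [0.9, 0.1],   "text": "Plaza de Espana, Madrid"},
    {"name": "plaza2.jpg", "hist": [0.2, 0.8],   "text": "Plaza Mayor, Madrid"},
    {"name": "beach.jpg",  "hist": [0.85, 0.15], "text": "beach in Valencia"},
]

# Visual-only search returns all three entries, nearest descriptor first.
print(search([0.88, 0.12], index))
# Adding the text term "Madrid" drops the Valencia beach photo entirely.
print(search([0.88, 0.12], index, text="madrid"))
```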
While the catalogue of media is still very limited, theoretically we may one day be able to search for almost anything using this technique. If Ashton Kutcher wears a pair of sunglasses we like, we can scan the image and search for the storefronts stocking them. If we’re looking for the name of a town square, we can find it in the tags of similar images. And if we’re looking to self-diagnose, we can compare photos of ourselves against those of jaundice or malaria patients.
The promise of this tool is that we may one day be able to collect the disparate bits scattered across the digital ether and identify them as useful points of information. To test SAPIR in its early research stage, visit the homepage. You may also want to try MUFIN to compare results.