European researchers working on the MOBVIS project have developed a new system that will allow camera phone users to hyperlink the real world. After a user takes a picture of an urban streetscape, the MOBVIS technology identifies objects like buildings, infrastructure, monuments, cars, and even logos and banners. It then renders relevant information on the screen using icons that deliver text-based details about the object when clicked.
This project goes beyond today’s mapping applications, such as Google’s Street View, which first identify your location via GPS or triangulation and then show you pictures of that area. Instead, MOBVIS actually lets you “see” the world through your mobile phone. This is computer vision, or rather, mobile vision.
There are obviously numerous potential applications for such a technology. On the MOBVIS homepage, the project team offers up some scenarios for how the technology could be used, including the following:
Tourism/Augmented City Maps: The MOBVIS technology could be used to inform visitors about the objects in an area, whether buildings or landmarks. The images could also be annotated with additional information like history, event information, or information about nearby shops.
Visual Localization: For phones without GPS technology, triangulation could be combined with the computer vision technology to locate a user’s position and orientation in a manner that would be comparable to GPS and just as accurate.
Motion Estimation: Also comparable to GPS, MOBVIS could enable continuous position updates to determine the location of objects in motion as well as their speed.
Incremental Map Updates: MOBVIS supports incremental updating of maps, which would allow for automated authoring of the urban infrastructure. No longer would Google need to send its vans around taking pictures of streets – the data could be uploaded from users’ phones as they take their photos.
Picture-Driven Search Engine: Because the mobile phone could now “see” the surrounding landscape, the world – reality – becomes the backdrop for a sort of picture-driven search engine in which the objects in the world are all hyperlinked and annotated like a real-life semantic web.
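The visual localization scenario above is essentially classic triangulation: if the phone can recognize two known landmarks and estimate the bearing to each, the user's position falls at the intersection of the two sight lines. A minimal 2-D sketch of that geometry (the coordinates, function name, and parameters are all hypothetical, not MOBVIS code):

```python
import math

def localize(a, bearing_a, b, bearing_b):
    """Estimate the user's (x, y) position from two recognized landmarks.

    a, b           -- known landmark positions (x, y)
    bearing_a/b    -- direction (radians) from the user toward each landmark
    Returns None when the sight lines are parallel (no unique intersection).
    """
    # Unit direction vectors from the user toward each landmark.
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)

    # The user lies on the line P = A - t1 * dirA and on P = B - t2 * dirB.
    # Solving t1 * dirA - t2 * dirB = A - B is a 2x2 linear system.
    det = dax * (-dby) - (-dbx) * day
    if abs(det) < 1e-9:
        return None  # parallel sight lines: position is ambiguous

    rx, ry = a[0] - b[0], a[1] - b[1]
    t1 = (rx * (-dby) - (-dbx) * ry) / det
    return (a[0] - t1 * dax, a[1] - t1 * day)
```

For example, a user who sees one landmark at (10, 0) straight ahead (bearing 0) and another at (0, 10) to the left (bearing π/2) must be standing at the origin.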
How It Works
The MOBVIS system begins with a pre-populated database of geo-referenced panoramas (such as Google’s Street View, perhaps). The objects in the images are then manually annotated with information. Once that’s complete, the system is ready for search queries from mobile users. After a user takes a picture, MOBVIS compares the photo to the photos in its database and returns the relevant links.
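The query flow described above (match a user's photo against a database of annotated reference images, then return that image's links) can be sketched with toy feature vectors and cosine similarity. Everything below is a hypothetical illustration; the real system's features, similarity measure, and data are not described in the source:

```python
import math

# Hypothetical database: each geo-referenced panorama carries a feature
# vector plus the manually authored annotations (links) for its objects.
DATABASE = [
    {"name": "Town Hall",   "features": [0.9, 0.1, 0.3], "links": ["history", "events"]},
    {"name": "Old Theatre", "features": [0.2, 0.8, 0.5], "links": ["tickets"]},
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def query(photo_features, threshold=0.9):
    """Return (name, links) of the best-matching panorama, or None."""
    best = max(DATABASE, key=lambda e: cosine_similarity(photo_features, e["features"]))
    if cosine_similarity(photo_features, best["features"]) < threshold:
        return None  # no confident match: better to stay silent than mislabel
    return best["name"], best["links"]
```

The threshold is the interesting design choice: a query that resembles nothing in the database returns no link at all rather than a wrong one.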
The challenge here is getting a mobile phone picture to match up with the more pristine photos found in the database. The database photos would likely be clear, crisp, and detailed, but a user’s photo could be grainy, taken on a dark and cloudy day, or taken from an odd angle.
The MOBVIS system’s main strength comes from its feature-matching algorithm, developed by the University of Ljubljana in Slovenia, one of the project’s partners. The algorithm can accurately detect minute differences between similar objects. In real-world tests, the system reportedly identified the right building 80 percent of the time.
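The source doesn't detail the Ljubljana team's algorithm. As a generic illustration of how feature matchers tell similar objects apart, here is a sketch of Lowe's ratio test, a standard trick that accepts a match only when the best candidate is clearly closer than the runner-up (the descriptors and ratio below are hypothetical):

```python
def ratio_test_match(query_desc, db_descs, ratio=0.8):
    """Return the index of the matching descriptor, or None if ambiguous.

    A match is accepted only when the nearest database descriptor is
    clearly closer than the second nearest (Lowe's ratio test), which
    rejects queries that look almost equally like two different buildings.
    """
    if len(db_descs) < 2:
        return 0 if db_descs else None

    # Squared Euclidean distance to every database descriptor.
    dists = [sum((a - b) ** 2 for a, b in zip(query_desc, d)) for d in db_descs]
    order = sorted(range(len(dists)), key=dists.__getitem__)
    best, second = order[0], order[1]

    if dists[best] < ratio * dists[second]:
        return best
    return None  # ambiguous: declining beats guessing wrong
```

A query descriptor that sits exactly between two database entries is rejected rather than matched, which is one way a matcher can trade a few misses for zero false positives.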
Aleš Leonardis, head of the Ljubljana team, believes that number can be improved, too. He also notes that the system, though not always right, was never wrong. “It was remarkable that there were no false positives,” he says. “Sometimes the system couldn’t identify a building, but it never put the incorrect link on a building.”
You can read more about the research on the MOBVIS project’s homepage.