Last May we asked the question, “are social bookmarking sites better at search than Google?” Though some readers questioned our specific methods, our conclusion was that “while social bookmarking and ranking sites don’t make great search engines on their own, they offer a wealth of user-vetted data that could be used to augment search results in a positive way.” Recently, Yahoo! began testing the inclusion of del.icio.us data in search results. While it is unclear whether the del.icio.us data is affecting search rankings, the more important question is: would it even matter?
A group of researchers at Stanford recently presented a paper in which they answered that very question: can social bookmarking augment traditional search results? Marshall Kirkpatrick made brief mention of the paper in a post earlier today.
The paper, which is entitled, “Can Social Bookmarking Improve Web Search?,” was presented at the First ACM International Conference on Web Search and Data Mining (WSDM’08) and includes eleven experiments designed to evaluate “different aspects of social bookmarking and their impact on web search,” using del.icio.us bookmarking data, Yahoo! and AOL search data, and ODP data gathered between May and June of 2007.
Overall, the group concluded that the relatively small size of the social bookmarking community (the paper’s authors estimate that only about 1/1000th of the web has been bookmarked and tagged in del.icio.us) means that it is not yet ready to make a significant impact on search, but that there are still ways in which social bookmarking data can be used to improve how search engines work. That’s a similar conclusion to one we made in December when we noted that del.icio.us is mostly being used to bookmark stories related to a few narrow fields, which means that its usefulness in augmenting search rankings is limited.
One of the commenters on that post theorized that the reason social bookmarking isn’t being used as much for cataloging things like celebrity gossip or sports news is that social bookmarking is used mainly for storing information that is not time sensitive. “Delicious does not do good at celebrities, sports news & so on because it’s meant for what you want to keep over time. Not the latest updates. It’s as simple as that to me.” (NatC)
That sounds logical. However, the Stanford group found that del.icio.us users tend to post pages “that are actively updated or have been recently created” and recommended that search engines use social bookmarking data to augment or improve their crawl schedules. In fact, about 25% of URLs entered into del.icio.us are not seen in search engines for another 4 weeks to 6 months, which indicates, say the researchers, that social bookmarking could be used “as a (small) data source for new web pages and to help crawl ordering.”
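To make the crawl-ordering idea concrete, here is a minimal sketch of how a crawler might move recently bookmarked URLs to the front of its queue. This is our own illustration, not the method from the paper: the function names, the seven-day window, and the shape of the input (a dictionary mapping each URL to the times it was bookmarked) are all assumptions.

```python
from datetime import datetime, timedelta


def recent_bookmark_count(bookmark_times, now, window_days=7):
    """Count the bookmarks made within the last `window_days` days (assumed window)."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for t in bookmark_times if t >= cutoff)


def order_crawl_queue(bookmarks, now, window_days=7):
    """Return URLs sorted so the most recently bookmarked ones are crawled first.

    `bookmarks` is a hypothetical mapping of URL -> list of bookmark datetimes.
    """
    return sorted(
        bookmarks,
        key=lambda url: recent_bookmark_count(bookmarks[url], now, window_days),
        reverse=True,
    )


if __name__ == "__main__":
    now = datetime(2008, 2, 1)
    bookmarks = {
        "http://example.com/old": [datetime(2007, 6, 1)],
        "http://example.com/hot": [datetime(2008, 1, 30), datetime(2008, 1, 31)],
        "http://example.com/new": [datetime(2008, 1, 29)],
    }
    for url in order_crawl_queue(bookmarks, now):
        print(url)
```

A real crawler would combine a signal like this with its existing scheduling rather than replace it, especially given how small the bookmarked portion of the web is.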
What we’re most interested in, though, is how social bookmarking can be used to affect search engine rankings. The Stanford team found that tagging at social bookmarking sites would probably not be very helpful because 80% of tags are found in page text or the surrounding text and so those pages would likely be found by search engines anyway. The team also found, though, that del.icio.us has a high level of redundancy for about 20% of URLs, and that there is generally an adequate level of overlap between top search results and bookmarked pages. While the paper doesn’t draw any conclusions regarding the relevancy of URLs that are tagged more than once, it has long been our contention that the number of times a URL has been saved, in conjunction with tag data for that URL, could be used by search engines to augment ranking algorithms; that is, URLs that are saved more (or more often) are likely to be more useful.
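As a rough illustration of that contention, the sketch below boosts a search result's existing score when one of its bookmark tags matches a query term, scaled by the logarithm of how many times the URL has been saved. Everything here is hypothetical: the `rerank` function, the `weight` parameter, and the scoring formula are our assumptions, not anything described in the Stanford paper or used by any search engine we know of.

```python
import math


def rerank(results, query, save_counts, tags, weight=0.1):
    """Re-sort (url, base_score) pairs using bookmark data.

    A result is boosted only if one of its tags matches a query term.
    The boost grows with log(1 + saves), an assumed damping choice so
    that very large save counts do not swamp the original score.
    """
    query_terms = set(query.lower().split())

    def score(item):
        url, base_score = item
        url_tags = {t.lower() for t in tags.get(url, ())}
        if query_terms & url_tags:
            saves = save_counts.get(url, 0)
            return base_score * (1 + weight * math.log1p(saves))
        return base_score

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    results = [("http://example.com/x", 1.0), ("http://example.com/y", 0.95)]
    save_counts = {"http://example.com/y": 500}
    tags = {"http://example.com/y": ["python", "tutorial"]}
    print(rerank(results, "python tutorial", save_counts, tags))
```

Requiring a tag match before applying the boost ties the save count to the query, so a page that is popular for an unrelated topic gains nothing.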
Certainly there are problems with relying too heavily on how many times a URL has been saved to social bookmarking sites when determining its search result position. For example, that number may be easily gamed or influenced via blackhat techniques. But even without that use case, the researchers at Stanford outlined a number of ways in which social bookmarking sites could be used to theoretically improve search engines, if not yet on a grand scale.
What do you think? Will social bookmarking data ever be used to enhance search engines? Should it? Let us know in the comments below.