Written by Guest Blogger Emre Sokullu and edited by Richard MacManus.
Introduction
Personalized Content is one of the two most popular approaches among next-generation news sites - the other is Power of Masses, which we will cover in a future post. The leading examples are reddit for Personalized Content and digg for Power of Masses. In this article, we cover the Personalized Content approach, and reddit in particular. We describe the technical details and compare existing personalized content solutions.
First, a brief technical explanation: the Personalized Content approach uses a technique very similar to that of spam detection software. The idea is that everyone has their own pattern of reading. To recognize your pattern, Personalized Content services omit stopwords (very common words such as "the" and "of") and extract keywords from the news you read - then use Bayesian statistical analysis to predict what kind of news you will like or dislike in the future.
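To make the idea concrete, here is a minimal sketch of that pipeline in Python: strip stopwords, count the remaining keywords per label, and score a new headline with a naive Bayes model. It is an illustration under stated assumptions, not reddit's or any other site's actual algorithm. The stopword list, the `ReadingProfile` class, the "like"/"dislike" labels and the sample headlines are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real services use far larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "with"}


def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class ReadingProfile:
    """Hypothetical per-user naive Bayes model over 'like' / 'dislike' labels."""

    def __init__(self):
        self.word_counts = {"like": Counter(), "dislike": Counter()}
        self.doc_counts = {"like": 0, "dislike": 0}

    def train(self, text, label):
        """Record one story the user liked or disliked."""
        self.word_counts[label].update(keywords(text))
        self.doc_counts[label] += 1

    def _log_score(self, words, label):
        """Log of prior times word likelihoods, with add-one smoothing."""
        vocab = set(self.word_counts["like"]) | set(self.word_counts["dislike"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for w in words:
            # The extra +1 in the denominator reserves mass for unseen words.
            score += math.log(
                (self.word_counts[label][w] + 1) / (total_words + len(vocab) + 1)
            )
        return score

    def probability_like(self, text):
        """Estimated probability (0..1) that the user will like this story."""
        words = keywords(text)
        like = self._log_score(words, "like")
        dislike = self._log_score(words, "dislike")
        top = max(like, dislike)  # subtract the max to avoid underflow
        e_like = math.exp(like - top)
        e_dislike = math.exp(dislike - top)
        return e_like / (e_like + e_dislike)


# Made-up reading history for a single user.
profile = ReadingProfile()
profile.train("Python startup releases new web framework", "like")
profile.train("Lisp hackers build a web startup", "like")
profile.train("Celebrity gossip and fashion news", "dislike")
profile.train("Football scores and celebrity transfers", "dislike")

tech_score = profile.probability_like("New Python web startup launches")
gossip_score = profile.probability_like("Celebrity fashion scores")
```

With this toy history, `tech_score` comes out well above 0.5 and `gossip_score` well below it. The same structure is what a Bayesian spam filter uses, with "like" and "dislike" taking the place of "ham" and "spam".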
Reddit's quest for personalized news nirvana
Reddit, backed by Paul Graham's Y Combinator startup program, is the leading player in this field - and has put a lot of effort into having the best algorithms. Reddit has tried out two languages to achieve optimal results. They started with Lisp, which is known as a very suitable programming language for artificial intelligence and natural language processing applications. But then they turned to Python, a language more widely used in the Web 2.0 world.
However, as the dharmesh.com site explains in detail, many users still complain about not receiving relevant news recommendations. This might be a bad sign, because it suggests that reddit's pattern recognition technology fails in some cases - even across the relatively narrow span of news that the site covers. Nevertheless, reddit appears to be on the right path - the latest code changes received positive signals from their community.
The competition
But competition is heating up for Reddit. For instance, an Israeli startup called Spotback targets a wider audience and offers a more attractive, Digg-like user interface. Their job is harder though, as they're covering a greater span of news. See TechCrunch's recent review of Spotback for more details.
Some sites are taking a wider approach to personalized news. Instead of personalizing news flowing just within their site (as Reddit does), they try to personalize external RSS feeds. As a result, their algorithms have a much wider scope, because in theory they can personalize news sites, blogs and more. A pioneering company in this area was SearchFox, which was almost immediately acquired by Yahoo in January. SearchFox enabled you to personalize your RSS feeds. Indeed, its flexibility may allow Yahoo to integrate this technology into every corner of their network.
Personalized Start Pages (like Netvibes and Pageflakes) are also in this space, because feed filtration can be a differentiating factor for them. Imagine a start page full of your favorite widgets, RSS feeds and tools - but instead of seeing all the news flowing from your favorite sites, you see only a smaller, filtered set of relevant news items. However, we have yet to see a working, satisfactory prototype of this.
Greece-based Feeds2.0 and San Jose-based LeapTag (which was just launched at the latest DemoFall) are tackling the same "machine learning" problem of personalized news, from different perspectives. Feeds2.0 does exactly what SearchFox did: it filters RSS feeds. LeapTag is still in private beta and recommends links via its downloadable browser plug-ins.
Feeds 2.0 process
Let's also not forget one of the longest running personalized news sites of this era - Findory. It aims to be a personalized newspaper for the Web. Findory creator Greg Linden is an insightful commentator on personalized news issues and he says it is a technically challenging space. As he noted at the time SearchFox was acquired:
"Building scalable personalization systems is hard. Techniques that work fine on toy problems completely break down at scale. The systems have to be designed from the start to do fast recommendations in real-time for hundreds of thousands of users."
Findory process
Some comparison charts
The graph below shows the current Alexa traffic of the following personalized news sites: reddit, Spotback, Feeds2.0 and Findory. It should be noted that each of these sites has a slightly different focus; nevertheless, it is clear that reddit has the most traffic.
The next graph shows that reddit, the leader in personalized content, is far behind Digg (the leader of the Power of Masses approach). Therefore, we can say that personalized content still has a long way to go.
Conclusion
Our guess is that personalized content will become a more popular paradigm in about one to two years, provided of course that the technical challenges can be overcome. That is by no means certain, since a lot of smart developers think that personalized content is a huge challenge.
Personalized news has a couple of main attractions. Theoretically, if your news is personalized then it's not as vulnerable to gaming as the Power of Masses approach. Plus, people are getting busier every day, so personalized news has a strong appeal as a potential solution for information overload.
We're not sure who will end up being the key player in this space - maybe a giant like Google, maybe an existing startup like reddit, or maybe a whole new startup. But one thing we're sure of: the current personalized news services still need more work and the technical issues around personalizing content are far from solved.