Written by Guest Blogger Emre Sokullu and edited by
Richard MacManus.
Introduction
Personalized Content is one of the two most popular approaches in next-generation news
sites; the other is Power of Masses, which we will cover in a future post. The leading
examples of these approaches are reddit for Personalized
Content and Digg for Power of Masses. In this article,
we will cover the personalized content approach and reddit in particular. We will
describe the technical details and compare existing personalized content solutions.
First, a brief technical explanation: the Personalized Content approach uses a technique
very similar to the one used by spam detection software. The idea is that everyone has their own
pattern of reading. To recognize your pattern, Personalized Content services discard
stopwords (common words such as “the” or “and”) and extract keywords from the news you read, then use Bayesian statistical
analysis to predict what kind of news you will like or dislike in the future.
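To make the spam-filter analogy concrete, here is a minimal sketch of one common form of Bayesian text classification, a multinomial naive Bayes classifier, trained on headlines that a single reader liked or disliked. The stopword list, the `keywords` and `ReaderProfile` names, the like/dislike labels and the sample headlines are all hypothetical illustrations; this is not reddit’s or any other service’s actual code.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "for", "from", "is", "are", "with"}


def keywords(text):
    """Lowercase the text, split it into words and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


class ReaderProfile:
    """Multinomial naive Bayes over keywords for a single (hypothetical) reader."""

    LABELS = ("like", "dislike")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record that this reader liked or disliked an item with this text."""
        if label not in self.LABELS:
            raise ValueError(f"label must be one of {self.LABELS}")
        self.word_counts[label].update(keywords(text))
        self.doc_counts[label] += 1

    def prob_like(self, text):
        """Estimate P(like | keywords of text) with Laplace (add-one) smoothing."""
        vocab = set(self.word_counts["like"]) | set(self.word_counts["dislike"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            # Smoothed prior: how often this reader likes (or dislikes) items at all.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(counts.values()) + len(vocab)
            for word in keywords(text):
                if word in vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Normalize the two log scores into a probability without underflow.
        top = max(log_scores.values())
        like = math.exp(log_scores["like"] - top)
        dislike = math.exp(log_scores["dislike"] - top)
        return like / (like + dislike)


profile = ReaderProfile()
profile.train("Python web framework released", "like")
profile.train("Lisp macros and compilers explained", "like")
profile.train("Celebrity gossip from the red carpet", "dislike")
profile.train("Football transfer gossip roundup", "dislike")

print(round(profile.prob_like("New Python compiler released"), 2))  # 0.8
print(round(profile.prob_like("Red carpet celebrity roundup"), 2))  # 0.06
```

Because the word counts are kept per reader, two people looking at the same headline get different scores, which is the heart of the personalization idea; a service could then show only items above some chosen threshold. The add-one smoothing and ignoring unseen words are common simplifications chosen here for brevity, not details reported by any of the services discussed in this article.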
Reddit’s quest for personalized news nirvana
Reddit, backed
by Paul Graham’s Y Combinator startup program, is
the leading player in this field and has put a lot of effort into building the best
algorithms. Reddit has tried two programming languages in pursuit of
optimal results. It started with Lisp, which is widely regarded as well suited to artificial intelligence and natural language processing
applications, but later switched to Python, a language more widely used in the Web 2.0 world.
However, as the dharmesh.com site
explains in detail, many users still complain about not receiving relevant news
recommendations. This may be a bad sign, because it suggests that reddit’s pattern
recognition does not always work, even within its relatively narrow range of content.
Nevertheless, reddit appears to be on the right path: its latest code
changes have drawn positive feedback from its community.
The competition
But competition is heating up for reddit. For instance, an
Israeli startup called Spotback
targets a wider audience and offers a more attractive, Digg-like user interface. Its
job is harder, though, because it covers a wider range of news. See TechCrunch’s recent
review
of Spotback for more details.
Some sites are taking a wider approach to personalized news.
Instead of personalizing only the news flowing within their own site (as reddit does), they try
to personalize external RSS feeds. As a result, their algorithms have to cover much more ground,
since in theory they can personalize news sites, blogs and more. A
pioneering company in this area was SearchFox, which was almost immediately acquired
by Yahoo in January. SearchFox let you personalize your RSS feeds, and its
flexibility may allow Yahoo to integrate this technology into every corner of its
network.
Personalized start pages (like Netvibes and Pageflakes) are also in this space,
because feed filtering can be a differentiating factor for them. Imagine a start page
full of your favorite widgets, RSS feeds and tools that shows not every item
from your favorite sites, but only a smaller, filtered set of relevant news items.
However, we have yet to see a satisfactory working prototype of this.
Greece-based Feeds2.0 and San Jose-based
LeapTag (which just launched
at the latest DemoFall) are tackling the same “machine
learning” problem of personalized news from different perspectives. Feeds2.0
does the same thing as SearchFox: it filters RSS feeds. LeapTag is still in private
beta and recommends links through its downloadable browser plug-ins.
[Figure: Feeds 2.0 process]
Let’s also not forget one of the longest-running personalized news sites of this era,
Findory, which aims to be a personalized newspaper for the
Web. Findory’s creator, Greg Linden, is an insightful commentator on personalized news,
and he considers it a technically challenging space. As he noted
at the time SearchFox was acquired:
“Building scalable personalization systems is hard. Techniques that work fine on toy
problems completely break down at scale. The systems have to be designed from the start
to do fast recommendations in real-time for hundreds of thousands of users.”
[Figure: Findory process]
Some comparison charts
The graph below shows the current Alexa traffic for the following personalized news sites:
reddit, Spotback, Feeds2.0 and Findory. Each of these sites has a
slightly different focus; nevertheless, reddit clearly has the most traffic.
The next graph shows that reddit, the leader in personalized content, is far behind Digg (the leader of the Power of Masses approach). This suggests that
personalized content still has a long way to go.
Conclusion
Our guess is that personalized content will become a more popular paradigm in about one
to two years, provided, of course, that the technical challenges can be overcome. That is by
no means certain, since many smart developers
consider personalized content a huge challenge.
Personalized news has two main attractions. First, in theory, personalized news is
less vulnerable to gaming than the Power of Masses approach, because what you see depends on
your own reading pattern rather than on votes that others can coordinate. Second,
people are getting busier every day, so personalized news has strong appeal
as a potential solution to information overload.
We’re not sure who will end up being the key player in this space. It may be a giant like Google,
an existing startup like reddit, or an entirely new startup. But one thing we’re
sure of: current personalized news services still need more work, and the technical
issues around personalizing content are far from solved.