Zemanta is an interesting European startup that is applying semantic technologies to blogging. Sarah Perez covered the company's launch in March. One can think of Zemanta as an auto-complete function for blogging. As you type up a new post, Zemanta's browser plugin fetches related content - images, articles, videos, links - and provides a simple and friendly UI for inserting it into your blog. We caught up with Andraz Tori, CTO and co-founder of Zemanta, at the SemTech conference in San Jose last week for an interview.
Just because Zemanta's product looks simple does not mean that it is not sophisticated. Beneath the product's UI there is a powerful semantic analysis engine that matches content to Zemanta's web index. The elements of their technology include clustering, natural language processing, dynamic ontologies - the full spectrum of semantic web tech that well-publicized companies like Powerset, Freebase, and Hakia are known for.
All of these algorithms are running on a scalable, distributed grid, powered by Amazon Web Services. After meeting with Tori, we instantly knew why Zemanta won a Red Herring 100 award this year in Europe - not only are Tori and his team doing some amazing work, there is a wonderful story and passion behind the company.
RWW: What is your background?
Andraz Tori: I started programming at the age of 10 and was successful at international programming competitions in high school. I went on to study computer science, but I always did other things in parallel. For example, I had a 5-year detour as a TV host on Slovenian national television and established a successful computer center in Ljubljana. I always look for ways to improve life with technology, and I decided to go entrepreneurial when I saw an interesting opportunity to do it on a large scale.
What is it like to be a tech startup in Europe?
Seedcamp (a UK competition inspired by Y Combinator) was a great boost for European early-stage ventures, and for us too. It is fun trying to bring a startup culture to Slovenia, a country that is not really used to it. It's fun. It's hard. But that makes it even more rewarding when you overcome the challenges.
How did Zemanta get started?
We saw that a local TV house was putting all of its video production on the Internet. Naturally, Google could not understand or index the videos. We discovered that the TV house had subtitles for all the shows, so we wrote a program to automatically create web pages that get indexed and then point people to the right videos. That was too easy, so we added a bunch of natural language processing and automatically connected those pages to other stories on the TV portal and to Wikipedia. Now full-blown web pages were created automagically. We sold this solution for pocket change and then realized that it was actually a unique product - like nothing else out there! Then we (with co-founder Bostjan Spetic) realized that this amazing technology worked on a language that only two million people speak. So we decided to go international and applied to Seedcamp. There we got our first seed funding, and later a proper seed round from UK investors.
What is the main idea behind Zemanta?
When dealing with a secretary, do you instruct her on every single detail, or do you tell her approximately what you want, wait for the result, and just correct it if there are any mistakes? We use computers today in the first way, while at Zemanta we believe it should be more like the second. Zemanta applies that idea to content creation. When an author writes the initial text, the service analyzes it and suggests how it can be improved.
Right now it suggests images to add, related articles, tags and in-text links. All of this happens unobtrusively, through a slick interface. The better the computer understands the text and its context, the more it can help you write it. That's the idea behind Zemanta. Right now we are applying it to bloggers (via plug-ins for Firefox and Internet Explorer, so they work even on hosted platforms), and we are also planning to open up an API.
How does your product use semantic technologies?
When doing our analysis we need to connect pieces of text to their semantic meaning. When suggesting tags we need to know their semantic neighborhood. But all of this stays in the background; the user never sees the magical semantic hand, which is hidden behind a simple and slick user interface. Because we find out what parts of the text are about, we are able to create correct semantic markup that helps pages get better visibility in semantic search engines or applications such as Yahoo! SearchMonkey.
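As a rough illustration of what "connecting pieces of text to their semantic meaning" and using a tag's "semantic neighborhood" might look like, here is a minimal Python sketch. It is not Zemanta's engine, which the interview does not describe in detail. The tiny knowledge base, the example.org URIs, and the function names are all hypothetical, and plain dictionary lookup stands in for real natural language processing.

```python
import re
from collections import Counter

# Hypothetical, hand-made knowledge base (an assumption for illustration only):
# surface form -> (entity URI, tags from the entity's "semantic neighborhood").
KNOWLEDGE_BASE = {
    "ljubljana": ("http://example.org/entity/ljubljana", ["slovenia", "europe"]),
    "seedcamp": ("http://example.org/entity/seedcamp", ["startups", "europe"]),
    "amazon ec2": ("http://example.org/entity/amazon-ec2", ["cloud computing"]),
}


def _mentions(name, lowered_text):
    """True if the known name appears as a whole word or phrase in the text."""
    return re.search(r"\b" + re.escape(name) + r"\b", lowered_text) is not None


def find_entities(text):
    """Return (surface form, entity URI) pairs for known names found in the text."""
    lowered = text.lower()
    return [
        (name, uri)
        for name, (uri, _tags) in KNOWLEDGE_BASE.items()
        if _mentions(name, lowered)
    ]


def suggest_tags(text, limit=3):
    """Suggest tags shared by the entities mentioned, most frequent first."""
    lowered = text.lower()
    counts = Counter()
    for name, (_uri, tags) in KNOWLEDGE_BASE.items():
        if _mentions(name, lowered):
            counts.update(tags)
    return [tag for tag, _count in counts.most_common(limit)]


if __name__ == "__main__":
    post = "We moved from Ljubljana to London after Seedcamp."
    print(find_entities(post))
    print(suggest_tags(post))
```

The entity URIs found this way are the kind of information that could feed in-text links or semantic markup; a real system would need disambiguation and a far larger index than a three-entry dictionary.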
What is Zemanta's architecture and use of Amazon Web Services?
Deep processing of text is a processor-intensive task. You need to make it scalable, and AWS EC2 is the right answer. We created our own high-availability, high-performance solution that makes sure the service is kept alive and well. The existing solutions only map well to the classical web server + SQL server combination. We also use S3 for backups and some SimpleDB. AWS (and similar services) make life easier for startups. However, you need to design your systems to be 'cloudable' from the start.
What are your goals for the rest of 2008 and beyond?
Simple: be the best utility service for bloggers in 2008. Get bloggers on board so they tell us what they want from the 'smart' service. Then provide more functionality and benefits from using Zemanta, and provide an API to early adopters who want to integrate it into their own CMS or other types of applications.
Beyond 2008, we envision a suggestion service so helpful that the experience comes to be expected everywhere. In a few years you will want it whenever you create content - be it writing a blog post, using a word processor, or even in your email client. Users are going to expect computers to understand their intentions better and to help with good, insightful, directly usable suggestions. Zemanta is going to provide that service to a large number of them via different delivery methods.
What companies are competing with you in the space? What other Semantic Web companies do you find interesting?
You could create the Zemanta experience if you pulled different companies' products together. But we are the only one with a rounded product, not just an API and not just one or two types of suggestions. You could find parts of the Zemanta experience in Sphere, Calais, BlogRovr, Watson, etc.
I am a big fan of Cyc and Metaweb and hope people will build wonders on the foundations those two companies are building. I am also interested in Powerset and Twine, both of which could become very important if/when they make it into the mainstream.
What is one insight, business or technical, that you want to share with our readers?
Developing diverse skills pays off. And doing things with your whole heart always means an interesting journey, even when you end up at a different place than you initially expected.