A month ago we wrote about Reuters launching an API called Open Calais, a technology that “does a semantic markup on unstructured HTML documents – recognizing people, places, companies, and events.” I mentioned Calais in my Media08 presentation last week, entitled Web Technology Trends for 2008 and Beyond. It generated interest among the media-focused audience I presented to, so in this post we follow up with Reuters and ask what progress is being made. Specifically, we look at what apps have been built so far on Calais and get feedback from Reuters’ Tom Tague.

Quick Recap of Open Calais
Open Calais is a Semantic Web technology – in this case, the next generation of the ClearForest product, which Reuters acquired in April ’07 (see our Dec ’06 review). Alex Iskold’s post last month is a ‘must read’ to understand what Open Calais is and why Reuters bought it. This diagram summarizes:
The API is free for both commercial and non-commercial use, and Reuters told us last month that it is prepared to scale for massive concurrent demand. The API is great for third party developers, because it gives them access to Reuters data. And it benefits Reuters, because it enables Reuters to aggregate metadata for its own uses.
Alex listed some possible uses: intelligent search engines that look for related content, automatic insertion of links into raw text, structured alerts, and on-the-fly text analysis within your browser.
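To make one of those uses concrete, here is a rough sketch of automatic link insertion, written in Python. It assumes the entities have already been extracted and handed to us as a simple list of records; the `ENTITIES` list, its field names, the `example.com` URLs and the `link_entities` function are all hypothetical and do not reflect the actual Calais response format.

```python
import html
import re

# Hypothetical entity records, standing in for whatever an entity-extraction
# service returns. The field names and URLs are assumptions for illustration.
ENTITIES = [
    {"name": "Reuters", "type": "Company", "url": "https://example.com/company/reuters"},
    {"name": "Tom Tague", "type": "Person", "url": "https://example.com/person/tom-tague"},
]


def link_entities(text, entities):
    """Return HTML in which the first mention of each entity is a link."""
    by_name = {e["name"]: e for e in entities}
    if not by_name:
        return html.escape(text)

    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(by_name, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(n) + r"\b" for n in names))

    parts, seen, pos = [], set(), 0
    for match in pattern.finditer(text):
        name = match.group(0)
        # Escape the plain text between the previous match and this one.
        parts.append(html.escape(text[pos:match.start()]))
        if name in seen:
            # Later mentions stay as plain text.
            parts.append(html.escape(name))
        else:
            seen.add(name)
            entity = by_name[name]
            parts.append('<a href="{}" title="{}">{}</a>'.format(
                html.escape(entity["url"], quote=True),
                html.escape(entity["type"], quote=True),
                html.escape(name),
            ))
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(link_entities("Tom Tague leads Calais at Reuters. Reuters is hiring.", ENTITIES))
```

Linking only the first mention is a design choice to keep the output readable; a real plugin would probably also want to skip text that is already inside a link.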
Example Apps?
So it sounds great in theory, but are there any examples of Open Calais apps so far? Reuters has a “bounty” program set up, whereby developers are invited to create Open Calais applications and Reuters will pay for them. However, it seems there has been little – if any – take-up of the bounties.
Top of the list of wanted apps was a WordPress plugin. Tom Tague, who is leading the Calais initiative at Reuters, noted in the forum that “unfortunately – and unexpectedly – we haven’t seen any reasonable applications for the bounty process so we’ll most likely be contracting for the development of the WordPress plugin.” Perhaps the amount of the bounty in this case was an issue – Reuters only offered $5000 for the WordPress plugin, which doesn’t seem like much of an incentive.
So Reuters has been forced to take the initiative and release some apps of its own. One is a new web-based document submission tool and viewer. There is some sign of action in the Open Calais forum, on a page where developers can list what they’re working on. A developer named Craig has built an example of Calais semantics using pure PHP and Abhay Kumar has a similar service. These are all ‘data input’ tools. For an ‘output’ example, check out Mark Choate’s RSS implementation of Calais data (example below).
Interview with Reuters’ Tom Tague
Clearly, it’s early days. I asked Open Calais lead Tom Tague how the initiative is progressing. Tom replied that “we’re about where we expected to be in terms of applications for Calais.” He told us that the service is “just a little over 45 days old and much of the effort we’re seeing is in building tools to explore the capabilities themselves.”
At this time Open Calais has just over 1,500 developers signed up, with about 30% of those developers actually making calls to and experimenting with the service. “One of the more exciting things that’s going on,” Tom Tague told us, “are several community-led efforts to build Calais libraries for Ruby, PHP, ASP.NET and others. These will provide a great accelerant for developers to gain access to the service.”
How Is Reuters Using Calais In-House?
At this point there is nothing to see for non-developers: the apps that have come out so far are developer-focused and not something the rest of us can use. So my next question to Tom was: how is Reuters itself using the Calais technology?
Tom replied that Reuters has several things underway:
“We’re in the process of adding rich metadata to over 20 years of historical news archives (many millions of articles) to improve searchability and organization. We’re doing a lot of work in automating and generally improving the efficiency of a massive real time content ingestion process. We’re working with one of the community platforms deployed for Reuters customers to improve the tagging and classification of user generated content. And, of course, we have significant efforts under way to generate machine readable news to drive low-latency algorithmic trading. All of these efforts are based on the same technology platform driving the Calais initiative.”
Conclusion: Show Us The Apps!
I must admit that I was expecting to see some working apps by now. Perhaps it is a similar case to Marshall Kirkpatrick’s experience of Twine (published earlier today), the Semantic knowledge management service that received much early hype. Marshall thinks that Twine is underdone at this time and that the ‘consumer’ experience is lacking. Calais is much newer of course and, as Tom Tague said, it has only been out in the open for 45 days. So it would be unfair to compare the two efforts. Nevertheless, it would be great to see some compelling consumer-facing apps for Open Calais; even better would be to see something from Reuters that shows the public the benefits of semantic technologies.
Alex Iskold listed a number of consumer apps that could be built using Calais, by Reuters or external parties. I think people need to see at least one of those pretty soon, in order to translate the interest that Open Calais is generating among media and other people into something non-geeks can see working on the Web and producing noticeably better information results. To paraphrase the famous Jerry Maguire quote, ‘Show me the apps!’