Thomson Reuters is today launching the latest version of its Calais web service and open API, Calais 4.0. Calais is a toolkit of products that enables publishers to add semantic functionality to their properties, identifying the people, places, companies, facts, events and other entities mentioned in their content. Calais 4.0 is perhaps the most significant release since Calais launched a year ago, because it enables publishers to connect to Linked Data, the set of web publishing practices that Sir Tim Berners-Lee and others in the Semantic Web community have been promoting over the past few years.
Until now, there has been little commercial activity around Linked Data; development has been largely confined to the academic and scientific communities. So we think Calais 4.0 represents an important step forward for the commercial Semantic Web, and we expect to see some big media companies using it before long.
Specifically, Calais 4.0 goes beyond metatagging and enables publishers to integrate their content with Linked Data assets from Wikipedia, GeoNames, the Internet Movie Database (IMDb), Shopping.com and others. Calais 4.0 also lets publishers share semantic metadata about their content with “content consumers” such as search engines, news aggregators, related-story recommendation services and more.
ReadWriteWeb named Calais one of its top 10 Semantic Web apps of 2008, thanks to the progress it made last year. Since the Open Calais API launched early in 2008, more than 9,000 developers have registered to use it, and Calais has processed more than 200 million articles.
What’s New in 4.0
We spoke with Thomas Tague, Calais lead at Thomson Reuters, about what specifically is new with Calais 4.0 and what use cases we might see over the coming year for it.
Tague explained to ReadWriteWeb that there are 3 pillars to the Calais initiative:
1. Getting semantic data out of text, which is what the first 3 versions of Calais focused on.
2. Connecting that semantic data to the linked data world.
3. Providing a way for people to share metadata, for example by syndicating it; Tague called this the “transport” pillar.
Calais 4.0, Tague explained, fills in the final 2 of those pillars. It supports approximately 25 entity types in Linked Data, and their URIs are dereferenceable to Calais RDF pages. Thomson Reuters is also publishing its ontology in RDFS. Calais will contribute data too, which Thomson Reuters claims is “the first contribution to the Linked Data cloud made by a major publisher.” The data Thomson Reuters is giving to the Linked Data world includes company descriptions, stock tickers, management teams and more, and external developers will be able to use it programmatically in their apps.
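For readers wondering what a "dereferenceable" URI means in practice: a client issues an ordinary HTTP GET against the entity's URI and, through content negotiation, asks for an RDF representation instead of an HTML page. The minimal Python sketch below uses only the standard library. The example.org URI is hypothetical, and the exact media types Calais's servers honor are an assumption, not something Thomson Reuters has specified here.

```python
import urllib.request

# Media types requested via content negotiation. Which of these a given
# Linked Data server actually supports is an assumption.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request for a Linked Data URI that prefers RDF over HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch the URI and return (final URL, response body).

    Linked Data servers often answer with a 303 See Other redirect to a
    separate RDF document; urlopen follows such redirects by default.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.read()


if __name__ == "__main__":
    # Hypothetical entity URI, used only to show the request that would be sent.
    req = build_rdf_request("http://example.org/resource/ExampleCompany")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `dereference()` on a real Linked Data URI would return the RDF document the server redirects to; the request-building step is kept separate so it can be inspected without touching the network.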
Thomas Tague told ReadWriteWeb that Thomson Reuters has some big data assets and that over time “we’re going to populate linked data endpoints with Thomson Reuters data”. We asked Tague whether he thought Calais 4.0 was the biggest commercial use of Linked Data yet. He thinks it is; in his opinion, Linked Data has so far mostly been used for open data projects and relatively small data sets. “We fundamentally believe that companies need to jump into this [Linked Data],” Tague said.
The Linking Open Data dataset cloud, by Richard Cyganiak
As for pillar 3, metadata transport, Tague explained that each document gets a unique identifier, and to syndicate content, publishers just need to make that identifier available to external parties.
Conclusion
It will be interesting to see which companies make use of Calais during 2009. Last year we noted that IBM was using Calais, and we expect that with the added Linked Data and transport functionality, other big companies will want to use Calais data too. Thomas Tague told us that Thomson Reuters hopes to announce 2 big product partners soon. He also said the team is seeing major traction around Drupal. Healthcare IT News from MedTech Publishing, a Drupal-based site, features the full Calais suite for publishers, including “More Like This”, the Calais related-content plugin.
As we noted at the beginning of this post, we’ve been impressed with the progress Calais has made since its launch at the start of 2008. With 4.0, we expect it to gain more traction among commercial publishers in 2009. Indeed, as what we like to think is an ahead-of-the-curve ‘new media’ company ourselves, we’re about to embark on our own project using Calais! Stay tuned for more on that.