The latest implementation of OpenCalais, the semantic API from media company Thomson Reuters, has just been announced. It’s with ‘new media’ stalwart CNET, which has signed up to use OpenCalais for semantic analysis of its tech product reviews, news, and blog posts. CNET has also joined Thomson Reuters as one of the first commercial media companies to publish its data to the Linked Data community on the Internet. In practice, this means that external companies can reuse that data for their own purposes. While CNET won’t be releasing all of its commercial data, it will expose certain sets of product and editorial data.
Calais is a toolkit of products that lets publishers incorporate semantic functionality within their properties, for example by categorizing content into people, places, companies, facts, events, and more. OpenCalais 4 was launched in January and, for the first time, enabled publishers to connect to the Linked Data web standard that Sir Tim Berners-Lee and others in the Semantic Web community have been promoting over the past few years.
We spoke today with Peter Offringa, Vice President of Software Engineering at CNET, to find out how CNET has implemented OpenCalais.
The ways that CNET will use OpenCalais are twofold:
1) It will use OpenCalais to power ‘topic pages’ across CNET web properties. For example, in the screenshots below, the word ‘Zune’ is linked; clicking it leads to a topic page about the Zune.
Peter Offringa told us that CNET currently has a limited set of topic pages and that they were built in a “semi-automated way”. By this he means that the tags are generated automatically by OpenCalais, as part of a process integrated within CNET’s publishing system, while the topic pages themselves are powered by RSS feeds that need to be set up manually.
CNET articles will include links to topic pages, e.g. Zune link above
Example topic page, largely powered by OpenCalais
OpenCalais performs semantic analysis of CNET’s content and generates tags from it, which in turn help power the topic pages. Peter Offringa told us that CNET plans to scale up its topic pages and to add more ‘recommender’ features in the near future using OpenCalais, perhaps replacing the current recommendation widget on its pages, which is powered by Sphere.
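To make the tags-to-topic-pages idea a little more concrete, here is a minimal sketch in Python. It is not CNET’s implementation and does not call OpenCalais. The article titles, the tag lists, and the `build_topic_index` helper are all hypothetical, with the tags standing in for whatever a semantic tagging step would return.

```python
from collections import defaultdict

# Hypothetical articles. The "tags" lists stand in for the output of a
# semantic tagging step; they are not real OpenCalais responses.
articles = [
    {"title": "Zune software update arrives", "tags": ["Zune", "Microsoft"]},
    {"title": "Media player roundup", "tags": ["Zune", "iPod"]},
    {"title": "Microsoft quarterly earnings", "tags": ["Microsoft"]},
]


def build_topic_index(articles):
    """Map each tag to the titles of the articles that carry it."""
    index = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            index[tag].append(article["title"])
    return dict(index)


topic_index = build_topic_index(articles)

# Every article tagged "Zune" would be listed on the Zune topic page.
print(topic_index["Zune"])
```

Each key in `topic_index` corresponds to one potential topic page, which is why automated tagging makes it feasible to scale up the number of such pages.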
2) The second part of today’s announcement is that CNET is publishing some of its commercial data to the Linked Data cloud, in RDF format. Offringa said that this will primarily be product data, but will also include other types of data, e.g. from its software download services. While Offringa told us that CNET is “not going to expose everything”, it will contribute a “base set of data” to the Linked Data ecosystem.
CNET also plans to mix this data with external Linked Data sets. In the above Zune example, it may add company data from Thomson Reuters. It’s this cross-linking of “content verticals” (in this case, within tech news) that will lead to interesting things in the Linked Data world.
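As a rough illustration of what publishing and cross-linking in RDF can look like, the Python sketch below writes two statements in the N-Triples serialization: a label for a product, and a link from that product to a company record in an external data set. Every `example.org` URI, the `manufacturer` property, and the `to_ntriples` helper are assumptions made up for this example; they are not CNET’s or Thomson Reuters’ actual identifiers. Only the `rdfs:label` property is a real W3C term.

```python
# Hypothetical URIs on example.org; not real CNET or Thomson Reuters data.
PRODUCT = "http://example.org/cnet/product/zune"
COMPANY = "http://example.org/external/company/microsoft"
MANUFACTURER = "http://example.org/vocab/manufacturer"  # made-up property
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # real W3C term

# Each entry is (subject, predicate, object, object_is_uri).
triples = [
    (PRODUCT, RDFS_LABEL, "Zune", False),
    (PRODUCT, MANUFACTURER, COMPANY, True),
]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines.

    Simplified: literals are assumed to need no escaping.
    """
    lines = []
    for subject, predicate, obj, obj_is_uri in triples:
        obj_text = f"<{obj}>" if obj_is_uri else f'"{obj}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


ntriples = to_ntriples(triples)
print(ntriples)
```

The second statement is the cross-link: because its object is a URI in another publisher’s data set, a consumer can follow it to pull in company data alongside the product data.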
Disclosure: OpenCalais is a sponsor of ReadWriteWeb and we are also working on a similar project with them.