Last week the Web’s three leading search companies – Google, Microsoft and Yahoo! – announced a new structured data collaboration called Schema.org. It includes more than 100 new types of website markup for content like movies, music, organizations, TV shows, products, places and more. The stated aim of Schema.org is to “improve the display of search results, making it easier for people to find the right web pages.”
However, is this collaboration routing around existing web standards, as promoted by the Web’s main standards body, the World Wide Web Consortium (W3C)? Since the news was announced, we’ve discovered that the W3C was not consulted about Schema.org. And given that Google dominates the search market, should we be worried that Google will control a substantial part of the markup used on webpages if – as expected – Schema.org gets significant take-up? Here’s why the alarm bell should be rung…
Firstly, for big-picture context, this situation is somewhat reminiscent of the Microsoft land grab in the dot-com era of the Web. Remember when Microsoft controlled the browser market and was able to dictate how webpages were marked up? Webmasters and developers were forced to use markup that catered to Microsoft’s Internet Explorer browser. Schema.org may well be leading us down the same path, with webmasters and developers having to use Schema.org markup in order to get their webpages ranked highly in the major search engines.
Specifically, here are the two main issues with Schema.org that lead us to suspect this is a land grab:
1) The three companies – Google, Microsoft and Yahoo! – write the schemas and host them centrally. These schemas sometimes compete directly with existing open standards – such as the e-commerce markup standard GoodRelations, which has been receiving solid take-up from the likes of Best Buy. Update: Martin Hepp, creator and lead developer of GoodRelations, replied in the comments that “Google and Yahoo have confirmed that they will continue to support GoodRelations in RDFa for product and offer information.”
2) Whereas open standards like GoodRelations use RDFa (a way of embedding RDF, the data model behind the W3C-sponsored Semantic Web, in HTML attributes), the Schema.org markup will use Microdata – a spec written by Ian Hickson, a Google employee, as part of the WHATWG’s HTML work.
RDFa Adoption Will Suffer
Schema.org will certainly lead to a decrease in RDFa usage, which ultimately hurts the W3C’s long-running push towards the Semantic Web – that is, a Web with added meaning and structure.
Over the past year, RDFa received significant take-up from large companies like Facebook and Best Buy. It’s particularly notable that Facebook used RDFa in its Open Graph protocol. Facebook is Google’s main competitor in the social Web, so Schema.org could also be viewed as a competitive move by Google against Facebook.
Simply put, the argument here is that Schema.org is a strong push by Google (and less so Microsoft and Yahoo!) to be in centralized control of key aspects of Web markup – at the expense of W3C open standards. As Web data becomes more and more structured, we have to question any moves by a large, influential company that may put it in a position of control over that data.
Indeed, last year we raised the same questions about Facebook’s Open Graph. Although Facebook used RDFa, it used its own custom version of it. Despite this, both Facebook and the W3C argued that the Open Graph would actually help the adoption of RDFa.
Why Did Schema.org Choose Microdata Over RDFa?
ReadWriteWeb has learned of rumors that Yahoo! wanted RDFa to be a core component of Schema.org, but that Google and Microsoft insisted on Microdata. Why is that?
Microdata is the markup specification on which Schema.org is based; its editor, Ian Hickson, is a Google employee. It’s similar to RDFa, in that it adds attributes to HTML elements in order to give Web markup more semantics and structure.
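To make the similarity concrete, here is a minimal sketch in Python that reads the same two facts out of a Microdata fragment and an RDFa fragment. Everything in it is illustrative: the movie fragment, the `PropertyExtractor` class and the `extract` helper are our own hypothetical names, the RDFa sample assumes the RDFa 1.1-style `vocab`, `typeof` and `property` attributes, and the parser is a flat toy that ignores nested items. It is not how any search engine actually processes this markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with Microdata and a Schema.org type.
MICRODATA_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
</div>
"""

# The same facts expressed with RDFa 1.1-style attributes (an assumption
# for illustration; real RDFa pages may use prefixes instead of vocab).
RDFA_HTML = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <span property="genre">Science fiction</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Toy extractor: records one item type and a flat map of properties."""

    def __init__(self, type_attr, prop_attr):
        super().__init__()
        self.type_attr = type_attr   # "itemtype" (Microdata) or "typeof" (RDFa)
        self.prop_attr = prop_attr   # "itemprop" (Microdata) or "property" (RDFa)
        self.item_type = None
        self.properties = {}
        self._current = None         # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.type_attr in attrs:
            self.item_type = attrs[self.type_attr]
        if self.prop_attr in attrs:
            self._current = attrs[self.prop_attr]
            self.properties[self._current] = ""

    def handle_data(self, data):
        # Only collect text that sits inside an element carrying a property.
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Flat model: any closing tag ends the current property.
        self._current = None


def extract(html, type_attr, prop_attr):
    """Return (item type, {property: text}) for a simple, un-nested fragment."""
    parser = PropertyExtractor(type_attr, prop_attr)
    parser.feed(html)
    parser.close()
    props = {name: text.strip() for name, text in parser.properties.items()}
    return parser.item_type, props


microdata_type, microdata_props = extract(MICRODATA_HTML, "itemtype", "itemprop")
rdfa_type, rdfa_props = extract(RDFA_HTML, "typeof", "property")
```

Both calls yield the same property map; only the attribute names differ, and the RDFa sample splits the type into a vocabulary URL plus a short type name. For simple cases like this one, the choice between the two formats is more about who governs the syntax than about what the markup can express.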
Google explained the Schema.org decision to use Microdata over RDFa on a Google Webmaster Central help page:
“Historically, we’ve supported three different standards for structured data markup: microdata, microformats, and RDFa. Instead of having webmasters decide between competing formats, we’ve decided to focus on just one format for schema.org. In addition, a single format will improve consistency across search engines relying on the data. There are arguments to be made for preferring any of the existing standards, but we’ve found that microdata strikes a balance between the extensibility of RDFa and the simplicity of microformats, so this is the format that we’ve gone with.”
That explanation makes logical and business sense, but even so we have to ask why Google, Microsoft and Yahoo! chose to route around the W3C-supported standard of RDFa.
There are some politics at play here, because Microdata originated in a non-W3C working group called the Web Hypertext Application Technology Working Group (WHATWG), which was formed in 2004 in response to the perceived slow development of web standards at the W3C.
Is This a Land Grab by Google? You Tell Us…
Regardless of the politics, there is a real danger that Google in particular will come to control a significant part of Web markup through Schema.org.
While it is a positive sign that the major search companies are pushing for more structured data, the big question is about control. Why isn’t Schema.org using RDFa, the W3C open standard, as the base for its schemas? Does Google now have too much influence over the future of structured data? We’d love to hear your thoughts about these important issues regarding the future of the Web.