NYTimes Exposes 2.8 Million Articles in New API

What do you do when your industry is shifting under your feet? Taking the lead with radical steps is one strategy. The New York Times did just that this afternoon when it announced the release of a new Application Programming Interface (API) that offers every article the paper has published since 1981: 2.8 million articles in all. The API includes 28 searchable fields, and its content is updated every hour.

This is a big deal. A strong press organ with open data is to the rest of the web what basic newspaper delivery once was to otherwise remote communities. It's a transformational moment, a move toward interconnectedness and away from isolation. A quality API could throw the doors wide open to a future where "newspapers" are important again.

What does that mean? It means that sites around the web will be able to add dynamic links to New York Times articles, or excerpts from those articles, to their own pages. The ability to enrich other sites' content with high-quality supplementary material from the Times is a powerful prospect.
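To make that concrete, here is a minimal sketch of the "dynamic links" idea: take a search result and turn it into a short list of headline links with excerpts that another site could drop into its pages. The Times' actual response format isn't described here, so the JSON shape and the field names "results", "title", "url" and "excerpt" are hypothetical placeholders, as is the sample data; no real request to the API is made.

```python
import html
import json

# Hypothetical response body. The field names ("results", "title", "url",
# "excerpt") are assumptions for illustration, not the Times' documented schema.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {
            "title": "Example headline",
            "url": "https://www.nytimes.com/example",
            "excerpt": "First lines of the article.",
        }
    ]
})


def render_related_links(response_text: str, max_items: int = 5) -> str:
    """Turn a (hypothetical) search-API JSON response into an HTML list of links."""
    data = json.loads(response_text)
    items = []
    for article in data.get("results", [])[:max_items]:
        # Escape everything so third-party text can't inject markup into our page.
        title = html.escape(article.get("title", "Untitled"))
        url = html.escape(article.get("url", ""), quote=True)
        excerpt = html.escape(article.get("excerpt", ""))
        items.append(f'<li><a href="{url}">{title}</a> {excerpt}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_related_links(SAMPLE_RESPONSE))
```

Escaping the title, URL and excerpt is the one design choice worth keeping regardless of the real schema: content pulled from someone else's API should never be inserted into a page as raw HTML.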

The Times has opened a wide variety of APIs over the last year; the company is making "the newspaper as platform" (as journalist Mathew Ingram put it today) a major part of its bid for the future. We discussed the significance of this strategy when the Times opened its first API in October. As we wrote then,

Reporting is no longer a scarce commodity. It’s hard for these huge news organizations to do it faster, cheaper or even as well as a whole web of new media producers around the world. They may be among the top sources for original content still today, but considering the direction technology is moving in – that’s not a safe bet for the future.

One thing that big media still does have a particularly good share of, though, is information processing resources and archival content. The Times' campaign contribution API is a good example of this. The newspaper is far better prepared to organize that raw information, and perhaps offer complementary content, than any individual blogger or small news publisher.

We're excited to see how this API gets put to use, and we look forward to watching it develop further.

What could come next? We’d love to see some semantic parsing of all this content. As semantic web aficionado Tom Morris wrote today, “[These] Could be signs of something very good – imagine if the New York Times were to join the web of Linked Data, pointing from articles out to all sorts of distributed resources. The amount of information stored up inside an institution like the New York Times would be really interesting if it were linked together with other data on the Web. A search API isn’t tremendously interesting, but it is interesting to see someone like the NYT do this, rather than just Web 2.0 sites and hosts of user-contributed material publishing this kind of data.”

Or, as Tim Berners-Lee reportedly told attendees of the TED conference today, the time has come to stop "database hugging": don't just make your own website. Especially when it comes to government data, we should all demand raw data now.

Full raw data, semantically marked-up data, linked data: there are a number of options. This is an informational currency that could mean as much to the world of the future as delivery of the printed paper once meant to otherwise isolated communities. We hope this effort succeeds and becomes a model for other companies to follow.

Disclosure: The NYTimes is a syndication partner of ReadWriteWeb.