Tags as Far as the Eye Can See: New York Times to Publish Index as Linked Data

Today, at the Semantic Technology Conference, Rob Larson and Evan Sandhaus of the New York Times announced together that the Times will soon be publishing its copious index as Linked Data.

The Times’ data will join content from Project Gutenberg, a vast online library of text from public domain books, data from the U.S. census, and information from many other formative and vital entities in the semantic web space. Larson and his team intend to make available hundreds of thousands of tags for content dating back to 1851. This will give developers an invaluable, automatically navigable roadmap for the publication’s vast directory of knowledge and will link that data to existing pages, people, and content around the web.

In his keynote address, Larson emphasized “How deeply we [at the Times] care about metadata.”

“It’s been fundamental to what we do for a long time. We feel we’re good at it, but our content is an island… we want to announce our intention to publish our thesaurus to the community under a license that will allow you to use it and contribute your improvements… The results of this effort will in time take the shape of the Times entering this Linked Data cloud. This is wholly consistent with our open strategy… to facilitate access to slices of our data for those who want to include it in their applications.”

Larson likened the Times corpus to a quarry of data. He said that the newspaper’s API provided the picks and shovels to mine data, and the Linked Data initiative would be the map.

The timing, licensing, format, and other factors of the project are yet to be determined.

This announcement comes on the heels of CNET’s partnership with Reuters to publish data to the Linked Data cloud. Moreover, exactly one month ago, we wrote that Linked Data was a concept “whose time has come” and gave a thorough overview of the concepts and standards it entails, for curious readers who would like to drill deeper on the subject.

In another recent interview, Sandhaus detailed the tagging process for the Times’ corpus, both for print and online articles:

“There are two types of tagging that go on at the Times… Every day, indexers take the paper and go article by article and associate each article with subject keywords. Then they manually summarize it. It’s like a Google list, but in dead tree form.

Another type of tagging we do is… when an article goes from the newsroom to the web, it’s put there by a producer who will augment the article with any number of rich features like images, multimedia… and subject keywords. Unlike the indexers, who do this completely by hand, the producers are assisted in their tagging by an automated classification system which suggests tags to be applied to the data and which are ultimately approved by the producer.”
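The suggest-then-approve workflow Sandhaus describes for web producers can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Times' actual system: the `Article` class, the `KEYWORD_TAGS` table, the tag names, and both functions are hypothetical, and a simple keyword match stands in for whatever automated classifier the Times uses.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    headline: str
    body: str
    tags: list[str] = field(default_factory=list)


# Hypothetical keyword-to-tag table standing in for a real classifier.
KEYWORD_TAGS = {
    "senate": "United States Politics and Government",
    "census": "Census",
    "library": "Libraries and Librarians",
}


def suggest_tags(article: Article) -> list[str]:
    """Return candidate tags whose keyword appears in the headline or body."""
    text = f"{article.headline} {article.body}".lower()
    return [tag for keyword, tag in KEYWORD_TAGS.items() if keyword in text]


def apply_approved(article: Article, suggestions: list[str], approved: set[str]) -> Article:
    """Attach only the suggestions the producer approved, keeping suggestion order."""
    article.tags = [tag for tag in suggestions if tag in approved]
    return article
```

In this sketch the machine only proposes tags; nothing is attached to the article until a person passes an approved set to `apply_approved`, which mirrors the division of labor described in the quote.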

An official announcement is expected at the Times’ Open blog tomorrow, with details on the project to follow.
