News volume has moved from infoscarcity to infobesity. For the last hundred years, news in print was delivered in a container, called a newspaper, periodically, typically every twenty-four hours. The container constrained the product. The biggest constraints of the old paradigm were periodic delivery and limitations of column inches.

Now information continually bursts through our Google Readers, our cell phones, our tablets, display screens in elevators and grocery stores. Do we really need to read all 88,731 articles on the Bernie Madoff trial? Probably not. And that's the dilemma for news organizations.

In the old metaphor, column inches were the constraint. In the new metaphor, reader attention span becomes the constraint.

Taming the Beast

Chris Lamb is a business strategy executive in financial media, financial technology, and web services. Previously he was at Thomson Reuters. He can be reached via email at clambresearch@gmail.com.

When reader attention span becomes the constraint, relevancy becomes the coin of the realm. Applications surfacing relevant content and filtering flotsam drive competitive advantage.

The dilemma is that relevance is in the mind of the beholder. Emerging applications, news readers such as Flipboard, Pulse and Feedly, are struggling to deliver relevancy. One proxy is to use a reader's social graph to curate stories. Another is user profiles and preferences.

These current approaches are doomed. With respect to social graph curation, people play different roles at different times. On the weekend, a reader might be interested in arts, entertainment and sports news based on recommendations from friends and family. During the week, this same person may be interested in business news based on recommendations from trading partners in the capital markets. How do readers seamlessly reconcile this?

I am not able to predict when viable applications will emerge, but I do believe the industry will struggle for several years. The fate of these first generation news readers may be similar to first generation social networks. Remember Friendster and Orkut?

However, there is some clarity on key underlying technologies that will provide the scaffolding for next generation news consumption. Here are some of them.

Tagging and Semantic Extraction

Tagging, or semantic extraction, engines process news articles and return structured metadata that provides insight into the underlying text. A trivial application that uses tags is the tag cloud, where users click tags displayed on a web page to uncover underlying content. Tag clouds miss the point.

The power of metadata is that it is a machine readable asset. Machine readable assets form the basis by which applications can navigate online text. Now it is possible to build applications to bind, differentiate, collate and curate content, resulting in huge automation wins, for both newsrooms and news consumers.
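To make the idea of a machine readable asset concrete, here is a minimal sketch in Python. It assumes a tagging engine has already returned a set of tags for each article; the `Article` class, the tag format, and the sample headlines are all invented for illustration and do not reflect any particular engine's output.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    headline: str
    # Tags as a hypothetical tagging engine might return them ("type:value").
    tags: set = field(default_factory=set)


def articles_about(articles, required_tags):
    """Return headlines of articles whose metadata includes every required tag."""
    required = set(required_tags)
    return [a.headline for a in articles if required <= a.tags]


# Made-up sample data, not real headlines.
ARTICLES = [
    Article("Madoff trial opens", {"person:Bernie Madoff", "topic:fraud", "topic:courts"}),
    Article("Rangers win opener", {"org:Texas Rangers", "topic:baseball"}),
    Article("Regulators revisit fraud rules", {"topic:fraud", "topic:regulation"}),
]

fraud_headlines = articles_about(ARTICLES, ["topic:fraud"])
court_fraud_headlines = articles_about(ARTICLES, ["topic:fraud", "topic:courts"])
```

Because the tags are structured data rather than prose, collating or filtering content becomes a set operation instead of a reading task.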

Disambiguation

Disambiguation is a technique to uniquely identify named entities: people, cities, and subjects. Disambiguation can identify that one article is about George Herbert Walker Bush, the 41st President of the US, and another article is about George Walker Bush, number 43. Similarly, the technology can distinguish between Lincoln Continental, the car, and Lincoln, Nebraska, the city. As part of the metadata, many tagging engines that disambiguate return unique identifiers called Uniform Resource Identifiers (URIs). A URI is a pointer into a database.

If tagging creates machine readable assets, disambiguation is the connective tissue between these assets. Leveraging tagging and disambiguation technologies, applications can now connect content with very disparate origins. Today's article on George W. Bush can be automatically linked to an article he wrote when he owned the Texas Rangers baseball team. Similarly, the online bio of Bill Gates can be automatically tied to the online record of his New Mexico arrest in April 1975.
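A toy version of disambiguation can be sketched as follows. This is not how production engines work; it simply scores each candidate entity by how many context clue words appear in the article text. The `example.org` URIs, the clue words, and the sample sentences are all assumptions made for illustration.

```python
import re

# Hypothetical entity catalogue: URI -> (surface names, context clue words).
ENTITIES = {
    "http://example.org/id/george-h-w-bush": ({"george bush"}, {"41st", "1989", "kuwait"}),
    "http://example.org/id/george-w-bush": ({"george bush"}, {"43rd", "rangers", "texas"}),
    "http://example.org/id/lincoln-continental": ({"lincoln"}, {"continental", "sedan", "car"}),
    "http://example.org/id/lincoln-nebraska": ({"lincoln"}, {"nebraska", "city", "council"}),
}


def disambiguate(mention, text):
    """Return the URI whose clue words best match the text, or None if no clue matches."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    best_uri, best_score = None, 0
    for uri, (names, clues) in ENTITIES.items():
        if mention.lower() not in names:
            continue  # this entity is never referred to by that surface name
        score = len(clues & words)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


# Two invented sentences that use the same ambiguous name.
today = disambiguate("George Bush", "George Bush, the 43rd president, spoke in Dallas today.")
archive = disambiguate("George Bush", "George Bush buys a stake in the Texas Rangers.")

# Articles that resolve to the same URI can be linked automatically.
linked = today is not None and today == archive
```

Once both mentions resolve to the same identifier, linking the two articles is a simple equality check, which is the sense in which disambiguation acts as connective tissue.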

Linked Data Structures

Typically, a URI entry holds some content related to the entity. URIs are linked to form a database called a Linked Data Structure. For instance, a URI on Barack Obama may contain a reference to his current position, former jobs, marital status, spouse's name, children, education and schools.

The URI on Michelle Obama may similarly contain information on her spouse, children, college, graduate school, etc. The URI on Michelle Obama's law school will contain information about the school, such as current and past deans.

With the ability to automatically extract key entities from text, create machine readable assets, disambiguate them, and query a linked data structure, it is now possible to build very powerful applications.

For instance, one could build an application to retrieve articles on the Supreme Court and determine if the article mentions any justices who previously headed a law school attended by the wife of any U.S. President. This application would identify all articles mentioning Justice Elena Kagan, previously Dean of Harvard Law School, from which Michelle Obama graduated.
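The Supreme Court example can be sketched as a walk over a small linked data structure. The snippet below stores facts as subject, predicate, object triples in plain Python; the `ex:` identifiers, the predicate names, and the article headlines are hypothetical, and a real system would more likely query an RDF store than a Python set.

```python
# A tiny linked data structure as (subject, predicate, object) triples.
# Identifiers and predicate names are illustrative, not a real vocabulary.
TRIPLES = {
    ("ex:Barack_Obama", "heldOffice", "ex:US_President"),
    ("ex:Barack_Obama", "spouse", "ex:Michelle_Obama"),
    ("ex:Michelle_Obama", "lawSchool", "ex:Harvard_Law_School"),
    ("ex:Harvard_Law_School", "formerDean", "ex:Elena_Kagan"),
    ("ex:Elena_Kagan", "heldOffice", "ex:Supreme_Court_Justice"),
    ("ex:John_Roberts", "heldOffice", "ex:Supreme_Court_Justice"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known fact."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def justices_tied_to_presidential_spouses():
    """Justices who were dean of a law school attended by a president's spouse."""
    justices = subjects("heldOffice", "ex:Supreme_Court_Justice")
    found = set()
    for president in subjects("heldOffice", "ex:US_President"):
        for spouse in objects(president, "spouse"):
            for school in objects(spouse, "lawSchool"):
                found |= objects(school, "formerDean") & justices
    return found


# Invented articles, each already tagged with disambiguated entity identifiers.
ARTICLES = {
    "Court hears patent case": {"ex:John_Roberts"},
    "Kagan writes for the majority": {"ex:Elena_Kagan", "ex:John_Roberts"},
}


def matching_articles():
    """Headlines of articles that mention at least one justice found by the query."""
    targets = justices_tied_to_presidential_spouses()
    return sorted(h for h, entities in ARTICLES.items() if entities & targets)
```

Each hop in the nested loops follows one link between URIs, so the question in the text reduces to a four-step traversal followed by a lookup against the tagged articles.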

That example, per se, may be nonsensical, but the power of the technology is immense. For instance, are there any CEOs of government contractors whose spouses happen to sit on philanthropic boards along with lawmakers on the House Ways and Means Committee?

Linked Data Cloud

The metaphor becomes even more powerful with the federation of Linked Data Structures into the Linked Data Cloud. This level of abstraction allows different owners (such as DBpedia, IMDb, the New York Times) to link their data to create a powerful ecosystem. The power here is the power of the network effect.
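Federation can be illustrated with a small sketch in which two independently owned datasets describe the same entity under different identifiers, and an equivalence link joins them. The dataset names, identifiers, and properties below are invented; in real Linked Data the equivalence is typically expressed with an `owl:sameAs` statement rather than a Python list.

```python
# Two hypothetical publishers, each with its own identifier for the same person.
DATASET_A = {"a:Elena_Kagan": {"occupation": "Supreme Court Justice"}}
DATASET_B = {"b:person/1234": {"topic_page": "Elena Kagan coverage"}}

# Equivalence links between identifiers (a stand-in for owl:sameAs statements).
SAME_AS = [("a:Elena_Kagan", "b:person/1234")]


def equivalent_uris(uri):
    """Collect every identifier reachable from uri by following equivalence links."""
    known = {uri}
    changed = True
    while changed:
        changed = False
        for left, right in SAME_AS:
            if left in known and right not in known:
                known.add(right)
                changed = True
            elif right in known and left not in known:
                known.add(left)
                changed = True
    return known


def federated_view(uri):
    """Merge what every dataset says about the entity, whichever identifier it uses."""
    merged = {}
    for dataset in (DATASET_A, DATASET_B):
        for candidate in equivalent_uris(uri):
            merged.update(dataset.get(candidate, {}))
    return merged


profile = federated_view("a:Elena_Kagan")
```

Every additional publisher that links its identifiers in this way enriches the merged view for all the others, which is where the network effect comes from.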

Utilizing these underlying technologies, emerging applications will drive a completely new news-reading metaphor.

Conclusion

Semantic infrastructure technologies will propel next generation news consumption. The fluidity and deluge of online news overwhelm us, but smart readers will tame this flow and enable new consumption models and insights.

Newspaper box photo by George Kelly