Wikileaks Data Spurs App Development

While politicians, pundits, military officials, and journalists assess and debate the fallout from Wikileaks’ release of the “Afghan War Diary” – the legality and ethics of Wikileaks, its impact on the war effort, the rise of the “world’s first stateless news organization” – a number of developers are diving right into the roughly 91,000 classified documents to see what they can do with the data.

And it’s a substantial chunk of data. The documents, dated from 2004 to 2010, are available in HTML, CSV, and SQL formats, along with several KML files. But even in HTML, reading through the Afghan War Diary is no easy task. This is no Stephen Ambrose-style presentation of history. It’s raw data that can be queried by type, category, region, affiliation, date, and severity.

Analyzing the Wikileaks Data Dump

Der Spiegel, the Guardian, and the New York Times received the data a month before Wikileaks took it public, and their researchers and journalists have sifted through the information to present their “news” narratives. The Guardian also offers its readers some interactive online tools to help them understand the documents. But now that the information is publicly accessible, research and analysis of the data can be distributed. On his blog Zero Intelligence Agents, for example, NYU Politics Department grad student Drew Conway has begun a statistical analysis of the data. His scripts join other, similar projects being built and shared by developers.

Building the Wikileaks CouchApp

One such project is the Wikileaks CouchApp, created by CouchDB community member Benoit Chesneau. The app was built using a number of open source tools, including CouchDB 1.0, GeoCouch, jQuery, Simile Timeline, and OpenLayers, and is integrated with Google Maps. These tools allow the Wikileaks documents, imported into CouchDB from the CSV file, to be categorized and queried by geospatial and temporal data. Scrolling through the Wikileaks CouchApp’s timeline lets you browse the reports by date, with each report plotted on a map below. Clicking on a map point displays a popup, where you can read some information about the report or click through to read it in its entirety.
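As a rough illustration of the import step, the sketch below turns CSV rows into JSON documents shaped for CouchDB’s `_bulk_docs` endpoint, which accepts a body of the form `{"docs": [...]}`. It is not Chesneau’s actual import code. The column names (`report_key`, `latitude`, `longitude`, and so on) and the sample row are placeholders invented for this example, and storing the location as a GeoJSON-style point is an assumption about what a spatial index would want to read.

```python
import csv
import io
import json


def rows_to_bulk_docs(csv_text):
    """Convert CSV text into a payload shaped for CouchDB's _bulk_docs endpoint.

    All column names used here are hypothetical; the real export may differ.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        docs.append({
            "_id": row["report_key"],       # reuse the report key as the document id
            "date": row["date"],            # kept as a string so it sorts by time
            "category": row["category"],
            "region": row["region"],
            # GeoJSON-style point; note the [longitude, latitude] order
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
        })
    return {"docs": docs}


# Placeholder row, not taken from the real dataset
sample = (
    "report_key,date,category,region,latitude,longitude\n"
    "abc-1,2004-01-01,Example,EXAMPLE REGION,34.5,69.2\n"
)
payload = rows_to_bulk_docs(sample)
print(json.dumps(payload, indent=2))
```

Posting `payload` to a database’s `_bulk_docs` URL would load all of the documents in one request; that step is left out here so the example runs without a server.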

Why CouchDB?

CouchDB is a post-relational document database. Where relational databases enforce strict schemas, CouchDB is more flexible, storing data in a semi-structured fashion and using a JavaScript-based view model to generate report results. This flexibility allows users to make queries on demand, rather than being, in the words of CouchDB creator Damien Katz, restricted to “however somebody else cooked the database up.” You can do more with the data in CouchDB, argues Katz, because you can write your own queries, including full-text search queries.
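To make the view model concrete, here is a minimal sketch in Python. A CouchDB view lives in a design document as a JavaScript map function plus an optional reduce; `_count` is one of CouchDB’s built-in reduces. The design document name `_design/reports`, the view name `by_category`, and the use of a `category` field are assumptions made for this example. The `run_view` helper is only a local imitation of what such a view computes, not CouchDB’s implementation.

```python
from collections import Counter


def build_design_doc():
    """Return a design document holding one map/reduce view (names are hypothetical)."""
    # The map function is JavaScript source stored as a string inside the JSON document
    map_fn = "function (doc) { if (doc.category) { emit(doc.category, 1); } }"
    return {
        "_id": "_design/reports",
        "views": {
            "by_category": {"map": map_fn, "reduce": "_count"},
        },
    }


def map_report(doc):
    """Python stand-in for the JavaScript map function above."""
    if doc.get("category"):
        yield (doc["category"], 1)


def run_view(docs):
    """Imitate a grouped count: apply the map to each doc, then tally per key."""
    counts = Counter()
    for doc in docs:
        for key, value in map_report(doc):
            counts[key] += value
    return dict(counts)


# Placeholder documents; documents without the field are simply skipped
docs = [{"category": "A"}, {"category": "B"}, {"category": "A"}, {}]
print(run_view(docs))
```

Because the map function is just data in a document, anyone holding a copy of the database can add a new view without altering the stored reports, which is the “queries on demand” flexibility described above.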

But it’s not just the flexibility of CouchDB that makes it an interesting choice for a Wikileaks database. CouchDB is a peer-based distributed database system. In other words, any number of CouchDB hosts – both servers and offline clients – can have independent replica copies of the same database. These copies can be fully interactive with the ability to query, add, edit, and delete, and changes to the database can be replicated across the mirror copies in near real-time.
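Replication is started by posting a small JSON body to a CouchDB server’s `_replicate` endpoint, naming a source and a target database. The sketch below only builds that request, so it can run without a server. The host names and database names are placeholders, and the choice of continuous replication is an assumption about how a mirror might be kept current.

```python
import json
import urllib.request


def build_replication_request(server, source, target, continuous=True):
    """Build (but do not send) a POST to a CouchDB server's _replicate endpoint."""
    body = json.dumps({
        "source": source,          # database to copy from (local name or full URL)
        "target": target,          # database to copy into
        "continuous": continuous,  # keep syncing as new changes arrive
    }).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database names, for illustration only
req = build_replication_request(
    "http://localhost:5984",
    "http://source.example.invalid:5984/reports",
    "reports_mirror",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req)
```

Each mirror made this way is a full, queryable database in its own right, which is why a replicated copy can keep serving the data even if the original host goes away.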

For businesses using CouchDB, the ability to reliably synchronize databases between multiple machines provides redundancy and aids load balancing. And in the case of the Afghan War Diary CouchDB app, it means these mirrored copies make it impossible to shut Wikileaks down. Currently the app is hosted on the CouchDB server, and while copies have been replicated, neither Katz nor Chesneau knows of any other publicly available copies.

Katz calls CouchDB the “information dissemination platform of the future.” Touting its scalability and flexibility, as well as its rigorous security features, Katz thinks the entire Wikileaks site, not just this app, should move to CouchDB. As the US military demands that Wikileaks “return all documents” and some call for the organization’s Swedish ISP to shut the site down, who knows what sorts of technical steps Wikileaks will take.

Tech Tools for a Data-Driven Future

As with any large dataset, the Wikileaks documents provide rich raw material for building analytical and visualization tools. But the Wikileaks data – its content, the means by which it was secured and disseminated – remains highly controversial. Noting the risks involved with “possessing” the Wikileaks documents, PhD candidate Drew Conway still chose to move forward with his analysis of the data, arguing that “with the proper analytical tools, this data may reveal insights to the predicates of conflict in ways that previous aggregate-level data could not.”

That desire to analyze, visualize, and disseminate information seems to be the motive behind several of the new Wikileaks tools, including the CouchApp. But that desire – as well as the need – should encourage the development of new tech tools, crucial if we are to make sense of “the coming data explosion.”
