Last August we quoted outgoing Google CEO Eric Schmidt saying “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” RJMetrics co-founder and CEO Robert J. Moore disputes that claim. According to Moore, a more honest quote would have been “23 Exabytes of information was recorded and replicated in 2002. We now record and transfer that much information every 7 days.” A lot less impressive a figure, huh?
Moore writes that he used Schmidt’s figure in a talk about big data at TEDxPhilly. But when he went looking for a primary source, he discovered that the claim is unlikely to be accurate.
Moore believes that the claim that five exabytes of data is created every two days comes from a May 2010 IDC report titled “The Digital Universe Decade – Are You Ready?” The report predicted that 1.2 zettabytes of digital information would be created or replicated in 2010. That’s 1,228.8 exabytes, or about 6.7 exabytes every two days. So actually a bit more than Schmidt’s number.
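The conversion above is easy to check. Here is a minimal sketch, assuming 1,024 exabytes per zettabyte (the convention implied by the 1,228.8 figure) and a 365-day year; the function and variable names are ours, not IDC's or Moore's.

```python
# Assumption: binary convention, which matches the article's 1,228.8 figure.
EXABYTES_PER_ZETTABYTE = 1024
DAYS_PER_YEAR = 365


def exabytes_per_period(zettabytes_per_year, days):
    """Average exabytes produced over `days` days at a given yearly rate."""
    exabytes_per_year = zettabytes_per_year * EXABYTES_PER_ZETTABYTE
    return exabytes_per_year / DAYS_PER_YEAR * days


# IDC's 2010 prediction as reported above: 1.2 zettabytes created or replicated.
total_2010_eb = 1.2 * EXABYTES_PER_ZETTABYTE
two_day_eb = exabytes_per_period(1.2, 2)
seven_day_eb = exabytes_per_period(1.2, 7)

print(f"{total_2010_eb:.1f} EB per year")
print(f"{two_day_eb:.1f} EB every two days")
print(f"{seven_day_eb:.1f} EB every seven days")
```

At the same rate, a week works out to a little over 23 exabytes, which lines up with the seven-day figure in Moore's revised quote.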
Moore had a harder time tracking down a source for the claim that only five exabytes of data had been created from the dawn of time until 2003. But he did find a 2003 UC Berkeley study titled “How Much Information?” The study claims that “Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.”
That figure apparently includes non-digital information, and doesn’t include the rate of replication. But it does suggest that far more than five exabytes of information was produced before 2003.
One commenter on Moore’s blog suggests that if the creation of information has been exponential since the dawn of time, then Schmidt’s claim might not be so far off. But Moore counters:
You make a totally valid point. The only thing I don’t totally agree with is that the increase in information generation and transfer is exponential since the dawn of the written human record.
You said we have “every reason to believe that has been the case,” but the 2003 study estimated 30% annual growth in the number between 1999 and 2002. That puts those four years alone around 3x of the 2002 number he was using. I understand there is steep decay as you go back in time, but I don’t think its unrealistic to think that all recorded history might represent 5x 2002’s number or more, putting Schmidt’s statement 20x off the mark.
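Moore's "around 3x" estimate can be reproduced with a short calculation. This is a rough sketch, assuming output grew by exactly 30% each year, so each earlier year produced 1/1.3 of the following year's total; the function name is hypothetical.

```python
def cumulative_multiple(annual_growth, years):
    """Total output over `years` consecutive years ending in the reference
    year, expressed as a multiple of the reference year's output.

    Assumes a constant growth rate, so the year k years before the
    reference year produced 1 / (1 + annual_growth) ** k as much.
    """
    return sum((1 + annual_growth) ** -k for k in range(years))


# 1999 through 2002 at 30% annual growth, relative to 2002's output.
four_year_multiple = cumulative_multiple(0.30, 4)
print(f"{four_year_multiple:.2f}x the 2002 output")
```

Under that assumption the four years sum to roughly 2.8 times the 2002 figure, consistent with Moore's rounding to 3x.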
What exactly was Schmidt saying, and is Moore’s speculation as to Schmidt’s sources correct? We’ve previously cited Google VP Marissa Mayer’s claim that there was five exabytes of data online in 2002 and that number rose to 281 exabytes in 2009. Could it be that Schmidt actually meant five exabytes of data had been put online between the dawn of time and 2003, not that five exabytes were created between the dawn of time and 2003?
Looking at the studies Moore cites, it’s clear that no one really knows how much information is out there or how quickly it’s actually being produced, but everyone is certain it’s being produced increasingly quickly.
Photo credit: nasa1fan/MSFC