It’s a valid question: “Why has all the data the government has been collecting turned out to be too big to handle?” The results of a U.S. and state government IT survey released this week by the public sector IT community MeriTalk shine a bright, halogen spotlight on the answer: The data is being collected in an unfiltered format, and it’s waiting for someone – anyone – to claim it and write viable applications for it.
If you’ve followed along with RWW’s expanding coverage of technology trends in the public sector, you’ll recall that we take an occasional look at the MeriTalk survey, an ongoing dialogue with IT professionals in the U.S. federal government. In a report released Monday summarizing the views of 151 respondents, only 60% say they capture “big data” to any degree. All of the respondents in that segment said they then analyze the data, but when asked what that analysis involves, fewer than half of them (49%, or about 44 of the roughly 91 respondents who capture big data) said they extrapolate any meaningful trends from it. And some 28% of those collecting big data say they don’t collaborate with other agencies at all to discover what it means.
But that may not be the whole picture: 31% of the IT professionals polled from agencies affiliated with the Department of Defense or the intelligence community (about 23 in all) say they cannot even discuss the subject of big data. Among all agencies polled, approximately 52% say they are still learning what they can actually do with all this data. But when asked how long this learning phase should be expected to last, the average response was three years!
What will happen to government big data during that time? MeriTalk calculates that the agencies it polled currently store some 1.61 petabytes (quadrillion bytes) of data in total, both structured and unstructured. (MeriTalk estimates that 31% of the data collected by the agencies polled is actually unstructured.) In an astounding conclusion based on its respondents’ estimates, MeriTalk predicts that public-sector data stores are expanding at a rate of about 0.51 petabytes per year. So by the time these agencies have completed their assessment of what to do with all the data they have collected throughout their history, those data stores will have nearly doubled in size.
[Source: MeriTalk Government IT Network]
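For those keeping score, the “nearly doubled” projection falls straight out of the two figures above. Here’s a minimal sketch of the arithmetic (the Python is ours, purely for illustration; the numbers are MeriTalk’s):

```python
# MeriTalk's projection, using only the figures cited above:
# 1.61 PB stored today, growth of roughly 0.51 PB per year, and a
# three-year "learning" phase before agencies act on the data.
current_pb = 1.61
growth_pb_per_year = 0.51
learning_years = 3

projected_pb = current_pb + growth_pb_per_year * learning_years
print(f"Projected store: {projected_pb:.2f} PB")           # 3.14 PB
print(f"Growth factor: {projected_pb / current_pb:.2f}x")  # ~1.95x, i.e. nearly doubled
```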
Based on the rough conversion table provided by WhatsAByte.com, we calculated that if each individual agency polled were to store all the data it’s currently collecting on paper, the (very) old-fashioned way, it would need to requisition at least 20 four-drawer filing cabinets every minute of every day simply to contain it all. When asked to select their top three challenges in managing all this data from a long list, 40% of respondents cited storage capacity, 36% the ability to distribute and share the data, 35% the ability to query it in any form, and 34% having enough time to actually process the results.
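And that filing-cabinet pace compounds quickly. A quick back-of-envelope check – a minimal sketch using only the 20-cabinets-per-minute rate above and calendar arithmetic, not WhatsAByte.com’s underlying conversion factors – shows what a single agency would be signing for over a year:

```python
# What "at least 20 four-drawer filing cabinets every minute of every day"
# adds up to for one agency over a year. Only the rate quoted above is used;
# the bytes-to-cabinets conversion itself comes from WhatsAByte.com's table
# and is not restated here.
CABINETS_PER_MINUTE = 20
MINUTES_PER_YEAR = 60 * 24 * 365   # 525,600 minutes

cabinets_per_year = CABINETS_PER_MINUTE * MINUTES_PER_YEAR
print(f"{cabinets_per_year:,} filing cabinets per agency, per year")  # 10,512,000
```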
Of those who could accurately estimate their current big data storage capacities, some 57% said it was already too late: The infrastructure, including cloud capacity, simply isn’t in place for them to work with what they already have.
What appears to have happened is this: During the past few years, once it became feasible for government agencies to amass data from Internet-based sources rather than through direct collection or communication, they deployed the first generation of big-data tools immediately, with the idea that they’d figure out how to use the data later. The approach was like the plan for paying for the first wave of tax credits in 2001: We’ll do it now, and figure out how to pay for it when the time comes. The trick, of course, is knowing when that time comes – and now more than half of the federal IT workers speaking with MeriTalk say it has already passed.
Lead image courtesy of Shutterstock.