If pervasive networks, cloud computing and big data are going to deliver on their promise – to make increasingly vast quantities of information always available and understandable – something is going to have to be done about existing network architecture bottlenecks.
This quiet but uncomfortable truth permeates the discussion of technology use across many sectors. A report released Monday by the State Educational Technology Directors Association called “The Broadband Imperative” specifically identifies the need for more external broadband in today’s schools, citing “at least 100Mbps per 1,000 students/staff” as a 2014-15 school year target, and 1Gbps per 1,000 students/staff for the 2017-18 academic year.
The report calls for even higher internal networking specs: WAN connections between schools and the district office should be at least 1Gbps per 1,000 students/staff for 2014-15, and 10Gbps per 1,000 for 2017-18. Most schools, of course, don't come close to those speeds. Nearly 80% of survey respondents said their schools' broadband connections were inadequate to meet their current needs.
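The per-user arithmetic behind those targets is sobering. A quick back-of-the-envelope sketch (the totals are the report's stated goals; the worst-case assumption that all 1,000 users are active at once is ours) shows how thin the bandwidth spreads:

```python
# Back-of-the-envelope bandwidth-per-user math for the SETDA targets.
# Totals come from the report; the simultaneous-use assumption is a
# hypothetical worst case, not a figure from the report.

def per_user_mbps(total_mbps: float, users: int = 1000) -> float:
    """Bandwidth share per user if all users are online simultaneously."""
    return total_mbps / users

# 2014-15 external target: 100 Mbps per 1,000 students/staff
print(per_user_mbps(100))    # 0.1 Mbps per user
# 2017-18 external target: 1 Gbps per 1,000 students/staff
print(per_user_mbps(1000))   # 1.0 Mbps per user
```

Even the more ambitious 2017-18 target works out to roughly 1 Mbps per person under full load, which puts the report's urgency in perspective.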
Why do schools need so much bandwidth? Because of the growth in online curricula and multimedia content that the organization expects to be delivered to the classroom.
The bandwidth squeeze in schools is hardly unique; the enterprise feels it as well. As more organizations move their infrastructure to the cloud and replace traditional software with Software-as-a-Service (SaaS) solutions, the added demand is saturating networks and noticeably slowing the user experience.
But nowhere is the bottleneck of networks and wide-area network (WAN) connectivity felt as acutely as in the big data sector.
MapR’s Jack Norris sees this pain all the time. MapR, like Cloudera and Hortonworks, is a commercial vendor of Apache Hadoop, the open source framework for distributed storage and processing that so many associate with big data. MapR works with companies every day to improve their data processes.
“For many organizations, it takes longer to move the data across the network from the storage clusters to the compute servers than it does to perform the analysis,” Norris, MapR’s VP of marketing, said. “Particularly when dealing with large data such as clickstreams, sensor data, credit card transactions, ad impressions, genomic data, etc. It makes more sense to perform data and compute together and send the results over the network.”
Fortunately, Hadoop does not typically replace existing data infrastructure, like relational databases and data warehouses; rather, it augments those tools. That helps with the network bottleneck, because more data stays put and is processed in situ within Hadoop, reducing network traffic.
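Norris's data-locality argument can be made concrete with a toy calculation (every figure below is hypothetical, chosen only to illustrate the shape of the trade-off): compare shipping a raw dataset across the network before analyzing it against analyzing it where it lives and shipping only the summary.

```python
# Toy illustration of the data-locality trade-off. All numbers are
# hypothetical and chosen for round arithmetic, not drawn from MapR.

DATA_BYTES = 1 * 10**12       # 1 TB of raw clickstream data
RESULT_BYTES = 1 * 10**6      # 1 MB aggregated result
LINK_BPS = 10**9 / 8          # 1 Gbps link, expressed in bytes/second
ANALYSIS_SECONDS = 600        # assume the analysis itself takes 10 minutes

# Option A: move the raw data to the compute cluster, then analyze.
ship_raw = DATA_BYTES / LINK_BPS + ANALYSIS_SECONDS
# Option B: analyze on the storage cluster, ship only the result.
ship_results = ANALYSIS_SECONDS + RESULT_BYTES / LINK_BPS

print(f"ship raw data: {ship_raw:.0f} s")       # 8600 s
print(f"ship results:  {ship_results:.0f} s")   # 600 s
```

Under these assumptions the network transfer dwarfs the analysis itself, which is exactly the imbalance Norris describes, and why Hadoop co-locates computation with storage.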
Even Hadoop Can’t Save Us
But even with Hadoop doing more heavy lifting, big data infrastructures are starting to take network performance hits.
Cloud computing, SaaS and big data will all feel the weight of network capacity limits, brought on by much more data, aging networking technology and edge-of-network problems like bufferbloat. (For more on this issue, check out Jim Gettys’ blog.)
It’s a problem that engineers at networking companies like Cisco are working hard to address by rethinking how networks operate, using techniques such as traffic classification and traffic engineering.
Until a solution arrives, however, the potential benefits of cloud computing and big data solutions will be compromised by the reality of the networks used to access them.