CERN today officially unveiled the massive computer network that will crunch the enormous amount of data coming from its Large Hadron Collider (LHC). CERN expects the LHC to produce around 15 petabytes of data every year. While the LHC was in its planning stages, CERN’s IT department decided that the only realistic way to handle this much data was to rely on the then still novel idea of grid computing. CERN’s grid consists of 100,000 processors at 140 scientific institutions in 33 countries.
How to Crunch 15 Petabytes of Data?
As Science reported last month (subscription required), CERN’s IT department quickly realized that no known data center could handle the amount of information the LHC would create. It was not even clear that Geneva’s power grid could supply the energy needed to run such a massive data center. In addition, most of the money for the LHC project was going toward the collider itself, so very little funding was left for the actual computing resources.
To distribute this data, CERN relies on dedicated 10Gbit/s fiber-optic lines that connect it with the 11 Tier-1 data centers on the grid. The Tier-1 data centers will do some processing, but will also function as the main archives for the LHC data. These Tier-1 centers then farm out a large part of the actual data crunching to the Tier-2 data centers spread around the world. The Tier-2 centers are connected to the grid via regular, public Internet connections.
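To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python that converts 15 petabytes per year into an average data rate and compares it with a single 10Gbit/s line. It assumes decimal petabytes (10^15 bytes), a 365-day year, and data spread evenly over the year; none of these assumptions come from CERN, and real traffic is likely to be bursty and replicated to several sites, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope estimate; the constants below are assumptions, not CERN figures.
PETABYTE_BYTES = 10**15                # assumption: decimal petabyte
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year
LINK_GBIT_PER_S = 10                   # capacity of one dedicated line (from the article)


def average_rate_gbit_per_s(petabytes_per_year: float) -> float:
    """Average rate in Gbit/s if the data were spread evenly over a year."""
    bits_per_year = petabytes_per_year * PETABYTE_BYTES * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


rate = average_rate_gbit_per_s(15)
link_fraction = rate / LINK_GBIT_PER_S

print(f"Average rate: {rate:.2f} Gbit/s")
print(f"Share of one 10Gbit/s line: {link_fraction:.0%}")
```

Under these assumptions, the yearly average works out to a bit under 4 Gbit/s, well below the capacity of a single dedicated line, which suggests the links leave headroom for peaks and for sending copies of the data to more than one Tier-1 center.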
Large Hadron Collider @ Home
While grid computing has been around for quite a while now and has been implemented successfully on the public Internet by projects like SETI@home or Folding@home, CERN’s grid is most likely the largest and most powerful grid established for scientific research so far.
CERN has also set up a project similar to Folding@home called (somewhat unimaginatively) LHC@home. Thanks to the current shutdown of the LHC, it does not have much to do right now, but it will allow individuals to contribute to CERN’s efforts by donating computing time on their own computers.
Image of CERN Computer Center used courtesy of CERN.