The explosion of big data will cause, in turn, a demand for new skills and new occupations – data scientists, computer scientists, statisticians, and researchers alike. In order to help prepare a future generation of scientists, the Blue Waters Undergraduate Petascale Education Program is teaching students how to use petascale-class systems.
The program helps students learn about the architecture of the 10-petaflop IBM Blue Waters supercomputer that will be online at the University of Illinois at Urbana-Champaign in 2011. It also teaches them the programming languages they will need in order to use the supercomputer.
The program is a collaboration between the National Center for Supercomputing Applications (NCSA) and Shodor, a computational science education organization. The program gives students two weeks over the summer to work with experts in the field, as well as a year-long mentorship with professors at their home institutions. In doing so, students get to work on research projects that require this sort of massive computer simulation in fields like chemistry, biology, and astrophysics.
Big Data, Big Student Projects
Michael Laielli, an undergraduate at Stockton College in New Jersey, has been participating in the program. His project, “Blue Waters: Towards Petascale Simulations of Sediment Fate and Transport in Rivers,” models how sediment particles in rivers travel downstream. Because the research involves billions of particles, the computation is impossible to perform on a standard PC. With the help of the petascale program, Laielli has constructed his model to run on many computer cores simultaneously. He says that one of the goals has been to configure the model and the code to take advantage of parallel processing, and of the Blue Waters computing system in particular. What he’s learned from the program has helped him both define his research goals and design a strategy to get there.
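For readers curious what "running on many cores" looks like in practice, here is a minimal sketch of the general idea: split the particles into chunks, let each worker move its own chunk downstream, then stitch the results back together. This is not Laielli's model. It is written in Python rather than the C, MPI, OpenMP, or CUDA mentioned in this article, and the function names, the constant `velocity`, and the time step `dt` are all hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_chunk(positions, velocity, dt, steps):
    """Move every particle in one chunk downstream for a number of time steps.

    Hypothetical physics: each particle simply drifts at a constant velocity.
    """
    result = list(positions)
    for _ in range(steps):
        result = [x + velocity * dt for x in result]
    return result


def split(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def advance_parallel(positions, velocity, dt, steps, workers=4):
    """Advance all particles by handing one chunk to each worker."""
    chunks = split(positions, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the chunks in the order they were submitted,
        # so the particles keep their original ordering.
        moved = pool.map(lambda c: advance_chunk(c, velocity, dt, steps), chunks)
        return [x for chunk in moved for x in chunk]


if __name__ == "__main__":
    particles = [float(i) for i in range(10)]
    print(advance_parallel(particles, velocity=0.5, dt=0.1, steps=3))
```

Python threads will not actually speed up arithmetic like this; the sketch only shows how the work is divided. In this toy version the particles never interact, so the chunks never need to talk to one another, which is the easy case. Real simulations usually have to exchange data between workers, and that is where tools like MPI come in.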
Laielli says he’s learned a lot about working in Linux, C, and MPI, and has picked up new programming models, including OpenMP and CUDA. “Regardless of what language I’m programming in,” he says, “the general skills and concepts of parallel programming I’ve gained have been the greatest asset to our project. I will take with me and expand on them throughout my future work.”
The opportunity to work at the petascale is amazing, but it also brings challenges. When I asked Laielli what the biggest one was, he pointed to the “unforeseen problems that occur while working with multiple processors. We have these amazingly complex machines working for us, but it’s hard to imagine everything that’s going on, and it’s not like we can just open it up and see where the problem is. When we see something amiss in the data, we must brainstorm, research, and work pretty hard to seek out the source of our problem. But, that’s when and where the real learning begins.”
The “real learning” that the Blue Waters Undergraduate Petascale Education Program offers is meant to equip the next generation of scientists with the ability to handle massive amounts of data and massive amounts of processing power – a skill that is sure to be in demand with the explosion of big data.