The question among cloud computing consumers and supercomputer clients alike has been when the distinction between the little cloud and the big iron would disappear. Apparently that boundary evaporated several months ago. In its twice-annual survey of big computer power, the University of Mannheim has reported that Amazon’s EC2 Cluster Compute instances – the same ones you and I can rent space and time on today – perform well enough to be ranked #42 among the world’s Top 500 supercomputers.
How far down is #42? In terms of time, not far at all. When EC2 was but a gleam in Jeff Bezos’ eye, Lawrence Livermore National Laboratory’s BlueGene/L was king. Now, the 212,992-core beast ranks #22. Roadrunner, the amazing hybrid whose 122,400 cores are split between IBM PowerXCell and AMD Opteron processors, sits at #10. Meanwhile, EC2 – whose makeup is a little of this and a little of that – has achieved #42 status with only 17,024 cores.
Twice each year, the rankings of 500 of the world’s supercomputers are assessed by the University of Mannheim in association with Lawrence Berkeley National Laboratory and the University of Tennessee, Knoxville. Those assessments use the industry-standard Linpack benchmark, which measures floating-point performance. Supercomputers’ scores are sorted by each tested cluster’s maximal achieved performance, in gigaflops (GFlops, or billions of floating-point operations per second). This performance is called the “Rmax rating,” and computers are ranked on the Top 500 list according to this score. For comparison, Mannheim also publishes theoretical peak performance (“Rpeak”), representing how fast each system’s architects believe it could or should perform. Dividing Rmax by Rpeak produces a yield figure, which represents how closely each system performs to its engineers’ expectations.
EC2’s yield is not particularly great, just 67.8 percent. By comparison, the winner and still champion of the November 2011 Top 500 list is a machine simply called “K” (shown above), assembled for the RIKEN Advanced Institute for Computational Science in Kobe, Japan. Its yield is an astonishing 93.17 percent. Cloud architectures are not known for their processor efficiency; oftentimes they’re “Frankenstein” machines cobbled together with available parts, but marshaled by a strong, nimble, and adaptive cloud OS.
The Linpack Rmax score for EC2 topped out at 240,090 GFlops – almost a quarter of a petaflop. LANL’s Roadrunner was declared to have broken the one-petaflop barrier (one thousand trillion floating-point operations per second) in June 2008. Japan’s “K” has now shattered the 10-petaflop barrier with an Rmax score of 10,510,000 GFlops.
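For anyone checking the arithmetic at home, here is a minimal sketch in Python that uses only the Rmax and yield figures cited above; the Rpeak values it derives are implied by those two numbers, not quoted from the published list.

```python
# Back-of-the-envelope check using only the figures cited above.
# The Rpeak values printed here are *implied* (Rmax / yield), not quoted
# from the published Top 500 list.
systems = {
    "Amazon EC2 cluster": {"rmax_gflops": 240_090, "yield_pct": 67.8},
    "K computer":         {"rmax_gflops": 10_510_000, "yield_pct": 93.17},
}

for name, s in systems.items():
    rmax_pflops = s["rmax_gflops"] / 1_000_000               # GFlops -> PFlops
    implied_rpeak_pflops = rmax_pflops / (s["yield_pct"] / 100)
    print(f"{name}: Rmax {rmax_pflops:.2f} PFlops, "
          f"implied Rpeak {implied_rpeak_pflops:.2f} PFlops")

# Amazon EC2 cluster: Rmax 0.24 PFlops, implied Rpeak 0.35 PFlops
# K computer: Rmax 10.51 PFlops, implied Rpeak 11.28 PFlops
```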
At this rate, EC2 actually may not catch up with the top 20; the world’s purpose-built supercomputers are improving in both speed and efficiency faster than cloud clusters are. What’s interesting about the latest turn of events in the November 2011 rankings is how processors made for supercomputers are outpacing clusters built with commercial off-the-shelf (COTS) processors like Intel Xeon, AMD Opteron, and IBM Power. “K,” for example, is made up of 88,128 eight-core Fujitsu SPARC64 VIIIfx chips. Their SPARC64 siblings power Fujitsu’s commercial UNIX servers; the VIIIfx variant was engineered specifically for high-performance computing.
Faster optical interconnects between the chips, as opposed to commercial sockets, also account for huge speed gains. The first leap forward in interconnect technology took place a quarter-century ago, when computer designers began using four-dimensional mapping to link nodes to one another. The theoretical shape formed was a “tesseract,” and the number of feasible connections was likened to a “googolplex” – a term popularized by the late Dr. Carl Sagan. That’s the door through which “googol,” later misspelled as “google,” entered our common vernacular.
Fujitsu’s architecture for “K” is based on a theoretical six-dimensional torus, which cuts the hop count between nodes by half or more, and which enables as many as 12 fault-tolerance failovers per node. That’s a feature cloud architects might want to take into account: the failover features that cloud operating systems like OpenStack retrofit onto conventional clusters, supercomputers implement by design.
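To make the hop-count claim concrete, here is a rough, hypothetical sketch of generic torus networks – it ignores the actual geometry of Fujitsu’s Tofu interconnect – comparing the worst-case hop count (diameter) of a 3-D and a 6-D torus built from the same number of nodes.

```python
from math import prod

def torus_diameter(sides):
    # Worst-case hop count between two nodes in a torus whose dimensions
    # have the given side lengths (wraparound links assumed in every dimension).
    return sum(s // 2 for s in sides)

# Two hypothetical networks, each with 4,096 nodes:
three_d = (16, 16, 16)      # 3-D torus, 16 nodes per dimension
six_d = (4, 4, 4, 4, 4, 4)  # 6-D torus, 4 nodes per dimension

for sides in (three_d, six_d):
    print(f"{len(sides)}-D torus, {prod(sides)} nodes, "
          f"diameter {torus_diameter(sides)} hops")

# 3-D torus, 4096 nodes, diameter 24 hops
# 6-D torus, 4096 nodes, diameter 12 hops
```

At equal node counts, the higher-dimensional layout halves the worst-case path – the kind of reduction described above – and the extra dimensions also give each node more alternative routes around a failed link.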
Although IBM made headlines earlier this year by demonstrating how Watson could win at “Jeopardy!”, the 16 BlueGene-architecture machines still on the current list are sliding. The Jülich Supercomputing Centre’s JUGENE system, which led Europe two years ago, has dropped to #13, and the only other BlueGene systems in the top 25 now sit at #17, #22, and #23.
Once shut out of the list entirely, perhaps the most significant name in supercomputing history is making a supreme comeback. Cray built 27 of the clusters on the latest list, including the #3 Jaguar at Oak Ridge National Laboratory (Rmax score: 1,759,000 GFlops), plus #6, #8, #11, #12, #19, and #20. Cray’s surge also represents a boon for AMD, since Cray’s current systems use Opteron CPUs exclusively.
The United States still maintains 263 of the Top 500, with Jaguar being the fastest. But China is surging forward too with 74 clusters, Japan with 30, and South Korea with 3, its fastest being a Cray at #31.
For those of you at home still keeping score, Windows is almost out of the picture entirely. It powers only the #58 supercomputer, and that one was built in 2008: a 30,720-core Opteron cluster operated by the Shanghai Supercomputing Center. BSD powers the #94 machine, and Unix powers 30 systems; the rest belong to Linux. Most of those Unix systems, by the way, run on IBM Power processors. Some 49 of the Top 500 run on Power chips, while AMD chips power 63 clusters. Intel has the lion’s share with 384 clusters. Among them, 239 use the latest Core architecture, 141 use earlier x86 architectures, and only 4 are now on Itanium.