Tech Break - GPU Supercomputing: On-campus and on your computer
Ian Littman
The CTLM basement houses Ra, Mines's 268-node, twelve-rack Dell-powered supercomputer, devoted to energy research. While this machine's 17-teraflop sustained performance is high, a new system, just 1.75 inches tall on its own, can accelerate a single server to the performance of two racks of normal equipment. The system is an nVidia Tesla S1070, a $7500 machine that connects to a standard server and brings a whopping 960 processing cores to bear on problems that would normally require a much larger system.
The catch: 960 processors running at 1.44 GHz are harder to code for than, for example, the eight 2.66 GHz cores found on the average Ra node. "Parallelization is difficult," says Tim Kaiser, Director of Research and High Performance Computing Support. Additionally, "[t]he memory hierarchy is such that you really need to have stuff in what they call shared memory, and shared memory is fairly small. So you need to think about how to write your algorithms." The nVidia GPGPUs (General Purpose GPUs) in the Tesla node split their memory into regions whose performance differs by two orders of magnitude, so programmers must pay attention to where their data lives on what are effectively souped-up, high-memory graphics cards.
That said, graphics processors like the Tesla (and consumer-grade cards found in most new computers) are becoming increasingly easy to program for, thanks to standardized high-level languages like nVidia's CUDA (Compute Unified Device Architecture) and the cross-platform OpenCL. These languages, plus compiler enhancements that detect and exploit parallelism in everyday code, make unlocking parallel processing power on such machines much easier. "[W]e will shortly have available on this machine the Portland Group compilers, which should enable you to do some…directive based parallelism, which is similar to OpenMP," says Kaiser of the development environment Mines is setting up on its Tesla-connected server. "[E]ssentially you tell the compiler that you think that the next loop can be parallelized, and you leave it up to the compiler to do it for you. It’s semi-automatic parallelism."
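To make the idea of directive-based parallelism concrete, here is a minimal sketch in C using OpenMP, the standard Kaiser compares the Portland Group directives to. The article does not show the Portland Group syntax, so this is an OpenMP illustration only, and the function name `saxpy` is a hypothetical example rather than code from the Mines system.

```c
#include <stddef.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i] for every element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* The directive asserts that the loop iterations are independent, so the
       compiler may split them across threads. A compiler without OpenMP
       enabled ignores the pragma and runs the loop serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from the serial version; only the one-line hint is added. With GCC, the directive takes effect when the file is compiled with `-fopenmp`.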
What can students and researchers do with this sort of power, properly harnessed? Quite a lot, actually. From fluid dynamics to signal processing to seismic imaging, GPU-powered compute nodes like the Tesla system can speed up processing by 100x or more. For the technically minded, the current Tesla system pumps out 4.14 teraflops of single-precision floating point performance while using a mere 800W of wall power, barely more than the power consumption of an 85-gigaflop standard server. Even on slower double-precision processing, a Tesla system is more powerful than the equivalent-power Intel system. Fortunately for Mines, however, the vast majority of the Tesla's workload will be single-precision "[b]ecause a lot of data is taken that is eight-bit [from] analog to digital converters," explains Kaiser. "There are very few 64-bit analog-to-digital converters. So your starting data is…in general single precision."
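The 4.14-teraflop figure is consistent with a simple peak-rate estimate: cores times clock speed times operations per core per clock. The sketch below, in C, works that out for the 960 cores at 1.44 GHz mentioned earlier. It assumes each core can issue three single-precision operations per clock (a multiply-add plus a multiply); that per-clock figure is an assumption, not something stated in the article, and the function names are hypothetical.

```c
#include <assert.h>

/* Peak rate in gigaflops = cores * clock (GHz) * operations per core per clock.
   The operations-per-clock value is an assumption supplied by the caller. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0 && flops_per_cycle > 0);
    return cores * clock_ghz * flops_per_cycle;
}

/* Ratio of two throughput figures given in the same unit. */
double throughput_ratio(double fast_gflops, double slow_gflops)
{
    assert(slow_gflops > 0.0);
    return fast_gflops / slow_gflops;
}
```

With these inputs, `peak_gflops(960, 1.44, 3)` comes to about 4,147 gigaflops, and dividing that by the 85-gigaflop server gives a ratio of roughly 49 to 1. This is a peak number, so sustained performance on real code will be lower.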
The greatest thing about having a GPU powerhouse on the Mines campus is that it's available for student use. Between two high-level MCS courses being offered this semester, about twenty-five students will already have access to the system. "[W]e want people to…get the experience…but we also want to find out if people can make good use of this machine, to justify buying more of them," says Kaiser of the system, dubbed cuda1 after its primary programming language. "If nobody uses it there’s no point in buying more, but if…everybody wants to use it, I’ll go back and I’ll ask more of them." More information about the nVidia Tesla system can be found at http://geco.mines.edu/tesla.
The field of graphics card computing for massively parallel power is advancing quickly, with AMD's release of its two-teraflop Radeon 5870 card happening only a couple of weeks ago. With a move toward a standardized language across graphics processor brands and a build-out-not-up mentality, where parallelization rather than ever-higher processor clock speeds drives performance increases, the sky is the limit for high-performance, high-efficiency computing. Even better: your computer's graphics card might be such a powerhouse, if not now then in a few months' time. One thing's for certain: graphics cards aren't just for gaming anymore.








