Thursday, November 6, 2008


CUDA by NVIDIA is a new C language environment that lets developers solve numerical processing problems on graphics processing units (GPUs). Certain problems in astronomy, such as the N-body problem, can run up to 100x faster. The GTX200 has 1.4 billion transistors and can reach 1000 GFLOPS, as opposed to a quad-core Xeon, which reaches 96 GFLOPS. The reason is the difference in architecture: the CPU is designed for minimum latency, whereas the GPU is designed for throughput, and number crunching is essentially a throughput problem. The speaker, Jonathan Cohen, believes the scientific community and its problems are ideal for the GPU architecture.

A few interesting features of the GPU: its 240 stream processors (SPs) are split into groups of 8 to form streaming multiprocessors (SMs). Each SM has 8 SPs, 2 special function units (SFUs) and a double-precision (DP) unit. Note that the DP units are not needed for graphics or games and are included for scientific computing. The SFUs are optimised for sine/cosine/sqrt/exp operations, all common in scientific computing.
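To see why the N-body problem is a throughput problem, here is a minimal sketch of the direct-sum force calculation. It is written in plain Python rather than CUDA, and it is not code from the talk: the function name `accelerations` and the `G` and `softening` parameters are my own choices for illustration. The point is that each body's acceleration is computed independently of every other body's, so the outer loop could in principle be spread across many GPU threads.

```python
import math

def accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct-sum gravitational acceleration on every body (O(N^2) work).

    positions: list of (x, y, z) tuples; masses: list of floats.
    G and softening are illustrative parameters, not values from the talk.
    """
    n = len(positions)
    acc = []
    # Each pass of this outer loop reads shared data but writes only its
    # own result, so the passes are independent of one another.
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + softening * softening
            # One sqrt per pair: the kind of operation the post says
            # the SFUs are optimised for.
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * masses[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc

# Two unit masses one unit apart pull on each other equally and oppositely.
print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```

The work grows as N squared while the data grows only as N, which is the arithmetic-heavy pattern a throughput-oriented chip is built for.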

One application is porting the cloud microphysics subroutine from WRF to CUDA; John Michalakes at NCAR reported a 1.3x speedup. This suggests that it might be possible to build much cheaper clusters around powerful GPUs and run climate models faster and for far less money. Cohen is building up to a ROMS-like system for GPUs. A group in Italy built a computer with 8 GPUs (8 TFLOPS) for $6000. Think of the possibilities...
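For a rough sense of scale, here is the back-of-envelope arithmetic on the figures quoted in this post. These are peak numbers taken at face value, not measured application performance, and the variable names are just mine.

```python
# Figures as quoted in the post (peak values, taken at face value).
gpu_gflops = 1000.0      # GTX200
cpu_gflops = 96.0        # quad-core Xeon
processors = 240         # processors on the GPU
per_group = 8            # processors per multiprocessor
cluster_cost_usd = 6000.0
cluster_tflops = 8.0     # 8-GPU machine

peak_ratio = gpu_gflops / cpu_gflops                 # GPU vs CPU peak
sm_count = processors // per_group                   # multiprocessors
usd_per_tflops = cluster_cost_usd / cluster_tflops   # cost of peak compute

print(f"peak GPU/CPU ratio: {peak_ratio:.1f}x")
print(f"multiprocessors:    {sm_count}")
print(f"cost per TFLOPS:    ${usd_per_tflops:.0f}")
```

On these numbers the peak ratio is roughly 10x, the chip has 30 multiprocessors, and the 8-GPU machine works out to $750 per peak TFLOPS. What a real application gets will depend on how much of it can actually run on the GPU.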
