HA-PACS: High bandwidth GPU cluster for computational sciences
New generation GPU cluster with rich I/O bandwidth
HA-PACS (Highly Accelerated Parallel Advanced System for Computational Sciences) is the 8th generation of the PACS/PAX series of supercomputers at the Center for Computational Sciences (CCS), University of Tsukuba. It is intended for both development and production runs of cutting-edge scientific computations, with a view to next-generation accelerated computing. To that end, it is equipped with the latest GPUs and CPUs, connected by the new generation of PCI Express to provide rich I/O bandwidth. Two sockets of Intel Sandy Bridge-EP CPUs provide full-bandwidth connections to four NVIDIA M2090 GPUs without a performance bottleneck. The interconnection network uses dual-rail InfiniBand QDR in a Fat-Tree configuration with full bisection bandwidth.
The system will be delivered in January 2012 with a peak performance of 802 TFLOPS.
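The claim that two CPU sockets can feed four GPUs at full bandwidth can be checked with a rough lane budget. The sketch below uses the 40 lanes per CPU listed in the node specification. The x16 link per GPU and the x8 link for the InfiniBand adapter are assumptions made for illustration and are not stated in this document.

```python
# Rough PCI Express lane budget for one HA-PACS computation node.
LANES_PER_CPU = 40    # from the node specification (40 lanes/CPU)
CPU_SOCKETS = 2
GPUS_PER_NODE = 4
LANES_PER_GPU = 16    # assumption: each GPU sits on its own x16 link
LANES_PER_HCA = 8     # assumption: the InfiniBand adapter uses an x8 link

total_lanes = LANES_PER_CPU * CPU_SOCKETS
used_lanes = GPUS_PER_NODE * LANES_PER_GPU + LANES_PER_HCA
spare_lanes = total_lanes - used_lanes

print(f"available lanes: {total_lanes}")
print(f"lanes used by GPUs and InfiniBand: {used_lanes}")
print(f"spare lanes: {spare_lanes}")
```

Under these assumptions every GPU gets a dedicated full-width link and no device has to share lanes through a PCI Express switch.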
System configuration of HA-PACS
System Specification
Item | Specification |
---|---|
Peak performance | 802 TFLOPS (GPU: 713 TF, CPU: 89 TF) |
# of nodes | 268 |
File system | Lustre, 504 TB user area (DDN SFA10000 ExaScaler) |
Infiniband network switch | 288 port QDR x 2 (Mellanox IS5300) |
Total network bandwidth | 2.14 TB/s |
Language | Fortran90, C, C++ |
MPI | MVAPICH2, Intel MPI, OpenMPI |
System Management | Appro Cluster Engine, PBSpro |
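The system-wide totals in the table above follow from the node count and the per-node figures given in the computation node specification. The sketch below redoes that arithmetic. The value of 4 GB/s per InfiniBand QDR rail is an assumption (a 4x QDR link signalling at 40 Gbit/s with 8b/10b encoding, counted in one direction); the document itself states only the total.

```python
# Recompute the system-level figures from per-node values.
NODES = 268
CPU_GFLOPS_PER_NODE = 332.8    # from the node specification
GPU_GFLOPS_PER_NODE = 2660.0   # from the node specification (4 x 665 GF)
RAILS_PER_NODE = 2             # dual-rail InfiniBand QDR
GBYTES_PER_RAIL = 4.0          # assumption: 4x QDR, 32 Gbit/s of data per direction

cpu_tflops = NODES * CPU_GFLOPS_PER_NODE / 1000
gpu_tflops = NODES * GPU_GFLOPS_PER_NODE / 1000
total_tflops = cpu_tflops + gpu_tflops
network_tbytes = NODES * RAILS_PER_NODE * GBYTES_PER_RAIL / 1000

print(f"CPU peak:   {cpu_tflops:.1f} TFLOPS")
print(f"GPU peak:   {gpu_tflops:.1f} TFLOPS")
print(f"Total peak: {total_tflops:.1f} TFLOPS")
print(f"Network:    {network_tbytes:.2f} TB/s")
```

Rounded to whole TFLOPS, the results agree with the 89 TF, 713 TF, and 802 TFLOPS quoted in the table, and the network figure agrees with 2.14 TB/s.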
Computation node of HA-PACS
Block diagram of computation node of HA-PACS
Specification of computation node
Item | Specification |
---|---|
Computation node | Appro Xtreme-X with four GPUs |
CPU | Intel E5 (Sandy Bridge-EP) |
# of cores | 8 cores/socket x 2 sockets = 16 cores/node |
Clock | 2.6 GHz |
CPU peak performance | 332.8 GFLOPS/node |
PCI-express | generation 3 x 80 lanes (40 lanes/CPU) |
Main memory | 128 GB, DDR3 1600 MHz, 4 channels/socket, 102.4 GB/s/node |
GPU | NVIDIA M2090 |
# of GPUs/node | 4 |
GPU peak performance | 2660 GFLOPS/node (665 GF/GPU) |
GPU memory | 24 GB/node (6 GB/GPU) |
Interconnection | Infiniband QDR x 2 rails (Mellanox ConnectX-3 dual head) |
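The per-node figures in the table above can be rederived from the component counts. The following is a minimal sketch of that arithmetic. It assumes 8 double-precision floating-point operations per cycle per core (one 4-wide AVX add plus one 4-wide AVX multiply) and a 64-bit (8-byte) data path per DDR3 channel; neither value is stated in this document.

```python
# Rederive per-node peak figures from the component counts.
SOCKETS = 2
CORES_PER_SOCKET = 8
CLOCK_GHZ = 2.6
DP_FLOPS_PER_CYCLE = 8      # assumption: 4-wide AVX add + 4-wide AVX multiply

DDR3_MTRANSFERS_PER_S = 1600
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data path per channel
CHANNELS_PER_SOCKET = 4

GPUS_PER_NODE = 4
GFLOPS_PER_GPU = 665
GBYTES_PER_GPU = 6

cpu_gflops = SOCKETS * CORES_PER_SOCKET * CLOCK_GHZ * DP_FLOPS_PER_CYCLE
mem_gbytes_per_s = (DDR3_MTRANSFERS_PER_S * BYTES_PER_TRANSFER
                    * CHANNELS_PER_SOCKET * SOCKETS) / 1000
gpu_gflops = GPUS_PER_NODE * GFLOPS_PER_GPU
gpu_mem_gbytes = GPUS_PER_NODE * GBYTES_PER_GPU

print(f"CPU peak:         {cpu_gflops:.1f} GFLOPS/node")
print(f"Memory bandwidth: {mem_gbytes_per_s:.1f} GB/s/node")
print(f"GPU peak:         {gpu_gflops} GFLOPS/node")
print(f"GPU memory:       {gpu_mem_gbytes} GB/node")
```

With these assumptions the CPU and GPU peaks come out at 332.8 and 2660 GFLOPS per node, and the memory bandwidth at 102.4 GB/s per node.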