HA-PACS: High bandwidth GPU cluster for computational sciences
New generation GPU cluster with rich I/O bandwidth
HA-PACS (Highly Accelerated Parallel Advanced System for Computational Sciences) is the 8th generation of the PACS/PAX series of supercomputers at CCS, University of Tsukuba. It is built for both development and production runs of cutting-edge scientific computations, with a view toward next-generation accelerated computing. To that end, it is equipped with the latest GPUs and CPUs, connected by a new generation of PCI-express that provides rich I/O bandwidth. Two sockets of Intel Sandy Bridge-EP CPUs support full-bandwidth connections to four NVIDIA M2090 GPUs without a performance bottleneck. The interconnection network employs dual-rail Infiniband QDR in a Fat-Tree configuration with full bisection bandwidth.
The system will be delivered in January 2012 with 802 TFLOPS of peak performance.

System configuration of HA-PACS
System Specification
| Item | Specification |
|---|---|
| Peak performance | 802 TFLOPS (GPU: 713 TF, CPU: 89 TF) |
| # of nodes | 268 |
| File system | Lustre, 504 TB user area (DDN SFA10000 ExaScaler) |
| Infiniband network switch | 288 port QDR x 2 (Mellanox IS5300) |
| Total network bandwidth | 2.14 TB/s |
| Language | Fortran90, C, C++ |
| MPI | MVAPICH2, Intel MPI, OpenMPI |
| System Management | Appro Cluster Engine, PBSpro |
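
The system-level totals can be reproduced from the node count and the per-node figures. The following is a minimal sketch of that arithmetic, not an official calculation: the variable names are illustrative, the per-node peak values (332.8 GFLOPS for the CPUs, 2660 GFLOPS for the GPUs) are taken from the computation node specification, and the 4 GB/s per-rail data rate is an assumption based on a 4x QDR link (40 Gbit/s signalling with 8b/10b encoding, counted in one direction).

```python
# Sketch: derive the system-level figures from per-node values.
# All names here are illustrative, not part of any HA-PACS tooling.

NODES = 268                    # number of computation nodes
CPU_GFLOPS_PER_NODE = 332.8    # CPU peak per node (node specification)
GPU_GFLOPS_PER_NODE = 2660.0   # GPU peak per node, 4 x 665 GFLOPS
RAILS_PER_NODE = 2             # dual-rail Infiniband QDR

# Assumption: one 4x QDR rail carries 4 GB/s of data in one direction
# (40 Gbit/s signalling, 8b/10b encoding -> 32 Gbit/s = 4 GB/s).
QDR_GB_PER_S_PER_RAIL = 4.0

cpu_tflops = NODES * CPU_GFLOPS_PER_NODE / 1000.0
gpu_tflops = NODES * GPU_GFLOPS_PER_NODE / 1000.0
total_tflops = cpu_tflops + gpu_tflops

# Aggregate injection bandwidth: every node drives both rails.
network_tb_per_s = NODES * RAILS_PER_NODE * QDR_GB_PER_S_PER_RAIL / 1000.0

print(f"CPU peak:   {cpu_tflops:.1f} TFLOPS")
print(f"GPU peak:   {gpu_tflops:.1f} TFLOPS")
print(f"Total peak: {total_tflops:.1f} TFLOPS")
print(f"Network:    {network_tb_per_s:.2f} TB/s")
```

Under these assumptions the results round to the values listed in the table: 89 TF for the CPUs, 713 TF for the GPUs, 802 TFLOPS in total, and 2.14 TB/s of network bandwidth.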
Computation node of HA-PACS

Block diagram of computation node of HA-PACS
Specification of computation node
| Item | Specification |
|---|---|
| Computation node | Appro Xtreme-X with four GPUs |
| CPU | Intel E5 (Sandy Bridge-EP) |
| # of cores | 8 cores/socket x 2 sockets = 16 cores/node |
| Clock | 2.6 GHz |
| CPU peak performance | 332.8 GFLOPS/node |
| PCI-express | generation 3 x 80 lanes (40 lanes/CPU) |
| Main memory | 128 GB, DDR3 1600 MHz, 4 channels/socket, 102.4 GB/s/node |
| GPU | NVIDIA M2090 |
| # of GPUs/node | 4 |
| GPU peak performance | 2660 GFLOPS/node (665 GF/GPU) |
| GPU memory | 24 GB/node (6 GB/GPU) |
| Interconnection | Infiniband QDR x 2 rails (Mellanox ConnectX-3 dual head) |
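
The per-node figures follow from the component counts and clock rates. The sketch below shows one way to derive them. It rests on two assumptions that the specification does not state: each Sandy Bridge-EP core is taken to issue 8 double-precision floating-point operations per cycle (one 4-wide AVX add plus one 4-wide AVX multiply), and each DDR3 channel is taken to be 64 bits (8 bytes) wide. The constant and function names are illustrative only.

```python
# Sketch: derive per-node peak figures from the component specification.
# Names are illustrative; the two constants marked "Assumption" are not
# stated in the specification.

SOCKETS = 2
CORES_PER_SOCKET = 8
CLOCK_GHZ = 2.6
# Assumption: 8 double-precision FLOP per cycle per core
# (4-wide AVX add + 4-wide AVX multiply).
FLOP_PER_CYCLE = 8

GPUS_PER_NODE = 4
GPU_PEAK_GFLOPS = 665   # per NVIDIA M2090, double precision
GPU_MEMORY_GB = 6       # per GPU

DDR3_MEGATRANSFERS = 1600   # DDR3-1600, million transfers per second
CHANNELS_PER_SOCKET = 4
# Assumption: each DDR3 channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8


def cpu_peak_gflops() -> float:
    """Peak CPU performance of one node in GFLOPS."""
    cores = SOCKETS * CORES_PER_SOCKET
    return cores * CLOCK_GHZ * FLOP_PER_CYCLE


def gpu_peak_gflops() -> int:
    """Peak GPU performance of one node in GFLOPS."""
    return GPUS_PER_NODE * GPU_PEAK_GFLOPS


def gpu_memory_gb() -> int:
    """Total GPU memory of one node in GB."""
    return GPUS_PER_NODE * GPU_MEMORY_GB


def memory_bandwidth_gb_per_s() -> float:
    """Theoretical main-memory bandwidth of one node in GB/s."""
    channels = SOCKETS * CHANNELS_PER_SOCKET
    return channels * DDR3_MEGATRANSFERS * BYTES_PER_TRANSFER / 1000.0


if __name__ == "__main__":
    print(f"CPU peak:         {cpu_peak_gflops():.1f} GFLOPS/node")
    print(f"GPU peak:         {gpu_peak_gflops()} GFLOPS/node")
    print(f"GPU memory:       {gpu_memory_gb()} GB/node")
    print(f"Memory bandwidth: {memory_bandwidth_gb_per_s():.1f} GB/s/node")
```

With these assumptions, 16 cores at 2.6 GHz give 332.8 GFLOPS, four GPUs give 2660 GFLOPS and 24 GB of GPU memory, and eight DDR3-1600 channels give a theoretical 102.4 GB/s of main-memory bandwidth.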
