HA-PACS Base Cluster

HA-PACS: High-bandwidth GPU cluster for computational sciences

New-generation GPU cluster with rich I/O bandwidth

HA-PACS (Highly Accelerated Parallel Advanced System for Computational Sciences) is the 8th generation of the PACS/PAX series of supercomputers at the Center for Computational Sciences (CCS), University of Tsukuba. Built both for developing and for production runs of cutting-edge scientific computations aimed at next-generation accelerated computing, it combines the latest GPUs and CPUs connected through PCI Express Gen 3, which provides rich I/O bandwidth. Two Intel Sandy Bridge-EP CPU sockets provide enough PCI Express lanes to connect four NVIDIA M2090 GPUs at full bandwidth without a performance bottleneck. The interconnection network is a dual-rail InfiniBand QDR Fat-Tree with full bisection bandwidth.
The system will be delivered in January 2012 with a peak performance of 802 TFLOPS.
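Whether two sockets can feed four GPUs at full bandwidth comes down to counting PCI Express lanes. The following Python sketch tallies one node's lane budget. The 40 Gen 3 lanes per CPU, the four GPUs, and the two InfiniBand rails come from this page; the x16 link per GPU and the x8 link per InfiniBand rail are assumptions made for illustration, not figures stated here.

```python
# Hypothetical PCI Express lane budget for one HA-PACS compute node.
# From the page: 2 sockets x 40 Gen 3 lanes, 4 GPUs, 2 InfiniBand rails.
# Assumed for illustration: x16 per GPU, x8 per InfiniBand rail.
SOCKETS = 2
LANES_PER_CPU = 40
GPUS = 4
LANES_PER_GPU = 16      # assumption
IB_RAILS = 2
LANES_PER_IB_RAIL = 8   # assumption


def lane_budget():
    """Return (available, used_by_gpus, used_by_ib, spare) lane counts."""
    available = SOCKETS * LANES_PER_CPU
    used_by_gpus = GPUS * LANES_PER_GPU
    used_by_ib = IB_RAILS * LANES_PER_IB_RAIL
    spare = available - used_by_gpus - used_by_ib
    return available, used_by_gpus, used_by_ib, spare


if __name__ == "__main__":
    available, gpus, ib, spare = lane_budget()
    print(f"available={available} gpus={gpus} ib={ib} spare={spare}")
    # A negative spare count would mean a PCIe switch (a shared link) is needed.
    print("fits without a PCIe switch:", spare >= 0)
```

Under these assumptions the four GPUs take 64 of the 80 lanes, each on a dedicated link to a CPU rather than behind a shared PCIe switch, and the remaining 16 lanes cover both interconnect rails.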

System configuration of HA-PACS

System Specification

Item	Specification
Peak performance	802 TFLOPS (GPU: 713 TF, CPU: 89 TF)
# of nodes	268
File system	Lustre, 504 TB user area (DDN SFA10000 ExaScaler)
InfiniBand network switch	288-port QDR x 2 (Mellanox IS5300)
Total network bandwidth	2.14 TB/s
Language	Fortran 90, C, C++
MPI	MVAPICH2, Intel MPI, Open MPI
System Management	Appro Cluster Engine, PBSpro
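The aggregate figures in this table follow from the per-node values in the computation node specification. A minimal Python sketch of that arithmetic is shown here. It assumes 4x QDR InfiniBand links (40 Gb/s signaling with 8b/10b encoding, i.e. 4 GB/s per direction) and assumes the total network bandwidth counts one direction per link; neither detail is stated on this page.

```python
# Rebuild the system-level figures from the per-node specification.
NODES = 268
CPU_GFLOPS_PER_NODE = 332.8   # 2 x 8-core Sandy Bridge-EP at 2.6 GHz
GPU_GFLOPS_PER_NODE = 2660.0  # 4 x NVIDIA M2090 at 665 GFLOPS each

QDR_SIGNAL_GBITS = 40.0       # assumed: 4x QDR signaling rate per link
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding
RAILS = 2


def system_peak_tflops():
    """Return (cpu, gpu, total) peak performance in TFLOPS."""
    cpu = NODES * CPU_GFLOPS_PER_NODE / 1000
    gpu = NODES * GPU_GFLOPS_PER_NODE / 1000
    return cpu, gpu, cpu + gpu


def total_network_bandwidth_tbytes():
    """Sum of one-direction link bandwidth over all nodes and rails, in TB/s."""
    per_link_gbytes = QDR_SIGNAL_GBITS * ENCODING_EFFICIENCY / 8  # 4 GB/s
    return NODES * RAILS * per_link_gbytes / 1000


if __name__ == "__main__":
    cpu, gpu, total = system_peak_tflops()
    print(f"CPU {cpu:.1f} TF, GPU {gpu:.1f} TF, total {total:.1f} TF")
    print(f"network {total_network_bandwidth_tbytes():.3f} TB/s")
```

Rounded to whole teraflops, the peak values match the 89 TF, 713 TF, and 802 TFLOPS listed above, and the network sum (2.144 TB/s) matches the 2.14 TB/s figure.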

Computation node of HA-PACS

Block diagram of computation node of HA-PACS

Specification of computation node

Item	Specification
Computation node	Appro Xtreme-X with four GPUs
CPU	Intel ES (Sandy Bridge-EP)
# of cores	8 cores/socket x 2 sockets = 16 cores/node
Clock	2.6 GHz
CPU peak performance	332.8 GFLOPS/node
PCI Express	Gen 3, 80 lanes/node (40 lanes/CPU)
CPU memory	128 GB DDR3-1600, 4 channels/socket, 102.4 GB/s/node
GPU	NVIDIA M2090
# of GPUs/node	4
GPU peak performance	2660 GFLOPS/node (665 GFLOPS/GPU)
GPU memory	24 GB/node (6 GB/GPU)
Interconnection	InfiniBand QDR x 2 rails (Mellanox ConnectX-3, dual head)
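The per-node peak performance and memory bandwidth can be derived from the clock rate, core count, and memory configuration. This Python sketch assumes 8 double-precision FLOPs per cycle per core (one 4-wide AVX add plus one 4-wide AVX multiply) and 8 bytes per transfer on each 64-bit DDR3 channel; both constants are standard for Sandy Bridge-EP and DDR3 but are not stated on this page.

```python
# Derive per-node peak figures of an HA-PACS compute node.
CLOCK_GHZ = 2.6
CORES = 8 * 2             # 8 cores/socket x 2 sockets
DP_FLOPS_PER_CYCLE = 8    # assumed: 4-wide AVX add + 4-wide AVX multiply

DDR3_MT_PER_S = 1600e6    # DDR3-1600: 1600 mega-transfers per second
BYTES_PER_TRANSFER = 8    # assumed: 64-bit memory channel
CHANNELS = 4 * 2          # 4 channels/socket x 2 sockets

GPUS = 4
GPU_GFLOPS = 665          # NVIDIA M2090 double-precision peak
GPU_MEMORY_GB = 6


def cpu_peak_gflops():
    return CLOCK_GHZ * DP_FLOPS_PER_CYCLE * CORES


def memory_bandwidth_gbytes():
    # Theoretical peak; sustained bandwidth is lower in practice.
    return DDR3_MT_PER_S * BYTES_PER_TRANSFER * CHANNELS / 1e9


def gpu_peak_gflops():
    return GPUS * GPU_GFLOPS


def gpu_memory_gbytes():
    return GPUS * GPU_MEMORY_GB


if __name__ == "__main__":
    print(f"CPU peak:   {cpu_peak_gflops():.1f} GFLOPS")
    print(f"Memory BW:  {memory_bandwidth_gbytes():.1f} GB/s")
    print(f"GPU peak:   {gpu_peak_gflops()} GFLOPS")
    print(f"GPU memory: {gpu_memory_gbytes()} GB")
```

With these parameters the CPU side comes to 332.8 GFLOPS and a theoretical 102.4 GB/s of memory bandwidth per node, while the four GPUs contribute 2660 GFLOPS and 24 GB of device memory.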