At the Center for Computational Sciences (CCS), University of Tsukuba, we have developed and deployed several supercomputers based on PC cluster architecture with accelerators such as GPUs or many-core processors. As an accelerated supercomputer with many-core architecture, we have been operating the COMA cluster with Intel Xeon Phi (Knights Corner), which will be shut down in March 2019. The programming and utilization technology for Intel Xeon Phi has been transferred to the Oakforest-PACS system, which is operated in collaboration between the University of Tsukuba and the University of Tokyo. We also operated two other supercomputers with GPU accelerators, HA-PACS and HA-PACS/TCA, which were shut down in March and October 2018, respectively.
At CCS, we will introduce a next-generation accelerated supercomputer named Cygnus, which builds on our experience with GPU clusters and adds FPGAs as a second accelerator technology. Every computation node of Cygnus is equipped with four of the most advanced GPUs. In addition, about half of the nodes are also equipped with two advanced FPGAs, so that two FPGA devices and four GPUs work together in each of those nodes.
By combining two different types of accelerators, GPU and FPGA, in each node and dedicating them to parallel processing, we will take on applications for which GPU acceleration alone is insufficient for various reasons, using FPGA technology to compensate, toward a new generation of strong-scaling supercomputing. Cygnus is the world's first supercomputer equipped with both GPUs and FPGAs to be opened for public use by application users from various fields.
Oakforest-PACS is operated by the Joint Center for Advanced High Performance Computing (JCAHPC).
COMA (Cluster Of Many-core Architecture processor) is the 9th generation of the PACS series of supercomputers and started operation in April 2014. The Intel Xeon Phi coprocessor enhances the performance of a traditional multi-core CPU by increasing the number of cores. COMA is our second Pflops-class supercomputer with accelerators, with a theoretical peak of 1.001 Pflops, and was ranked no. 51 in the Top 500 list issued in June 2014. The system is a Cray CS300 cluster with 393 computation nodes, each consisting of two Intel Xeon E5-2680v2 CPUs and two Intel Xeon Phi 7110P coprocessors. All the nodes are connected by an InfiniBand FDR Fat-Tree network with full bisection bandwidth. It is equipped with a 1.5 PByte RAID6 Lustre shared file system. CCS and the Information Technology Center at the University of Tokyo plan to introduce a massively parallel supercomputer based on many-core architecture processors at JCAHPC (Joint Center for Advanced HPC), a joint organization of the two centers. We will develop various large-scale computational science applications based on our experience with COMA, focusing especially on particle physics, astrophysics and biological science.
The HA-PACS Base Cluster was shut down in March 2017. HA-PACS/TCA is operating.
The Highly Accelerated Parallel Advanced system for Computational Sciences (HA-PACS) Base Cluster System is a GPU (Graphics Processing Unit) cluster that incorporates the latest CPU and GPU technologies. Capable of providing 802 Tflops of peak performance with just 268 computation nodes, the system was ranked at number 41 on the June 2012 Top 500 List. Each node of the system is based on the GreenBlade 8200 series produced by Appro International Co., and consists of two Intel E5-2680 (8 core, SandyBridge-EP) processors operating at 2.6 GHz as the CPUs, and four NVIDIA M2090 processors as the GPUs. The theoretical peak performance of one node is approximately 3 Tflops. All nodes are joined via an interconnection network with dual-rail InfiniBand QDR × 4 in a Fat-Tree configuration to provide 2.14 TByte/s of bisection bandwidth. The full system, which entered service in February 2012, is dedicated to the development of applications in state-of-the-art computational sciences that require the accelerated computing performance provided by such large-scale GPU clusters. The Base Cluster system was later extended with additional nodes that include a special feature named the Tightly Coupled Accelerators (TCA) architecture, which enables direct communication between GPUs across computation nodes over an external PCI Express link. The TCA system, which was developed at CCS, is based on field programmable gate array (FPGA) technology and is the main focus of the HA-PACS Project, which is supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT). The extended system, named HA-PACS/TCA, was completed in October 2013 with 364 Tflops of peak performance, and the performance of the entire system reached 1.166 Pflops.
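The peak-performance figures quoted for HA-PACS can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers stated above (802 Tflops over 268 nodes, plus 364 Tflops for the TCA extension). The constant and function names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the HA-PACS description above (all in Tflops).
BASE_PEAK_TFLOPS = 802.0   # Base Cluster peak performance
BASE_NODES = 268           # Base Cluster computation nodes
TCA_PEAK_TFLOPS = 364.0    # peak performance of the TCA extension


def per_node_peak_tflops(total_tflops: float, nodes: int) -> float:
    """Average peak performance per node, in Tflops."""
    return total_tflops / nodes


def combined_peak_pflops(*parts_tflops: float) -> float:
    """Sum several peak figures given in Tflops and convert to Pflops."""
    return sum(parts_tflops) / 1000.0


node_peak = per_node_peak_tflops(BASE_PEAK_TFLOPS, BASE_NODES)
total_peak = combined_peak_pflops(BASE_PEAK_TFLOPS, TCA_PEAK_TFLOPS)

print(f"Base Cluster peak per node: {node_peak:.2f} Tflops")
print(f"Base Cluster + TCA peak: {total_peak:.3f} Pflops")
```

The per-node average agrees with the "approximately 3 Tflops" stated for one node, and the sum agrees with the 1.166 Pflops quoted for the entire system.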
A HA-PACS/TCA movie is available on YouTube.
T2K-Tsukuba was shut down in February 2014.
T2K-Tsukuba is a large-scale PC cluster with a large memory capacity, high-performance computational nodes in high-density packaging, and an ultra-high-bandwidth interconnection network to maximize these capabilities. Each computational node consists of four 2.3 GHz quad-core AMD Opteron sockets and 32 GB of DDR2 SDRAM memory. To support a wide variety of high-performance parallel processing, all the nodes are connected by a quad-rail 4xDDR InfiniBand interconnection network. The system is configured as a single PC cluster with 648 computational nodes connected by a Fat-Tree network with 5.28 TB/s of bisection bandwidth. The total number of CPU cores is 10,368, providing 95.4 TFLOPS of peak performance and 20.7 TB of total memory capacity. Parallel processing with 16 cores in shared memory on each multi-core, multi-socket computational node, with up to 648 of these nodes in a distributed memory configuration, or a mixture of the two, provides ultra-high performance to support a variety of high-end computational sciences. The high bandwidth of the flat Fat-Tree network also enables effective parallel processing and flexible job scheduling. T2K-Tsukuba was ranked 20th in the TOP500 list of June 2008. The system was installed under the T2K Open Supercomputer Alliance with the University of Tokyo and Kyoto University, and grid operation is available with the other T2K systems at these universities, which share the same basic system architecture as T2K-Tsukuba.
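The aggregate figures for T2K-Tsukuba follow from the per-node specification. The sketch below recomputes them from the numbers in the text. One value is not stated in the source and is an assumption: each Opteron core is taken to complete 4 double-precision floating-point operations per cycle. Terabytes are taken as decimal (1 TB = 1000 GB), and all names are hypothetical.

```python
# Per-node specification quoted in the T2K-Tsukuba description above.
NODES = 648
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 4
CLOCK_GHZ = 2.3
MEMORY_PER_NODE_GB = 32

# ASSUMPTION (not stated in the source): double-precision floating-point
# operations completed per core per clock cycle.
FLOPS_PER_CYCLE = 4

# Core count: nodes x sockets per node x cores per socket.
cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Peak performance: cores x clock (GHz) x flops per cycle gives GFLOPS;
# divide by 1000 for TFLOPS.
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

# Total memory, using decimal terabytes (1 TB = 1000 GB).
memory_tb = NODES * MEMORY_PER_NODE_GB / 1000.0

print(f"Total cores: {cores}")
print(f"Peak performance: {peak_tflops:.1f} TFLOPS")
print(f"Total memory: {memory_tb:.1f} TB")
```

Under that assumption, the results agree with the 10,368 cores, 95.4 TFLOPS and 20.7 TB quoted above.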
FIRST has been shut down.
Large-scale simulation of the formation of objects in the universe requires high-speed computation of the gravitational interactions among all matter, as well as of its dynamics and of radiative energy transport. For this purpose, we developed a specialized PC cluster named “Astrophysics Simulator FIRST”, in which a special-purpose gravity computation board, Blade-GRAPE, is attached to every computational node, and started its operation in March 2007. The FIRST cluster consists of 256 computational nodes, each with two Intel Xeon CPUs and a Blade-GRAPE board, and is connected to two file servers providing 4.5 TB of disk capacity in total. It provides 38.5 TFLOPS of peak performance (3.3 TFLOPS from the general-purpose CPUs and 35.2 TFLOPS from Blade-GRAPE) and 1.6 TB of memory capacity. Besides the centralized shared file system, the distributed local disk drives of all computational nodes together form a logically shared file system managed by the Gfarm system, providing 22 TB of capacity in total. FIRST is a unique heterogeneous computing system that enables very high-resolution astrophysics simulation, and it has produced a variety of valuable results.
PACS-CS was shut down in September 2011. Its successor, “HA-PACS”, will start operation in February 2012.
The PACS-CS system is a PC cluster consisting of 2560 nodes connected by 20480 Gigabit Ethernet cables. The system achieved 10.35 TFLOPS in the Linpack Benchmark, ranking 34th on the June 2006 Top 500 List. To support large-scale parallel computation in the computational sciences, the system was designed in a bandwidth-aware way: each node is equipped with a single CPU for high memory bandwidth, and the interconnection network for parallel processing is configured as a multi-dimensional Hyper-Crossbar Network based on trunking of Gigabit Ethernet for high network bandwidth and cost-performance. The project aimed not only to develop a new cluster computer for computational sciences but also to foster interdisciplinary collaboration. The full system started operation on 1 July 2006, and formal commissioning took place on 3 September 2007. As of August 2005, Hitachi Ltd. was the main contractor for the system hardware and software, and Fujitsu Ltd. was a developer of the network software. The development of PACS-CS was carried out through close collaboration between the researchers of the Center and the two vendors.