|Taisuke Boku, Professor, Director of CCS
Dr. Taisuke Boku got his Master and Ph. D. degrees from Department of Electrical Engineering, Keio University. After his career as an Assistant Professor in Department of Physics, Keio University, he moved to the University of Tsukuba and has been continuously working at Center for more than 25 years from its establishment and is serving as Director of CCS since 2019. He has participated in the development of most of the supercomputer systems in CCS such as CP-PACS, FIRST, PACS-CS and HA-PACS as a HPC system researcher. All of these systems were designed and developed through discussion and collaboration with researchers in other Center application divisions. Dr. Boku’s research interests include high performance interconnection networks, large scale cluster system design, parallel programming and tuning for real applications, and most recently, GPU and FPGA accelerated computing.
In order to respond to demands for cutting-edge, ultra high-speed, and large capacity computation resources for the computational sciences, the High Performance Computing Systems (HPCS) Division in the Center for Computational Science (CCS) is investigating a wide variety of HPC hardware and software systems. Through collaborative work with other application divisions in the Center, we are researching the creation of ideal HPC systems that are most suitable for application to real world problems.
Our research targets, which are spread across various HPC technologies, include high performance computing architecture, parallel programming language, massively parallel numerical algorithms and libraries, Graphics Processing Unit (GPU)-accelerated computing systems, large scale distributed storage systems, and grid/cloud technology. The following are among our more recent research topics.
A multi-hybrid accelerated computing system
Graphics processing units (GPUs) have been widely used in high-performance computing (HPC) systems as accelerators by their extreme peak performance and high memory bandwidth. However, the GPU does not work well on applications that employ complicated algorithms using partially poor parallelism, much of conditional branches or frequent inter-node communication. To address this problem, we combine field-programmable gate arrays (FPGAs) with GPUs and make FPGAs not only cover GPU on non-suited computation but also perform high-speed communication. This system concept is called Accelerator in Switch (AiS) and we believe that it offers better strong scaling. Our brand-new supercomputer Cygnus incorporates this concept and it has been in operation since April 2019. In future, we will propose a next generation accelerated computing system by applying this concept to computational application and demonstrating its effectiveness.
Fig.2 FPGA-to-FPGA communication performance on 100Gbps link (Left), GPU-FPGA communication performance with DMA transfer (Right)
XcalableMP is a PGAS-base large scale parallel programming language. In future supercomputer generations, which are expected to have millions of cores and distributed memory architecture, traditional message passing programming would strongly reduce software productivity. To maintain high levels of performance tuning in large-scale parallel programming ease, we have designed and implemented a new language named XcalableMP. This language provides OpenMP-like directive base extension to C and Fortran to permit a global view model of data array handling in the manner of Partitioned Global Address Space (PGAS) concept as well as local-view modeling when describing highly tuned parallel programs. We are also developing an extension of XcalableMP for accelerating devices named XMP-dev in order to cover, for example, large-scale GPU clusters.
Gfarm is a large-scale wide area distributed file system. In order to utilize computing resources spread widely throughout the world, high performance large-scale shared file systems are essential to process distribution freedom. Gfarm is an open-source distributed file system developed in our division that has the capability to support thousands of distributed nodes, tens of petabytes of distributed storage capacity, and thousands of file handling operations per second. With carefully designed system construction to ensure the metadata servers do not cause performance bottlenecks, it also provides a large degree of scalability for client counts. Gfarm is officially adopted as the High Performance Computing Infrastructure (HPCI) shared file system and provides nationwide access to and from any supercomputer in Japan.
Fig.3 Meta-data server scalability of PPMDS
Fig.4 The number of running cores during workflow execution with 2 configurations (Left: Conventional method of GPFS, Right: Proposed method with Pwrake + Gfarm)
High performance and large scale parallel numerical algorithms
FFTE is an open-source high performance parallel fast Fourier transform (FFT) library developed in our division that provides an automatic tuning feature that is available from PC clusters to MPP systems. We are developing Block Krylov subspace methods for computing high accuracy approximate solutions of linear systems with multiple right-hand sides. The Block GWBiCGSTAB method developed in our division can generate high accuracy approximate solutions than the conventional methods.
Fig. 5 Performance of parallel 1-D FFTs on Oakforest-PACS (1024 nodes) (Left). Comparison of the accuracy of the approximate solutions among three Block Krylov subspace methods (Right).
(Update: 2019 Dec. 18)