Taisuke Boku, Professor, Deputy Director of CCS
Dr. Taisuke Boku has participated in the development of most of the supercomputer systems at CCS, such as CP-PACS, FIRST and PACS-CS, as an HPC system researcher. All of these systems were designed and developed through close discussion and collaboration with researchers in the application divisions of CCS. His research interests include high-performance interconnection networks, large-scale cluster system design, parallel programming and tuning for real applications, and, most recently, GPU-accelerated computing.
About our group
To meet the demand for ultra-high-speed, large-capacity computing resources in cutting-edge computational science, the High Performance Computing Systems Division investigates a wide variety of HPC system hardware and software. Through collaboration with the application divisions of the center, we are working toward HPC systems that are practically suited to real-world application problems.
Our research targets span a wide range of HPC topics, including high-performance computing architecture, parallel programming languages, massively parallel numerical algorithms and libraries, GPU-accelerated computing systems, large-scale distributed storage systems and grid/cloud technology. Recent research topics are described below.
PEARL: high performance, low power and reliable interconnection network
PEARL is an interconnection communication link based on PCI Express Gen 2 technology, intended to cover systems ranging from low-power embedded systems to high-performance clusters. We are developing a custom communication chip named PEACH to realize PEARL, not only for inter-node connections but also for connecting intra-node peripheral devices such as GPUs.
Fig.1 PEARL network board with PCI-E interface for general PC servers with prototype PEACH chip (left) and communication performance with various numbers of lanes (right).
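As a rough guide to how link width affects a PCI Express Gen 2 based link, the sketch below computes the theoretical peak bandwidth per direction from the Gen 2 signalling rate (5 GT/s per lane) and its 8b/10b line coding. The function name and the lane counts in the loop are illustrative assumptions, and the results are upper bounds from the PCI Express specification, not the measured PEARL values shown in Fig.1.

```python
# Theoretical peak bandwidth of a PCI Express Gen 2 link, per direction.
# These are specification-level upper bounds, not measured PEARL results.

GEN2_TRANSFERS_PER_SEC = 5.0e9  # 5 GT/s per lane (PCIe Gen 2 signalling rate)
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b coding: 8 payload bits per 10 bits sent


def gen2_peak_bytes_per_sec(lanes: int) -> float:
    """Return the peak payload bandwidth in bytes/s for `lanes` Gen 2 lanes.

    `gen2_peak_bytes_per_sec` is a hypothetical helper for illustration only.
    """
    if lanes < 1:
        raise ValueError("lanes must be a positive integer")
    payload_bits = lanes * GEN2_TRANSFERS_PER_SEC * ENCODING_EFFICIENCY
    return payload_bits / 8  # bits -> bytes


if __name__ == "__main__":
    # Lane counts chosen for illustration; the source does not list them.
    for lanes in (1, 2, 4, 8):
        gbytes = gen2_peak_bytes_per_sec(lanes) / 1e9
        print(f"x{lanes}: {gbytes:.1f} GB/s per direction")
```

Measured throughput is lower than these figures in practice, because packet headers and flow control consume part of the link capacity.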
XcalableMP: next generation large scale parallel programming language
In next-generation supercomputers with hundreds of thousands of distributed-memory nodes, traditional message-passing programming severely reduces software productivity. To retain a high degree of freedom for performance tuning while making large-scale parallel programming easier, we are designing and implementing a new language named XcalableMP. It provides OpenMP-like directive-based extensions to C and Fortran, offering a global-view model for data array handling as well as a PGAS-like local-view model for writing highly tuned parallel programs.
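The idea behind a global-view model can be sketched in a few lines: the programmer describes one global array, and a distribution rule decides which node owns which indices. The Python below is not XcalableMP syntax (XcalableMP uses directives in C and Fortran); it only illustrates a block distribution, and the helper names and the ceiling-sized block rule are assumptions made for this example.

```python
# Conceptual sketch of a block distribution of a global array over nodes.
# This is plain Python for illustration, not XcalableMP code.


def block_size(global_size: int, num_nodes: int) -> int:
    """Block length when indices are split into ceiling-sized chunks."""
    if global_size < 1 or num_nodes < 1:
        raise ValueError("global_size and num_nodes must be positive")
    return -(-global_size // num_nodes)  # ceiling division


def block_range(global_size: int, num_nodes: int, rank: int) -> tuple:
    """Half-open range [lo, hi) of global indices owned by node `rank`."""
    chunk = block_size(global_size, num_nodes)
    lo = min(rank * chunk, global_size)
    hi = min(lo + chunk, global_size)
    return lo, hi


def owner(global_index: int, global_size: int, num_nodes: int) -> int:
    """Rank of the node that owns `global_index`."""
    return global_index // block_size(global_size, num_nodes)


def global_sum(data: list, num_nodes: int) -> float:
    """Sum a global array the way a distributed loop with a reduction would:
    each node sums only its own block, then the partial sums are combined."""
    partial_sums = []
    for rank in range(num_nodes):
        lo, hi = block_range(len(data), num_nodes, rank)
        partial_sums.append(sum(data[lo:hi]))
    return sum(partial_sums)


if __name__ == "__main__":
    n, nodes = 10, 4
    for rank in range(nodes):
        print(rank, block_range(n, nodes, rank))
    print(global_sum(list(range(n)), nodes))
```

In a directive-based language the index arithmetic above is generated by the compiler from the distribution directive, so the source code keeps the shape of the sequential loop.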
Gfarm: large scale wide area distributed file system
To make use of computing resources spread widely around the world, a high-performance, large-scale shared file system is essential so that processes can be distributed freely. Gfarm is an open-source distributed file system developed in our division. It can support hundreds of distributed nodes, hundreds of TBytes of distributed storage capacity and thousands of file operations per second. Thanks to its carefully designed system structure, the metadata servers do not become a performance bottleneck, and the system scales well with the number of clients.
High performance and large scale parallel numerical algorithms
FFT-E is an open-source, high-performance parallel FFT library developed in our division. It has an automatic tuning feature and is available on systems ranging from PC clusters to MPP systems. Our recent research on Block Krylov subspace iterative methods introduced the newly developed Block BiCGGR method, which provides both high-accuracy solutions and low iteration counts for linear systems with multiple right-hand sides. This method is in actual use for solving large-scale QCD problems at the center.
Fig.2 3-D FFT performance on T2K-Tsukuba (left) and residual reduction in the Block BiCGGR method with Jacobi preconditioning (right)
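For readers unfamiliar with how a 3-D FFT is organized, the sketch below computes it as successive passes of 1-D FFTs along each axis. This is one common decomposition, shown here serially in plain Python; it is not the FFT-E API or its implementation, and the function names `fft1d` and `fft3d` are assumptions for this example. The 1-D transform is a textbook recursive radix-2 algorithm, so every dimension must be a power of two.

```python
import cmath


def fft1d(x: list) -> list:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def fft3d(a: list) -> list:
    """3-D FFT of a nested list a[x][y][z], as 1-D FFTs along z, y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in a]

    # Pass 1: transform every line along the z axis.
    for i in range(nx):
        for j in range(ny):
            out[i][j] = fft1d(out[i][j])

    # Pass 2: transform every line along the y axis.
    for i in range(nx):
        for k in range(nz):
            line = fft1d([out[i][j][k] for j in range(ny)])
            for j in range(ny):
                out[i][j][k] = line[j]

    # Pass 3: transform every line along the x axis.
    for j in range(ny):
        for k in range(nz):
            line = fft1d([out[i][j][k] for i in range(nx)])
            for i in range(nx):
                out[i][j][k] = line[i]

    return out


if __name__ == "__main__":
    # A unit impulse at the origin transforms to a constant spectrum.
    impulse = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    impulse[0][0][0] = 1.0
    print(fft3d(impulse))
```

On a distributed-memory machine each pass can run independently on the lines a node holds, and the data are typically redistributed between passes, which is where interconnect performance matters.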