◆We will hold a booth talk session in the Exhibition at SC19!
“FPGA-FPGA Optical Communication Feature”
We have invited three specialists who are developing features and technologies that exploit the high-speed (up to 100 Gbps) optical links of recent FPGA chips. This is a good opportunity to share knowledge, experience, and technology on how to use these links for HPC and other applications.
Booth Talk Schedule @Booth #1943
Nov. 20 (Wed)
CCS, University of Tsukuba
“CIRCUS: Computation/Communication Unified Framework on FPGA and its optical link”
“MPI-style streaming messaging on FPGAs”
“Networks of FPGA Cluster with High Flexibility of Resource Allocation”
The latest FPGAs provide multiple ports of very high-performance external communication links, as well as a large capacity of logic elements for computation and memory. For various HPC applications, we are developing a framework that unifies computation and communication in a pipelined manner, with an easy-to-use API for user-level OpenCL programming. The system, named CIRCUS (Communication Integrated Reconfigurable CompUting System), runs on Cygnus, our new multi-hybrid cluster at CCS, University of Tsukuba. In this talk we will present the concept and an overview of the system, along with several performance evaluations.
Torsten Hoefler “MPI-style streaming messaging on FPGAs”
Abstract: Distributed memory programming is the established paradigm used in high-performance computing (HPC) systems, requiring explicit communication between nodes and devices. When FPGAs are deployed in distributed settings, communication is typically handled either by going through the host machine, sacrificing performance, or by streaming across fixed device-to-device connections, sacrificing flexibility. We present Streaming Message Interface (SMI), a communication model and API that unifies explicit message passing with a hardware-oriented programming model, facilitating minimal-overhead, flexible, and productive inter-FPGA communication. Instead of bulk transmission, messages are streamed across the network during computation, allowing communication to be seamlessly integrated into pipelined designs. We present a high-level synthesis implementation of SMI targeting a dedicated FPGA interconnect, exposing runtime-configurable routing with support for arbitrary network topologies, and implement a set of distributed memory benchmarks. Using SMI, programmers can implement distributed, scalable HPC programs on reconfigurable hardware, without deviating from best practices for hardware design.
We are researching FPGA cluster networks using an experimental FPGA cluster that offers high flexibility in allocating FPGA resources to host CPUs. In this talk, we will present the progress of the system's development.