133rd Colloquium of the Center for Computational Sciences

Title: I/O on Hierarchical Storage Systems: The Past, Present, and Future

Speaker: Dr. Kathryn Mohror (Lawrence Livermore National Laboratory)

Date: February 12, 2020 (Wed)

Time: 13:00-14:00

Venue: Center for Computational Sciences, International Workshop Room

Language: English

High-end supercomputing systems generally achieve increased computing
speeds by increasing the number of computing cores in the system. This
strategy is successful in achieving system FLOP goals, but as
applications running on these systems complete their compute
activities more quickly or at a finer granularity, they ingest and
produce data at faster rates. The performance of data management tasks
is critical for achieving practical scientific throughput on
leadership-class systems, but data management infrastructure is
generally an afterthought compared to the spotlight given to compute
infrastructure.  In this talk, I will discuss issues surrounding I/O
and data management performance on high-end systems and several
strategies that we have employed over the years to address these
issues. In particular, I will discuss multilevel checkpointing with
the Scalable Checkpoint/Restart Library (SCR) and disjoint storage
utilization with the UnifyFS burst buffer file system. Additionally, I
will briefly discuss our plans for supporting more complex application
workflows on future platforms.


Kathryn Mohror is the Group Leader for the Data Analysis Group in the
Center for Applied Scientific Computing (CASC) at Lawrence Livermore
National Laboratory (LLNL). Kathryn’s research on high-end computing
systems is currently focused on I/O for extreme-scale systems. Her
other research interests include scalable performance analysis and
tuning, fault tolerance, and parallel programming paradigms. Kathryn
has been working at LLNL since 2010 and is a 2019 recipient of the DOE
Early Career Award.

Kathryn’s current research focuses primarily on user-level file
systems for HPC in the Unify project and on scalable I/O with the
Scalable Checkpoint/Restart Library (SCR), an R&D100 Award-winning
multilevel checkpointing library that has been shown to significantly
reduce checkpointing overhead. She is also a Co-Chair of the
Administrative Steering Committee for PMIx, a portable interface for
tools and applications to interact with system management
software. She was the lead for the Tools Working Group for the MPI
Forum from 2013 to 2019 and served as the Scientific Editor for LLNL’s
Science & Technology Review in 2018.

Kathryn received her Ph.D. in Computer Science in 2010, an M.S. in
Computer Science in 2004, and a B.S. in Chemistry in 1999 from
Portland State University (PSU) in Portland, OR.

Coordinator: Osamu Tatebe