MVAPICH/MVAPICH2 Project
Ohio State University



Performance Impact of Checkpoint-Restart on Applications

  • Experimental Testbed: 128 Intel Westmere servers, each with 8 processing cores and a Mellanox MT26428 InfiniBand QDR adapter, running RHEL 6. The OpenFabrics OFED 1.5.3 InfiniBand software stack is used for subnet management. A Lustre parallel file system with 8 Object Storage Servers, running over native InfiniBand, is used to store the application checkpoints generated by the BLCR v0.8.5 checkpointing library.
  • The left side of the bar graph below shows the total runtime of the High-Performance Linpack (HPL) benchmark, run with 512 MPI ranks, for a varying number of checkpoint snapshots. For the input size used (N = 64000), the aggregate size of a single checkpoint is ~40GB. The graph on the right shows a breakdown of the time spent in the different phases of the checkpoint/restart protocol. As these graphs indicate, the performance overhead of the checkpointing protocol is minimal; a sketch of how a checkpoint can be requested from within an application follows below.
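
    For reference, checkpoints of an MPI job such as the HPL run above can be taken transparently at a fixed interval, or requested from within the application itself. The sketch below is a minimal illustration of the latter, assuming an MVAPICH2 build configured with checkpoint/restart support and the MVAPICH2_Sync_Checkpoint() collective described in the MVAPICH2 user guide; it is not the mechanism used to produce the measurements above.

        #include <stdio.h>
        #include <mpi.h>

        /* Prototype for the application-initiated checkpoint call; assumed
         * to be provided by MVAPICH2 when it is built with
         * checkpoint/restart support. */
        void MVAPICH2_Sync_Checkpoint(void);

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);

            /* ... first phase of computation ... */

            /* All ranks synchronize, then request a whole-program
             * checkpoint; the resulting BLCR images are written to the
             * checkpoint directory configured for the job. */
            MPI_Barrier(MPI_COMM_WORLD);
            MVAPICH2_Sync_Checkpoint();
            MPI_Barrier(MPI_COMM_WORLD);

            /* ... computation continues; after a failure, the job can be
             * restarted from the saved checkpoint ... */

            MPI_Finalize();
            return 0;
        }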

  • The performance impact of the checkpoint-restart mechanism provided with MVAPICH2 is illustrated in the graph below, using the ENZO cosmology simulation as a representative application. The Radiation Transport sample workload distributed with the application was executed with 512 MPI ranks. For the input parameters used, the aggregate size of a single checkpoint is ~13GB. The graph compares the default checkpoint-restart mechanism in MVAPICH2 with the SCR-assisted multi-level checkpointing mechanism; a sketch of how SCR attaches to an application is shown below.
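
    To give a flavor of how SCR-assisted checkpointing integrates with an application, the sketch below uses SCR's documented C API (SCR_Init, SCR_Need_checkpoint, SCR_Start_checkpoint, SCR_Route_file, SCR_Complete_checkpoint). The file name and timestep loop are illustrative only, not taken from the ENZO runs above.

        #include <stdio.h>
        #include <mpi.h>
        #include "scr.h"

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            SCR_Init();                          /* initialize SCR after MPI */

            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            for (int step = 0; step < 100; step++) {
                /* ... one timestep of computation ... */

                int need_ckpt = 0;
                SCR_Need_checkpoint(&need_ckpt); /* let SCR pace checkpoints */
                if (!need_ckpt)
                    continue;

                SCR_Start_checkpoint();

                /* Ask SCR where this rank's file should go; SCR may place
                 * it in node-local storage and apply the configured
                 * redundancy scheme, draining to the parallel file system
                 * as needed. */
                char name[SCR_MAX_FILENAME], path[SCR_MAX_FILENAME];
                snprintf(name, sizeof(name), "rank_%d_step_%d.ckpt",
                         rank, step);
                SCR_Route_file(name, path);

                int valid = 0;
                FILE *fp = fopen(path, "w");
                if (fp != NULL) {
                    /* ... write this rank's state ... */
                    valid = (fclose(fp) == 0);
                }
                SCR_Complete_checkpoint(valid);  /* collective completion */
            }

            SCR_Finalize();
            MPI_Finalize();
            return 0;
        }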

  • The SCR library implements three redundancy schemes that trade off performance, storage space, and reliability. The graph below compares the checkpoint-writing time of these schemes against the default model of writing to the parallel file system. The aggregate checkpoint size for each of these 512-rank runs was ~50GB.
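
    The scheme used for a given run is selected through SCR's configuration. Below is a minimal sketch, assuming the SCR_COPY_TYPE parameter (with values such as SINGLE, PARTNER, and XOR) can be supplied through the environment before SCR_Init(), as SCR's parameter documentation suggests; in practice it is usually set in an SCR configuration file or the job script instead.

        #include <stdlib.h>
        #include <mpi.h>
        #include "scr.h"

        int main(int argc, char **argv)
        {
            /* Select the redundancy scheme before SCR starts up; "XOR"
             * here is one of the assumed SCR_COPY_TYPE values, alongside
             * "SINGLE" and "PARTNER". */
            setenv("SCR_COPY_TYPE", "XOR", 1);

            MPI_Init(&argc, &argv);
            SCR_Init();

            /* ... checkpointing loop as in the previous sketch ... */

            SCR_Finalize();
            MPI_Finalize();
            return 0;
        }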