Impact of collective tuning for applications using the MPI+OpenMP programming model
Application execution time with 512 cores on Stampede
- The partial-subscription nature of hybrid MPI+OpenMP programming (fewer MPI processes than cores on each node) requires a new level of collectives tuning
- For PPN=2 (Processes Per Node), the tuned version of MPI_Reduce shows a 51% improvement on 2,048 cores
- We see a 4% improvement with the LULESH application on 512 cores (8 OpenMP threads per MPI process)
Library Version: MVAPICH2 2.2b
Runtime Flags: The appropriate tuning parameters for hybrid MPI+OpenMP programming models are enabled by default starting from MVAPICH2-2.2b
System Details: Stampede @ TACC: Sandy Bridge architecture with dual 8-core nodes and ConnectX-3 FDR InfiniBand interconnect
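To make the partial-subscription layout concrete, the short Python sketch below works out how many nodes and MPI processes the core counts above correspond to. It is an illustration only: the helper `hybrid_layout` and its parameter names are hypothetical, and it assumes fully used nodes with one OpenMP thread per core, which the page does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridLayout:
    nodes: int           # number of compute nodes used
    ranks_per_node: int  # MPI processes per node (PPN)
    total_ranks: int     # MPI processes across the whole job


def hybrid_layout(total_cores: int, cores_per_node: int,
                  threads_per_rank: int) -> HybridLayout:
    """Derive the process layout of a hybrid MPI+OpenMP job.

    Assumption (not from the source): every core of every node is
    occupied by exactly one OpenMP thread.
    """
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads_per_rank must divide cores_per_node")
    nodes = total_cores // cores_per_node
    ranks_per_node = cores_per_node // threads_per_rank
    return HybridLayout(nodes, ranks_per_node, nodes * ranks_per_node)


# Dual 8-core nodes, as listed under System Details.
CORES_PER_NODE = 2 * 8

# LULESH run: 512 cores, 8 OpenMP threads per MPI process.
lulesh = hybrid_layout(512, CORES_PER_NODE, threads_per_rank=8)

# MPI_Reduce run: 2,048 cores; 8 threads per process is assumed here
# because it is what yields PPN=2 on 16-core nodes.
reduce_run = hybrid_layout(2048, CORES_PER_NODE, threads_per_rank=8)

print(lulesh)
print(reduce_run)
```

Under these assumptions, both runs place 2 MPI processes on each 16-core node, so collectives such as MPI_Reduce see far fewer processes per node than a pure-MPI job with 16 processes per node would.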
Tags: Lulesh
Submitted by Jerome Vienne and Carlos Rosales-Fernandez @ TACC
Last Modified April 19, 2016, 10:19 a.m.