MVAPICH-Plus Changelog
-----------------------
This file briefly describes the changes to the MVAPICH-Plus software package.
The logs are arranged in the "most recent first" order.

MVAPICH-Plus 3.0 (3.0 GA Released 03/08/2024)
                 (3.0rc Released 12/22/2023)
                 (3.0b Released 11/01/2023)
                 (3.0a2 Released 07/19/2023)
                 (3.0a Released 11/10/2022)

* Features and Enhancements
    - Based on MVAPICH 3.0
    - Support for various high-performance communication fabrics
        - InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
    - Support for a naive CPU staging approach for small-message collectives
        - Tune naive limits for the following systems
            - Frontier@OLCF, Pitzer@OSC, Owens@OSC, Ascend@OSC,
              Frontera@TACC, Lonestar6@TACC, ThetaGPU@ALCF, Polaris@ALCF,
              Tioga@LLNL
    - Initial support for blocking collectives on NVIDIA and AMD GPUs
        - Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast,
          Gather, Gatherv, Reduce, Reduce_scatter, Scatter, Scatterv,
          Reduce_local, Reduce_scatter_block
    - Initial support for non-blocking GPU collectives on NVIDIA and AMD GPUs
        - Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast,
          Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
    - Enhanced collective and pt2pt tuning for NVIDIA Grace-Hopper systems
    - Enhanced collective tuning for NVIDIA V100, A100, and H100 GPUs
    - Enhanced collective tuning for AMD MI100 and MI250X GPUs
    - Enhanced support for blocking and non-blocking GPU-to-GPU
      point-to-point operations on NVIDIA and AMD GPUs taking advantage of:
        - NVIDIA GDRCopy, AMD LargeBar support
        - CUDA and ROCm IPC support
    - Enhanced CPU tuning on various HPC systems and architectures
        - Stampede3@TACC, Frontier@OLCF, Lonestar6@TACC
        - AMD Rome, AMD Milan, Intel Sapphire Rapids
    - Tested with
        - Various HPC applications, mini-applications, and benchmarks
        - MPI4cuML (a custom cuML package with MPI support)
    - Tested with CUDA <= 12.3
    - Tested with ROCm <= 5.6.0