MVAPICH-Plus Changelog
-----------------------
This file briefly describes the changes to the MVAPICH-Plus software package.
The logs are arranged in the "most recent first" order.


MVAPICH-Plus 3.0
(3.0 GA Released 03/08/2024)
(3.0rc Released 12/22/2023)
(3.0b Released 11/01/2023)
(3.0a2 Released 07/19/2023)
(3.0a Released 11/10/2022)

* Features and Enhancements
    - Based on MVAPICH 3.0
    - Support for various high-performance communication fabrics
        - InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
    - Support for a naive CPU staging approach for small-message collectives
        - Tuned naive staging limits for the following systems
            - Frontier@OLCF, Pitzer@OSC, Owens@OSC, Ascend@OSC, Frontera@TACC,
              Lonestar6@TACC, ThetaGPU@ALCF, Polaris@ALCF, Tioga@LLNL
    - Initial support for blocking collectives on NVIDIA and AMD GPUs
        - Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather,
          Gatherv, Reduce, Reduce_scatter, Scatter, Scatterv, Reduce_local,
          Reduce_scatter_block
    - Initial support for non-blocking collectives on NVIDIA and AMD GPUs
        - Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast,
          Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
    - Enhanced collective and pt2pt tuning for NVIDIA Grace Hopper systems
    - Enhanced collective tuning for NVIDIA V100, A100, H100 GPUs
    - Enhanced collective tuning for AMD MI100 and MI250X GPUs
    - Enhanced support for blocking and non-blocking GPU-to-GPU point-to-point
      operations on NVIDIA and AMD GPUs, taking advantage of:
        - NVIDIA GDRCopy, AMD LargeBar support
        - CUDA and ROCm IPC support
    - Enhanced CPU tuning on various HPC systems and architectures
        - Stampede3@TACC, Frontier@OLCF, Lonestar6@TACC
        - AMD Rome, AMD Milan, Intel Sapphire Rapids
    - Tested with
        - Various HPC applications, mini-applications, and benchmarks
        - MPI4cuML (a custom cuML package with MPI support)
    - Tested with CUDA <= 12.3
    - Tested with ROCm <= 5.6.0