MVAPICH2-X Changelog
--------------------

This file briefly describes the changes to the MVAPICH2-X software package.
The logs are arranged in the "most recent first" order.

MVAPICH2-X 2.3 (06/10/2020)

* Features and Enhancements (since 2.3rc3):
    - MPI Features
        - Based on MVAPICH2 2.3.4
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Enhanced point-to-point and collective tunings for AMD EPYC,
          Catalyst@EPCC, Mayer@Sandia, Azure@Microsoft, AWS, and
          Frontera@TACC
    - MPI (Advanced) Features
        - Optimized support for large message MPI_Allreduce and MPI_Reduce
            - OFA-IB-CH3 and OFA-IB-RoCE interfaces
        - Improved performance for communication using DC transport
            - OFA-IB-CH3 interface
        - Enhanced support for AWS EFA adapter and SRD transport protocol
            - OFA-IB-CH3 interface
        - Enhanced point-to-point and collective tuning for AWS EFA adapter
          and SRD transport protocol
            - OFA-IB-CH3 interface
        - Add multiple MPI_T PVARs and CVARs for point-to-point and
          collective operations
        - Tuning for MPI collective operations for Intel Broadwell, Intel
          CascadeLake, Azure HB (AMD EPYC), and Azure HC (Intel Skylake)
          systems
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Support for OSU InfiniBand Network Analysis and Management
          (OSU INAM) Tool v0.9.6
    - Unified Runtime Features
        - Based on MVAPICH2 2.3.4 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.3.4 are available in
          MVAPICH2-X 2.3

* Bug Fixes (since 2.3rc3):
    - Fix issues in UD-Hybrid code path
    - Fix various compilation warnings and memory leaks

MVAPICH2-X 2.3rc3 (03/03/2020)

* Features and Enhancements (since 2.3rc2):
    - MPI Features
        - Based on MVAPICH2 2.3.3
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Enhanced point-to-point and collective tunings for Fulhame@EPCC,
          Catalyst@EPCC, Mayer@Sandia, Azure@Microsoft, and Frontera@TACC
    - MPI (Advanced) Features
        - Improved performance for communication using DC transport
            - OFA-IB-CH3 interface
        - Add support for AWS EFA adapter and SRD transport protocol
            - OFA-IB-CH3 interface
        - Add point-to-point and collective tuning for AWS EFA adapter and
          SRD transport protocol
            - OFA-IB-CH3 interface
        - Add improved Allgatherv algorithm for small messages
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Tuning for XPMEM-based MPI collective operations for Intel
          Broadwell, Intel CascadeLake, Azure HB (AMD EPYC), and Azure HC
          (Intel Skylake) systems
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Support for OSU InfiniBand Network Analysis and Management
          (OSU INAM) Tool v0.9.5
    - Unified Runtime Features
        - Based on MVAPICH2 2.3.3 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.3.3 are available in
          MVAPICH2-X 2.3rc3

* Bug Fixes (since 2.3rc2):
    - Fix various compilation warnings and memory leaks

MVAPICH2-X 2.3rc2 (04/01/2019)

* Features and Enhancements (since 2.3rc1):
    - MPI Features
        - Based on MVAPICH2 2.3.1
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
    - MPI (Advanced) Features
        - Improved performance of large message communication
        - Support for advanced co-operative (COOP) rendezvous protocols in
          SMP channel
            - OFA-IB-CH3 and OFA-IB-RoCE interfaces
        - Support for RGET, RPUT, and COOP protocols for CMA and XPMEM
        - Support for load balanced and dynamic rendezvous protocol
          selection
            - OFA-IB-CH3 and OFA-IB-RoCE interfaces
        - Support for XPMEM-based MPI collective operations (Broadcast,
          Gather, Scatter, Allgather)
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
        - Extend support for XPMEM-based MPI collective operations (Reduce
          and All-Reduce) for PSM-CH3 and PSM2-CH3 interfaces
        - Improved connection establishment for DC transport
            - OFA-IB-CH3 interface
        - Add improved Alltoallv algorithm for small messages
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
    - OpenSHMEM Features
        - Support for XPMEM-based collective operations (Broadcast, Collect,
          Reduce_all, Reduce, Scatter, Gather)
    - UPC Features
        - Support for XPMEM-based collective operations (Broadcast, Collect,
          Scatter, Gather)
    - UPC++ Features
        - Support for XPMEM-based collective operations (Broadcast, Collect,
          Scatter, Gather)
    - Unified Runtime Features
        - Based on MVAPICH2 2.3.1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.3.1 are available in
          MVAPICH2-X 2.3rc2

* Bug Fixes (since 2.3rc1):
    - Fix type overflows caused by very large messages
    - Fix various compilation warnings and memory leaks

MVAPICH2-X 2.3rc1 (09/21/2018)

* Features and Enhancements (since 2.3b):
    - MPI Features
        - Based on MVAPICH2 2.3GA
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
    - MPI (Advanced) Features
        - Support for XPMEM-based point-to-point operations
            - OFA-IB-CH3 and OFA-IB-RoCE interfaces
        - Support for XPMEM-based MPI collective operations (Reduce and
          All-Reduce)
            - OFA-IB-CH3 and OFA-IB-RoCE interfaces
        - Enhanced asynchronous progress designs for progressing
          non-blocking point-to-point and collective operations
            - OFA-IB-CH3, OFA-IB-RoCE, PSM-CH3, and PSM2-CH3 interfaces
    - UPC Features
        - Support Contention Aware Kernel-Assisted MPI collectives
    - OpenSHMEM Features
        - Support Contention Aware Kernel-Assisted MPI collectives
    - Unified Runtime Features
        - Based on MVAPICH2 2.3 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.3 are available in
          MVAPICH2-X 2.3rc1

* Bug Fixes (since 2.3b):
    - Fix issues with OpenSHMEM Non-Blocking remote memory access routines
    - Fix compilation warnings and memory leaks

MVAPICH2-X 2.3b (10/30/2017)

* Features and Enhancements (since 2.2):
    - MPI Features
        - Based on MVAPICH2 2.3b
            - OFA-IB-CH3, PSM-CH3, and PSM2-CH3 interfaces
        - Support for ARM architecture
        - Optimized support for OpenPOWER architecture
        - Collective tuning for ARM architecture
        - Collective tuning for Intel Skylake architecture
    - MPI (Advanced) Features
        - Support Data Partitioning-based Multi-Leader Design (DPML) for
          MPI collectives
            - OFA-IB-CH3, PSM-CH3, and PSM2-CH3 interfaces
        - Support Contention Aware Kernel-Assisted MPI collectives
            - OFA-IB-CH3, PSM-CH3, and PSM2-CH3 interfaces
        - Support for OSU InfiniBand Network Analysis and Management
          (OSU INAM) Tool v0.9.2
    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.3
        - Support Non-Blocking remote memory access routines
    - Unified Runtime Features
        - Based on MVAPICH2 2.3b (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.3b are available in
          MVAPICH2-X 2.3b

* Bug Fixes (since 2.2):
    - Fix compilation warnings and memory leaks

MVAPICH2-X 2.2 (09/07/2016)

* Features and Enhancements (since 2.2rc2):
    - MPI Features
        - Based on MVAPICH2 2.2 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.2 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.2 are available in
          MVAPICH2-X 2.2

* Bug Fixes (since 2.2rc2):
    - Fix compilation warnings and memory leaks

MVAPICH2-X 2.2rc2 (08/08/2016)

* Features and Enhancements (since 2.2rc1):
    - MPI Features
        - Based on MVAPICH2 2.2rc2 (OFA-IB-CH3 interface)
        - Efficient support for On Demand Paging (ODP) feature of Mellanox
          for point-to-point and RMA operations
        - Support for Intel Knights Landing architecture
    - UPC Features
        - Support for Intel Knights Landing architecture
    - UPC++ Features
        - Support for Intel Knights Landing architecture
    - OpenSHMEM Features
        - Support for Intel Knights Landing architecture
    - CAF Features
        - Support for Intel Knights Landing architecture
    - Hybrid Program Features
        - Support Intel Knights Landing architecture for hybrid MPI+PGAS
          applications
    - Unified Runtime Features
        - Based on MVAPICH2 2.2rc2 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.2rc2 are available in
          MVAPICH2-X 2.2rc2

* Bug Fixes (since 2.2rc1):
    - Fix compilation warnings and memory leaks

MVAPICH2-X 2.2rc1 (03/29/2016)

* Features and Enhancements (since 2.2b):
    - Introducing UPC++ Support
        - Based on Berkeley UPC++ v0.1
        - Introduce UPC++ level support for new scatter collective
          operation (upcxx_scatter)
        - Optimized UPC collectives (improved performance for
          upcxx_reduce, upcxx_bcast, upcxx_gather, upcxx_allgather,
          upcxx_alltoall)
    - MPI Features
        - Based on MVAPICH2 2.2rc1 (OFA-IB-CH3 interface)
        - Support for OpenPower architecture
        - Support for Intel Omni-Path architecture
        - Support for RoCE v2
    - UPC Features
        - Based on GASNET v1.26
        - Support for OpenPower architecture
        - Support for RoCE v2
    - OpenSHMEM Features
        - Support for OpenPower architecture
        - Support for RoCE v2
    - CAF Features
        - Support for RoCE v2
    - Hybrid Program Features
        - Introduce support for hybrid MPI+UPC++ applications
        - Support OpenPower architecture for
          hybrid MPI+UPC and MPI+OpenSHMEM applications
    - Unified Runtime Features
        - Based on MVAPICH2 2.2rc1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3 and
          OFA-IB-RoCE interfaces of MVAPICH2 2.2rc1 are available in
          MVAPICH2-X 2.2rc1
        - Introduce support for UPC++ and MPI+UPC++ programming models
    - Support for OSU InfiniBand Network Analysis and Management
      (OSU INAM) Tool v0.9
        - Capability to profile and report process to node communication
          matrix for MPI processes at user specified granularity in
          conjunction with OSU INAM
        - Capability to classify data flowing over a network link at job
          level and process level granularity in conjunction with OSU INAM

* Bug Fixes (since 2.2b):
    - Fix compilation warnings and memory leaks

MVAPICH2-X 2.2b (11/12/2015)

* Features and Enhancements (since 2.2a):
    - MPI Features
        - Based on MVAPICH2 2.2a (OFA-IB-CH3 interface)
    - MPI (Advanced) Features
        - Support User Mode Memory Registration (UMR) for high performance
          non-contiguous data transfer
        - Core-Direct based support for "v"-variants of Non-blocking
          collectives
            - Support for Iallgatherv, Ialltoallv, Igatherv, Iscatterv
        - Core-Direct based support for Ialltoallw
    - Unified Runtime Features
        - Based on MVAPICH2 2.2b (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.2b are available in MVAPICH2-X 2.2b
    - Support for OSU InfiniBand Network Analysis and Management
      (OSU INAM) Tool v0.8.5
        - Capability to profile and report node-level, job-level and
          process-level intra-node communication activities for MPI
          processes at user specified granularity in conjunction with
          OSU INAM
        - Capability to profile and report the following parameters of MPI
          processes at node-level, job-level and process-level at user
          specified granularity in conjunction with OSU INAM
            - Memory Utilization
            - Inter-node communication buffer usage for RC transport
            - Inter-node communication buffer usage for UD transport

* Bug Fixes (since 2.2a):
    - Fix issues in core-direct code
    - Disable DC if user asked to use only UD and DC at the same time
    - Optimize fetching maximum pinnable memory
    - Fix corner case in RDMA_CM based startup when UCR support is enabled

MVAPICH2-X 2.2a (08/17/2015)

* Features and Enhancements (since 2.1 GA):
    - MPI Features
        - Based on MVAPICH2 2.2a (OFA-IB-CH3 interface)
    - MPI (Advanced) Features
        - Support for Dynamically Connected (DC) transport protocol
            - Support for pt-to-pt, RMA and collectives
            - Support for Hybrid mode with RC/DC/UD/XRC
        - Support for Core-Direct based Non-blocking collectives
            - Support available for Ibcast, Ibarrier, Iscatter, Igather,
              Ialltoall and Iallgather
    - OpenSHMEM Features
        - Support for RoCE
        - Support for Dynamically Connected (DC) transport protocol
    - UPC Features
        - Based on Berkeley UPC 2.20.2 (contains changes/additions in
          preparation for upcoming UPC 1.3 specification)
        - Support for RoCE
        - Support for Dynamically Connected (DC) transport protocol
    - CAF Features
        - Support for RoCE
        - Support for Dynamically Connected (DC) transport protocol
    - Unified Runtime Features
        - Based on MVAPICH2 2.2a (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.2a are available in MVAPICH2-X 2.2a
        - The advanced MPI features listed above are available with the
          unified runtime
    - Support for InfiniBand Network Analysis and Management (INAM)
      Tool v0.8
        - Capability to analyze and profile network-level activities with
          many parameters (data and errors) at user specified granularity
        - Capability to analyze and profile node-level, job-level and
          process-level activities for MPI communication (pt-to-pt,
          collectives and RMA) at user specified granularity
        - Capability to remotely monitor CPU utilization of MPI processes
          at user specified granularity

MVAPICH2-X 2.1 (04/03/2015)

* Features and Enhancements (since 2.1rc2):
    - MPI Features
        - Based on MVAPICH2 2.1 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.1 are available in MVAPICH2-X 2.1

MVAPICH2-X 2.1rc2 (03/12/2015)

* Features and Enhancements (since 2.1rc1):
    - Introducing CAF (Coarray Fortran) Support
        - Based on University of Houston CAF version 3.0.39
        - Efficient point-to-point read/write operations
        - Efficient CO_REDUCE and CO_BROADCAST collective operations
    - MPI Features
        - Based on MVAPICH2 2.1rc2 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.1rc2 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.1rc2 are available in MVAPICH2-X 2.1rc2

MVAPICH2-X 2.1rc1 (12/19/2014)

* Features and Enhancements (since 2.1a):
    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.0h
        - Support for on-demand establishment of connections
        - Improved job start up and memory footprint
    - UPC Features
        - Based on Berkeley UPC 2.20.0 (contains changes/additions in
          preparation for upcoming UPC 1.3 specification)
    - MPI Features
        - Based on MVAPICH2 2.1rc1 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.1rc1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.1rc1 are available in MVAPICH2-X 2.1rc1

MVAPICH2-X 2.1a (09/21/2014)

* Features and Enhancements (since 2.0):
    - MPI Features
        - Based on MVAPICH2 2.1a (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.1a (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.1a are available in MVAPICH2-X 2.1a

MVAPICH2-X 2.0 (06/20/2014)

* Features and Enhancements (since 2.0rc2):
    - MPI Features
        - Based on MVAPICH2 2.0 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.0 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.0 are available in MVAPICH2-X 2.0

MVAPICH2-X 2.0rc2 (05/25/2014)

* Features and Enhancements (since 2.0rc1):
    - MPI Features
        - Based on MVAPICH2 2.0rc2 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.0rc2 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.0rc2 are available in MVAPICH2-X 2.0rc2

MVAPICH2-X 2.0rc1 (03/24/2014)

* Features and Enhancements (since 2.0b):
    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.0f
        - Improved intra-node communication performance using Shared
          memory and Cross Memory Attach (CMA)
    - UPC Features
        - Based on Berkeley UPC 2.18.0 (contains changes/additions in
          preparation for upcoming UPC 1.3 specification)
        - Optimized UPC collectives (improved performance for
          upc_all_broadcast, upc_all_scatter, upc_all_gather,
          upc_all_gather_all, and upc_all_exchange)
    - MPI Features
        - Based on MVAPICH2 2.0rc1 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.0rc1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.0rc1 are available in MVAPICH2-X 2.0rc1

* Bug Fixes (since 2.0b):
    - OpenSHMEM Bug Fixes
        - Fix an issue related to atomics on HCAs without atomics support

MVAPICH2-X 2.0b (11/08/2013)

* Features and Enhancements (since 2.0a):
    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.0e
        - Enhanced optimization of OpenSHMEM collectives (improved
          performance for shmem_collect, shmem_fcollect, shmem_barrier,
          shmem_reduce, and shmem_broadcast)
        - Optimized shmalloc routine
    - UPC Features
        - Based on Berkeley UPC 2.18.0
        - Support for GUPC translator
    - MPI Features
        - Based on MVAPICH2 2.0b (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.0b (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.0b are available in MVAPICH2-X 2.0b

* Bug Fixes (since 2.0a):
    - OpenSHMEM Bug Fixes
        - Fixed synchronization issue in shmem_fence
        - Fixed issue in shmem_collect which prevented variable length
          collect routine

MVAPICH2-X 2.0a (08/24/2013)

* Features and Enhancements (since 1.9):
    - OpenSHMEM Features
        - Optimized OpenSHMEM Collectives (Improved performance for
          shmem_collect, shmem_barrier, shmem_reduce and shmem_broadcast)
    - MPI Features
        - Based on MVAPICH2 2.0a (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 2.0a (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 2.0a are available in MVAPICH2-X 2.0a

MVAPICH2-X 1.9 (05/06/2013):

* Features and Enhancements (since 1.9rc1):
    - MPI Features
        - Based on MVAPICH2 1.9 (OFA-IB-CH3 interface)
    - Unified Runtime Features
        - Based on MVAPICH2 1.9 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 1.9 are available in MVAPICH2-X 1.9

MVAPICH2-X 1.9RC1 (04/16/2013):

* Features and Enhancements (since 1.9b):
    - OpenSHMEM Features
        - Added 'shmem_ptr' functionality
    - MPI Features
        - Based on MVAPICH2 1.9RC1 (OFA-IB-CH3 interface) including MPI-3
          features
    - Unified Runtime Features
        - Based on MVAPICH2 1.9RC1 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 1.9RC1 are available in MVAPICH2-X 1.9RC1

* Bug Fixes (since 1.9b):
    - OpenSHMEM
        - Fixed a bug in OpenSHMEM atomics

MVAPICH2-X 1.9b (02/28/2013):

* Features and Enhancements (since 1.9a2):
    - MPI Features
        - Based on MVAPICH2 1.9b (OFA-IB-CH3 interface) including MPI-3
          features
    - OpenSHMEM Features
        - Updated to OpenSHMEM 1.0d
    - Unified Parallel C (UPC) Features
        - Updated to Berkeley UPC 2.16.0
    - Unified Runtime Features
        - Based on MVAPICH2 1.9b (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 1.9b are available in MVAPICH2-X 1.9b

MVAPICH2-X 1.9a2 (11/08/2012):

* Features and Enhancements (since 1.9a):
    - MPI Features
        - MPI-2.2 standard compliance and initial support for MPI-3
          features
        - Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface)
    - OpenSHMEM Features
        - Optimized OpenSHMEM put routines for small/medium message sizes
    - Unified Parallel C (UPC) Features
        - UPC Language Specification v1.2 standard compliance
        - Based on Berkeley UPC v2.14.2
        - Optimized RDMA-based implementation of UPC data movement
          routines
        - Improved UPC memput design for small/medium size messages
    - Hybrid Program Features
        - Supports hybrid programming using MPI(+OpenMP),
          MPI(+OpenMP)+UPC and MPI(+OpenMP)+OpenSHMEM
        - Compliance to MPI-2.2 and initial support for MPI-3 features,
          UPC v1.2 and OpenSHMEM v1.0 standards
        - Efficient deadlock-free progress of MPI and UPC/OpenSHMEM calls
    - Unified Runtime Features
        - Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface).
          All the runtime features enabled by default in OFA-IB-CH3
          interface of MVAPICH2 1.9a2 are available in MVAPICH2-X 1.9a2
        - Support for upcrun process manager

* Bug Fixes (since 1.9a):
    - Fixed incorrect compiler selection in oshfort
    - Fixed linker errors with Intel oshfort compiler

MVAPICH2-X 1.9a Features (09/07/2012):

* MPI Features
    - MPI-2.2 standard compliance
    - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface).
      MPI programs can take advantage of all the features enabled by
      default in OFA-IB-CH3 interface of MVAPICH2-1.9a
        - High performance two-sided communication scalable to
          multi-thousand nodes
        - Optimized collective communication operations
            - Shared-memory optimized algorithms for barrier, broadcast,
              reduce and allreduce operations
            - Optimized two-level designs for scatter and gather
              operations
            - Improved implementation of allgather, alltoall operations
        - High-performance and scalable support for one-sided
          communication
            - Direct RDMA based designs for one-sided communication
            - Shared memory backed Windows for One-Sided Communication
            - Support for truly passive locking for intra-node RMA in
              shared memory backed windows
        - Multi-threading support
            - Enhanced support for multi-threaded MPI applications

* OpenSHMEM Features
    - OpenSHMEM v1.0 standard compliance
    - Based on OpenSHMEM reference implementation v1.0c
    - Optimized RDMA-based implementation of OpenSHMEM data movement
      routines
    - Efficient implementation of OpenSHMEM atomics using RDMA atomics
    - High performance intra-node communication using shared memory based
      schemes

* Hybrid Program Features
    - Supports hybrid programming using MPI and OpenSHMEM
    - Compliance to MPI 2.2 and OpenSHMEM v1.0 standards
    - Optimized network resource utilization through the unified
      communication runtime
    - Efficient deadlock-free progress of MPI and OpenSHMEM calls

* Unified Runtime Features
    - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM and
      Hybrid programs benefit from its features listed below.
        - Scalable inter-node communication with highest performance and
          reduced memory usage
            - Integrated RC/XRC design to get best performance on
              large-scale systems with reduced/constant memory footprint
            - RDMA Fast Path connections for efficient small message
              communication
            - Shared Receive Queue (SRQ) with flow control to
              significantly reduce memory footprint of the library.
            - AVL tree-based resource-aware registration cache
            - Automatic tuning based on network adapter and host
              architecture
        - Optimized intra-node communication support by taking advantage
          of shared-memory communication.
            - Efficient Buffer Organization for Memory Scalability of
              Intra-node Communication
            - Automatic intra-node communication parameter tuning based on
              platform
        - Flexible CPU binding capabilities
            - Portable Hardware Locality (hwloc v1.5) support for defining
              CPU affinity
            - Efficient CPU binding policies (bunch and scatter patterns,
              socket and numanode granularities) to specify CPU binding
              per job for modern multi-core platforms
            - Allow user-defined flexible processor affinity
        - Two modes of communication progress
            - Polling
            - Blocking (enables running multiple processes/processor)
        - Flexible process manager support
            - Support for mpirun_rsh, hydra and oshrun process managers
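
Illustration: bunch and scatter CPU binding patterns

The bunch and scatter binding patterns named under "Flexible CPU binding
capabilities" above can be pictured with a small model. The following is a
minimal sketch in Python, not the library's placement algorithm: the function
name bind_ranks, its parameters, and the (socket, core) tuples are
hypothetical and exist only for illustration. It assumes that bunch fills the
cores of one socket before moving to the next, and that scatter deals ranks
round-robin across sockets.

```python
def bind_ranks(num_ranks, sockets, cores_per_socket, policy):
    """Map each local rank to a (socket, core) pair.

    Hypothetical helper for illustration only; it models the two
    patterns on a node with identical sockets and no hyperthreading.
    """
    if num_ranks > sockets * cores_per_socket:
        raise ValueError("more ranks than available cores")

    placement = []
    for rank in range(num_ranks):
        if policy == "bunch":
            # Fill every core of socket 0, then socket 1, and so on.
            socket, core = divmod(rank, cores_per_socket)
        elif policy == "scatter":
            # Alternate sockets first, advancing the core index only
            # after every socket has received one more rank.
            core, socket = divmod(rank, sockets)
        else:
            raise ValueError("unknown policy: " + policy)
        placement.append((socket, core))
    return placement


if __name__ == "__main__":
    for policy in ("bunch", "scatter"):
        print(policy, bind_ranks(4, sockets=2, cores_per_socket=4, policy=policy))
```

For 4 ranks on a node with 2 sockets of 4 cores each, this model keeps all
ranks on socket 0 under bunch and alternates between sockets 0 and 1 under
scatter.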