The number of downloads has crossed 1.8 million!! The number of organizations using MVAPICH libraries has crossed 3,400 in 92 countries!! The MVAPICH team would like to express thanks to all these organizations and their users!!


MVAPICH Delivers Sub-minute (22 sec) Job Startup for 229,376 processes!! (Details)


MVAPICH@SC '24


MVAPICH@HotI '24


MVAPICH@PEARC '24



Welcome to the home page of the MVAPICH project, led by the Network-Based Computing Laboratory (NBCL) of The Ohio State University. The MVAPICH software, based on the MPI 4.1 standard, delivers the best performance, scalability, and fault tolerance for high-end computing systems and servers using InfiniBand, Omni-Path, Ethernet/iWARP, RoCE (v1/v2), Cray Slingshot, and Rockport Networks networking technologies. This software is being used by more than 3,425 organizations in 92 countries worldwide to extract the potential of these emerging networking technologies for modern systems. As of Nov '24, more than 1,841,000 downloads have taken place from this project's site. This software is also being distributed by many vendors as part of their software distributions.

The MVAPICH2 software family is ABI compatible with the version of MPICH it is based on. Please refer to our download page for more details.
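As a minimal illustration, the hello-world below is ordinary MPI code with no MVAPICH-specific calls; because of the ABI compatibility noted above, a binary built against the corresponding MPICH release should also run when the runtime linker is pointed at an MVAPICH2 installation's libraries (this sketch is not taken from the MVAPICH sources; file names and build commands here are assumptions).

    /* hello_mvapich.c: a minimal MPI program (illustrative sketch). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks  */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Assuming the mpicc and mpirun wrappers shipped with an MVAPICH2 install are on the PATH, this would typically be built with "mpicc hello_mvapich.c -o hello" and launched with "mpirun -np 4 ./hello"; installation paths and launcher options vary by site.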

The MVAPICH software is powering several supercomputers in the TOP500 list. Examples (from the November '24 ranking) include:

  • 15th, 10,649,600-core (Sunway TaihuLight) at National Supercomputing Center in Wuxi, China
  • 52nd, 448,448 cores (Frontera) at TACC
  • 72nd, 288,288 cores (Lassen) at LLNL

The MVAPICH group provides several software libraries as listed below.

High-Performance Parallel Programming Libraries

MVAPICH2: Support for InfiniBand, Omni-Path, Ethernet/iWARP, RoCE, Slingshot 10, and Rockport Networks
MVAPICH2-Azure: Optimized support for the Microsoft Azure platform with InfiniBand
MVAPICH-Plus: Advanced MPI with unified MVAPICH2-GDR and MVAPICH2-X features for HPC, DL, ML, Big Data, and Data Science applications
MVAPICH2-X: Advanced MPI features/support (UMR, ODP, DC, Core-Direct, SHArP, XPMEM), OSU INAM (InfiniBand Network Monitoring and Analysis), PGAS (OpenSHMEM, UPC, UPC++, and CAF), and MPI+PGAS programming models with a unified communication runtime
MVAPICH2-X-AWS: Advanced MPI features (SRD and XPMEM) with support for the Amazon Elastic Fabric Adapter (EFA)
MVAPICH2-GDR: Optimized MPI for clusters with NVIDIA and AMD GPUs, and for GPU-enabled Deep Learning applications
MVAPICH2-J: Java bindings for the MVAPICH2 family of MPI libraries
MVAPICH2-Virt: Hypervisor- and container-based (Docker and Singularity) HPC cloud with MPI and IB (SR-IOV)
MVAPICH2-EA: Energy-aware and high-performance MPI

Microbenchmarks

OMB: Microbenchmark suite to evaluate MPI and PGAS (OpenSHMEM, UPC, and UPC++) libraries for CPUs and GPUs
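For readers unfamiliar with what these microbenchmarks measure, the sketch below is a simplified ping-pong latency loop in the spirit of OMB's point-to-point tests; it is not the actual OMB source, and the message size and iteration count are arbitrary illustrative choices.

    /* pingpong.c: simplified latency measurement between ranks 0 and 1
     * (a sketch of what OMB-style point-to-point benchmarks do, not OMB itself).
     * Run with at least 2 ranks, e.g. mpirun -np 2 ./pingpong */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE 8        /* bytes per message (arbitrary)   */
    #define ITERS    10000    /* timed round trips (arbitrary)   */

    int main(int argc, char **argv)
    {
        char buf[MSG_SIZE] = {0};
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg one-way latency: %.2f us\n",
                   (t1 - t0) * 1e6 / (2.0 * ITERS));
        MPI_Finalize();
        return 0;
    }

The real OMB binaries (for example, osu_latency and osu_bw) sweep message sizes, handle warm-up iterations, and cover GPU buffers and PGAS models, which this sketch omits.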

Tools

OSU INAM: Network monitoring, profiling, and analysis for clusters with MPI and scheduler integration
OEMT: Utility to measure the energy consumption of MPI applications

This project is supported by funding from the U.S. National Science Foundation, U.S. DOE Office of Science, U.S. Department of Defense, Ohio Board of Regents, Ohio Department of Development, arm, Cisco Systems, Cray, Intel, Linux Networx, Mellanox, Microsoft, NVIDIA, Pattern Computer, QLogic, ROCKPORT, and Sun Microsystems; and by equipment donations from Advanced Clustering, AMD, Appro, arm, Broadcom, Chelsio, Dell, Fulcrum, Fujitsu, Intel, Mellanox, Microway, NetEffect, Pattern Computer, QLogic, ROCKPORT, and Sun. Other technology partners include TotalView.

Announcements



(NEW) MVAPICH-Plus 4.0rc with support for unified MVAPICH2-GDR and MVAPICH2-X features is available. Features of this release include: support for various HPC fabrics (InfiniBand, Slingshot, Omni-Path, OPX, RoCE, and Ethernet/iWARP), various CPUs (x86, ARM, and OpenPOWER), and various GPUs (NVIDIA, AMD, and Intel), CMA and cooperative protocol support for intra-node pt2pt communication, CUDA-aware MPI (pt2pt and collective) support, IPC-based support for collectives on Intel GPU, on-the-fly compression support for collectives for NVIDIA and AMD GPUs, and on-the-fly compression support for point-to-point communication for NVIDIA GPUs. This new release series is targeted to provide optimized support for modern platforms (CPU, GPU, and interconnects) for HPC, Deep Learning, Machine Learning, Big Data, and Data Science applications. [more]
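The fragment below sketches what the CUDA-aware point-to-point support mentioned above lets an application do: pass a GPU (device) pointer directly to MPI_Send/MPI_Recv instead of staging data through host memory first. It assumes NVIDIA GPUs, the CUDA runtime API, and a CUDA-aware build of the library; the buffer size and peer ranks are illustrative and the code is not taken from the MVAPICH sources.

    /* cuda_aware_pt2pt.c: sending directly from GPU memory with a CUDA-aware MPI
     * (illustrative sketch; run with at least 2 ranks). */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        const int n = 1 << 20;   /* 1M floats (illustrative) */
        int rank;
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, n * sizeof(float));   /* device allocation */

        /* With a CUDA-aware MPI, the device pointer can be passed directly;
         * the library moves the data GPU-to-GPU or GPU-to-NIC internally. */
        if (rank == 0)
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

This would typically be built with the library's mpicc plus the CUDA include and library paths; with a non-CUDA-aware MPI build, the same transfer would require explicit cudaMemcpy staging through host buffers.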

(NEW) Join us for the upcoming tutorials at Supercomputing 2024: 1. High-Performance and Smart Networking Technologies for HPC and AI, 2. Principles and Practice of High-Performance Deep Learning/Machine Learning Training and Inference, and 3. Scalable Big Data Processing on High-Performance Computing Systems. More Details

(NEW) OMB 7.5 with new benchmarks supporting OpenSHMEM, partitioned point-to-point, neighborhood collectives, and Intel GPUs is available. [more]

MPI4Spark 0.3 (based on Apache Spark 3.3.0) with support for an MPI-based communication runtime on high-performance networks (InfiniBand, OPA, RoCE, and Slingshot) to accelerate Spark-based applications and support for the YARN cluster manager is available. [more]

The 12th Annual MVAPICH User Group (MUG) Conference was held successfully in a hybrid manner on August 19-21, 2024 with more than 220 attendees.

Spack and Docker versions of MVAPICH 3.0 GA are available. [more]

MVAPICH 3.0 GA with support for optimized intra-node shared-memory/kernel-based communication; unified CVAR interface; enhanced PVAR interface; CH4 channel; OFI support for HPE/Cray Slingshot 11, Cornelis Networks Omni-Path Express (OPX), and Intel PSM3; and UCX support for InfiniBand and RoCE is available. [more]
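The CVAR and PVAR interfaces mentioned above are exposed through the standard MPI_T tools interface; the sketch below enumerates the control variables (CVARs) that a given build reports, which is one way to see what is tunable without consulting the documentation. It uses only standard MPI_T calls and is not MVAPICH-specific; the names and counts it prints depend on the MPI build.

    /* list_cvars.c: enumerate control variables via the standard MPI_T interface
     * (generic sketch; typically run with a single process). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, ncvars;
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);  /* init tools interface */
        MPI_Init(&argc, &argv);

        MPI_T_cvar_get_num(&ncvars);
        for (int i = 0; i < ncvars; i++) {
            char name[256], desc[1024];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, bind, scope;
            MPI_Datatype dtype;
            MPI_T_enum enumtype;
            MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                &enumtype, desc, &desc_len, &bind, &scope);
            printf("%d: %s\n", i, name);   /* print each control variable's name */
        }

        MPI_Finalize();
        MPI_T_finalize();
        return 0;
    }

Performance variables (PVARs) are enumerated the same way through the corresponding MPI_T_pvar_* calls.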

ParaInfer-X v1.0 with MPI- and NCCL-based support for fast parallel inference of various large language models (GPT-J and LLaMA), persistent model inference stream, temporal fusion/in-flight batching of multiple requests, multi-GPU tensor parallelism, asynchronous memory reordering for evicting finished requests, and support for float32, float16, and bfloat16 for model inference is available. [more]

MPI4DL 0.6 with support for distributed and accelerated training framework for very high-resolution images that integrates Spatial Parallelism, Layer Parallelism, and Pipeline Parallelism is available. [more]

HiDL 1.0 (based on Horovod) with support for TensorFlow, PyTorch, Keras and MXNet, built on top of MVAPICH2-GDR and MVAPICH2-X, providing large-scale distributed deep learning support for clusters with NVIDIA and AMD GPUs is available. [more]

MPI4cuML 0.5 (based on cuML 22.02.00) with support for RAFT 22.02.00, C++ and Python APIs, built on top of mpi4py over the MVAPICH2-GDR library, and handles for Python cuML applications (KMeans, PCA, tSVD, RF, and LinearModels) to use the MVAPICH2-GDR backend is available. [more]

OSU INAM 1.0 (OSU InfiniBand Network Analysis and Monitoring Tool) with support for data logging progress bars on the UI for all charts, asynchronous calls for data loading, detailed debugging levels for the INAM daemon, and features in conjunction with MVAPICH2-X 2.3 is available. Click here for more details!

MVAPICH2-X-AWS 2.3.7 (based on MVAPICH2-X) with direct support for the Amazon EFA adapter, improved inter-node latency and bandwidth performance, initial support for AWS hpc6a/c6a instances, support and performance optimization for AWS c6g/c7g instances with Amazon Graviton 2/3 ARM processors, support for the rdma_read feature on p4d instance types, and support for currently available basic OS types on AWS EC2 (Amazon Linux 1/2, CentOS 6/7, and Ubuntu 18.04/20.04) is available. [more]

MVAPICH2-J 2.3.7 GA (based on MVAPICH2 2.3.7 GA) with support for Java bindings to the MVAPICH2 family of libraries, support for communicating data from basic Java data types as well as direct ByteBuffers from the Java New I/O (NIO) package is available. [more]

MVAPICH2-GDR 2.3.7 GA (based on MVAPICH2 2.3.7 GA) with support for on-the-fly compression of point-to-point GPU-GPU communication for NVIDIA GPUs; hybrid communication protocols using NCCL-based, CUDA-based, and IB verbs-based primitives for blocking and non-blocking collective operations; full support for NVIDIA DGX, NVIDIA DGX V-100, NVIDIA DGX A-100 systems, and AMD systems with Mi100 GPUs; support for Slingshot-10 interconnect; optimized support for HPC, deep learning, machine learning, and data science workloads and multiple bug fixes is available. [more]

MVAPICH2 2.3.7 GA with support for Cray Slingshot 10 and Rockport's switchless networks, enhanced support for blocking and non-blocking collective offload using Mellanox SHARP, and multiple bug fixes is available. [more]

Partnership and contribution to the NSF-Awarded $20M AI-Institute on Intelligent CyberInfrastructure (ICICLE). Details.

MPI4Dask 0.2 (based on Dask Distributed 2021.01.0) with support for MPI-based communication in Dask for a cluster of CPUs and GPUs, built on top of mpi4py over the MVAPICH2, MVAPICH2-X, and MVAPICH2-GDR library, starting execution of Dask programs using Dask-MPI, compliant with user-level Dask APIs and packages is available. [more]

MVAPICH2-X 2.3 GA with optimized support for large message MPI_Allreduce and MPI_Reduce, improved communication performance using DC transport, optimized point-to-point and collective communication support for AWS EFA adapter and SRD transport protocol, availability of multiple MPI_T PVARs and CVARs, support for hybrid MPI+OpenSHMEM; optimized communication performance for AMD (EPYC), ARM, Intel and OpenPOWER platforms, and support for INAM 0.9.6 is available. [more]