The number of downloads has crossed 1.75 million!! The number of organizations using MVAPICH2 libraries has crossed 3,375 in 91 countries!! The MVAPICH team would like to express thanks to all these organizations and their users!!


MVAPICH Delivers Sub-minute (22 sec) Job Startup for 229,376 processes!! (Details)


MVAPICH@ISC 2024


MVAPICH@OFA'24


MVAPICH@GTC '24



Welcome to the home page of the MVAPICH project, led by the Network-Based Computing Laboratory (NBCL) of The Ohio State University. The MVAPICH2 software, based on the MPI 3.1 standard, delivers the best performance, scalability, and fault tolerance for high-end computing systems and servers using InfiniBand, Omni-Path, Ethernet/iWARP, RoCE (v1/v2), Cray Slingshot 10 and 11, and Rockport Networks networking technologies. This software is used by more than 3,375 organizations in 91 countries to extract the potential of these emerging networking technologies for modern systems. As of Apr '24, more than 1,776,000 downloads have taken place from this project's site. The software is also distributed by many vendors as part of their software distributions.

The MVAPICH2 software family is ABI compatible with the version of MPICH it is based on. Please refer to our download page for more details.

The MVAPICH2 software is powering several supercomputers in the TOP500 list. Examples (from the June '23 ranking) include:

  • 11th, 10,649,600 cores (Sunway TaihuLight) at the National Supercomputing Center in Wuxi, China
  • 29th, 448,448 cores (Frontera) at TACC
  • 46th, 288,288 cores (Lassen) at LLNL
  • 61st, 570,020 cores (Nurion) in South Korea

The MVAPICH group provides several software libraries as listed below.

High-Performance Parallel Programming Libraries

MVAPICH2: Support for InfiniBand, Omni-Path, Ethernet/iWARP, RoCE, Slingshot 10, and Rockport Networks
MVAPICH2-Azure: Optimized support for the Microsoft Azure platform with InfiniBand
MVAPICH-Plus: Advanced MPI with unified MVAPICH2-GDR and MVAPICH2-X features for HPC, DL, ML, Big Data, and Data Science applications
MVAPICH2-X: Advanced MPI features/support (UMR, ODP, DC, Core-Direct, SHArP, XPMEM), OSU INAM (InfiniBand Network Analysis and Monitoring), PGAS (OpenSHMEM, UPC, UPC++, and CAF), and MPI+PGAS programming models with a unified communication runtime
MVAPICH2-X-AWS: Advanced MPI features (SRD and XPMEM) with support for the Amazon Elastic Fabric Adapter (EFA)
MVAPICH2-GDR: Optimized MPI for clusters with NVIDIA GPUs and AMD GPUs, and for GPU-enabled deep learning applications
MVAPICH2-J: Java bindings for the MVAPICH2 family of MPI libraries
MVAPICH2-Virt: Hypervisor- and container-based (Docker and Singularity) HPC cloud with MPI & IB (SR-IOV)
MVAPICH2-EA: Energy-aware and high-performance MPI

Microbenchmarks

OMB: Microbenchmark suite to evaluate MPI and PGAS (OpenSHMEM, UPC, and UPC++) libraries for CPUs and GPUs

Tools

OSU INAM: Network monitoring, profiling, and analysis for clusters with MPI and scheduler integration
OEMT: Utility to measure the energy consumption of MPI applications

This project is supported by funding from the U.S. National Science Foundation, the U.S. DOE Office of Science, the U.S. Department of Defense, the Ohio Board of Regents, the Ohio Department of Development, arm, Cisco Systems, Cray, Intel, Linux Networx, Mellanox, Microsoft, NVIDIA, Pattern Computer, QLogic, ROCKPORT, and Sun Microsystems; and by equipment donations from Advanced Clustering, AMD, Appro, arm, Broadcom, Chelsio, Dell, Fulcrum, Fujitsu, Intel, Mellanox, Microway, NetEffect, Pattern Computer, QLogic, ROCKPORT, and Sun. Other technology partners include TotalView.

Announcements



MVAPICH-Plus 3.0 GA with support for unified MVAPICH2-GDR and MVAPICH2-X features is available. Features of this release include support for various HPC fabrics (InfiniBand, Slingshot, Omni-Path, OPX, RoCE, and Ethernet), support for GPU-aware blocking and non-blocking collectives for NVIDIA and AMD GPUs, and support for blocking and non-blocking GPU-to-GPU point-to-point operations. This new release series aims to provide optimized support for modern platforms (CPUs, GPUs, and interconnects) for HPC, Deep Learning, Machine Learning, Big Data, and Data Science applications. [more]

Spack and Docker versions of MVAPICH 3.0 GA are available. [more]

MVAPICH 3.0 GA with support for optimized intra-node shared-memory/kernel-based communication; unified CVAR interface; enhanced PVAR interface; CH4 channel; OFI support for HPE/Cray Slingshot 11, Cornelis Networks Omni-Path Express (OPX), and Intel PSM3; and UCX support for InfiniBand and RoCE is available. [more]

Join us for the Upcoming Tutorial: High-Performance and Smart Networking Technologies for HPC and AI at HPCA '24

ParaInfer-X v1.0 with MPI and NCCL-based support for fast parallel inference of various large language models (GPT-J and LLaMA), persistent model inference stream, temporal fusion/in-flight batching of multiple requests, multiple GPU tensor parallelism, asynchronous memory reordering for evicting finished requests, and support for float32, float16, and bfloat16 for model inference is available. [more]

MPI4DL 0.6, a distributed and accelerated training framework for very high-resolution images that integrates Spatial Parallelism, Layer Parallelism, and Pipeline Parallelism, is available. [more]

OMB 7.3 with support for RCCL benchmarks, persistent collectives, and new metrics to evaluate tail latency/bandwidth is available. [more]

The 11th Annual MVAPICH User Group (MUG) Conference was held successfully in a hybrid format on August 21-23, 2023, with more than 225 attendees. Slides and videos of the presentations are available here.

MPI4Spark 0.2 (based on Apache Spark 3.3.0) with support for an MPI-based communication runtime on high-performance networks (InfiniBand, OPA, RoCE, and Slingshot) to accelerate Spark-based applications and support for the YARN cluster manager is available. [more]

HiDL 1.0 (based on Horovod) with support for TensorFlow, PyTorch, Keras and MXNet, built on top of MVAPICH2-GDR and MVAPICH2-X, providing large-scale distributed deep learning support for clusters with NVIDIA and AMD GPUs is available. [more]

MPI4cuML 0.5 (based on cuML 22.02.00) with support for RAFT 22.02.00, C++ and Python APIs, built on top of mpi4py over the MVAPICH2-GDR library, and handles for using the MVAPICH2-GDR backend in Python cuML applications (KMeans, PCA, tSVD, RF, and LinearModels) is available. [more]

OSU InfiniBand Network Analysis and Monitoring (INAM) Tool 1.0 with support for data logging progress bars on the UI for all charts, asynchronous calls for data loading, detailed debugging levels for the INAM daemon, and features in conjunction with MVAPICH2-X 2.3 is available. Click here for more details!

MVAPICH2-X-AWS 2.3.7 (based on MVAPICH2-X) with direct support for the Amazon EFA adapter, improved inter-node latency and bandwidth performance, initial support for AWS hpc6a/c6a instances, support and performance optimization for AWS c6g/c7g instances with Amazon Graviton 2/3 ARM processors, support for the rdma_read feature for p4d instance types, and support for the basic OS types currently available on AWS EC2 (Amazon Linux 1/2, CentOS 6/7, and Ubuntu 18.04/20.04) is available. [more]

MVAPICH2-J 2.3.7 GA (based on MVAPICH2 2.3.7 GA) with support for Java bindings to the MVAPICH2 family of libraries and support for communicating data from basic Java data types as well as direct ByteBuffers from the Java New I/O (NIO) package is available. [more]

MVAPICH2-GDR 2.3.7 GA (based on MVAPICH2 2.3.7 GA) with support for on-the-fly compression of point-to-point GPU-GPU communication for NVIDIA GPUs; hybrid communication protocols using NCCL-based, CUDA-based, and IB verbs-based primitives for blocking and non-blocking collective operations; full support for NVIDIA DGX, NVIDIA DGX V-100, NVIDIA DGX A-100 systems, and AMD systems with MI100 GPUs; support for the Slingshot-10 interconnect; optimized support for HPC, deep learning, machine learning, and data science workloads; and multiple bug fixes is available. [more]

MVAPICH2 2.3.7 GA with support for Cray Slingshot 10, Rockport's switchless networks, enhanced support for blocking and non-blocking collective offload using Mellanox SHARP, and multiple bug fixes is available. [more]

Partnership and contribution to the NSF-Awarded $20M AI-Institute on Intelligent CyberInfrastructure (ICICLE). Details.

MPI4Dask 0.2 (based on Dask Distributed 2021.01.0) with support for MPI-based communication in Dask for a cluster of CPUs and GPUs, built on top of mpi4py over the MVAPICH2, MVAPICH2-X, and MVAPICH2-GDR libraries, support for starting Dask programs using Dask-MPI, and compliance with user-level Dask APIs and packages is available. [more]

MVAPICH2-X 2.3 GA with optimized support for large-message MPI_Allreduce and MPI_Reduce, improved communication performance using DC transport, optimized point-to-point and collective communication support for the AWS EFA adapter and SRD transport protocol, availability of multiple MPI_T PVARs and CVARs, support for hybrid MPI+OpenSHMEM, optimized communication performance for AMD (EPYC), ARM, Intel, and OpenPOWER platforms, and support for INAM 0.9.6 is available. [more]