Point-to-Point MPI Benchmarks

Point-to-Point NCCL Benchmarks

Collective MPI Benchmarks

Collective NCCL Benchmarks

Non-Blocking Collective MPI Benchmarks

Support for CUDA Managed Memory

One-sided MPI Benchmarks

CUDA, ROCm, and OpenACC Extensions to OSU Micro Benchmarks

Point-to-Point OpenSHMEM Benchmarks

Collective OpenSHMEM Benchmarks

Point-to-Point Unified Parallel C (UPC) Benchmarks

Collective Unified Parallel C (UPC) Benchmarks

osu_upc_all_barrier, osu_upc_all_broadcast, osu_upc_all_exchange, osu_upc_all_gather_all, osu_upc_all_gather, osu_upc_all_reduce, and osu_upc_all_scatter

Point-to-Point UPC++ Benchmarks

Collective UPC++ Benchmarks

osu_upcxx_bcast, osu_upcxx_reduce, osu_upcxx_allgather, osu_upcxx_gather, osu_upcxx_scatter, and osu_upcxx_alltoall

Startup Benchmarks

Please note that there are many different ways to measure these performance parameters. For example, a bandwidth test can vary in the type of MPI calls it uses (blocking vs. non-blocking), the total number of back-to-back messages sent in one iteration, the number of iterations, and so on. Other ways of measuring bandwidth may therefore give different numbers. Readers are welcome to use other tests, as appropriate to their application environments.
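
To make the dependence on these parameters concrete, the following is a minimal sketch of how a bandwidth figure might be derived from a timed run. It is not taken from the benchmark sources: the function name bandwidth_mb_per_s, its parameters, the convention of 1 MB = 10**6 bytes, and the timings are all assumptions chosen for illustration.

```python
def bandwidth_mb_per_s(message_bytes, window_size, iterations, elapsed_seconds):
    """Return bandwidth in MB/s for a timed run.

    Assumptions (hypothetical, for illustration only):
    - window_size messages are sent back-to-back in each iteration
    - 1 MB is taken to be 10**6 bytes
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_bytes = message_bytes * window_size * iterations
    return total_bytes / 1e6 / elapsed_seconds


# Made-up timings, not measurements: the same message size and iteration
# count, but a different number of back-to-back messages per iteration.
windowed = bandwidth_mb_per_s(1_000_000, 64, 100, 0.5)
single = bandwidth_mb_per_s(1_000_000, 1, 100, 0.125)

print(f"64 messages per iteration: {windowed:.1f} MB/s")
print(f" 1 message per iteration:  {single:.1f} MB/s")
```

With these made-up inputs the two runs report very different figures for the same message size, which is why results are only comparable when the message count per iteration, the iteration count, and the style of calls are held fixed.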