OSU Micro Benchmarks v7.3 (10/30/2023)

* New Features & Enhancements
    - Add support for RCCL benchmarks
        * Thanks to Marcel Koch @KIT for the initial patch
        * Point-to-point benchmarks supported
            * osu_xccl_bibw, osu_xccl_bw, osu_xccl_latency
        * Collective benchmarks supported
            * osu_xccl_allgather, osu_xccl_allreduce, osu_xccl_alltoall,
            * osu_xccl_bcast, osu_xccl_reduce, osu_xccl_reduce_scatter
    - Add new benchmarks for persistent collectives
        * osu_allgather_persistent, osu_allgatherv_persistent,
        * osu_allreduce_persistent, osu_alltoall_persistent,
        * osu_alltoallv_persistent, osu_alltoallw_persistent,
        * osu_barrier_persistent, osu_bcast_persistent,
        * osu_gather_persistent, osu_gatherv_persistent,
        * osu_reduce_persistent, osu_reduce_scatter_persistent,
        * osu_scatter_persistent, osu_scatterv_persistent
    - Support new metrics to evaluate benchmark performance
        * 50th percentile tail latency/bandwidth
        * 95th percentile tail latency/bandwidth
        * 99th percentile tail latency/bandwidth
* Bug Fixes
    - Fixed acknowledgement buffer memory allocation issue in bandwidth
      related benchmarks
        * Thanks to Emmanuel BRELLE @Eviden for report and patch.
    - Fixed validation issue in osu_fop_latency.
        * Thanks to Coey Minear @HPE for report and patch.
    - Added support to managed buffers in one-sided collectives and one-sided
      point-to-point benchmarks

OSU Micro Benchmarks v7.2 (07/10/2023)

* New Features & Enhancements
    - Add MPI-4 session based initialization support to following benchmarks
        * Point-to-point benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp, osu_latency_mt, osu_bw_persistent,
            * osu_bibw_persistent, osu_latency_persistent
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv, osu_ireduce_scatter
        * Neighborhood benchmarks
            * osu_neighbor_allgather, osu_neighbor_allgatherv,
            * osu_neighbor_alltoall, osu_neighbor_alltoallv,
            * osu_neighbor_alltoallw, osu_ineighbor_allgather,
            * osu_ineighbor_allgatherv, osu_ineighbor_alltoall,
            * osu_ineighbor_alltoallv, osu_ineighbor_alltoallw
        * Startup benchmarks
            * osu_init
    - Add MPI_IN_PLACE support for following blocking and non-blocking
      collectives
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_gather, osu_gatherv, osu_reduce,
            * osu_reduce_scatter, osu_scatter, osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_igather, osu_igatherv,
            * osu_ireduce, osu_iscatter, osu_iscatterv, osu_ireduce_scatter
    - Add an option to set root rank for rooted blocking and non-blocking
      collectives
        * Blocking benchmarks supported
            * osu_gather, osu_gatherv, osu_reduce, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv
* Bug Fixes
    - Fixed memory leak in point-to-point benchmarks when validation is
      enabled.
        * Thanks to Shi Jin @Amazon for report and patch.
    - Fixed missing '#' formatting bug in osu_ibarrier header.
        * Thanks to Nick Hagerty @ORNL for report.

OSU Micro Benchmarks v7.1 (4/06/2023)

* New Features & Enhancements
    - Add new blocking and non-blocking neighbor collective benchmarks with
      support for CPU buffers.
        * Blocking benchmarks
            - osu_neighbor_allgather, osu_neighbor_allgatherv,
            - osu_neighbor_alltoall, osu_neighbor_alltoallv,
            - osu_neighbor_alltoallw
        * Non-Blocking benchmarks
            - osu_ineighbor_allgather, osu_ineighbor_allgatherv,
            - osu_ineighbor_alltoall, osu_ineighbor_alltoallv,
            - osu_ineighbor_alltoallw
    - Add new benchmarks for point-to-point persistent communication
      primitives.
        * osu_bw_persistent, osu_bibw_persistent, osu_latency_persistent
    - Extend collective and point-to-point benchmarks to support different
      MPI datatypes.
        * New data types supported
            - MPI_INT, MPI_FLOAT, MPI_CHAR
    - Add validation and MPI types support to one-sided benchmarks.
        * osu_acc_latency, osu_cas_latency, osu_fop_latency
* Bug Fixes (since v7.0)
    - Reverse order of calling MPI_Reduce to find out MIN and MAX comm times
        * Thanks to Devendar Bureddy @NVIDIA for the report and patch
    - Fix data validation issue when running with ROCm support on AMD GPUs
        * Thanks to Damon McDougall @AMD for the report and patch
    - Thanks to Luke Robison @Amazon for initial patch for one sided
      validation and MPI_TYPEs support.

OSU Micro Benchmarks v7.0 (11/11/2022)

* New Features & Enhancements
    - Add PAPI support for mpi pt2pt benchmarks
        * Enabled with the `-P` option
        * Benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp
    - Add PAPI support for mpi blocking and non-blocking collective benchmarks
        * Enabled with the `-P` option
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv, osu_ireduce_scatter
    - Add PAPI support for mpi one-sided benchmarks
        * Enabled with the `-P` option
        * Benchmarks supported
            * osu_acc_latency, osu_cas_latency, osu_fop_latency,
            * osu_get_acc_latency, osu_get_bw, osu_get_latency, osu_put_bibw,
            * osu_put_bw, osu_put_latency
    - Add support for benchmarking non-blocking Reduce-Scatter.
        * osu_ireduce_scatter
* Bug Fixes (since v6.2)
    - Fixed a bug causing incorrect value for size in graph titles.

OSU Micro Benchmarks v6.2 (10/25/2022)

* New Features & Enhancements
    - Add graphing support for mpi pt2pt benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph outputs.
        * Benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp, osu_latency_mt
    - Add graphing support for mpi blocking and non-blocking collective
      benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph outputs.
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv
    - Add graphing support for mpi one-sided benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph outputs.
        * Benchmarks supported
            * osu_acc_latency, osu_cas_latency, osu_fop_latency,
            * osu_get_acc_latency, osu_get_bw, osu_get_latency, osu_put_bibw,
            * osu_put_bw, osu_put_latency
    - Add support for benchmarking blocking Alltoallw.
        * osu_alltoallw
* Bug Fixes (since v6.1)
    - Fixed inconsistency in max_message_size between some blocking and
      non-blocking collectives.
        * Thanks to Krishna Kandalla @HPE for the report.
    - Correct skip value for small message sizes in osu_ialltoallw benchmark

OSU Micro Benchmarks v6.1 (09/16/2022)

* New Features & Enhancements
    - Add support for derived data types in mpi pt2pt benchmarks
        * Enabled with the `-D` option
        * Supports contiguous, vector, and indexed data types
        * Benchmarks supported
            * osu_bibw
            * osu_bw
            * osu_mbw_mr
            * osu_latency
            * osu_multi_lat
    - Add support for derived data types in mpi blocking and non-blocking
      collective benchmarks
        * Enabled with the `-D` option
        * Supports contiguous, vector, and indexed data types
        * Blocking benchmarks supported
            * osu_allgather
            * osu_allgatherv
            * osu_alltoall
            * osu_alltoallv
            * osu_bcast
            * osu_gather
            * osu_gatherv
            * osu_scatter
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather
            * osu_iallgatherv
            * osu_ialltoall
            * osu_ialltoallv
            * osu_ialltoallw
            * osu_ibcast
            * osu_igather
            * osu_igatherv
            * osu_iscatter
            * osu_iscatterv
* Bug Fixes (since v6.0)
    - Add support for data validation for ROCm benchmarks
        * Thanks to Amir Shehata @ORNL for the report and initial patch
    - Correct the version number in the released version
        * Thanks to Adam Goldman @Intel for the report

OSU Micro Benchmarks v6.0 (08/16/2022)

* New Features & Enhancements
    - Add support for Java pt2pt benchmarks
        * OSULatency - Latency Test
        * OSUBandwidth - Bandwidth Test
            - OSUBandwidthOMPI - Bandwidth Test for Open MPI Java Bindings
        * OSUBiBandwidth - Bidirectional Bandwidth Test
            - OSUBiBandwidthOMPI - BiBandwidth Test for Open MPI Java Bindings
    - Add support for Java collective benchmarks
        * OSUAllgather - MPI_Allgather Latency Test
        * OSUAllgatherv - MPI_Allgatherv Latency Test
        * OSUAllReduce - MPI_Allreduce Latency Test
        * OSUAlltoall - MPI_Alltoall Latency Test
        * OSUAlltoallv - MPI_Alltoallv Latency Test
        * OSUBarrier - MPI_Barrier Latency Test
        * OSUBcast - MPI_Bcast Latency Test
        * OSUGather - MPI_Gather Latency Test
        * OSUGatherv - MPI_Gatherv Latency Test
        * OSUReduce - MPI_Reduce Latency Test
        * OSUReduceScatter - MPI_Reduce_scatter Latency Test
        * OSUScatter - MPI_Scatter Latency Test
        * OSUScatterv - MPI_Scatterv Latency Test
    - Add support for Python pt2pt benchmarks
        * osu_latency - Latency Test
        * osu_bw - Bandwidth Test
        * osu_bibw - Bidirectional Bandwidth Test
        * osu_multi_lat - Multi-pair Latency Test
    - Add support for Python collective benchmarks
        * osu_allgather - MPI_Allgather Latency Test
        * osu_allgatherv - MPI_Allgatherv Latency Test
        * osu_allreduce - MPI_Allreduce Latency Test
        * osu_alltoall - MPI_Alltoall Latency Test
        * osu_alltoallv - MPI_Alltoallv Latency Test
        * osu_barrier - MPI_Barrier Latency Test
        * osu_bcast - MPI_Bcast Latency Test
        * osu_gather - MPI_Gather Latency Test
        * osu_gatherv - MPI_Gatherv Latency Test
        * osu_reduce - MPI_Reduce Latency Test
        * osu_reduce_scatter - MPI_Reduce_scatter Latency Test
        * osu_scatter - MPI_Scatter Latency Test
        * osu_scatterv - MPI_Scatterv Latency Test
* Bug Fixes (since v5.9)
    - Fix bug in data validation support for CUDA managed memory benchmarks
        - Thanks to Chris Chambreau @LLNL for the report

OSU Micro Benchmarks v5.9 (03/02/2022)

* New Features & Enhancements
    - Add support for data validation in mpi pt2pt benchmarks
        * Enabled with the `-c` option
    - Add support for data validation in mpi collective benchmarks
        * Enabled with the `-c` option
    - Add support for NCCL collective benchmarks
        * osu_nccl_alltoall
* Bug Fixes (since v5.8)
    - Remove osu_latency_mp & osu_latency_mt from list of benchmarks that
      have GPU support
        * Thanks to Carl Ponder @NVIDIA for reporting the issue.
    - Ignore the `--accelerator` option when using pt2pt benchmarks with GPU
      enabled builds.
        * Thanks to Adam Goldman @Intel for the report

OSU Micro Benchmarks v5.8 (08/12/2021)

* New Features & Enhancements
    - Add support for NCCL pt2pt benchmarks
        * osu_nccl_bibw
        * osu_nccl_bw
        * osu_nccl_latency
    - Add support for NCCL collective benchmarks
        * osu_nccl_allgather
        * osu_nccl_allreduce
        * osu_nccl_bcast
        * osu_nccl_reduce
        * osu_nccl_reduce_scatter
    - Add data validation support for
        * osu_allreduce
        * osu_nccl_allreduce
        * osu_reduce
        * osu_nccl_reduce
        * osu_alltoall
* Bug Fixes (since v5.7.1)
    - Fix bug in support for CUDA managed memory benchmarks
        - Thanks to Adam Goldman @Intel for the report and the initial patch
    - Protect managed memory functionality with appropriate compile time flag

OSU Micro Benchmarks v5.7.1 (05/11/2021)

* New Features & Enhancements
    - Add support to send and receive data from different buffers for
      osu_latency, osu_bw, osu_bibw, and osu_mbw_mr
    - Enhance support for CUDA managed memory benchmarks
        - Thanks to Ian Karlin and Nathan Hanford @LLNL for the feedback
    - Add support to print minimum and maximum communication times for
      non-blocking benchmarks
* Bug Fixes (since v5.7)
    - Update README file with updated description for osu_latency_mp
        - Thanks to Honggang Li @RedHat for the suggestion
    - Fix error in setting benchmark name in osu_allgatherv.c and
      osu_allgatherv.c
        - Thanks to Brandon Cook @LBL for the report

OSU Micro Benchmarks v5.7 (12/11/2020)

* New Features & Enhancements
    - Add support to OMB to evaluate the performance of various primitives
      with AMD GPU device and ROCm support
        - This functionality is exposed when configured with --enable-rocm
          option
        - Thanks to AMD for the initial patch
* Bug Fixes (since v5.6.3)
    - Enhance one-sided window creation and fix a potential issue when using
      MPI_Win_allocate
        * Thanks to Bert Wesarg and George Katevenis for reporting the issue
          and providing initial patch
    - Remove additional '-M' option that gets printed with help message for
      osu_latency_mt and osu_latency_mp
        * Thanks to Nick Papior for the report
    - Added missing '-W' option support for one-sided bandwidth tests

OSU Micro Benchmarks v5.6.3 (06/01/2020)

* New Features & Enhancements
    - Add support for benchmarking applications that use 'fork' system call
        * osu_latency_mp
* Bug Fixes (since v5.6.2)
    - Allow passing flags to nvcc compiler
    - Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
    - Fix issues in window creation with host-to-device and device-to-host
      transfers for one-sided tests

OSU Micro Benchmarks v5.6.2 (08/08/2019)

* New Features & Enhancements
    - Add support for benchmarking GPU-Aware multi-threaded point-to-point
      operations
        * osu_latency_mt
* Bug Fixes (since v5.6.1)
    - Fix issue with freeing in osu_get_bw benchmark
    - Fix issues with out of tree builds
        - Thanks to Joseph Schuchart@HLRS for reporting the issue
    - Fix incorrect header in osu_mbw_mr benchmark
    - Fix memory alignment for non-heap allocations in openshmem message rate
      benchmarks
        - Thanks to Yossi@Mellanox for pointing out the issue

OSU Micro Benchmarks v5.6.1 (03/15/2019)

* Bug Fixes (since v5.6)
    - Fix issue with latency computation in osu_latency_mt benchmark.

OSU Micro Benchmarks v5.6 (03/01/2019)

* New Features & Enhancements
    - Add support for benchmarking GPU-Aware multi-pair point-to-point
      operations
        * osu_mbw_mr
        * osu_multi_lat
    - Add support to specify different number of sender and receiver threads
      for osu_latency_mt benchmark
* Bug Fixes
    - Remove -t option for blocking collectives
    - Fix issue when only building OpenSHMEM benchmarks
    - Improve error reporting for one-sided benchmarks
    - Fix compilation issues with PGI compilers for GPU-Aware benchmarks
    - Fix incorrect handling of configure options
        - Thanks to Tony Curtis @StonyBrook for the report
    - Fix compilation warnings

OSU Micro Benchmarks v5.5 (11/10/18)

* New Features & Enhancements
    - Introduce new MPI non-blocking collective benchmarks with support to
      measure overlap of computation and communication for CPUs and GPUs
        - osu_ireduce
        - osu_iallreduce

OSU Micro Benchmarks v5.4.4 (09/21/18)

* Bug Fixes
    - Deprecate the need to specify get_local_rank when running GPU-based
      benchmarks built with MVAPICH2 and launched with mpirun_rsh
    - Enhance error checking for GPU-based benchmarks
    - Fix issues in GPU-based OpenACC benchmarks
    - Fix race condition with OpenSHMEM non-blocking and overlap benchmarks

OSU Micro Benchmarks v5.4.3 (07/23/18)

* Bug Fixes
    - Fix buffer overflow in osu_reduce_scatter
        - Thanks to Matias A Cabral @Intel for reporting the issue and patch
        - Thanks to Gilles Gouaillardet for creating the patch
    - Fix buffer overflow in one sided tests
        - Thanks to John Byrne @HPE for reporting this issue
    - Fix buffer overflow in multi threaded latency test
    - Fix issues with freeing buffers for one-sided tests
    - Fix issues with freeing buffers for CUDA-enabled tests
    - Fix warning messages for benchmarks that do not support CUDA and/or
      Managed memory
        - Thanks to Carl Ponder@NVIDIA for reporting this issue
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.2 (04/30/18)

* New Features & Enhancements
    - Add "-W --window-size" option to osu_bw and osu_bibw
* Bug Fixes
    - Fix issues with out of tree builds
        - Thanks to Adam Moody @LLNL for reporting the issue
    - Fix PGI and XLC builds by using the correct generated cpp files
    - Fix crash with osu_mbw_mr for large messages
    - Fix minor error in Makefile
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.1 (02/19/18)

* New Features & Enhancements
    - Enhanced help messages and runtime parameters
* Bug Fixes
    - Fix compile and runtime issues in PGAS benchmarks (OpenSHMEM, UPC, and
      UPC++) exposed by PGI compiler
    - Added warning message to display memory limitation when running
      benchmarks with very large messages
    - Fix memory leaks for device buffers
    - Fix issues with type overflows
    - Fix an issue with pWork symmetric heap allocation in oshm_reduce
      benchmark
        - Thanks to Naveen Ravichandrasekaran@Cray for the report

OSU Micro Benchmarks v5.4.0 (10/30/17)

* New Features & Enhancements
    - Introduce new OpenSHMEM Non-blocking Benchmarks
        * osu_oshm_get_mr_nb
        * osu_oshm_get_nb
        * osu_oshm_put_mr_nb
        * osu_oshm_put_nb
        * osu_oshm_put_overlap
    - Automatically build OpenSHMEM 1.3 benchmarks when library support is
      detected
    - Add ability to specify min and max message size for point-to-point and
      one-sided benchmarks
    - Enhanced error handling for MPI benchmarks
    - Code clean-ups and unification of utility functions across benchmarks
    - Enhanced help messages and runtime parameters
* Bug Fixes
    - Fix compile-time warnings
    - Fix peer calculation formula in UPC/UPC++ benchmarks
    - Fix correct number of warmup iterations in osu_barrier benchmark

OSU Micro Benchmarks v5.3.2 (09/08/16)

* New Features & Enhancements
    - Allow specifying very large message sizes (>2GB) for collective
      benchmarks
* Bug Fixes
    - Fix compilation errors due to missing type casting

OSU Micro Benchmarks v5.3.1 (08/08/16)

* New Features & Enhancements
    - Add option to control whether CUDA kernels are built
    - Add runtime option to specify number of threads for osu_latency_mt
* Bug Fixes
    - Check if -lrt or -lpthread is needed
    - Fix compilation warnings
    - Fix non-blocking collective memory leak
    - Correct documentation for osu_multi_lat

OSU Micro Benchmarks v5.3 (03/25/16)

* New Features & Enhancements
    - Introduce new UPC++ Benchmarks
        * osu_upcxx_allgather
        * osu_upcxx_alltoall
        * osu_upcxx_async_copy_get
        * osu_upcxx_async_copy_put
        * osu_upcxx_bcast
        * osu_upcxx_gather
        * osu_upcxx_reduce
        * osu_upcxx_scatter
* Bug Fixes
    - Determine page size at runtime in OpenSHMEM benchmarks (fixes issue
      seen on OpenPower machines)

OSU Micro Benchmarks v5.2 (02/05/16)

* New Features & Enhancements
    - Support for CUDA-Aware Managed memory
        * osu_bibw
        * osu_bw
        * osu_latency
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
    - Add ability to specify minimum message size in addition to maximum
      message size for all collective benchmarks

OSU Micro Benchmarks v5.1 (11/10/15)

* New Features & Enhancements
    - Introduce non-blocking collective v-variants as well as ialltoallw
        * osu_iallgatherv
        * osu_ialltoallv
        * osu_igatherv
        * osu_iscatterv
        * osu_ialltoallw
    - Add support for benchmarking GPU-Aware non-blocking collectives.
      Overlap can be computed using either CPU or GPU kernels
        * osu_iallgather
        * osu_iallgatherv
        * osu_ialltoall
        * osu_ialltoallv
        * osu_ialltoallw
        * osu_ibcast
        * osu_igather
        * osu_igatherv
        * osu_iscatter
        * osu_iscatterv
    - Allow users the ability to specify zero warmup iterations
* Bug Fixes
    - fix openacc pragma

OSU Micro Benchmarks v5.0 (08/17/15)

* New Features & Enhancements
    - Support for a set of non-blocking collectives. The benchmarks can
      display both the amount of time spent in the collectives and the amount
      of overlap achievable
        * osu_iallgather
        * osu_ialltoall
        * osu_ibarrier
        * osu_ibcast
        * osu_igather
        * osu_iscatter
    - Add startup benchmarks to facilitate the ability to measure the amount
      of time it takes for an MPI library to complete MPI_Init
        * osu_init
        * osu_hello
    - Allocate and align data dynamically
        - Thanks to Devendar Bureddy from Mellanox for the suggestion
    - Add options for number of warmup iterations [-x] and number of
      iterations used per message size [-i] to MPI benchmarks
        - Thanks to Devendar Bureddy from Mellanox for the suggestion
* Bug Fixes
    - Do not truncate user specified max memory limits
        - Thanks to Devendar Bureddy from Mellanox for the report and patch

OSU Micro Benchmarks v4.4.1 (10/30/14)

* Bug Fixes
    - adding missing MPI3 guard for WIN_ALLOCATE
    - capture getopt return value in an int instead of char

OSU Micro Benchmarks v4.4 (8/23/14)

* New Features & Enhancements
    - Support for MPI-3 RMA (one-sided) and atomic operations using GPU
      buffers
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency
* Bug Fixes
    - remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc reference
    - add missing upc benchmarks for make dist rule

OSU Micro Benchmarks v4.3.1 (6/20/14)

* Bug Fixes
    - Fix typo in MPI collective benchmark help message
    - Explicitly mention that -m and -M parameters are specified in bytes

OSU Micro Benchmarks v4.3 (3/24/14)

* New Features & Enhancements
    - This new suite includes several new (or updated) benchmarks to measure
      performance of MPI-3 RMA communication operations with options to
      select different window creation (WIN_CREATE, WIN_DYNAMIC, and
      WIN_ALLOCATE) and synchronization functions (LOCK, PSCW, FENCE, FLUSH,
      FLUSH_LOCAL, and LOCK_ALL) in each benchmark
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_acc_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency
    - New UPC Collective Benchmarks
        * osu_upc_all_barrier
        * osu_upc_all_broadcast
        * osu_upc_all_exchange
        * osu_upc_all_gather
        * osu_upc_all_gather_all
        * osu_upc_all_reduce
        * osu_upc_all_scatter
    - Build MPI3 benchmarks when MPI library support is detected
* Bug Fixes
    - Add shmem_quiet() in OpenSHMEM Message Rate benchmark to ensure all
      previously issued operations are completed
    - Allocate pWrk from symmetric heap in OpenSHMEM Reduce benchmark

OSU Micro Benchmarks v4.2 (11/08/13)

* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_fcollect
    - Enable handling of GPU device buffers in all MPI collective benchmarks
    - Add device binding for OpenACC benchmarks
* Bug Fixes
    - Add upc_fence after memput in osu_upc_memput benchmark
    - Correct CUDA configuration example in README
    - Fix several warnings

OSU Micro Benchmarks v4.1 (8/24/13)

* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_barrier
        * osu_oshm_broadcast
        * osu_oshm_collect
        * osu_oshm_reduce
    - New MPI-3 RMA Atomics benchmarks
        * osu_cas_flush
        * osu_fop_flush

OSU Micro Benchmarks v4.0.1 (5/06/13)

* Bug Fixes
    - Fix several warnings

OSU Micro Benchmarks v4.0 (4/16/13)

* New Features & Enhancements
    - Support buffer allocation using OpenACC and CUDA in osu_alltoall,
      osu_gather, and osu_scatter benchmarks
    - Limit amount of memory allocated by collective benchmarks dynamically
      based on number of processes
        - Memory limit can also be explicitly set by the user through the -m
          option
    - Support for 64-bit atomic operations in osu_oshm_atomics
* Bug Fixes
    - Fix numerical overflow error with reporting bandwidth in osu_mbw_mr

OSU Micro Benchmarks v3.9 (2/28/13)

* New Features & Enhancements
    - Support buffer allocation using OpenACC in GPU benchmarks
    - Use average time instead of max time for calculating the bandwidth and
      message rate in osu_mbw_mr
        - Thanks to Alex Mikheev from Mellanox for the patch
* Bug Fixes
    - Properly initialize host buffers for DH and HD transfers in GPU
      benchmarks

OSU Micro Benchmarks v3.8 (11/07/12)

* New Features & Enhancements
    - New UPC benchmarks
        * osu_upc_memput
        * osu_upc_memget

OSU Micro Benchmarks v3.7 (9/07/12)

* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_get
        * osu_oshm_put_mr
        * osu_oshm_atomics
        * osu_oshm_put
    - Organize installation directory according to benchmark type
* Bug Fixes
    - Destroy cuda context before exiting

OSU Micro Benchmarks v3.6 (4/30/12)

* New Features & Enhancements
    - New collective benchmarks
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_barrier
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
* Bug Fixes
    - Fix GPU binding issue when running with HH mode

OSU Micro Benchmarks v3.5.2 (3/22/12)

* Bug Fixes
    - Fix typo which led to use of incorrect buffers

OSU Micro Benchmarks v3.5.1 (2/02/12)

* New Features & Enhancements
    - Provide script to set GPU affinity for MPI processes
* Bug Fixes
    - Removed GPU binding after MPI_Init to avoid switching context

OSU Micro Benchmarks v3.5 (11/09/11)

* New Features & Enhancements
    - Extension of osu_latency, osu_bw, and osu_bibw benchmarks to evaluate
      the performance of MPI_Send/MPI_Recv operation with NVIDIA GPU device
      and CUDA support
        - This functionality is exposed when configured with --enable-cuda
          option
        - Flexibility for using buffers in NVIDIA GPU device (D) and host
          memory (H)
        - Flexibility for selecting data movement between D->D, D->H and
          H->D

OSU Micro Benchmarks v3.4 (09/13/11)

* New Features & Enhancements
    - Add passive one-sided communication benchmarks
    - Update one-sided communication benchmarks to provide shared memory hint
      in MPI_Alloc_mem calls
    - Update one-sided communication benchmarks to use MPI_Alloc_mem for
      buffer allocation
    - Give default values to configure definitions (can now build directly
      with mpicc)
    - Update latency benchmarks to begin from 0 byte message
* Bug Fixes
    - Remove memory leaks in one-sided communication benchmarks
    - Update benchmarks to touch buffers before using them for communication
    - Fix osu_get_bw test to use different buffers for concurrent
      communication operations
    - Fix compilation warnings