OSU Micro-Benchmarks for Python
-------------------------------

The OSU Micro-Benchmarks Python package consists of point-to-point and 
collective communication benchmark tests utilizing the mpi4py library to 
provide Python bindings for the MPI standard. 
The package supports NumPy, CuPy, Numba, and PyCUDA as buffers for 
communication on CPUs and GPUs.

To run benchmarks, a Python environment must be set up and with dependencies 
installed. We recommend using Miniconda to create a Python environment and 
install required packages.

Environment Setup:
1) Install your preferred MPI library. This README tested Python benchmarks using
   the MVAPICH2 library, which can be downloaded from 
   https://mvapich.cse.ohio-state.edu/downloads/.

2) Download and install Miniconda.
   wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
   bash Miniconda3-py39_4.9.2-Linux-x86_64.sh

3) Create new conda environment
   source PATH/TO/miniconda3/bin/activate
   conda create -n OMB-Py python=3.8
   conda activate OMB-Py
   export PATH=/path/to/mpi/bin:$PATH
   export LD_LIBRARY_PATH=/path/to/mpi/lib:$LD_LIBRARY_PATH

4) Install requirements:
   pip install numpy
   pip install cupy-cuda112 (replace 112 with appropriate cuda version)
   pip install pycuda
   pip install numba
   pip install mpi4py
  For more information on installing mpi4py with different MPI libraries, please 
  refer to the following link: https://mpi4py.readthedocs.io/en/stable/install.html
  
5) Activate conda environment before using OMB Python benchmarks
    source PATH/TO/miniconda3/bin/activate
    conda activate OMB-Py


Running Benchmarks:

To run benchmarks, use the run.py file (available in the "python" folder)
with the following arguments:

Arguments:
    --benchmark: specifies benchmark to run. Options: 
      Collective blocking: allgather, allgatherv, allreduce, alltoall, alltoallv,
                           barrier, bcast, gather, gatherv, reduce_scatter, 
                           reduce, scatter, scatterv
      Point-to-point: bw, bibw, latency, multi_lat
    --buffer: (optional) sets type of buffer. Options: byterray, numpy, cupy, 
              pycuda, or numba. CPU buffers are set by default.
    --pickle: (optional) uses pickle methods. Default is false.
    --min: (optional) sets the minimum tested message length.
    --max: (optional) sets the maximum tested message length.
    --iterations: (optional) sets the number of iterations for each message length.
    --skip: (optional) sets the number of warmup iterations for each message length.

Examples:
Latency test on CPU with buffer type NumPy:
    mpirun -np 2 --hostfile hosts python run.py \ 
    --benchmark latency --buffer numpy

Allgather test on CPU with buffer type NumPy and message lenghts 128 to 4096:
    mpirun -np 4 --hostfile hosts python run.py \
    --benchmark allgather --min 128 --max 4096

Latency test on GPU with buffer type CuPy:
    mpirun -np 2 --hostfile hosts python run.py \
    --benchmark latency --buffer cupy