1. Overview
MVAPICH2-X-AWS is based on MVAPICH2-X and incorporates designs that take advantage of the Scalable Reliable Datagram (SRD) transport of the AWS Elastic Fabric Adapter (EFA). It also supports XPMEM for efficient intra-node communication. The latest version is MVAPICH2-X-AWS 2.3.7.
2. Features
- Based on MVAPICH2-X
- Design based on Amazon Elastic Fabric Adapter’s (EFA) Scalable Reliable Datagram (SRD) transport protocol
- Delivers efficient inter-node latency and bandwidth performance
- Support for XPMEM based intra-node communication
- Optimized and Tuned collectives (inter-node and intra-node)
- Support for dynamic run-time detection of XPMEM module
- Add initial support for AWS hpc6a/c6a instances with 3rd generation AMD EPYC processors (new)
- Add support & performance optimization for AWS c6g/c7g with Amazon Graviton 2/3 processors aarch64 (new)
- Targeted for AWS EC2 instances with EFA support
- Support available (currently) for basic OS types on AWS EC2 including: Amazon Linux 1/2, CentOS 7, Ubuntu 20.04/18.04
3. Launch AWS EFA Instance
Follow steps 1-3 in this webpage:
Launch an AWS EC2 instance with Elastic Fabric Adapter enabled. We recommend using the Amazon Linux 2 AMI.
4. Install MVAPICH2-X-AWS
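Install MVAPICH2-X-AWS from a package (make sure you have sudo access). Download only the package that matches your OS and architecture: the x86_64 RPM, the amd64 Debian package, or the aarch64 RPM. The last three commands are alternatives: install the RPM to the default location, install it to a custom prefix, or extract its contents into the current directory without installing.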
$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2x/2.3/mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.x86_64.rpm
$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2x/mvapich2-x-aws-mofed-gnu9.4.0_2.3.7x-2_amd64.deb
$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2x/2.3/mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.aarch64.rpm
$ rpm -Uvh --nodeps mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.x86_64.rpm
$ rpm --prefix=/custom/install/prefix -Uvh --nodeps mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.x86_64.rpm
$ rpm2cpio mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.x86_64.rpm | cpio -id
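Choosing the right package can be scripted. The following is a minimal sketch, not part of the MVAPICH2-X-AWS distribution: `pick_package` is a hypothetical helper that maps an architecture (as reported by `uname -m`) and a package format to one of the file names used in the download commands above. It assumes those three packages are the only choices.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MVAPICH2-X-AWS): map an architecture
# and package format to a package file name from the download commands above.
pick_package() {
    arch=$1
    fmt=$2
    case "$fmt:$arch" in
        rpm:x86_64)  echo "mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.x86_64.rpm" ;;
        rpm:aarch64) echo "mvapich2-x-aws-mofed-gnu7.3.1-2.3x-1.amzn2.aarch64.rpm" ;;
        deb:x86_64)  echo "mvapich2-x-aws-mofed-gnu9.4.0_2.3.7x-2_amd64.deb" ;;
        *) echo "no known package for $fmt on $arch" >&2; return 1 ;;
    esac
}

# Print the package name for this machine, assuming an RPM-based OS.
pick_package "$(uname -m)" rpm || true
```

The printed name can then be appended to the matching download URL shown above.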
5. Install XPMEM
To get better intra-node performance from MVAPICH2-X-AWS, you may also want to install and load XPMEM.
Download the XPMEM source from the following GitHub repository and build it:
$ git clone https://github.com/hpc/xpmem.git
$ cd xpmem
$ ./autogen.sh
$ ./configure --prefix=/opt/xpmem
$ sudo make -j8 install
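A build issue commonly occurs with the latest kernel versions of Amazon Linux 2. Details and solutions are available at: https://github.com/hpc/xpmem/issues/40
After a successful build, load the module (the path depends on your kernel version), make the device accessible, and verify that the module is loaded: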
$ sudo insmod /opt/xpmem/lib/modules/4.14.123-111.109.amzn2.x86_64/xpmem.ko
$ sudo chmod 666 /dev/xpmem
$ lsmod | grep xpmem
xpmem 32569 0
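The module path in the `insmod` command above contains a specific kernel release, so it will differ on other instances. As a sketch, the path can be derived from the running kernel instead of being typed by hand. `xpmem_module_path` is a hypothetical helper, and it assumes the build places the module under `lib/modules/<kernel release>/` inside the install prefix, as in the path shown above.

```shell
#!/bin/sh
# Hypothetical helper: build the xpmem.ko path for an install prefix and a
# kernel release (defaults to the running kernel, as reported by uname -r).
# Assumption: the module lives under <prefix>/lib/modules/<kernel release>/.
xpmem_module_path() {
    prefix=$1
    kernel=${2:-$(uname -r)}
    echo "$prefix/lib/modules/$kernel/xpmem.ko"
}

# Print the expected module path for the running kernel.
xpmem_module_path /opt/xpmem
```

If the printed file exists, it can be passed to `sudo insmod` in place of the hard-coded path.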
6. Create More Instances
Now you can install HPC applications. To scale out, you can either use AWS ParallelCluster to create a cluster with a head node and compute nodes, or create more instances from an image of the instance you just configured. For the second option, create an AMI from the configured instance and launch new instances from that AMI, so that you don’t need to reinstall everything.
Note that you need to repeat the XPMEM load step above every time you launch a new instance or reboot an existing one.
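Because the load step has to be repeated after every launch or reboot, it may help to wrap it in a script that does nothing when the module is already present. The following is only a sketch: `xpmem_listed` and `ensure_xpmem` are hypothetical helper names, and the module path passed to `ensure_xpmem` is whatever path you used with `insmod` earlier.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lsmod-style output on stdin lists xpmem.
xpmem_listed() {
    awk '$1 == "xpmem" { found = 1 } END { exit !found }'
}

# Hypothetical helper: load the module and open up the device only when
# the module is not loaded yet. $1 is the path to xpmem.ko.
ensure_xpmem() {
    module=$1
    if lsmod | xpmem_listed; then
        echo "xpmem already loaded"
    else
        sudo insmod "$module" && sudo chmod 666 /dev/xpmem
    fi
}
```

Calling `ensure_xpmem` with your module path from a login or boot script would then reproduce the manual steps only when they are needed.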
7. Example: How to Run OSU Micro-benchmarks?
The OSU Micro-benchmarks (OMB) are installed by default in the MVAPICH2-X install path, under the ./libexec directory.
Go to the MVAPICH2-X install path (for example, /opt/mvapich2-x/gnu7.3.1/aws-ofed/intermediate/mpirun).
You may need to prepend the MVAPICH2-X library directory to LD_LIBRARY_PATH, like this:
$ export LD_LIBRARY_PATH=/opt/mvapich2-x/gnu7.3.1/aws-ofed/intermediate/mpirun/lib64/:$LD_LIBRARY_PATH
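One detail of the export above: if LD_LIBRARY_PATH is unset, the result ends in a colon, and the dynamic linker treats an empty entry as the current working directory. A small sketch that avoids this is shown below. `prepend_path` and `MV2X_LIB` are hypothetical names, and the library path is the example path from this section.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to a colon-separated list
# without leaving an empty entry when the list is empty.
prepend_path() {
    dir=$1
    current=$2
    echo "$dir${current:+:$current}"
}

# Example library directory from this guide; adjust to your install path.
MV2X_LIB=/opt/mvapich2-x/gnu7.3.1/aws-ofed/intermediate/mpirun/lib64
LD_LIBRARY_PATH=$(prepend_path "$MV2X_LIB" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```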
Run OMB with this command:
$ ./bin/mpirun_rsh -np 2 -hostfile ~/hostfile ./libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency
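The command above expects ~/hostfile to exist. mpirun_rsh reads a plain-text hostfile with one host name or IP address per line, so a two-process latency test across two instances needs two lines. The sketch below writes such a file. `write_hostfile` is a hypothetical helper, and `node1` and `node2` are placeholder host names to be replaced with your instances' addresses.

```shell
#!/bin/sh
# Hypothetical helper: write each remaining argument as one line of the
# hostfile named by the first argument.
write_hostfile() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# node1 and node2 are placeholders; the example writes to a temporary
# location rather than ~/hostfile.
write_hostfile "${TMPDIR:-/tmp}/hostfile.example" node1 node2
cat "${TMPDIR:-/tmp}/hostfile.example"
```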