Machine Specifications

CPU Model | CPU Core Info | Memory | IB Card | IB Switch | OS | CUDA | GPU
Intel Xeon CPU E5-2687W v3 | 2x10 @ 3.10GHz | 64GB | Mellanox ConnectX-4 (100Gbps) | Mellanox EDR IB Switch | RHEL 7.0 | CUDA 9.2 | NVIDIA Tesla V100-PCIE-16GB

Inter-Node Performance numbers of MVAPICH2 on Intel Haswell Architecture with Mellanox ConnectX-4 (03/16/19)

One Way Latency | Unidirectional Bandwidth | Bidirectional Bandwidth | Notes
1.85 us | 9890.47 MBps | 17962.21 MBps | Inter-Node GPU to GPU (Device-to-Device) using single HCA

Machine Specifications

CPU Model | CPU Core Info | Memory | IB Card | IB Switch | OS | CUDA | GPU
Intel Xeon CPU E5-2650 v4 | 2x12 @ 2.2GHz | 64GB | Mellanox ConnectX-4 (100Gbps) | Mellanox EDR IB Switch | RHEL 7.0 | CUDA 9.0 | 4xNVIDIA Tesla P100-PCIE2-16GB

Intra-Node Performance numbers of MVAPICH2 on Intel Broadwell Architecture with Mellanox ConnectX-4 (03/16/19)

One Way Latency | Unidirectional Bandwidth | Bidirectional Bandwidth | Notes
1.55 us | 13049.32 MBps | 21261.77 MBps | Intra-Node (Device-to-Device)

Machine Specifications

CPU Model | CPU Core Info | Memory | IB Card | IB Switch | OS | CUDA | GPU
POWER9 | 44 @ 3.45 GHz | 256GB | Dual Mellanox ConnectX-5 (2x100Gbps) | Mellanox EDR IB Switch | RHEL 7.5 | CUDA 9.2 | NVIDIA Tesla V100-SXM2-16GB

Inter-Node Performance numbers of MVAPICH2 on OpenPOWER9 Architecture with Mellanox ConnectX-5 EDR (03/15/19)

One Way Latency | Unidirectional Bandwidth | Bidirectional Bandwidth | Notes
5.66 us | 22737.687 MBps | 37409.62 MBps | Inter-Node GPU to GPU (Device-to-Device) using two HCAs

Intra-Node Performance numbers of MVAPICH2 on OpenPOWER9 Architecture with Mellanox ConnectX-5 EDR (03/16/19)

One Way Latency | Unidirectional Bandwidth | Bidirectional Bandwidth | Notes
5.52 us | 70463.41 MBps | 133097.60 MBps | Intra-Node over 3-lane NVLink2 (Device-to-Device)
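
To make the bandwidth figures in the tables above easier to compare against the quoted link rates (100Gbps per HCA), the following minimal sketch converts each unidirectional bandwidth to Gbit/s and computes the ratio of bidirectional to unidirectional bandwidth. It assumes that "MBps" means 10^6 bytes per second, which the page does not state. The names RESULTS, mbps_to_gbit, bidir_ratio, and summarize are illustrative and not part of MVAPICH2 or any benchmark suite.

```python
# Figures copied from the result tables above:
# (label, one-way latency in us, unidirectional MBps, bidirectional MBps)
RESULTS = [
    ("Haswell inter-node, single HCA", 1.85, 9890.47, 17962.21),
    ("Broadwell intra-node", 1.55, 13049.32, 21261.77),
    ("POWER9 inter-node, two HCAs", 5.66, 22737.687, 37409.62),
    ("POWER9 intra-node, NVLink2", 5.52, 70463.41, 133097.60),
]


def mbps_to_gbit(mbps):
    """Convert MB/s to Gbit/s. Assumption: 1 MB = 10**6 bytes."""
    return mbps * 8 / 1000


def bidir_ratio(uni_mbps, bidir_mbps):
    """Ratio of bidirectional to unidirectional bandwidth (2.0 = perfect scaling)."""
    return bidir_mbps / uni_mbps


def summarize(results):
    """Return (label, unidirectional Gbit/s, bidir/unidir ratio) per row, rounded to 2 places."""
    return [
        (label, round(mbps_to_gbit(uni), 2), round(bidir_ratio(uni, bidir), 2))
        for label, _latency_us, uni, bidir in results
    ]


if __name__ == "__main__":
    for label, gbit, ratio in summarize(RESULTS):
        print(f"{label}: {gbit} Gbit/s unidirectional, bidir/unidir = {ratio}")
```

Under the stated unit assumption, the single-HCA inter-node figure works out to roughly 79 Gbit/s, which sits below the 100Gbps rate listed for the ConnectX-4 card.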