Machine Specifications

CPU Model: Intel Xeon CPU E5-2687W v3
CPU Core Info: 2x10 @ 3.10GHz
Memory: 64GB
IB Card: Mellanox ConnectX-4 (100Gbps)
IB Switch: Mellanox EDR IB Switch
OS: RHEL 7.0
CUDA: CUDA 9.2
GPU: NVIDIA Tesla V100-PCIE-16GB

Inter-Node Performance numbers of MVAPICH2 on Intel Haswell Architecture with Mellanox ConnectX-4 (06/04/20)

One Way Latency: 1.85 us
Unidirectional Bandwidth: 11323.99 MBps
Bidirectional Bandwidth: 17962.21 MBps
Notes: Inter-Node GPU to GPU (Device-to-Device) using single HCA

Machine Specifications

CPU Model: Intel Xeon Gold 6148
CPU Core Info: 2x20 @ 2.4GHz
Memory: 384GB
IB Card: 2x Mellanox ConnectX-4 (100Gbps)
IB Switch: Mellanox EDR IB Switch
OS: CentOS 7.5
CUDA: CUDA 10.2.89
GPU: 4x NVIDIA Tesla V100-SXM2-16GB

Intra-Node Performance numbers of MVAPICH2 on Intel Gold Architecture with Mellanox ConnectX-4 (06/04/20)

One Way Latency: 1.39 us
Unidirectional Bandwidth: 48435.92 MBps
Bidirectional Bandwidth: 91499.80 MBps
Notes: Intra-Node over 2-lane NVLink2 (Device-to-Device)

These numbers were taken on the ABCI system. We would like to thank the AIST staff members for providing us access to this system.

Machine Specifications

CPU Model: POWER9
CPU Core Info: 44 @ 3.45GHz
Memory: 256GB
IB Card: Dual Mellanox ConnectX-5 (2x100Gbps)
IB Switch: Mellanox EDR IB Switch
OS: RHEL 7.5
CUDA: CUDA 10.1.243
GPU: NVIDIA Tesla V100-SXM2-16GB

Inter-Node Performance numbers of MVAPICH2 on OpenPOWER9 Architecture with Mellanox ConnectX-5 EDR (06/04/20)

One Way Latency: 2.1 us
Unidirectional Bandwidth: 24680.53 MBps
Bidirectional Bandwidth: 36146.33 MBps
Notes: Inter-Node GPU to GPU (Device-to-Device) using two HCAs

Intra-Node Performance numbers of MVAPICH2 on OpenPOWER9 Architecture with Mellanox ConnectX-5 EDR (06/04/20)

One Way Latency: 0.71 us
Unidirectional Bandwidth: 70411.32 MBps
Bidirectional Bandwidth: 132998.59 MBps
Notes: Intra-Node over 3-lane NVLink2 (Device-to-Device)
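
To put the unidirectional bandwidth figures above in context, the following rough sketch compares each measured value with a nominal link rate. The nominal rates are assumptions, not part of the reported results: 100 Gbps per InfiniBand EDR port taken as 12500 MB/s (ignoring encoding and protocol overhead), 25000 MB/s per direction for each NVLink2 lane, and MBps read as 10^6 bytes per second. The names `link_efficiency`, `RESULTS`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
# Nominal per-direction peak rates in MB/s. These are ASSUMPTIONS used for
# illustration; they do not come from the result tables.
IB_EDR_PORT_MBPS = 100e9 / 8 / 1e6   # 100 Gbps port -> 12500 MB/s, overhead ignored
NVLINK2_LANE_MBPS = 25000.0          # assumed rate of one NVLink2 lane, per direction


def link_efficiency(measured_mbps, nominal_mbps):
    """Return measured bandwidth as a fraction of the assumed nominal peak."""
    if nominal_mbps <= 0:
        raise ValueError("nominal_mbps must be positive")
    return measured_mbps / nominal_mbps


# (label, measured unidirectional MB/s from the tables, assumed nominal MB/s)
RESULTS = [
    ("Haswell inter-node, single HCA", 11323.99, 1 * IB_EDR_PORT_MBPS),
    ("Gold intra-node, 2-lane NVLink2", 48435.92, 2 * NVLINK2_LANE_MBPS),
    ("POWER9 inter-node, two HCAs", 24680.53, 2 * IB_EDR_PORT_MBPS),
    ("POWER9 intra-node, 3-lane NVLink2", 70411.32, 3 * NVLINK2_LANE_MBPS),
]


def summarize(results=RESULTS):
    """Return (label, efficiency rounded to 3 decimals) for each entry."""
    return [(label, round(link_efficiency(m, n), 3)) for label, m, n in results]


if __name__ == "__main__":
    for label, fraction in summarize():
        print(f"{label}: {fraction:.1%} of assumed nominal rate")
```

Under these assumptions, every configuration reaches roughly 90% or more of its nominal rate. The exact percentages depend on which peak figure is chosen, so they are best read as a sanity check rather than a measured efficiency.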