MVAPICH/MVAPICH2 Project
Ohio State University




Intra-node inter-IOH GPU-GPU performance numbers of MVAPICH2 on Sandy Bridge Architecture (05/06/13)

  • Experimental Testbed: Each node of our testbed has 16 cores (dual octa-core, 2.60 GHz) and 32 GB of main memory. The CPUs are based on the Sandy Bridge architecture and run in 64-bit mode. The nodes support PCI Express Gen3 x16 interfaces, and each node is equipped with a Mellanox ConnectX-3 FDR HCA and two NVIDIA Tesla K20c GPUs. They have CUDA Toolkit 5.0 and CUDA driver 310.44 installed. The nodes are connected using a Mellanox FDR InfiniBand switch. The operating system is Red Hat Enterprise Linux Server release 6.3 (Santiago).
  • The results reported are for MPI communication between device memory on two GPUs, with ECC enabled. MVAPICH2 currently delivers a one-way latency of 27.3 microseconds for 4-byte messages. It achieves a unidirectional bandwidth of up to 4540.25 Million Bytes/sec and a bidirectional bandwidth of up to 4487.17 Million Bytes/sec. (1 Mega Byte = 1,048,576 Bytes; 1 Million Bytes = 1,000,000 Bytes.) Sketches of how such latency and bandwidth measurements are structured appear after this list.
  • The OSU Micro-Benchmarks have been extended to measure the performance of MPI communication from NVIDIA GPU devices and are available for download as part of OMB v4.0.1.
  • The two GPU devices on this node were connected to different I/O Hubs. The two processes were mapped onto cores 1 and 9 (inter-socket). Process 0 used GPU Device 0 and Process 1 used GPU Device 1.
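
For reference, the following is a minimal sketch of a device-to-device ping-pong latency test in the spirit of osu_latency from the OSU Micro-Benchmarks. It assumes a CUDA-aware MPI library (for example, MVAPICH2 built with CUDA support and run with MV2_USE_CUDA=1), so that device pointers can be passed directly to MPI calls. The message size, iteration counts, and rank-to-device assignment are illustrative choices, not the exact benchmark code.

    /* Minimal GPU-to-GPU ping-pong latency sketch (assumes CUDA-aware MPI). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define MSG_SIZE   4        /* 4-byte messages, as in the latency number above */
    #define SKIP       100      /* warm-up iterations excluded from timing */
    #define ITERATIONS 1000

    int main(int argc, char *argv[])
    {
        int rank, ndevices;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Bind each rank to its own GPU: rank 0 -> device 0, rank 1 -> device 1 */
        cudaGetDeviceCount(&ndevices);
        cudaSetDevice(rank % ndevices);
        cudaMalloc((void **)&buf, MSG_SIZE);

        double start = 0.0;
        for (int i = 0; i < SKIP + ITERATIONS; i++) {
            if (i == SKIP)
                start = MPI_Wtime();        /* start timing after warm-up */
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            /* One-way latency = half the average round-trip time, in microseconds */
            double latency = (MPI_Wtime() - start) * 1e6 / (2.0 * ITERATIONS);
            printf("%d-byte one-way latency: %.2f us\n", MSG_SIZE, latency);
        }

        cudaFree(buf);
        MPI_Finalize();
        return 0;
    }

An equivalent run with the packaged benchmarks would look like "mpirun_rsh -np 2 <node> <node> MV2_USE_CUDA=1 MV2_CPU_MAPPING=1:9 ./osu_latency D D", where the "D D" arguments select device buffers on both ranks and MV2_CPU_MAPPING pins the two processes to cores 1 and 9; please check the MVAPICH2 user guide for the exact options supported by your installation.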
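
The unidirectional bandwidth figure is obtained with a windowed test in the style of osu_bw: the sender keeps a window of non-blocking sends from device memory in flight while the receiver posts matching receives. The sketch below illustrates the idea under the same CUDA-aware MPI assumption; the window size, message size, and iteration count are arbitrary choices, not the benchmark's defaults.

    /* Unidirectional bandwidth sketch with a window of non-blocking transfers
     * from device memory (assumes CUDA-aware MPI; warm-up omitted for brevity). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define MSG_SIZE (1 << 20)   /* 1 MB messages */
    #define WINDOW   64          /* messages in flight per iteration */
    #define ITERS    100

    int main(int argc, char *argv[])
    {
        int rank, ndev;
        char *buf;
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaGetDeviceCount(&ndev);
        cudaSetDevice(rank % ndev);
        /* One device buffer slot per outstanding message in the window */
        cudaMalloc((void **)&buf, (size_t)WINDOW * MSG_SIZE);

        double start = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Isend(buf + (size_t)w * MSG_SIZE, MSG_SIZE, MPI_CHAR,
                              1, 0, MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                /* wait for the receiver's acknowledgement before the next window */
                MPI_Recv(NULL, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Irecv(buf + (size_t)w * MSG_SIZE, MSG_SIZE, MPI_CHAR,
                              0, 0, MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Send(NULL, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            double t = MPI_Wtime() - start;
            double mbytes = (double)MSG_SIZE * WINDOW * ITERS / 1e6;
            printf("Unidirectional bandwidth: %.2f Million Bytes/sec\n", mbytes / t);
        }

        cudaFree(buf);
        MPI_Finalize();
        return 0;
    }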