Benchmarks | Network-Based Computing Laboratory

Benchmarks

  • OMB-3.1.1 (08/20/08) is now available for download.
    • This version adds a new multi-pair latency benchmark.
  • This page contains descriptions of the following MPI-level tests included in the OMB package:
    • Latency, bandwidth, bidirectional bandwidth, multiple bandwidth / message rate test, multi-pair latency and broadcast for MVAPICH (MPI-1)
    • Latency, multi-threaded latency, multi-pair latency, multiple bandwidth / message rate test, bandwidth, bidirectional bandwidth, one-sided put latency, one-sided put bandwidth, one-sided put bidirectional bandwidth, one-sided get latency, one-sided get bandwidth, and one-sided accumulate latency for MVAPICH2 (MPI-2)
    • Please note that all these benchmarks require osu.h, which can be downloaded here.

Latency Test

  • The latency test is carried out in a ping-pong fashion. The sender sends a message of a given data size to the receiver and waits for a reply. The receiver receives the message and sends back a reply of the same data size. Many iterations of this ping-pong exchange are carried out and the average one-way latency is reported. The blocking versions of the MPI functions (MPI_Send and MPI_Recv) are used in the test. This test is available here.
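As a rough illustration of how the reported figure follows from the timing, the sketch below converts the total elapsed time of a ping-pong loop into an average one-way latency. The function name, its parameters and the sample figures are illustrative assumptions and are not taken from the OMB source.

```python
def one_way_latency_us(elapsed_s: float, iterations: int) -> float:
    """Average one-way latency in microseconds for a ping-pong loop.

    Each iteration is one round trip (ping plus pong), so the one-way
    latency is half of the average round-trip time.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    round_trip_s = elapsed_s / iterations
    return round_trip_s / 2.0 * 1e6


# Hypothetical figures: 10,000 round trips completed in 0.05 s.
print(one_way_latency_us(0.05, 10_000))
```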

Multi-threaded Latency Test (only applicable for MVAPICH2 with threading support enabled)

  • The multi-threaded latency test performs a ping-pong test with a single sender process and multiple threads on the receiving process. In this test the sending process sends a message of a given data size to the receiver and waits for a reply from the receiver process. The receiving process has a variable number of receiving threads (set by default to 2), where each thread calls MPI_Recv and upon receiving a message sends back a response of equal size. Many iterations are performed and the average one-way latency numbers are reported. This test is available here.
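The communication pattern can be imitated inside a single Python process. The sketch below is only an analogy, assuming that queues stand in for MPI messages and that Python threads stand in for the receiving threads; it does not call MPI, and all names in it are hypothetical.

```python
import queue
import threading
import time


def threaded_ping_pong(iterations: int = 1000, num_threads: int = 2,
                       message: bytes = b"x" * 64) -> float:
    """Return the average one-way latency (seconds) of an in-process ping-pong."""
    requests: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()

    def receiver() -> None:
        # Each receiving thread blocks on a receive, then replies.
        while True:
            msg = requests.get()
            if msg is None:       # sentinel: shut this thread down
                return
            replies.put(msg)      # response of equal size

    threads = [threading.Thread(target=receiver) for _ in range(num_threads)]
    for t in threads:
        t.start()

    start = time.perf_counter()
    for _ in range(iterations):
        requests.put(message)     # ping from the single sender
        reply = replies.get()     # wait for the pong
        assert len(reply) == len(message)
    elapsed = time.perf_counter() - start

    for _ in threads:             # one sentinel per receiving thread
        requests.put(None)
    for t in threads:
        t.join()

    # One iteration is a round trip, so halve it for the one-way figure.
    return elapsed / iterations / 2.0


print(threaded_ping_pong(iterations=1000, num_threads=2))
```

The absolute numbers from such a sketch say nothing about network performance; it only shows one sender being served by whichever receiving thread picks up the message.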

Bandwidth Test

  • The bandwidth test is carried out by having the sender send a fixed number (equal to the window size) of back-to-back messages to the receiver and then wait for a reply from the receiver. The receiver sends the reply only after receiving all these messages. This process is repeated for several iterations and the bandwidth is calculated from the elapsed time (from the time the sender sends the first message until the time it receives the reply back from the receiver) and the number of bytes sent by the sender. The objective of this bandwidth test is to determine the maximum sustained data rate that can be achieved at the network level. Thus, the non-blocking versions of the MPI functions (MPI_Isend and MPI_Irecv) are used in the test. This test is available here.
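The calculation described above can be sketched as follows. The function name and parameters are hypothetical, and the choice of 1 MB = 10^6 bytes is an assumption made for the example, not something stated on this page.

```python
def bandwidth_mb_per_s(message_bytes: int, window_size: int,
                       iterations: int, elapsed_s: float) -> float:
    """Sustained bandwidth in MB/s, assuming 1 MB = 10**6 bytes.

    The sender transmits `window_size` back-to-back messages per
    iteration, so the bytes sent are size * window * iterations.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    total_bytes = message_bytes * window_size * iterations
    return total_bytes / elapsed_s / 1e6


# Hypothetical run: 1 MiB messages, window of 64, 20 iterations in 1 s.
print(bandwidth_mb_per_s(1024 * 1024, 64, 20, 1.0))
```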

Bidirectional Bandwidth Test

  • The bidirectional bandwidth test is similar to the bandwidth test, except that both nodes involved send out a fixed number of back-to-back messages and wait for the reply. This test measures the maximum sustainable aggregate bandwidth between two nodes. This test is available here.

Multi-pair Latency Test

  • This test is very similar to the latency test, except that multiple pairs of processes perform the test simultaneously. The processes are divided into two equal blocks according to their ranks. Each process in one block forms a pair with the corresponding process in the other block. For example, the process with rank 0 pairs with the process with rank np/2, rank 1 pairs with rank np/2 + 1, and so on. This test is available here.
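The pairing rule can be written down in a few lines. This is a minimal sketch of the rank arithmetic only; the function name is hypothetical, and the requirement of an even process count is an assumption that follows from splitting the ranks into two equal blocks.

```python
def partner_rank(rank: int, num_procs: int) -> int:
    """Return the rank paired with `rank` when ranks form two equal blocks."""
    if num_procs <= 0 or num_procs % 2 != 0:
        raise ValueError("num_procs must be a positive even number")
    if not 0 <= rank < num_procs:
        raise ValueError("rank out of range")
    half = num_procs // 2
    # Ranks in the first block pair upward, ranks in the second pair downward.
    return rank + half if rank < half else rank - half


# With 8 processes: 0 pairs with 4, 1 with 5, 2 with 6, 3 with 7.
pairs = [(r, partner_rank(r, 8)) for r in range(4)]
print(pairs)
```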

Multiple Bandwidth / Message Rate Test

  • The multi-pair bandwidth and message rate test evaluates the aggregate uni-directional bandwidth and message rate between multiple pairs of processes. Each of the sending processes sends a fixed number of messages (the window size) back-to-back to the paired receiving process before waiting for a reply from the receiver. This process is repeated for several iterations. The objective of this benchmark is to determine the achieved bandwidth and message rate from one node to another node with a configurable number of processes running on each node. The test is available here.
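One way to derive both reported quantities from a single timed run is sketched below. The names are hypothetical, the run is assumed to be timed once across all pairs, and 1 MB is taken as 10^6 bytes for the example.

```python
def aggregate_rates(message_bytes: int, window_size: int, iterations: int,
                    pairs: int, elapsed_s: float) -> tuple[float, float]:
    """Return (bandwidth in MB/s, messages per second) summed over all pairs."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Every pair sends window_size messages in every iteration.
    total_messages = pairs * window_size * iterations
    message_rate = total_messages / elapsed_s
    bandwidth = message_rate * message_bytes / 1e6
    return bandwidth, message_rate


# Hypothetical run: 4 pairs, 8-byte messages, window 64, 100 iterations, 10 ms.
bw, rate = aggregate_rates(8, 64, 100, 4, 0.01)
print(bw, rate)
```

Small messages make the message rate the interesting figure, while large messages make the bandwidth the interesting one; both come from the same message count and elapsed time.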

Broadcast Latency Test

  • Broadcast Latency Test: The broadcast latency test is carried out in the following manner. After doing an MPI_Bcast, the root node waits for an ack from the last receiver. This ack is a zero-byte message from the receiver to the root. This is repeated for a large number (1000) of iterations. The broadcast latency is obtained by subtracting the time taken for the ack from the total time. The ack time is computed beforehand with a ping-pong test. This test is available here.
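The subtraction can be illustrated with a short sketch. It assumes that the ack time is the one-way latency of a zero-byte message and that it is subtracted once per iteration; the page does not spell out these details, and the function name and figures are hypothetical.

```python
def broadcast_latency_us(total_elapsed_us: float, iterations: int,
                         ack_latency_us: float) -> float:
    """Average broadcast latency with the ack cost removed (microseconds)."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Each timed iteration covers one broadcast plus one ack back to the root.
    per_iteration_us = total_elapsed_us / iterations
    return per_iteration_us - ack_latency_us


# Hypothetical figures: 1000 iterations took 12,000 us, ack costs 2 us.
print(broadcast_latency_us(12_000.0, 1000, 2.0))
```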

Alltoall Collective Test

  • Alltoall Collective Test: All the processes involved in the test collectively call the MPI_Alltoall API. The average time for one alltoall collective operation over a large number of iterations is reported. The test is run for message sizes that double at each step, up to 4 MBytes. This test is available here.
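The message-size sweep can be sketched as a simple doubling loop. The starting size of 1 byte and the reading of 4 MBytes as 4 * 2^20 bytes are assumptions made for this example.

```python
def message_sizes(max_bytes: int = 4 * 1024 * 1024, start: int = 1) -> list[int]:
    """Return message sizes that double from `start` up to `max_bytes`."""
    if start <= 0:
        raise ValueError("start must be positive")
    sizes = []
    size = start
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


sizes = message_sizes()
print(sizes[:5], sizes[-1])
```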

One-Sided Put Latency Test (only applicable for MVAPICH2)

  • One-Sided Put Latency Test: The sender (origin process) calls MPI_Put (ping) to directly place a message of a given data size in the receiver window. The receiver (target process) calls MPI_Win_wait to make sure the message has been received. Then the receiver initiates an MPI_Put (pong) of the same data size to the sender, which is now waiting on a synchronization call. Several iterations of this test are carried out and the average put latency is obtained. This test is available here.

One-Sided Get Latency Test (only applicable for MVAPICH2)

  • One-Sided Get Latency Test: The origin process calls MPI_Get (ping) to directly fetch a message of a given data size from the target process window into its local window. It then waits on a synchronization call (MPI_Win_complete) for local completion. After the synchronization call, the target and origin processes are switched for the pong message. Several iterations of this test are carried out and the average get latency is obtained. This test is available here.

One-Sided Put Bandwidth Test (only applicable for MVAPICH2)

  • One-Sided Put Bandwidth Test: The bandwidth test is carried out by the origin process issuing a fixed number of back-to-back Puts and then waiting on a synchronization call (MPI_Win_complete) for completion. This process is repeated for several iterations and the bandwidth is calculated from the elapsed time and the number of bytes sent by the origin process. This test is available here.

One-Sided Get Bandwidth Test (only applicable for MVAPICH2)

  • One-Sided Get Bandwidth Test: The bandwidth test is carried out by the origin process issuing a fixed number of back-to-back Gets and then waiting on a synchronization call (MPI_Win_complete) for completion. This process is repeated for several iterations and the bandwidth is calculated from the elapsed time and the number of bytes fetched by the origin process. This test is available here.

One-Sided Put Bidirectional Bandwidth Test (only applicable for MVAPICH2)

  • One-Sided Put Bidirectional Bandwidth Test: The bidirectional bandwidth test is similar to the put bandwidth test, except that both nodes involved issue a fixed number of back-to-back Puts and wait for their completion. This test measures the maximum sustainable aggregate bandwidth between two nodes. This test is available here.

One-Sided Accumulate Latency Test (only applicable for MVAPICH2)

  • One-Sided Accumulate Latency Test: The origin process calls MPI_Accumulate to combine the data moved to the target process window with the data that already resides in that window. The combining operation used in the test is MPI_SUM. The origin then waits on a synchronization call (MPI_Win_complete) for local completion. After the synchronization call, the target and origin processes are switched for the pong message. Several iterations of this test are carried out and the average accumulate latency is obtained. This test is available here.
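The effect of the MPI_SUM combining operation on the target window can be pictured with plain lists. The sketch below only mimics the element-wise sum. It does not call MPI, and the function name and the `offset` parameter are hypothetical.

```python
def accumulate_sum(target_window: list[int], origin_data: list[int],
                   offset: int = 0) -> None:
    """Add origin_data element-wise into target_window, starting at offset."""
    if offset < 0 or offset + len(origin_data) > len(target_window):
        raise ValueError("origin data does not fit in the target window")
    for i, value in enumerate(origin_data):
        # Existing contents are combined with the incoming data, not replaced.
        target_window[offset + i] += value


window = [10, 20, 30, 40]
accumulate_sum(window, [1, 2, 3], offset=1)
print(window)
```

A put would overwrite the three target elements, whereas the accumulate adds to them, which is why the target window ends up holding the sums.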

Please note that there are many different ways to measure these performance parameters. For example, the bandwidth test can vary in the type of MPI calls used (blocking vs. non-blocking), the total number of back-to-back messages sent in one iteration, the number of iterations, etc. Other ways of measuring bandwidth may give different numbers. Readers are welcome to use other tests, as appropriate to their application environments.