4. OSU Microbenchmarks

4.1. Purpose

The OSU Microbenchmarks (OMB) are widely used to measure and evaluate the performance of MPI operations for point-to-point, multi-pair, collective, and one-sided communication.

4.2. Characteristics

4.2.1. Problem

The OSU benchmarks are a suite of microbenchmarks designed to measure network characteristics on HPC systems.

4.2.2. Run Rules

N/A

4.3. Building

Before configuring, make sure your CC and CXX environment variables are set to an MPI compiler or compiler wrapper. On most systems this will look like:

export CC=mpicc CXX=mpicxx

On systems with vendor-provided wrappers it may look different. For example, on HPE Cray systems:

export CC=cc CXX=CC

Then build and install the benchmarks:

./configure --prefix=$INSTALL_DIR
make -j
make -j install

On GPU-enabled systems, also add these flags to the configure line:

--enable-cuda
--with-cuda-include=/path/to/cuda/include
--with-cuda-libpath=/path/to/cuda/lib
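Putting the pieces together, a CUDA-enabled build might look like the following. This is a sketch, not a definitive recipe: the CUDA paths are the placeholders from the text above, and the `DRY=echo` guard makes the script print the commands instead of running them (set `DRY=` to execute).

```shell
# Dry-run sketch of a CUDA-enabled OMB build. The CUDA include/lib paths are
# placeholders; set DRY= (empty) to execute the commands instead of printing.
DRY=echo
export CC=mpicc CXX=mpicxx

$DRY ./configure --prefix="$INSTALL_DIR" \
                 --enable-cuda \
                 --with-cuda-include=/path/to/cuda/include \
                 --with-cuda-libpath=/path/to/cuda/lib
$DRY make -j
$DRY make -j install
```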

4.4. Running

For any GPU-enabled system, please also run the GPU (device-buffer) variants of the following benchmarks.

Table 4.6 OSU Microbenchmark Tests

Program              Description                       Msg Size    Num Nodes    Rank Config
-------------------  --------------------------------  ----------  -----------  -------------------------------------------
osu_latency          P2P latency                       8 B         2            2 tests per node: longest path (worst case);
                                                                                shortest path (best case)
osu_bibw             P2P bi-directional bandwidth      16 KB       2            1 per node
osu_mbw_mr           P2P multi-BW and message rate     16 KB       2            Host-to-host (two tests): 1 per NIC;
                                                                                1 per core
                                                                                Device-to-device (two tests): 1 per NIC;
                                                                                1 per accelerator
osu_get_acc_latency  P2P one-sided accumulate latency  8 B         2            1 per node
osu_get              Get latency                       8 B         2            1 per node
osu_put              Put latency                       8 B         2            1 per node
osu_barrier          Barrier time                      N/A         full-system  Two tests: 1 per physical core;
                                                                                1 per GPU/accelerator
osu_ibarrier         Async-barrier time                N/A         full-system  Two tests: 1 per physical core;
                                                                                1 per GPU/accelerator
osu_allreduce        All-reduce latency                8 B, 16 MB  full-system  Two tests: 1 per physical core;
                                                                                1 per GPU/accelerator
osu_alltoall         All-to-all latency                8 B         full-system  Two tests: 1 per physical core;
                                                                                1 per GPU/accelerator
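As a concrete illustration, the point-to-point tests above might be launched as follows. This is a dry-run sketch: the `mpirun` flags and the `libexec/` install layout are assumptions to adapt to your scheduler and installation (set `DRY=` to actually execute). With a CUDA-enabled build, OMB's point-to-point benchmarks accept trailing H/D arguments selecting host or device buffers, which is how the GPU variants are run.

```shell
# Dry-run sketch of launching the point-to-point tests (set DRY= to execute).
# The libexec install layout and the mpirun flags are assumptions.
DRY=echo
OMB="$INSTALL_DIR/libexec/osu-micro-benchmarks/mpi"

# Host-to-host latency, one rank on each of two nodes:
$DRY mpirun -np 2 --map-by node "$OMB/pt2pt/osu_latency"

# Trailing H/D arguments choose host or device buffers; D D runs the
# device-to-device (GPU) variant:
$DRY mpirun -np 2 --map-by node "$OMB/pt2pt/osu_latency" D D
$DRY mpirun -np 2 --map-by node "$OMB/pt2pt/osu_bibw" D D
```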

4.5. Example Results

Results for the OSU Microbenchmarks are provided for the following systems:

4.5.1. Crossroads

Table 4.7 OSU Microbenchmark Results on Crossroads

Test                 Ranks                Msg Size    Num Nodes    Result
-------------------  -------------------  ----------  -----------  ------------------------
osu_latency          1 per node           8 B         2            1.61 us
osu_bibw             1 per node           1 MB        2            45307.17 MB/s
osu_mbw_mr           1 per NIC            16 KB       2            49656.45 MB/s
osu_mbw_mr           1 per core           16 KB       2            45198.46 MB/s
osu_get_acc_latency  1 per node           8 B         2            10.85 us
osu_get              1 per node           8 B         2            3.59 us
osu_put              1 per node           8 B         2            4.87 us
osu_barrier          1 per physical core  N/A         full-system  550.66 us
osu_ibarrier         1 per physical core  N/A         full-system  4802.82 us
osu_allreduce        1 per physical core  8 B, 16 MB  full-system  345.55 us, 2477365.95 us
osu_alltoall         1 per node           8 B         full-system  1954.35 us
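The figures above are read directly from each benchmark's output table. As a small sketch of that step, the 8-byte latency can be pulled out with awk; the two-column sample output used here assumes the typical OMB layout (a commented header followed by size/value rows).

```shell
# Sketch: extract the 8 B latency from (sample) osu_latency output.
# The output layout is an assumption based on a typical OMB run.
sample_output='# OSU MPI Latency Test
# Size          Latency (us)
8                       1.61
16                      1.70'

# Print the second column of the row whose message size is 8:
latency_8b=$(printf '%s\n' "$sample_output" | awk '$1 == "8" {print $2}')
echo "$latency_8b"
```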