4. OSU Microbenchmarks
4.1. Purpose
The OSU Microbenchmarks (OMB) are widely used to measure and evaluate the performance of MPI operations for point-to-point, multi-pair, collective, and one-sided communications.
4.2. Characteristics
Official site: OSUMB
LANL Benchmarks: Benchmarks Repository
4.2.1. Problem
The OSU benchmarks are a suite of microbenchmarks designed to measure network characteristics on HPC systems.
4.2.2. Run Rules
N/A
4.3. Building
On GPU-enabled systems, add these flags to the configure command shown below:
--enable-cuda
--with-cuda-include=/path/to/cuda/include
--with-cuda-libpath=/path/to/cuda/lib
Build and install the benchmarks.
./configure --prefix=$INSTALL_DIR
make -j
make -j install
Before configuring, make sure the CC and CXX environment variables are set to an MPI compiler or wrapper. On most systems this will look like:
export CC=mpicc CXX=mpicxx
On systems with vendor-provided wrappers it may look different. For example, on HPE-Cray systems:
export CC=cc CXX=CC
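The build steps above can be combined into a small sketch that assembles the configure arguments for either a CPU-only or a CUDA build. The helper name `osu_configure_args` and the example paths `/tmp/osu-install` and `/opt/cuda` are assumptions made for illustration; only the flags themselves come from this section. The sketch prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical helper (not part of OMB): build the configure argument string.
# $1 = install prefix, $2 = CUDA root directory (optional; empty means CPU-only)
osu_configure_args() {
    args="--prefix=$1"
    if [ -n "${2:-}" ]; then
        # Flags from the GPU build instructions. Some CUDA installs keep
        # their libraries under lib64 instead of lib; adjust if needed.
        args="$args --enable-cuda"
        args="$args --with-cuda-include=$2/include"
        args="$args --with-cuda-libpath=$2/lib"
    fi
    printf '%s\n' "$args"
}

# Dry run with example paths (assumed values): print the commands to review.
echo "./configure $(osu_configure_args /tmp/osu-install)"
echo "./configure $(osu_configure_args /tmp/osu-install /opt/cuda)"
```

Set CC and CXX as described above before running the printed configure command.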
4.4. Running
On GPU-enabled systems, also run the GPU variants of the following benchmarks.
Program | Description | Msg Size | Num Nodes | Rank Config |
---|---|---|---|---|
osu_latency | P2P Latency | 8 B | 2 | |
osu_bibw | P2P Bi-directional BW | 16 KB | 2 | 1 per node |
osu_mbw_mr | P2P Multi-BW & Msg Rate | 16 KB | 2 | |
osu_get_acc_latency | P2P One-sided Accumulate Latency | 8 B | 2 | 1 per node |
osu_get | Get Latency | 8 B | 2 | 1 per node |
osu_put | Put Latency | 8 B | 2 | 1 per node |
osu_barrier | Barrier Time | N/A | full-system | |
osu_ibarrier | Async-Barrier Time | N/A | full-system | |
osu_allreduce | All-reduce Latency | 8 B, 16 MB | full-system | |
osu_alltoall | All-to-all Latency | 8 B | full-system | |
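As a rough illustration of the two-node, one-rank-per-node rows in the table, the sketch below prints launch lines for a Slurm system. Several things here are assumptions rather than requirements of this benchmark: the `srun` launcher, the `OSU_PREFIX` variable, the default OMB install layout under `libexec/osu-micro-benchmarks/mpi`, the use of the OMB `-m MIN:MAX` option to pin a single message size, and reading 16 KB as 16384 bytes. The helper name `osu_two_node_cmd` is hypothetical.

```shell
# Assumed install location; replace with the --prefix used at configure time.
OSU_PREFIX="${OSU_PREFIX:-/path/to/install}"
OSU_BIN="$OSU_PREFIX/libexec/osu-micro-benchmarks/mpi"

# Hypothetical helper: print (do not run) a Slurm launch line for one test.
# $1 = subdirectory (pt2pt or one-sided), $2 = program, $3 = message size in bytes
osu_two_node_cmd() {
    printf 'srun -N 2 --ntasks-per-node=1 %s/%s/%s -m %s:%s\n' \
        "$OSU_BIN" "$1" "$2" "$3" "$3"
}

# Rows from the table that specify 2 nodes with 1 rank per node.
osu_two_node_cmd pt2pt osu_bibw 16384
osu_two_node_cmd one-sided osu_get_acc_latency 8
osu_two_node_cmd one-sided osu_get 8
osu_two_node_cmd one-sided osu_put 8
```

On systems with a different launcher (for example `mpirun`), only the launcher portion of the printed line should need to change.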
4.5. Example Results
Results for the OSU Microbenchmarks are provided on the following systems:
Crossroads (see ATS-3/Crossroads)
4.5.1. Crossroads
Test | Ranks | Msg Size | Num Nodes | Result |
---|---|---|---|---|
osu_latency | 1 per node | 8 B | 2 | 1.61 us |
osu_bibw | 1 per node | 1 MB | 2 | 45307.17 MB/s |
osu_mbw_mr | 1 per NIC | 16 KB | 2 | 49656.45 MB/s |
osu_mbw_mr | 1 per core | 16 KB | 2 | 45198.46 MB/s |
osu_get_acc_latency | 1 per node | 8 B | 2 | 10.85 us |
osu_get | 1 per node | 8 B | 2 | 3.59 us |
osu_put | 1 per node | 8 B | 2 | 4.87 us |
osu_barrier | 1 per physical core | N/A | full-system | 550.66 us |
osu_ibarrier | 1 per physical core | N/A | full-system | 4802.82 us |
osu_allreduce | 1 per physical core | 8 B, 16 MB | full-system | 345.55 us, 2477365.95 us |
osu_alltoall | 1 per node | 8 B | full-system | 1954.35 us |
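A single figure like those in the table can be pulled from a benchmark's output with a short filter. The sketch below assumes the usual OMB output layout (header lines starting with `#`, followed by rows of message size and value) and uses a hypothetical helper name, `osu_value_at`. The sample input is illustrative only: its header is abbreviated, and its one data row reuses the 8 B Crossroads osu_latency result.

```shell
# Hypothetical helper: print the value reported for one message size.
# $1 = message size in bytes; benchmark output is read from stdin.
osu_value_at() {
    # Skip '#' header lines, then match the first column against the size.
    awk -v size="$1" '!/^#/ && $1 == size { print $2 }'
}

# Illustrative input (abbreviated header, single row taken from the table).
sample_output='# OSU MPI Latency Test
# Size          Latency (us)
8                       1.61'

printf '%s\n' "$sample_output" | osu_value_at 8
```

Benchmarks that report no message size column, such as the barrier tests, would need a different filter.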