8. MDTEST

8.1. Purpose

The intent of this benchmark is to measure the performance of file metadata operations on the platform storage. MDtest is an MPI-based application for evaluating the metadata performance of a file system. It can be run on any POSIX-compliant file system, but it is designed primarily to test parallel file systems.

8.2. Characteristics

MDTEST is available in the benchmarks repository.

8.2.1. Problem

MDtest measures the performance of various metadata operations using MPI to coordinate execution and collect the results. In this case, the operations in question are file creation, stat, and removal.

8.2.2. Run Rules

Observed benchmark performance shall be obtained from a storage system configured as closely as possible to the proposed platform storage. If the proposed solution includes multiple file access protocols (e.g., pNFS and NFS) or multiple tiers accessible by applications, benchmark results for mdtest shall be provided for each protocol and/or tier.

Performance projections are permissible if they are derived from a similar system that is considered an earlier generation of the proposed system.

Modifications to the benchmark application code are only permissible to enable correct compilation and execution on the target platform. Any modifications must be fully documented (e.g., as a diff or patch file) and reported with the benchmark results.

8.3. Building

After extracting the tar file, ensure that an MPI environment is loaded and that the relevant compiler wrapper, cc or mpicc, is in $PATH.

cd microbenchmarks/mdtest
make
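
As a point of reference, a typical build on a Cray-style system might look like the sketch below; the module names are assumptions and should be replaced by whatever provides MPI and the compiler wrappers on the target platform.

# Hypothetical environment setup; module names vary by site
module load PrgEnv-gnu        # assumed module providing the cc compiler wrapper
module load cray-mpich        # assumed module providing the MPI library

cd microbenchmarks/mdtest
make                          # builds the mdtest executable
ls -l mdtest                  # confirm the binary was produced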

8.4. Running

The results for the three operations (create, stat, and remove) should be obtained for three different file configurations:

  1. 2^20 files in a single directory.

  2. 2^20 files in separate directories, 1 per MPI process.

  3. 1 file accessed by multiple MPI processes.

These configurations are launched as follows.

# Shared Directory
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d /scratch/$USER -N 16
# Unique Directories
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d /scratch/$USER -N 16 -u
# One File Multi-Proc
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d /scratch/$USER -N 16 -S
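
For batch submission, the three runs can be wrapped in a single job script. The sketch below assumes a Slurm system with 4 nodes and 16 processes per node (64 ranks total); the node counts, time limit, and scratch path are placeholders.

#!/bin/bash
#SBATCH --nodes=4                  # placeholder node count
#SBATCH --ntasks-per-node=16       # placeholder processes per node
#SBATCH --time=01:00:00            # placeholder time limit

OUTDIR=/scratch/$USER              # placeholder target directory; must exist
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d $OUTDIR -N 16      # shared directory
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d $OUTDIR -N 16 -u   # unique directories
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d $OUTDIR -N 16 -S   # one file, multi-process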

The following command-line flags MUST be changed:

  • -n - the number of files each MPI process should manipulate. For a test run with 64 MPI processes, specifying -n 16384 will produce the required 2^20 files (2^6 MPI processes x 2^14 files each). This parameter must be changed for each level of concurrency; see the sketch following these lists.

  • -d - the absolute path to the directory in which this test should be run (e.g., /scratch/$USER in the commands above).

  • -N - MPI rank offset for each separate phase of the test. This parameter must be equal to the number of MPI processes per node in use (e.g., -N 16 for a test with 16 processes per node) to ensure that each test phase (create, stat, and remove) is performed on a different node.

The following command-line flags MUST NOT be changed or omitted:

  • -F - only operate on files, not directories

  • -C - perform file creation test

  • -T - perform file stat test

  • -r - perform file remove test
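
A minimal sketch of the -n arithmetic referenced above: the per-process file count is the total file count divided by the number of MPI ranks. The process count below is only an example.

# Keep the total file count fixed at 2^20 and derive -n for the chosen concurrency
TOTAL_FILES=1048576                          # 2^20 files required by the run rules
NPROCS=64                                    # example concurrency; change per test
FILES_PER_PROC=$((TOTAL_FILES / NPROCS))     # 16384 files per process for 64 ranks
srun -n $NPROCS ./mdtest -F -C -T -r -n $FILES_PER_PROC -d /scratch/$USER -N 16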

8.5. Example Results

These nine tests (three operations across three file configurations) should be performed under four different launch conditions, for a total of 36 results; a sketch of the corresponding launch commands follows this list:

  1. A single MPI process

  2. The optimal number of MPI processes on a single compute node

  3. The minimal number of MPI processes on multiple compute nodes that achieves the peak results for the proposed system.

  4. The maximum possible MPI-level concurrency on the proposed system. This could mean: 1) using one MPI process per CPU core across the entire system; 2) using the maximum number of MPI processes possible if one MPI process per core is not feasible on the proposed architecture; or 3) using more than 2^20 files if the system is capable of launching more than 2^20 MPI processes.
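
The sketch below illustrates the four launch conditions for the shared-directory configuration; the node and rank counts are placeholders and must be replaced with the values appropriate to the proposed system. Each command should be repeated with -u and -S for the other two file configurations.

# 1. Single MPI process (one rank manipulates all 2^20 files)
srun -n 1 ./mdtest -F -C -T -r -n 1048576 -d /scratch/$USER -N 1
# 2. Optimal process count on a single node (example: 16 processes)
srun -n 16 ./mdtest -F -C -T -r -n 65536 -d /scratch/$USER -N 16
# 3. Minimal multi-node process count that achieves peak results (example: 4 nodes x 16)
srun -n 64 ./mdtest -F -C -T -r -n 16384 -d /scratch/$USER -N 16
# 4. Maximum MPI-level concurrency (example: 64 nodes x 16; -n scaled to keep 2^20 files)
srun -n 1024 ./mdtest -F -C -T -r -n 1024 -d /scratch/$USER -N 16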

8.5.1. Crossroads

Table 8.6 MDTEST Microbenchmark Crossroads (operations per second)

                                                            Create     Stat    Remove
1048576 files in shared dir
  Single Process                                              5702     6773      8361
  Optimal MPI processes single node                           6899    26175      7454
  Minimal MPI processes on multiple nodes w/peak results     79901   370968     79583
  Extrapolated maximum possible MPI-level concurrency        79901   370968     79583
1048576 files in individual dirs
  Single Process                                              5706     6756      8352
  Optimal MPI processes single node                          72817    25697      5032
  Minimal MPI processes on multiple nodes w/peak results    183740   807216    205460
  Extrapolated maximum possible MPI-level concurrency       183740   807216    205460
One file, multi-processes
  Single Process                                              1309     4151      4537
  Optimal MPI processes single node                          69240    27956   1421537
  Minimal MPI processes on multiple nodes w/peak results    667616   141535   4351769
  Extrapolated maximum possible MPI-level concurrency       667616   141535   4351769