7. MiniEM

This is the documentation for the ATS-5 Benchmark MiniEM. The content herein was created by the following authors (in alphabetical order).

This material is based upon work supported by the Sandia National Laboratories (SNL), a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia under the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Content herein considered unclassified with unlimited distribution under SAND2023-01069O.

7.1. Purpose

MiniEM solves a first-order formulation of Maxwell’s equations of electromagnetics. MiniEM is the [Trilinos] proxy driver for the electromagnetics sub-problem solved by EMPIRE and exercises the relevant Trilinos components (i.e., Tpetra, Belos, MueLu, Ifpack2, Intrepid2, Panzer).

7.2. Characteristics

7.2.1. Application Version

The target application version is the Git SHA to which the Trilinos Git submodule at the root of this repository (i.e., the “trilinos” folder) is set.

7.2.2. Problem

The [Maxwell-Large] problem given by the input deck “maxwell-large.xml” describes a uniform mesh of a 3D box, which makes it ideal for scaling studies. The stock input file can be found within the Trilinos repository at the link given in [Maxwell-Large].

Useful parameters from within this input deck are shown below.

<snip>
    <ParameterList name="Inline Mesh">
<snip>
      <ParameterList name="Mesh Factory Parameter List">
<snip>
        <Parameter name="X Elements" type="int" value="40" />
        <Parameter name="Y Elements" type="int" value="40" />
        <Parameter name="Z Elements" type="int" value="40" />

These parameters are described below.

X Elements, Y Elements, Z Elements

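These three parameters set the size of the problem. The cell counts reported herein equal 12 times the product of the three values (e.g., a value of 40 for each yields 12 × 40 × 40 × 40 = 768,000 cells). The cases shown herein set these parameters to other values, and all three should be set to the same value for the calculations herein.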
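The relationship between these parameters and the cell counts reported in the results tables herein can be sketched in a few lines of Python. The factor of 12 cells per element is an assumption inferred from those tables (e.g., 40 elements per direction is reported as 768,000 cells), not taken from the MiniEM source, and the function name is hypothetical.

```python
# Assumption: 12 cells per element of the uniform mesh, inferred from the
# cell counts in the results tables rather than taken from MiniEM itself.
CELLS_PER_ELEMENT = 12


def num_cells(x_elements: int, y_elements: int, z_elements: int) -> int:
    """Estimate the cell count for the given X, Y, and Z Elements values."""
    return CELLS_PER_ELEMENT * x_elements * y_elements * z_elements


if __name__ == "__main__":
    # Problem sizes used in the Crossroads results.
    for n in (40, 60, 70):
        print(f"{n} elements per direction -> {num_cells(n, n, n)} cells")
```

This kind of estimate may help when choosing element counts to reach a target memory footprint before launching a run.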

7.2.3. Figure of Merit

Each MiniEM simulation writes out a Figure of Merit (FOM) block to STDOUT. The relevant portion of this block is in the below example.

=================================
FOM Calculation
=================================
  Number of cells = 4116000
  Time for Belos Linear Solve = 705.737 seconds
  Number of Time Steps (one linear solve per step) = 1541
  FOM ( num_cells * num_steps / solver_time / 1000) = 8987.42 k-cell-steps per second
=================================

The number of steps (specified with the --numTimeSteps command line option, described below in Crossroads) must be large enough that the time for the Belos Linear Solve is greater than 600 seconds, i.e., so the solver runs for at least 10 minutes. The figure of merit (FOM) is the bottom entry in this block, i.e., FOM ( num_cells * num_steps / solver_time / 1000).

It is desired to capture the FOM for varying problem sizes that encompass utilizing 35% to 75% of available memory (when all PEs are utilized). The ultimate goal is to maximize this throughput FOM while utilizing at least 50% of available memory.
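As a worked illustration of the FOM formula, the sketch below recomputes the value from the example block above and estimates how many time steps a run needs for the solve time to exceed 600 seconds. The function names are hypothetical, and the step estimate assumes the solve time grows linearly with the number of steps, which is an approximation.

```python
import math


def figure_of_merit(num_cells: int, num_steps: int, solver_time_s: float) -> float:
    """Return the FOM in k-cell-steps per second."""
    return num_cells * num_steps / solver_time_s / 1000.0


def min_time_steps(pilot_steps: int, pilot_solver_time_s: float,
                   target_time_s: float = 600.0) -> int:
    """Estimate the smallest step count whose solve time exceeds target_time_s.

    Assumption: the solve time scales linearly with the number of steps.
    """
    seconds_per_step = pilot_solver_time_s / pilot_steps
    return math.floor(target_time_s / seconds_per_step) + 1


if __name__ == "__main__":
    # Values from the example FOM block.
    fom = figure_of_merit(4116000, 1541, 705.737)
    print(f"FOM = {fom:.2f} k-cell-steps per second")  # FOM = 8987.42 ...
    print("Estimated minimum steps:", min_time_steps(1541, 705.737))
```

A short pilot run can supply the inputs to the step estimate; the resulting value would then be passed to --numTimeSteps with some margin.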

7.2.4. Correctness Check

MiniEM also provides the [Maxwell-AnalyticSolution] problem given by the input deck “maxwell-analyticSolution.xml”. This problem outputs analytic error values (see below for an example) and causes the simulation to fail (and return a non-zero exit code) if the error exceeds the appropriate threshold. It should be used to verify the build of MiniEM on the system, assessing both the programming environment used and any changes made to the benchmark.

The Belos solver "GMRES block system" of type ""Belos::BlockGmresSolMgr": {Flexible: true, Num Blocks: 10, Maximum Iterations: 10, Maximum Restarts: 20, Convergence Tolerance: 1e-08}" returned a solve status of "SOLVE_STATUS_CONVERGED" in 1 iterations with total CPU time of 0.0189103 sec
L2 Error E maxwell - analyticSolution = 0.0566793

* finished time step 6, t = 5e-09
**************************************************

This case can be run simply by following the overall instructions in Running and replacing the benchmark input file with “maxwell-analyticSolution.xml”. Example output of a failed case is provided below (also note that this case exited with an exit code of 134).

what():  /path/to/trilinos/packages/panzer/mini-em/example/BlockPrec/main.cpp:690:

Throw number = 1

Throw test that evaluated to true: !( (std::sqrt(Thyra::get_ele(*g,0))) < (0.065) )

Error, (std::sqrt(Thyra::get_ele(*g,0)) = 0.0819696) < (0.065 = 0.065)! FAILED!
terminate called after throwing an instance of 'std::out_of_range'
what():  /path/to/trilinos/packages/panzer/mini-em/example/BlockPrec/main.cpp:690:
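The exit code remains the authoritative pass/fail signal, but a captured log can also be inspected after the fact. The following is a minimal sketch, assuming the error lines have the form shown in the example output and using the 0.065 threshold that appears in the failed-case output; the threshold actually enforced lives in the MiniEM source and may differ between versions. The function names are hypothetical.

```python
import re
from typing import List

# Assumption: threshold copied from the failed-case output shown above.
L2_ERROR_THRESHOLD = 0.065

L2_ERROR_PATTERN = re.compile(
    r"L2 Error E maxwell - analyticSolution\s*=\s*([0-9.eE+-]+)"
)


def l2_errors(stdout_text: str) -> List[float]:
    """Extract every reported L2 error value from captured output."""
    return [float(value) for value in L2_ERROR_PATTERN.findall(stdout_text)]


def passes_check(stdout_text: str, threshold: float = L2_ERROR_THRESHOLD) -> bool:
    """Return True if at least one error was reported and all are below threshold."""
    errors = l2_errors(stdout_text)
    return bool(errors) and all(error < threshold for error in errors)


if __name__ == "__main__":
    sample = "L2 Error E maxwell - analyticSolution = 0.0566793\n"
    print(l2_errors(sample), passes_check(sample))
```

Treating an empty log as a failure avoids reporting success when the run terminated before printing any error values.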

7.2.5. Permissible Modifications

The authors of this benchmark invite vendors to propose any algorithmic improvements that: (1.) do not alter the current Multigrid solver approach; and (2.) follow the advice given in previous subsections. Please email the authors with any questions about what is or is not in scope. Some additional guidance is provided below.

A minimum of one level of V-cycle is required for both sub-hierarchies to ensure the Trilinos MueLu Algebraic Multigrid (AMG) code path is exercised. This behavior is reflected in the benchmark problem and needs to be preserved with vendor changes. In essence, the solver sets up two sub-problems, and each is solved using AMG. Example Multigrid output that demonstrates this is below. It is appropriate for the following characteristics of this output to be preserved.

  • Scalar should be double (e.g., line 838)

  • Number of levels should be at least 2 (e.g., line 839)

  • Cycle type should be V (e.g., line 842)

835 --------------------------------------------------------------------------------
836 ---                            Multigrid Summary RefMaxwell coarse (1,1)     ---
837 --------------------------------------------------------------------------------
838 Scalar              = double
839 Number of levels    = 2
840 Operator complexity = 1.02
841 Smoother complexity = 1.07
842 Cycle type          = V
843
844 level  rows   nnz      nnz/row  c ratio  procs
845   0  21510  1840968  85.59                 5
846   1  687    29525    42.98    31.31        1
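The three characteristics listed above can be checked mechanically against a captured summary block. The sketch below is an assumption-laden illustration: it expects "key = value" lines as in the example output, tolerates the leading listing line numbers shown here, and uses hypothetical function names.

```python
import re

SAMPLE = """\
838 Scalar              = double
839 Number of levels    = 2
840 Operator complexity = 1.02
841 Smoother complexity = 1.07
842 Cycle type          = V
"""


def parse_summary(text: str) -> dict:
    """Collect 'key = value' pairs from a Multigrid Summary block."""
    fields = {}
    for line in text.splitlines():
        # Drop an optional leading listing line number such as "838 ".
        line = re.sub(r"^\s*\d+\s+(?=[A-Za-z])", "", line)
        match = re.match(r"^\s*([A-Za-z ]+?)\s*=\s*(\S+)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def summary_ok(fields: dict) -> bool:
    """Check scalar type, minimum level count, and cycle type."""
    return (
        fields.get("Scalar") == "double"
        and int(fields.get("Number of levels", "0")) >= 2
        and fields.get("Cycle type") == "V"
    )


if __name__ == "__main__":
    print(summary_ok(parse_summary(SAMPLE)))
```

Because the solver sets up two sub-problems, the same check would be applied to the summary block of each sub-hierarchy.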

Additionally, there are a couple of parameters within “solverMueLu.xml” that should not be altered since changes will impact the Multigrid work. The specified target size for the coarse grid problems should not be modified. These parameters are highlighted below for reference.

<ParameterList name="Linear Solver">
  <ParameterList name="Preconditioner Types">
    <ParameterList name="Teko">
      <ParameterList name="Inverse Factory Library">
        <ParameterList name="Maxwell">
          <ParameterList name="S_E Preconditioner">
            <ParameterList name="Preconditioner Types">
              <ParameterList name="MueLuRefMaxwell">
                <ParameterList name="refmaxwell: 11list">
                  <Parameter name="coarse: max size" type="int" value="2500"/>
                <ParameterList name="refmaxwell: 22list">
                  <Parameter name="coarse: max size" type="int" value="2500"/>
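One way to confirm that these parameters were left untouched is to read them back from the solver file. The sketch below uses Python's standard xml.etree.ElementTree module on a small, well-formed stand-in fragment; the fragment is hypothetical and much shorter than the real “solverMueLu.xml”, and the function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

EXPECTED_COARSE_MAX_SIZE = 2500  # value shown in the excerpt above

# Hypothetical, abbreviated stand-in for the relevant part of the file.
SAMPLE = """
<ParameterList name="MueLuRefMaxwell">
  <ParameterList name="refmaxwell: 11list">
    <Parameter name="coarse: max size" type="int" value="2500"/>
  </ParameterList>
  <ParameterList name="refmaxwell: 22list">
    <Parameter name="coarse: max size" type="int" value="2500"/>
  </ParameterList>
</ParameterList>
"""


def coarse_max_sizes(xml_text: str) -> dict:
    """Map each enclosing ParameterList name to its 'coarse: max size' value."""
    root = ET.fromstring(xml_text.strip())
    sizes = {}
    for plist in root.iter("ParameterList"):
        for param in plist.findall("Parameter"):
            if param.get("name") == "coarse: max size":
                sizes[plist.get("name")] = int(param.get("value"))
    return sizes


if __name__ == "__main__":
    sizes = coarse_max_sizes(SAMPLE)
    print(sizes)
    print(all(v == EXPECTED_COARSE_MAX_SIZE for v in sizes.values()))
```

For the real file, the text would be read from disk and passed to the same function.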

7.3. System Information

The platforms utilized for benchmarking activities are listed and described below.

7.3.1. Sandia National Laboratories’ “ascicgpu030”

This is a desktop-class system with the following details.

  • Host CPU information is found at [Intel-8260]

  • It has a single Nvidia V100 GPU

7.4. Building

MiniEM is a part of Trilinos, so building Trilinos and its dependencies is required. The [TrilinosBuild] documentation provides extensive guidance. Information to augment the official Trilinos documentation is provided below.

The following requirements are present for MiniEM.

  • CMake version 3.23 or greater

  • OpenMPI version 3.1 or greater

  • Compilers ca. 2023

Detailed instructions are provided below on how to build MiniEM for Crossroads and “ascicgpu030”.

If submodules were cloned within this repository, then the source code to build MiniEM is already present at the top level within the “trilinos” and “miniem_build” folders.

7.4.1. Crossroads

Instructions for building on Crossroads are provided below. The “miniem_build” folder contains the following items.

build-crossroads.sh

This script carries out the build. Generating spack.yaml from template.yaml and then executing this script should be all that is needed.

spack

This contains a specific checkout of Spack needed to build MiniEM. This will need to be patched; the patch is taken care of via build-crossroads.sh.

spack-fixes-v0.21.0.patch

This is the patch file needed to address issues within the Spack checkout.

template.yaml

This file needs to be copied into spack.yaml and edited to contain the paths to the necessary items.

7.4.2. Sandia National Laboratories’ “ascicgpu030”

Instructions for building on “ascicgpu030” are provided below. The “miniem_build” folder contains the following item(s).

build-ascicgpu030.sh

This script carries out the build, which leverages already-installed third-party libraries. It does not rely upon the Crossroads Spack-based methodology.

7.5. Running

Instructions are provided below on how to run MiniEM on Crossroads and “ascicgpu030”.

7.5.1. Crossroads

An example of how to run the test case on Crossroads is provided within the script run-crossroads-mapcpu.sh.

7.5.2. Sandia National Laboratories’ “ascicgpu030”

An example of how to run the test case on “ascicgpu030” is provided within the script run-ascicgpu030.sh.

7.6. Verification of Results

Results from MiniEM are provided below for Crossroads and “ascicgpu030”.

7.6.1. Crossroads

Strong scaling performance (i.e., fixed problem size being run on different MPI rank counts) plots of MiniEM on Crossroads are provided within the following subsections.

7.6.1.1. Problem Size 40 (18-43 GiB)

This problem size corresponds to X, Y, and Z Element values set to 40 which results in an overall discretization that contains 768,000 cells.

Table 7.1 MiniEM Strong Scaling Performance and Memory on Crossroads with 768k cells (18-43 GiB)

PEs / NUMA Domain  PEs / NUMA Domain (fraction)  Inputfile Elements  Cells    MaxRSS (GiB)  Actual FOM (k-cell-steps/sec)  Ideal FOM (k-cell-steps/sec)
-----------------  ----------------------------  ------------------  -------  ------------  -----------------------------  ----------------------------
1                  0.071                         40                  768000   18.10         1683.32                        1683.32
4                  0.286                         40                  768000   23.62         5307.57                        6733.28
7                  0.500                         40                  768000   29.49         6275.48                        11783.24
11                 0.786                         40                  768000   37.49         6517.25                        18516.52
14                 1.000                         40                  768000   43.25         6311.29                        23566.48

Fig. 7.1 MiniEM Strong Scaling Performance on Crossroads with 768k cells (18-43 GiB)

Fig. 7.2 MiniEM Strong Scaling Memory on Crossroads with 768k cells (18-43 GiB)
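The Ideal FOM column in the table above appears to be the single-PE FOM multiplied by the number of PEs per NUMA domain. The sketch below reproduces that column and derives a parallel efficiency (actual divided by ideal); the efficiency is a derived quantity that is not reported in the original results, and the function names are hypothetical.

```python
# PEs per NUMA domain -> Actual FOM (k-cell-steps/sec), from Table 7.1.
ACTUAL_FOM = {1: 1683.32, 4: 5307.57, 7: 6275.48, 11: 6517.25, 14: 6311.29}


def ideal_fom(single_pe_fom: float, pes: int) -> float:
    """Ideal strong scaling: single-PE FOM multiplied by the PE count."""
    return single_pe_fom * pes


def parallel_efficiency(actual_fom: float, single_pe_fom: float, pes: int) -> float:
    """Fraction of the ideal FOM that was actually achieved."""
    return actual_fom / ideal_fom(single_pe_fom, pes)


if __name__ == "__main__":
    base = ACTUAL_FOM[1]
    for pes, actual in ACTUAL_FOM.items():
        ideal = ideal_fom(base, pes)
        eff = parallel_efficiency(actual, base, pes)
        print(f"{pes:2d} PEs: ideal {ideal:9.2f}, actual {actual:8.2f}, "
              f"efficiency {eff:.1%}")
```

The same calculation applies to the other Crossroads problem sizes by substituting their table values.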

7.6.1.2. Problem Size 60 (57-84 GiB)

This problem size corresponds to X, Y, and Z Element values set to 60 which results in an overall discretization that contains 2,592,000 cells.

Table 7.2 MiniEM Strong Scaling Performance and Memory on Crossroads with 2,592k cells (57-84 GiB)

PEs / NUMA Domain  PEs / NUMA Domain (fraction)  Inputfile Elements  Cells    MaxRSS (GiB)  Actual FOM (k-cell-steps/sec)  Ideal FOM (k-cell-steps/sec)
-----------------  ----------------------------  ------------------  -------  ------------  -----------------------------  ----------------------------
1                  0.071                         60                  2592000  57.02         1536.37                        1536.37
4                  0.286                         60                  2592000  63.76         4717.88                        6145.48
7                  0.500                         60                  2592000  69.93         7797.07                        10754.59
11                 0.786                         60                  2592000  78.65         8605.81                        16900.07
14                 1.000                         60                  2592000  83.95         9190.05                        21509.18

Fig. 7.3 MiniEM Strong Scaling Performance on Crossroads with 2,592k cells (57-84 GiB)

Fig. 7.4 MiniEM Strong Scaling Memory on Crossroads with 2,592k cells (57-84 GiB)

7.6.1.3. Problem Size 70 (89-118 GiB)

This problem size corresponds to X, Y, and Z Element values set to 70 which results in an overall discretization that contains 4,116,000 cells.

Table 7.3 MiniEM Strong Scaling Performance and Memory on Crossroads with 4,116k cells (89-118 GiB)

PEs / NUMA Domain  PEs / NUMA Domain (fraction)  Inputfile Elements  Cells    MaxRSS (GiB)  Actual FOM (k-cell-steps/sec)  Ideal FOM (k-cell-steps/sec)
-----------------  ----------------------------  ------------------  -------  ------------  -----------------------------  ----------------------------
1                  0.071                         70                  4116000  89.77         1543.12                        1543.12
4                  0.286                         70                  4116000  96.05         4748.66                        6172.48
7                  0.500                         70                  4116000  102.91        6947.34                        10801.84
11                 0.786                         70                  4116000  111.37        8987.42                        16974.32
14                 1.000                         70                  4116000  117.42        8423.60                        21603.68

Fig. 7.5 MiniEM Strong Scaling Performance on Crossroads with 4,116k cells (89-118 GiB)

Fig. 7.6 MiniEM Strong Scaling Memory on Crossroads with 4,116k cells (89-118 GiB)

7.6.2. Sandia National Laboratories’ “ascicgpu030”

Single-node strong scaling throughput of MiniEM on “ascicgpu030” for varying problem sizes (i.e., changing X Elements, Y Elements, and Z Elements and running on a single Nvidia V100) is provided below. The throughput corresponds to kilo cell steps per second per node.

Table 7.4 MiniEM Single Node Strong Scaling Throughput and Memory on “ascicgpu030” Utilizing a Single Nvidia V100

Memory (%)  Memory (GiB)  Size  No. Cells  Actual Throughput (k-cell-steps/sec/node)
----------  ------------  ----  ---------  -----------------------------------------
16.5%       2.65          20    96000      887.491
26.2%       4.19          25    187500     1175.83
43.6%       6.97          31    357492     2695.26
60.2%       9.63          35    514500     3395.71
74.8%       11.96         38    658464     3906.47
85.7%       13.71         40    768000     3831.73

Fig. 7.7 MiniEM Single Node Strong Scaling Throughput on “ascicgpu030” Utilizing a Single Nvidia V100
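The stated goal of maximizing the throughput FOM while utilizing at least 50% of available memory can be applied to the “ascicgpu030” results with a short filter. This is a sketch only: the rows are copied from Table 7.4, the 50% threshold comes from the Figure of Merit section, and the function name is hypothetical.

```python
# (memory %, size, number of cells, throughput in k-cell-steps/sec/node),
# copied from Table 7.4.
ROWS = [
    (16.5, 20, 96000, 887.491),
    (26.2, 25, 187500, 1175.83),
    (43.6, 31, 357492, 2695.26),
    (60.2, 35, 514500, 3395.71),
    (74.8, 38, 658464, 3906.47),
    (85.7, 40, 768000, 3831.73),
]


def best_case(rows, min_memory_pct: float = 50.0):
    """Return the row with the highest throughput at or above the memory floor."""
    eligible = [row for row in rows if row[0] >= min_memory_pct]
    if not eligible:
        raise ValueError("no case meets the memory utilization requirement")
    return max(eligible, key=lambda row: row[3])


if __name__ == "__main__":
    memory_pct, size, cells, throughput = best_case(ROWS)
    print(f"size {size} ({cells} cells, {memory_pct}% memory): {throughput}")
```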

7.7. References

[Trilinos]

M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley, ‘An Overview of the Trilinos Project’, ACM Trans. Math. Softw., Volume 31, No. 3, 2005, ISSN 0098-3500.

[TrilinosBuild]

R. A. Bartlett, ‘Trilinos Configure, Build, Test, and Install Reference Guide’, 2023. [Online]. Available: https://docs.trilinos.org/files/TrilinosBuildReference.html. [Accessed: 26-Mar-2023]

[Maxwell-Large]

Trilinos developers, ‘maxwell-large.xml’, 2024. [Online]. Available: https://github.com/trilinos/Trilinos/blob/master/packages/panzer/mini-em/example/BlockPrec/maxwell-large.xml. [Accessed: 22-Feb-2024]

[Maxwell-AnalyticSolution]

Trilinos developers, ‘maxwell-analyticSolution.xml’, 2024. [Online]. Available: https://github.com/trilinos/Trilinos/blob/master/packages/panzer/mini-em/example/BlockPrec/maxwell-analyticSolution.xml. [Accessed: 22-Feb-2024]

[Intel-8260]

Intel, ‘Intel Xeon Platinum 8260 Processor 35.75M Cache 2.40 GHz Product Specifications’, 2024. [Online]. Available: https://ark.intel.com/content/www/us/en/ark/products/192474/intel-xeon-platinum-8260-processor-35-75m-cache-2-40-ghz.html. [Accessed: 18-Mar-2024]