7. MiniEM

This is the documentation for the ATS-5 Benchmark MiniEM. The content herein was created by the following authors (in alphabetical order).

This material is based upon work supported by the Sandia National Laboratories (SNL), a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia under the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Content herein considered unclassified with unlimited distribution under SAND2023-01069O.

7.1. Purpose

MiniEM solves a first order formulation of Maxwell’s equations of electromagnetics. MiniEM is the [Trilinos] proxy driver for the electromagnetics sub-problem solved by EMPIRE and exercises the relevant Trilinos components (i.e., Tpetra, Belos, MueLu, Ifpack2, Intrepid2, Panzer).

7.2. Characteristics

7.2.1. Application Version

The target application version corresponds to the Git SHA that the Trilinos git submodule at the root of this repository is set to, i.e., within trilinos.

7.2.2. Problem

The [Maxwell-Large] problem given by the input deck “maxwell-large.xml” describes a uniform mesh of a 3D box which makes it ideal for scaling studies. The stock input file for this can be found within the Trilinos repository in the aforementioned link.

Useful parameters from within this input deck are shown below.

<snip>
23    <ParameterList name="Inline Mesh">
<snip>
28      <ParameterList name="Mesh Factory Parameter List">
<snip>
35        <Parameter name="X Elements" type="int" value="40" />
36        <Parameter name="Y Elements" type="int" value="40" />
37        <Parameter name="Z Elements" type="int" value="40" />

These parameters are described below.

X Elements, Y Elements, Z Elements: This sets the size of the problem, which is the product of these 3 quantities. These parameters are set to other values with the cases shown herein. These values should be identical for the calculations herein.

7.2.3. Figure of Merit

Each MiniEM simulation writes out a Figure of Merit (FOM) block to STDOUT. The relevant portion of this block is in the below example.

=================================
FOM Calculation
=================================
  Number of cells = 4116000
  Time for Belos Linear Solve = 705.737 seconds
  Number of Time Steps (one linear solve per step) = 1541
  FOM ( num_cells * num_steps / solver_time / 1000) = 8987.42 k-cell-steps per second
=================================

The number of steps, specified with the --numTimeSteps command line option, described below in Crossroads), must be large enough so the time for the Belos Linear Solve is greater than 600 seconds, i.e., so the solver runs for at least 10 minutes. The figure of merit (FOM) is the bottom entry in this block, i.e., FOM ( num_cells * num_steps / solver_time / 1000).

It is desired to capture the FOM for varying problem sizes that encompass utilizing 35% to 75% of available memory (when all PEs are utilized). The ultimate goal is to maximize this throughput FOM while utilizing at least 50% of available memory.

7.2.4. Correctness Check

MiniEM also provides the [Maxwell-AnalyticSolution] problem given by the input deck “maxwell-analyticSolution.xml”. This will output analytic error values (see below for an example) and will cause the simulation to fail (and return a non-zero exit code) if it exceeds appropriate thresholds. This should be used to verify the build of MiniEM upon the system to assess both the used programming environment and any changes made to the benchmark.

The Belos solver "GMRES block system" of type ""Belos::BlockGmresSolMgr": {Flexible: true, Num Blocks: 10, Maximum Iterations: 10, Maximum Restarts: 20, Convergence Tolerance: 1e-08}" returned a solve status of "SOLVE_STATUS_CONVERGED" in 1 iterations with total CPU time of 0.0189103 sec
L2 Error E maxwell - analyticSolution = 0.0566793

* finished time step 6, t = 5e-09
**************************************************

This case can be run simply by following the overall instructions in Running and replacing the benchmark input file with “maxwell-analyticSolution.xml”. Example output of a failed case is provided below (also note that this case exited with an exit code of 134).

what():  /path/to/trilinos/packages/panzer/mini-em/example/BlockPrec/main.cpp:690:

Throw number = 1

Throw test that evaluated to true: !( (std::sqrt(Thyra::get_ele(*g,0))) < (0.065) )

Error, (std::sqrt(Thyra::get_ele(*g,0)) = 0.0819696) < (0.065 = 0.065)! FAILED!
terminate called after throwing an instance of 'std::out_of_range'
what():  /path/to/trilinos/packages/panzer/mini-em/example/BlockPrec/main.cpp:690:

7.2.5. Permissable Modifications

The authors of this benchmark invite vendors to propose any algorithmic improvements that: (1.) do not alter the current Multigrid solver approach; and (2.) follow the advice given in previous subsections. Please email the authors with any questions about what is or is not in scope. Some additional guidance is provided below.

A minimum of one level of V-cycle is required for both sub-hierarchies to ensure the Trilinos MueLu Algebraic Multigrid (AMG) code path is exercised. This behavior is reflected in the benchmark problem and needs to be preserved with vendor changes. In essence, the solver sets up two sub-problems, and each is solved using AMG. Example Multigrid output that demonstrates this is below. It is appropriate for the following characteristics of this output to be preserved.

Scalar should be double (e.g., line 838)
Number of levels should be at least 2 (e.g., line 839)
Cycle type should be V (e.g., line 842)

--------------------------------------------------------------------------------
---                            Multigrid Summary RefMaxwell coarse (1,1)     ---
--------------------------------------------------------------------------------
Scalar              = double
Number of levels    = 2
Operator complexity = 1.02
Smoother complexity = 1.07
Cycle type          = V
843
level  rows   nnz      nnz/row  c ratio  procs
 0  21510  1840968  85.59                 5
 1  687    29525    42.98    31.31        1

Additionally, there are a couple of parameters within “solverMueLu.xml” that should not be altered since changes will impact the Multigrid work. The specified target size for the coarse grid problems should not be modified. These parameters are highlighted below for reference.

<ParameterList name="Linear Solver">
  <ParameterList name="Preconditioner Types">
    <ParameterList name="Teko">
      <ParameterList name="Inverse Factory Library">
        <ParameterList name="Maxwell">
          <ParameterList name="S_E Preconditioner">
            <ParameterList name="Preconditioner Types">
              <ParameterList name="MueLuRefMaxwell">
                <ParameterList name="refmaxwell: 11list">
                  <Parameter name="coarse: max size" type="int" value="2500"/>
                <ParameterList name="refmaxwell: 22list">
                  <Parameter name="coarse: max size" type="int" value="2500"/>

7.3. System Information

The platforms utilized for benchmarking activities are listed and described below.

Crossroads (see ATS-3/Crossroads)
A GPU build and test system within Sandia National Laboratories named “ascicgpu030” (see Sandia National Laboratories’ “ascicgpu030”).

7.3.1. Sandia National Laboratories’ “ascicgpu030”

This is a desktop-class system with the following details.

Host CPU information is found at [Intel-8260]
It has a single Nvidia V100 GPU

7.4. Building

MiniEM is a part of Trilinos, so building Trilinos and its dependencies is required. The [TrilinosBuild] documentation provides a lot of guidance. Information to augment the official Trilinos documentation is provided below.

The following requirements are present for MiniEM.

CMake version 3.23 or greater
OpenMPI version 3.1 or greater
Compilers ca. 2023

Detailed instructions are provided on how to build MiniEM for the following systems:

Advanced Technology System 3 (ATS-3), also known as Crossroads (see Crossroads)
A GPU build and test system within Sandia National Laboratories named “ascicgpu030” (see Sandia National Laboratories’ “ascicgpu030”)

If submodules were cloned within this repository, then the source code to build MiniEM is already present at the top level within the “trilinos” and “miniem_build” folders.

7.4.1. Crossroads

Instructions for building on Crossroads are provided below. The “miniem_build” folder contains the following items.

build-crossroads.sh: This script carries out the build. All that should be needed is for the spack.yaml to be generated from template.yaml and then for this script to be executed.
spack: This contains a specific checkout of Spack needed to build MiniEM. This will need to be patched; the patch is taken care of via build-crossroads.sh.
spack-fixes-v0.21.0.patch: This is the patch file needed to address issues within the Spack checkout.
template.yaml: This file needs to be copied into spack.yaml and edited to contain the paths to the necessary items.

7.4.2. Sandia National Laboratories’ “ascicgpu030”

Instructions for building on “ascicgpu030” are provided below. The “miniem_build” folder contains the following item(s).

build-ascicgpu030.sh: This script carries out the build which leverages already installed third party libraries. This does not rely upon the Crossroads Spack-based methodology.

7.5. Running

Instructions are provided on how to run MiniEM for the following systems:

Advanced Technology System 3 (ATS-3), also known as Crossroads (see Crossroads)
A GPU build and test system within Sandia National Laboratories named “ascicgpu030” (see Sandia National Laboratories’ “ascicgpu030”)

7.5.1. Crossroads

An example of how to run the test case on Crossroads is provided within the script (run-crossroads-mapcpu.sh)

7.5.2. Sandia National Laboratories’ “ascicgpu030”

An example of how to run the test case on “ascicgpu030” is provided within the script (run-ascicgpu030.sh)

7.6. Verification of Results

Results from MiniEM are provided on the following systems:

Advanced Technology System 3 (ATS-3), also known as Crossroads (see Crossroads)
A GPU build and test system within Sandia National Laboratories named “ascicgpu030” (see Sandia National Laboratories’ “ascicgpu030”)

7.6.1. Crossroads

Strong scaling performance (i.e., fixed problem size being run on different MPI rank counts) plots of MiniEM on Crossroads are provided within the following subsections.

7.6.1.1. Problem Size 40 (18-43 GiB)

This problem size corresponds to X, Y, and Z Element values set to 40 which results in an overall discretization that contains 768,000 cells.

Table 7.1 MiniEM Strong Scaling Performance and Memory on Crossroads with 768k cells (18-43 GiB)
PEs / NUMA Domain	PEs / NUMA Domain (%)	Inputfile Elements	Cells	MaxRSS (GiB)	Actual FOM (k-cell-steps/sec)	Ideal FOM (k-cell-steps/sec)
1	0.071	40	768000	18.10	1683.32	1683.32
4	0.286	40	768000	23.62	5307.57	6733.28
7	0.500	40	768000	29.49	6275.48	11783.24
11	0.786	40	768000	37.49	6517.25	18516.52
14	1.000	40	768000	43.25	6311.29	23566.48

Fig. 7.1 MiniEM Strong Scaling Performance on Crossroads with 768k cells (18-43 GiB)

Fig. 7.2 MiniEM Strong Scaling Memory on Crossroads with 768k cells (18-43 GiB)

7.6.1.2. Problem Size 60 (57-84 GiB)

This problem size corresponds to X, Y, and Z Element values set to 60 which results in an overall discretization that contains 2,592,000 cells.

Table 7.2 MiniEM Strong Scaling Performance and Memory on Crossroads with 2,592k cells (57-84 GiB)
PEs / NUMA Domain	PEs / NUMA Domain (%)	Inputfile Elements	Cells	MaxRSS (GiB)	Actual FOM (k-cell-steps/sec)	Ideal FOM (k-cell-steps/sec)
1	0.071	60	2592000	57.02	1536.37	1536.37
4	0.286	60	2592000	63.76	4717.88	6145.48
7	0.500	60	2592000	69.93	7797.07	10754.59
11	0.786	60	2592000	78.65	8605.81	16900.07
14	1.000	60	2592000	83.95	9190.05	21509.18

Fig. 7.3 MiniEM Strong Scaling Performance on Crossroads with 2,592k cells (57-84 GiB)

Fig. 7.4 MiniEM Strong Scaling Memory on Crossroads with 2,592k cells (57-84 GiB)

7.6.1.3. Problem Size 70 (89-118 GiB)

This problem size corresponds to X, Y, and Z Element values set to 70 which results in an overall discretization that contains 4,116,000 cells.

Table 7.3 MiniEM Strong Scaling Performance and Memory on Crossroads with 4,116k cells (57-84 GiB)
PEs / NUMA Domain	PEs / NUMA Domain (%)	Inputfile Elements	Cells	MaxRSS (GiB)	Actual FOM (k-cell-steps/sec)	Ideal FOM (k-cell-steps/sec)
1	0.071	70	4116000	89.77	1543.12	1543.12
4	0.286	70	4116000	96.05	4748.66	6172.48
7	0.500	70	4116000	102.91	6947.34	10801.84
11	0.786	70	4116000	111.37	8987.42	16974.32
14	1.000	70	4116000	117.42	8423.60	21603.68

Fig. 7.5 MiniEM Strong Scaling Performance on Crossroads with 4,116k cells (57-84 GiB)

Fig. 7.6 MiniEM Strong Scaling Memory on Crossroads with 4,116k cells (57-84 GiB)

7.6.2. Sandia National Laboratories’ “ascicgpu030”

Strong single-node scaling throughput for varying problem sizes (i.e., changing X Elements, Y Elements, and Z Elements and running on a single Nvidia V100) of MiniEM on “ascicgpu030” are provided below. The throughput corresponds to kilo cell steps per second per node.

Table 7.4 MiniEM Single Node Strong Scaling Throughput and Memory on “ascicgpu030” Utilizing a Single Nvidia V100
Memory (%)	Memory (GiB)	Size	No. Cells	Actual
16.5%	2.65	20	96000	887.491
26.2%	4.19	25	187500	1175.83
43.6%	6.97	31	357492	2695.26
60.2%	9.63	35	514500	3395.71
74.8%	11.96	38	658464	3906.47
85.7%	13.71	40	768000	3831.73

MiniEM Single Node Strong Scaling Throughput on "ascicgpu030" Utilizing a Single Nvidia V100 — Fig. 7.7 MiniEM Single Node Strong Scaling Throughput on “ascicgpu030” Utilizing a Single Nvidia V100

7.7. References

[Trilinos]

M. A. Heroux and R. A. Bartlett and V. E. Howle and R. J. Hoekstra and J. J. Hu and T. G. Kolda and R. B. Lehoucq and K. R. Long and R. P. Pawlowski and E. T. Phipps and A. G. Salinger and H. K. Thornquist and R. S. Tuminaro and J. M. Willenbring and A. Williams and K. S. Stanley, ‘An Overview of the Trilinos Project’, 2005, ACM Trans. Math. Softw., Volume 31, No. 3, ISSN 0098-3500.

[TrilinosBuild]

R. A. Bartlett, ‘Trilinos Configure, Build, Test, and Install Reference Guide’, 2023. [Online]. Available: https://docs.trilinos.org/files/TrilinosBuildReference.html. [Accessed: 26- Mar- 2023]

[Maxwell-Large]

Trilinos developers, ‘maxwell-large.xml’, 2024. [Online]. Available: https://github.com/trilinos/Trilinos/blob/master/packages/panzer/mini-em/example/BlockPrec/maxwell-large.xml. [Accessed: 22- Feb- 2024]

[Maxwell-AnalyticSolution]

Trilinos developers, ‘maxwell-analyticSolution.xml’, 2024. [Online]. Available: https://github.com/trilinos/Trilinos/blob/master/packages/panzer/mini-em/example/BlockPrec/maxwell-analyticSolution.xml. [Accessed: 22- Feb- 2024]

[Intel-8260]

Intel. ‘Intel Xeon Platinum 8260 Processor 35.75M Cache 2.40 GHz Product Specifications’, 2024. [Online]. Available: https://ark.intel.com/content/www/us/en/ark/products/192474/intel-xeon-platinum-8260-processor-35-75m-cache-2-40-ghz.html. [Accessed: 18- Mar- 2024]