Feature Interpretability Documentation

The Feature Interpretability module contains tools for interpreting neural networks by examining their internal states (features), with support for models built in either TensorFlow or PyTorch. The code can be viewed here. These tools were used on two LANL machine learning problems, which are detailed in the documentation and in the examples directory.

More details about the use of feature interpretability on the LANL coupon problem can be found in:

Hickmann, K, Callis, S, & Andrews, S. “Training and Interpretability of Deep-Neural Methods for Damage Calibration in Copper.” Proceedings of the ASME 2023 Verification, Validation, and Uncertainty Quantification Symposium. ASME 2023 Verification, Validation, and Uncertainty Quantification Symposium. Baltimore, Maryland, USA. May 17–19, 2023. V001T04A001. ASME. https://doi.org/10.1115/VVUQ2023-108759

These tools were developed by Skylar Callis while working as a post-bachelor's student at Los Alamos National Laboratory from 2022 to 2024. To see what they are up to these days, visit Skylar’s Website.

Module Use

The scripts in the feature interpretability module are command-line executables that each run an interpretability tool on a given model. All scripts take argparse command-line inputs to specify the model and accessory files.

All scripts take a --PACKAGE argument that specifies whether the model was built in tensorflow or pytorch. These two packages don’t tend to play nice with each other, so they must typically be kept in separate environments. The user is responsible for activating the correct environment for the package they want to run.
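
As a rough illustration of this pattern, a minimal sketch of a script accepting the --PACKAGE flag might look like the following. The script body and argument choices shown here are assumptions for illustration, not the module's exact code.

    # Sketch of the --PACKAGE argument pattern; details are illustrative only.
    import argparse

    parser = argparse.ArgumentParser(description='Run an interpretability tool on a model.')
    parser.add_argument('--PACKAGE', choices=['tensorflow', 'pytorch'], required=True,
                        help='Package the model was built in.')
    args = parser.parse_args()

    # Import only the requested package; the other is assumed to live in a
    # different environment and may not even be installed here.
    if args.PACKAGE == 'tensorflow':
        import tensorflow as tf
    elif args.PACKAGE == 'pytorch':
        import torch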

All scripts also take an --EXPERIMENT argument that specifies what dataset the model was trained on. This data processing has been established for two experiments at LANL: the coupon experiment and the nestedcylinder experiment. To analyze networks trained on other datasets, a user would first have to develop a submodule to process data from that experiment, and then integrate that code into the existing scripts.
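
The sketch below shows one way a script could dispatch on the --EXPERIMENT value to experiment-specific data processing; the processing functions are hypothetical placeholders standing in for the coupon and nestedcylinder submodules, not the module's actual API.

    # Hypothetical dispatch on the --EXPERIMENT value; the processing functions
    # are placeholders for the experiment-specific submodules.
    def process_coupon_data(path):
        raise NotImplementedError('stand-in for the coupon data-processing submodule')

    def process_nestedcylinder_data(path):
        raise NotImplementedError('stand-in for the nestedcylinder data-processing submodule')

    def get_data_processor(experiment):
        processors = {
            'coupon': process_coupon_data,
            'nestedcylinder': process_nestedcylinder_data,
        }
        try:
            return processors[experiment]
        except KeyError:
            # A new experiment requires writing a submodule and registering it here.
            raise ValueError(f'No data processing implemented for experiment {experiment!r}')

Under this kind of scheme, a new dataset would slot in as another entry in the mapping once its data-processing submodule exists.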

While these tools are intended to be easily adaptable to new networks and new datasets, they are not guaranteed to work outside of the tested use cases.

Network Nomenclature

Unfortunately, no one agrees on what the internals of neural networks should be called. To prevent Skylar from going insane, they developed the following Neural Network Nomenclature:

  • A Model is a trained neural network.

  • A Layer is a single level of a model. Common types of layers include convolutions, batch normalizations, and activations.

  • A Feature is the intermediate output of the network at a single layer. Most layers will output multiple features. When not stated otherwise, a feature is implied to be extracted from some specified layer.

  • A Field refers to either a radiograph or a hydrodynamic field related to the input for a given model; some fields are used as training data for models, while others remain unseen by the models.

  • A Prediction is the output from a model.

  • The Ground Truth or Truth refers to the true value of whatever quantity the model is predicting. The model attempts to estimate the ground truth with its predictions.

Please note that Activation does NOT have a specific meaning in this nomenclature. Commonly in ML spaces, activation is used to refer to either a Layer or a Feature. Skylar decided the best path forward was to ignore activation as a term entirely to prevent confusion.
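
To ground these terms, the following minimal PyTorch sketch shows a toy stand-in model, a forward hook that captures the features of a single layer, and the resulting prediction. The model and tensor shapes are purely illustrative and are not one of the module's trained networks.

    # Illustrative only: a toy model used to demonstrate the nomenclature above.
    import torch
    import torch.nn as nn

    model = nn.Sequential(                          # a Model (here, an untrained stand-in)
        nn.Conv2d(1, 4, kernel_size=3, padding=1),  # a convolution Layer
        nn.BatchNorm2d(4),                          # a batch normalization Layer
        nn.ReLU(),                                  # an activation-function Layer
    )

    features = {}

    def save_features(layer_name):
        # Forward hook that stores a Layer's intermediate output (its Features).
        def hook(module, inputs, output):
            features[layer_name] = output.detach()
        return hook

    model[0].register_forward_hook(save_features('conv'))

    field = torch.randn(1, 1, 28, 28)  # a Field: stand-in for an input radiograph or hydrodynamic field
    prediction = model(field)          # the Prediction: the model's final output
    print(features['conv'].shape)      # torch.Size([1, 4, 28, 28]): 4 Features from the conv Layer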