DSI Examples
============

PENNANT mini-app
----------------

`PENNANT`_ is an unstructured mesh physics mini-application developed at Los Alamos National Laboratory for advanced architecture research. It contains mesh data structures and a few physics algorithms from radiation hydrodynamics and serves as an example of typical memory access patterns for an HPC simulation code.

This DSI PENNANT example shows a common use case: creating and querying a set of metadata derived from an ensemble of simulation runs. The example GitHub directory includes 10 PENNANT runs using the PENNANT *Leblanc* test problem.

In the first step, a Python script parses the Slurm output files and creates a CSV (comma-separated values) file with the output metadata.

.. code-block:: unixconfig

   python3 parse_slurm_output.py

In the second step, another Python script,

.. code-block:: unixconfig

   python3 dsi_pennant_dev.py

reads in the CSV file and creates a database:

.. literalinclude:: ../examples/pennant/dsi_pennant_dev.py

The query produces the following output:

.. figure:: images/example-pennant-output.png
   :alt: Screenshot of computer program output.
   :class: with-shadow

   The output of the PENNANT example.

Wildfire Dataset
----------------

This example highlights the use of the DSI framework with QUIC-Fire simulation data and the resulting images. QUIC-Fire is a fire-atmosphere modeling framework for prescribed fire burn analysis. It is lightweight (able to run on a laptop), allowing scientists to generate ensembles of thousands of simulations in weeks. This QUIC-Fire dataset is an ensemble of prescribed fire burns for the Wawona region of Yosemite National Park.

The original file, ``wildfire.csv``, lists 1889 runs of a wildfire simulation. Each row is a unique run with input and output values and an associated image URL. The columns list the various parameters of interest.
The input columns are: wind_speed, wdir (wind direction), smois (surface moisture), fuels, ignition, safe_unsafe_ignition_pattern, safe_unsafe_fire_behavior, does_fire_meet_objectives, and rationale_if_unsafe. The outputs of the simulation (and post-processing steps) include the burned_area and the URL to the wildfire images stored at the San Diego Supercomputer Center.

After loading DSI, run this example from within the ``dsi/examples/wildfire/`` folder, since all file paths are relative to that location:

.. code-block:: unixconfig

   python3 wildfire_dev.py

.. literalinclude:: ../examples/wildfire/wildfire_dev.py

This generates a ``wildfire.cdb`` folder containing images downloaded from the server and a ``data.csv`` file of numerical properties of interest. This folder is a `Cinema`_ database (CDB). Cinema is an ecosystem for management and analysis of high-dimensional data artifacts that promotes flexible and interactive data exploration and analysis. A Cinema database consists of a CSV file where each row of the table is a data element (e.g., a run or ensemble member of a simulation) and each column is a property of the data element. Any column whose name starts with ``FILE`` holds a path to a file associated with the data element. This could be an image, a plot, a simulation mesh, or another data artifact.

Cinema databases can be visualized through various tools. We illustrate two options below.

To visualize the results using Jupyter Lab and Plotly, first install both packages:

.. code-block:: unixconfig

   python3 -m pip install plotly
   python3 -m pip install jupyterlab

Open Jupyter Lab with:

.. code-block:: unixconfig

   jupyter lab --browser Firefox

and navigate to ``wildfire_plotly.ipynb``. Run the cells to visualize the results of the DSI pipeline.

.. figure:: images/example-wildfire-jupyter.png
   :alt: User interface showing the visualization code to load the CSV file and resultant parallel coordinates plot.
   :class: with-shadow
   :scale: 50%

   Screenshot of the JupyterLab workflow. The CSV file is loaded and used to generate a parallel coordinates plot showing the parameters of interest from the simulation.

Another option is to use `PyCinema`_, a Qt-based GUI that supports visualization and analysis of Cinema databases. To open a PyCinema viewer, first install pycinema and then run the example script:

.. code-block:: unixconfig

   python3 -m pip install pycinema
   cinema examples/wildfire/wildfire_pycinema.py

.. figure:: images/example-wildfire-pycinema.png
   :class: with-shadow
   :scale: 40%

   Screenshot of the PyCinema user interface showing the minimal set of components. Left: the node view showing the various PyCinema components in the visualization pipeline; upper-right: the table view; lower-right: the image view. PyCinema components are linked such that making a selection in one view will propagate to the other views.

.. _PENNANT: https://github.com/lanl/PENNANT
.. _Cinema: https://github.com/cinemascience
.. _PyCinema: https://github.com/cinemascience/pycinema

.. _schema_section:

Cloverleaf (Complex Schemas)
----------------------------

This example shows how to use DSI with ensemble data from 8 Cloverleaf_Serial runs, and how to create a complex schema compatible with DSI.

The directory with this sample input and output data can be found in ``examples/clover3d/``, where each run has its own subfolder. Each run's input file is ``clover.in``; its output consists of ``clover.out`` and the associated VTK files.

After loading DSI, run this example from within the ``dsi/examples/developer/`` folder, since all file paths are relative to that location:

.. code-block:: unixconfig

   python3 3.schema.py

This workflow uses a custom Cloverleaf reader to load the data, along with a complex schema that maps the input data, output data, and VTK files to the respective simulation runs. After running the workflow, users can see that the state2_density value is the only input parameter that changes between runs.

.. literalinclude:: ../examples/developer/3.schema.py

where ``examples/test/example_schema.json`` is:

.. code-block:: json

   {
       "simulation": {
           "primary_key": "sim_id"
       },
       "input": {
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"]
           }
       },
       "output": {
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"]
           }
       },
       "viz_files": {
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"]
           }
       }
   }

and the generated ER diagram is:

.. figure:: images/schema_erd.png
   :scale: 35%
   :align: center

   Entity Relationship Diagram of Cloverleaf data. Displays relations between the simulation, input, output, and viz_files tables.

This section explains how to define primary and foreign key relationships in a JSON file for ``schema()``, such as ``examples/test/example_schema.json``.

Each schema file must be structured as a dictionary where:

- Each table with a relation is a key whose value is a nested dictionary storing primary and foreign key information.
- The nested dictionary has 2 keys, ``primary_key`` and ``foreign_key``, which must be spelled exactly this way to be processed:

  - The value of ``primary_key`` is the string name of the column in this table that is a primary key.

    - Ex: ``"primary_key" : "id"``

  - The value of ``foreign_key`` is another inner dictionary, since a table can have multiple foreign keys:

    - Each inner dictionary's key is a column in this table that is a foreign key to another table's primary key.
    - The key's value is a list of 2 elements: the other table storing the primary key, and the column in that table that is the primary key.
    - Ex: ``"foreign_key" : { "name" : ["table1", "id"] , "age" : ["table2", "id"] }``

- If a table does not have a primary key, there is no need to include an empty key/value pair for the table.
- If a table does not have foreign keys, there is no need for an empty inner dictionary.

For example, if a user has a table 'Payments' with a primary key 'id' and a foreign key 'user_name' that points to another table 'Users' with primary key 'name', the schema is:


.. code-block:: json

   {
       "Payments": {
           "primary_key" : "id",
           "foreign_key" : {
               "user_name" : ["Users", "name"]
           }
       }
   }

As a second example, if we update the Cloverleaf schema by adding a new primary and foreign key relation (assuming the columns exist):

.. code-block:: json

   {
       "simulation": {
           "primary_key": "sim_id"
       },
       "input": {
           "primary_key": "input_id",                  // <--- new primary key
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"]
           }
       },
       "output": {
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"],
               "input_id": ["input", "input_id"]       // <--- new foreign key
           }
       },
       "viz_files": {
           "foreign_key": {
               "sim_id": ["simulation", "sim_id"]
           }
       }
   }

our new ER diagram is:

.. figure:: images/schema_erd_added.png
   :scale: 35%
   :align: center

   ER diagram of the same data, now with an additional primary/foreign key relation from "input" to "output".

Jupyter Notebook
----------------

This example shows a workflow in which a user reads data into DSI, ingests it into a backend, and then views the data interactively in a Jupyter notebook.

``examples/developer/10.notebook.py``:

.. literalinclude:: ../examples/developer/10.notebook.py

The above workflow generates ``dsi_sqlite_backend_output.ipynb``, which can be seen below. Users can make further edits to the Jupyter notebook to interact with the data.

.. figure:: images/jupyter_1.png
   :scale: 65%
   :align: center

.. figure:: images/jupyter_2.png
   :scale: 65%
   :align: center

   Screenshots of an example Jupyter notebook with loaded data.
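
As a supplement to the schema file rules described in the Cloverleaf section, the structure of a schema dictionary can be checked before the file is handed to ``schema()``. The following is a minimal sketch that uses only the Python standard library. The helper names ``check_schema`` and ``list_relations`` are hypothetical and are not part of DSI; the checks only mirror the rules listed in that section and may not match what DSI itself enforces.

.. code-block:: python

   import json

   # The only two keys the schema rules allow inside a table's dictionary.
   ALLOWED_KEYS = {"primary_key", "foreign_key"}


   def check_schema(schema):
       """Return a list of rule violations; an empty list means none were found."""
       problems = []
       for table, keys in schema.items():
           if not isinstance(keys, dict):
               problems.append(f"{table}: value must be a dictionary")
               continue
           for name in keys:
               if name not in ALLOWED_KEYS:
                   problems.append(f"{table}: unexpected key {name!r}")
           if "primary_key" in keys and not isinstance(keys["primary_key"], str):
               problems.append(f"{table}: primary_key must be a column name")
           foreign = keys.get("foreign_key", {})
           if not isinstance(foreign, dict):
               problems.append(f"{table}: foreign_key must be a dictionary")
               continue
           for column, target in foreign.items():
               # Each foreign key must map to [other_table, other_column].
               ok = (isinstance(target, list) and len(target) == 2
                     and all(isinstance(part, str) for part in target))
               if not ok:
                   problems.append(f"{table}.{column}: expected [table, column]")
       return problems


   def list_relations(schema):
       """Return (table, column, referenced_table, referenced_column) tuples."""
       relations = []
       for table, keys in schema.items():
           for column, (ref_table, ref_column) in keys.get("foreign_key", {}).items():
               relations.append((table, column, ref_table, ref_column))
       return relations


   # The 'Payments' schema from the Cloverleaf section, parsed from JSON text.
   payments = json.loads("""
   {
       "Payments": {
           "primary_key": "id",
           "foreign_key": {"user_name": ["Users", "name"]}
       }
   }
   """)

   print(check_schema(payments))    # []
   print(list_relations(payments))  # [('Payments', 'user_name', 'Users', 'name')]

Returning a list of problems instead of raising on the first one lets a user fix every misspelled key in a single pass. ``list_relations`` assumes the dictionary has already passed ``check_schema``.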