Python API
Users interact with DSI modules through the DSI class, which provides an interface for Readers, Writers, and Backends.
Its API is documented below and implemented in dsi/dsi.py. Example workflows using these functions are shown in the following section: User Examples.
Dsi: DSI
The DSI class is a user-level class that encapsulates the Terminal and Sync classes from DSI Core. DSI interacts with several functions within Terminal and Sync without requiring the user to differentiate them. The functionality has been simplified to improve user experience and reduce complexity.
When creating an instance of DSI(), users can optionally specify the backend type and the filename to use.
If neither is provided, a temporary backend is created automatically so users can still interact with their data.
Read the __init__ documentation below for details on the supported backend types.
Users should use read() to load data into DSI and write() to export data from DSI into supported external formats.
The corresponding list functions, list_readers() and list_writers(), print all valid readers and writers.
The primary backend interactions are find(), query(), and get_table(), which either print a search result or return it as a collection of data (a pandas DataFrame when collection=True).
If users modify these collections, they can call update() to apply the changes to the active backend. Users must NOT edit any columns beginning with `dsi_`. Read update() below to better understand its behavior.
Users can also view data and metadata of an active backend with list(), num_tables(), display(), and summary().
- Notes for users:
When using a complex schema, users must call schema() before read() so that the relations are stored with the associated data.
If the input to update() is a modified output from query(), the existing table will be overwritten. Ensure the data is backed up, or set the backup flag in update() to create a backup database.
Read the DSI Data Cards section to learn which data card standards are supported and where to find templates compatible with DSI.
- class dsi.dsi.DSI(filename='.temp.db', backend_name='Sqlite')
A user-facing interface for DSI’s Core middleware.
The DSI Class abstracts Core.Terminal for managing metadata and Core.Sync for data management and movement.
- __init__(filename='.temp.db', backend_name='Sqlite')
Initializes DSI by activating a backend for data operations; the default is a Sqlite backend for temporary data analysis. If users specify filename, data is saved to a permanent backend file. After initialization, users can call read(), find(), update(), query(), write(), or any of the backend printing operations.
- filename : str, optional
If not specified, a temporary, hidden backend file is created for users to analyze their data. If specified and backend file already exists, it is activated for a user to explore its data. If specified and backend file does not exist, a file with this name is created.
- Accepted file extensions:
If backend_name = “Sqlite” → .db, .sqlite, .sqlite3
If backend_name = “DuckDB” → .duckdb, .db
- backend_name : str, optional
Name of the backend to activate. Must be either “Sqlite” or “DuckDB”. Default is “Sqlite”.
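The extension rules above can be restated as a lookup table. The sketch below is illustrative only: ACCEPTED_EXTENSIONS and check_backend_filename are hypothetical names, not part of DSI, and it assumes extensions are matched exactly as listed (case-sensitive).

```python
from pathlib import Path

# Accepted backend file extensions, transcribed from the list above.
# ACCEPTED_EXTENSIONS and check_backend_filename are hypothetical helpers,
# not part of the DSI API.
ACCEPTED_EXTENSIONS = {
    "Sqlite": {".db", ".sqlite", ".sqlite3"},
    "DuckDB": {".duckdb", ".db"},
}


def check_backend_filename(filename: str, backend_name: str = "Sqlite") -> bool:
    """Return True if filename's extension is listed for backend_name."""
    if backend_name not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"backend_name must be one of {sorted(ACCEPTED_EXTENSIONS)}")
    return Path(filename).suffix in ACCEPTED_EXTENSIONS[backend_name]


print(check_backend_filename("data.db"))                # True
print(check_backend_filename("data.duckdb", "DuckDB"))  # True
print(check_backend_filename("data.duckdb"))            # False: not a Sqlite extension
```

Note that ".db" is valid for both backends, so the extension alone cannot tell which backend a file belongs to; backend_name still has to be passed explicitly.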
- close()
Closes the connection to the active backend and clears all loaded DSI modules.
- display(table_name, num_rows=25, display_cols=None)
Prints data from a specified table in the active backend.
- table_name : str
Name of the table to display.
- num_rows : int, optional, default=25
Maximum number of rows to print. If the table contains fewer rows, only those are shown.
- display_cols : list of str, optional
List of specific column names to display from the table.
If None (default), all columns are displayed.
- find(query, collection=False)
Finds all rows across all tables in the active backend where query can be found.
If query is a string containing a column-level condition (e.g., “age > 4”), this method instead finds all rows in the first table where the condition is satisfied.
- query : int, float, or str
The value to search for in all rows across all tables.
If query is a string with a condition, it must take the form column name, operator, value. Valid operators on numbers or strings:
age > 4
age < 4
age >= 4
age <= 4
age = 4
age == 4
age != 4
age (4, 8) → inclusive range between 4 and 8
- collection : bool, optional, default False.
If True, returns a pandas DataFrame containing the table rows that match or satisfy query.
The DataFrame includes ‘dsi_table_name’ and ‘dsi_row_index’ columns, which are required by dsi.update(). Drop them if not using update().
If False (default), prints the matching rows to the console.
return : If no matches are found, nothing is returned or printed.
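To make the condition format concrete, the sketch below splits a condition string into its column, operator, and value parts. It is a hypothetical illustration of the documented format, not DSI's own parser; parse_condition is an assumed name, and values are left as strings.

```python
import re

# Hypothetical parser for the "column operator value" conditions listed above.
# It illustrates the accepted format only; it is NOT DSI's own parser.
_RANGE = re.compile(r"^\s*(\w+)\s*\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$")
# Two-character operators are listed first so ">=" is not read as ">".
_COMPARE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|=|>|<)\s*(.+?)\s*$")


def parse_condition(condition: str):
    """Split a condition into (column, operator, value); ranges use operator 'range'."""
    match = _RANGE.match(condition)
    if match:
        column, low, high = match.groups()
        return column, "range", (low, high)
    match = _COMPARE.match(condition)
    if match:
        return match.groups()
    raise ValueError(f"not a column-level condition: {condition!r}")


print(parse_condition("state2_density > 5.0"))  # ('state2_density', '>', '5.0')
print(parse_condition("age (4, 8)"))            # ('age', 'range', ('4', '8'))
```

A plain search value such as "Jun 2025" raises ValueError here, mirroring the distinction above between a value search across all tables and a condition applied to the first table.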
- get_table(table_name, collection=False)
Retrieves all data from a specified table without requiring knowledge of the active backend’s query language.
This method offers a simplified alternative to query() for retrieving a full table's data without writing SQL.
- table_name : str
Name of the table from which all data will be retrieved.
- collection : bool, optional, default False.
If True, returns the result as a pandas.DataFrame.
The DataFrame includes a ‘dsi_table_name’ column, which is required by dsi.update(). Drop it if not using update().
If False (default), prints the result.
return : If table_name does not exist in the backend, nothing is returned or printed.
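For comparison, the SQL that get_table() lets users skip is a full-table SELECT. The sketch below uses Python's built-in sqlite3 module directly, not DSI, on an in-memory database; the "input" table and its two columns are made up for illustration, and treating get_table("input") as equivalent to this SELECT on a Sqlite backend is an assumption based on the description above.

```python
import sqlite3

# Plain sqlite3, not DSI: the table "input" and its columns are invented here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (sim_id INTEGER, state1_density REAL)")
conn.executemany("INSERT INTO input VALUES (?, ?)", [(1, 0.2), (2, 1.0)])

# The full-table query a user would otherwise pass to query("SELECT * FROM input").
rows = conn.execute("SELECT * FROM input").fetchall()
print(rows)  # [(1, 0.2), (2, 1.0)]
conn.close()
```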
- list(collection=False)
Gets the names and dimensions (rows x columns) of all tables in the active backend.
- collection : bool, optional, default False.
If True, returns a Python list of all table names.
If False (default), prints each table’s name and dimensions to the console.
- list_backends()
Prints a list of valid backends that can be used in the backend_name argument of DSI().
- list_readers()
Prints a list of valid readers that can be used in the reader_name argument in read()
- list_writers()
Prints a list of valid writers that can be used in the writer_name argument in write()
- num_tables()
Prints the number of tables in the active backend.
- query(statement, collection=False)
Executes a SQL query on the active backend.
- statement : str
A SQL query to execute. Only SELECT and PRAGMA statements are allowed.
- collection : bool, optional, default False.
If True, returns the result as a pandas.DataFrame.
The DataFrame includes a ‘dsi_table_name’ column, which is required by dsi.update(). Drop it if not using update().
If False (default), prints the result.
return : If the statement is incorrectly formatted, nothing is returned or printed.
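The SELECT/PRAGMA restriction can be checked before calling query(). The sketch below is a hypothetical pre-check; is_allowed_statement is an assumed name, and DSI may enforce the rule differently. It only inspects the leading keyword, so a statement starting with WITH would be rejected even if it ends in a SELECT.

```python
import re

# Hypothetical pre-check for the rule that query() accepts only SELECT and
# PRAGMA statements. Not DSI's implementation; only the first keyword is checked.
_ALLOWED_START = re.compile(r"\s*(SELECT|PRAGMA)\b", re.IGNORECASE)


def is_allowed_statement(statement: str) -> bool:
    """Return True if the statement begins with SELECT or PRAGMA (any case)."""
    return _ALLOWED_START.match(statement) is not None


print(is_allowed_statement("SELECT * FROM input"))       # True
print(is_allowed_statement("PRAGMA table_info(input)"))  # True
print(is_allowed_statement("DELETE FROM input"))         # False
```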
- read(filenames, reader_name, table_name=None)
Loads data into DSI using the reader specified by reader_name.
- filenames : str or list of str
Path(s) to the data file(s) to be loaded.
- The expected file extension depends on the selected reader_name:
“CSV” → .csv
“YAML1” → .yaml or .yml
“TOML1” → .toml
“JSON” → .json
“Ensemble” → .csv
“Cloverleaf” → /path/to/data/directory/
“Bueno” → .data
“DublinCoreDatacard” → .xml
“SchemaOrgDatacard” → .json
“GoogleDatacard” → .yaml or .yml
“Oceans11Datacard” → .yaml or .yml
- reader_name : str
Name of the DSI reader to use for loading the data.
If using a DSI-supported reader, this should be one of the reader_names from list_readers().
If using a custom reader, provide the relative file path to the Python script with the reader. For guidance on creating a DSI-compatible reader, view Custom DSI Reader.
- table_name : str, optional
Name to assign to the loaded table. Only used by the CSV, JSON, or Ensemble readers when the input file contains a single table.
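The reader-to-extension list above can be kept as a table and checked before calling read(). This is a minimal sketch under stated assumptions: READER_EXTENSIONS and matches_reader are hypothetical helpers, not DSI functions; Cloverleaf is checked as a directory; and any reader name not in the table (for example a path to a custom reader script) is passed through unchecked.

```python
from pathlib import Path

# Expected input extensions per reader, transcribed from the list above.
# READER_EXTENSIONS and matches_reader are hypothetical, not part of DSI.
READER_EXTENSIONS = {
    "CSV": {".csv"},
    "YAML1": {".yaml", ".yml"},
    "TOML1": {".toml"},
    "JSON": {".json"},
    "Ensemble": {".csv"},
    "Bueno": {".data"},
    "DublinCoreDatacard": {".xml"},
    "SchemaOrgDatacard": {".json"},
    "GoogleDatacard": {".yaml", ".yml"},
    "Oceans11Datacard": {".yaml", ".yml"},
}


def matches_reader(filenames, reader_name: str) -> bool:
    """Return True if every path in filenames fits what reader_name expects."""
    paths = [filenames] if isinstance(filenames, str) else list(filenames)
    if reader_name == "Cloverleaf":
        # Cloverleaf takes a data directory rather than a file.
        return all(Path(p).is_dir() for p in paths)
    if reader_name not in READER_EXTENSIONS:
        # Custom readers are not covered by the table above.
        return True
    allowed = READER_EXTENSIONS[reader_name]
    return all(Path(p).suffix in allowed for p in paths)


print(matches_reader(["a.yml", "b.yaml"], "YAML1"))      # True
print(matches_reader("card.json", "DublinCoreDatacard"))  # False: expects .xml
```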
- schema(filename)
Loads a relational database schema into DSI from the specified file.
- filename : str
Path to a JSON file describing the structure of a relational database. The schema should follow the format described in Cloverleaf (Complex Schemas).
Must be called before reading in any data files associated with the schema
- summary(table_name=None, collection=False)
Prints numerical metadata and (optionally) sample data from tables in the active backend.
- table_name : str, optional
If specified, only the numerical metadata for that table will be printed.
If None (default), metadata for all available tables is printed.
- update(collection, backup=False)
Updates data in one or more tables in the active backend using the provided input. Intended to be used after modifying the output of find(), query(), or get_table().
- collection : pandas.DataFrame
The data used to update a table. The DataFrame must include the unchanged `dsi_` columns from find(), query(), or get_table() for the update to succeed.
If the input is a DataFrame returned by query(), the corresponding table in the backend will be completely overwritten.
- backup : bool, optional, default False.
If True, creates a backup file of the DSI backend before updating its data.
If False (default), only updates the data.
NOTE: Columns from the original table cannot be deleted during an update. Only edits and column additions are allowed.
NOTE: If updates affect a user-defined primary key column, row order may change upon reinsertion.
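The update() rules above (keep every original column, leave `dsi_` columns untouched) can be checked before calling it. The sketch below is hypothetical: check_update_input is an assumed name, and it uses plain Python rows (dicts) instead of pandas DataFrames so it runs with the standard library alone; it compares `dsi_` values row by row in order.

```python
# Hypothetical pre-flight check restating the update() rules above with plain
# Python rows (dicts) instead of pandas DataFrames. Not part of the DSI API.
def check_update_input(original_rows, edited_rows):
    """Return a list of rule violations; an empty list means none were found."""
    original_cols = set().union(*(row.keys() for row in original_rows))
    edited_cols = set().union(*(row.keys() for row in edited_rows))
    problems = []

    # Rule: columns from the original table cannot be deleted.
    deleted = sorted(original_cols - edited_cols)
    if deleted:
        problems.append(f"columns deleted: {deleted}")

    # Rule: `dsi_` columns must be left unchanged (compared row by row).
    dsi_cols = sorted(c for c in original_cols if c.startswith("dsi_"))
    for i, (before, after) in enumerate(zip(original_rows, edited_rows)):
        for col in dsi_cols:
            if after.get(col) != before.get(col):
                problems.append(f"row {i}: {col} was modified")
    return problems


original = [{"dsi_table_name": "input", "dsi_row_index": 0, "max_timestep": 10}]
edited = [dict(original[0], max_timestep=100, new_col=50)]  # edit plus new column
print(check_update_input(original, edited))  # []
```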
- write(filename, writer_name, table_name=None)
Exports data from the active backend using the specified writer_name.
- filename : str
Name of the output file to write.
- Expected file extensions based on writer_name:
“ER_Diagram” → .png, .pdf, .jpg, .jpeg
“Table_Plot” → .png, .jpg, .jpeg
“Csv_Writer” → .csv
- writer_name : str
Name of the DSI writer to use. Call list_writers() to view all available writers.
- table_name : str, optional
Required when using “Table_Plot” or “Csv_Writer” to specify which table to export.
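The write() rules combine an extension check with a table_name requirement for two of the writers. The sketch below restates them; WRITER_EXTENSIONS, WRITERS_NEEDING_TABLE, and write_args_ok are hypothetical names, not part of DSI, and unknown writer names are simply rejected.

```python
from pathlib import Path

# Hypothetical helper restating the write() rules above; not part of DSI.
WRITER_EXTENSIONS = {
    "ER_Diagram": {".png", ".pdf", ".jpg", ".jpeg"},
    "Table_Plot": {".png", ".jpg", ".jpeg"},
    "Csv_Writer": {".csv"},
}
WRITERS_NEEDING_TABLE = {"Table_Plot", "Csv_Writer"}


def write_args_ok(filename, writer_name, table_name=None):
    """Return True if filename and table_name fit the documented writer rules."""
    if Path(filename).suffix not in WRITER_EXTENSIONS.get(writer_name, set()):
        return False
    return table_name is not None or writer_name not in WRITERS_NEEDING_TABLE


print(write_args_ok("er_diagram.png", "ER_Diagram"))      # True
print(write_args_ok("input.csv", "Csv_Writer"))           # False: table_name missing
print(write_args_ok("input.csv", "Csv_Writer", "input"))  # True
```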
DSI Data Cards
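DSI is expanding its support of several dataset metadata standards. Currently supported standards, each with its own reader, are Dublin Core (DublinCoreDatacard), Schema.org (SchemaOrgDatacard), Google (GoogleDatacard), and Oceans11 (Oceans11Datacard).
Template file structures can be found and copied in examples/test/.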
To be compatible with DSI, a user’s data card must contain all the fields in its corresponding template. However, if certain metadata is not available for a dataset, the values of those fields may be left empty.
The supported data cards can be read into DSI by creating an instance of DSI() and calling:
read("file/path/to/datacard.xml", 'DublinCoreDatacard')
read("file/path/to/datacard.json", 'SchemaOrgDatacard')
read("file/path/to/datacard.yaml", 'GoogleDatacard')
read("file/path/to/datacard.yaml", 'Oceans11Datacard')
Examples of each data card standard for the Wildfire dataset can be found in examples/wildfire/.
User Examples
The examples below show various ways users can incorporate DSI into their data science workflows.
They are located in examples/user/ and must be run from that directory.
All of them either load or refer to data in examples/clover3d/.
Example 1: Intro use case
Baseline use of DSI to list all valid Readers, Writers, and Backends, and descriptions of each.
# examples/user/1.baseline.py
from dsi.dsi import DSI
baseline_dsi = DSI()
# Lists available backends, readers, and writers in this dsi installation
baseline_dsi.list_backends()
baseline_dsi.list_readers()
baseline_dsi.list_writers()
Example 2: Read data
Reading Cloverleaf data into a DSI backend, and displaying some of that data
# examples/user/2.read.py
from dsi.dsi import DSI
read_dsi = DSI("data.db") # Target a backend, defaults to SQLite if not defined
#dsi.read(path, reader)
read_dsi.read("../clover3d/", 'Cloverleaf') # Read Cloverleaf data into the active backend
#dsi.display(table_name)
read_dsi.display("input") # Print the specific table's data from the Cloverleaf data
read_dsi.close() # cleans DSI memory of all DSI modules - readers/writers/backends
Example 3: Visualize data
Printing various data and metadata from a DSI backend - number of tables, list of tables, actual table data, and summary of table statistics
# examples/user/3.visualize.py
from dsi.dsi import DSI
visual_dsi = DSI("data.db") # Assuming data.db has data from 2.read.py:
visual_dsi.num_tables()
visual_dsi.list()
#dsi.display(table_name, num_rows, display_cols)
# prints up to 25 rows (the default num_rows) of 'input'
visual_dsi.display("input")
# optional input to specify number of rows from 'input' to print
visual_dsi.display("input", 2)
# optional input to specify which columns to print
visual_dsi.display("input", 2, ["sim_id", "state1_density", "state2_density", "initial_timestep", "end_step"])
#dsi.summary(table_name, num_rows)
# prints numerical stats for every table in a backend
visual_dsi.summary()
# prints numerical stats for only 'input'
visual_dsi.summary("input")
# prints numerical stats for only 'input' and prints first 5 rows of the actual table
visual_dsi.summary("input", 5)
visual_dsi.close()
Example 4: Find data
Finding data from an active DSI backend that matches an input query - a string or a number.
Prints all matches by default. If True is passed as an additional argument (collection), returns the rows of the first table that satisfies the query as a DataFrame.
# examples/user/4.find.py
from dsi.dsi import DSI
find_dsi = DSI("data.db") # Assuming data.db has data from 2.read.py:
#dsi.find(value)
find_dsi.find("Jun 2025") # finds the value "Jun 2025" in all tables
#dsi.find(value, True)
find_df = find_dsi.find("Jun 2025", True) # Returns the first matching table as a DataFrame
#dsi.find(condition, True)
find_df = find_dsi.find("state2_density > 5.0", True) # Returns matching rows as a DataFrame
find_dsi.close()
Example 5: Update data
Updating data from the edited output of find(). Input can be the output of find(), query(), or get_table().
Users must NOT change metadata columns starting with `dsi_`, even when adding new rows.
# examples/user/5.update.py
from dsi.dsi import DSI
update_dsi = DSI("data.db") # Assuming data.db has data from 2.read.py:
#dsi.find(condition, collection)
find_df = update_dsi.find("state2_density > 5.0", True) # Returns matching rows as a DataFrame
update_dsi.display(find_df["dsi_table_name"].iloc[0], 5) # display table before update
find_df["new_col"] = 50 # add new column to this DataFrame
find_df["max_timestep"] = 100 # update existing column
#dsi.update(collection, backup)
update_dsi.update(find_df, backup=True) # update the table in the backend, creating a backup first
update_dsi.display(find_df["dsi_table_name"].iloc[0], 5) # display table after update
update_dsi.close()
Example 6: Query data
Querying data from an active DSI backend.
Users can either use query() to view specific data with a SQL statement, or get_table() to view all data from a specified table.
# examples/user/6.query.py
from dsi.dsi import DSI
query_dsi = DSI("data.db") # Assuming data.db has data from 2.read.py:
#dsi.query(sql_statement)
query_dsi.query("SELECT * FROM input")
#dsi.get_table(table_name)
query_dsi.get_table("input") # alternative to query() when all data from a table is needed
query_dsi.close()
Example 7: Complex schema with data
Loading a complex JSON file with schema(), the associated Cloverleaf data with read(), and an ER Diagram to display the data relations.
Read Cloverleaf (Complex Schemas) to learn how to structure a DSI-compatible input file for schema().
# examples/user/7.schema.py
from dsi.dsi import DSI
schema_dsi = DSI("schema_data.db")
# dsi.schema(filename)
schema_dsi.schema("../test/example_schema.json") # must be before reading Cloverleaf data
#dsi.read(path, reader)
schema_dsi.read("../clover3d/", 'Cloverleaf')
#dsi.write(filename, writer)
schema_dsi.write("clover_er_diagram.png", "ER_Diagram")
#dsi.display(table_name, num_rows, display_cols)
schema_dsi.display("simulation")
schema_dsi.display("input", display_cols=["sim_id", "state1_density", "state2_density", "initial_timestep", "end_step"])
schema_dsi.display("output", display_cols=["sim_id", "step", "wall_clock", "average_time_per_cell"])
schema_dsi.display("viz_files")
schema_dsi.close()
Example 8: Write data
Writing data from a DSI backend as an Entity Relationship diagram, table plot, and CSV.
# examples/user/8.write.py
from dsi.dsi import DSI
write_dsi = DSI("schema_data.db") # Assuming schema_data.db has data from 7.schema.py:
#dsi.write(filename, writer, table)
write_dsi.write("er_diagram.png", "ER_Diagram")
write_dsi.write("input_table_plot.png", "Table_Plot", "input")
write_dsi.write("input.csv", "Csv_Writer", "input")
write_dsi.close()