Backends
Backends connect users to the DSI Core middleware and allow DSI middleware data structures to read from and write to persistent external storage. Backends are modular to support user contribution. Backend contributors are encouraged to offer custom backend abstract classes and backend implementations; a contributed backend abstract class may extend another backend to inherit the properties of the parent. To be compatible with the DSI core middleware, backends should create an interface to Python built-in data structures or data structures from the Python collections library. Any contributed backends or extensions must include unit tests in backends/tests that demonstrate the new backend capability; pull requests without tests cannot be accepted.
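As an illustration of that contract, here is a minimal sketch of what a contributed backend might look like; the class and method names below are illustrative only and are not part of the DSI API:

    from collections import OrderedDict

    class InMemoryBackend:
        """Hypothetical backend sketch: exposes data as an OrderedDict,
        the kind of built-in/collections structure DSI middleware expects."""

        def __init__(self, filename):
            self.filename = filename
            self.tables = OrderedDict()  # table name -> list of rows

        def put_artifacts(self, collection):
            # Store each (table, rows) pair handed over by the middleware
            for table_name, rows in collection.items():
                self.tables.setdefault(table_name, []).extend(rows)

        def get_artifacts(self):
            # Hand data back to the middleware as plain Python structures
            return self.tables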
- class dsi.backends.sqlite.Artifact
Primary Artifact class that holds a database schema in memory. An Artifact is a generic construct that defines the schema for the metadata stored in SQL tables.
- class dsi.backends.sqlite.Sqlite(filename)
Primary storage class; inherits from the SQL class.
- check_type(text)
Tests input text and returns a predicted compatible SQL type.
text: text string to evaluate
return: string description of a SQL data type
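A usage sketch; the predicted type strings in the comments are assumptions, and the exact values returned depend on the implementation:

    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")
    print(store.check_type("42"))     # an INT-like SQL type
    print(store.check_type("3.14"))   # a FLOAT/REAL-like SQL type
    print(store.check_type("hello"))  # falls back to a string-like type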
- export_csv(rquery, tname, fname, isVerbose=False)
Function that outputs a CSV file from an already executed query result, not from the query string itself.
rquery: return of an already called query output
tname: name of the table for (all) columns to export
fname: target filename (including path) that will output the return query as a csv file
return: none
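A usage sketch chaining sqlquery and export_csv; the table name wildfire and the file names are hypothetical:

    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")
    result = store.sqlquery("SELECT * FROM wildfire")     # query result, not a string
    store.export_csv(result, "wildfire", "wildfire_dump.csv")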
- export_csv_query(query, fname, isVerbose=False)
Function that outputs a CSV file of a query result, executing the given query first.
query: raw SQL query to be executed on current table
fname: target filename (including path) that will output the return query as a csv file
return: none
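The same export in one step with export_csv_query, again with a hypothetical table and file name:

    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")
    store.export_csv_query("SELECT * FROM wildfire WHERE acres > 100", "big_fires.csv")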
- get_artifact_list(isVerbose=False)
Function that returns a list of all Artifact names (represented as SQL tables)
return: list of Artifact names
- put_artifact_type(types, isVerbose=False)
Primary method for defining a metadata Artifact schema.
types: DataType-derived class that defines the string name, properties (named SQL type), and units for each column in the schema.
return: none
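A hedged sketch of defining a schema; the stand-in class below only mimics the attributes the docs describe (name, properties, units), and the real DataType-derived class may differ:

    from dsi.backends.sqlite import Sqlite

    # Hypothetical stand-in for a DataType-derived class
    class WildfireType:
        name = "wildfire"                                     # table name
        properties = {"acres": "FLOAT", "county": "VARCHAR"}  # column -> SQL type
        units = {"acres": "acres"}                            # column -> unit label

    store = Sqlite("data.db")
    store.put_artifact_type(WildfireType())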
- put_artifacts(collection, isVerbose=False)
Primary method for inserting a collection of Artifact metadata into a defined schema.
collection: a Python Collection of an Artifact-derived class that has multiple regular structures of a defined schema, filled with rows to insert.
return: none
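A hedged insertion sketch; the exact collection layout is set by the Artifact-derived class, and a column-oriented OrderedDict is assumed here:

    from collections import OrderedDict
    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")
    rows = OrderedDict([
        ("acres", [120.5, 43.0]),
        ("county", ["Lincoln", "Taos"]),
    ])
    store.put_artifacts(rows)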
- put_artifacts_csv(fname, tname, isVerbose=False)
Function for inserting Artifact metadata into a defined schema from a CSV file, where the first row of the CSV contains the column names of the schema and all subsequent rows contain data to be inserted. Data types are automatically assigned based on typecasting, defaulting to a string type if no other type can be inferred.
fname: filepath to the .csv file to be read and inserted into the database
tname: string name of the table to insert into
return: none
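A usage sketch with a hypothetical CSV file:

    from dsi.backends.sqlite import Sqlite

    # results.csv (hypothetical contents):
    #   acres,county
    #   120.5,Lincoln
    #   43.0,Taos
    store = Sqlite("data.db")
    store.put_artifacts_csv("results.csv", "wildfire")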
- put_artifacts_lgcy(artifacts, isVerbose=False)
Legacy function for insertion of artifact metadata into a defined schema
artifacts: data_type derived class that has a regular structure of a defined schema, filled with rows to insert.
return: none
- put_artifacts_only(artifacts, isVerbose=False)
Function for inserting Artifact metadata into a defined schema as a tuple.
artifacts: DataType-derived class that has a regular structure of a defined schema, filled with rows to insert.
return: none
- put_artifacts_t(collection, tableName='TABLENAME', isVerbose=False)
Primary method for inserting a collection of Artifact metadata into a defined schema, with a table passthrough.
collection: a Python Collection of an Artifact-derived class that has multiple regular structures of a defined schema, filled with rows to insert.
tableName: a passthrough to define and name the target table
return: none
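A usage sketch; the collection layout and table name are assumptions:

    from collections import OrderedDict
    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")
    rows = OrderedDict([("acres", [120.5, 43.0])])
    store.put_artifacts_t(rows, tableName="wildfire_runs")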
- query_fctime(operator, ctime, isVerbose=False)
Function that queries file creation times within the filesystem metadata store
operator: one of GT, LT, or EQ, as a modifier for the creation-time search
ctime: creation time in POSIX format, see the utils dateToPosix conversion function
return: query list of filenames matching the creation time criteria with modifiers
- query_fname(name, isVerbose=False)
Function that queries filenames within the filesystem metadata store
name: substring of a filename to search for
return: query list of filenames matching name string
- query_fsize(operator, size, isVerbose=False)
Function that queries ranges of file sizes within the filesystem metadata store
operator: one of GT, LT, or EQ, as a modifier for the filesize search
size: size in bytes
return: query list of filenames matching filesize criteria with modifiers
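A combined sketch of the three filesystem-metadata queries; the cutoff date and search strings are arbitrary examples:

    import time
    from dsi.backends.sqlite import Sqlite

    store = Sqlite("data.db")

    # POSIX timestamp for a cutoff date (2023-06-01), built with the stdlib;
    # the docs also mention a dateToPosix helper in utils for this purpose.
    cutoff = time.mktime((2023, 6, 1, 0, 0, 0, 0, 0, -1))

    recent = store.query_fctime("GT", cutoff)      # files created after the cutoff
    logs   = store.query_fname("log")              # filenames containing "log"
    big    = store.query_fsize("GT", 1024 * 1024)  # files larger than 1 MiB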
- sqlquery(query, isVerbose=False)
Function that provides a direct sql query passthrough to the database.
query: raw SQL query to be executed on current table
return: raw SQL query result list from the original query
- class dsi.backends.gufi.Gufi(prefix, index, dbfile, table, column, verbose=False)
GUFI Datastore
- get_artifacts(query)
Retrieves GUFI’s metadata joined with a DSI database.
query: an SQL query into the dsi_entries table
- isVerbose = False
prefix: prefix to GUFI commands
index: directory with GUFI indexes
dbfile: SQLite db file from DSI
table: table name from the DSI db we want to join on
column: column name from the DSI db to join on
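A construction sketch; every path and name below is a hypothetical placeholder:

    from dsi.backends.gufi import Gufi

    gufi = Gufi(prefix="/usr/local/bin",    # prefix to GUFI commands
                index="/data/gufi_index",   # directory with GUFI indexes
                dbfile="dsi.db",            # SQLite db file from DSI
                table="dsi_entries",        # DSI table to join on
                column="file")              # DSI column to join on
    rows = gufi.get_artifacts("SELECT * FROM dsi_entries")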
- class dsi.backends.parquet.Parquet(filename, **kwargs)
Support for a Parquet backend.
Parquet is a convenient format when metadata are larger than SQLite supports.
- get_artifacts()
Get Parquet data from filename.
- static get_cmd_output(cmd: list) → str
Runs a given command and returns the stdout if successful.
If stderr is not empty, an exception is raised with the stderr text.
- inspect_artifacts(collection, interactive=False)
Populate a Jupyter notebook with tools required to look at Parquet data.
- put_artifacts(collection)
Put artifacts into file at filename path.
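A usage sketch for the Parquet backend; the file name and collection shape are assumptions:

    from collections import OrderedDict
    from dsi.backends.parquet import Parquet

    pq = Parquet("metadata.parquet")
    collection = OrderedDict([("acres", [120.5, 43.0])])  # shape assumed
    pq.put_artifacts(collection)          # write to the Parquet file
    data = pq.get_artifacts()             # read the Parquet data back
    pq.inspect_artifacts(collection)      # populate a Jupyter notebook for exploration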