Core
The DSI Core middleware defines the Terminal concept. An instantiated Terminal is the human/machine DSI interface. The person setting up a Core Terminal only needs to know how they want to ask questions, and what metadata they want to ask questions about. If they don’t see an option to ask questions the way they like, or they don’t see the metadata they want to ask questions about, then they should ask a Backend Contributor or a Plugin Contributor, respectively.
A Core Terminal is a home for Plugins (Readers/Writers) and an interface for Backends. A Core Terminal is instantiated with a set of default Plugins and Backends, but they must be loaded before a user query is attempted. core.py contains examples of how you might work with DSI using an interactive Python interpreter for your data science workflows:
# Loading plugins and backends
from dsi.core import Terminal
a=Terminal()
a.list_available_modules('plugin')
# ['GitInfo', 'Hostname', 'SystemKernel', 'Bueno', 'Csv']
a.load_module('plugin','Bueno','reader',filenames='./data/bueno1.data')
# Bueno plugin reader loaded successfully.
a.load_module('plugin','Hostname','writer')
# Hostname plugin writer loaded successfully.
a.list_available_modules('backend')
# ['Gufi', 'Sqlite', 'Parquet']
#a.load_module('backend','Sqlite','back-end',filenames='./data/bueno.sqlite_db')
#a.load_module('backend','Parquet','back-end',filename='./data/bueno.pq')
a.list_loaded_modules()
# {'writer': [<dsi.plugins.env.Hostname object at 0x7f21232474d0>],
# 'reader': [<dsi.plugins.env.Bueno object at 0x7f2123247410>],
# 'front-end': [],
# 'back-end': []}
At this point, you might decide that you are ready to collect data for inspection. It is possible to utilize DSI Backends to load additional metadata to supplement your Plugin metadata, but you can also sample Plugin data and search it directly.
The process of transforming a set of Plugin writers and readers into a queryable format is called transloading. A DSI Core Terminal has a transload() method which may be called to execute all Plugins at once:
>>> a.transload()
>>> a.active_metadata
>>> # OrderedDict([('uid', [1000]), ('effective_gid', [1000]), ('moniker', ['qwofford'])...
Once a Core Terminal has been transloaded, no further Plugins may be added.
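The transloading behavior described above can be sketched in plain Python. This is a minimal illustration only, not the DSI implementation: ToyPlugin, ToyTerminal, collect(), load(), and transload_lock are hypothetical names invented for this example. It shows two things the text states: Plugin metadata is merged into a single ordered structure, and no Plugin may be added once transloading has run.

```python
from collections import OrderedDict


class ToyPlugin:
    """Hypothetical stand-in for a DSI Plugin; not the real dsi.plugins API."""

    def __init__(self, **metadata):
        self._metadata = metadata

    def collect(self):
        # Return one row of metadata as a mapping of column name -> value.
        return dict(self._metadata)


class ToyTerminal:
    """Hypothetical stand-in that mimics only the behavior described above."""

    def __init__(self):
        self.plugins = []
        self.active_metadata = OrderedDict()
        self.transload_lock = False

    def load(self, plugin):
        # Refuse new Plugins once the Terminal has been transloaded.
        if self.transload_lock:
            raise RuntimeError("Terminal already transloaded; no further Plugins may be added.")
        self.plugins.append(plugin)

    def transload(self):
        # Merge every Plugin's columns into one ordered structure.
        for plugin in self.plugins:
            for column, value in plugin.collect().items():
                self.active_metadata.setdefault(column, []).append(value)
        self.transload_lock = True


term = ToyTerminal()
term.load(ToyPlugin(uid=1000, effective_gid=1000))
term.load(ToyPlugin(moniker="qwofford"))
term.transload()
print(term.active_metadata)
```

Calling term.load(...) after term.transload() raises RuntimeError in this sketch, which mirrors the rule that Plugin setup must be complete before transloading.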
Core:Sync
The DSI Core middleware also defines data management functionality in Sync. The purpose of Sync is to document file metadata and to move data between local and remote locations. Data documentation captures and archives metadata (the location of files in the local file structure, their access permissions, file sizes, and creation/access/modification dates) and tracks the files' movement to the remote location for future access. The primary functions, Copy, Move, and Get, copy data, move data, or retrieve data from remote locations, either creating a DSI database in the process or retrieving an existing DSI database that contains the location(s) of the target data.
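To make the kind of file metadata listed above concrete, here is a small standard-library sketch that walks a directory and records each file's location, permissions, size, and access/modification times. It is an assumption-laden illustration, not how Sync is implemented: crawl_metadata and the record field names are hypothetical. Creation time is left out because its availability through os.stat differs between platforms.

```python
import os
import tempfile
import time


def crawl_metadata(root):
    """Walk `root` and return one metadata record per file (hypothetical helper)."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                "path": path,
                # Keep only the permission bits of the mode, e.g. '0o644'.
                "permissions": oct(info.st_mode & 0o777),
                "size_bytes": info.st_size,
                "accessed": time.ctime(info.st_atime),
                "modified": time.ctime(info.st_mtime),
            })
    return records


# Demonstrate on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "example.txt"), "w") as handle:
        handle.write("hello")
    records = crawl_metadata(root)
    for record in records:
        print(record)
```

Records of this shape could then be written to a database table as a filesystem entry, which is the role the text assigns to the DSI database.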
Core Modules and Functions
- class dsi.core.Sync(project_name='test')
A class defined to assist in data management activities for DSI
Sync is where data movement functions such as copy (to remote location) and sync (local filesystem with remote) exist.
- copy(local_loc, remote_loc, isVerbose=False)
Helper function to stage location and get filesystem information, and copy data over using a preferred API
- dircrawl(filepath)
Crawls the root ‘filepath’ directory and returns files
filepath: source filepath to be crawled
return: returns crawled file-list
- get()
Helper function that searches remote location based on project name, and retrieves DSI database
- populate(local_loc, remote_loc, isVerbose=False)
Helper function to gather filesystem information, local and remote locations to create a filesystem entry in a new or existing database
- class dsi.core.Terminal
An instantiated Terminal is the DSI human/machine interface.
Terminals are a home for Plugins and an interface for Backends. Backends may be front-ends or back-ends. Plugins may be Writers or Readers. See documentation for more information.
- add_external_python_module(mod_type, mod_name, mod_path)
Adds an external Python module (one that is not from the DSI repo) to the module_collection.
Afterwards, load_module can be used to load a DSI module from the added Python module. Note: mod_type is needed because each Python module only implements plugins or backends.
For example,
term = Terminal()
term.add_external_python_module('plugin', 'my_python_file', '/the/path/to/my_python_file.py')
term.load_module('plugin', 'MyPlugin', 'reader')
term.list_loaded_modules() # includes MyPlugin
- artifact_handler(interaction_type, **kwargs)
Store or retrieve using all loaded DSI Backends with storage functionality.
A DSI Core Terminal may load zero or more Backends with storage functionality. Calling artifact_handler will execute all back-end functionality currently loaded, given the provided interaction_type.
- list_available_modules(mod_type)
List available DSI modules of an arbitrary module type.
This method is useful for Core Terminal setup. Plugin and Backend type DSI modules are supported, but this getter can be extended to support any new DSI module types which are added. Note: self.VALID_MODULES refers to _DSI_ Modules; however, DSI Modules are classes, hence the naming idiosyncrasies below.
- list_loaded_modules()
List DSI modules which have already been loaded.
These Plugins and Backends are active or ready to execute a post-processing task.
- load_module(mod_type, mod_name, mod_function, **kwargs)
Load a DSI module from the available Plugin and Backend module collection.
DSI modules that are not explicitly listed by list_available_modules may also be loaded. This flexibility ensures that advanced users can access higher level abstractions. We expect most users will work with module implementations rather than templates, but all high level class abstractions are accessible with this method.
- transload(**kwargs)
Transloading signals to the DSI Core Terminal that Plugin set up is complete.
A DSI Core Terminal must be transloaded before queries, metadata collection, or metadata storage is possible. Transloading is the process of merging Plugin metadata from many data sources into a single DSI Core Middleware data structure.
- unload_module(mod_type, mod_name, mod_function)
Unloads a DSI module from the active_modules collection
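The add_external_python_module entry above takes a module name and a file path. One common way to import a Python file from an arbitrary path is the standard-library importlib.util machinery, sketched below. Whether DSI uses this approach internally is not stated in this document, so treat it as an assumption; import_from_path is a hypothetical helper, and the file and class names simply reuse the ones from the example above.

```python
import importlib.util
import os
import tempfile


def import_from_path(mod_name, mod_path):
    """Import the Python file at `mod_path` under the name `mod_name` (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(mod_name, mod_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a tiny external module to a temporary location, then import it by path.
with tempfile.TemporaryDirectory() as tmp:
    mod_path = os.path.join(tmp, "my_python_file.py")
    with open(mod_path, "w") as handle:
        handle.write("class MyPlugin:\n    name = 'MyPlugin'\n")
    external = import_from_path("my_python_file", mod_path)

# Look the class up by name, much as a loader would after the module is registered.
plugin_class = getattr(external, "MyPlugin")
print(plugin_class.name)
```

Looking classes up by name on the imported module is what makes a call such as load_module('plugin', 'MyPlugin', 'reader') possible once the external file has been registered.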