Federation for DSI
This allows users to pull databases from many different locations and provide a centralized view of the databases. To run, from the DSI folder:
To run:
python tools/federated/federate_datasets.py tools/federated/input.yaml
where input.yaml is a config file.
Config file
input.yaml is the config file that contains the paths from which data can be pulled:
repo_paths:
- "~/remote_sources.csv"
- "local_dsi_sources.csv"
workspace_folder: "dsi_databases_01"
download_limit: 10485760 # 10 MB
repo_paths: repo_paths: points to CSV files where the user can specify DSI repos. The paths should be relative to the config file or absolute pathsworkspace_folder: where the remote federated datasets will be stored; those local to your computer will not be moveddownload_limit: after this file limit, the user will be asked to confirm
Data Sources
An example of database sources is as follows:
location_type,location,path,type,submitter_name,submitter_email,timsestamp
local,local,tools/federated/database/ocean_11_datasets.db,data,pascal grosset,pascalgrosset@lanl.gov,2026-2-10--16:40:00s
local,local,/home/pascalgrosset/data/artimis/fracture/3d/aleks/fracture_aleks.sqlite,data,pascal grosset,pascalgrosset@lanl.gov,2026-2-10--16:40:00s
HPC,ch-fe.lanl.gov,/lustre/scratch5/pascalgrosset/test_db/nif.db,data,pascal grosset,pascalgrosset@lanl.gov,2026-3-10--16:38:00
url,url,https://www.timestored.com/data/sample/sakila.db,data,unknown,unknown,2026-3-10--16:38:00
Note: only location_type, location, and path are required
location_type: the currently supported values are:local: refers to your local computerHPC: refers to a supercomputerurl: a file on the webgithub: a file on a GitHub repository
location: on HPC systems, indicates the name of the cluster the data is on. For the others, it is the same aslocation_typepath: the path of the dataset
Other Notes
Local repositories will not be downloaded; there will just be a reference to them
A file called
dsi_database_list.jsonwill be created that shows all the paths of the downloaded files