Federation for DSI
==================

This tool lets users pull databases from many different locations and provides a centralized view of them.

To run, from the DSI folder::

    python tools/federated/federate_datasets.py tools/federated/input.yaml

where ``input.yaml`` is a config file.

Config file
-----------

``input.yaml`` is the config file that lists the sources from which data can be pulled::

    repo_paths:
      - "~/remote_sources.csv"
      - "local_dsi_sources.csv"
    workspace_folder: "dsi_databases_01"
    download_limit: 10485760  # 10 MB

- ``repo_paths``: points to CSV files where the user can specify DSI repos. The paths should be absolute or relative to the config file.
- ``workspace_folder``: where the remote federated datasets will be stored; datasets local to your computer will not be moved.
- ``download_limit``: a size threshold in bytes; once a file exceeds it, the user will be asked to confirm.

Data Sources
------------

An example of a database sources file is as follows::

    location_type,location,path,type,submitter_name,submitter_email,timsestamp
    local,local,tools/federated/database/ocean_11_datasets.db,data,pascal grosset,pascalgrosset@lanl.gov,2026-2-10--16:40:00s
    local,local,/home/pascalgrosset/data/artimis/fracture/3d/aleks/fracture_aleks.sqlite,data,pascal grosset,pascalgrosset@lanl.gov,2026-2-10--16:40:00s
    HPC,ch-fe.lanl.gov,/lustre/scratch5/pascalgrosset/test_db/nif.db,data,pascal grosset,pascalgrosset@lanl.gov,2026-3-10--16:38:00
    url,url,https://www.timestored.com/data/sample/sakila.db,data,unknown,unknown,2026-3-10--16:38:00

**Note:** only ``location_type``, ``location``, and ``path`` are required.

- ``location_type``: the currently supported values are:

  - ``local``: refers to your local computer
  - ``HPC``: refers to a supercomputer
  - ``url``: a file on the web
  - ``github``: a file in a GitHub repository

- ``location``: on HPC systems, the name of the cluster the data is on. For the other types, it is the same as ``location_type``.
- ``path``: the path of the dataset

Other Notes
-----------

- Local repositories will not be downloaded; only a reference to them is kept.
- A file called ``dsi_database_list.json`` will be created that lists the paths of all the downloaded files.
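
The required columns and supported ``location_type`` values described under Data Sources can be checked before the federation script is run. The following is a minimal sketch of such a check, not code from DSI itself: ``load_sources``, ``REQUIRED_COLUMNS``, and ``SUPPORTED_LOCATION_TYPES`` are hypothetical names, and the rules are taken only from the description above, so the actual tool may validate differently.

.. code-block:: python

    import csv
    import io

    # Assumption: these mirror the values listed in this document;
    # they are not constants exported by DSI.
    REQUIRED_COLUMNS = ("location_type", "location", "path")
    SUPPORTED_LOCATION_TYPES = {"local", "HPC", "url", "github"}


    def load_sources(csv_file):
        """Read a sources CSV from an open text file and return its rows as dicts.

        Hypothetical helper for illustration. Raises ValueError on the first
        missing column, empty required value, or unsupported location_type.
        """
        reader = csv.DictReader(csv_file)
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError("missing required column(s): " + ", ".join(missing))

        rows = []
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            for col in REQUIRED_COLUMNS:
                if not (row[col] or "").strip():
                    raise ValueError(f"line {line_no}: empty value for '{col}'")
            if row["location_type"] not in SUPPORTED_LOCATION_TYPES:
                raise ValueError(
                    f"line {line_no}: unsupported location_type "
                    f"'{row['location_type']}'"
                )
            rows.append(row)
        return rows


    # Usage: only the three required columns are given here.
    sample = io.StringIO(
        "location_type,location,path\n"
        "local,local,tools/federated/database/ocean_11_datasets.db\n"
        "url,url,https://www.timestored.com/data/sample/sakila.db\n"
    )
    for source in load_sources(sample):
        print(source["location_type"], source["path"])

The optional columns (``type``, ``submitter_name``, and so on) are passed through untouched, so the same function accepts the full example file shown under Data Sources.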