Command Line Interface API
DSI's Command Line Interface (CLI) gives users a simpler way to interact with DSI Readers, Writers, and Backends. While slightly more restrictive than the Python API, the CLI allows users to interact with DSI without any knowledge of Python.
Users can load several files into DSI, then query, find, and export the loaded data to other formats. Users can also write the loaded data to a permanent database file for post-analysis.
The CLI actions and example workflows are shown below.
CLI Setup and Actions
Once a user has successfully installed DSI, they can activate the CLI environment by entering dsi in their command line.
This automatically creates a hidden SQLite database that users can interact with.
To use DuckDB instead, users should activate the CLI with dsi -b duckdb in their command line.
In that case, all subsequent actions use a hidden DuckDB database.
To view all available CLI actions without launching the CLI, users can enter dsi help in their command line.
The actions available in the CLI environment are as follows (the help output in the CLI Example section additionally lists federate, viewers, and view):
- help
Displays a help menu for CLI actions and their inputs.
- display <table_name> [-n num_rows] [-e filename]
Displays data from a specified table, with optional arguments.
table_name is a mandatory input to display that table.
num_rows is optional and only displays the first N rows.
filename is optional and exports the table to a CSV or Parquet file.
- draw [-f filename]
Draws an ER diagram of all data loaded into DSI.
filename is optional; default is er_diagram.png.
- exit
Exits the CLI and closes all active DSI modules.
- find <condition>
Finds all rows of a table that match a condition in the format [column] [operator] [value]. Example: find 'age = 6'
Valid operators:
age > 4
age < 4
age >= 4
age <= 4
age = 4
age == 4
age ~ 4 --> column age contains the number 4
age ~~ 4 --> column age contains the number 4
age != 4
age (4, 8) --> all values in 'age' between 4 and 8 (inclusive)
- list
Lists the names of all tables and their dimensions.
- plot_table <table_name> [-f filename]
Plots numerical data from the specified table.
table_name is a mandatory input to plot that table.
filename is optional; default is <table_name>_plot.png.
- query <SQL_query> [-n num_rows] [-e filename]
Executes a specified query (in quotes) and prints the result with optional arguments.
SQL_query is mandatory and must match the syntax of the active backend (SQLite or DuckDB).
num_rows is optional and prints only the first N rows of the result.
filename is optional and exports the result to a CSV or Parquet file.
- read <filename> [-t table_name]
Reads specified data into DSI.
filename is a mandatory input of data to ingest. Accepted formats:
CSV, JSON, TOML, YAML, Parquet, SQLite databases, DuckDB databases
URL pointing to data stored in one of the above formats
table_name is optional. If reading a CSV, JSON, or Parquet file, users can specify table_name.
- search <value>
Searches for an input value across all data loaded into DSI. Can be a number or text.
- summary [-t table_name]
Displays numerical statistics of all tables or a specified table.
table_name is optional and summarizes only that specified table.
- write <filename>
Writes the hidden DSI backend to a designated location. This permanent file will be of the same type as the hidden backend.
The CLI also supports basic Unix commands such as cd (change directory), ls (list files), and clear (clear the command line view).
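To make the find operators listed above concrete, the following is a minimal Python sketch of their semantics, applied to rows stored as plain dictionaries. It is not DSI's implementation: the matches function, the _to_number helper, and the sample rows are hypothetical, and "contains" is assumed here to mean a substring test on the value's text form.

```python
import operator
import re

# Hypothetical helper: NOT part of DSI. It only mirrors the operator
# semantics described in the list above.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
}


def _to_number(text):
    """Convert text to int or float when possible, else return it unchanged."""
    text = text.strip().strip("'\"")
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def matches(row, condition):
    """Return True if row satisfies a '[column] [operator] [value]' condition."""
    # Range form: column (low, high), inclusive on both ends.
    range_match = re.fullmatch(
        r"\s*(\w+)\s*\(\s*(.+?)\s*,\s*(.+?)\s*\)\s*", condition
    )
    if range_match:
        column, low, high = range_match.groups()
        return _to_number(low) <= row[column] <= _to_number(high)

    # Comparison form; longer operators come first so '>=' is not read as '>'.
    op_match = re.fullmatch(
        r"\s*(\w+)\s*(>=|<=|==|!=|~~|~|>|<|=)\s*(.+?)\s*", condition
    )
    if not op_match:
        raise ValueError(f"Unrecognized condition: {condition!r}")
    column, op, value = op_match.groups()
    if op in ("~", "~~"):
        # Assumption: 'contains' is modeled as a substring test.
        return str(_to_number(value)) in str(row[column])
    return COMPARISONS[op](row[column], _to_number(value))


rows = [{"age": 4}, {"age": 6}, {"age": 14}]
print([r["age"] for r in rows if matches(r, "age >= 6")])    # [6, 14]
print([r["age"] for r in rows if matches(r, "age ~ 4")])     # [4, 14]
print([r["age"] for r in rows if matches(r, "age (4, 8)")])  # [4, 6]
```

Under this reading, age ~ 4 matches both 4 and 14 because each contains the digit 4, while age (4, 8) includes the endpoints.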
CLI Example
The terminal output below displays various ways users can utilize DSI’s CLI for seamless data science analysis.
my_user@local-machine examples % dsi
_____ ___
/ / \ / /\ ___
/ / /\ \ / / /_ / /\
/ / / \ \ / / / /\ / / /
/__/ / \__\ | / / / / \ /__/ \
\ \ \ / / / /__/ / / /\ \ \__\/\ \__
\ \ \ / / \ \ \/ / / / \ \ \/\
\ \ \/ / \ \ / / / \__\ /
\ \ / \__\/ / / /__/ /
\__\/ /__/ / \__\/
\__\/ v1.2.1
Created a temporary Sqlite DSI backend
Enter "help" for usage hints.
dsi> help
display <table_name> [-n num_rows] [-e filename]  Displays a table's data. Optionally limit
                                                  displayed rows and export to CSV/Parquet
draw [-f filename]                                Draws an ER diagram of all tables in the
                                                  current DSI database
exit                                              Exits the DSI Command Line Interface (CLI)
federate <config file> [-w workspace_folder]      Collects data from sources defined in the
                                                  YAML config file, optionally saving it to
                                                  a workspace folder.
find <condition>                                  Finds all rows of a table that match a
                                                  column-level condition.
help                                              Shows this help message.
list                                              Lists all tables in the current DSI database
plot_table <table_name> [-f filename]             Plots numerical data from a table to an
                                                  optional file name argument
query <SQL_query> [-n num_rows] [-e filename]     Executes a SQL query (in quotes). Optionally
                                                  limit printed rows or export to CSV/Parquet
read <data_source> [-t table_name]                Reads a file or URL into the DSI database.
                                                  Optionally set table name.
search <value>                                    Searches for a string or number across DSI.
summary [-t table_name]                           Summary of the database or a specific table.
viewers                                           Prints the available viewers for the user.
view <available viewer>                           Creates an instance of the DSI viewer in
                                                  another application.
write <filename>                                  Writes data in DSI database to a permanent
                                                  location.
ls                                                Lists all files in the current or specified
                                                  directory.
cd <path>                                         Changes the working directory within the CLI
                                                  environment.
dsi> read wildfire/wildfire_google.yml -t google_data
Loaded wildfire/wildfire_google.yml into the table google_data
Database now has 1 table
dsi> read test/example.toml
Loaded test/example.toml into the table nodes
Database now has 2 tables
dsi> read test/results.toml
Loaded test/results.toml into the table people
Database now has 3 tables
dsi> read test/yosemite5.csv
Loaded test/yosemite5.csv into the table yosemite5
Database now has 4 tables
dsi> list
Table: google_data
- num of columns: 36
- num of rows: 1
Table: nodes
- num of columns: 2
- num of rows: 2
Table: people
- num of columns: 6
- num of rows: 1
Table: yosemite5
- num of columns: 9
- num of rows: 4
dsi> query "SELECT * FROM nodes" -e nodes.csv
Printing the result from input SQL query: SELECT * FROM nodes
name | resources_gpu
---------------------
node1 | 4
node2 | 2
Exported the query result to nodes.csv
dsi> display people -e people_output.csv
Table: people
avg_height_units | avg_height_value | median_speed_units | median_speed_value | std_gravity_units | std_gravity_value
---------------------------------------------------------------------------------------------------------------------
m | 5.5 | s | 6.95 | m/s/s | 9.83
Exported people to people_output.csv
dsi> draw -f dsi_er_diagram.png
Saved an ER Diagram at dsi_er_diagram.png
dsi> plot_table people -f people_plot.png
Saved a plot of the people table in people_plot.png
dsi> summary -t google_data
Table: google_data
column | type | unique | min | max | avg | std_dev
---------------------------------------------------------------------------------------
summary_dataset_name | VARCHAR | 1 | None | None | None | None
summary_summary | VARCHAR | 1 | None | None | None | None
summary_dataset_link | VARCHAR | 1 | None | None | None | None
summary_documentation_link | VARCHAR | 1 | None | None | None | None
authorship_datacard_author1 | VARCHAR | 1 | None | None | None | None
authorship_datacard_author2 | VARCHAR | 1 | None | None | None | None
authorship_datacard_author3 | VARCHAR | 1 | None | None | None | None
authorship_publishing_organization | VARCHAR | 1 | None | None | None | None
authorship_publishing_POC | VARCHAR | 1 | None | None | None | None
authorship_publishing_POC_affiliation | VARCHAR | 1 | None | None | None | None
authorship_publishing_POC_contact | VARCHAR | 1 | None | None | None | None
authorship_dataset_owner1 | VARCHAR | 1 | None | None | None | None
authorship_dataset_owner2 | VARCHAR | 1 | None | None | None | None
authorship_dataset_owner3 | VARCHAR | 1 | None | None | None | None
authorship_dataset_owners_affiliation | VARCHAR | 1 | None | None | None | None
authorship_dataset_owners_contact | VARCHAR | 1 | None | None | None | None
authorship_funding_institution | VARCHAR | 1 | None | None | None | None
authorship_funding_summary | VARCHAR | 1 | None | None | None | None
overview_data_subjects | VARCHAR | 1 | None | None | None | None
overview_data_sensitivity | VARCHAR | 1 | None | None | None | None
overview_version | FLOAT | 1 | 1.0 | 1.0 | 1.0 | 0
overview_maintenance_status | VARCHAR | 1 | None | None | None | None
overview_last_updated | VARCHAR | 1 | None | None | None | None
overview_release_date | VARCHAR | 1 | None | None | None | None
overview_motivation | VARCHAR | 1 | None | None | None | None
overview_dataset_uses | VARCHAR | 1 | None | None | None | None
overview_citation_guidelines | VARCHAR | 1 | None | None | None | None
overview_citation_bibtex | VARCHAR | 1 | None | None | None | None
provenance_collection_methods_used | VARCHAR | 1 | None | None | None | None
provenance_source | VARCHAR | 1 | None | None | None | None
provenance_platform | VARCHAR | 1 | None | None | None | None
provenance_dates_of_collection | VARCHAR | 1 | None | None | None | None
provenance_type_of_data | VARCHAR | 1 | None | None | None | None
provenance_data_selection | VARCHAR | 1 | None | None | None | None
provenance_data_inclusion | VARCHAR | 1 | None | None | None | None
provenance_data_exclusion | VARCHAR | 1 | None | None | None | None
dsi> viewers
Available viewers are: dashboard, ml
dsi> view ml
View the ML emulator at http://localhost:8501
To exit, press [Ctrl + C] here
^C
Closing ML Emulator.
dsi> write dsi_output.db
Successfully wrote all data to dsi_output.db
dsi> exit
Exiting...
my_user@local-machine examples %
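Because the session above used the default SQLite backend, the file produced by write should be a SQLite database that can be opened outside the CLI for post-analysis. The sketch below uses Python's standard sqlite3 module. So that it runs on its own, it recreates the nodes table from the query output in an in-memory database instead of opening dsi_output.db. The column types, and the assumption that the written file contains no additional tables, are guesses rather than documented DSI behavior.

```python
import sqlite3

# Stand-in for the written file: an in-memory database keeps the sketch
# self-contained. To inspect a real file, pass its path instead,
# e.g. sqlite3.connect("dsi_output.db").
conn = sqlite3.connect(":memory:")

# Assumed schema, reconstructed from the query output in the session above.
conn.execute("CREATE TABLE nodes (name TEXT, resources_gpu INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)", [("node1", 4), ("node2", 2)]
)

# List the tables, similar in spirit to the CLI's list action.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['nodes']

# Run the query from the session, with an explicit ordering.
result = conn.execute("SELECT * FROM nodes ORDER BY name").fetchall()
print(result)  # [('node1', 4), ('node2', 2)]

conn.close()
```

If the CLI was started with dsi -b duckdb, the written file would be a DuckDB database, and sqlite3 would not be the right tool for opening it.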