Command Line Interface API

Users can interact with DSI Readers, Writers, and Backends even more easily using DSI’s own Command Line Interface (CLI). While slightly more restrictive, the CLI allows users without any knowledge of Python to use DSI for their own needs. Users can load several files into one database, query, find, and export data to other formats, and save the database for further analysis.

The CLI actions and example workflows are shown below.

CLI Setup and Actions

Once a user is within a DSI virtual environment, they should enter dsi in their command line to activate the CLI environment. By default, this creates a hidden DSI Sqlite database that users can interact with. However, if a user wants to use a hidden DuckDB database instead, they should enter dsi -b duckdb in their command line. From then on, all actions in that session operate on this DuckDB database.

The complete list of actions within the CLI environment is:
  • help : displays a help menu for CLI actions and their inputs

  • display <table name> [-n num rows] [-e filename] : displays data from a specified table, with an optional number of rows and an optional export filename

    • table_name is a mandatory input that names the table to display

    • num_rows is an optional input that limits the display to the first N rows

    • filename is an optional input that exports this display to either a CSV or Parquet file

  • exit : exits the CLI and closes all active DSI Readers/Writers/Backends

  • draw [-f filename] : draws an ER diagram of all data that has been loaded into DSI

    • filename is an optional input naming the file to which the ER diagram is saved. The default is “er_diagram.png”

  • find <var> : searches all data loaded into DSI for the input variable, which can match a table, a column, or a datapoint

  • list : lists the names of all tables of data and their dimensions

  • load <filename> [-t table name] : loads a file or URL into DSI. table_name is an optional input that applies only if the input data contains a single table. Accepted data files:

    • CSV, TOML, YAML, Parquet, Sqlite databases, DuckDB databases

  • plot_table <table_name> [-f filename] : plots a specified table’s numerical data and saves the plot to an optional filename. The default is table_name + “_plot.png”

  • query <SQL query> [-n num rows] [-e filename] : runs the specified query, with an optional number of rows to display and an optional export filename

    • SQL query is a mandatory input, enclosed in quotes, that must be compatible with the type of the hidden database: Sqlite or DuckDB

    • num_rows is an optional input that limits the display to the first N rows

    • filename is an optional input that exports this display to either a CSV or Parquet file

  • save <filename> : saves the hidden DSI database under a user-specified filename, whose extension must match the database type. For example, a Sqlite database cannot be saved with a .duckdb extension

  • summary [-t table] [-n num_rows] : displays statistics of each table in the database, with optional input to limit to just one table

    • table is an optional input that specifies one table’s statistics to display

    • num_rows is an optional input that, when specified, also prints N rows of that table’s data

Beyond these actions, the CLI also supports basic Unix commands such as clear, ls, and cd.
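
A database written by the save action can also be inspected outside the CLI. The following is a minimal Python sketch, assuming the saved file is an ordinary Sqlite database file that Python's standard sqlite3 module can open. The helper name list_tables is hypothetical and not part of DSI. It reports the number of columns and rows of each table, similar to what the list action prints.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: (num_columns, num_rows)} for a Sqlite file.

    Hypothetical helper for illustration; it is not a DSI function.
    """
    summary = {}
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master holds one row per schema object in the file.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for name in names:
            # Quote the identifier so unusual table names remain valid SQL.
            quoted = '"' + name.replace('"', '""') + '"'
            num_columns = len(
                conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            )
            num_rows = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
            summary[name] = (num_columns, num_rows)
    finally:
        conn.close()
    return summary
```

Calling list_tables with the path passed to save returns a dictionary that maps each table name to a (columns, rows) pair. A database saved from a DuckDB session would need a DuckDB client instead of sqlite3.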

CLI Examples

The terminal output below displays various ways users can utilize DSI’s CLI to simplify data science workflows.

(mydsi) viyer@pn2405703 data % dsi
DSI version 1.1.1

Enter "help" for usage hints.

Sqlite backend back-write loaded successfully.
dsi> help
display <table name> [-n num rows] [-e filename]  Displays the contents of that table, num rows to display is 
                                                      optional, and it can be exported to a csv/parquet file
exit                                              Exit the DSI Command Line Interface (CLI)
draw [-f filename]                                Draws an ER Diagram of all tables in the current DSI database
find <var>                                        Search for a variable in the dataset
help                                              Shows this help
list                                              Lists the tables in the current DSI databse
load <filename> [-t table name]                   Loads this filename/url to a DSI database. optional
                                                      table name argument if input file is only one table
plot_table <table_name> [-f filename]             Plots a table's numerical data to an optional file name argument
query <SQL query> [-n num rows] [-e filename]     Runs a query (in quotes), displays an optionl num rows,
                                                      and exports output to a csv/parquet file
save <filename>                                   Save the local database as <filename>, which will be the same type.
summary [-t table] [-n num_rows]                  Get a summary of the database or just a table and optionally 
                                                     specify number of data rows to display
ls                                                Lists all files in current directory or a specified path
cd <path>                                         Changes the working directory within the CLI environment


dsi> load student_test1.yml
YAML1 plugin reader loaded successfully.
Database now has 4 tables
student_test1.yml successfully loaded.


dsi> load student_test2.yml
YAML1 plugin reader loaded successfully.
Database now has 4 tables
student_test2.yml successfully loaded.


dsi> load results.toml
TOML1 plugin reader loaded successfully.
Database now has 5 tables
results.toml successfully loaded.


dsi> load yosemite5.csv
Csv plugin reader loaded successfully.
Database now has 6 tables
yosemite5.csv successfully loaded.


dsi> list

Table: math
  - num of columns: 7
  - num of rows: 2

Table: address
  - num of columns: 9
  - num of rows: 2

Table: physics
  - num of columns: 7
  - num of rows: 2

Table: dsi_units
  - num of columns: 3
  - num of rows: 8

Table: people
  - num of columns: 3
  - num of rows: 1

Table: yosemite5
  - num of columns: 9
  - num of rows: 4


dsi> query "SELECT * FROM physics" -e physics_output.csv
  specification     n        o    p        q   r       s
0          !amy   9.8  gravity   23  home 23   1 -0.0012
1         !amy1  91.8  gravity  233  home 23  12 -0.0122
Csv_Writer plugin writer loaded successfully.


dsi> display math -e math_output.csv
table_name: math

specification | a | b | c     | d | e    | f     
-------------------------------------------------
!jack         | 1 | 2 | 45.98 | 2 | 34.8 | 0.0089
!jack1        | 2 | 3 | 45.98 | 3 | 44.8 | 0.0099

Csv_Writer plugin writer loaded successfully.


dsi> draw -f dsi_er_diagram.png
ER_Diagram plugin writer loaded successfully.


dsi> plot_table physics -f physics_plot.png
Table_Plot plugin writer loaded successfully.


dsi> summary -t physics -n 1
Table: physics

column        | type    | min     | max     | avg     | std_dev              
-----------------------------------------------------------------------------
specification | VARCHAR | None    | None    | None    | None                 
n             | FLOAT   | 9.8     | 91.8    | 50.8    | 41.0                 
o             | VARCHAR | None    | None    | None    | None                 
p             | INTEGER | 23      | 233     | 128.0   | 105.0                
q             | VARCHAR | None    | None    | None    | None                 
r             | INTEGER | 1       | 12      | 6.5     | 5.5                  
s             | FLOAT   | -0.0122 | -0.0012 | -0.0067 | 0.0055000000000000005

specification | n    | o       | p   | q       | r  | s      
-------------------------------------------------------------
!amy          | 9.8  | gravity | 23  | home 23 | 1  | -0.0012
  ... showing 1 of 2 rows


dsi> save dsi_output.db

dsi> exit
Exiting...
Closing the abstraction layer, and all active plugins/backends
(mydsi) viyer@pn2405703 data %
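
The std_dev column in the summary output above appears to match the population standard deviation rather than the sample standard deviation. This is inferred from the printed numbers only, not from DSI's source code. The following minimal Python sketch recomputes the statistics for column n of the physics table with the standard statistics module. The helper column_stats is hypothetical and not a DSI function.

```python
import statistics


def column_stats(values):
    """Return min, max, mean, and population standard deviation.

    Hypothetical helper for illustration; it is not a DSI function.
    """
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        # pstdev divides by N (population), whereas stdev divides by N - 1.
        "std_dev": statistics.pstdev(values),
    }


# Values of column "n" in the physics table from the session above.
print(column_stats([9.8, 91.8]))
```

For the two values 9.8 and 91.8, the population formula gives about 41.0, as in the summary table, whereas the sample formula (statistics.stdev) would give about 57.98.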