Making a Reader for Your Application

DSI readers are the primary way to transform outside data to metadata that DSI can ingest. Readers are Python classes that must include a few methods, namely __init__, pack_header, and add_rows.

Initializer: `init(self) -> None:`

__init__ is where you can include all of your initialization logic, just make sure to initialize your superclass. Note: __init__ can also take whatever parameters needed for a given application.

Example __init__:

def __init__(self) -> None:
  super().__init__()  # see "plugins" to determine which superclass your reader should extend

Pack Header: `pack_header(self) -> None`

pack_header is responsible for setting a schema, registering which columns will be populated by the reader. The set_schema(self, table_data: list, validation_model=None) -> None method is available to subclasses of StructuredMetadata, which allows one to simply give a list of column names to register. validation_model is an pydantic model that can help you enforce types, but is completely optional.

Example pack_header:

def pack_header(self) -> None:
  column_names = ["foo", "bar", "baz"]
  self.set_schema(column_names)

Add Rows: `add_rows(self) -> None`

add_rows is responsible for appending to the internal metadata buffer. Whatever data is being ingested, it’s done here. The add_to_output(self, row: list) -> None method is available to subclasses of StructuredMetadata, which takes a list of data that matches the schema and appends it to the internal metadata buffer.

Note: pack_header must be called before metadata is appended in add_rows. Another helper method of StructuredMetadata is schema_is_set, which provides a way to tell if this restriction is met.

Example add_rows:

def add_rows(self) -> None:
  if not self.schema_is_set():
    self.pack_header()

  # data parsing can go here (or abstracted to other functions)
  my_data = [1, 2, 3]

  self.add_to_output(my_data)

Newer Add Rows: `add_rows(self) -> None`

If you are confident that the the data you read in add_rows is in the form of an OrderedDict (the data structure used to store all ingested data), you can bypass the use of pack_header and add_to_output with an alternate set_schema function.

This function, set_schema_2(self, collection, validation_model=None) -> None, directly assigns the data you read in add_rows to the internal DSI abstraction layer, provided that the data you pass as the collection variable is an OrderedDict. This method allows you to quickly append data to the abstraction wholesale, rather than row-by-row.

Example alternate add_rows:

def add_rows(self) -> None:

  # data is stored as an OrderedDict so can use set_schema2
  my_data = OrderedDict()
  my_data["jack"] = 10
  my_data["joey"] = 20
  my_data["amy"] = 30

  self.set_schema_2(my_data)

Implemented Examples

If you want to see some full reader examples in-code, some can be found in dsi/plugins/file_reader.py. Csv is an especially simple example to go off of.

Loading Your Reader

There are two ways to load your reader, internally and externally.

Internally: If you want your reader loadable internally with the rest of the provided implementations (in dsi/plugins), it must be registered in the class variables of Terminal in dsi/core.py. If this is done correctly, your reader will be loadable by the load_module method of Terminal.

Externally: If your reader is not along side the other provided implementations, possibly somewhere else on the filesystem, your reader will be loaded externally. This is done by using the add_external_python_module method of Terminal. If you load an external Python module this way (ex. term.add_external_python_module('plugin','plugin_class_name','/path/to/python_file.py')), your reader will then be loadable by the load_module method of Terminal.

Contributing Your Reader

If your reader is helpful and acceptable for public use, you should consider making a pull request (PR) into DSI.

Please note that any accepted PRs into DSI should satisfy the following:

Passes all tests in dsi/plugins/tests
Has no pylama errors/warnings (see dsi/.githooks)

Making a Reader for Your Application

Initializer: __init__(self) -> None:

Pack Header: pack_header(self) -> None

Add Rows: add_rows(self) -> None

Newer Add Rows: add_rows(self) -> None