Databases
=============

``hippynn`` :mod:`~hippynn.databases` wrap PyTorch datasets.

They aim to provide a simple interface for going from on-disk data
to training, as well as the capability to re-open the database (restart)
without pickling all of the data.

Arrays can have arbitrary names, which should be assigned to the `db_name`
of the node to which they correspond.

The basic :class:`~hippynn.databases.Database` object takes a set of numpy arrays
in the following formats::

    system_variables.shape == (n_systems, *variable_shape)
    atom_variables.shape == (n_systems, n_atoms_max, *variable_shape)
    bond_variables.shape == (n_systems, n_atoms_max, n_atoms_max, *variable_shape)

For example, for charges, your array should have shape ``(n_systems, n_atoms_max, 1)``.
For positions, your array should have shape ``(n_systems, n_atoms_max, 3)``.
For species, your array should have shape ``(n_systems, n_atoms_max)``;
this is the one exception to the shapes given above.

Note that input of bond variables for periodic systems can be ill-defined
if there are multiple bonds between the same pair of atoms. This is not yet supported.

A note on *cell* variables. The shape of a cell variable should be specified as ``(n_systems, 3, 3)``,
following the system-variable format described above. There are two common conventions for the cell matrix
itself; we use the convention that the basis index comes first and the cartesian index comes second.
That is, similar to the ``ase`` package, the element ``cell[sys,i,j]`` gives the ``j`` cartesian coordinate
of cell vector ``i`` in system ``sys``. If you see very large errors when fitting
systems with periodic boundary conditions, try the transposed version of your cell data, or
compute the radial distribution function (RDF) to check that the interatomic distances are sensible.

Database Formats and notes
---------------------------

Numpy arrays on disk
........................
See :class:`hippynn.databases.NPZDatabase` (if the arrays are stored in a single `.npz` dictionary)
or :class:`hippynn.databases.DirectoryDatabase` (if each array is in its own file).

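
The shape conventions above can be illustrated with a minimal, numpy-only sketch that pads two
toy systems to a common ``n_atoms_max`` and writes them to a single ``.npz`` file. The array names
(``Z``, ``R``, ``T``), the numerical values, and the use of zeros to mark padded atoms are assumptions
made for illustration; check the API documentation for how padding should be marked in your setup.

.. code-block:: python

    import os
    import tempfile

    import numpy as np

    # Two hypothetical systems of different size: a water-like molecule and H2.
    species_list = [np.array([8, 1, 1]), np.array([1, 1])]
    positions_list = [
        np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]),
        np.array([[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]]),
    ]
    # Made-up per-system values; a system variable has shape (n_systems, *variable_shape).
    energies = np.array([[-76.4], [-1.17]])

    n_systems = len(species_list)
    n_atoms_max = max(len(z) for z in species_list)

    # Allocate padded arrays. Zero padding is an assumption of this sketch.
    species = np.zeros((n_systems, n_atoms_max), dtype=np.int64)  # (n_systems, n_atoms_max)
    positions = np.zeros((n_systems, n_atoms_max, 3))             # (n_systems, n_atoms_max, 3)
    for i, (z, r) in enumerate(zip(species_list, positions_list)):
        species[i, : len(z)] = z
        positions[i, : len(z)] = r

    # The dictionary keys are the names that would be used as db_names.
    arrays = {"Z": species, "R": positions, "T": energies}

    # Write all arrays into one .npz file in a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "toy_dataset.npz")
    np.savez(path, **arrays)

    # Read the file back and collect the stored shapes.
    with np.load(path) as loaded:
        shapes = {name: loaded[name].shape for name in loaded.files}

The same ``arrays`` dictionary is also the kind of in-memory mapping from names to numpy arrays
that the base database class works with.
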
Numpy arrays in memory
........................
Use the base :class:`hippynn.databases.Database` class directly to initialize
a database from a dictionary mapping db_names to numpy arrays.

pyanitools H5 files
........................
See :class:`hippynn.databases.PyAniFileDB` and :class:`hippynn.databases.PyAniDirectoryDB`.

This format requires ``h5py`` and ``ase`` to be installed.

Snap JSON Format
........................
See :class:`hippynn.databases.SNAPDirectoryDatabase`. This format requires ``ase`` to be installed.

For more information on this format, see the FitSNAP_ software.

.. _FitSNAP: https://fitsnap.github.io

ASE Database
........................
If your training data is stored as ASE files of any type (.json, .db, .xyz, .traj, etc.),
it can be loaded directly as a Database for hippynn.

The :class:`~hippynn.databases.AseDatabase` class requires ASE to be installed.

See ~/examples/ase_db_example.py for a basic example that uses the class.
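
Since ASE is described earlier on this page as using a similar cell layout, with ``cell[sys,i,j]`` being
cartesian component ``j`` of cell vector ``i``, a quick numpy-only check of which convention a cell array
follows can be sketched as below. The cell vectors and variable names are hypothetical, and the sketch
does not call any ``hippynn`` or ``ase`` functions.

.. code-block:: python

    import numpy as np

    # Hypothetical cell vectors for a single system.
    a = np.array([4.0, 0.0, 0.0])
    b = np.array([1.0, 3.0, 0.0])
    c = np.array([0.0, 0.5, 5.0])

    # Basis index first, cartesian index second: row i holds cell vector i.
    cell = np.stack([a, b, c])[np.newaxis]  # shape (n_systems, 3, 3) with n_systems = 1

    # Data stored in the other convention (cell vectors as columns) is the
    # transpose over the last two axes, and is converted back the same way.
    cell_columns = np.transpose(cell, (0, 2, 1))
    cell_fixed = np.swapaxes(cell_columns, 1, 2)

    # The volume is identical in both conventions, so it cannot reveal a mix-up.
    volumes = np.abs(np.linalg.det(cell))
    volumes_columns = np.abs(np.linalg.det(cell_columns))

    # Fractional-to-cartesian conversion in the row convention:
    # cart[s, a, k] = sum_i frac[s, a, i] * cell[s, i, k]
    frac = np.array([[[0.0, 1.0, 0.0]]])  # one atom placed at the tip of vector b
    cart = np.einsum("sai,sik->sak", frac, cell)
    cart_wrong = np.einsum("sai,sik->sak", frac, cell_columns)

With the correct convention ``cart`` recovers vector ``b``, while the transposed cell gives a different
position. That kind of discrepancy in interatomic distances is what shows up in the RDF.
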