Databases
hippynn databases wrap PyTorch datasets.
They aim to provide a simple interface for going from on-disk data to training,
as well as the ability to re-open the database (restart) without pickling all of the data.
Arrays can have arbitrary names. Each array's name should be assigned as the db_name of the node it corresponds to.
The basic Database object takes a set of numpy arrays in the following formats:
system_variables.shape == (n_systems, *variable_shape)
atom_variables.shape == (n_systems, n_atoms_max, *variable_shape)
bond_variables.shape == (n_systems, n_atoms_max, n_atoms_max, *variable_shape)
For example, an array of charges should have shape (n_systems, n_atoms_max, 1), and an array of positions should have shape (n_systems, n_atoms_max, 3). A species array should have shape (n_systems, n_atoms_max); this is the one exception to the shapes given above.
Note that bond variables for periodic systems can be ill-defined if there are multiple bonds between the same pair of atoms. This case is not yet supported.
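As a rough illustration of the shapes above, the following sketch allocates one array of each kind with numpy. The sizes, the variable names, and the use of 0 as the species value for padded atom slots are assumptions made for this example only.

```python
import numpy as np

# Hypothetical sizes chosen for illustration.
n_systems, n_atoms_max = 4, 5

# Species: the exception, with no trailing variable axis.
# Padded atom slots are filled with 0 here (an assumption of this sketch).
species = np.zeros((n_systems, n_atoms_max), dtype=np.int64)
species[:, :3] = [8, 1, 1]  # three real atoms per system, two padded slots

# Atom variables: (n_systems, n_atoms_max, *variable_shape)
positions = np.zeros((n_systems, n_atoms_max, 3))
charges = np.zeros((n_systems, n_atoms_max, 1))

# System variables: (n_systems, *variable_shape)
energy = np.zeros((n_systems, 1))
cells = np.tile(10.0 * np.eye(3), (n_systems, 1, 1))

# Bond variables: (n_systems, n_atoms_max, n_atoms_max, *variable_shape)
pair_quantity = np.zeros((n_systems, n_atoms_max, n_atoms_max, 1))

for name, arr in [("species", species), ("positions", positions),
                  ("charges", charges), ("energy", energy),
                  ("cells", cells), ("pair_quantity", pair_quantity)]:
    print(name, arr.shape)
```

Every array shares the leading n_systems axis, so a single index along that axis selects all data for one system.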
A note on cell variables: the shape of a cell variable should be (n_systems, 3, 3), consistent with the system-variable shape given above.
There are two common conventions for the cell matrix itself. We use the convention that the basis index
comes first and the Cartesian index comes second. That is, as in the ase package,
the element cell[sys,i,j] gives the j-th Cartesian coordinate of cell vector i in system sys.
If you see very large errors when fitting with periodic boundary conditions, try the transposed version
of your cell data, or compute the RDF (radial distribution function) as a check.
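To make the index convention concrete, here is a minimal numpy sketch that converts fractional coordinates to Cartesian coordinates under the row convention described above, and shows what happens if the same matrix is read with the indices swapped. The cell values and the fractional coordinate are made up for illustration.

```python
import numpy as np

# Hypothetical triclinic cell: row i is cell vector i,
# column j is its j-th Cartesian component.
cell = np.array([
    [4.0, 0.0, 0.0],
    [1.0, 3.0, 0.0],
    [0.0, 0.5, 5.0],
])

# Fractional coordinates along cell vectors 0, 1, 2.
frac = np.array([0.5, 0.5, 0.0])

# Row convention: r[j] = sum_i frac[i] * cell[i, j]
r_rows = frac @ cell

# The same matrix misread with the indices swapped (transposed cell).
r_transposed = frac @ cell.T

print(r_rows)        # position under the convention used here
print(r_transposed)  # a different point: the two conventions disagree
```

Note that the cell volume is the same for a matrix and its transpose, so a volume check alone will not reveal a transposed cell; comparing positions or distances, as above, will.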
Database Formats and notes
Numpy arrays on disk
See hippynn.databases.NPZDatabase (if arrays are stored in a .npz dictionary) or hippynn.databases.DirectoryDatabase (if each array is in its own file).
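The two on-disk layouts can be produced with plain numpy, as in the sketch below. The array names, the temporary directory, and in particular the "data-" file prefix for the one-file-per-array layout are assumptions of this example; check the documentation of the two classes for the naming they actually expect.

```python
import os
import tempfile

import numpy as np

# Hypothetical arrays keyed by the names to be used as db_names.
arrays = {
    "Z": np.tile(np.array([8, 1, 1], dtype=np.int64), (10, 1)),
    "R": np.zeros((10, 3, 3)),
    "T": np.zeros((10, 1)),
}

workdir = tempfile.mkdtemp()

# Layout 1: all arrays in a single .npz dictionary.
npz_path = os.path.join(workdir, "dataset.npz")
np.savez(npz_path, **arrays)

# Layout 2: one .npy file per array, sharing a common prefix
# (the prefix scheme is an assumption of this sketch).
prefix = "data-"
for db_name, arr in arrays.items():
    np.save(os.path.join(workdir, f"{prefix}{db_name}.npy"), arr)

# Read the .npz back to confirm the stored names.
with np.load(npz_path) as loaded:
    npz_keys = sorted(loaded.files)

print(npz_keys)
print(sorted(os.listdir(workdir)))
```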
Numpy arrays in memory
Use the base hippynn.databases.Database class directly to initialize a database from a dictionary mapping db_names to numpy arrays.
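A minimal sketch of the in-memory route is given below. The array names "Z", "R", and "T" are hypothetical, and the constructor arguments shown (arr_dict, inputs, targets, seed) are an assumption about the Database signature that should be checked against the API documentation for your installed version. The import is guarded so the array preparation still runs without hippynn.

```python
import numpy as np

n_systems, n_atoms_max = 10, 3
rng = np.random.default_rng(0)

# Dictionary mapping db_names (hypothetical here) to numpy arrays.
arr_dict = {
    "Z": np.tile(np.array([8, 1, 1], dtype=np.int64), (n_systems, 1)),  # species
    "R": rng.normal(size=(n_systems, n_atoms_max, 3)),                  # positions
    "T": rng.normal(size=(n_systems, 1)),                               # a system-level target
}

try:
    from hippynn.databases import Database
except ImportError:
    Database = None  # hippynn is not installed; skip construction

database = None
if Database is not None:
    # Assumed signature: inputs and targets are given as lists of db_names.
    database = Database(
        arr_dict=arr_dict,
        inputs=["Z", "R"],
        targets=["T"],
        seed=0,
    )
```

Splitting into training, validation, and test sets is not shown here.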
pyanitools H5 files
See hippynn.databases.PyAniFileDB and hippynn.databases.PyAniDirectoryDB.
This format requires h5py and ase to be installed.
SNAP JSON Format
See hippynn.databases.SNAPDirectoryDatabase. This format requires ase to be installed.
For more information on this format, see the FitSNAP software.
ASE Database
If your training data is stored as ASE files of any type (.json, .db, .xyz, .traj, etc.), it can be loaded directly as a Database for hippynn.
The AseDatabase class requires ASE to be installed.
See ~/examples/ase_db_example.py for a basic example utilizing the class.