.. pyDRESCALk documentation master file, created by
   sphinx-quickstart on Fri Dec  3 21:44:28 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to pyDRESCALk's documentation!
======================================
pyDRESCALk is a software package for applying non-negative RESCAL decomposition in a distributed fashion to large datasets. It can be utilized for decomposing relational datasets. It can minimize the difference between reconstructed data and the original data through Frobenius norm.  Additionally, the Custom Clustering algorithm allows for automated determination for the number of Latent features.

Features
========================

* Ability to decompose relational datasets.
* Utilization of MPI4py for distributed operation.
* Distributed random initializations.
* Distributed Custom Clustering algorithm for estimating automated latent feature number (k) determination.
* Objective of minimization of Frobenius norm.
* Support for distributed CPUs/GPUs.
* Support for Dense/Sparse data.
* Demonstrated scaling performance upto 10TB of dense and 9Exabytes of Sparse data.


Scalability
========================
pyDRESCALk Scales from laptops to clusters. The library is convenient on a laptop. It can be installed easily  with conda or pip and extends the matrix decomposition from a single core to numerous cores across nodes.
pyDRESCALk is efficient and has been tested on powerful servers across LANL and Oakridge scaling beyond 1000+ nodes.
This library facilitates the transition between single-machine to large scale cluster so as to enable users to both start simple and scale up when necessary.


Installation
========================

.. code-block:: console

   git clone https://github.com/lanl/pyDRESCALk.git
   cd pyDRESCALk
   conda create --name pyDRESCALk python=3.7.1 openmpi mpi4py
   source activate pyDRESCALk
   python setup.py install


Usage Example
========================
We provide a sample dataset that can be used for estimation of k:

.. code-block:: python

   '''Imports block'''

   import sys
   import pyDRESCALk.config as config
   config.init(0)
   from pyDRESCALk.pyDRESCALk import *
   from pyDRESCALk.data_io import *
   from pyDRESCALk.dist_comm import *
   from scipy.io import loadmat
   from mpi4py import MPI
   comm = MPI.COMM_WORLD
   args = parse()
   comm = MPI.COMM_WORLD
   p_r, p_c = 2, 2
   comms = MPI_comm(comm, p_r, p_c)
   comm1 = comms.comm
   rank = comm.rank
   size = comm.size
   args = parse()
   args.size, args.rank, args.comm, args.p_r, args.p_c = size, rank, comms, p_r, p_c
   args.row_comm, args.col_comm, args.comm1 = comms.cart_1d_row(), comms.cart_1d_column(), comm1
   rank = comms.rank
   args.fpath = '../data/'
   args.fname = 'dnations'
   args.ftype = 'mat'
   args.start_k = 2
   args.end_k = 5
   args.itr = 200
   args.init = 'rand'
   args.noise_var = 0.005
   args.verbose = True
   args.norm = 'fro'
   args.method = 'mu'
   args.np = np
   args.precision = np.float32
   args.key = 'R'
   A_ij = np.moveaxis(data_read(args).read().astype(args.precision),-1,0) #Always make data of dimension mxnxn.
   print('Data dimension for rank=',rank,'=',A_ij.shape)
   args.results_path = '../Results/'
   pyDRESCALk(A_ij, factors=None, params=args).fit()

Indices and tables
========================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   modules


Indices and tables
========================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`