Welcome to pyDNMFk's documentation!
===================================

pyDNMFk is a software package for applying non-negative matrix factorization in a distributed fashion to large datasets. It can minimize the difference between the reconstructed data and the original data under several measures (Frobenius norm, KL divergence). Additionally, the Custom Clustering algorithm automates the determination of the number of latent features.
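
To make the two objectives concrete, the following single-process NumPy sketch evaluates the squared Frobenius error and the generalized KL divergence between a matrix ``A`` and a reconstruction ``W @ H``. The function names and the ``eps`` smoothing term are illustrative assumptions and are not part of the pyDNMFk API.

.. code-block:: python

   import numpy as np

   def frobenius_error(A, W, H):
       """Squared Frobenius norm ||A - WH||_F^2 (illustrative helper)."""
       return float(np.linalg.norm(A - W @ H, "fro") ** 2)

   def kl_divergence(A, W, H, eps=1e-12):
       """Generalized KL divergence D(A || WH) for non-negative matrices.

       eps is an assumed smoothing constant that avoids log(0) and division by zero.
       """
       WH = W @ H + eps
       return float(np.sum(A * np.log((A + eps) / WH) - A + WH))

   rng = np.random.default_rng(0)
   W = rng.random((6, 2))
   H = rng.random((2, 5))
   A = W @ H  # exactly rank 2, so both objectives are (numerically) zero

   print(frobenius_error(A, W, H), kl_divergence(A, W, H))

Both values grow as the reconstruction moves away from ``A``, which is what the factorization tries to prevent.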

Features
========================

* Utilization of MPI4py for distributed operation.
* Distributed NNSVD and SVD initializations.
* Distributed Custom Clustering algorithm for automated estimation of the number of latent features (k).
* Minimization of either the KL divergence or the Frobenius norm as the objective.
* Optimization with multiplicative updates, BCD, and HALS.
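
As an illustration of the multiplicative-update option listed above, here is a minimal single-process NumPy sketch of the classic multiplicative updates for the Frobenius objective. It is not the distributed implementation used by pyDNMFk; the function name ``nmf_mu``, the random initialization, and the ``eps`` term are assumptions made for this example.

.. code-block:: python

   import numpy as np

   def nmf_mu(A, k, n_iter=200, eps=1e-9, seed=0):
       """Factor A (m x n, non-negative) into W (m x k) and H (k x n)."""
       rng = np.random.default_rng(seed)
       m, n = A.shape
       W = rng.random((m, k))
       H = rng.random((k, n))
       for _ in range(n_iter):
           # Element-wise updates keep W and H non-negative.
           H *= (W.T @ A) / (W.T @ W @ H + eps)
           W *= (A @ H.T) / (W @ H @ H.T + eps)
       return W, H

   rng = np.random.default_rng(1)
   A = rng.random((20, 3)) @ rng.random((3, 15))  # synthetic rank-3 data
   W, H = nmf_mu(A, k=3)
   rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
   print(f"relative reconstruction error: {rel_err:.4f}")

Because each update multiplies by a non-negative ratio, the factors stay non-negative without any explicit projection step.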


Scalability
========================
pyDNMFk scales from laptops to clusters. The library is convenient on a laptop: it can be installed easily with conda or pip, and it extends matrix decomposition from a single core to many cores across nodes.
pyDNMFk is efficient and has been tested on powerful servers at LANL and Oak Ridge, scaling beyond 1000 nodes.
The library eases the transition from a single machine to a large-scale cluster, so users can start simple and scale up when necessary.


Installation
========================

.. code-block:: console

   git clone https://github.com/lanl/pyDNMFk.git
   cd pyDNMFk
   conda create --name pyDNMFk python=3.7.1 openmpi mpi4py
   source activate pyDNMFk
   python setup.py install


Usage Example
========================
We provide a sample dataset that can be used for estimation of k:

.. code-block:: python

   '''Imports block'''

   import sys
   import numpy as np  # explicit import; np.float32 is used below
   import pyDNMFk.config as config
   config.init(0)
   from pyDNMFk.pyDNMFk import *
   from pyDNMFk.data_io import *
   from pyDNMFk.dist_comm import *
   from scipy.io import loadmat
   from mpi4py import MPI
   comm = MPI.COMM_WORLD
   args = parse()


   '''parameters initialization block'''


   # Data Read here
   args.fpath = 'data/'
   args.fname = 'wtsi'
   args.ftype = 'mat'
   args.precision = np.float32

   #Distributed Comm config block
   p_r, p_c = 4, 1

   #NMF config block
   args.norm = 'kl'
   args.method = 'mu'
   args.init = 'nnsvd'
   args.itr = 5000
   args.verbose = True

   #Cluster config block
   args.start_k = 2
   args.end_k = 5
   args.sill_thr = 0.9

   #Data Write
   args.results_path = 'results/'


   '''Parameters prep block'''


   comms = MPI_comm(comm, p_r, p_c)
   comm1 = comms.comm
   rank = comm.rank
   size = comm.size
   args.size, args.rank, args.comm, args.p_r, args.p_c = size, rank, comms, p_r, p_c
   args.row_comm, args.col_comm, args.comm1 = comms.cart_1d_row(), comms.cart_1d_column(), comm1
   A_ij = data_read(args).read().astype(args.precision)

   nopt = PyNMFk(A_ij, factors=None, params=args).fit()
   print('Estimated k with NMFk is ',nopt)
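
The example above sets ``p_r, p_c = 4, 1``, so it presumably needs to be launched with ``p_r * p_c = 4`` MPI processes (for example through ``mpirun -n 4``). The sketch below illustrates, without MPI, how a matrix could be split into blocks over such a processor grid. The near-equal block sizes and the row-major rank order are assumptions for illustration; the layout pyDNMFk actually uses may differ, and ``block_bounds`` and ``local_block`` are hypothetical helpers.

.. code-block:: python

   import numpy as np

   def block_bounds(n, parts, idx):
       """Start/stop of the idx-th of `parts` near-equal chunks of range(n)."""
       base, extra = divmod(n, parts)
       start = idx * base + min(idx, extra)
       stop = start + base + (1 if idx < extra else 0)
       return start, stop

   def local_block(A, p_r, p_c, rank):
       """Block held by `rank` on a p_r x p_c grid (row-major order, assumed)."""
       i, j = divmod(rank, p_c)
       r0, r1 = block_bounds(A.shape[0], p_r, i)
       c0, c1 = block_bounds(A.shape[1], p_c, j)
       return A[r0:r1, c0:c1]

   A = np.arange(10 * 6).reshape(10, 6)
   p_r, p_c = 4, 1
   blocks = [local_block(A, p_r, p_c, r) for r in range(p_r * p_c)]
   print([b.shape for b in blocks])

With a 4 x 1 grid every process holds a contiguous band of rows and all columns, and stacking the blocks in rank order recovers the original matrix.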


Contents
========================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   modules


Indices and tables
========================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`