pyDNMFk package¶
Submodules¶
pyDNMFk.config module¶
pyDNMFk.data_generator module¶
-
class
pyDNMFk.data_generator.
data_generator
(args)[source]¶ Bases:
object
Generates synthetic data in distributed manner where each MPI process generates a chunk from the data parallelly. The W matrix is generated with gaussian distribution whereas the H matrix is random.
- argsclass
Class which comprises following attributes
- fpathstr
Directory path of file to be stored
- p_rint
Count of row processor in the cartesian grid
- p_cint
Count of column processor in the cartesian grid
- mint
row dimension of the data
- nint
Column dimension of the data
- kint
Feature count
-
determine_block_index_range_asymm
()[source]¶ Determines the start and end indices for the Data block for each rank
-
dist_fromfunction
(func, shape, pgrid, *args, unravel_index=<function unravel_index>, **kwargs)[source]¶ produces X_{i,j} = func(i,j) in a distributed manner, so that each processor has an array_split section of X according to the grid.
pyDNMFk.data_io module¶
-
class
pyDNMFk.data_io.
data_read
(args)[source]¶ Bases:
object
Class for reading data.
- argsclass
Class which comprises following attributes
- fpathstr
Directory path of file to be read
- pgridtuple
Cartesian grid configuration
- ftypestr
Type of data to read(mat/npy/csv/folder)
- fnamestr
Name of the file to read
comm (object): comm object for distributed read
-
data_partition
()[source]¶ This function divides the input matrix into chunks as specified by grid configuration.
Return n array of shape (nrows_i, ncols_i) where i is the index of each chunk. Sum_i^n ( nrows_i * ncols_i ) = arr.size
If arr is a 2D array, the returned array should look like n subblocks with each subblock preserving the "physical" layout of arr.
-
class
pyDNMFk.data_io.
data_write
(args)[source]¶ Bases:
object
Class for writing data/results.
args (class): class which comprises following attributes results_path (str): Directory path of file to write pgrid (tuple): Cartesian grid configuration ftype (str): Type of data to read(mat/npy/csv/folder) comm (object): comm object for distributed read
pyDNMFk.dist_clustering module¶
-
class
pyDNMFk.dist_clustering.
custom_clustering
(Wall, Hall, params)[source]¶ Bases:
object
Greedy algorithm to approximate a quadratic assignment problem to cluster vectors. Given p groups of k vectors, construct k clusters, each cluster containing a single vector from each of the p groups. This clustering approximation uses cos distances and mean centroids.
- W_allndarray
Order three tensor of shape m by k by p, where m is the ambient dimension of the vectors, k is the number of vectors in each group, and p is the number of groups of vectors.
- H_allndarray
Order three tensor of shape n by k by p, where n is the ambient dimension of the vectors, k is the number of vectors in each group, and p is the number of groups of vectors.
- paramsclass
Class object with communication parameters which comprises of grid information (p_r,p_c) , commincator (comm) and epsilon (eps).
-
dist_custom_clustering
(centroids=None, vb=0)[source]¶ Performs the distributed custom clustering
- centroidsndarray, optional
The m by k initialization of the centroids of the clusters. None corresponds to using the first slice, W_all[:,:,0], as the initial centroids. Defaults to None.
- vbbool, optional
Verbose to display intermediate results
- centroidsndarray
The m by k centroids of the clusters
- W_all :ndarray
Clustered organization of the vectors W_all
- H_allndarray
Clustered organization of the vectors H_all
- permute_orderlist
Indices of the permuted features
-
dist_silhouettes
()[source]¶ Computes the cosine distances silhouettes of a distributed clustering of vectors.
- silsndarray
The k by p array of silhouettes where sils[i,j] is the silhouette measure for the vector W_all[:,i,j]
-
fit
()[source]¶ Calls the sub routines to perform distributed custom clustering and compute silhouettes
- centroidsndarray
The m by k centroids of the clusters
- CentStdndarray
Absolute deviation of the features from the centroid
- W_allndarray
Clustered organization of the vectors W_all
- H_allndarray
Clustered organization of the vectors H_all
- S_avgndarray
mean Silhouette score
- permute_orderlist
Indices of the permuted features
pyDNMFk.dist_comm module¶
-
class
pyDNMFk.dist_comm.
MPI_comm
(comm, p_r, p_c)[source]¶ Bases:
object
Initialization of MPI communicator to construct the cartesian topology and sub communicators
- commobject
MPI communicator object
- p_rint
row processors count
- p_cint
column processors count
pyDNMFk.dist_nmf module¶
-
class
pyDNMFk.dist_nmf.
nmf_algorithms_1D
(A_ij, W_i, H_j, params=None)[source]¶ Bases:
object
Performs the distributed NMF operation along 1D cartesian grid
- A_ijndarray
Distributed Data
- W_indarray
Distributed factor W
- H_jndarray
Distributed factor H
- paramsclass
Class which comprises following attributes
- params.comm1object
Global Communicator
- params.kint
Rank for decomposition
- params.mint
Global dimensions m
- params.nint
Global dimensions n
- params.p_rint
Cartesian grid row count
- params.p_cint
Cartesian grid column count
- params.W_updatebool
flag to set W update True/False
- params.normstr
NMF norm to be minimized
- params.methodstr
NMF optimization method
- params.epsfloat
Epsilon value
-
FRO_BCD_update
(W_update=True, itr=1000)[source]¶ Frobenius norm minimization based BCD update of W and H parameter Function computes updated W and H parameter for each mpi rank
- W_update: bool
flag to enable/disable W update
self.W_i : ndarray (m/p_r X k) self.H_j : ndarray (k X n/p_c)
-
FRO_HALS_update
(W_update=True)[source]¶ Frobenius norm minimizatio based HALS update of W and H parameter Function computes updated W and H parameter for each mpi rank
- W_updatebool
Flag to enable/disable W_update
self.H_j : ndarray (k X n/p_r) self.W_i : ndarray (m/p_c X k)
-
FRO_HALS_update_H
()[source]¶ Frobenius norm minimization based HALS update of H parameter Function computes updated H parameter for each mpi rank
self : object
self.H_j : ndarray ( k X n/p_c)
-
FRO_HALS_update_W
()[source]¶ Frobenius norm minimization based HALS update of W parameter Function computes updated W parameter for each mpi rank
self : object
self.W_i : ndarray (m/p_r X k)
-
Fro_MU_update
(W_update=True)[source]¶ Frobenius norm based multiplicative update of W and H parameter Function computes updated W and H parameter for each mpi rank
self : object
self.H_ij : ndarray self.W_ij : ndarray
-
Fro_MU_update_H
()[source]¶ Frobenius norm based multiplicative update of H parameter Function computes updated H parameter for each mpi rank
self : object
self.H_j : ndarray
-
Fro_MU_update_W
()[source]¶ Frobenius norm based multiplicative update of W parameter Function computes updated H parameter for each mpi rank
self : object
self.W_i : ndarray
-
KL_MU_update
(W_update=True)[source]¶ KL divergence based multiplicative update of W and H parameter Function computes updated W and H parameter for each mpi rank
- W_updatebool
Flag to enable/disable W_update
self.H_j : ndarray (k X n/p_r) self.W_i : ndarray (m/p_c X k)
-
KL_MU_update_H
()[source]¶ KL divergence based multiplicative update of H parameter Function computes updated H parameter for each mpi rank
self : object
- self.H_jndarray
Distributed factor H of shape k X n/p_c
-
KL_MU_update_W
()[source]¶ KL divergence based multiplicative update of W parameter Function computes updated W parameter for each mpi rank
self : object
- self.W_indarray
Distributed factor W of shape m/p_r X k
-
global_gram
(A, p=1)[source]¶ Distributed gram computation
Computes the global gram operation of matrix A .. math:: A^TA
A : ndarray p : Processor count
A_TA_glob : ndarray
-
global_mm
(A, B, p=- 1)[source]¶ Distributed matrix multiplication
Computes the global matrix multiplication of matrix A and B .. math:: AB
A : ndarray B : ndarray p : processor count
AB_glob : ndarray
-
class
pyDNMFk.dist_nmf.
nmf_algorithms_2D
(A_ij, W_ij, H_ij, params=None)[source]¶ Bases:
object
Performs the distributed NMF operation along 2D cartesian grid
- A_ijndarray
Distributed Data
- W_ijndarray
Distributed factor W
- H_ijndarray
Distributed factor H
- paramsclass
Class which comprises following attributes
- params.comm1object
Global Communicator
- params.commobject
Modified communicator object
- params.kint
Rank for decomposition
- params.mint
Global dimensions m
- params.nint
Global dimensions n
- params.p_rint
Cartesian grid row count
- params.p_cint
Cartesian grid column count
- params.row_commobject
Sub communicator along row
- params.col_commobject
Sub communicator along columns
- params.W_updatebool
flag to set W update True/False
- params.normstr
NMF norm to be minimized
- params.methodstr
NMF optimization method
- params.epsfloat
Epsilon value
-
AH_glob
(H_ij=None)[source]¶ Distributed computation of AH^T
Computes the global matrix multiplication of matrix A and H .. math:: AH^T
A : ndarray H : ndarray
AH : ndarray
-
ATW_glob
()[source]¶ Distributed computation of W^TA
Computes the global matrix multiplication of matrix W and A .. math:: W^TA
W : ndarray A : ndarray
Atw : ndarray
-
FRO_BCD_update
(W_update=True, itr=1000)[source]¶ Frobenius norm minimization based BCD update of W and H parameter Function computes updated W and H parameter for each mpi rank
self : object
self.W_ij : ndarray (m/p X k)
self.H_ij : ndarray (k X n/p)
-
FRO_HALS_update
(W_update=True)[source]¶ Frobenius norm minimization based HALS update of W and H parameter Function computes updated W and H parameter for each mpi rank
self : object
self.W_ij : ndarray (m/p X k) self.H_ij : ndarray (k X n/p)
-
FRO_HALS_update_H
()[source]¶ Frobenius norm minimization based HALS update of H parameter Function computes updated H parameter for each mpi rank
self : object
self.H_ij : ndarray ( k X n/p)
-
FRO_HALS_update_W
()[source]¶ Frobenius norm minimization based HALS update of W parameter Function computes updated W parameter for each mpi rank
self : object
self.W_ij : ndarray (m/p X k)
-
Fro_MU_update
(W_update=True)[source]¶ Frobenius norm based multiplicative update of W and H parameter Function computes updated W and H parameter for each mpi rank
self : object
self.H_ij : ndarray self.W_ij : ndarray
-
Fro_MU_update_H
()[source]¶ Frobenius norm based multiplicative update of H parameter Function computes updated H parameter for each mpi rank
self : object
self.H_ij : ndarray
-
Fro_MU_update_W
()[source]¶ Frobenius norm based multiplicative update of W parameter Function computes updated H parameter for each mpi rank
self : object
self.W_ij : ndarray
-
KL_MU_update
(W_update=True)[source]¶ KL divergence based multiplicative update of W and H parameter Function computes updated W and H parameter for each mpi rank
self : object
self.H_ij : ndarray (k X n/p) self.W_ij : ndarray (m/p X k)
-
KL_MU_update_H
()[source]¶ Frobenius norm based multiplicative update of H parameter Function computes updated H parameter for each mpi rank
self : object
- self.H_ijndarray
Distributed factor H of shape k X n/p
-
KL_MU_update_W
()[source]¶ KL divergence based multiplicative update of W parameter Function computes updated W parameter for each mpi rank
self : object
- self.W_ijndarray
Distributed factor W of shape m/p X k
-
UHT_glob
()[source]¶ Distributed computation of UH^T
Computes the global matrix multiplication of matrix W and U for KL .. math:: UH^T
W : ndarray H : ndarray A : ndarray
UHT : ndarray
-
WTU_glob
()[source]¶ Distributed computation of W^TU
Computes the global matrix multiplication of matrix W and U for KL .. math:: W^TU
W : ndarray H : ndarray A : ndarray
WTU : ndarray
-
gather_W_H
(gW=True, gH=True)[source]¶ Gathers W and H factors across cartesian groups i.e H_ij -> H_j if gH=True and W_ij -> W_i and gW=True
gW : boolen gH : boolen
self.H_j : ndarray self.W_i : ndarray
-
global_gram
(A)[source]¶ Distributed gram computation
Computes the global gram operation of matrix A .. math:: A^TA
A : ndarray
A_TA_glob : ndarray
pyDNMFk.dist_svd module¶
-
class
pyDNMFk.dist_svd.
DistSVD
(args, A)[source]¶ Bases:
object
Distributed Computation of SVD along 1D distribution of the data. Only U or V is distributed based on data size.
- Andarray
Distributed Data
- argsclass
Class which comprises following attributes
- args.globalmint
Global row dimensions of A
- args.globalnint
Global column dimension of A
- args.kint(optional)
Rank for decomposition
- args.p_rint
Cartesian grid row count
- args.p_cint
Cartesian grid column count
- args.seedint(optional)
Set the random seed
- args.commobject
comm object for distributed read
- args.epsfloat
Epsilon value
-
nnsvd
(flag=1, verbose=1)[source]¶ Computes the distributed Non-Negative SVD(NNSVD) components from the computed SVD factors.
- flagbool, optional
Computes nnSVD factors with different configurations
- verbosebool, optional
Verbose to set returned errors. If true returns SVD and NNSVD reconstruction errors.
- W :ndarray
Non-negative factor W of shape (m/p_r,k)
- Hndarray
Non-negative factor H of shape (k,n/p_c)
- errordictionary (optional)
Dictinoary of reconstruction error for svd and nnsvd
-
rel_error
(U, S, V)[source]¶ Computes the relative error between the reconstructed data with factors vs original data
pyDNMFk.plot_results module¶
-
pyDNMFk.plot_results.
box_plot
(dat, respath)[source]¶ Plots the boxplot from the given data and saves the results
-
pyDNMFk.plot_results.
plot_err
(err)[source]¶ Plots the relative error for NMF decomposition as a function of number of iterations
-
pyDNMFk.plot_results.
plot_results
(startProcess, endProcess, RECON, RECON1, SILL_MIN, out_put, name)[source]¶ Plots the relative error and Silhouette results for estimation of k
-
pyDNMFk.plot_results.
plot_timing_stats
(fpath, respath)[source]¶ Plots the timing stats for the MPI operation. fpath: Stats data path respath: Path to save graph
pyDNMFk.pyDNMF module¶
-
class
pyDNMFk.pyDNMF.
PyNMF
(A_ij, factors=None, save_factors=False, params=None)[source]¶ Bases:
object
Performs the distributed NMF decomposition of given matrix X into factors W and H
- A_ijndarray
Distributed Data
- factorstuple (optional)
Distributed factors W and H
- paramsclass
Class which comprises following attributes
- params.initstr
NMF initialization(rand/nnsvd)
- params.comm1object
Global Communicator
- params.commobject
Modified communicator object
- params.kint
Rank for decomposition
- params.mint
Global dimensions m
- params.nint
Global dimensions n
- params.p_rint
Cartesian grid row count
- params.p_cint
Cartesian grid column count
- params.row_commobject
Sub communicator along row
- params.col_commobject
Sub communicator along columns
- params.W_updatebool
flag to set W update True/False
- params.normstr
NMF norm to be minimized
- params.methodstr
NMF optimization method
- params.epsfloat
Epsilon value
- params.verbosebool
Flag to enable/disable display results
- params.save_factorsbool
Flag to enable/disable saving computed factors
-
compute_global_dim
()[source]¶ Computes global dimensions m and n from given chunk sizes for any grid configuration
pyDNMFk.pyDNMFk module¶
-
class
pyDNMFk.pyDNMFk.
PyNMFk
(A_ij, factors=None, params=None)[source]¶ Bases:
object
Performs the distributed NMF decomposition with custom clustering for estimating hidden factors k
- A_ijndarray
Distributed Data
- factorstuple (optional)
Distributed factors W and H
- paramsclass
Class which comprises following attributes
- params.initstr
NMF initialization(rand/nnsvd)
- params.comm1object
Global Communicator
- params.commobject
Modified communicator object
- params.kint
Rank for decomposition
- params.mint
Global dimensions m
- params.nint
Global dimensions n
- params.p_rint
Cartesian grid row count
- params.p_cint
Cartesian grid column count
- params.row_commobject
Sub communicator along row
- params.col_commobject
Sub communicator along columns
- params.W_updatebool
flag to set W update True/False
- params.normstr
NMF norm to be minimized
- params.methodstr
NMF optimization method
- params.epsfloat
Epsilon value
- params.verbosebool
Flag to enable/disable display results
- params.save_factorsbool
Flag to enable/disable saving computed factors
- params.perturbationsint
Number of Perturbations for clustering
- params.noise_varfloat
Set noise variance for perturbing the data
- params.sill_thrfloat
Set the sillhouette threshold for estimating K with p-test
- params.start_kint
Starting range for Feature search K
- params.end_kint
Ending range for Feature search K
-
fit
()[source]¶ Calls the sub routines to perform distributed NMF decomposition and then custom clustering to estimate k
- noptint
Estimated value of latent features
-
class
pyDNMFk.pyDNMFk.
sample
(data, noise_var, method, seed=None)[source]¶ Bases:
object
Generates perturbed version of data based on sampling distribution.
- datandarray
Array of which to find a perturbation.
- noise_varfloat
The perturbation amount.
- methodstr
Method for sampling (uniform/poisson)
- seedfloat(optional)
Set seed for random data generation
-
fit
()[source]¶ Calls the sub routines to perform resampling on data
- X_perndarry
Perturbed version of data
pyDNMFk.utils module¶
-
class
pyDNMFk.utils.
comm_timing
[source]¶ Bases:
object
Decorator class for computing timing for MPI operations. The class uses the global variables flag and time initialized in config file and updates them for each call dynamically.
- flag: bool
if Set true, enables the decorator to compute the timings.
- time: dict
Dictionary to store timing for each function calls
-
class
pyDNMFk.utils.
data_operations
(data)[source]¶ Bases:
object
Performs various operations on the data
- datandarray
Data to operate on
-
class
pyDNMFk.utils.
determine_block_params
(comm, pgrid, shape)[source]¶ Bases:
object
Computes the parameters for each chunk to be read by MPI process
- commobject
MPI communicator object
- pgridtuple
Cartesian grid configuration
- shapetuple
Data shape
-
pyDNMFk.utils.
norm
(X, comm, norm=2, axis=None, p=- 1)[source]¶ Compute the data norm
- Xndarray
Data to operate on
- commobject
MPI communicator object
- normint
type of norm to be computed
- axisint
axis of array for the norm to be computed along
- p: int
Processor count
- normfloat
Norm of the given data X
-
class
pyDNMFk.utils.
parse
[source]¶ Bases:
object
Define a class parse which is used for adding attributes