TELF.factorization.TriNMFk: NMFk with Automatic Determination of Latent Clusters and Patterns
TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).
Example
First, generate synthetic data with a pre-determined k. The data can be either a dense (np.ndarray) or a sparse (scipy.sparse._csr.csr_matrix) matrix. Here, we use the provided matrix-generation script (generate_X, in the scripts directory):
import sys; sys.path.append("../../scripts/")
from generate_X import gen_trinmf_data
import matplotlib.pyplot as plt
kwkh=(5,3)
shape=(20,20)
data = gen_trinmf_data(shape=shape,
                       kwkh=kwkh,
                       factor_wh=(0.5, 1.0),
                       factor_S=1,
                       random_state=10)
X = data["X"]
Wtrue = data["W"]
Strue = data["S"]
Htrue = data["H"]
Initialize the model:
nmfk_params = {
    "n_perturbs": 64,
    "n_iters": 2000,
    "epsilon": 0.01,
    "n_jobs": -1,
    "init": "nnsvd",
    "use_gpu": False,
    "save_path": "../../results/",
    "verbose": True,
    "sill_thresh": 0.8,
    "nmf_method": "nmf_fro_mu",
    "perturb_type": "uniform",
    "calculate_error": True,
    "pruned": True,
    "predict_k": True,
    "predict_k_method": "sill",
    "transpose": False,
    "mask": None,
    "use_consensus_stopping": False,
    "calculate_pac": True,
    "consensus_mat": True,
    "simple_plot": True,
    "collect_output": True
}
tri_nmfk_params = {
    "experiment_name": "TriNMFk",
    "nmfk_params": nmfk_params,
    "nmf_verbose": False,
    "use_gpu": True,
    "n_jobs": -1,
    "mask": None,
    "use_consensus_stopping": False,
    "alpha": (0, 0),
    "n_iters": 100,
    "n_inits": 10,
    "pruned": False,
    "transpose": False,
    "verbose": True
}
from TELF.factorization import TriNMFk
model = TriNMFk(**tri_nmfk_params)
Perform NMFk first:
Ks = range(1,8,1)
note = "This is the NMFk portion of the TriNMFk method!"
results = model.fit_nmfk(X, Ks, note)
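The settings above enable predict_k with predict_k_method set to "sill" and sill_thresh set to 0.8, which suggests a threshold rule on silhouette scores. The sketch below illustrates one such rule: take the largest k whose minimum silhouette score stays at or above the threshold. The helper pick_k_by_silhouette and the scores are hypothetical and made up for illustration; this is an assumption about the general idea, not TELF's actual implementation, so the plots in the results directory remain the reference for the real values.

```python
import numpy as np


def pick_k_by_silhouette(ks, min_silhouettes, sill_thresh=0.8):
    """Return the largest k whose minimum silhouette score meets the threshold.

    Hypothetical helper for illustration only; it is not part of TELF.
    """
    ks = np.asarray(list(ks))
    scores = np.asarray(min_silhouettes, dtype=float)
    stable = ks[scores >= sill_thresh]
    # Fall back to the smallest k when no candidate is stable enough.
    return int(stable.max()) if stable.size else int(ks.min())


# Made-up minimum silhouette scores for Ks = 1..7 (not real TELF output).
ks = range(1, 8)
scores = [1.0, 0.95, 0.91, 0.88, 0.85, 0.40, 0.22]
k_est = pick_k_by_silhouette(ks, scores, sill_thresh=0.8)
print(k_est)
```

A rule of this kind favors the largest model that is still stable under perturbation, which is why a sharp drop in the silhouette curve is the feature to look for in the plots.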
From the NMFk results, select the number of latent patterns Wk (use the Silhouette score from the plots in ../../results) and the number of latent clusters Hk (use the PAC score from the plots in ../../results). Next, perform TriNMFk:
k1k2=(5,3)
tri_nmfk_results = model.fit_tri_nmfk(X, k1k2)
W = tri_nmfk_results["W"]
S = tri_nmfk_results["S"]
H = tri_nmfk_results["H"]
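A quick sanity check on the returned factors is to rebuild the matrix and measure the relative reconstruction error. The sketch below uses random stand-in factors rather than TELF output, and it assumes the usual tri-factorization layout X ≈ W S H with W of shape (n, k1), S of shape (k1, k2), and H of shape (k2, m). That layout and the helper relative_error are assumptions for illustration, so check the shapes of the actual results before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 20      # matrix shape used in the example above
k1, k2 = 5, 3      # (Wk, Hk)

# Stand-in non-negative factors; in practice these come from fit_tri_nmfk.
W = rng.random((n, k1))
S = rng.random((k1, k2))
H = rng.random((k2, m))
X = W @ S @ H      # exact product, so the error below is close to zero


def relative_error(X, W, S, H):
    """Frobenius-norm relative error of the tri-factorization X ~ W S H."""
    return np.linalg.norm(X - W @ S @ H) / np.linalg.norm(X)


err = relative_error(X, W, S, H)
print(X.shape, err)
```

With real data, the error will not be zero; comparing it across candidate (Wk, Hk) pairs gives a rough sense of how much each choice loses.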
Available Functions
TriNMFk | TriNMFk is a Non-negative Matrix Factorization module with the capability to do automatic model determination for both estimating the number of latent patterns (Wk) and clusters (Hk).
TriNMFk.fit_nmfk | Factorize the input matrix X for each given K value in Ks.
TriNMFk.fit_tri_nmfk | Factorize the input matrix X.
Module Contents
© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.
- class TELF.factorization.TriNMFk.TriNMFk(experiment_name='TriNMFk', nmfk_params={}, save_path='TriNMFk', nmf_verbose=False, use_gpu=False, n_jobs=-1, mask=None, use_consensus_stopping=0, alpha=(0, 0), n_iters=100, n_inits=10, pruned=True, transpose=False, verbose=True)
Bases: object
TriNMFk is a Non-negative Matrix Factorization module with the capability to do automatic model determination for both estimating the number of latent patterns (Wk) and clusters (Hk).
- Parameters:
experiment_name (str, optional) – Name used for the experiment. The default is “TriNMFk”.
nmfk_params (dict, optional) – Parameters for NMFk. See the NMFk documentation for the options. The default is {}.
save_path (str, optional) – Save location used when the TriNMFk fit is run without first performing the NMFk fit. The default is “TriNMFk”.
nmf_verbose (bool, optional) – If True, shows progress in each NMF operation. The default is False.
use_gpu (bool, optional) – If True, uses GPU for operations. The default is False.
n_jobs (int, optional) – Number of parallel jobs. Use -1 to use all available resources. The default is -1.
mask (np.ndarray, optional) – NumPy array marking the locations in the input matrix that should be masked during factorization. The default is None.
use_consensus_stopping (int, optional) – When not 0, uses the consensus-matrix criterion for early stopping of the NMF factorization. The default is 0.
alpha (tuple, optional) – Error rate used in the bootstrap operation. The default is (0, 0).
n_iters (int, optional) – Number of NMF iterations. The default is 100.
n_inits (int, optional) – Number of matrix initializations for the bootstrap operation. The default is 10.
pruned (bool, optional) – When True, removes columns and rows from the input matrix that have only 0 values. The default is True.
transpose (bool, optional) – If True, transposes the input matrix before factorization. The default is False.
verbose (bool, optional) – If True, shows progress in each k. The default is True.
- Return type:
None.
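The pruned option is described as removing rows and columns that contain only zeros. A minimal NumPy sketch of that idea for a dense matrix follows. The helper prune_zero_rows_cols is hypothetical and is not TELF's implementation; it only illustrates what such pruning does to the shape of the input and how the kept indices can be tracked.

```python
import numpy as np


def prune_zero_rows_cols(X):
    """Drop all-zero rows and columns from a dense matrix.

    Hypothetical helper for illustration only. Returns the pruned matrix
    and boolean masks marking which rows and columns were kept.
    """
    X = np.asarray(X)
    row_keep = np.any(X != 0, axis=1)
    col_keep = np.any(X != 0, axis=0)
    return X[np.ix_(row_keep, col_keep)], row_keep, col_keep


X = np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 4.0]])
X_pruned, rows, cols = prune_zero_rows_cols(X)
print(X_pruned.shape)
```

Keeping the masks makes it possible to map rows of W and columns of H back to the original indices after factorizing the pruned matrix.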
- fit_nmfk(X, Ks, note='')
Factorize the input matrix X for each given K value in Ks.
- Parameters:
X (np.ndarray or scipy.sparse._csr.csr_matrix) – Input matrix to be factorized.
Ks (list) – List of K values to factorize the input matrix with. Example: Ks=range(1, 10, 1).
name (str, optional) – Name of the experiment. Default is “NMFk”.
note (str, optional) – Note for the experiment, used in logs. Default is “”.
- Returns:
results – The resulting dict can include all the latent factors, plotting data, predicted latent factors, the time taken for factorization, and the predicted k value, depending on the settings specified in nmfk_params.
If get_plot_data=True, results will include a field for plot_data.
If predict_k=True, results will include a field for k_predict. This is an integer giving the automatically estimated number of latent factors.
If predict_k=True and collect_output=True, results will include fields for W and H, which are the latent factors of type np.ndarray.
results will always include a field for time, which gives the total compute time.
- Return type:
dict
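Because several fields of the returned dict are present only under certain settings, code that consumes it is safer when it checks for each key. The sketch below shows one defensive way to do that. The helper summarize_results and the stand-in dict are hypothetical; only the key names (time, k_predict, W, H, plot_data) come from the description above.

```python
def summarize_results(results):
    """Collect the fields of an NMFk-style results dict that are present.

    Hypothetical helper for illustration only; not part of TELF.
    """
    summary = {"time": results.get("time")}
    if "k_predict" in results:
        summary["k_predict"] = int(results["k_predict"])
    for name in ("W", "H"):
        if name in results:
            # Latent factors are described as np.ndarray, so report shapes.
            summary[name + "_shape"] = getattr(results[name], "shape", None)
    summary["has_plot_data"] = "plot_data" in results
    return summary


# Stand-in dict with made-up values, shaped like the description above.
example = {"time": 12.5, "k_predict": 5}
summary = summarize_results(example)
print(summary)
```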
- fit_tri_nmfk(X, k1k2: tuple)
Factorize the input matrix X. Apply this after using fit_nmfk() to select Wk and Hk; the given matrix is then factorized with k1k2=(Wk, Hk).
- Parameters:
X (np.ndarray or scipy.sparse._csr.csr_matrix) – Input matrix to be factorized.
k1k2 (tuple) – Tuple of Wk (number of latent patterns) and Hk (number of latent clusters) to factorize the matrix X to. Example: k1k2=(4, 3).
- Returns:
results – The resulting dict will include the latent patterns W and H and the mixing matrix S, along with the error from each of the n_inits.
- Return type:
dict