TELF.factorization.TriNMFk: NMFk with Automatic Determination of Latent Clusters and Patterns

TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).

Example

First, generate synthetic data with a pre-determined k. The input can be either a dense matrix (np.ndarray) or a sparse matrix (scipy.sparse._csr.csr_matrix). Here, we use the provided matrix-generation scripts:

import sys; sys.path.append("../../scripts/")
from generate_X import gen_trinmf_data
import matplotlib.pyplot as plt

kwkh=(5,3)
shape=(20,20)
data = gen_trinmf_data(shape=shape,
                       kwkh=kwkh,
                       factor_wh=(0.5, 1.0),
                       factor_S=1,
                       random_state=10)

X = data["X"]
Wtrue = data["W"]
Strue = data["S"]
Htrue = data["H"]
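
If the generation script is not at hand, a matrix with the same tri-factor structure can be built directly with NumPy. The following is a minimal sketch, not the implementation of gen_trinmf_data: it assumes the convention X = W S H with W of shape (rows, Wk), S of shape (Wk, Hk), and H of shape (Hk, columns), and all names ending in _demo are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(10)

n_rows, n_cols = 20, 20  # same shape as in the example above
k_w, k_h = 5, 3          # latent patterns (Wk) and latent clusters (Hk)

# Non-negative factors (assumed layout, see the note above)
W_demo = rng.random((n_rows, k_w))
S_demo = rng.random((k_w, k_h))
H_demo = rng.random((k_h, n_cols))

# The data matrix is the product of the three factors
X_demo = W_demo @ S_demo @ H_demo

# The same matrix stored in SciPy CSR form
X_demo_sparse = csr_matrix(X_demo)

print(X_demo.shape, X_demo_sparse.shape)
```

Either X_demo or X_demo_sparse has a type that the text above lists as accepted input.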

Initialize the model:

nmfk_params = {
    "n_perturbs":64,
    "n_iters":2000,
    "epsilon":0.01,
    "n_jobs":-1,
    "init":"nnsvd",
    "use_gpu":False,
    "save_path":"../../results/",
    "verbose":True,
    "sill_thresh":0.8,
    "nmf_method":"nmf_fro_mu",
    "perturb_type":"uniform",
    "calculate_error":True,
    "pruned":True,
    "predict_k":True,
    "predict_k_method":"sill",
    "transpose":False,
    "mask":None,
    "use_consensus_stopping":False,
    "calculate_pac":True,
    "consensus_mat":True,
    "simple_plot":True,
    "collect_output":True
}

tri_nmfk_params = {
    "experiment_name":"TriNMFk",
    "nmfk_params":nmfk_params,
    "nmf_verbose":False,
    "use_gpu":True,
    "n_jobs":-1,
    "mask":None,
    "use_consensus_stopping":False,
    "alpha":(0,0),
    "n_iters":100,
    "n_inits":10,
    "pruned":False,
    "transpose":False,
    "verbose":True
}

from TELF.factorization import TriNMFk
model = TriNMFk(**tri_nmfk_params)

Perform NMFk first:

Ks = range(1,8,1)
note = "This is the NMFk portion of the TriNMFk method!"
results = model.fit_nmfk(X, Ks, note)

Select the number of latent patterns Wk (using the Silhouette score from the plots in ../../results) and the number of latent clusters Hk (using the PAC score from the plots in ../../results) from the NMFk results. Next, perform TriNMFk:

k1k2=(5,3)
tri_nmfk_results = model.fit_tri_nmfk(X, k1k2)
W = tri_nmfk_results["W"]
S = tri_nmfk_results["S"]
H = tri_nmfk_results["H"]
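
A quick way to sanity-check the returned factors is to measure how well their product reproduces X. The sketch below is an illustration only: it assumes dense NumPy arrays and the layout X ≈ W S H, and it uses stand-in factors (W0, S0, H0) rather than real TriNMFk output. The helper relative_error is hypothetical and not part of TELF.

```python
import numpy as np

def relative_error(X, W, S, H):
    """Relative Frobenius error ||X - W S H||_F / ||X||_F (dense inputs)."""
    X_hat = W @ S @ H
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Stand-in factors with the shapes assumed above; not library output
rng = np.random.default_rng(0)
W0 = rng.random((20, 5))
S0 = rng.random((5, 3))
H0 = rng.random((3, 20))
X0 = W0 @ S0 @ H0

# Exact factors reproduce X0, so the error is essentially zero
print(relative_error(X0, W0, S0, H0))
```

With real results, the same call would be relative_error(X, W, S, H). A sparse X would first need converting to a dense array, for example with X.toarray().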

Available Functions

TriNMFk.__init__([experiment_name, ...])

TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).

TriNMFk.fit_nmfk(X, Ks[, note])

Factorize the input matrix X for each given K value in Ks.

TriNMFk.fit_tri_nmfk(X, k1k2)

Factorize the input matrix X.

Module Contents

© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

class TELF.factorization.TriNMFk.TriNMFk(experiment_name='TriNMFk', nmfk_params={}, save_path='TriNMFk', nmf_verbose=False, use_gpu=False, n_jobs=-1, mask=None, use_consensus_stopping=0, alpha=(0, 0), n_iters=100, n_inits=10, pruned=True, transpose=False, verbose=True)

Bases: object

TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).

Parameters:
  • experiment_name (str, optional) – Name used for the experiment. Default is “TriNMFk”.

  • nmfk_params (dict, optional) – Parameters for NMFk. See the NMFk documentation for the available options. Default is {}.

  • save_path (str, optional) – Save location used when the TriNMFk fit is run without first performing the NMFk fit. Default is “TriNMFk”.

  • nmf_verbose (bool, optional) – If True, shows progress in each NMF operation. The default is False.

  • use_gpu (bool, optional) – If True, uses GPU for operations. The default is False.

  • n_jobs (int, optional) – Number of parallel jobs. Use -1 to use all available resources. The default is -1.

  • mask (np.ndarray, optional) – Numpy array that marks the locations in the input matrix that should be masked during factorization. The default is None.

  • use_consensus_stopping (int, optional) – When not 0, uses consensus-matrix criteria for early stopping of the NMF factorization. The default is 0.

  • alpha (tuple, optional) – Error rate used in bootstrap operation. Default is (0, 0).

  • n_iters (int, optional) – Number of NMF iterations. The default is 100.

  • n_inits (int, optional) – Number of matrix initializations for the bootstrap operation. The default is 10.

  • pruned (bool, optional) – When True, removes columns and rows from the input matrix that have only 0 values. The default is True.

  • transpose (bool, optional) – If True, transposes the input matrix before factorization. The default is False.

  • verbose (bool, optional) – If True, shows progress in each k. The default is True.

Return type:

None.

fit_nmfk(X, Ks, note='')

Factorize the input matrix X for each given K value in Ks.

Parameters:
  • X (np.ndarray or scipy.sparse._csr.csr_matrix matrix) – Input matrix to be factorized.

  • Ks (list) –

    List of K values to factorize the input matrix.

    Example: Ks=range(1, 10, 1).

  • name (str, optional) – Name of the experiment. Default is “NMFk”.

  • note (str, optional) – Note for the experiment used in logs. Default is “”.

Returns:

results – Depending on the settings specified in nmfk_params, the resulting dict can include all the latent factors, plotting data, predicted latent factors, the time taken for factorization, and the predicted k value.

  • If get_plot_data=True, results will include field for plot_data.

  • If predict_k=True, results will include a field for k_predict. This is an integer giving the automatically estimated number of latent factors.

  • If predict_k=True and collect_output=True, results will include fields for W and H, which are the latent factors of type np.ndarray.

  • results will always include a field for time, that gives the total compute time.
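
For intuition about what a silhouette-based prediction of k does, the sketch below applies one simple rule: take the largest k whose minimum silhouette score is at or above a threshold (the example configuration uses sill_thresh=0.8). This rule is an assumption made for illustration and is not a transcription of the library's logic. The function pick_k_by_silhouette and the scores are hypothetical.

```python
def pick_k_by_silhouette(ks, min_silhouettes, sill_thresh=0.8):
    """Return the largest k whose minimum silhouette meets the threshold, else None."""
    candidates = [k for k, s in zip(ks, min_silhouettes) if s >= sill_thresh]
    return max(candidates) if candidates else None

# Made-up scores for illustration only; real values come from the NMFk plots
ks = [1, 2, 3, 4, 5, 6, 7]
scores = [1.0, 0.97, 0.95, 0.91, 0.88, 0.42, 0.15]

print(pick_k_by_silhouette(ks, scores))  # 5
```

Returning None when no k passes the threshold keeps the "no stable solution" case explicit instead of silently falling back to some k.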

Return type:

dict

fit_tri_nmfk(X, k1k2: tuple)

Factorize the input matrix X.

Call this after applying fit_nmfk() to select Wk and Hk; it factorizes the given matrix with k1k2=(Wk, Hk).

Parameters:
  • X (np.ndarray or scipy.sparse._csr.csr_matrix matrix) – Input matrix to be factorized.

  • k1k2 (tuple) – Tuple of Wk (number of latent patterns) and Hk (number of latent clusters) to factorize the matrix X to. Example: k1k2=(4, 3).

Returns:

results – Resulting dict will include the latent factors W and H, the mixing matrix S, and the error from each of the n_inits initializations.

Return type:

dict
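
One common way to read cluster membership from the H factor is to label each column of the input by the row of H with the largest weight. The sketch below assumes H has shape (Hk, number of columns in X), which is an assumption about the layout. The helper assign_clusters and the small matrix are hypothetical and only illustrate the idea.

```python
import numpy as np

def assign_clusters(H):
    """Label each column of H (assumed shape: Hk x n_columns) with its dominant row."""
    return np.argmax(H, axis=0)

# Toy H with 3 clusters and 4 columns; not library output
H_demo = np.array([
    [0.90, 0.10, 0.00, 0.20],
    [0.05, 0.70, 0.10, 0.30],
    [0.05, 0.20, 0.90, 0.50],
])

labels = assign_clusters(H_demo)
print(labels)  # [0 1 2 2]
```

With real output the call would be assign_clusters(H), using the H entry of the returned dict.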