TELF.factorization.TriNMFk: NMFk with Automatic Determination of Latent Clusters and Patterns

TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).
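The "Tri" refers to a three-factor model. The formulation below is written out assuming the usual Frobenius-norm objective, since this page does not state the exact objective TELF minimizes. An m×n non-negative matrix X is approximated by three non-negative factors. The Wk columns of W are the latent patterns, the Hk rows of H are the latent clusters, and S mixes the two, so Wk and Hk do not have to be equal:

```latex
X \approx W S H, \qquad
W \in \mathbb{R}_{\ge 0}^{m \times W_k}, \quad
S \in \mathbb{R}_{\ge 0}^{W_k \times H_k}, \quad
H \in \mathbb{R}_{\ge 0}^{H_k \times n},

\min_{W,\,S,\,H \,\ge\, 0} \; \lVert X - W S H \rVert_F^2 .
```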


First, generate synthetic data with pre-determined k values. The data can be either a dense matrix (np.ndarray) or a sparse matrix (scipy.sparse._csr.csr_matrix). Here, we use the provided matrix-generation scripts (located in ../../scripts/):

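import sys; sys.path.append("../../scripts/")
from generate_X import gen_trinmf_data
import matplotlib.pyplot as plt

shape = (50, 30)  # example matrix size; the original value is not shown on this page
data = gen_trinmf_data(shape=shape,
                       factor_wh=(0.5, 1.0),
                       # further arguments (e.g. the pre-determined k values) are truncated in this copy
                       )

X = data["X"]
Wtrue = data["W"]
Strue = data["S"]
Htrue = data["H"]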
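If the helper script is not available, similar data can be mocked with NumPy. The function `make_trinmf_data` below is hypothetical and not part of TELF or `generate_X`. It builds X = W S H from random non-negative factors, assigns each column to one cluster through H, and can return X as a CSR matrix to mirror the sparse-input option:

```python
import numpy as np
from scipy import sparse


def make_trinmf_data(shape=(50, 30), kwkh=(5, 3), as_sparse=False, seed=0):
    """Hypothetical generator: X = W @ S @ H with known non-negative factors."""
    rng = np.random.default_rng(seed)
    m, n = shape
    wk, hk = kwkh                      # number of latent patterns and latent clusters
    W = rng.random((m, wk))            # m x Wk latent patterns
    S = rng.random((wk, hk))           # Wk x Hk mixing matrix
    H = np.zeros((hk, n))              # Hk x n cluster indicator matrix
    H[rng.integers(0, hk, size=n), np.arange(n)] = 1.0  # each column joins one cluster
    X = W @ S @ H
    if as_sparse:
        X = sparse.csr_matrix(X)       # same values, CSR storage
    return {"X": X, "W": W, "S": S, "H": H}


mock = make_trinmf_data()
print(mock["X"].shape)  # (50, 30)
```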

Initialize the model:

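# NMFk options (see the NMFk documentation); the original settings are truncated in this copy
nmfk_params = {"predict_k": True, "collect_output": True, "get_plot_data": True}

# TriNMFk.__init__ arguments documented below; all but nmfk_params are the defaults
tri_nmfk_params = {"experiment_name": "TriNMFk", "nmfk_params": nmfk_params,
                   "n_iters": 100, "n_inits": 10, "use_gpu": False, "n_jobs": -1}

from TELF.factorization import TriNMFk
model = TriNMFk(**tri_nmfk_params)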

Perform NMFk first:

Ks = range(1,8,1)
note = "This is the NMFk portion of the TriNMFk method!"
results = model.fit_nmfk(X, Ks, note)

From the NMFk results, select the number of latent patterns Wk (using the Silhouette score from the plots in ../../results) and the number of latent clusters Hk (using the PAC score from the plots in ../../results). Next, perform TriNMFk:

k1k2 = (5, 3)  # example (Wk, Hk); use the values read off the NMFk plots
tri_nmfk_results = model.fit_tri_nmfk(X, k1k2)
W = tri_nmfk_results["W"]
S = tri_nmfk_results["S"]
H = tri_nmfk_results["H"]
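The selection step above is done by reading the Silhouette and PAC plots. To make the two criteria concrete, here is a hedged sketch. `pac_score` computes the proportion of ambiguous clustering from a consensus matrix: the fraction of sample pairs that were co-clustered in more than 10% and at most 90% of runs. The 0.1/0.9 cut-offs are a common convention and are assumed here. `select_wk` and `select_hk` are hypothetical helpers, each applying one simple rule: the largest k whose silhouette stays above a threshold, and the k with the lowest PAC. TELF's own k-prediction logic may differ.

```python
import numpy as np


def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering (PAC) of a consensus matrix.

    Counts the off-diagonal entries (pairs of samples) that were clustered
    together in more than `lower` and at most `upper` of the runs.
    Lower PAC means a more stable clustering.
    """
    c = np.asarray(consensus, dtype=float)
    off_diag = c[~np.eye(c.shape[0], dtype=bool)]
    return float(np.mean((off_diag > lower) & (off_diag <= upper)))


def select_wk(ks, silhouettes, threshold=0.9):
    """Hypothetical rule: the largest k whose silhouette score is >= threshold."""
    passing = [k for k, s in zip(ks, silhouettes) if s >= threshold]
    return max(passing) if passing else min(ks)


def select_hk(ks, pac_scores):
    """Hypothetical rule: the k with the lowest PAC score."""
    return list(ks)[int(np.argmin(pac_scores))]
```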

Available Functions

  • TriNMFk.__init__([experiment_name, ...]) – Initialize the model.

  • TriNMFk.fit_nmfk(X, Ks[, note]) – Factorize the input matrix X for each K value in Ks.

  • TriNMFk.fit_tri_nmfk(X, k1k2) – Factorize the input matrix X into W, S, and H with k1k2=(Wk, Hk).

Module Contents

class TELF.factorization.TriNMFk.TriNMFk(experiment_name='TriNMFk', nmfk_params={}, save_path='TriNMFk', nmf_verbose=False, use_gpu=False, n_jobs=-1, mask=None, use_consensus_stopping=0, alpha=(0, 0), n_iters=100, n_inits=10, pruned=True, transpose=False, verbose=True)

Bases: object

TriNMFk is a Non-negative Matrix Factorization module that performs automatic model determination, estimating both the number of latent patterns (Wk) and the number of latent clusters (Hk).

Parameters:

  • experiment_name (str, optional) – Name used for the experiment. The default is “TriNMFk”.

  • nmfk_params (dict, optional) – Parameters for NMFk. See the NMFk documentation for the options. The default is {}.

  • save_path (str, optional) – Save location used when fit_tri_nmfk() is called without running fit_nmfk() first. The default is “TriNMFk”.

  • nmf_verbose (bool, optional) – If True, shows progress in each NMF operation. The default is False.

  • use_gpu (bool, optional) – If True, uses GPU for operations. The default is False.

  • n_jobs (int, optional) – Number of parallel jobs. Use -1 to use all available resources. The default is -1.

  • mask (np.ndarray, optional) – Array marking the locations in the input matrix that should be masked during factorization. The default is None.

  • use_consensus_stopping (int, optional) – When not 0, uses consensus-matrix criteria for early stopping of the NMF factorization. The default is 0.

  • alpha (tuple, optional) – Error rate used in the bootstrap operation. The default is (0, 0).

  • n_iters (int, optional) – Number of NMF iterations. The default is 100.

  • n_inits (int, optional) – Number of matrix initializations for the bootstrap operation. The default is 10.

  • pruned (bool, optional) – If True, removes rows and columns of the input matrix that contain only zeros. The default is True.

  • transpose (bool, optional) – If True, transposes the input matrix before factorization. The default is False.

  • verbose (bool, optional) – If True, shows progress for each k. The default is True.
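The pruned option is described above as removing all-zero rows and columns. A minimal sketch of that preprocessing step for dense and CSR input follows, assuming the factors are later mapped back to the original row positions (TELF's own handling of this is not shown on this page). `prune_zero_rows_cols` and `unprune_rows` are hypothetical names.

```python
import numpy as np
from scipy import sparse


def prune_zero_rows_cols(X):
    """Drop rows and columns that contain only zeros.

    Returns the pruned matrix plus the kept row and column indices, so that
    factors learned on the pruned matrix can be placed back afterwards.
    """
    if sparse.issparse(X):
        X = sparse.csr_matrix(X, copy=True)
        X.eliminate_zeros()                       # ignore explicitly stored zeros
        nnz_rows = np.diff(X.indptr) > 0          # rows with at least one entry
        nnz_cols = np.bincount(X.indices, minlength=X.shape[1]) > 0
    else:
        X = np.asarray(X)
        nnz_rows = (X != 0).any(axis=1)
        nnz_cols = (X != 0).any(axis=0)
    rows = np.flatnonzero(nnz_rows)
    cols = np.flatnonzero(nnz_cols)
    return X[rows][:, cols], rows, cols


def unprune_rows(F, rows, n_rows):
    """Place a factor learned on the kept rows back into an n_rows-tall zero matrix."""
    out = np.zeros((n_rows, F.shape[1]), dtype=F.dtype)
    out[rows] = F
    return out
```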

fit_nmfk(X, Ks, note='')

Factorize the input matrix X for each K value in Ks.

Parameters:

  • X (np.ndarray or scipy.sparse._csr.csr_matrix) – Input matrix to be factorized.

  • Ks (list) –

    List of K values to factorize the input matrix.

    Example: Ks=range(1, 10, 1).

  • note (str, optional) – Note for the experiment used in logs. Default is “”.


Returns:

results (dict) – The resulting dict can include all the latent factors, plotting data, predicted latent factors, time taken for factorization, and the predicted k value, depending on the settings specified in nmfk_params.

  • If get_plot_data=True, results will include field for plot_data.

  • If predict_k=True, results will include a field for k_predict. This is an integer giving the automatically estimated number of latent factors.

  • If predict_k=True and collect_output=True, results will include fields for W and H, which are the latent factors as np.ndarray.

  • results will always include a field for time, which gives the total compute time.
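At its core, fit_nmfk runs a factorization for every K in Ks. The bare-bones sketch below shows that per-K loop with plain Frobenius-norm NMF (Lee-Seung multiplicative updates) and records the relative reconstruction error for each k. It is a simplification, not TELF's implementation: NMFk itself also perturbs X and clusters the resulting factors to score stability, which is what produces the Silhouette and PAC plots. `nmf_mu` and `sweep_ks` are hypothetical names.

```python
import numpy as np


def nmf_mu(X, k, n_iters=200, seed=0, eps=1e-12):
    """Plain Frobenius-norm NMF, X ~ W @ H, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # eps avoids division by zero
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def sweep_ks(X, Ks, **nmf_kwargs):
    """Relative reconstruction error ||X - WH||_F / ||X||_F for each k in Ks."""
    norm = np.linalg.norm(X)
    errors = {}
    for k in Ks:
        W, H = nmf_mu(X, k, **nmf_kwargs)
        errors[k] = np.linalg.norm(X - W @ H) / norm
    return errors


rng = np.random.default_rng(0)
X_demo = rng.random((30, 3)) @ rng.random((3, 20))   # rank-3 non-negative matrix
print(sweep_ks(X_demo, range(1, 6)))
```

On exactly rank-3 data like this, the error should drop sharply up to k=3 and level off afterwards.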

fit_tri_nmfk(X, k1k2: tuple)

Factorize the input matrix X.

Use this after applying fit_nmfk() and selecting Wk and Hk, to factorize the given matrix with k1k2=(Wk, Hk).

Parameters:

  • X (np.ndarray or scipy.sparse._csr.csr_matrix) – Input matrix to be factorized.

  • k1k2 (tuple) – Tuple of Wk (number of latent patterns) and Hk (number of latent clusters) with which to factorize the matrix X. Example: k1k2=(4, 3).


Returns:

results (dict) – Dictionary containing the latent patterns W, the latent clusters H, and the mixing matrix S, along with the error from each of the n_inits initializations.
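This page does not spell out the algorithm behind fit_tri_nmfk. For illustration only, the sketch below minimizes ||X - WSH||_F^2 with Lee-Seung-style multiplicative updates for each of the three factors. It restarts n_inits times, keeps the lowest-error run, and also reports every run's error, mirroring the documented return value. `tri_nmf_mu`, `tri_nmf_restarts`, and the "errors" key are hypothetical; TELF's solver may use a different update rule.

```python
import numpy as np


def tri_nmf_mu(X, wk, hk, n_iters=100, seed=0, eps=1e-12):
    """Fit X ~ W @ S @ H (all non-negative) with multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, wk))    # latent patterns
    S = rng.random((wk, hk))   # mixing matrix
    H = rng.random((hk, n))    # latent clusters
    for _ in range(n_iters):
        WS = W @ S
        H *= (WS.T @ X) / (WS.T @ WS @ H + eps)
        SH = S @ H
        W *= (X @ SH.T) / (W @ SH @ SH.T + eps)
        S *= (W.T @ X @ H.T) / (W.T @ W @ S @ H @ H.T + eps)
    err = np.linalg.norm(X - W @ S @ H) / np.linalg.norm(X)  # relative Frobenius error
    return W, S, H, err


def tri_nmf_restarts(X, k1k2, n_iters=100, n_inits=10):
    """Run n_inits random restarts; keep the best factors and every run's error."""
    wk, hk = k1k2
    runs = [tri_nmf_mu(X, wk, hk, n_iters=n_iters, seed=s) for s in range(n_inits)]
    errors = np.array([run[3] for run in runs])
    W, S, H, _ = runs[int(np.argmin(errors))]
    return {"W": W, "S": S, "H": H, "errors": errors}
```

Keeping all per-restart errors, not just the best one, makes it easy to see whether the restarts agree or whether the chosen (Wk, Hk) leads to unstable fits.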

