TELF.factorization.HNMFk: Hierarchical Non-negative Matrix Factorization with Automatic Model Determination#
Hierarchical Non-negative matrix factorization with automatic model determination with custom settings including missing value prediction. HNMFk has HPC and
Available Functions#
| HNMFk | HNMFk is a Hierarchical Non-negative Matrix Factorization module with the capability to do automatic model determination. |
| HNMFk.fit | Factorize the input matrix |
Module Contents#
job-schedule - 200 job-complete - 300 signal-exit - 400
- class TELF.factorization.HNMFk.HNMFk(nmfk_params=[{}], cluster_on='H', depth=1, sample_thresh=-1, Ks_deep_min=1, Ks_deep_max=None, Ks_deep_step=1, K2=False, experiment_name='HNMFk_Output', generate_X_callback=None, n_nodes=1, verbose=True, comm_buff_size=10000000, random_identifiers=False)[source]#
Bases:
object
HNMFk is a Hierarchical Non-negative Matrix Factorization module with the capability to do automatic model determination.
- Parameters:
nmfk_params (list of dicts, optional) – NMFk parameters, given either as one dict per depth or as a single dict shared by all depths. If nmfk_params has a single item, HNMFk uses the same NMFk parameters at every depth. To set parameters per depth, append one dict per depth to the list; for example, [nmfk_params0, nmfk_params1, nmfk_params2] for a depth of 2. The default is [{}], which runs NMFk with its defaults plus the required settings params["collect_output"] = False, params["save_output"] = True, and params["predict_k"] = True when K2=False.
cluster_on (str, optional) – Where to perform clustering; can be W or H. If W, the rows of X should be samples, and if H, the columns of X should be samples. The default is “H”.
depth (int, optional) – How deep to go in each topic after the root node. If -1, it goes until samples cannot be separated further. The default is 1.
sample_thresh (int, optional) – Stopping criterion for the number of samples in a cluster. When -1, this criterion is not used. The default is -1.
Ks_deep_min (int, optional) – After first nmfk, when selecting Ks search range, minimum k to start. The default is 1.
Ks_deep_max (int, optional) – After first nmfk, when selecting Ks search range, maximum k to try. When None, the maximum k will be the same as the k selected for the parent node. The default is None.
Ks_deep_step (int, optional) – After first nmfk, when selecting Ks search range, k step size. The default is 1.
K2 (bool, optional) – If K2=True, decomposition is done only for k=2 instead of finding and predicting the number of stable latent features. The default is False.
experiment_name (str, optional) – Where to save the results.
generate_X_callback (object, optional) – Can be used to re-generate the data matrix X before each NMFk operation. When not used, a slice of the original X is taken, which is equivalent to serial decomposition. The generate_X_callback object should be an instance of a class that defines def __call__(original_indices), so that new_X, save_at_node = generate_X_callback(original_indices) can be called. The original_indices argument holds the indices of the samples (columns of the original X when clustering on H). Here save_at_node is a dictionary that can be used to save additional information in each node’s user_node_data variable. The default is None.
n_nodes (int, optional) – Number of HPC nodes. The default is 1.
verbose (bool, optional) – If True, it prints progress. The default is True.
random_identifiers (bool, optional) – If True, model will use randomly generated strings as the identifiers of the nodes. Otherwise, it will use the k for ancestry naming convention.
- Return type:
None.
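The callback contract and the per-depth parameter rule described above can be sketched as follows. This is only an illustration under stated assumptions: `SliceColumnsCallback` and `params_for_depth` are hypothetical names that are not part of TELF, the callback simply slices columns (which matches the default behaviour rather than adding anything new), and the only NMFk keys used are the ones named in the parameter description.

```python
import numpy as np


class SliceColumnsCallback:
    """Hypothetical callback that rebuilds X for a node from its sample indices."""

    def __init__(self, X):
        # Original data matrix with samples in columns (the cluster_on="H" layout).
        self.X = X

    def __call__(self, original_indices):
        # Keep only the samples that belong to the node being factorized.
        # A real callback could recompute features here instead of slicing.
        new_X = self.X[:, original_indices]
        # Extra information intended for the node's user_node_data.
        save_at_node = {"n_samples_seen": len(original_indices)}
        return new_X, save_at_node


def params_for_depth(nmfk_params, depth):
    """Hypothetical helper mirroring the documented nmfk_params rule."""
    # A single dict is shared by every depth; otherwise index by depth.
    if len(nmfk_params) == 1:
        return nmfk_params[0]
    return nmfk_params[depth]


X = np.arange(12, dtype=float).reshape(3, 4)
callback = SliceColumnsCallback(X)
new_X, save_at_node = callback(np.array([0, 2]))

# Three dicts for depth=2, following the documented example list.
nmfk_params = [
    {"predict_k": True},
    {"predict_k": True},
    {"predict_k": True, "save_output": True},
]

# Keyword arguments named after the constructor signature shown above.
hnmfk_kwargs = dict(
    nmfk_params=nmfk_params,
    cluster_on="H",
    depth=2,
    generate_X_callback=callback,
)
```

With TELF installed, these keyword arguments would be passed as `HNMFk(**hnmfk_kwargs)`.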
- fit(X, Ks, from_checkpoint=False, save_checkpoint=True)[source]#
Factorize the input matrix X for each given K value in Ks.
- Parameters:
X (np.ndarray or scipy.sparse._csr.csr_matrix) – Input matrix to be factorized.
Ks (list) – List of K values to try when factorizing the input matrix. Example: Ks=range(1, 10, 1).
from_checkpoint (bool, optional) – If True, it continues the process from the checkpoint. The default is False.
save_checkpoint (bool, optional) – If True, it saves checkpoints. The default is True.
- Return type:
None
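The `Ks` passed to `fit` sets the search range for the first NMFk run, while the constructor parameters `Ks_deep_min`, `Ks_deep_max` and `Ks_deep_step` describe the range used afterwards. A minimal sketch of how the two ranges could relate is given below. The helper `child_search_range` is hypothetical, and treating the upper limit as inclusive is an assumption; the library may build the range differently.

```python
def child_search_range(parent_k, Ks_deep_min=1, Ks_deep_max=None, Ks_deep_step=1):
    """Hypothetical helper: k values tried for nodes below the root."""
    # When Ks_deep_max is None, the documented upper limit is the parent's k.
    upper = parent_k if Ks_deep_max is None else Ks_deep_max
    # Assumption: the upper limit is inclusive; TELF may treat it as exclusive.
    return range(Ks_deep_min, upper + 1, Ks_deep_step)


# Search range for the root node, as in the documented example for Ks.
root_Ks = range(1, 10, 1)

# Search range for a child whose parent node selected k=4, with default settings.
child_Ks = child_search_range(parent_k=4)
```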
- get_node()[source]#
Graph iterator. Returns the current node.
This operation is online: only one node is kept in memory at a time.
- Returns:
data – Dictionary format of node.
- Return type:
dict
- go_to_children(idx: int)[source]#
Graph iterator. Goes to the child node specified by index.
This operation is online: only one node is kept in memory at a time.
- Parameters:
idx (int) – Child index.
- Returns:
data – Dictionary format of node.
- Return type:
dict
- go_to_node(name: str)[source]#
Graph iterator. Goes to node specified by name (node.node_name).
This operation is online: only one node is kept in memory at a time.
- Parameters:
name (str) – Name of the node
- Returns:
data – Dictionary format of node.
- Return type:
dict
- go_to_parent()[source]#
Graph iterator. Goes to the parent of current node.
This operation is online: only one node is kept in memory at a time.
- Returns:
data – Dictionary format of node.
- Return type:
dict
- go_to_root()[source]#
Graph iterator. Goes to root node.
This operation is online: only one node is kept in memory at a time.
- Returns:
data – Dictionary format of node.
- Return type:
dict
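Because the iterator methods move a single cursor through the graph, a full traversal has to step back up after visiting each child. The sketch below shows one way to collect all leaf nodes depth-first. `FakeTree` is a hypothetical in-memory stand-in, not the real `HNMFk` class, and the dictionary keys (`node_name`, `leaf`, `child_node_names`, `parent_node_name`) are assumed to match the `Node` attribute names.

```python
class FakeTree:
    """Hypothetical stand-in exposing the iterator methods documented above."""

    def __init__(self, nodes, root_name):
        self.nodes = nodes          # maps node_name to a node dictionary
        self.root_name = root_name
        self.current = root_name    # the cursor: name of the current node

    def get_node(self):
        return self.nodes[self.current]

    def go_to_root(self):
        self.current = self.root_name
        return self.get_node()

    def go_to_node(self, name):
        self.current = name
        return self.get_node()

    def go_to_children(self, idx):
        self.current = self.get_node()["child_node_names"][idx]
        return self.get_node()

    def go_to_parent(self):
        self.current = self.get_node()["parent_node_name"]
        return self.get_node()


def collect_leaves(model):
    """Depth-first walk that returns the names of all leaf nodes."""
    leaves = []

    def visit(node):
        if node["leaf"]:
            leaves.append(node["node_name"])
            return
        for idx in range(len(node["child_node_names"])):
            visit(model.go_to_children(idx))
            model.go_to_parent()  # step back up before the next sibling

    visit(model.go_to_root())
    return leaves


nodes = {
    "root": {"node_name": "root", "leaf": False,
             "child_node_names": ["a", "b"], "parent_node_name": None},
    "a": {"node_name": "a", "leaf": True,
          "child_node_names": [], "parent_node_name": "root"},
    "b": {"node_name": "b", "leaf": False,
          "child_node_names": ["c"], "parent_node_name": "root"},
    "c": {"node_name": "c", "leaf": True,
          "child_node_names": [], "parent_node_name": "b"},
}

model = FakeTree(nodes, "root")
leaf_names = collect_leaves(model)
```

The same `collect_leaves` function should work with any object that provides `go_to_root`, `go_to_children` and `go_to_parent` with the behaviour described above, provided the node dictionaries carry the assumed keys.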
- class TELF.factorization.HNMFk.Node(node_name: str, depth: int, parent_topic: int, parent_node_k: int, W: ndarray, H: ndarray, k: int, parent_node_name: str, child_node_names: list, original_indices: ndarray, num_samples: int, leaf: bool, user_node_data: dict, cluster_indices_in_parent: list, node_save_path: str, parent_node_save_path: str, parent_node_factors_path: str, exception: bool, signature: array, probabilities: array, centroids: array, factors_path: str)[source]#
Bases:
object