pyCP_APR.applications package#
Submodules#
pyCP_APR.applications.ktensor_utils module#
ktensor_utils.py contains the utility functions for KRUSKAL tensor M.
@author: Maksim Ekin Eren
- pyCP_APR.applications.ktensor_utils.get_X_hat(components, indices)[source]#
Calculate X hat from KRUSKAL tensor M, given the non-zero indices.
components: KRUSKAL tensor components
indices: non-zero coordinates
- Parameters:
components (dict) -- KRUSKAL Tensor M in dict format.
indices (array) -- Array of indices in X hat.
- Returns:
lambdas -- Array of lambdas in X calculated from M using the indices.
- Return type:
array
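As an illustration of the calculation above, the value of X hat at one coordinate is the sum, over components, of the component weight times the product of the matching factor-matrix entries. The sketch below is a minimal plain-Python version under stated assumptions. The names kruskal_values, weights and factors are hypothetical, and the tensor is passed as a weight list plus one factor matrix per mode, not in the dict layout that get_X_hat expects, which is not specified on this page.

```python
from math import prod


def kruskal_values(weights, factors, indices):
    """Evaluate a Kruskal tensor at the given coordinates.

    weights: list of R component weights (hypothetical layout).
    factors: one matrix per mode; factors[n][i][r] is row i, component r.
    indices: list of coordinate tuples with one entry per mode.
    """
    rank = len(weights)
    values = []
    for coord in indices:
        total = 0.0
        for r in range(rank):
            # Product of the r-th column entries selected by this coordinate.
            total += weights[r] * prod(
                factors[n][i][r] for n, i in enumerate(coord)
            )
        values.append(total)
    return values


# Rank-2 model of a 2 x 3 matrix; the numbers are made up for illustration.
weights = [2.0, 1.0]
factors = [
    [[0.5, 0.0], [0.5, 1.0]],               # mode 0: 2 rows, 2 components
    [[1.0, 0.0], [0.0, 0.5], [0.0, 0.5]],   # mode 1: 3 rows, 2 components
]
indices = [(0, 0), (1, 1), (1, 2)]
x_hat = kruskal_values(weights, factors, indices)
print(x_hat)
```

Only the listed coordinates are evaluated, so the cost grows with the number of non-zeros and the rank, not with the size of the dense tensor.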
pyCP_APR.applications.sptensor_utils module#
sptensor_utils.py contains the utility functions for tensor X.
@author: Maksim Ekin Eren
- pyCP_APR.applications.sptensor_utils.get_X_dim_size(X)[source]#
Returns the shape of X, i.e. the size of each mode.
- Parameters:
X (array) -- Tensor X in COO format, i.e. X holds the coordinates of the non-zero values.
- Returns:
size -- Tensor X shape.
- Return type:
int
- pyCP_APR.applications.sptensor_utils.get_X_dimensions(X)[source]#
Returns the number of dimensions that tensor X has.
- Parameters:
X (array) -- Tensor X in COO format, i.e. X holds the coordinates of the non-zero values.
- Returns:
dimensions -- Number of dimensions that X has.
- Return type:
int
- pyCP_APR.applications.sptensor_utils.get_X_num_non_zeros(X)[source]#
Calculates the number of non-zero elements in X.
- Parameters:
X (array) -- Tensor X in COO format, i.e. X holds the coordinates of the non-zero values.
- Returns:
non-zeros -- Number of non-zeros in X.
- Return type:
int
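The three quantities described in this module can all be read off a COO coordinate list. The sketch below shows one way to do so in plain Python. The function name coo_summary is hypothetical, and it assumes zero-based indices and that the largest index of every mode appears among the non-zeros. The library functions may infer the shape differently.

```python
def coo_summary(coords):
    """Summarize a sparse tensor from its non-zero coordinates only.

    coords: list of coordinate tuples, one tuple per non-zero entry.
    """
    n_modes = len(coords[0])
    # Assumption: zero-based indices, so a mode's size is its largest index + 1.
    shape = tuple(max(c[n] for c in coords) + 1 for n in range(n_modes))
    return {"shape": shape, "dimensions": n_modes, "non_zeros": len(coords)}


coords = [(0, 0, 1), (2, 1, 0), (1, 3, 1), (2, 0, 0)]
summary = coo_summary(coords)
print(summary)  # {'shape': (3, 4, 2), 'dimensions': 3, 'non_zeros': 4}
```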
pyCP_APR.applications.stat_utils module#
stat_utils.py contains the tensor statistic utilities.
@author: Maksim Ekin Eren
- pyCP_APR.applications.stat_utils.mrr_fuse_ranks(x, weights=None, axis=0, k=60.0, y=None)[source]#
Calculates Mean Reciprocal Rank (MRR).
Under development.
- Parameters:
x (array) -- Tensor x.
weights (array, optional) -- Array of weights. The default is None.
axis (int, optional) -- Dimension number. The default is 0.
k (int, optional) -- Top k. The default is 60.
y (array, optional) -- Labels. The default is None.
- Returns:
result -- MRR score.
- Return type:
float
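For reference, the textbook definition of Mean Reciprocal Rank is sketched below: for each query, take the reciprocal of the rank of the first relevant item, then average over queries. This is not a reimplementation of mrr_fuse_ranks, which is marked as under development. The function name mean_reciprocal_rank and the input layout are assumptions, and the weights, axis and k parameters are not modeled.

```python
def mean_reciprocal_rank(ranked_labels):
    """Textbook MRR.

    ranked_labels: one list per query holding 0/1 relevance labels,
    already sorted from best-scored to worst-scored item.
    """
    total = 0.0
    for labels in ranked_labels:
        for position, label in enumerate(labels, start=1):
            if label:
                total += 1.0 / position  # reciprocal rank of the first hit
                break                    # later hits do not count
    return total / len(ranked_labels)


queries = [
    [0, 1, 0],  # first hit at rank 2
    [1, 0, 0],  # first hit at rank 1
    [0, 0, 0],  # no hit, contributes 0
    [0, 0, 1],  # first hit at rank 3
]
mrr = mean_reciprocal_rank(queries)
print(mrr)
```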
pyCP_APR.applications.tensor_anomaly_detection module#
tensor_anomaly_detection.py performs p-value scoring over the tensor decomposition, i.e. the KRUSKAL tensor M. The calculated p-values are used to detect anomalies.
This method was introduced by Eren et al. in [1].
CyberToaster, Project 1, Summer 2020
Los Alamos National Laboratory
Anomaly detection using Tensors and their Decompositions.
Student: Maksim E. Eren
Primary Mentor: Juston Moore
Secondary Mentors: Boian Alexandrov and Patrick Avery
References
[1] M. E. Eren, J. S. Moore and B. S. Alexandrov, "Multi-Dimensional Anomalous Entity Detection via Poisson Tensor Factorization," 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020, pp. 1-6, doi: 10.1109/ISI49825.2020.9280524.
@author: Maksim Ekin Eren, Juston S. Moore
- class pyCP_APR.applications.tensor_anomaly_detection.PoissonTensorAnomaly(dimensions={}, weights=[], objective='p_value', lambda_method='single_tensor', p_value_fusion_index=[0], ensemble_dimensions={}, ensemble_weights=[], ensemble_significance=[0.1, 0.9], mode_weights=[1], ignore_dimensions_indx=[])[source]#
Bases:
object
Anomaly detection using Poisson Distribution and Canonical Polyadic (CP) with Alternating Poisson Regression tensor decomposition (CP-APR).
Components of the CP-APR decomposition are used to calculate the p-value of each instance through the Poisson cumulative distribution function (cdf).
p-values are then used to determine if the event is an anomaly. Lower p-values are more anomalous.
v2: Utilizes NumPy vectorization for the calculations.
References:
1) Chi, Eric C. and Tamara G. Kolda. “On Tensors, Sparsity, and Nonnegative Factorizations.” SIAM J. Matrix Anal. Appl. 33 (2012): 1272-1299.
2) Turcotte, Melissa J. M. et al. “Unified Host and Network Data Set.” ArXiv abs/1708.07518 (2017): n. pag.
3) Wikipedia contributors. "Poisson distribution." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 29 Jun. 2020. Web. 6 Jul. 2020.
Initialize the anomaly detector class.
- Parameters:
dimensions (dict, required) -- Components of the KRUSKAL Tensor Decomposition. The default is dict. Each element is a dimension (factors of a component) and each dimension has (n x K) elements for that factor for rank K.
weights (list, required) -- Weights of each component of parameter dimensions. The default is list.
objective (string, optional) -- What to calculate. Options: p_value, p_value_fusion_harmonic, p_value_fusion_harmonic_observed, p_value_fusion_chi2, p_value_fusion_chi2_observed, p_value_fusion_arithmetic, log_likelihood. The default is 'p_value'.
lambda_method (string, optional) -- How to calculate lambda. If 'single_tensor', the single ktensor passed in dimensions is used when calculating lambda. If 'ensemble', two ktensors are used: parameter dimensions is a ktensor of rank K>=1 with lambda weight ensemble_significance[0], and parameter ensemble_dimensions is a ktensor of rank K>1 with lambda weight ensemble_significance[1]. The default is 'single_tensor'.
p_value_fusion_index (list) -- Index to fix when calculating the p-value fusions. Only used when objective is set to one of the p_value_fusion options. The default is [0].
ensemble_dimensions (dict, optional) -- Components of the KRUSKAL Tensor Decomposition. Each element is a dimension (factors of a component) and each dimension has (n x K) elements for that factor for rank K. This is the second ktensor passed. It is used if lambda_method is set to 'ensemble'. Its lambda weight is ensemble_significance[1]. The default is dict().
ensemble_weights (list, optional) -- Weights of each component of ensemble_dimensions. The default is list(). Only used if lambda_method is 'ensemble'.
ensemble_significance (list, optional) -- Lambda weight of each ktensor when using the 'ensemble' lambda_method. Weight of dimensions: ensemble_significance[0]. Weight of ensemble_dimensions: ensemble_significance[1]. The default is [0.1, 0.9].
mode_weights (list, optional) -- Weight of each dimension. The default is [1].
ignore_dimensions_indx (list, optional) -- Dimensions in the latent factors to ignore when calculating the lambdas. The default is [].
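To make the scoring idea concrete, the sketch below computes a Poisson p-value for an observed count given a lambda, and blends two lambdas in the way the 'ensemble' option is described. It is a plain-Python illustration under stated assumptions, not the class's code: the function names are hypothetical, the p-value is taken to be the upper-tail probability P(X >= count), and the ensemble is taken to be a weighted sum of the two lambdas.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = exp(-lam)       # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) obtained from P(X = i - 1)
        total += term
    return min(total, 1.0)


def poisson_p_value(count, lam):
    """Assumed definition: upper-tail probability P(X >= count)."""
    return 1.0 - poisson_cdf(count - 1, lam)


def ensemble_lambda(lam_a, lam_b, significance=(0.1, 0.9)):
    """Assumed blend: weighted sum of the lambdas from two ktensors."""
    return significance[0] * lam_a + significance[1] * lam_b


lam = ensemble_lambda(1.0, 0.4)     # hypothetical lambdas from two models
p_common = poisson_p_value(1, lam)  # a count of 1 is unsurprising here
p_rare = poisson_p_value(6, lam)    # a count of 6 is far in the tail
print(lam, p_common, p_rare)
```

Under these assumptions a count well above lambda yields a small p-value, which matches the statement that lower p-values are more anomalous.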
- predict(coords, values, from_matlab=False)[source]#
Get the scores using the KRUSKAL components given the non-zero coordinates and values and the objective.
- Parameters:
coords (list of list) -- Coordinates of the non-zero elements within the sparse tensor.
values (list) -- Non-zero values that are in the sparse tensor.
from_matlab (bool) --
Set to True to subtract 1 from the coordinates, since MATLAB indexing starts at 1.
The default is False.
- Returns:
prediction -- Dictionary of calculated objective.
- Return type:
dict
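The fusion objectives listed for the class combine several p-values into one score. The sketch below shows one plausible reading in plain Python: p-values are grouped by the coordinate of a fixed mode and each group is combined with a harmonic mean, an arithmetic mean, or Fisher's chi-square method. All function names are hypothetical, and whether the class groups and combines p-values exactly this way is an assumption.

```python
from collections import defaultdict
from math import exp, log


def harmonic_mean(p_values):
    """Harmonic mean; requires strictly positive p-values."""
    return len(p_values) / sum(1.0 / p for p in p_values)


def arithmetic_mean(p_values):
    return sum(p_values) / len(p_values)


def fisher_chi2(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2n degrees of freedom."""
    n = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # test statistic divided by 2
    # Closed-form chi-square survival function for an even number (2n) of
    # degrees of freedom: exp(-s) * sum_{i<n} s**i / i!, with s = half_stat.
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def fuse_by_mode(coords, p_values, mode, combine):
    """Group p-values by the index of one mode and combine each group."""
    groups = defaultdict(list)
    for coord, p in zip(coords, p_values):
        groups[coord[mode]].append(p)
    return {key: combine(ps) for key, ps in groups.items()}


coords = [(0, 0), (0, 1), (1, 0)]   # made-up coordinates
p_values = [0.1, 0.5, 0.8]          # made-up p-values
fused_harmonic = fuse_by_mode(coords, p_values, mode=0, combine=harmonic_mean)
fused_fisher = fuse_by_mode(coords, p_values, mode=0, combine=fisher_chi2)
print(fused_harmonic, fused_fisher)
```

With a single p-value in a group, all three combiners return that p-value unchanged, which is a quick sanity check on the formulas.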
pyCP_APR.applications.tensor_anomaly_detection_v2 module#
tensor_anomaly_detection_v2.py performs p-value scoring over the tensor decomposition, i.e. the KRUSKAL tensor M. The calculated p-values are used to detect anomalies.
This method was introduced by Eren et al. in [1].
The second version performs faster calculation of the inner products of the components to extract the lambdas.
This version also provides dimension fusion methods for lambda calculations.
References
[1] M. E. Eren, J. S. Moore and B. S. Alexandrov, "Multi-Dimensional Anomalous Entity Detection via Poisson Tensor Factorization," 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020, pp. 1-6, doi: 10.1109/ISI49825.2020.9280524.
@author: Juston S. Moore, Maksim Ekin Eren
- class pyCP_APR.applications.tensor_anomaly_detection_v2.PoissonTensorAnomaly_v2(components, indicies, tensor_weights=[1])[source]#
Bases:
object
Initialize the anomaly detection class.
Calculates the lambdas, and obtains tensor information.
- Parameters:
components (dict) -- KRUSKAL Tensor M in dict format.
indicies (array) -- Non-zero coordinates.
tensor_weights (list, optional) --
Weight of each lambda for the tensors.
Used only when ensemble of tensors used in lambda calculations. The default is [1].
- get_dimension_fusion_scores(axis_map, y_true)[source]#
Calculates the prediction scores given fused lambdas and the true labels y.
Fusion is performed for the dimension in axis_map.
- Parameters:
axis_map (list) -- Which dimensions to fuse.
y_true (list) -- List of true labels for each entry.
- Returns:
df -- Fusion scores.
- Return type:
Pandas DataFrame
Module contents#
2021. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.