pyCP_APR.CP_APR API#
pyCP_APR.pyCP_APR module#
CP_APR(**parameters) -- Initializes the pyCP_APR.CP_APR class.
fit(**parameters) -- Takes the decomposition of a sparse or dense tensor X and returns the KRUSKAL tensor M.
transform(indicies, ...) -- Given the sparse test tensor entries (or indices, i.e. coordinates of non-zero values), calculates the Poisson lambda values.
predict_probas(y, axis_map) -- Calculates the probabilities of the test tensor entries.
predict_scores(**parameters) -- Classifies anomalies after fitting the tensor.
get_params() -- Returns the model parameters in a dictionary where each key is a model variable name and the value is its current value.
get_score() -- Returns the objective value of the model.
get_prediction() -- Returns the predictions.
set_params(**parameters) -- Sets the model parameters.
pyCP_APR.py is the Scikit-learn-like API for interacting with the CP-APR algorithm. pyCP_APR.CP_APR wraps both the Numpy and PyTorch backends. pyCP_APR.CP_APR also includes API calls for anomaly detection utilities. Sparse tensor entries are scored by calculating p-values over the fitted model, where lower p-value scores are an indicator of anomaly.
The fitted model (or factorized tensor, i.e. the KRUSKAL tensor M) describes the normal or expected behavior seen at training time, which follows the Poisson distribution. We assume that the entries of the tensor in the test set are drawn from the same distribution as the factorized training tensor M. Using M, we calculate how likely each entry of the test tensor is to occur given what was expected. This methodology was introduced by Eren et al. in [1].
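As an illustration of this scoring step, the sketch below computes Poisson p-values from a set of lambda values using SciPy. This is a minimal sketch of the idea only: the lambda values and counts here are made-up placeholders, not output of the library, and the exact p-value definition used internally may differ.

import numpy as np
from scipy.stats import poisson

# Hypothetical Poisson lambda values for three test entries (as transform()
# would produce) and their observed counts -- illustrative numbers only.
lambdas = np.array([3796.4, 0.02, 290.3])
counts = np.array([3801, 5, 288])

# P(X >= count | lambda): poisson.sf(k, mu) gives P(X > k), so use count - 1.
p_values = poisson.sf(counts - 1, lambdas)
print(p_values)  # low p-values flag unlikely (anomalous) entries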
Some code comments are borrowed from the original implementation of CP-APR in [2-5].
References
[1] M. E. Eren, J. S. Moore and B. S. Alexandrov, "Multi-Dimensional Anomalous Entity Detection via Poisson Tensor Factorization," 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020, pp. 1-6, doi: 10.1109/ISI49825.2020.9280524.
[2] General software, latest release: Brett W. Bader, Tamara G. Kolda and others, Tensor Toolbox for MATLAB, Version 3.2.1, www.tensortoolbox.org, April 5, 2021.
[3] Dense tensors: B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping, ACM Trans. Mathematical Software, 32(4):635-653, 2006, http://dx.doi.org/10.1145/1186785.1186794.
[4] Sparse, Kruskal, and Tucker tensors: B. W. Bader and T. G. Kolda, Efficient MATLAB Computations with Sparse and Factored Tensors, SIAM J. Scientific Computing, 30(1):205-231, 2007, http://dx.doi.org/10.1137/060676489.
[5] Chi, E.C. and Kolda, T.G., 2012. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4), pp.1272-1299.
@author Maksim Ekin Eren
- class pyCP_APR.pyCP_APR.CP_APR(**parameters)[source]#
Bases: object
Initializes the pyCP_APR.CP_APR class. The pyCP_APR.CP_APR class is the wrapper for the CP-APR algorithm's Python implementation with both Numpy and PyTorch backends.
- Parameters:
method (string, optional) --
Specifies which backend to use when running CP-APR, and sets the model (i.e. pyCP_APR.CP_APR.model) accordingly.
method='torch' or method='pytorch' will use PyTorch and enable GPU utilization. method='numpy' will use the Numpy backend.
Default is method='torch'.
Warning
method='torch' or method='pytorch' only supports sparse tensors in COO format. method='numpy' supports both sparse (COO format) and dense tensors.
epsilon (float, optional) -- Prevents zero division. Default is 1e-10.
kappa (float, optional) -- Fix slackness level. Default is 1e-2.
kappa_tol (float, optional) -- Tolerance on slackness level. Default is 1e-10.
max_inner_iters (int, optional) -- Number of inner iterations per epoch. Default is 10.
follow_M (bool, optional) -- Saves M on each iteration if True. The default is False.
n_iters (int, optional) -- Number of iterations during optimization, i.e. epochs. Default is 1000.
print_inner_itn (int, optional) -- Print every n inner iterations. Does not print if 0. Default is 0.
verbose (int, optional) -- Print every n epochs (out of n_iters). Does not print if 0. Default is 10.
stoptime (float, optional) -- Number of seconds before early stopping. Default is 1e6.
tol (float, optional) -- KKT violations tolerance. Default is 1e-4.
random_state (int, optional) -- Random seed for the initial M (KRUSKAL tensor). Default is 42.
device (string, optional) --
Specifies CPU or GPU utilization for factorizing the tensor.
device='cpu' to use PyTorch with CPU. device='gpu' to use PyTorch with GPU, only if a CUDA device is available.
Default is device='cpu'.
Warning
Only used when method='torch' or method='pytorch'.
device_num (int or string, optional) --
Which GPU to use to compute the KRUSKAL tensor M.
Default is device_num=0.
Warning
Only used when method='torch' and device='gpu'.
return_type (string, optional) --
The return type for the final KRUSKAL tensor M.
return_type='torch' keeps the results as torch tensors. return_type='numpy' converts them to Numpy arrays and transfers the tensor back to CPU if a GPU was used.
Default is return_type='numpy'.
Warning
Only used when method='torch' or method='pytorch'.
dtype (string, optional) --
Type to be used in torch tensors.
Default is 'torch.DoubleTensor' when device='cpu', and 'torch.cuda.DoubleTensor' when device='gpu'.
Note
Example
Using the PyTorch backend on GPU 0:
from pyCP_APR.pyCP_APR import CP_APR

# CP-APR object with PyTorch backend on a GPU.
# Transfer the latent factors back to Numpy arrays.
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1,
                device='gpu', device_num=0, return_type='numpy')
Using the Numpy backend:
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, method='numpy')
- fit(**parameters)[source]#
Takes the decomposition of a sparse or dense tensor X and returns the KRUSKAL tensor M.
Here M holds the latent factors and the weight of each of the R (rank) components.
If a list of two ranks is passed, the tensor is factorized once for each of the two ranks.
The two factorized tensors M in this case will follow the weighted lambda calculations during prediction.
- Parameters:
tensor (PyTorch.sparse or Numpy array) --
Original dense or sparse tensor X.
Can be used when Type='sptensor'. In this case, tensor needs to be in the PyTorch sparse tensor format.
Or use with Type='tensor' and pass tensor as a dense Numpy array.
Warning
Note that the PyTorch backend only supports Type='sptensor'.
coords (Numpy array) --
Array of non-zero coordinates for sparse tensor X, in COO format.
Each entry in this array is a coordinate of a non-zero value in the original tensor X.
Warning
Used when Type='sptensor', and when the tensor parameter is not passed. len(coords) is the total number of entries in X, and len(coords[0]) gives the number of dimensions X has.
values (Numpy array) --
List of non-zero values corresponding to each of the non-zero coordinates (coords). Array of non-zero tensor entries, in COO format.
Warning
Used when Type='sptensor' and the tensor parameter is not passed. The length of values must match the length of coords.
rank (int or list) --
Tensor rank, or a list of ranks for two tensors.
The tensor rank determines the number of components.
A list of ranks allows weighted prediction between the two latent factors in the KRUSKAL tensor M.
Pass a single integer or a list of length two.
The default is rank=2.
Minit (string or dictionary of latent factors) --
Initial value of the latent factors.
If Minit='random', the initial factors are drawn randomly from a uniform distribution between 0 and 1.
Else, pass a dictionary where each key is the mode number and the value is an array of size d x r, where d is the number of elements in that dimension and r is the rank.
The default is Minit='random'.
Note
Example of creating an initial M for a 3-dimensional tensor of shape 5x5x5 for a rank 4 decomposition:
import numpy as np

num_dimensions = 3
tensor_shape = [5, 5, 5]
rank = 4

M_init = {"Factors": {}, "Weights": [1, 1, 1]}
for d in range(num_dimensions):
    M_init["Factors"][str(d)] = np.random.uniform(low=0, high=1, size=(tensor_shape[d], rank))

M_init["Factors"]
{
 '0': array([[0.821161  , 0.419537  , 0.62692165, 0.06294969],
             [0.02032657, 0.88625546, 0.74128504, 0.71855629],
             [0.70760879, 0.83813636, 0.35128158, 0.94442011],
             [0.35780608, 0.83703369, 0.84602297, 0.93760842],
             [0.00746915, 0.05974905, 0.49097518, 0.60615737]]),
 '1': array([[0.61902526, 0.78453503, 0.05596952, 0.69149084],
             [0.56300552, 0.82418509, 0.04278352, 0.25716303],
             [0.66221183, 0.13888761, 0.92502242, 0.57817265],
             [0.31738958, 0.87061048, 0.64170398, 0.62236073],
             [0.9110603 , 0.5133135 , 0.89232955, 0.09881775]]),
 '2': array([[0.0580065 , 0.82367217, 0.07616138, 0.93873983],
             [0.89247679, 0.41388867, 0.82089524, 0.10293565],
             [0.13540868, 0.09809637, 0.10844113, 0.90405324],
             [0.91167498, 0.67068632, 0.51705956, 0.82211517],
             [0.80942828, 0.08450466, 0.6306868 , 0.78132797]])
}
Type (string) --
Type of tensor (i.e. sparse or dense).
Use Type='sptensor' for sparse, and Type='tensor' for dense tensors.
'sptensor' can be used with method='torch' and method='numpy'.
If 'sptensor' is used, pass the list of non-zero coordinates using the coords parameter and the corresponding list of non-zero elements with the values parameter. This is the COO representation of X.
'sptensor' can also be used with the PyTorch sparse format. In that case, pass the tensor X in torch.sparse format using the tensor parameter.
'tensor' can be used with method='numpy'. In this case, pass the tensor X using the tensor parameter.
The default is Type='sptensor'.
- Returns:
KRUSKAL tensor M -- KRUSKAL tensor M is returned in dict format.
The latent factors can be found with the key 'Factors'.
The weight of each component can be found with the key 'Weights'.
- Return type:
dict
Note
Example
In the example below, a sparse tensor X in COO format is decomposed using a GPU:
from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name="TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1,
                method='torch', device='gpu', device_num=0)

# Factorize the tensor for ranks 1 and 4
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1, 4])
The above example takes the tensor decomposition of X for ranks 1 and 4. Below is an example showing a single-rank decomposition:
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=4)
An example of the factorized X, i.e. M (the KRUSKAL tensor), is shown below. This example M is rank 2 and has 3 dimensions:
{
 'Factors': {
  '0': array([[5.88838457e-51, 2.13058370e-01],
              [3.23364716e-04, 1.34610100e-01],
              [2.05013230e-01, 7.12928005e-37],
              [1.48424405e-01, 0.00000000e+00],
              [9.76200219e-02, 1.48477484e-01],
              [2.51566211e-02, 2.06908903e-01],
              [1.43573934e-01, 3.34319439e-88],
              [2.61925420e-01, 4.76257924e-33],
              [9.37106506e-02, 1.87295857e-01],
              [2.42523537e-02, 1.09649287e-01]]),
  '1': array([[2.31775360e-241, 5.03672967e-002],
              [7.79309622e-002, 1.00144467e-137],
              [0.00000000e+000, 7.84481789e-002],
              [1.23105143e-001, 9.77480876e-002],
              [3.30736653e-002, 5.64828345e-002],
              [1.56285154e-078, 9.36029407e-003],
              [4.85047483e-002, 0.00000000e+000],
              [3.10430389e-002, 0.00000000e+000],
              [2.39290092e-002, 3.29838934e-002],
              [0.00000000e+000, 0.00000000e+000],
              [6.75832826e-002, 0.00000000e+000]]),
  '2': array([[2.71626813e-002, 0.00000000e+000],
              [1.68530286e-003, 4.18040234e-002],
              [0.00000000e+000, 2.00577503e-002],
              [0.00000000e+000, 5.34873341e-002],
              [0.00000000e+000, 2.89723060e-002],
              [4.00972915e-002, 4.85187813e-161],
              [4.49477703e-002, 0.00000000e+000],
              [1.00243229e-002, 0.00000000e+000],
              [0.00000000e+000, 3.69954061e-002],
              [2.29589330e-002, 1.33718335e-002],
              [3.23365254e-002, 0.00000000e+000]])},
 'Weights': array([3092.47820339, 2243.52179661])
}
Below is an example of how the torch.sparse format can be used for the tensor X:
import torch

i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2],
                      [2, 0, 1]])
v = torch.FloatTensor([3, 4, 5])
X = torch.sparse.FloatTensor(i, v, torch.Size([4, 4, 4]))
from pyCP_APR import CP_APR

cp_apr = CP_APR(n_iters=100, verbose=10, device='gpu')
result = cp_apr.fit(tensor=X, rank=30)
Using TITAN RTX
CP-APR (MU): Iter=1, Inner Iter=30, KKT Violation=0.425532, obj=4.887921, nViolations=0
Exiting because all subproblems reached KKT tol.
===========================================
 Final log-likelihood = 4.888204
 Final least squares fit = 0.999995
 Final KKT violation = 0.000007
 Total inner iterations = 37
 Total execution time = 0.2447 seconds
Converting the latent factors to Numpy arrays.
Below is an example of using a dense Numpy array as the tensor X:
import numpy as np
from pyCP_APR import CP_APR

# X has the shape 10 x 30 x 40
X = np.arange(1, 12001).reshape([10, 30, 40])

cp_apr = CP_APR(n_iters=100, verbose=10, method='numpy')
result = cp_apr.fit(tensor=X, Type='tensor', rank=2)
CP-APR (MU): Iter=1, Inner Iter=30, KKT Violation=0.244501, obj=534739600.517348, nViolations=0
Exiting because all subproblems reached KKT tol.
===========================================
 Final log-likelihood = 534841753.347965
 Final least squares fit = 0.971281
 Final KKT violation = 0.000091
 Total inner iterations = 161
 Total execution time = 1.1126 seconds
result.keys()
dict_keys(['Factors', 'Weights'])
M = result['Factors']
Gamma = result['Weights']

M_0 = M['0']
Gamma_0 = Gamma[0]

print('Component 0:', M_0, 'Gamma 0:', Gamma_0)
Component 0:
[[0.01002107 0.0099889 ]
 [0.03001639 0.02999136]
 [0.05001171 0.04999383]
 [0.07000709 0.0699962 ]
 [0.09000233 0.08999878]
 [0.10999784 0.11000099]
 [0.12999289 0.13000382]
 [0.14998825 0.15000623]
 [0.16998359 0.17000867]
 [0.18997884 0.19001122]]
Gamma 0: 41633867.33685632
- get_params()[source]#
The function call that returns the model parameters in a dictionary, where each key is a model variable name and the value is its current value. Here, model refers to the backend used during factorization (i.e. pyCP_APR.torch_cp.CP_APR_Torch or pyCP_APR.numpy_cp.CP_APR).
Note
Model parameters can also be accessed with a call directly to the model (i.e. pyCP_APR.CP_APR.model).
Note
Example
from pyCP_APR import CP_APR

cp_apr = CP_APR(n_iters=10)
cp_apr.get_params()
{
 'verbose': 10,
 'print_inner_itn': 0,
 'start_time': -1,
 'final_iter': -1,
 'dtype': 'torch.DoubleTensor',
 'device': 'cpu',
 'device_num': '0',
 'return_type': 'numpy',
 'X': None,
 'M': None,
 'tol': 0.0001,
 'stoptime': 1000000.0,
 'exec_time': -1,
 'n_iters': 10,
 'max_inner_iters': 10,
 'random_state': 42,
 'kappa': 0.01,
 'kappa_tol': 1e-10,
 'kktViolations': tensor([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]),
 'nInnerIters': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'times': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'logLikelihoods': tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]),
 'epsilon': tensor(1.0000e-10),
 'obj': 0
}
The model parameters of a fitted model may look like the following:
{
 'verbose': 1,
 'print_inner_itn': 0,
 'start_time': 1621842138.9178197,
 'final_iter': 10,
 'dtype': 'torch.cuda.DoubleTensor',
 'device': device(type='cuda', index=0),
 'device_num': '0',
 'return_type': 'numpy',
 'X': <pyCP_APR.torch_cp.sptensor_Torch.SP_TENSOR at 0x7f759dde0eb0>,
 'M': <pyCP_APR.torch_cp.ktensor_Torch.K_TENSOR at 0x7f759dde0c40>,
 'tol': 0.0001,
 'stoptime': 1000000.0,
 'exec_time': 0.41489696502685547,
 'n_iters': 10,
 'max_inner_iters': 10,
 'random_state': 42,
 'kappa': 0.01,
 'kappa_tol': 1e-10,
 'kktViolations': array([  0.90469806, 170.76687748, 439.2454979 , 171.40815408,
                          32.90351131,  49.3060232 , 112.32568973, 156.01606934,
                          30.25656979,  24.61207396]),
 'nInnerIters': array([38., 34., 34., 33., 33., 33., 33., 33., 33., 33.]),
 'times': array([0.05231047, 0.09365845, 0.14037013, 0.18132734, 0.22193289,
                 0.26159644, 0.29836512, 0.33511233, 0.37501359, 0.41115212]),
 'logLikelihoods': array([9.59298322e+12, 1.02443325e+13, 1.03623011e+13, 1.04159710e+13,
                          1.04551136e+13, 1.04828857e+13, 1.05091620e+13, 1.05297763e+13,
                          1.05492323e+13, 1.05635448e+13]),
 'epsilon': array(1.e-10),
 'obj': 10563544752580.432
}
- get_prediction()[source]#
The function call that returns the predictions.
The model must already be fitted using pyCP_APR.CP_APR.fit(), and the links must already be predicted using pyCP_APR.CP_APR.predict_scores().
Warning
The prediction is returned as a dictionary with the keys 'objective' and 'lambda'.
The 'objective' key depends on the objective parameter used when predicting the scores in pyCP_APR.CP_APR.predict_scores().
- Returns:
predictions -- The prediction contents are based on the objective parameter used during pyCP_APR.CP_APR.predict_scores().
If objective='p_value', returns {'p_value': array, 'lambda': array}.
If objective='log_likelihood', returns {'log_likelihood': array, 'lambda': array}.
- Return type:
dict
Note
Example
cp_apr.get_prediction()
{
 'p_value': array([1., 1., 1., ..., 1., 1., 1.]),
 'lambda': array([ 3796.43165171,  2315.69440274,  1001.22377495, ...,
                    290.35037293,  2952.72557334, 30309.82134089])
}
- get_score()[source]#
The function call that returns the objective value of the model.
The score is pyCP_APR.CP_APR.model.obj.
If an ensemble of tensors is trained (see pyCP_APR.CP_APR.fit()), a list of scores is provided instead.
Note that the model has to already be fit.
- Returns:
score -- Model fit score.
- Return type:
float
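A minimal usage sketch, assuming a model already fitted as in the fit() examples above:

score = cp_apr.get_score()  # objective value of the fitted model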
- predict_probas(y, axis_map=None)[source]#
Calculates the probabilities of the test tensor entries.
Then returns the prediction scores with the ROC-AUC and PR-AUC metrics.
Fusion is performed on the dimensions indicated by axis_map. Dimension fusion uses the Harmonic Mean.
This function is in the beta stage.
Note
Anomaly detection can be performed either using predict_scores(), or using transform() and predict_probas(). While using transform() and predict_probas() yields faster computation time and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.
Warning
In order to use the predict_probas() function, the following has to be done first:
The tensor has to be factorized using the fit() function to extract the KRUSKAL tensor M.
After fit(), the Poisson lambda values for the test tensor have to be calculated using the transform() function.
- Parameters:
y (list or array) -- Labels for each entry in the sparse test tensor.
axis_map (OrderedDict, optional) --
If fusing dimensions, a list of dimension numbers can be passed as an OrderedDict to identify which dimensions to fuse.
Each tuple in the ordered dictionary map has 2 entries, where the first entry is the dimension name as a string, and the second entry is the dimension number(s) as a list. The default is axis_map=None.
- Returns:
prediction scores -- If not fusing the dimensions over the transformed tensor, returns a dictionary with the prediction scores: {"roc_auc": float, "pr_auc": float}.
If dimensions are being fused using the axis_map parameter, returns a Pandas DataFrame with the prediction scores. The columns of the returned DataFrame in this case are 'fusion', 'method', 'metric', and 'score'.
- Return type:
dict or Pandas DataFrame
Note
For a four dimensional tensor with dimension names U x S x D x s, the axis_map to fuse over the first dimension, and over the first and second dimensions, would be axis_map=OrderedDict((('U', [0]), ('US', [0, 1]))). Here 'U' and 'US' are the dimension names, and [0] and [0, 1] are the dimension numbers, respectively.
Another example is illustrated below.
from collections import OrderedDict

axis_map = OrderedDict((('U', [0]), ('S', [1]), ('D', [2]),
                        ('US', [0, 1]), ('UD', [0, 2]), ('SD', [1, 2])))
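For intuition on the Harmonic Mean fusion, the hedged sketch below fuses hypothetical per-entry p-values down to one score per value of a fused dimension. This only illustrates the idea; it is not the library's internal implementation, and the p-values and indices are made up.

import numpy as np
from scipy.stats import hmean

# Hypothetical p-values for 5 test entries, and the index each entry has
# along the fused dimension (e.g. dimension 'U').
p_values = np.array([0.90, 0.80, 0.01, 0.95, 0.02])
dim_index = np.array([0, 0, 1, 1, 1])

# One harmonic-mean score per unique index of the fused dimension.
fused = {i: hmean(p_values[dim_index == i]) for i in np.unique(dim_index)}
print(fused)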
- predict_scores(**parameters)[source]#
The function call that can be used for classification of anomalies after fitting the tensor. The model will use the trained latent factors to generate the Poisson lambda scores corresponding to the given new coordinate.
These lambda values are then used to calculate the p-values for classification of the entries.
The lower p-value here is an indicator of an anomaly.
Since the learned or expected behavior from training time is represented by the KRUSKAL tensor M, we can calculate how likely a new index is to occur in M (i.e. M represents the average number of expected events for each coordinate). If two tensors were trained during fitting, the prediction will weight the lambdas before calculating the p-values.
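As a sketch of this idea (not the library's internal code), the lambda at a single coordinate of a 3-dimensional KRUSKAL tensor M, using the 'Factors'/'Weights' dictionary layout shown in fit(), could be computed as:

import numpy as np

def lambda_at(M, coord):
    # lambda(i, j, k) = sum over rank components r of
    # Weights[r] * Factors['0'][i, r] * Factors['1'][j, r] * Factors['2'][k, r]
    prod = np.array(M['Weights'], dtype=float)
    for dim, idx in enumerate(coord):
        prod = prod * M['Factors'][str(dim)][idx, :]
    return prod.sum()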
Note
We find that using an ensemble of tensors during prediction significantly reduces the false positive rate for anomaly detection, as shown in [2].
Anomaly detection can be performed either using predict_scores(), or using transform() and predict_probas(). While using transform() and predict_probas() yields faster computation time and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.
Warning
To use predict_scores(), fit() the model first.
- Parameters:
coords (array) -- Coordinates of the non-zero values.
values (list) --
List of non-zero values.
Length must match the coords parameter length.
Example binary links: array([1, 1, 1]).
from_matlab (boolean, optional) --
If the dataset was also used in MATLAB, the indices may start at 1 instead of 0.
This parameter can be used to subtract 1 from the indices.
The default is False.
objective (string, optional) --
objective='p_value' calculates the Poisson p-value.
Fusion objective options: 'p_value_fusion_harmonic', 'p_value_fusion_harmonic_observed', 'p_value_fusion_chi2', 'p_value_fusion_chi2_observed', 'p_value_fusion_arithmetic'.
If fusion is being used, specify the list of dimensions that are being targeted via the p_value_fusion_index parameter.
Calculate the log-likelihood of observing the link with objective='log_likelihood'.
The default is objective='p_value'.
ensemble_significance (list of length two, optional) -- Weight of each tensor, if two are trained. Two are trained when rank=[r1, r2] during pyCP_APR.CP_APR.fit(), where r1 and r2 are integer ranks. The default is ensemble_significance=[0.1, 0.9].
p_value_fusion_index (list, optional) --
Fuses down to the target dimensions.
The list should contain the index of the dimensions to fuse. The default is p_value_fusion_index=[0].
Warning
Only used if a fusion objective is being used.
ignore_dimensions_indx (list, optional) -- If used, the dimension numbers in the list will be ignored during the calculation of the lambdas.
- Returns:
predictions -- Returns the prediction objective.
For instance, if the parameter was objective='p_value', an array of p-values is returned for each entry in the test tensor.
- Return type:
array
Note
Example
Sample coordinate and value pair for a four dimensional tensor with 3 entries:
# Coordinates of 3 entries of a 4-dimensional tensor
coords = array([[   0,  961,    0,    0],
                [   0,  961, 1742,    0],
                [   0,  961, 2588,    0]])
values = [1, 2, 1]
Extracting the p-values from the test tensor:
from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name="TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1,
                method='torch', device='gpu', device_num=0)

# Factorize the tensor for ranks 1 and 4.
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1, 4])

# Calculate the p-values for the entries in the test set.
p_values = cp_apr.predict_scores(coords=coords_test, values=nnz_test)
These p-values are also saved in a class variable, and can be accessed as follows:
scores = list(cp_apr.prediction['p_value'])
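Since lower p-values indicate anomalies, the saved scores can be evaluated against ground-truth labels, for example with scikit-learn. This is a hedged sketch: y_test is an assumed array of binary anomaly labels aligned with the test entries, not something the library provides.

import numpy as np
from sklearn.metrics import roc_auc_score

# y_test: hypothetical binary labels (1 = anomaly) for each test entry.
# Lower p-value = more anomalous, so use 1 - p as the anomaly score.
roc_auc = roc_auc_score(y_test, 1 - np.array(scores))
print(roc_auc)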
- set_params(**parameters)[source]#
Sets the model parameters.
Here the model is pyCP_APR.CP_APR.model.
- Parameters:
parameters (dict) -- All model parameters. See the pyCP_APR.CP_APR class initialization parameters.
- Returns:
self -- self is returned.
- Return type:
object
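A minimal usage sketch, assuming keyword arguments matching the initialization parameters:

cp_apr.set_params(n_iters=100, verbose=10)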
- transform(indicies, ensemble_significance=[])[source]#
Given the sparse test tensor entries (or indices, i.e. coordinates of non-zero values), calculate the Poisson lambda values.
The Poisson lambda value can be used to identify how likely that coordinate in the test tensor was to occur, given what was learned during training.
Note
If 2 tensors are weighted during the calculation of the lambdas, the weight of each can be specified using ensemble_significance. See pyCP_APR.CP_APR.fit() for factorizing an ensemble of tensor ranks. For instance, if ensemble_significance=[0.1, 0.9], then lambda = (0.1 x lambda_1) + (0.9 x lambda_2), as sketched below.
Anomaly detection can be performed either using predict_scores(), or using transform() and predict_probas(). While using transform() and predict_probas() yields faster computation time and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.
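Returning to the ensemble weighting above: with ensemble_significance=[0.1, 0.9], the weighting is simple element-wise arithmetic over the two lambda arrays. An illustrative sketch with made-up lambda values, not the internal code:

import numpy as np

lambda_1 = np.array([2.0, 0.5])  # hypothetical lambdas from the first tensor
lambda_2 = np.array([3.0, 0.1])  # hypothetical lambdas from the second tensor

lam = 0.1 * lambda_1 + 0.9 * lambda_2
print(lam)  # [2.9  0.14]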
Warning
To use the transform() function, the model has to be fit() first.
- Parameters:
indicies (list or array) -- List of non-zero coordinates in the test tensor.
ensemble_significance (list, optional) --
The weight of each tensor during the lambda calculations.
Ensemble significance is automatically applied if multiple tensors are fitted.
If multiple tensors are fitted, the default is ensemble_significance=[0.1, 0.9].
If a single tensor is fitted, the default is ensemble_significance=[1].
- Returns:
lambdas -- List of lambda values for each of the non-zero coordinates.
- Return type:
array
Note
Example
from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name="TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, method='numpy')

# Factorize X for ranks 1 and 4
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1, 4])

# Get the lambdas. The returned lambdas are the weighted values
# for the rank 1 and rank 4 tensor decomposition Ms.
lambdas = cp_apr.transform(coords_test)
Module contents#
2021. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.