pyCP_APR package

Subpackages

Submodules

pyCP_APR.datasets module

datasets.py is used to load the example tensors.

@author Maksim Ekin Eren

pyCP_APR.datasets.list_datasets()

This function returns the list of tensor names that are available to load.

If the listing is requested for the first time, a new directory for the datasets is created.

Returns:

datasets -- List of tensor names that are available to load.

Return type:

list

Note

Example

from pyCP_APR.datasets import list_datasets

list_datasets()
['TOY']
pyCP_APR.datasets.load_dataset(name='TOY')

Loads the tensor specified by its name.

Warning

If a dataset is requested for the first time, it gets downloaded from GitHub.

Parameters:

name (string, optional) -- The name of the tensor to load. The default is name="TOY".

Returns:

data -- Tensor contents compressed in Numpy NPZ format.

Return type:

Numpy NPZ

Note

Example

from pyCP_APR.datasets import load_dataset

# Load the sample authentication training and test tensors (non-zero coordinates and their counts)
data = load_dataset(name="TOY")
coords_train, nnz_train = data['train_coords'], data['train_count']
coords_test, nnz_test = data['test_coords'], data['test_count']

Available tensor data can be listed as follows:

data = load_dataset(name = "TOY")
list(data)
['train_coords',
 'train_count',
 'test_coords',
 'test_count']
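
Because the return value is a Numpy NPZ archive, it behaves like a read-only mapping from key to array. The sketch below does not download anything: it builds a small in-memory stand-in archive with the same four keys (the values are synthetic, not the TOY data) to show how the coordinate and count arrays line up.

```python
import io
import numpy as np

# Synthetic stand-in archive with the same keys as the TOY dataset.
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    train_coords=np.array([[0, 1, 2], [1, 0, 0], [2, 2, 1]]),
    train_count=np.array([3, 1, 2]),
    test_coords=np.array([[0, 0, 0], [1, 1, 1]]),
    test_count=np.array([1, 4]),
)
buffer.seek(0)

data = np.load(buffer)
keys = sorted(data.files)

coords_train, nnz_train = data["train_coords"], data["train_count"]
# One row per non-zero entry, one column per tensor dimension.
n_entries, n_modes = coords_train.shape
print(keys, n_entries, n_modes)
```

Each row of a coords array pairs with the value at the same position in the matching count array.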

pyCP_APR.pyCP_APR module

pyCP_APR.py provides the Scikit-learn-like API for interacting with the CP-APR algorithm. pyCP_APR.CP_APR wraps both the Numpy and PyTorch backends, and also includes API calls for anomaly detection utilities. Sparse tensor entries are scored by calculating p-values over the fitted model, where lower p-values indicate anomalies.

The model fitted during training (the factorized tensor, i.e. the KRUSKAL tensor M) describes the normal, or expected, behavior, which follows the Poisson distribution. We assume that the entries of the test tensor are drawn from the same distribution as the factorized training tensor M. Using M, we calculate how likely the entries of the test tensor are to occur given what was expected. This methodology was introduced by Eren et al. in [1].
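
As a rough illustration of the scoring idea, the sketch below computes an upper-tail Poisson probability for an observed count given an expected rate. The upper-tail convention P(X >= count) and the function name poisson_upper_tail are assumptions made for this example; the exact p-value definition used inside pyCP_APR is not stated on this page.

```python
import math

def poisson_upper_tail(count: int, lam: float) -> float:
    """Return P(X >= count) for X ~ Poisson(lam) (hypothetical helper)."""
    if count <= 0:
        return 1.0
    # P(X >= k) = 1 - sum_{i=0}^{k-1} exp(-lam) * lam**i / i!
    term = math.exp(-lam)  # pmf at i = 0
    cdf = term
    for i in range(1, count):
        term *= lam / i    # pmf at i, built from the pmf at i - 1
        cdf += term
    return max(0.0, 1.0 - cdf)

expected_rate = 2.0                                  # stand-in for a lambda derived from M
p_common = poisson_upper_tail(1, expected_rate)      # a count close to what was expected
p_rare = poisson_upper_tail(10, expected_rate)       # a count far above what was expected
print(p_common, p_rare)
```

A count far above the expected rate produces a much smaller value than a typical count, which is the sense in which a low p-value points to an anomaly.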

Some code comments are borrowed from the original implementation of CP-APR in [2-5].

References

[1] M. E. Eren, J. S. Moore and B. S. Alexandrov, "Multi-Dimensional Anomalous Entity Detection via Poisson Tensor Factorization," 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020, pp. 1-6, doi: 10.1109/ISI49825.2020.9280524.

[2] General software, latest release: Brett W. Bader, Tamara G. Kolda and others, Tensor Toolbox for MATLAB, Version 3.2.1, www.tensortoolbox.org, April 5, 2021.

[3] Dense tensors: B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping, ACM Trans. Mathematical Software, 32(4):635-653, 2006, http://dx.doi.org/10.1145/1186785.1186794.

[4] Sparse, Kruskal, and Tucker tensors: B. W. Bader and T. G. Kolda, Efficient MATLAB Computations with Sparse and Factored Tensors, SIAM J. Scientific Computing, 30(1):205-231, 2007, http://dx.doi.org/10.1137/060676489.

[5] Chi, E.C. and Kolda, T.G., 2012. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4), pp.1272-1299.

@author Maksim Ekin Eren

class pyCP_APR.pyCP_APR.CP_APR(**parameters)

Bases: object

Initialize the pyCP_APR.CP_APR class. The pyCP_APR.CP_APR class is the wrapper for the Python implementation of the CP-APR algorithm, with both Numpy and PyTorch backends.

Parameters:
  • method (string, optional) --

    Specifies which backend to use when running CP-APR, and sets the model (i.e. pyCP_APR.CP_APR.model) accordingly.

    method='torch' or method='pytorch' will use PyTorch and enable GPU utilization.

    method='numpy' will use Numpy backend.

    Default is method='torch'.

    Warning

    • method='torch' or method='pytorch' only supports sparse tensors in COO format.

    • method='numpy' supports both sparse (COO format) & dense tensors.

  • epsilon (float, optional) -- Prevents zero division. Default is 1e-10.

  • kappa (float, optional) -- Fix slackness level. Default is 1e-2.

  • kappa_tol (float, optional) -- Tolerance on slackness level. Default is 1e-10.

  • max_inner_iters (int, optional) -- Number of inner iterations per epoch. Default is 10.

  • follow_M (bool, optional) -- Saves M on each iteration if True. The default is False.

  • n_iters (int, optional) -- Number of outer iterations (epochs) during optimization. Default is 1000.

  • print_inner_itn (int, optional) -- Print every n inner iterations. Does not print if 0. Default is 0.

  • verbose (int, optional) -- Print every n epochs (outer iterations). Does not print if 0. Default is 10.

  • stoptime (float, optional) -- Number of seconds before early stopping. Default is 1e6.

  • tol (float, optional) -- KKT violations tolerance. Default is 1e-4.

  • random_state (int, optional) -- Random seed for the initial M (KRUSKAL Tensor). Default is 42.

  • device (string, optional) --

    Specifies CPU or GPU utilization for factorizing the tensor.

    device='cpu' to use PyTorch with CPU.

    device='gpu' to use PyTorch with GPU. Only if a CUDA device is available.

    Default is device='cpu'.

    Warning

    Only used when method='torch' or method='pytorch'.

  • device_num (int or string, optional) --

    Which GPU to use to compute the KRUSKAL tensor M.

    Default is device_num=0.

    Warning

    Only used when method='torch' and device='gpu'.

  • return_type (string, optional) --

    The return type for the final KRUSKAL tensor M.

    return_type='torch' keeps the result as torch tensors.

    return_type='numpy' converts the result to Numpy arrays and transfers the tensor back to the CPU if a GPU was used.

    Default is return_type='numpy'.

    Warning

    Only used when method='torch' or method='pytorch'.

  • dtype --

    Type to be used in torch tensors.

    Default is 'torch.DoubleTensor' when device='cpu'. Default is 'torch.cuda.DoubleTensor' when device='gpu'.

Note

Example

Using the PyTorch backend on GPU 0:

from pyCP_APR.pyCP_APR import CP_APR

# CP-APR Object with PyTorch backend on a GPU. Transfer the latent factors back to Numpy arrays.
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, device='gpu', device_num=0, return_type='numpy')

Using the Numpy backend:

cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, method='numpy')
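
The backend constraints listed in the parameter warnings can be encoded in a small helper that chooses keyword arguments before constructing the model. This is only a sketch: backend_kwargs is a hypothetical helper, not part of pyCP_APR, and the two boolean inputs are assumed to be determined by the caller.

```python
def backend_kwargs(sparse_input: bool, cuda_available: bool) -> dict:
    """Pick CP_APR keyword arguments from the documented constraints (hypothetical helper)."""
    if not sparse_input:
        # Dense tensors are only supported by the Numpy backend.
        return {"method": "numpy"}
    if cuda_available:
        # Sparse COO input with a CUDA device: PyTorch backend on GPU 0.
        return {"method": "torch", "device": "gpu", "device_num": 0, "return_type": "numpy"}
    # Sparse COO input without a GPU: PyTorch backend on the CPU.
    return {"method": "torch", "device": "cpu", "return_type": "numpy"}

dense_choice = backend_kwargs(sparse_input=False, cuda_available=True)
gpu_choice = backend_kwargs(sparse_input=True, cuda_available=True)
cpu_choice = backend_kwargs(sparse_input=True, cuda_available=False)
print(dense_choice, gpu_choice, cpu_choice)
```

The resulting dictionary could then be unpacked into the constructor, for example CP_APR(n_iters=10, **gpu_choice), with cuda_available taken from torch.cuda.is_available() when PyTorch is installed.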
fit(**parameters)

Computes the decomposition of the sparse or dense tensor X and returns the KRUSKAL tensor M.

Here M consists of the latent factors and the weight of each of the R (rank) components.

If a list of two ranks is passed, the tensor is factorized once for each of the two ranks.

In this case, the two factorized tensors M are combined through the weighted lambda calculations during prediction.

Parameters:
  • tensor (PyTorch.sparse or Numpy array) --

    Original dense or sparse tensor X.

    Can be used when Type='sptensor'. In this case, tensor needs to be in PyTorch sparse tensor format.

    Or use with Type='tensor' and pass tensor as a dense Numpy array.

    Warning

    Note that PyTorch backend only supports Type='sptensor'.

  • coords (Numpy array) --

    Array of non-zero coordinates for sparse tensor X. COO format.

    Each entry in this array is a coordinate of a non-zero value in the original tensor X.

    Warning

    • Used when Type='sptensor', and when tensor parameter is not passed.

    • len(coords) is the total number of non-zero entries in X, and len(coords[0]) should give the number of dimensions of X.

  • values (Numpy array) --

    List of non-zero values corresponding to each list of non-zero coordinates (coords). Array of non-zero tensor entries. COO format.

    Warning

    • Used when Type='sptensor' and tensor parameter is not passed.

    • Length of values must match the length of coords.

  • rank (int or list) --

    Tensor rank, or list of ranks for two tensors.

    Tensor rank determines the number of components.

    List of ranks will allow using weighted prediction between the two latent factors in KRUSKAL tensor M.

    Pass a single integer or list of length two.

    The default is rank=2.

  • Minit (string or dictionary of latent factors) --

    Initial value of latent factors.

    If Minit='random', initial factors are drawn randomly from uniform distribution between 0 and 1.

    Else, pass a dictionary where the key is the mode number and the value is an array of size d x r, where d is the number of elements on that dimension and r is the rank.

    The default is Minit='random'.

    Note

    Example on creating initial M for 3 dimensional tensor shaped 5x5x5 for rank 4 decomposition:

    import numpy as np
    
    num_dimensions = 3
    tensor_shape = [5,5,5]
    rank = 4
    # One weight per component, so "Weights" has length rank
    M_init = {"Factors":{}, "Weights":np.ones(rank)}
    for d in range(num_dimensions):
        M_init["Factors"][str(d)] = np.random.uniform(low=0, high=1, size=(tensor_shape[d], rank))
    M_init["Factors"]
    
    {
     '0': array([[0.821161  , 0.419537  , 0.62692165, 0.06294969],
            [0.02032657, 0.88625546, 0.74128504, 0.71855629],
            [0.70760879, 0.83813636, 0.35128158, 0.94442011],
            [0.35780608, 0.83703369, 0.84602297, 0.93760842],
            [0.00746915, 0.05974905, 0.49097518, 0.60615737]]),
     '1': array([[0.61902526, 0.78453503, 0.05596952, 0.69149084],
            [0.56300552, 0.82418509, 0.04278352, 0.25716303],
            [0.66221183, 0.13888761, 0.92502242, 0.57817265],
            [0.31738958, 0.87061048, 0.64170398, 0.62236073],
            [0.9110603 , 0.5133135 , 0.89232955, 0.09881775]]),
     '2': array([[0.0580065 , 0.82367217, 0.07616138, 0.93873983],
            [0.89247679, 0.41388867, 0.82089524, 0.10293565],
            [0.13540868, 0.09809637, 0.10844113, 0.90405324],
            [0.91167498, 0.67068632, 0.51705956, 0.82211517],
            [0.80942828, 0.08450466, 0.6306868 , 0.78132797]])
    }
    

  • Type (string) --

    Type of tensor (i.e. sparse or dense).

    Use Type='sptensor' for sparse, and Type='tensor' for dense tensors.

    'sptensor' can be used with method='torch', and method='numpy'.

    If 'sptensor' is used, pass the list of non-zero coordinates using the coords parameter and the corresponding list of non-zero elements using the values parameter. This is the COO representation of X.

    'sptensor' can also be used with the PyTorch Sparse format. In that case, pass the tensor X that is in torch.sparse format using the tensor parameter.

    'tensor' can be used with method='numpy'. In this case, pass the tensor X using the tensor parameter.

    The default is Type='sptensor'.
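
To make the COO layout expected by the coords and values parameters concrete, the sketch below derives both arrays from a small dense Numpy array. The array contents are made up for illustration.

```python
import numpy as np

# A small 3 x 4 x 2 count tensor with three non-zero entries.
X = np.zeros((3, 4, 2), dtype=int)
X[0, 1, 0] = 5
X[2, 3, 1] = 2
X[1, 0, 1] = 7

# One row per non-zero entry, one column per dimension (row-major order).
coords = np.argwhere(X != 0)
# Non-zero values aligned with the rows of coords.
values = X[tuple(coords.T)]

# Round trip: scatter the values back into an empty tensor.
rebuilt = np.zeros_like(X)
rebuilt[tuple(coords.T)] = values
print(coords, values)
```

Here len(coords) equals the number of non-zero entries, len(coords[0]) equals the number of dimensions, and values has the same length as coords, matching the warnings above.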

Returns:

KRUSKAL tensor M -- KRUSKAL tensor M is returned in dict format.

The latent factors can be found with the key 'Factors'.

The weight of each component can be found with the key 'Weights'.

Return type:

dict
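
A short sketch of how the returned dictionary can be inspected, assuming the layout described above (string mode numbers as keys under 'Factors', one weight per component under 'Weights'). The numbers in this stand-in M are made up.

```python
import numpy as np

# Stand-in for a fitted result; values are illustrative only.
M = {
    "Factors": {
        "0": np.array([[1.0, 0.0], [0.5, 2.0]]),              # mode 0: 2 elements x rank 2
        "1": np.array([[1.0, 1.0], [0.0, 3.0], [2.0, 0.0]]),  # mode 1: 3 elements x rank 2
    },
    "Weights": np.array([0.5, 2.0]),
}

rank = len(M["Weights"])
n_modes = len(M["Factors"])
# Size of each tensor dimension, read from the factor matrices.
shape = tuple(M["Factors"][str(d)].shape[0] for d in range(n_modes))
# Component indices sorted from the largest weight to the smallest.
order = np.argsort(M["Weights"])[::-1]
# Fraction of the total weight carried by each component.
share = M["Weights"] / M["Weights"].sum()
print(rank, shape, order, share)
```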

Note

Example

Sparse tensor X in COO format decomposed using a GPU in the below example:

from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name = "TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1,
    method='torch',
    device='gpu',
    device_num=0
   )

# factorize the tensor for ranks 1 and 4
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1,4])

The example above computes the tensor decomposition of X for ranks 1 and 4. Below is an example of a single-rank decomposition:

M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=4)

Below is an example of a factorized X, i.e. M (KRUSKAL tensor). This example M has rank 2 and 3 dimensions:

{
   'Factors':
   {
      '0':
         array([[5.88838457e-51, 2.13058370e-01],
         [3.23364716e-04, 1.34610100e-01],
         [2.05013230e-01, 7.12928005e-37],
         [1.48424405e-01, 0.00000000e+00],
         [9.76200219e-02, 1.48477484e-01],
         [2.51566211e-02, 2.06908903e-01],
         [1.43573934e-01, 3.34319439e-88],
         [2.61925420e-01, 4.76257924e-33],
         [9.37106506e-02, 1.87295857e-01],
         [2.42523537e-02, 1.09649287e-01]]),
      '1':
         array([[2.31775360e-241, 5.03672967e-002],
         [7.79309622e-002, 1.00144467e-137],
         [0.00000000e+000, 7.84481789e-002],
         [1.23105143e-001, 9.77480876e-002],
         [3.30736653e-002, 5.64828345e-002],
         [1.56285154e-078, 9.36029407e-003],
         [4.85047483e-002, 0.00000000e+000],
         [3.10430389e-002, 0.00000000e+000],
         [2.39290092e-002, 3.29838934e-002],
         [0.00000000e+000, 0.00000000e+000],
         [6.75832826e-002, 0.00000000e+000]]),
      '2':
         array([[2.71626813e-002, 0.00000000e+000],
         [1.68530286e-003, 4.18040234e-002],
         [0.00000000e+000, 2.00577503e-002],
         [0.00000000e+000, 5.34873341e-002],
         [0.00000000e+000, 2.89723060e-002],
         [4.00972915e-002, 4.85187813e-161],
         [4.49477703e-002, 0.00000000e+000],
         [1.00243229e-002, 0.00000000e+000],
         [0.00000000e+000, 3.69954061e-002],
         [2.29589330e-002, 1.33718335e-002],
         [3.23365254e-002, 0.00000000e+000]])},
  'Weights': array([3092.47820339, 2243.52179661])
}

Below is an example of how torch.sparse format can be used as the tensor X:

import torch

i = torch.LongTensor([[0, 1, 1], [2, 0, 2], [2, 0, 1]])

v = torch.FloatTensor([3, 4, 5])
X = torch.sparse.FloatTensor(i, v, torch.Size([4,4,4]))
from pyCP_APR import CP_APR

cp_apr = CP_APR(n_iters=100, verbose=10, device='gpu')
result = cp_apr.fit(tensor=X, rank=30)
Using TITAN RTX
CP-APR (MU):
Iter=1, Inner Iter=30, KKT Violation=0.425532, obj=4.887921, nViolations=0
Exiting because all subproblems reached KKT tol.
===========================================
 Final log-likelihood = 4.888204
 Final least squares fit = 0.999995
 Final KKT violation = 0.000007
 Total inner iterations = 37
 Total execution time = 0.2447 seconds
Converting the latent factors to Numpy arrays.

Below is an example on using a dense Numpy array as tensor X:

import numpy as np
from pyCP_APR import CP_APR

# X has the shape 10 x 30 x 40
X = np.arange(1, 12001).reshape([10,30,40])

cp_apr = CP_APR(n_iters=100, verbose=10, method='numpy')
result = cp_apr.fit(tensor=X, Type='tensor', rank=2)
CP-APR (MU):
Iter=1, Inner Iter=30, KKT Violation=0.244501, obj=534739600.517348, nViolations=0
Exiting because all subproblems reached KKT tol.
===========================================
 Final log-likelihood = 534841753.347965
 Final least squares fit = 0.971281
 Final KKT violation = 0.000091
 Total inner iterations = 161
 Total execution time = 1.1126 seconds
result.keys()
dict_keys(['Factors', 'Weights'])
M = result['Factors']
Gamma = result['Weights']

M_0 = M['0']
Gamma_0 = Gamma[0]

print('Component 0:', M_0, 'Gamma 0:', Gamma_0)
Component 0:
[[0.01002107 0.0099889 ]
[0.03001639 0.02999136]
[0.05001171 0.04999383]
[0.07000709 0.0699962 ]
[0.09000233 0.08999878]
[0.10999784 0.11000099]
[0.12999289 0.13000382]
[0.14998825 0.15000623]
[0.16998359 0.17000867]
[0.18997884 0.19001122]]
Gamma 0: 41633867.33685632
get_params()

Returns the model parameters as a dictionary, where each key is a model variable name and each value is its current value. Here the model is the backend used during factorization (i.e. pyCP_APR.torch_cp.CP_APR_Torch or pyCP_APR.numpy_cp.CP_APR).

Note

Model parameters can also be accessed with a call directly to the model (i.e. pyCP_APR.CP_APR.model).

Note

Example

from pyCP_APR import CP_APR
cp_apr = CP_APR(n_iters=10)
cp_apr.get_params()
{
 'verbose': 10,
 'print_inner_itn': 0,
 'start_time': -1,
 'final_iter': -1,
 'dtype': 'torch.DoubleTensor',
 'device': 'cpu',
 'device_num': '0',
 'return_type': 'numpy',
 'X': None,
 'M': None,
 'tol': 0.0001,
 'stoptime': 1000000.0,
 'exec_time': -1,
 'n_iters': 10,
 'max_inner_iters': 10,
 'random_state': 42,
 'kappa': 0.01,
 'kappa_tol': 1e-10,
 'kktViolations': tensor([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]),
 'nInnerIters': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'times': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'logLikelihoods': tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]),
 'epsilon': tensor(1.0000e-10),
 'obj': 0
 }

The model parameters of a fitted model may look like the following:

{
 'verbose': 1,
 'print_inner_itn': 0,
 'start_time': 1621842138.9178197,
 'final_iter': 10,
 'dtype': 'torch.cuda.DoubleTensor',
 'device': device(type='cuda', index=0),
 'device_num': '0',
 'return_type': 'numpy',
 'X': <pyCP_APR.torch_cp.sptensor_Torch.SP_TENSOR at 0x7f759dde0eb0>,
 'M': <pyCP_APR.torch_cp.ktensor_Torch.K_TENSOR at 0x7f759dde0c40>,
 'tol': 0.0001,
 'stoptime': 1000000.0,
 'exec_time': 0.41489696502685547,
 'n_iters': 10,
 'max_inner_iters': 10,
 'random_state': 42,
 'kappa': 0.01,
 'kappa_tol': 1e-10,
 'kktViolations': array([  0.90469806, 170.76687748, 439.2454979 , 171.40815408,
         32.90351131,  49.3060232 , 112.32568973, 156.01606934,
         30.25656979,  24.61207396]),
 'nInnerIters': array([38., 34., 34., 33., 33., 33., 33., 33., 33., 33.]),
 'times': array([0.05231047, 0.09365845, 0.14037013, 0.18132734, 0.22193289,
        0.26159644, 0.29836512, 0.33511233, 0.37501359, 0.41115212]),
 'logLikelihoods': array([9.59298322e+12, 1.02443325e+13, 1.03623011e+13, 1.04159710e+13,
        1.04551136e+13, 1.04828857e+13, 1.05091620e+13, 1.05297763e+13,
        1.05492323e+13, 1.05635448e+13]),
 'epsilon': array(1.e-10),
 'obj': 10563544752580.432
 }
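
The per-iteration arrays in the fitted parameters can be used for simple run diagnostics. The sketch below copies three entries from the fitted example above into a plain dictionary (no model is trained here). It assumes that 'times' holds cumulative seconds, which is how the values in that example appear to behave.

```python
import numpy as np

# Values copied from the fitted example output above.
params = {
    "tol": 0.0001,
    "kktViolations": np.array([0.90469806, 170.76687748, 439.2454979, 171.40815408,
                               32.90351131, 49.3060232, 112.32568973, 156.01606934,
                               30.25656979, 24.61207396]),
    "times": np.array([0.05231047, 0.09365845, 0.14037013, 0.18132734, 0.22193289,
                       0.26159644, 0.29836512, 0.33511233, 0.37501359, 0.41115212]),
}

final_kkt = params["kktViolations"][-1]
# False here: the final KKT violation is still above tol after the 10 iterations.
reached_tol = bool(final_kkt <= params["tol"])
# Assumption: 'times' is cumulative, so differences give seconds per iteration.
per_iter_seconds = np.diff(params["times"], prepend=0.0)
print(reached_tol, per_iter_seconds.round(4))
```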
get_prediction()

Returns the predictions.

The model must already be fitted using pyCP_APR.CP_APR.fit(), and the links must already be scored using pyCP_APR.CP_APR.predict_scores().

Warning

The prediction is returned as dictionary with the keys 'objective' and 'lambda'.

The 'objective' key depends on the objective parameter used when predicting the scores in pyCP_APR.CP_APR.predict_scores().

Returns:

predictions -- The prediction contents are based on the objective parameter used during pyCP_APR.CP_APR.predict_scores().

If objective='p_value', returns {'p_value': array, 'lambda': array}.

If objective='log_likelihood', returns {'log_likelihood': array, 'lambda': array}.

Return type:

dict

Note

Example

cp_apr.get_prediction()
{
 'p_value': array([1., 1., 1., ..., 1., 1., 1.]),
 'lambda': array([ 3796.43165171,  2315.69440274,  1001.22377495, ..., 290.35037293,  2952.72557334, 30309.82134089])
}
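
Once a prediction dictionary of this shape is available, flagging candidate anomalies is a matter of thresholding or ranking the p-values. The sketch below uses a made-up prediction dictionary and a hypothetical significance level alpha; neither comes from the library.

```python
import numpy as np

# Made-up stand-in with the same keys as the example output above.
prediction = {
    "p_value": np.array([1.0, 0.8, 1e-6, 0.04, 1.0]),
    "lambda": np.array([3796.4, 2315.7, 0.002, 1.3, 290.4]),
}

alpha = 0.05  # hypothetical threshold chosen for illustration
# Positions of test entries whose p-value falls below the threshold.
flagged = np.flatnonzero(prediction["p_value"] < alpha)
# Entries ordered from most to least anomalous (lowest p-value first).
ranking = np.argsort(prediction["p_value"], kind="stable")
print(flagged, ranking)
```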
get_score()

Returns the objective value of the model.

The score is taken from pyCP_APR.CP_APR.model.obj.

If an ensemble of tensors is trained (see pyCP_APR.CP_APR.fit()), a list of scores is returned instead.

Note that the model must already be fitted.

Returns:

score -- Model fit score.

Return type:

float

predict_probas(y, axis_map=None)

Calculates the probabilities of the test tensor entries.

Then returns the prediction scores with ROC-AUC and PR-AUC metrics.

Fusion is performed on the dimensions indicated by axis_map. Dimension fusion uses the Harmonic Mean.

This function is in beta stage.

Note

Anomaly detection can be performed using either predict_scores(), or transform() followed by predict_probas(). While transform() and predict_probas() yield faster computation and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.

Warning

Before using the predict_probas() function, the following must be done:

  1. The tensor has to be factorized using the fit() function to extract the KRUSKAL tensor M.

  2. After fit(), the Poisson lambda values for the test tensor have to be calculated using the transform() function.

Parameters:
  • y (list or array) -- Labels for each entry in the sparse test tensor.

  • axis_map (OrderedDict, optional) --

    When fusing dimensions, an OrderedDict can be passed to identify which dimensions to fuse.

    Each item in the ordered dictionary has 2 parts: the key is the dimension name as a string, and the value is the list of dimension number(s). The default is axis_map=None.

Returns:

prediction scores -- If not fusing the dimensions over the transformed tensor, returns a dictionary with prediction scores.

{"roc_auc":float, "pr_auc":float}

If dimensions are being fused using the axis_map parameter, returns a Pandas DataFrame with the prediction scores. The columns for the returned DataFrame in this case are 'fusion', 'method', 'metric', and 'score'.

Return type:

dict or Pandas DataFrame

Note

For a four-dimensional tensor with dimension names U x S x D x s, the axis_map that fuses to the first dimension, and to the first and second dimensions together, would be: axis_map=OrderedDict((('U', [0]), ('US', [0, 1]))). Here 'U' and 'US' are the dimension names, and [0] and [0, 1] are the respective dimension numbers.

Another example is illustrated below.

from collections import OrderedDict
axis_map = OrderedDict((('U', [0]), ('S', [1]), ('D', [2]), ('US', [0, 1]), ('UD', [0, 2]), ('SD', [1, 2])))
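
To illustrate what harmonic-mean fusion over a set of dimensions means, the sketch below groups entries that share the same indices on the selected dimensions and takes the harmonic mean of their scores. This is a generic illustration under stated assumptions: fuse_harmonic is a hypothetical helper, the scores are made up and must be strictly positive, and the library's own fusion may differ in detail.

```python
import numpy as np

def fuse_harmonic(coords, scores, dims):
    """Harmonic mean of scores per distinct index combination on dims (hypothetical helper)."""
    coords = np.asarray(coords)
    scores = np.asarray(scores, dtype=float)  # must be strictly positive
    # Identify each distinct combination of indices on the fused dimensions.
    groups, inverse = np.unique(coords[:, dims], axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse, minlength=len(groups))
    reciprocal_sums = np.bincount(inverse, weights=1.0 / scores, minlength=len(groups))
    # Harmonic mean = n / sum(1 / score) within each group.
    return groups, counts / reciprocal_sums

coords = [[0, 1], [0, 2], [1, 1]]   # three entries of a 2-dimensional tensor
scores = [0.5, 0.25, 0.1]           # made-up per-entry scores
groups, fused = fuse_harmonic(coords, scores, dims=[0])
print(groups, fused)
```

With dims=[0], the first two entries share index 0 on the first dimension and are fused into one score, while the third entry forms its own group.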

predict_scores(**parameters)

Scores entries for anomaly classification after the tensor has been fitted. The model uses the trained latent factors to generate the Poisson lambda values corresponding to the given new coordinates.

These lambda values are then used to calculate the p-values for classification of the entries.

A lower p-value here is an indicator of an anomaly.

Since the learned, or expected, behaviour during training is represented by the KRUSKAL tensor M, we can calculate how likely a new index is to occur under M (i.e. M represents the average number of expected events for each coordinate). If two tensors were trained during fitting, the prediction weights the lambdas before calculating the p-values.

Note

  • We find that using an ensemble of tensors during prediction significantly reduces the false positive rates for anomaly detection, as shown in [2].

  • Anomaly detection can be performed using either predict_scores(), or transform() followed by predict_probas(). While transform() and predict_probas() yield faster computation and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.

Warning

  • To use predict_scores(), fit() the model first.

Parameters:
  • coords (array) -- Coordinates of the non-zero values.

  • values (list) --

    List of non-zero values.

    Length must match the coords parameter length.

    Example binary links: array([1, 1, 1]).

  • from_matlab (boolean, optional) --

    If the dataset is also used in MATLAB, indices may start at 1 instead of 0.

    This parameter can be used to subtract 1 from the indices.

    The default is False.

  • objective (string, optional) --

    objective='p_value' calculates the Poisson p-value.

    Fusion objective options: 'p_value_fusion_harmonic', 'p_value_fusion_harmonic_observed', 'p_value_fusion_chi2', 'p_value_fusion_chi2_observed', 'p_value_fusion_arithmetic'.

    If fusion is being used, specify the list of dimensions that are being targeted via the p_value_fusion_index parameter.

    Calculate log_likelihood of observing the link with objective='log_likelihood'.

    The default is objective='p_value'.

  • ensemble_significance (list of length two, optional) -- Weight of each tensor, if two are trained. Two tensors are trained when rank=[r1,r2] is passed to pyCP_APR.CP_APR.fit(), where r1 and r2 are integer ranks. The default is ensemble_significance=[0.1, 0.9].

  • p_value_fusion_index (list, optional) --

    Fuses down to the target dimensions.

    List should contain the index of the dimensions to fuse. The default is p_value_fusion_index=[0].

    Warning

    Only used if fusion objective is being used.

  • ignore_dimensions_indx (list, optional) -- If used, the dimension numbers in the list will be ignored during the calculation of the lambdas.
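
The index shift that the from_matlab parameter describes can be pictured with the following sketch, which converts 1-based coordinates to 0-based ones. The helper to_zero_based is hypothetical and only mirrors the description above; it is not the library's code.

```python
import numpy as np

def to_zero_based(coords):
    """Shift 1-based (MATLAB-style) coordinates to 0-based (hypothetical helper)."""
    coords = np.asarray(coords)
    if coords.min() < 1:
        raise ValueError("expected 1-based indices, found an index below 1")
    return coords - 1

# 1-based coordinates of two entries of a 4-dimensional tensor.
matlab_coords = [[1, 962, 1, 1], [1, 962, 1743, 1]]
python_coords = to_zero_based(matlab_coords)
print(python_coords)
```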

Returns:

predictions -- Returns the prediction objective.

For instance, if objective='p_value' was used, an array of p-values is returned, one for each entry in the test tensor.

Return type:

array

Note

Example

Sample coordinate and value pair for a four dimensional tensor with 3 entries:

import numpy as np

# coordinates of 3 entries of a 4-dimensional tensor
coords = np.array([[    0,   961,     0,     0],
                   [    0,   961,  1742,     0],
                   [    0,   961,  2588,     0]])
values = [1,2,1]

Extracting the p-values from the test tensor:

from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name = "TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, method='torch', device='gpu', device_num=0)

# factorize the tensor for ranks 1 and 4.
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1,4])

# calculate the p-values for the entries in the test set
p_values = cp_apr.predict_scores(coords=coords_test, values=nnz_test)

These p-values are also stored on the cp_apr object and can be retrieved as follows:

scores = list(cp_apr.prediction['p_value'])
set_params(**parameters)

Sets the model parameters.

Here the model is pyCP_APR.CP_APR.model.

Parameters:

parameters (dict) -- All model parameters. See the pyCP_APR.CP_APR class initialization parameters.

Returns:

self -- self is returned.

Return type:

object

transform(indicies, ensemble_significance=[])

Given the sparse test tensor entries (the indicies parameter, i.e. the coordinates of non-zero values), calculates the Poisson lambda values.

The Poisson lambda value can be used to identify how likely that coordinate in the test tensor was to occur, given what was learned during training.
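
For a KRUSKAL tensor, the model value at a coordinate is the sum over components of the component weight times the matching factor entries on every dimension. The sketch below evaluates that sum for a list of coordinates with plain Numpy. The helper poisson_lambdas and the small M are illustrative assumptions, and the library may apply additional steps that are not shown here.

```python
import numpy as np

def poisson_lambdas(M: dict, coords) -> np.ndarray:
    """Evaluate sum_r w_r * prod_d A_d[i_d, r] at each coordinate (hypothetical helper)."""
    coords = np.asarray(coords)
    weights = np.asarray(M["Weights"], dtype=float)
    # Start from the component weights, one row per coordinate.
    per_component = np.tile(weights, (coords.shape[0], 1))
    for d in range(coords.shape[1]):
        factor = np.asarray(M["Factors"][str(d)])
        # Multiply in the factor rows selected by the index on dimension d.
        per_component *= factor[coords[:, d], :]
    # Sum the contributions of all components.
    return per_component.sum(axis=1)

# Made-up rank-2 KRUSKAL tensor for a 2 x 3 matrix.
M = {
    "Factors": {
        "0": np.array([[1.0, 0.0], [0.5, 2.0]]),
        "1": np.array([[1.0, 1.0], [0.0, 3.0], [2.0, 0.0]]),
    },
    "Weights": np.array([2.0, 0.5]),
}
lambdas = poisson_lambdas(M, [[0, 0], [1, 1], [0, 1]])
print(lambdas)
```

A coordinate whose lambda is close to zero, like the last one here, is one the model does not expect to see, so observing it in the test tensor is unlikely under the fitted model.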

Note

  • If 2 tensors are weighted during the calculation of lambdas, the weight of each can be specified using ensemble_significance. See pyCP_APR.CP_APR.fit() for factorizing an ensemble of tensor ranks. For instance, if ensemble_significance=[0.1, 0.9], lambda = (0.1 x lambda_1) + (0.9 x lambda_2).

  • Anomaly detection can be performed using either predict_scores(), or transform() followed by predict_probas(). While transform() and predict_probas() yield faster computation and more established dimension fusion results, predict_scores() provides a wider range of features for anomaly detection.
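
The weighting formula in the first note can be written out directly. In this sketch the two lambda arrays are made-up stand-ins for the values obtained from the two factorized tensors, and weighted_lambdas is a hypothetical helper name.

```python
import numpy as np

def weighted_lambdas(lambda_1, lambda_2, ensemble_significance=(0.1, 0.9)):
    """Combine two lambda arrays as w1 * lambda_1 + w2 * lambda_2 (hypothetical helper)."""
    w1, w2 = ensemble_significance
    return w1 * np.asarray(lambda_1, dtype=float) + w2 * np.asarray(lambda_2, dtype=float)

lambda_1 = [10.0, 0.0]   # made-up lambdas from the first factorization
lambda_2 = [20.0, 5.0]   # made-up lambdas from the second factorization
combined = weighted_lambdas(lambda_1, lambda_2)
print(combined)
```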

Warning

To use the transform() function, the model has to be fit() first.

Parameters:
  • indicies (list or array) -- List of non-zero coordinates in the test tensor.

  • ensemble_significance (list, optional) --

    The weight of each tensor during lambda calculations.

    Ensemble significance is automatically applied if multiple tensors are fitted.

    If multiple tensors are fitted, the default is ensemble_significance=[0.1, 0.9].

    If a single tensor is fitted, the default is ensemble_significance=[1].

Returns:

lambdas -- List of lambda values for each of the non-zero coordinates.

Return type:

array

Note

Example

from pyCP_APR.datasets import load_dataset
from pyCP_APR import CP_APR
from collections import OrderedDict
import numpy as np

data = load_dataset(name = "TOY")

# Training set
coords_train = data['train_coords']
nnz_train = data['train_count']

# Test set
coords_test = data['test_coords']
nnz_test = data['test_count']

# CP-APR model
cp_apr = CP_APR(n_iters=10, random_state=42, verbose=1, method='numpy')

# factorize X for ranks 1 and 4
M = cp_apr.fit(coords=coords_train, values=nnz_train, rank=[1,4])

# get the lambdas
# returned lambdas are weighted values for rank 1 and rank 4 tensor decomposition Ms
lambdas = cp_apr.transform(coords_test)

pyCP_APR.version module

pyCP_APR version.

Module contents

2021. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.