TELF.post_processing.Wolf: Graph centrality and ranking tool#
Available Functions#
| Wolf.create_graph | Create a graph from a matrix. |
| Wolf.get_ranks | Compute graph statistics for a tensor. |
Module Contents#
- class TELF.post_processing.Wolf.wolf.Wolf(n_jobs=-1, parallel_backend='loky', max_nbytes='100M', verbose=False)[source]#
Bases:
object
- PARALLEL_BACKEND_OPTIONS = {'loky', 'multiprocessing', 'threading'}#
- TARGET_STATS = {'betweenness_centrality', 'degree', 'hubs_authorities', 'in_closeness', 'in_degree', 'in_weight', 'out_closeness', 'out_degree', 'out_weight', 'page_rank'}#
- property attributes#
- create_graph(X, node_ids=None, attributes=None, target_nodes=None, alpha=None, use_weighted_value=True, **kwargs)[source]#
Create a graph from a matrix. The matrix can be either sparse or dense. The matrix must not have non-zero entries on the diagonal, because a diagonal entry would create a self-loop in the graph.
- Parameters:
X (np.ndarray, scipy.sparse, pydata.sparse) – A dense or sparse 2-dimensional matrix. Each non-zero entry X(i,j) defines an edge between node i (the row) and node j (the column).
node_ids (dict, optional) – A map that can be used to replace the enumerated node identifiers with custom ids in the graph.
attributes (dict, optional) – A map that contains the attributes for the graph.
target_nodes (list, NoneType) – List of node ids from which to create a subgraph. If the intersection of target_nodes and node_ids is empty, an empty graph will be created. If target_nodes is None, no subgraph will be taken. Default=None.
alpha (int, NoneType) – A masking value that is used to remove noise from the matrix prior to creating the graph. An entry (i,j) of the matrix X is zeroed if it is less than alpha. Default=None.
use_weighted_value (bool) – Flag that determines if an edge in the graph is given a custom weight. If True, the weight for an edge between nodes i and j in the graph is the value at X(i,j). If False, every edge has a weight of 1. Default=True.
kwargs (dict) – Arbitrary keyword arguments that will be stored as attributes of the graph. If a keyword is a reserved member variable of Graph, a ValueError is raised.
- Returns:
graph – The created graph object. The output is also stored in the graphs attribute of the Wolf object.
- Return type:
Graph
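The rules above (alpha masking, edge weights, node id replacement, no diagonal entries) can be illustrated with a small standalone sketch. It does not call TELF: `matrix_to_edges` is a hypothetical helper written only for this illustration, the matrix is a plain nested list, and two details are assumptions rather than documented behavior: edges are treated as directed from row to column, and a non-zero diagonal entry raises a ValueError.

```python
def matrix_to_edges(X, node_ids=None, alpha=None, use_weighted_value=True):
    """Hypothetical helper: list the (source, target, weight) edges implied by X."""
    edges = []
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            # Entries smaller than alpha are treated as noise and zeroed.
            if alpha is not None and value < alpha:
                value = 0
            if value == 0:
                continue
            # Assumption: a surviving diagonal entry (a self-loop) is an error.
            if i == j:
                raise ValueError(f"non-zero diagonal entry at ({i}, {j})")
            # Replace enumerated indices with custom ids when a map is given.
            source = node_ids[i] if node_ids is not None else i
            target = node_ids[j] if node_ids is not None else j
            weight = value if use_weighted_value else 1
            edges.append((source, target, weight))
    return edges


X = [
    [0, 3, 1],
    [2, 0, 0],
    [0, 5, 0],
]
node_ids = {0: "doc_a", 1: "doc_b", 2: "doc_c"}

# The entry X(0,2) = 1 is below alpha = 2, so it produces no edge.
edges = matrix_to_edges(X, node_ids=node_ids, alpha=2)

# With use_weighted_value=False, every remaining edge gets weight 1.
unweighted = matrix_to_edges(X, alpha=2, use_weighted_value=False)
```

Note that an entry equal to alpha is kept, since only entries strictly less than alpha are zeroed.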
- get_community_ranks(X, A, stats=None, node_ids=None, attributes=None, top_n=10, alpha=None)[source]#
- get_ranks(X, stats=None, slice_ids=None, node_ids=None, attributes=None, alpha=None, use_weighted_value=True, n_jobs=-1)[source]#
Compute graph statistics for a tensor.
For each slice of a tensor, create a graph and evaluate said graph.
- Parameters:
X (np.ndarray, pydata.sparse) – A dense or sparse 3-dimensional tensor.
stats (list, dict, NoneType) – The graph stats that should be calculated for the given tensor. If a list, each item in the list should be a valid Graph stat function name, and each function is called with its default arguments. If a dict, the keys should be the function names and the values should be dicts of keyword arguments that are passed to the respective stat function. If None, all stats will be computed using the default arguments. Default=None.
slice_ids (dict, NoneType) – A map of labels for the third dimension of the tensor. The keys should be consecutive integers starting at 0 and counting up to o-1 for a tensor X of shape m by n by o. If slice_ids is None, the enumeration of the slices will be used as the ids. Default=None. Example:
>>> slice_ids = {0: 2020, 1: 2021, 2: 2022, 3: 2023}
node_ids (dict, [dict, dict], NoneType) – A map that can be used to replace the enumerated node identifiers with custom ids in the graph. The input is expected to be a dictionary, a list or tuple of two dictionaries, or None. If the input is a dict, it is assumed that both matrix dimensions share the same ids. If the input is a list/tuple, the first dict is used for the first dimension and the second dict for the second dimension. If the input is None, the enumerated ids are not replaced. Note that the keys of each dictionary should be sequential integers representing the original enumerated ids, and the values should be the replacement ids. This argument is intended for use cases where the entries in the matrix have unique identifiers that are more specific than the enumerated index values of the matrix. For example, in the case of a documents x documents matrix, the enumerated ids can be replaced with document ids. Default=None.
attributes (dict, [dict, dict], NoneType) – A map that contains the attributes for the graph. The input is expected to be a dictionary, a list or tuple of two dictionaries, or None. If the input is a dict, it is assumed that both matrix dimensions share the same attributes. If the input is a list/tuple the first dict is used for the first dimension and the second for the second. The key in this map is the id of the node that will house the particular attribute(s). The values are dicts where the key is the name of the attribute and the value is the attribute value. The node id for this argument should match the format used by node_ids. If no attributes are provided, no attributes will be set on the graph. Default=None.
alpha (int, NoneType) – A masking value that is used to remove noise from the matrix prior to creating the graph. An entry (i,j) of the matrix X is zeroed if it is less than alpha. Default=None.
use_weighted_value (bool) – Flag that determines if an edge in the graph is given a custom weight. If True, the weight for an edge between nodes i and j in the graph is the value at X(i,j). If False, every edge has a weight of 1. Default=True.
- Returns:
A dictionary where the keys are slices (either enumerated ids or more specific ids provided by the dimension map slice_ids). The values are pandas DataFrames that contain graph statistics for the corresponding slice of the tensor.
- Return type:
dict
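The per-slice behavior described above can be sketched without TELF as follows. `slice_degree_stats` is a hypothetical helper written only for this illustration: it takes the tensor as a nested list indexed `X[i][j][k]`, computes only in-degree and out-degree, and returns plain dicts instead of the pandas DataFrames that get_ranks is documented to return. Edges are assumed to be directed from row to column.

```python
def slice_degree_stats(X, slice_ids=None):
    """Hypothetical helper: in/out degree per slice of a tensor X[i][j][k]."""
    n_rows, n_cols, n_slices = len(X), len(X[0]), len(X[0][0])
    results = {}
    for k in range(n_slices):
        out_degree = [0] * n_rows
        in_degree = [0] * n_cols
        # Treat slice k as an adjacency matrix: non-zero X[i][j][k] is an edge i -> j.
        for i in range(n_rows):
            for j in range(n_cols):
                if X[i][j][k] != 0:
                    out_degree[i] += 1
                    in_degree[j] += 1
        # Use the custom slice label when provided, else the enumerated index.
        label = slice_ids[k] if slice_ids is not None else k
        results[label] = {"in_degree": in_degree, "out_degree": out_degree}
    return results


# Two 3 x 3 slices, stacked along the third dimension.
slice_a = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # edges 0->1, 1->2
slice_b = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]  # edges 0->1, 0->2, 2->1
X = [[[slice_a[i][j], slice_b[i][j]] for j in range(3)] for i in range(3)]

ranks = slice_degree_stats(X, slice_ids={0: 2020, 1: 2021})
default_ranks = slice_degree_stats(X)  # keys fall back to 0 and 1
```

The keys of the returned dictionary follow slice_ids when it is given, which mirrors the documented return value of get_ranks.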
- property n_jobs#
- property node_ids#
- property parallel_backend#
- property verbose#