TELF.post_processing.ArcticFox: Report generation tool for text data from HNMFk using local LLMs

Report generation tool for text data from HNMFk using local LLMs.

Available Functions

ArcticFox.__init__(model[, embedding_model, ...])

ArcticFox.run_full_pipeline(vocab, data_df)

Run any subset of the pipeline steps while preserving their order.

ArcticFox.run_labeling(df, top_words_df, ...)

ArcticFox.run_postprocessing(V, D[, col_name])

ArcticFox.run_stats([process_parents, ...])

Module Contents

class TELF.post_processing.ArcticFox.arcticfox.ArcticFox(model, embedding_model='SCINCL', distance_metric='cosine', center_metric='centroid', text_cols=None, top_n_words=50, clean_cols_name='clean_title_abstract', col_year='year', col_type='type', col_cluster='cluster', col_cluster_coords='cluster_coordinates', col_similarity='similarity_to_cluster_centroid')

Bases: object

run_full_pipeline(vocab, data_df, text_column: str | None = None, ollama_model: str = 'llama3.2:3b-instruct-fp16', label_clusters: bool = True, generate_stats: bool = True, generate_visuals: bool = True, process_parents: bool = True, skip_completed: bool = True, label_criteria=None, label_info=None, number_of_labels: int = 5, steps: Sequence[Literal['post', 'label', 'stats']] | None = None)

Run any subset of the pipeline while preserving order:

  • ‘post’ → post_process_hnmfk

  • ‘label’ → _label_all_clusters (requires ‘post’ artifacts)

  • ‘stats’ → generate_cluster_stats (requires ‘post’ artifacts)

Rules:
  • ‘label’ and/or ‘stats’ can be run without ‘post’ only if artifacts already exist.

  • Order is always post → label → stats, even if you request multiple.
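
The ordering and artifact rules above can be sketched as a small standalone helper. This is only an illustration of the documented behavior, not TELF's implementation: the function name resolve_steps, the post_artifacts_exist flag, the choice of exceptions, and the treatment of steps=None as "run every step" are all assumptions.

```python
from typing import Iterable, List, Optional, Set

# Canonical order stated in the docstring: post -> label -> stats.
PIPELINE_ORDER = ("post", "label", "stats")


def resolve_steps(
    steps: Optional[Iterable[str]] = None,
    post_artifacts_exist: bool = False,
) -> List[str]:
    """Return the requested steps in canonical order.

    Hypothetical helper: its name, the post_artifacts_exist flag, and the
    exception types are assumptions for illustration, not part of TELF.
    """
    # Assumption: steps=None is treated here as "run every step".
    requested: Set[str] = set(PIPELINE_ORDER if steps is None else steps)

    # Reject anything outside the three documented step names.
    unknown = requested - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"Unknown steps: {sorted(unknown)}")

    # 'label' and 'stats' depend on the artifacts that 'post' produces, so
    # they may run without 'post' only if those artifacts already exist.
    dependents = requested & {"label", "stats"}
    if dependents and "post" not in requested and not post_artifacts_exist:
        raise RuntimeError(
            f"{sorted(dependents)} requested without 'post' and no existing artifacts"
        )

    # Emit in canonical order regardless of the order the caller used.
    return [step for step in PIPELINE_ORDER if step in requested]
```

Under these assumptions, resolve_steps(["stats", "post"]) yields the steps in the order post, stats, and resolve_steps(["label"]) raises unless post_artifacts_exist=True is passed.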

run_labeling(df, top_words_df, ollama_model_name, label_criteria=None, additional_info=None, number_of_labels=5)

run_postprocessing(V, D, col_name=None, **kwargs)

run_stats(process_parents=True, skip_completed=True)