Paper Utilities Module

The paper utilities module provides shared helper functions for formatting papers from various sources (database, search results, ChromaDB) with consistent structure and error handling.

Quick Start

from abstracts_explorer.paper_utils import (
    format_search_results,
    build_context_from_papers,
)
from abstracts_explorer.database import DatabaseManager

db = DatabaseManager()

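# search_results is the dict returned by a ChromaDB query,
# e.g. embeddings_manager.search_similar("transformers", n_results=5)
# Format search results into complete paper records
papers = format_search_results(search_results, db)

# Build a context string for RAG from the formatted papers
context = build_context_from_papers(papers)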

API Reference

Paper formatting utilities for NeurIPS abstracts.

exception abstracts_explorer.paper_utils.PaperFormattingError

Bases: Exception

Exception raised when paper formatting fails.

abstracts_explorer.paper_utils.format_search_results(search_results, database, include_documents=True)

Format ChromaDB search results into complete paper records.

Converts search results from ChromaDB into fully-populated paper dictionaries by fetching complete data from the database. Fails early if required data is missing rather than returning incomplete records.

Parameters:
  • search_results (dict) – Search results from ChromaDB with ‘ids’, ‘distances’, ‘metadatas’, ‘documents’.

  • database (DatabaseManager) – Database instance for fetching complete paper details.

  • include_documents (bool, optional) – Whether to include the document text (abstract) from the search results. Defaults to True.

Returns:

List of complete paper dictionaries with authors, similarity scores, and all fields.

Return type:

list

Raises:

PaperFormattingError – If the search_results format is invalid or required data is missing.
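
The fail-early behavior described above can be illustrated with a self-contained sketch. It is not the module's implementation: `flatten_search_results` is a hypothetical helper, the local `PaperFormattingError` is a stand-in for the real exception, the database lookup is omitted, and the similarity formula (1 - distance) is an assumption. The only thing taken from ChromaDB itself is the result shape, where each key holds one inner list per query.

```python
class PaperFormattingError(Exception):
    """Local stand-in for abstracts_explorer.paper_utils.PaperFormattingError."""


def flatten_search_results(search_results):
    """Hypothetical helper: validate a ChromaDB-style result dict and flatten it.

    A single query yields nested lists, e.g.
    {'ids': [['p1', 'p2']], 'distances': [[0.1, 0.4]], ...}.
    """
    if not isinstance(search_results, dict):
        raise PaperFormattingError("search_results must be a dict")

    # Fail early: every required key must hold a list of lists.
    for key in ("ids", "distances", "metadatas"):
        value = search_results.get(key)
        if not isinstance(value, list) or not value or not isinstance(value[0], list):
            raise PaperFormattingError(f"search_results is missing '{key}'")

    ids = search_results["ids"][0]
    distances = search_results["distances"][0]
    metadatas = search_results["metadatas"][0]
    if not (len(ids) == len(distances) == len(metadatas)):
        raise PaperFormattingError("result lists have mismatched lengths")

    records = []
    for paper_id, distance, metadata in zip(ids, distances, metadatas):
        records.append({
            "id": paper_id,
            # Assumption: similarity = 1 - distance; the real formula may differ.
            "similarity": 1.0 - distance,
            **(metadata or {}),
        })
    return records


results = {
    "ids": [["p1", "p2"]],
    "distances": [[0.1, 0.4]],
    "metadatas": [[{"title": "A"}, {"title": "B"}]],
    "documents": [["abstract A", "abstract B"]],
}
records = flatten_search_results(results)
```

Raising on the first malformed key, rather than skipping it, mirrors the documented choice to reject incomplete records instead of returning partial ones.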

Examples

>>> results = embeddings_manager.search_similar("transformers", n_results=5)
>>> papers = format_search_results(results, database)
>>> for paper in papers:
...     print(paper['title'], paper['similarity'])

abstracts_explorer.paper_utils.build_context_from_papers(papers)

Build a formatted context string from papers for RAG.

Parameters:

papers (list) – List of paper dictionaries, each with at least a title (or name), authors, and an abstract.

Returns:

Formatted context string for LLM consumption.

Return type:

str

Raises:

PaperFormattingError – If the papers list is invalid or a paper is missing required fields.
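
As a rough illustration of what a context builder of this kind might do, the sketch below joins one labeled block per paper. The function name `build_context_sketch`, the "Paper N / Title / Authors / Abstract" layout, and the use of `ValueError` in place of `PaperFormattingError` are all assumptions made for the example; the actual output format of `build_context_from_papers` is not specified here.

```python
def build_context_sketch(papers):
    """Hypothetical formatter: one numbered, labeled block per paper."""
    if not isinstance(papers, list):
        raise ValueError("papers must be a list")

    blocks = []
    for index, paper in enumerate(papers, start=1):
        # Accept either 'title' or 'name', as the parameter description allows.
        title = paper.get("title") or paper.get("name")
        authors = paper.get("authors")
        abstract = paper.get("abstract")
        if not title or authors is None or not abstract:
            raise ValueError(f"paper {index} is missing a required field")

        # Authors may arrive as a list or as a pre-joined string.
        if isinstance(authors, (list, tuple)):
            authors = ", ".join(authors)

        blocks.append(
            f"Paper {index}:\nTitle: {title}\nAuthors: {authors}\nAbstract: {abstract}"
        )
    return "\n\n".join(blocks)


context = build_context_sketch(
    [{"title": "T", "authors": ["A", "B"], "abstract": "X"}]
)
```

Numbering the blocks gives the LLM a stable handle ("Paper 1") for citing a source in its answer, which is one common reason to structure RAG context this way.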

Examples

>>> papers = format_search_results(results, database)
>>> context = build_context_from_papers(papers)
>>> print(context)

abstracts_explorer.paper_utils.extract_top_keywords(papers, n_keywords=5)

Extract top keywords from a list of papers using TF-IDF.

Parameters:
  • papers (list) – List of paper dicts, each with optional ‘title’ and ‘abstract’ keys.

  • n_keywords (int, optional) – Number of top keywords to return (default: 5).

Returns:

List of keyword strings, ordered by relevance (highest TF-IDF first).

Return type:

list
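
To make the TF-IDF ranking concrete, here is a minimal standard-library sketch. It is not the module's code: `top_keywords_sketch`, the tokenizer (lowercase alphabetic runs of three or more letters), the smoothed IDF formula, and the summing of scores across papers are all assumptions, and no stop-word filtering is applied.

```python
import math
import re
from collections import Counter


def top_keywords_sketch(papers, n_keywords=5):
    """Rank terms by summed TF-IDF across papers (illustrative only)."""
    docs = []
    for paper in papers:
        # Both keys are optional, matching the parameter description.
        text = f"{paper.get('title') or ''} {paper.get('abstract') or ''}".lower()
        tokens = re.findall(r"[a-z]{3,}", text)
        if tokens:
            docs.append(Counter(tokens))
    if not docs:
        return []

    # Document frequency: in how many papers each term appears.
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())

    scores = Counter()
    for counts in docs:
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            # Smoothed IDF, so terms present in every paper keep a nonzero weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += tf * idf

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n_keywords]]


sample = [
    {"title": "Attention", "abstract": "attention attention models"},
    {"title": "Graphs", "abstract": "graph models"},
]
keywords = top_keywords_sketch(sample, n_keywords=2)
```

In this sample, "attention" ranks first because it dominates one paper and is absent from the other, while "models" ranks second because its scores from both papers add up.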