Paper Utilities Module
The paper utilities module provides shared helper functions for formatting papers from various sources (database, search results, ChromaDB) with consistent structure and error handling.
Quick Start
from abstracts_explorer.paper_utils import (
    format_search_results,
    build_context_from_papers,
)
from abstracts_explorer.database import DatabaseManager
db = DatabaseManager()
# search_results is a ChromaDB result dict, e.g. from embeddings_manager.search_similar(...)
# Format search results into complete paper records for display
papers = format_search_results(search_results, db)
# Build a context string for RAG from the formatted papers
context = build_context_from_papers(papers)
API Reference
Paper formatting utilities for NeurIPS abstracts.
- exception abstracts_explorer.paper_utils.PaperFormattingError
Bases: Exception
Exception raised when paper formatting fails.
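Because PaperFormattingError derives from Exception, callers can catch it on its own and let unrelated errors propagate. The sketch below is only an illustration: it defines a local stand-in class so it runs without the package installed, and format_or_empty and failing_formatter are hypothetical names that are not part of abstracts_explorer. In real code the exception would be imported from abstracts_explorer.paper_utils.

```python
from typing import Any, Callable, Dict, List


class PaperFormattingError(Exception):
    """Local stand-in for abstracts_explorer.paper_utils.PaperFormattingError."""


def format_or_empty(
    formatter: Callable[[], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: run a formatter, fall back to [] on formatting errors."""
    try:
        return formatter()
    except PaperFormattingError as exc:
        # Only formatting failures are handled here; other exceptions propagate.
        print(f"Could not format papers: {exc}")
        return []


def failing_formatter() -> List[Dict[str, Any]]:
    # Simulates a formatter rejecting malformed search results.
    raise PaperFormattingError("search_results is missing 'ids'")


papers = format_or_empty(failing_formatter)
```

Whether an empty list is an acceptable fallback depends on the caller; a web handler might prefer to surface the error message instead.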
- abstracts_explorer.paper_utils.format_search_results(search_results, database, include_documents=True)
Format ChromaDB search results into complete paper records.
Converts search results from ChromaDB into fully-populated paper dictionaries by fetching complete data from the database. Fails early if required data is missing rather than returning incomplete records.
- Parameters:
search_results (dict) – Search results from ChromaDB with ‘ids’, ‘distances’, ‘metadatas’, ‘documents’.
database (DatabaseManager) – Database instance for fetching complete paper details.
include_documents (bool, optional) – Whether to include document text (abstract) from search results, by default True.
- Returns:
List of complete paper dictionaries with authors, similarity scores, and all fields.
- Return type:
list
- Raises:
PaperFormattingError – If search_results format is invalid or required data is missing.
Examples
>>> results = embeddings_manager.search_similar("transformers", n_results=5)
>>> papers = format_search_results(results, database)
>>> for paper in papers:
...     print(paper['title'], paper['similarity'])
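To make the expected input shape and the fail-early behaviour concrete, here is a minimal sketch. It assumes search_results follows ChromaDB's usual query layout, in which each key maps to one inner list per query. The check_search_results helper and the sample values are hypothetical and are not the library's implementation; the sketch raises ValueError where the real function is documented to raise PaperFormattingError.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("ids", "distances", "metadatas", "documents")


def check_search_results(search_results: Dict[str, Any]) -> List[str]:
    """Hypothetical fail-early check; returns the paper ids of the first query."""
    if not isinstance(search_results, dict):
        raise ValueError("search_results must be a dict")
    missing = [key for key in REQUIRED_KEYS if key not in search_results]
    if missing:
        raise ValueError(f"search_results is missing keys: {missing}")
    ids = search_results["ids"]
    if not ids or not isinstance(ids[0], list):
        raise ValueError("'ids' must hold one inner list per query")
    return ids[0]


# Illustrative data only, in the assumed nested layout (one inner list per query).
sample_results = {
    "ids": [["101", "102"]],
    "distances": [[0.12, 0.35]],
    "metadatas": [[{"title": "Paper A"}, {"title": "Paper B"}]],
    "documents": [["Abstract A", "Abstract B"]],
}

paper_ids = check_search_results(sample_results)
```

Validating the structure before any database lookup means a malformed result is reported once, up front, instead of producing partially filled records.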
- abstracts_explorer.paper_utils.build_context_from_papers(papers)
Build a formatted context string from papers for RAG.
- Parameters:
papers (list) – List of paper dictionaries with at minimum: title/name, authors, abstract.
- Returns:
Formatted context string for LLM consumption.
- Return type:
str
- Raises:
PaperFormattingError – If papers list is invalid or papers missing required fields.
Examples
>>> papers = format_search_results(results, database)
>>> context = build_context_from_papers(papers)
>>> print(context)
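The exact layout of the returned string is not documented here. As a rough illustration of how a context builder of this kind could work, the sketch below joins the documented minimum fields (title/name, authors, abstract) into numbered entries. The sketch_context function and its output layout are assumptions for illustration, not the library's actual format, and it raises ValueError in place of PaperFormattingError.

```python
from typing import Any, Dict, List


def sketch_context(papers: List[Dict[str, Any]]) -> str:
    """Hypothetical illustration of a RAG context builder (layout is assumed)."""
    if not isinstance(papers, list):
        raise ValueError("papers must be a list")
    blocks = []
    for index, paper in enumerate(papers, start=1):
        # The documented minimum fields are title/name, authors, and abstract.
        title = paper.get("title") or paper.get("name")
        if not title or "authors" not in paper or "abstract" not in paper:
            raise ValueError(f"paper {index} is missing a required field")
        authors = paper["authors"]
        if isinstance(authors, (list, tuple)):
            authors = ", ".join(authors)
        blocks.append(
            f"Paper {index}: {title}\nAuthors: {authors}\nAbstract: {paper['abstract']}"
        )
    # Blank lines between entries keep papers visually separate for the LLM.
    return "\n\n".join(blocks)


example = sketch_context(
    [{"title": "Paper A", "authors": ["Ada", "Bo"], "abstract": "Short abstract."}]
)
```

Numbering the entries is one possible design choice; it lets a prompt ask the model to cite papers by index.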