Usage Guide
This guide covers common usage patterns for Abstracts Explorer.
Basic Workflow
1. Download Papers
Download papers for a specific year:
uv run abstracts-explorer download --year 2025
Options:
--year: Conference year (e.g., 2025)
--db-path: Path to SQLite database (created if it doesn't exist)
--force: Force re-download even if papers already exist
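For example, these options can be combined in a single invocation. This is only a sketch; the database filename (my_papers.db) is a placeholder:
# Hypothetical example: force a re-download of 2025 papers into a custom database
uv run abstracts-explorer download --year 2025 --db-path my_papers.db --force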
2. Create Embeddings
Generate vector embeddings for semantic search:
uv run abstracts-explorer create-embeddings
Options:
--db-path: Path to SQLite database with papers
--embedding-db-path: Path to ChromaDB database (default: from config)
--collection-name: Collection name in ChromaDB (default: from config)
--force: Recreate embeddings even if they exist
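For example, to rebuild embeddings with explicit paths, something like the following should work; my_papers.db and my_embeddings are placeholders, and the collection name matches the default used elsewhere in this guide:
# Hypothetical example: recreate embeddings in a custom ChromaDB location
uv run abstracts-explorer create-embeddings --db-path my_papers.db --embedding-db-path my_embeddings --collection-name papers --force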
3. Search Papers
Search papers by keyword or semantic similarity:
# Simple search
uv run abstracts-explorer search "transformer architecture"
# Limit results
uv run abstracts-explorer search "reinforcement learning" --limit 10
# Filter by year
uv run abstracts-explorer search "neural networks" --year 2025
4. Chat with Papers (RAG)
Launch the interactive chat interface, powered by retrieval-augmented generation (RAG):
uv run abstracts-explorer chat
In the chat interface:
Ask questions about papers
Get AI-generated responses with paper references
Type exit or quit to leave
Python API
Database Operations
from abstracts_explorer.database import DatabaseManager
# Open database
db = DatabaseManager()
# Get all papers
papers = db.get_all_papers()
# Search by title
results = db.search_papers(title="transformer")
# Get papers by year
papers_2025 = db.get_papers_by_year(2025)
# Get authors for a paper (paper_id taken from one of the results above)
authors = db.get_authors_for_paper(paper_id)
Downloading Papers
Use the plugin system to download papers from different conferences:
from abstracts_explorer.plugins import get_plugin
from abstracts_explorer import DatabaseManager
from abstracts_explorer.plugin import LightweightPaper
# Get the NeurIPS plugin
neurips_plugin = get_plugin('neurips')
# Download papers for 2025
papers_data = neurips_plugin.download(year=2025, output_path='neurips_2025.json')
# Convert to LightweightPaper objects
papers = [LightweightPaper(**paper) for paper in papers_data]
# Save to database
db = DatabaseManager()
db.create_tables()
db.add_papers(papers)
Embeddings
from abstracts_explorer.embeddings import EmbeddingsManager
# Initialize embeddings manager
em = EmbeddingsManager(
    collection_name="papers"
)
# Create embeddings from database
em.create_embeddings_from_db()
# Search by semantic similarity
results = em.search("transformer attention mechanism", n_results=5)
# Search with metadata filter
results = em.search(
    "deep learning",
    n_results=10,
    where={"year": 2025}
)
RAG Chat
from abstracts_explorer.embeddings import EmbeddingsManager
from abstracts_explorer.rag import RAGChat
# Initialize
em = EmbeddingsManager()
chat = RAGChat(
    em,
    lm_studio_url="http://localhost:1234",
    model="gemma-3-4b-it-qat"
)
# Ask a question
response = chat.query("What are the latest developments in transformers?")
print(response)
# Continue conversation
response = chat.chat("Tell me more about the first paper")
print(response)
# Export conversation
chat.export_conversation("conversation.json")
# Reset conversation
chat.reset_conversation()
Advanced Usage
Batch Processing
Process multiple years:
#!/bin/bash
for year in 2023 2024 2025; do
    uv run abstracts-explorer download --year $year
    uv run abstracts-explorer create-embeddings
done
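A variation on the same loop, assuming you want a separate database per year; the filenames are placeholders and only the --db-path option documented above is used:
#!/bin/bash
# Hypothetical variant: keep one SQLite database per conference year
for year in 2023 2024 2025; do
    uv run abstracts-explorer download --year $year --db-path "papers_${year}.db"
    uv run abstracts-explorer create-embeddings --db-path "papers_${year}.db"
done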
Custom Configuration
Override the default configuration through environment variables before loading it:
import os
os.environ['PAPER_DB'] = 'custom_papers.db'
os.environ['EMBEDDING_DB_PATH'] = 'custom_embeddings'
from abstracts_explorer.config import get_config
config = get_config()
Programmatic Search
from abstracts_explorer.database import DatabaseManager
db = DatabaseManager()
# Complex search with multiple filters
papers = db.search_papers(
    title="learning",
    abstract="neural network",
    year=2025,
    limit=20
)
for paper in papers:
    print(f"{paper['title']} - {paper['year']}")