Database Models Module
The database models module defines SQLAlchemy ORM models for all database tables. These models support both SQLite and PostgreSQL backends.
Models
Paper — research paper with metadata (title, authors, abstract, year, conference, etc.)
EmbeddingsMetadata — tracks the embedding model used to generate vector embeddings
ClusteringCache — caches clustering results with visualization coordinates
HierarchicalLabelCache — caches hierarchical cluster labels
ValidationData — stores donated validation data for evaluation
ChatDonation — stores donated chat interaction data
EvalQAPair — evaluation question-answer pairs
EvalResult — evaluation run results
Quick Start
from abstracts_explorer.db_models import Paper, EmbeddingsMetadata, Base

# The models are typically used through DatabaseManager,
# but can be used directly with SQLAlchemy sessions:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///papers.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    paper = Paper(
        title="Example Paper",
        abstract="An example abstract.",
        year=2025,
        conference="NeurIPS",
    )
    session.add(paper)
    session.commit()
API Reference
Database Models
This module defines SQLAlchemy ORM models for the database tables. These models support both SQLite and PostgreSQL backends.
- class abstracts_explorer.db_models.Base(**kwargs)
Bases: DeclarativeBase
Base class for all database models.
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- class abstracts_explorer.db_models.Paper(**kwargs)
Bases: Base
Paper model representing a research paper.
This uses the lightweight schema from the LightweightPaper model.
- Variables:
uid (str) – Unique identifier (hash-based, primary key).
original_id (str, optional) – Original ID from the source (e.g., OpenReview ID).
title (str) – Paper title.
authors (str, optional) – Semicolon-separated list of author names.
abstract (str, optional) – Paper abstract.
session (str, optional) – Conference session name.
poster_position (str, optional) – Poster position identifier.
paper_pdf_url (str, optional) – URL to paper PDF.
poster_image_url (str, optional) – URL to poster image.
url (str, optional) – General URL for the paper.
room_name (str, optional) – Room name for presentation.
keywords (str, optional) – Comma-separated keywords.
starttime (str, optional) – Start time of presentation.
endtime (str, optional) – End time of presentation.
award (str, optional) – Award received (e.g., “Best Paper”).
year (int, optional) – Publication year.
conference (str, optional) – Conference name (e.g., “NeurIPS”, “ICLR”).
created_at (datetime) – Timestamp when record was created.
- uid
- original_id
- title
- authors
- abstract
- session
- poster_position
- paper_pdf_url
- poster_image_url
- url
- room_name
- keywords
- starttime
- endtime
- award
- year
- conference
- created_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
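The authors and keywords columns hold delimited strings rather than lists, so callers usually need to split them before use. A minimal sketch, assuming the separators documented above (semicolons for authors, commas for keywords); split_field is a hypothetical helper and not part of abstracts_explorer:

```python
def split_field(value, sep):
    """Split a delimited string into a clean list; None or "" gives []."""
    if not value:
        return []
    # Strip whitespace around each item and drop empty fragments.
    return [part.strip() for part in value.split(sep) if part.strip()]


# Example values are made up for illustration.
authors = split_field("Ada Lovelace; Alan Turing", ";")
keywords = split_field("graphs, transformers, ", ",")
missing = split_field(None, ";")  # optional columns may be NULL
```

Because both columns are optional, the helper treats a missing value as an empty list instead of raising.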
- class abstracts_explorer.db_models.EmbeddingsMetadata(**kwargs)
Bases: Base
Embeddings metadata model.
Tracks which embedding model was used for the vector embeddings.
- Variables:
- id
- embedding_model
- created_at
- updated_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- class abstracts_explorer.db_models.ClusteringCache(**kwargs)
Bases: Base
Clustering cache model.
Stores cached clustering results including visualization coordinates. When only the dimensionality reduction method changes, the clustering results (assignments, labels, hierarchy) are reused and only the reduction is re-applied, avoiding expensive re-clustering.
- Variables:
id (int) – Auto-incrementing primary key.
embedding_model (str) – Name of the embedding model used.
conference (str, optional) – Conference name this cache entry is scoped to (e.g., ‘NeurIPS’).
year (int, optional) – Conference year this cache entry is scoped to.
reduction_method (str) – Dimensionality reduction method used (e.g., ‘pca’, ‘tsne’).
n_components (int) – Number of dimensions after reduction.
clustering_method (str) – Clustering algorithm used (e.g., ‘kmeans’, ‘dbscan’).
n_clusters (int, optional) – Actual number of clusters in the cached results.
clustering_params (str) – JSON string of additional clustering parameters.
results_json (str) – JSON string containing full clustering results including points with visualization coordinates.
created_at (datetime) – Timestamp when cache was created.
- id
- embedding_model
- conference
- year
- reduction_method
- n_components
- clustering_method
- n_clusters
- clustering_params
- results_json
- created_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
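The reuse rule described for ClusteringCache can be sketched as a two-step lookup: prefer an entry that matches every setting, and otherwise fall back to one that matches only the clustering-related settings, so that just the reduction has to be re-applied. This is an illustration under stated assumptions, not the module's actual lookup code. find_cache and key are hypothetical helpers, the split of fields into "clustering" and "reduction" groups is inferred from the description above, and SimpleNamespace objects stand in for ClusteringCache rows.

```python
from types import SimpleNamespace

# Assumed split: these fields identify the clustering itself ...
CLUSTERING_FIELDS = ("embedding_model", "conference", "year",
                     "clustering_method", "clustering_params")
# ... and these only affect the visualization coordinates.
REDUCTION_FIELDS = ("reduction_method", "n_components")


def key(entry, fields):
    """Build a comparable tuple from the named attributes of an entry."""
    return tuple(getattr(entry, name) for name in fields)


def find_cache(entries, wanted):
    """Return (entry, exact_match); entry is None when nothing is reusable."""
    reusable = None
    for entry in entries:
        if key(entry, CLUSTERING_FIELDS) != key(wanted, CLUSTERING_FIELDS):
            continue  # different clustering settings: cannot reuse
        if key(entry, REDUCTION_FIELDS) == key(wanted, REDUCTION_FIELDS):
            return entry, True  # everything matches
        if reusable is None:
            reusable = entry  # clustering matches, reduction differs
    return reusable, False


# Stand-in for a cached row; all values are made up.
cached = SimpleNamespace(
    embedding_model="example-model", conference="NeurIPS", year=2025,
    reduction_method="pca", n_components=2,
    clustering_method="kmeans", clustering_params='{"seed": 0}',
)
# Same clustering settings, different reduction method.
request = SimpleNamespace(**{**vars(cached), "reduction_method": "tsne"})

entry, exact = find_cache([cached], request)
```

Here entry is the cached row and exact is False, which signals that the stored assignments can be reused while the coordinates need to be recomputed. Comparing clustering_params as a raw string only works if the JSON is serialized consistently, for example with sorted keys.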
- class abstracts_explorer.db_models.HierarchicalLabelCache(**kwargs)
Bases: Base
Hierarchical label cache model.
Stores cached hierarchical cluster labels for agglomerative clustering. Labels are independent of the number of clusters or distance threshold and are reused for all agglomerative clustering settings that share the same embedding model and linkage method.
- Variables:
id (int) – Auto-incrementing primary key.
embedding_model (str) – Name of the embedding model used.
linkage (str) – Linkage method used in agglomerative clustering (e.g., ‘ward’).
labels_json (str) – JSON string mapping node IDs to their generated labels.
created_at (datetime) – Timestamp when cache was created.
- id
- embedding_model
- linkage
- labels_json
- created_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
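Since labels_json stores a mapping as a JSON string, one detail matters when reading it back: JSON object keys are always strings. The sketch below assumes integer node IDs, which the documentation above does not state, and uses made-up labels:

```python
import json

# Hypothetical labels keyed by integer node ID (the ID type is an assumption).
labels = {0: "Optimization", 1: "Computer vision", 7: "Diffusion models"}

# JSON object keys are always strings, so integer IDs are stringified on write.
labels_json = json.dumps(labels)

# Convert the keys back to integers when reading the cached value.
decoded = {
    int(node_id): label
    for node_id, label in json.loads(labels_json).items()
}
```

If the node IDs are already strings in practice, the int conversion is unnecessary and json.loads alone is enough.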
- class abstracts_explorer.db_models.ValidationData(**kwargs)
Bases: Base
Validation data model.
Stores anonymized user-donated data about interesting papers for validation and service improvement purposes.
- Variables:
id (int) – Auto-incrementing primary key.
paper_uid (str) – Paper UID reference (anonymized - no direct user identification).
priority (int) – User-assigned priority/rating (1-5).
search_term (str, optional) – Search term or context associated with this paper.
donated_at (datetime) – Timestamp when data was donated.
- id
- paper_uid
- priority
- search_term
- donated_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
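The documentation gives a 1-5 range for priority but does not say whether the model enforces it. A caller that wants to be safe could validate the value before building a record. A minimal sketch; check_priority is a hypothetical helper, and the dictionary only stands in for the keyword arguments a ValidationData instance would receive:

```python
def check_priority(priority):
    """Return priority unchanged if it is an integer from 1 to 5."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError("priority must be an integer")
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    return priority


# Made-up values for illustration.
record = {
    "paper_uid": "abc123",
    "priority": check_priority(4),
    "search_term": "graph neural networks",
}
```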
- class abstracts_explorer.db_models.ChatDonation(**kwargs)
Bases: Base
Chat donation model.
Stores anonymized user-donated chat transcripts with thumbs up/down feedback for improving the chat system.
- Variables:
- id
- rating
- transcript
- donated_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- class abstracts_explorer.db_models.EvalQAPair(**kwargs)
Bases: Base
Evaluation query/answer pair.
Stores queries and their expected answers for automatic evaluation of the RAG system. Supports multi-turn conversations via conversation_id and turn_number.
- Variables:
id (int) – Auto-incrementing primary key.
conversation_id (str) – Groups related queries in a conversation. All turns in the same conversation share this ID.
turn_number (int) – Position within the conversation (0 = initial query, 1+ = follow-ups).
query (str) – The user query text.
expected_answer (str) – The expected/reference answer.
tool_name (str, optional) – The MCP tool expected to be invoked for this query.
verified (int) – Verification status: 0 = unverified, 1 = verified/approved, -1 = rejected/deleted.
source_info (str, optional) – JSON string with metadata about how the pair was generated (e.g. paper UIDs used, generation model).
created_at (datetime) – Timestamp when the pair was created.
updated_at (datetime) – Timestamp when the pair was last modified.
- id
- conversation_id
- turn_number
- query
- expected_answer
- tool_name
- verified
- source_info
- created_at
- updated_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
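The conversation_id, turn_number, and verified fields together are enough to rebuild multi-turn conversations from individual rows. The sketch below keeps only verified pairs (verified == 1) and orders each conversation by turn. build_conversations is a hypothetical helper, and SimpleNamespace objects with made-up values stand in for EvalQAPair rows:

```python
from collections import defaultdict
from types import SimpleNamespace


def build_conversations(pairs):
    """Group verified pairs by conversation_id, ordered by turn_number."""
    grouped = defaultdict(list)
    for pair in pairs:
        if pair.verified == 1:  # skip unverified (0) and rejected (-1) pairs
            grouped[pair.conversation_id].append(pair)
    return {
        conversation_id: sorted(turns, key=lambda p: p.turn_number)
        for conversation_id, turns in grouped.items()
    }


pairs = [
    SimpleNamespace(conversation_id="c1", turn_number=1,
                    query="And in 2024?", verified=1),
    SimpleNamespace(conversation_id="c1", turn_number=0,
                    query="Papers on RAG?", verified=1),
    SimpleNamespace(conversation_id="c2", turn_number=0,
                    query="Rejected pair", verified=-1),
]

conversations = build_conversations(pairs)
ordered_queries = [p.query for p in conversations["c1"]]
```

The helper reads attributes only, so under these assumptions it would work the same way on real model instances loaded from a session.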
- class abstracts_explorer.db_models.EvalResult(**kwargs)
Bases: Base
Evaluation run result for a single Q/A pair.
Stores the actual output from the RAG system when evaluated against a stored EvalQAPair, together with scoring metrics.
- Variables:
id (int) – Auto-incrementing primary key.
run_id (str) – Identifier grouping results from the same evaluation run.
qa_pair_id (int) – ID of the EvalQAPair that was evaluated.
actual_answer (str, optional) – The answer produced by the RAG system.
actual_tool_name (str, optional) – The MCP tool actually invoked by the RAG system.
answer_score (float, optional) – LLM-judged quality score (1–5 scale).
tool_correct (int, optional) – Whether the correct tool was used (1 = yes, 0 = no).
latency_ms (int, optional) – Wall-clock time for the query in milliseconds.
error (str, optional) – Error message if the query failed.
judge_reasoning (str, optional) – The LLM judge’s reasoning for the assigned score.
created_at (datetime) – Timestamp when the result was recorded.
- id
- run_id
- qa_pair_id
- actual_answer
- actual_tool_name
- answer_score
- tool_correct
- latency_ms
- error
- judge_reasoning
- created_at
- __init__(**kwargs)
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
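Because several EvalResult fields are optional, any summary of a run has to skip missing values instead of counting them as zero. The sketch below aggregates one run under that assumption. summarize_run is a hypothetical helper, not part of the module, and SimpleNamespace objects with made-up values stand in for EvalResult rows:

```python
from statistics import mean
from types import SimpleNamespace


def summarize_run(results, run_id):
    """Aggregate the results of one run, ignoring missing (None) metrics."""
    rows = [r for r in results if r.run_id == run_id]
    scores = [r.answer_score for r in rows if r.answer_score is not None]
    tools = [r.tool_correct for r in rows if r.tool_correct is not None]
    return {
        "n_results": len(rows),
        "mean_score": mean(scores) if scores else None,
        "tool_accuracy": sum(tools) / len(tools) if tools else None,
        "n_errors": sum(1 for r in rows if r.error),
    }


results = [
    SimpleNamespace(run_id="run-1", answer_score=4.0, tool_correct=1, error=None),
    SimpleNamespace(run_id="run-1", answer_score=5.0, tool_correct=0, error=None),
    SimpleNamespace(run_id="run-1", answer_score=None, tool_correct=None,
                    error="timeout"),
    SimpleNamespace(run_id="run-2", answer_score=2.0, tool_correct=1, error=None),
]

summary = summarize_run(results, "run-1")
```

Failed queries still count toward n_results and n_errors, but they do not pull down the mean score or the tool accuracy.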