Database Models Module

The database models module defines SQLAlchemy ORM models for all database tables. These models support both SQLite and PostgreSQL backends.

Models

Paper — research paper with metadata (title, authors, abstract, year, conference, etc.)
EmbeddingsMetadata — tracks the embedding model used to generate vector embeddings
ClusteringCache — caches clustering results with visualization coordinates
HierarchicalLabelCache — caches hierarchical cluster labels
ValidationData — stores donated validation data for evaluation
ChatDonation — stores donated chat interaction data
EvalQAPair — evaluation question-answer pairs
EvalResult — evaluation run results

Quick Start

from abstracts_explorer.db_models import Paper, EmbeddingsMetadata, Base

# The models are typically used through DatabaseManager,
# but can be used directly with SQLAlchemy sessions:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///papers.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    paper = Paper(
        title="Example Paper",
        abstract="An example abstract.",
        year=2025,
        conference="NeurIPS",
    )
    session.add(paper)
    session.commit()

API Reference

Database Models

This module defines SQLAlchemy ORM models for the database tables. These models support both SQLite and PostgreSQL backends.

class abstracts_explorer.db_models.Base(**kwargs)[source]

Bases: DeclarativeBase

Base class for all database models.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.Paper(**kwargs)[source]

Bases: Base

Paper model representing a research paper.

This uses the lightweight schema from LightweightPaper model.

Variables:

uid (str) – Unique identifier (hash-based, primary key).
original_id (str, optional) – Original ID from the source (e.g., OpenReview ID).
title (str) – Paper title.
authors (str, optional) – Semicolon-separated list of author names.
abstract (str, optional) – Paper abstract.
session (str, optional) – Conference session name.
poster_position (str, optional) – Poster position identifier.
paper_pdf_url (str, optional) – URL to paper PDF.
poster_image_url (str, optional) – URL to poster image.
url (str, optional) – General URL for the paper.
room_name (str, optional) – Room name for presentation.
keywords (str, optional) – Comma-separated keywords.
starttime (str, optional) – Start time of presentation.
endtime (str, optional) – End time of presentation.
award (str, optional) – Award received (e.g., “Best Paper”).
year (int, optional) – Publication year.
conference (str, optional) – Conference name (e.g., “NeurIPS”, “ICLR”).
created_at (datetime) – Timestamp when record was created.

uid

original_id

title

authors

abstract

session

poster_position

paper_pdf_url

poster_image_url

url

room_name

keywords

starttime

endtime

award

year

conference

created_at

__repr__()[source]

String representation of Paper.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EmbeddingsMetadata(**kwargs)[source]

Bases: Base

Embeddings metadata model.

Tracks which embedding model was used for the vector embeddings.

Variables:

id (int) – Auto-incrementing primary key.
embedding_model (str) – Name of the embedding model used.
created_at (datetime) – Timestamp when record was created.
updated_at (datetime) – Timestamp when record was last updated.

id

embedding_model

created_at

updated_at

__repr__()[source]

String representation of EmbeddingsMetadata.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ClusteringCache(**kwargs)[source]

Bases: Base

Clustering cache model.

Stores cached clustering results including visualization coordinates. When only the dimensionality reduction method changes, the clustering results (assignments, labels, hierarchy) are reused and only the reduction is re-applied, avoiding expensive re-clustering.

Variables:

id (int) – Auto-incrementing primary key.
embedding_model (str) – Name of the embedding model used.
conference (str, optional) – Conference name this cache entry is scoped to (e.g., ‘NeurIPS’).
year (int, optional) – Conference year this cache entry is scoped to.
reduction_method (str) – Dimensionality reduction method used (e.g., ‘pca’, ‘tsne’).
n_components (int) – Number of dimensions after reduction.
clustering_method (str) – Clustering algorithm used (e.g., ‘kmeans’, ‘dbscan’).
n_clusters (int, optional) – Actual number of clusters in the cached results.
clustering_params (str) – JSON string of additional clustering parameters.
results_json (str) – JSON string containing full clustering results including points with visualization coordinates.
created_at (datetime) – Timestamp when cache was created.

id

embedding_model

conference

year

reduction_method

n_components

clustering_method

n_clusters

clustering_params

results_json

created_at

__repr__()[source]

String representation of ClusteringCache.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.HierarchicalLabelCache(**kwargs)[source]

Bases: Base

Hierarchical label cache model.

Stores cached hierarchical cluster labels for agglomerative clustering. Labels are independent of the number of clusters or distance threshold and are reused for all agglomerative clustering settings that share the same embedding model and linkage method.

Variables:

id (int) – Auto-incrementing primary key.
embedding_model (str) – Name of the embedding model used.
linkage (str) – Linkage method used in agglomerative clustering (e.g., ‘ward’).
labels_json (str) – JSON string mapping node IDs to their generated labels.
created_at (datetime) – Timestamp when cache was created.

id

embedding_model

linkage

labels_json

created_at

__repr__()[source]

String representation of HierarchicalLabelCache.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ValidationData(**kwargs)[source]

Bases: Base

Validation data model.

Stores anonymized user-donated data about interesting papers for validation and service improvement purposes.

Variables:

id (int) – Auto-incrementing primary key.
paper_uid (str) – Paper UID reference (anonymized - no direct user identification).
priority (int) – User-assigned priority/rating (1-5).
search_term (str, optional) – Search term or context associated with this paper.
donated_at (datetime) – Timestamp when data was donated.

id

paper_uid

priority

search_term

donated_at

__repr__()[source]

String representation of ValidationData.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ChatDonation(**kwargs)[source]

Bases: Base

Chat donation model.

Stores anonymized user-donated chat transcripts with thumbs up/down feedback for improving the chat system.

Variables:

id (int) – Auto-incrementing primary key.
rating (str) – User feedback rating (‘up’ or ‘down’).
transcript (str) – JSON string containing the chat transcript (list of messages).
donated_at (datetime) – Timestamp when data was donated.

id

rating

transcript

donated_at

__repr__()[source]

String representation of ChatDonation.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EvalQAPair(**kwargs)[source]

Bases: Base

Evaluation query/answer pair.

Stores queries and their expected answers for automatic evaluation of the RAG system. Supports multi-turn conversations via conversation_id and turn_number.

Variables:

id (int) – Auto-incrementing primary key.
conversation_id (str) – Groups related queries in a conversation. All turns in the same conversation share this ID.
turn_number (int) – Position within the conversation (0 = initial query, 1+ = follow-ups).
query (str) – The user query text.
expected_answer (str) – The expected/reference answer.
tool_name (str, optional) – The MCP tool expected to be invoked for this query.
verified (int) – Verification status: 0 = unverified, 1 = verified/approved, -1 = rejected/deleted.
source_info (str, optional) – JSON string with metadata about how the pair was generated (e.g. paper UIDs used, generation model).
created_at (datetime) – Timestamp when the pair was created.
updated_at (datetime) – Timestamp when the pair was last modified.

id

conversation_id

turn_number

query

expected_answer

tool_name

verified

source_info

created_at

updated_at

__repr__()[source]

String representation of EvalQAPair.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EvalResult(**kwargs)[source]

Bases: Base

Evaluation run result for a single Q/A pair.

Stores the actual output from the RAG system when evaluated against a stored EvalQAPair, together with scoring metrics.

Variables:

id (int) – Auto-incrementing primary key.
run_id (str) – Identifier grouping results from the same evaluation run.
qa_pair_id (int) – ID of the EvalQAPair that was evaluated.
actual_answer (str, optional) – The answer produced by the RAG system.
actual_tool_name (str, optional) – The MCP tool actually invoked by the RAG system.
answer_score (float, optional) – LLM-judged quality score (1–5 scale).
tool_correct (int, optional) – Whether the correct tool was used (1 = yes, 0 = no).
latency_ms (int, optional) – Wall-clock time for the query in milliseconds.
error (str, optional) – Error message if the query failed.
judge_reasoning (str, optional) – The LLM judge’s reasoning for the assigned score.
created_at (datetime) – Timestamp when the result was recorded.

id

run_id

qa_pair_id

actual_answer

actual_tool_name

answer_score

tool_correct

latency_ms

error

judge_reasoning

created_at

__repr__()[source]

String representation of EvalResult.

Return type:: str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.