Database Models Module

The database models module defines SQLAlchemy ORM models for all database tables. These models support both SQLite and PostgreSQL backends.

Models

  • Paper — research paper with metadata (title, authors, abstract, year, conference, etc.)

  • EmbeddingsMetadata — tracks the embedding model used to generate vector embeddings

  • ClusteringCache — caches clustering results with visualization coordinates

  • HierarchicalLabelCache — caches hierarchical cluster labels

  • ValidationData — stores donated validation data for evaluation

  • ChatDonation — stores donated chat interaction data

  • EvalQAPair — evaluation question-answer pairs

  • EvalResult — evaluation run results

Quick Start

from abstracts_explorer.db_models import Paper, EmbeddingsMetadata, Base

# The models are typically used through DatabaseManager,
# but can be used directly with SQLAlchemy sessions:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///papers.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    paper = Paper(
        title="Example Paper",
        abstract="An example abstract.",
        year=2025,
        conference="NeurIPS",
    )
    session.add(paper)
    session.commit()

API Reference

Database Models

This module defines SQLAlchemy ORM models for the database tables. These models support both SQLite and PostgreSQL backends.

class abstracts_explorer.db_models.Base(**kwargs)[source]

Bases: DeclarativeBase

Base class for all database models.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.Paper(**kwargs)[source]

Bases: Base

Paper model representing a research paper.

This uses the lightweight schema from LightweightPaper model.

Variables:
  • uid (str) – Unique identifier (hash-based, primary key).

  • original_id (str, optional) – Original ID from the source (e.g., OpenReview ID).

  • title (str) – Paper title.

  • authors (str, optional) – Semicolon-separated list of author names.

  • abstract (str, optional) – Paper abstract.

  • session (str, optional) – Conference session name.

  • poster_position (str, optional) – Poster position identifier.

  • paper_pdf_url (str, optional) – URL to paper PDF.

  • poster_image_url (str, optional) – URL to poster image.

  • url (str, optional) – General URL for the paper.

  • room_name (str, optional) – Room name for presentation.

  • keywords (str, optional) – Comma-separated keywords.

  • starttime (str, optional) – Start time of presentation.

  • endtime (str, optional) – End time of presentation.

  • award (str, optional) – Award received (e.g., “Best Paper”).

  • year (int, optional) – Publication year.

  • conference (str, optional) – Conference name (e.g., “NeurIPS”, “ICLR”).

  • created_at (datetime) – Timestamp when record was created.

uid
original_id
title
authors
abstract
session
poster_position
paper_pdf_url
poster_image_url
url
room_name
keywords
starttime
endtime
award
year
conference
created_at
__repr__()[source]

String representation of Paper.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EmbeddingsMetadata(**kwargs)[source]

Bases: Base

Embeddings metadata model.

Tracks which embedding model was used for the vector embeddings.

Variables:
  • id (int) – Auto-incrementing primary key.

  • embedding_model (str) – Name of the embedding model used.

  • created_at (datetime) – Timestamp when record was created.

  • updated_at (datetime) – Timestamp when record was last updated.

id
embedding_model
created_at
updated_at
__repr__()[source]

String representation of EmbeddingsMetadata.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ClusteringCache(**kwargs)[source]

Bases: Base

Clustering cache model.

Stores cached clustering results including visualization coordinates. When only the dimensionality reduction method changes, the clustering results (assignments, labels, hierarchy) are reused and only the reduction is re-applied, avoiding expensive re-clustering.

Variables:
  • id (int) – Auto-incrementing primary key.

  • embedding_model (str) – Name of the embedding model used.

  • conference (str, optional) – Conference name this cache entry is scoped to (e.g., ‘NeurIPS’).

  • year (int, optional) – Conference year this cache entry is scoped to.

  • reduction_method (str) – Dimensionality reduction method used (e.g., ‘pca’, ‘tsne’).

  • n_components (int) – Number of dimensions after reduction.

  • clustering_method (str) – Clustering algorithm used (e.g., ‘kmeans’, ‘dbscan’).

  • n_clusters (int, optional) – Actual number of clusters in the cached results.

  • clustering_params (str) – JSON string of additional clustering parameters.

  • results_json (str) – JSON string containing full clustering results including points with visualization coordinates.

  • created_at (datetime) – Timestamp when cache was created.

id
embedding_model
conference
year
reduction_method
n_components
clustering_method
n_clusters
clustering_params
results_json
created_at
__repr__()[source]

String representation of ClusteringCache.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.HierarchicalLabelCache(**kwargs)[source]

Bases: Base

Hierarchical label cache model.

Stores cached hierarchical cluster labels for agglomerative clustering. Labels are independent of the number of clusters or distance threshold and are reused for all agglomerative clustering settings that share the same embedding model and linkage method.

Variables:
  • id (int) – Auto-incrementing primary key.

  • embedding_model (str) – Name of the embedding model used.

  • linkage (str) – Linkage method used in agglomerative clustering (e.g., ‘ward’).

  • labels_json (str) – JSON string mapping node IDs to their generated labels.

  • created_at (datetime) – Timestamp when cache was created.

id
embedding_model
linkage
labels_json
created_at
__repr__()[source]

String representation of HierarchicalLabelCache.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ValidationData(**kwargs)[source]

Bases: Base

Validation data model.

Stores anonymized user-donated data about interesting papers for validation and service improvement purposes.

Variables:
  • id (int) – Auto-incrementing primary key.

  • paper_uid (str) – Paper UID reference (anonymized - no direct user identification).

  • priority (int) – User-assigned priority/rating (1-5).

  • search_term (str, optional) – Search term or context associated with this paper.

  • donated_at (datetime) – Timestamp when data was donated.

id
paper_uid
priority
search_term
donated_at
__repr__()[source]

String representation of ValidationData.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.ChatDonation(**kwargs)[source]

Bases: Base

Chat donation model.

Stores anonymized user-donated chat transcripts with thumbs up/down feedback for improving the chat system.

Variables:
  • id (int) – Auto-incrementing primary key.

  • rating (str) – User feedback rating (‘up’ or ‘down’).

  • transcript (str) – JSON string containing the chat transcript (list of messages).

  • donated_at (datetime) – Timestamp when data was donated.

id
rating
transcript
donated_at
__repr__()[source]

String representation of ChatDonation.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EvalQAPair(**kwargs)[source]

Bases: Base

Evaluation query/answer pair.

Stores queries and their expected answers for automatic evaluation of the RAG system. Supports multi-turn conversations via conversation_id and turn_number.

Variables:
  • id (int) – Auto-incrementing primary key.

  • conversation_id (str) – Groups related queries in a conversation. All turns in the same conversation share this ID.

  • turn_number (int) – Position within the conversation (0 = initial query, 1+ = follow-ups).

  • query (str) – The user query text.

  • expected_answer (str) – The expected/reference answer.

  • tool_name (str, optional) – The MCP tool expected to be invoked for this query.

  • verified (int) – Verification status: 0 = unverified, 1 = verified/approved, -1 = rejected/deleted.

  • source_info (str, optional) – JSON string with metadata about how the pair was generated (e.g. paper UIDs used, generation model).

  • created_at (datetime) – Timestamp when the pair was created.

  • updated_at (datetime) – Timestamp when the pair was last modified.

id
conversation_id
turn_number
query
expected_answer
tool_name
verified
source_info
created_at
updated_at
__repr__()[source]

String representation of EvalQAPair.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class abstracts_explorer.db_models.EvalResult(**kwargs)[source]

Bases: Base

Evaluation run result for a single Q/A pair.

Stores the actual output from the RAG system when evaluated against a stored EvalQAPair, together with scoring metrics.

Variables:
  • id (int) – Auto-incrementing primary key.

  • run_id (str) – Identifier grouping results from the same evaluation run.

  • qa_pair_id (int) – ID of the EvalQAPair that was evaluated.

  • actual_answer (str, optional) – The answer produced by the RAG system.

  • actual_tool_name (str, optional) – The MCP tool actually invoked by the RAG system.

  • answer_score (float, optional) – LLM-judged quality score (1–5 scale).

  • tool_correct (int, optional) – Whether the correct tool was used (1 = yes, 0 = no).

  • latency_ms (int, optional) – Wall-clock time for the query in milliseconds.

  • error (str, optional) – Error message if the query failed.

  • judge_reasoning (str, optional) – The LLM judge’s reasoning for the assigned score.

  • created_at (datetime) – Timestamp when the result was recorded.

id
run_id
qa_pair_id
actual_answer
actual_tool_name
answer_score
tool_correct
latency_ms
error
judge_reasoning
created_at
__repr__()[source]

String representation of EvalResult.

Return type:

str

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.