MCP Server Module

The MCP server module provides a Model Context Protocol (MCP) server that exposes tools for analyzing clustered conference paper embeddings. It enables LLM-based assistants to answer questions about conference paper topics, trends, and developments.

Features

  • Analyze frequently mentioned topics from clustered papers

  • Track topic evolution across conference years

  • Search for papers related to specific topics

  • Generate cluster visualization data

  • Analyze topic relevance across conferences

Quick Start

from abstracts_explorer.mcp_server import (
    get_conference_topics,
    get_topic_evolution,
    search_papers,
)

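# Get the main topics of a conference (a conference is required)
topics = get_conference_topics(conferences=["NeurIPS"])

# Track evolution of a topic (at least one conference is required)
evolution = get_topic_evolution(
    topic_keywords="transformer attention",
    conferences=["NeurIPS"],
    start_year=2020,
    end_year=2025,
)

# Search for papers on a topic (a conference is required)
results = search_papers(
    topic_keywords="reinforcement learning",
    conference="NeurIPS",
    n_results=10,
)

# Each call returns a JSON string; decode it with json.loads() as needed.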

See the MCP Server Guide for usage with LLM assistants.

API Reference

MCP Server for Cluster Analysis

This module provides a Model Context Protocol (MCP) server that exposes tools for analyzing clustered embeddings. The server enables LLM-based assistants to answer questions about conference paper topics, trends, and developments.

Features:

  • Get most frequently mentioned topics from clusters

  • Analyze topic evolution over years

  • Find recent developments in specific topics

  • Generate cluster visualizations

exception abstracts_explorer.mcp_server.ClusterAnalysisError[source]

Bases: Exception

Exception raised for cluster analysis errors.

abstracts_explorer.mcp_server.load_clustering_data(collection_name=None)[source]

Load clustering data and database.

Parameters:

collection_name (str, optional) – Name of the ChromaDB collection

Returns:

Clustering manager and database manager instances

Return type:

tuple[ClusteringManager, DatabaseManager]

Raises:

ClusterAnalysisError – If loading fails

abstracts_explorer.mcp_server.analyze_cluster_topics(cm, db, cluster_id, use_llm=False)[source]

Analyze a single topic (cluster) and return a concise summary.

Each cluster represents a conference topic. The returned dictionary is designed to be consumed directly by an LLM: field names use the word topic instead of cluster, so the model does not need to know about the underlying clustering implementation.

Parameters:
  • cm (ClusteringManager) – Clustering manager with loaded data

  • db (DatabaseManager) – Database manager for paper metadata

  • cluster_id (int) – Internal cluster ID to analyze

  • use_llm (bool, optional) – Whether to use LLM for topic extraction (default: False)

Returns:

Dictionary containing:

  • topic – Human-readable topic name (or None)

  • paper_count – Number of papers in this topic

  • keywords – Representative keywords for the topic

  • sample_titles – A few example paper titles

Return type:

dict

abstracts_explorer.mcp_server.get_conference_topics(conferences=None, years=None, collection_name=None, **kwargs)[source]

Get the main research topics of a conference.

Returns the key research topics covered at the conference, each with a descriptive name, representative keywords, paper count, and example paper titles. A conference must be specified.

When multiple conferences are provided, each conference is analyzed individually and results are combined.

Parameters:
  • conferences (list of str, optional) – Conference names (e.g. [“NeurIPS”]). Defaults to None in the signature but is required in practice: the tool returns an error when it is not provided.

  • years (list of int, optional) – Filter by publication years.

  • collection_name (str, optional) – Name of ChromaDB collection (uses config default if not provided).

  • **kwargs – Ignored (for backwards compatibility with old tool schemas).

Returns:

JSON string containing the conference topics analysis.

Return type:

str

abstracts_explorer.mcp_server.merge_where_clause_with_conference(where, conference)[source]

Merge a WHERE clause with a conference filter.

This helper function properly combines custom WHERE clauses with conference filters, avoiding duplicates and handling nested operators correctly.

Parameters:
  • where (dict, optional) – Custom WHERE clause from user

  • conference (str, optional) – Conference name to filter by

Returns:

Merged WHERE clause, or None if both inputs are None

Return type:

dict or None

Raises:

ValueError – If WHERE clause is not a dict
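
The merge behaviour described above can be pictured with a minimal sketch. The function below is a hypothetical stand-in, not the module's actual implementation: the name merge_conference_sketch and the exact duplicate-detection rules are assumptions made for illustration.

```python
from typing import Optional


def merge_conference_sketch(where: Optional[dict], conference: Optional[str]) -> Optional[dict]:
    """Illustrative stand-in for merge_where_clause_with_conference (assumed logic)."""
    if where is not None and not isinstance(where, dict):
        raise ValueError("WHERE clause must be a dict")
    if conference is None:
        return where  # this is None when both inputs are None
    conference_filter = {"conference": conference}
    if not where:
        return conference_filter
    # Avoid a duplicate when the clause already pins the same conference.
    if where.get("conference") == conference:
        return where
    # Extend an existing top-level $and instead of nesting a second one.
    if set(where) == {"$and"}:
        clauses = list(where["$and"])
        if conference_filter not in clauses:
            clauses.append(conference_filter)
        return {"$and": clauses}
    return {"$and": [where, conference_filter]}


print(merge_conference_sketch({"year": 2025}, "NeurIPS"))
# {'$and': [{'year': 2025}, {'conference': 'NeurIPS'}]}
```

Extending an existing $and list, rather than wrapping it in another $and, is one plausible reading of "handling nested operators correctly"; the real helper may differ.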

abstracts_explorer.mcp_server.merge_where_clause_with_years(where, years)[source]

Merge a WHERE clause with a years filter.

This helper function properly combines custom WHERE clauses with a years filter, avoiding duplicates and handling nested operators correctly.

Parameters:
  • where (dict, optional) – Custom WHERE clause from user

  • years (list of int, optional) – List of years to filter by

Returns:

Merged WHERE clause, or None if both inputs are None

Return type:

dict or None

Raises:

ValueError – If WHERE clause is not a dict
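
A years filter differs from a conference filter mainly in that it covers a list of values. The sketch below shows one way such a list could be turned into a ChromaDB-style clause with the $in operator and then merged. Both function names are hypothetical, and the choice to collapse a single year into a plain equality filter is an assumption, not documented behaviour.

```python
from typing import Optional


def years_filter(years: list) -> dict:
    """Build a filter for a list of years (hypothetical helper)."""
    unique = sorted(set(years))
    if len(unique) == 1:
        return {"year": unique[0]}
    return {"year": {"$in": unique}}


def merge_years_sketch(where: Optional[dict], years: Optional[list]) -> Optional[dict]:
    """Illustrative stand-in for merge_where_clause_with_years (assumed logic)."""
    if where is not None and not isinstance(where, dict):
        raise ValueError("WHERE clause must be a dict")
    if not years:
        return where  # this is None when both inputs are None
    year_clause = years_filter(years)
    if not where or where == year_clause:
        return year_clause
    # Extend an existing top-level $and instead of nesting a second one.
    if set(where) == {"$and"}:
        clauses = list(where["$and"])
        if year_clause not in clauses:
            clauses.append(year_clause)
        return {"$and": clauses}
    return {"$and": [where, year_clause]}
```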

abstracts_explorer.mcp_server.get_topic_evolution(topic_keywords, conferences=None, start_year=None, end_year=None, distance_threshold=1.1, collection_name=None)[source]

Analyze how topics have evolved over the years for one or more conferences.

For each conference and year in the given range, this tool uses EmbeddingsManager.find_papers_within_distance() to count how many papers are semantically close to the topic keywords. It also computes the relative percentage of matching papers with respect to the total number of papers for that conference and year. At least one conference must be specified.

The chat frontend can use the returned data to generate a plot with plotly.js showing the topic evolution over time.

Parameters:
  • topic_keywords (str) – Keywords describing the topic to analyze (e.g., “transformers attention”)

  • conferences (list of str, optional) – Conference names to analyze (e.g., [“NeurIPS”, “ICLR”]). Defaults to None in the signature but is required in practice: the tool returns an error when it is not provided.

  • start_year (int, optional) – Start year for analysis (inclusive)

  • end_year (int, optional) – End year for analysis (inclusive)

  • distance_threshold (float, optional) – Maximum Euclidean distance in embedding space to consider papers relevant (default: 1.1). Lower values mean stricter matching.

  • collection_name (str, optional) – Name of ChromaDB collection

Returns:

JSON string containing topic evolution analysis with per-conference year_counts, year_relative (percentage), and year_totals.

Return type:

str
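
The relationship between year_counts, year_totals, and year_relative can be sketched as follows. This is a minimal illustration, not the module's code: the helper name relative_share, the handling of years with no papers, and the numbers are placeholders chosen for the example.

```python
def relative_share(year_counts: dict, year_totals: dict) -> dict:
    """Percentage of matching papers per year; 0.0 for a year with no papers."""
    return {
        year: (100.0 * year_counts.get(year, 0) / total if total else 0.0)
        for year, total in year_totals.items()
    }


# Placeholder numbers, not real conference statistics.
year_counts = {2023: 12, 2024: 30}
year_totals = {2023: 400, 2024: 500}

print(relative_share(year_counts, year_totals))
# {2023: 3.0, 2024: 6.0}
```

After decoding the returned JSON string with json.loads(), expect the year keys to be strings, since JSON object keys cannot be integers.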

abstracts_explorer.mcp_server.search_papers(topic_keywords, years=None, n_results=10, conference=None, where=None, collection_name=None)[source]

Search for papers on a specific topic.

This tool searches for the most relevant papers about a topic, optionally filtered by specific years. A conference must be specified.

Parameters:
  • topic_keywords (str) – Keywords describing the topic (e.g., “large language models”)

  • years (list of int, optional) – List of specific years to filter by (e.g., [2024, 2025]). If None, searches all years.

  • n_results (int, optional) – Number of papers to return (default: 10)

  • conference (str, optional) – Conference name to filter by (e.g., “NeurIPS”, “ICLR”). Defaults to None in the signature but is required in practice: the tool returns an error when it is not provided.

  • where (dict, optional) – Custom ChromaDB WHERE clause for filtering results by metadata. Supports ChromaDB query operators like $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin. Logical operators $and, $or are also supported. Examples: {"year": 2025}, {"session": {"$in": ["Oral Session 1", "Oral Session 2"]}}, {"$and": [{"year": {"$gte": 2024}}, {"conference": "NeurIPS"}]}. Note: If ‘conference’ parameter is provided, it will be merged with this WHERE clause.

  • collection_name (str, optional) – Name of ChromaDB collection

Returns:

JSON string containing search results

Return type:

str
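
Because the where parameter accepts nested operators, a typo such as $gt3 or an unlisted operator is easy to miss. The sketch below checks a clause against the operators listed for this parameter before it is passed to search_papers. The helper unsupported_operators is hypothetical and not part of the module.

```python
# Operators listed in the documentation of the `where` parameter.
SUPPORTED_OPERATORS = {
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$and", "$or",
}


def unsupported_operators(where) -> set:
    """Return every '$'-prefixed key in the clause that is not in the list above."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(key, str) and key.startswith("$") and key not in SUPPORTED_OPERATORS:
                    found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(where)
    return found


clause = {"$and": [{"year": {"$gte": 2024}}, {"conference": "NeurIPS"}]}
print(unsupported_operators(clause))
# set()
```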

abstracts_explorer.mcp_server.get_paper_details(title=None, paper_id=None, conference=None, year=None, limit=5)[source]

Get detailed information about papers from the database. Use this for follow-up questions after finding papers with semantic search.

Returns full paper metadata including authors, URLs, PDF links, session info, keywords, awards, and other details stored in the database.

At least one of title or paper_id must be provided.

Parameters:
  • title (str, optional) – Title or partial title to search for (case-insensitive).

  • paper_id (str, optional) – Unique paper identifier (uid or original conference/OpenReview ID). When provided, performs an exact lookup and ignores title.

  • conference (str, optional) – Filter results by conference name (e.g., “NeurIPS”, “ICLR”). Only applied when searching by title.

  • year (int, optional) – Filter results by publication year. Only applied when searching by title.

  • limit (int, optional) – Maximum number of papers to return when searching by title (default: 5).

Returns:

JSON string with fields:

  • papers_found – number of papers returned

  • papers – list of paper dicts, each containing: title, authors (list), abstract, url, paper_pdf_url, poster_image_url, session, room_name, starttime, endtime, poster_position, keywords, award, year, conference, original_id

Return type:

str
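
Since the result is a JSON string rather than a Python object, callers need to decode it before use. The sketch below does this for a hand-written payload that follows the field names listed above. The payload values are placeholders and the helper summarize is hypothetical.

```python
import json

# Hand-written placeholder payload using the documented field names.
raw = json.dumps(
    {
        "papers_found": 1,
        "papers": [
            {
                "title": "Example Paper Title",
                "authors": ["A. Author", "B. Author"],
                "year": 2025,
                "conference": "NeurIPS",
            }
        ],
    }
)


def summarize(raw_json: str):
    """Decode the JSON string and build one summary line per paper."""
    data = json.loads(raw_json)
    lines = []
    for paper in data.get("papers", []):
        authors = ", ".join(paper.get("authors") or [])
        lines.append(
            f"{paper.get('title')} ({paper.get('conference')} {paper.get('year')}) by {authors}"
        )
    return data.get("papers_found", 0), lines


count, lines = summarize(raw)
print(count, lines[0])
# 1 Example Paper Title (NeurIPS 2025) by A. Author, B. Author
```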

abstracts_explorer.mcp_server.analyze_topic_relevance(topic, distance_threshold=1.1, conferences=None, years=None, collection_name=None)[source]

Analyze the relevance of a topic by counting papers within a specified distance in embedding space.

This tool measures topic relevance by finding papers semantically similar to the topic within a specified Euclidean distance threshold. It’s useful for identifying how prevalent or relevant a research topic is at a conference. A conference must be specified.

Parameters:
  • topic (str) – The topic or research question to analyze (e.g., “Uncertainty quantification”, “Graph neural networks”, “Transformer architectures”)

  • distance_threshold (float, optional) – Maximum Euclidean distance in embedding space to consider papers relevant (default: 1.1). Lower values mean stricter matching. Typical range: 0.5-2.0 for normalized embeddings.

  • conferences (list of str, optional) – Conference names to filter by (e.g., [“NeurIPS”, “ICLR”]). Defaults to None in the signature but is required in practice: the tool returns an error when it is not provided.

  • years (list of int, optional) – Filter results to specific years (e.g., [2024, 2025])

  • collection_name (str, optional) – Name of ChromaDB collection (uses config default if not provided)

Returns:

JSON string containing:

  • topic – The topic analyzed

  • distance_threshold – Distance threshold applied

  • total_papers – Number of papers found within distance

  • total_considered – Total number of filtered papers considered

  • conferences – Conferences represented (with counts)

  • years – Years represented (with counts)

  • sample_papers – Sample of closest papers with titles and distances

  • relevance_score – Percentage of filtered papers within distance (0-100)

Return type:

str

Examples

Topic: “Uncertainty quantification”
Result: 75 papers found within distance 1.1
Interpretation: High relevance - this is a significant topic at the conference

Topic: “Quantum machine learning”
Result: 3 papers found within distance 1.1
Interpretation: Low relevance - emerging or niche topic
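
The way total_papers, total_considered, and relevance_score relate to the distance threshold can be sketched with toy vectors. This is an illustration under stated assumptions, not the module's implementation: the helper relevance is hypothetical, the 2-D vectors stand in for real embeddings, and treating the threshold as inclusive is a guess.

```python
import math


def relevance(query, embeddings, distance_threshold=1.1):
    """Count vectors within a Euclidean distance of the query and express it as a percentage."""
    distances = [math.dist(query, vector) for vector in embeddings]
    within = sum(1 for d in distances if d <= distance_threshold)
    total = len(embeddings)
    score = 100.0 * within / total if total else 0.0
    return {
        "total_papers": within,
        "total_considered": total,
        "relevance_score": score,
    }


# Toy unit-length vectors; real embeddings have many more dimensions.
query = (1.0, 0.0)
embeddings = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0), (-1.0, 0.0)]

print(relevance(query, embeddings))
# {'total_papers': 2, 'total_considered': 4, 'relevance_score': 50.0}
```

If the embeddings are unit-normalized, squared Euclidean distance equals 2 minus twice the cosine similarity, so the default threshold of 1.1 corresponds to a cosine similarity of roughly 0.4.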

abstracts_explorer.mcp_server.get_cluster_visualization(conferences=None, years=None, output_path=None, collection_name=None, **kwargs)[source]

Retrieve pre-computed visualization data for clustered embeddings.

This tool looks up cached clustering results (pre-generated via CLI) and returns data suitable for visualization. A conference must be specified.

When multiple conferences are provided, each conference is looked up individually and results are combined.

The chat frontend can use the returned data to generate a plot with plotly.js showing the clusters.

Parameters:
  • conferences (list of str, optional) – Conference names to retrieve clusters for (e.g. [“NeurIPS”]). Defaults to None in the signature but is required in practice: the tool returns an error when it is not provided.

  • years (list of int, optional) – Filter by publication years.

  • output_path (str, optional) – Path to save the visualization JSON file.

  • collection_name (str, optional) – Name of ChromaDB collection.

  • **kwargs – Ignored (for backwards compatibility with old tool schemas).

Returns:

JSON string containing visualization data with points, clusters, and statistics.

Return type:

str

abstracts_explorer.mcp_server.run_mcp_server(host='127.0.0.1', port=8000, transport='sse')[source]

Run the MCP server.

Parameters:
  • host (str, optional) – Host address to bind to (default: “127.0.0.1”)

  • port (int, optional) – Port to listen on (default: 8000)

  • transport (str, optional) – Transport method: ‘sse’ or ‘stdio’ (default: ‘sse’)

Return type:

None

Examples

>>> run_mcp_server(host="0.0.0.0", port=8000)