MCP Server for Cluster Analysis
The Abstracts Explorer includes a Model Context Protocol (MCP) server that provides tools for analyzing clustered paper embeddings. This enables LLM-based assistants to answer sophisticated questions about research topics, trends, and developments.
NEW: MCP tools are automatically integrated into the RAG chat system, allowing the LLM to decide when to use clustering analysis versus paper retrieval.
What is MCP?
Model Context Protocol (MCP) is a protocol that allows tools and servers to provide context and capabilities to LLM-based applications. The MCP server exposes tools that can be called by LLM assistants to perform specific tasks.
Features
The MCP server provides four main tools:
1. get_cluster_topics
Analyzes clustered embeddings to identify the most frequently mentioned topics in each cluster.
Parameters:
n_clusters (int): Number of clusters to create (default: 8)
reduction_method (str): Dimensionality reduction method - 'pca' or 'tsne' (default: 'pca')
clustering_method (str): Clustering algorithm - 'kmeans', 'dbscan', 'agglomerative', 'spectral', or 'fuzzy_cmeans' (default: 'kmeans')
embeddings_path (str, optional): Path to ChromaDB embeddings database
collection_name (str, optional): Name of ChromaDB collection
db_path (str, optional): Path to SQLite database
Returns: JSON with cluster statistics and topics for each cluster, including:
Keywords and their frequencies
Common sessions
Year distribution
Sample paper titles
Example use case: “What are the most frequently mentioned topics in the conference papers?”
2. get_topic_evolution
Analyzes how specific topics have evolved over the years.
Parameters:
topic_keywords (str): Keywords describing the topic (e.g., "transformers attention")
conference (str, optional): Filter by conference name
start_year (int, optional): Start year for analysis
end_year (int, optional): End year for analysis
embeddings_path (str, optional): Path to ChromaDB embeddings database
collection_name (str, optional): Name of ChromaDB collection
db_path (str, optional): Path to SQLite database
Returns: JSON with topic evolution data including:
Year-by-year paper counts
Sample papers from each year
Relevance scores
Example use case: “How have topics related to ‘transformer architectures’ evolved over the years at NeurIPS?”
3. get_recent_developments
Finds the most important recent developments in a specific topic.
Parameters:
topic_keywords (str): Keywords describing the topic
n_years (int): Number of recent years to consider (default: 2)
n_results (int): Number of papers to return (default: 10)
conference (str, optional): Filter by conference name
embeddings_path (str, optional): Path to ChromaDB embeddings database
collection_name (str, optional): Name of ChromaDB collection
db_path (str, optional): Path to SQLite database
Returns: JSON with recent papers including:
Paper titles and abstracts
Years and conferences
Relevance scores
Example use case: “What are the most important recent developments in large language models?”
4. get_cluster_visualization
Generates visualization data for clustered embeddings.
Parameters:
n_clusters (int): Number of clusters (default: 8)
reduction_method (str): Reduction method - 'pca' or 'tsne' (default: 'tsne')
clustering_method (str): Clustering method - 'kmeans', 'dbscan', 'agglomerative', 'spectral', or 'fuzzy_cmeans' (default: 'kmeans')
n_components (int): Number of dimensions - 2 or 3 (default: 2)
output_path (str, optional): Path to save visualization JSON
embeddings_path (str, optional): Path to ChromaDB embeddings database
collection_name (str, optional): Name of ChromaDB collection
db_path (str, optional): Path to SQLite database
Returns: JSON with visualization data including:
Point coordinates (x, y, optionally z)
Cluster assignments
Paper metadata
Statistics
Example use case: “Display a graphical representation of the paper clusters.”
Starting the MCP Server
Basic Usage
Start the server with default settings:
abstracts-explorer mcp-server
This starts the server on http://127.0.0.1:8000 with SSE transport.
Custom Host and Port
Start on a custom host and port:
abstracts-explorer mcp-server --host 0.0.0.0 --port 8080
STDIO Transport
For local CLI integration, use stdio transport:
abstracts-explorer mcp-server --transport stdio
Configuration
The MCP server uses the same configuration as the rest of Abstracts Explorer. Configure via .env file:
# Database Configuration
PAPER_DB=abstracts.db
EMBEDDING_DB_PATH=chroma_db
COLLECTION_NAME=papers
# LLM Backend (for embeddings in tools)
LLM_BACKEND_URL=http://localhost:1234
EMBEDDING_MODEL=text-embedding-qwen3-embedding-4b
Integration with LLM Assistants
The MCP server can be integrated with any MCP-compatible LLM assistant or client. Its tools are also built into the RAG chat system.
RAG Chat Integration (Recommended)
The easiest way to use MCP tools is through the RAG chat system, which automatically calls the appropriate tools:
from abstracts_explorer.rag import RAGChat
from abstracts_explorer.embeddings import EmbeddingsManager
from abstracts_explorer.database import DatabaseManager
# Initialize components
em = EmbeddingsManager()
em.connect()
db = DatabaseManager()
db.connect()
# Create RAG chat (MCP tools enabled by default)
chat = RAGChat(em, db, enable_mcp_tools=True)
# The LLM automatically uses clustering tools when appropriate
response = chat.query("What are the main topics at NeurIPS?")
# Internally calls get_cluster_topics() and uses results
response = chat.query("How have transformers evolved over time?")
# Internally calls get_topic_evolution(topic_keywords="transformers")
See the RAG API documentation for more details.
Claude Desktop Integration
Add to Claude Desktop MCP configuration:
{
"mcpServers": {
"abstracts-explorer": {
"command": "abstracts-explorer",
"args": ["mcp-server", "--transport", "stdio"]
}
}
}
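Other MCP clients can launch the server with the same command. The sketch below uses the stdio client from the `mcp` Python SDK to start the server and call `get_cluster_topics` once. It assumes the `mcp` package is installed and that `abstracts-explorer` is on the PATH. The function name `fetch_cluster_topics` is hypothetical, and the code assumes the tool's JSON string arrives as the first text content item of the result.

```python
import asyncio
import json

# Same command and arguments as the Claude Desktop entry above.
SERVER_COMMAND = "abstracts-explorer"
SERVER_ARGS = ["mcp-server", "--transport", "stdio"]


async def fetch_cluster_topics(n_clusters=8):
    """Spawn the server over stdio and call get_cluster_topics once."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=SERVER_COMMAND, args=SERVER_ARGS)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_cluster_topics", arguments={"n_clusters": n_clusters}
            )
    # Assumption: the JSON payload is the first text content item.
    return json.loads(result.content[0].text)


# To run against a real installation:
# topics = asyncio.run(fetch_cluster_topics())
```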
Example Tool Call
When an LLM assistant needs to analyze topics, it can call:
{
"tool": "get_cluster_topics",
"arguments": {
"n_clusters": 8,
"reduction_method": "tsne",
"clustering_method": "kmeans"
}
}
The server will:
Load embeddings from ChromaDB
Perform clustering
Analyze topics in each cluster
Return structured JSON results
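The last two steps can be illustrated with the standard library alone. This is a rough sketch, assuming the embeddings have already been loaded and clustered so that every paper has a cluster label. The function `summarize_clusters`, the stopword list, and the tokenization are illustrative choices, not the server's actual implementation.

```python
import json
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def summarize_clusters(titles, labels, top_k=2):
    """Count the most frequent title words per cluster and return JSON."""
    words_by_cluster = defaultdict(Counter)
    sizes = Counter(labels)
    for title, label in zip(titles, labels):
        tokens = re.findall(r"[a-z]+", title.lower())
        words_by_cluster[label].update(t for t in tokens if t not in STOPWORDS)

    clusters = [
        {
            "cluster_id": label,
            "paper_count": sizes[label],
            "keywords": [
                {"keyword": word, "count": count}
                for word, count in words_by_cluster[label].most_common(top_k)
            ],
        }
        for label in sorted(sizes)
    ]
    statistics = {"n_clusters": len(sizes), "total_papers": len(titles)}
    return json.dumps({"statistics": statistics, "clusters": clusters})


titles = [
    "Attention in Transformers",
    "Sparse Attention for Long Sequences",
    "Graph Neural Networks for Molecules",
]
labels = [0, 0, 1]  # assumed output of a prior clustering step
print(summarize_clusters(titles, labels, top_k=1))
```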
API Reference
Tool Response Format
All tools return JSON strings. The example below shows the general structure, using a get_cluster_topics response:
{
"statistics": {
"n_clusters": 8,
"total_papers": 1000
},
"clusters": [
{
"cluster_id": 0,
"paper_count": 150,
"keywords": [
{"keyword": "transformer", "count": 45},
{"keyword": "attention", "count": 38}
],
"sessions": [
{"session": "Deep Learning", "count": 100}
],
"years": {"2023": 80, "2024": 70},
"sample_titles": ["Paper 1", "Paper 2", "Paper 3"]
}
]
}
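A client usually reduces a response like this to the fields it needs. The sketch below parses the JSON string and lists the most frequent keywords per cluster. The helper name `top_keywords` is hypothetical, and it relies only on the `clusters`, `cluster_id`, and `keywords` fields shown above.

```python
import json


def top_keywords(response_json, k=3):
    """Map each cluster_id to its k most frequent keywords."""
    data = json.loads(response_json)
    summary = {}
    for cluster in data.get("clusters", []):
        ranked = sorted(
            cluster.get("keywords", []), key=lambda kw: kw["count"], reverse=True
        )
        summary[cluster["cluster_id"]] = [kw["keyword"] for kw in ranked[:k]]
    return summary


example = json.dumps({
    "statistics": {"n_clusters": 1, "total_papers": 150},
    "clusters": [{
        "cluster_id": 0,
        "paper_count": 150,
        "keywords": [
            {"keyword": "attention", "count": 38},
            {"keyword": "transformer", "count": 45},
        ],
    }],
})
print(top_keywords(example))  # {0: ['transformer', 'attention']}
```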
Error Handling
If an error occurs, tools return JSON with an error field:
{
"error": "Failed to load clustering data: Database not found"
}
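Because errors arrive as ordinary JSON rather than as exceptions, callers need to check for the error field before using a result. One possible pattern is sketched below; the `ToolError` class and `parse_tool_response` helper are hypothetical names, not part of Abstracts Explorer.

```python
import json


class ToolError(RuntimeError):
    """Raised when a tool response carries an error field."""


def parse_tool_response(raw):
    """Decode a tool's JSON string, raising ToolError on an error payload."""
    data = json.loads(raw)
    if isinstance(data, dict) and "error" in data:
        raise ToolError(data["error"])
    return data


try:
    parse_tool_response(
        '{"error": "Failed to load clustering data: Database not found"}'
    )
except ToolError as exc:
    print(f"Tool failed: {exc}")
```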
Requirements
Before using the MCP server, ensure:
Embeddings are created: Run abstracts-explorer create-embeddings first
Database exists: Download papers with abstracts-explorer download
MCP package installed: Run uv sync or pip install "mcp>=1.0.0" (the quotes stop the shell from treating >= as a redirect)
Troubleshooting
“No embeddings found”
Make sure to create embeddings first:
abstracts-explorer create-embeddings
“Failed to connect to database”
Check that the database paths in .env are correct:
# .env
PAPER_DB=data/abstracts.db
EMBEDDING_DB_PATH=chroma_db
Port already in use
Change the port:
abstracts-explorer mcp-server --port 8001
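To pick a port that is actually available, a short standard-library check can help. This is a sketch only: the helper names are hypothetical, and a port reported as free can still be taken by another process before the server starts.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8000, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port):
            return port
    return None


print(first_free_port())
```

The printed value can then be passed to the --port option.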
Advanced Usage
Custom Clustering Parameters
Tools that take clustering_method also accept parameters specific to the chosen algorithm. Examples:
For DBSCAN:
{
"clustering_method": "dbscan",
"eps": 0.5,
"min_samples": 5
}
For Agglomerative Clustering:
{
"clustering_method": "agglomerative",
"n_clusters": 10,
"linkage": "ward"
}
Or with distance threshold:
{
"clustering_method": "agglomerative",
"distance_threshold": 5.0,
"linkage": "average"
}
For Spectral Clustering:
{
"clustering_method": "spectral",
"n_clusters": 8,
"affinity": "nearest_neighbors",
"n_neighbors": 10
}
For Fuzzy C-Means (m is the fuzziness parameter):
{
"clustering_method": "fuzzy_cmeans",
"n_clusters": 8,
"m": 2.0
}
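Argument sets like these can be assembled programmatically before a tool call. The helper below is a hypothetical sketch: it only checks the method name against the five algorithms listed in this document and does not validate the algorithm-specific parameters.

```python
SUPPORTED_METHODS = {"kmeans", "dbscan", "agglomerative", "spectral", "fuzzy_cmeans"}


def clustering_arguments(method, **params):
    """Build a tool-argument dict, rejecting unknown clustering methods."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported clustering_method: {method!r}")
    # Parameters left as None are omitted from the request entirely.
    extras = {name: value for name, value in params.items() if value is not None}
    return {"clustering_method": method, **extras}


print(clustering_arguments("dbscan", eps=0.5, min_samples=5))
# {'clustering_method': 'dbscan', 'eps': 0.5, 'min_samples': 5}
```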
Filtering by Conference
To analyze a specific conference, pass conference in the tool arguments. For example, for get_topic_evolution:
{
"topic_keywords": "neural networks",
"conference": "neurips",
"start_year": 2020,
"end_year": 2024
}
See Also
CLI Reference - Command-line interface documentation
Clustering Guide - Clustering and visualization guide
RAG Chat - Using RAG chat with MCP tools
Configuration - Environment configuration options