# RAG Module

The RAG (Retrieval-Augmented Generation) module provides a chat interface for querying papers using LLMs.

## Overview

The `RAGChat` class implements:

- Retrieval-Augmented Generation for paper queries
- Conversation history management
- Integration with LM Studio LLM backend
- Context building from relevant papers
- **NEW: MCP Clustering Tools Integration** - Automatic tool calling for topic analysis

## Class Reference

```{eval-rst}
.. automodule:: abstracts_explorer.rag
   :members:
   :undoc-members:
   :show-inheritance:
   :special-members: __init__
```

## Usage Examples

### Basic Setup

```python
from abstracts_explorer.embeddings import EmbeddingsManager
from abstracts_explorer.rag import RAGChat

# Initialize embeddings manager
em = EmbeddingsManager()

# Initialize RAG chat
chat = RAGChat(
    embeddings_manager=em,
    lm_studio_url="http://localhost:1234",
    model="gemma-3-4b-it-qat"
)
```

### Simple Query

```python
# Ask a question
response = chat.query(
    "What are the latest developments in transformer architectures?"
)
print(response)
```

### Conversation

```python
# Start a conversation
response1 = chat.query("Tell me about vision transformers")
print(response1)

# Continue the conversation (maintains context)
response2 = chat.chat("What are their main advantages?")
print(response2)

response3 = chat.chat("Can you explain the first paper in more detail?")
print(response3)
```

### Custom Parameters

```python
# Query with custom settings
response = chat.query(
    query="Explain self-attention mechanisms",
    n_results=10,       # Use 10 papers for context
    temperature=0.8,    # More creative responses
    max_tokens=2000,    # Longer responses
    system_prompt="You are a helpful research assistant."
)
```

## MCP Clustering Tools Integration

**New in v0.3.0**: RAGChat integrates with MCP clustering tools, allowing the LLM to automatically analyze conference topics, trends, and developments.

### What Are MCP Tools?

MCP (Model Context Protocol) tools are specialized functions that the LLM can call to perform specific tasks. The RAG system includes four clustering tools:

1. **get_cluster_topics** - Analyze overall conference topics
2. **get_topic_evolution** - Track how topics evolve over time
3. **get_recent_developments** - Find recent papers in specific areas
4. **get_cluster_visualization** - Generate cluster visualizations
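Tool-capable backends receive each of these as a function definition that the model can choose to call. The exact schema registered by `abstracts_explorer` is not reproduced here; the sketch below only illustrates what such a definition typically looks like in the OpenAI-style function-calling format, reusing the `n_clusters` argument that appears in the debugging example further down.

```python
# Illustrative sketch only: an OpenAI-style function definition for one
# clustering tool. The real schema lives inside abstracts_explorer; the
# n_clusters parameter mirrors the tool-call log shown later on this page.
cluster_topics_tool = {
    "type": "function",
    "function": {
        "name": "get_cluster_topics",
        "description": "Cluster paper embeddings and summarize the main conference topics.",
        "parameters": {
            "type": "object",
            "properties": {
                "n_clusters": {
                    "type": "integer",
                    "description": "Number of topic clusters to compute.",
                }
            },
        },
    },
}
```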
### How It Works

When MCP tools are enabled (default), the LLM automatically decides when to use clustering tools based on the user's question:

```python
from abstracts_explorer.embeddings import EmbeddingsManager
from abstracts_explorer.database import DatabaseManager
from abstracts_explorer.rag import RAGChat

# Initialize components
em = EmbeddingsManager()
em.connect()
db = DatabaseManager()
db.connect()

# Create RAG chat with MCP tools enabled (default)
chat = RAGChat(em, db, enable_mcp_tools=True)

# Ask about general topics - LLM will use clustering tools
response = chat.query("What are the main research topics at NeurIPS?")
print(response)
# The LLM automatically calls get_cluster_topics() and analyzes the results

# Ask about trends - LLM uses topic evolution tool
response = chat.query("How have transformers evolved at NeurIPS over the years?")
print(response)
# The LLM calls get_topic_evolution(topic_keywords="transformers")

# Ask about recent work - LLM uses recent developments tool
response = chat.query("What are the latest papers on reinforcement learning?")
print(response)
# The LLM calls get_recent_developments(topic_keywords="reinforcement learning")
```

### Tool Selection

The LLM automatically decides which tool(s) to use based on the question:

- **Questions about "main topics", "themes", "areas"** → Uses `get_cluster_topics`
- **Questions about "evolution", "trends", "over time"** → Uses `get_topic_evolution`
- **Questions about "recent", "latest", "new"** → Uses `get_recent_developments`
- **Questions about specific papers** → Uses standard RAG (no tools)

### Disabling MCP Tools

If you want to disable MCP tools and use only standard RAG:

```python
# Disable MCP tools
chat = RAGChat(em, db, enable_mcp_tools=False)

# Now all queries use only paper embeddings search
response = chat.query("What are the main topics?")
# Will search for relevant papers instead of clustering
```

### Combining Tools with RAG

The LLM can use both tools AND paper retrieval in the same query:

```python
# Complex query that might use both
response = chat.query(
    "What are the main topics at NeurIPS, and can you explain "
    "the attention mechanism paper in detail?"
)
# The LLM might:
# 1. Call get_cluster_topics() to identify main topics
# 2. Search for "attention mechanism" papers
# 3. Combine both to generate a comprehensive answer
```
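Whether one tool or several are used, each call follows the standard OpenAI-style function-calling round trip, which `RAGChat` drives for you. The condensed sketch below is purely illustrative (the message contents are invented, not captured output):

```python
# Condensed, illustrative view of one tool-calling round trip.
# RAGChat performs these steps internally; nothing here is captured output.

# 1. Instead of answering directly, the model responds with a tool call:
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_cluster_topics", "arguments": '{"n_clusters": 8}'},
        }
    ],
}

# 2. The client executes the clustering tool and returns the result as a
#    "tool" message tied to the call id:
tool_turn = {
    "role": "tool",
    "tool_call_id": "call_1",
    "content": '{"clusters": [{"label": "vision transformers", "size": 120}]}',
}

# 3. The full message list (user question plus both turns above) is sent back
#    to the model, which now produces the final natural-language answer.
```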
### Tool Call Examples

Here are example questions that trigger different tools:

```python
# Cluster analysis
chat.query("What topics are covered in the conference?")
chat.query("Show me the main research areas")
chat.query("Analyze the distribution of papers by topic")

# Topic evolution
chat.query("How has deep learning research changed over time?")
chat.query("Show me the trend for transformer papers from 2020-2025")
chat.query("Has interest in GANs increased or decreased?")

# Recent developments
chat.query("What are the latest papers on LLMs?")
chat.query("Show me recent work in computer vision")
chat.query("What's new in the last 2 years for neural architecture search?")

# Specific papers (no tools, standard RAG)
chat.query("Explain the Vision Transformer paper")
chat.query("What does the BERT paper say about pre-training?")
chat.query("Find papers about attention mechanisms")
```

### Advanced: Tool Call Debugging

To see which tools the LLM is calling, check the logs:

```python
import logging

# Enable debug logging
logging.basicConfig(level=logging.INFO)

# Now tool calls will be logged
chat = RAGChat(em, db, enable_mcp_tools=True)
response = chat.query("What are the main topics?")
# Logs: "LLM requested tool: get_cluster_topics with args: {'n_clusters': 8}"
```

### Requirements

MCP tools require:

- **Embeddings created**: Run `abstracts-explorer create-embeddings` first
- **Papers downloaded**: Conference data must be present in the database
- **Tool-capable LLM**: The model must support function calling (e.g., GPT-3.5+, Gemma 3, Claude)

If your LLM doesn't support function calling, disable MCP tools:

```python
chat = RAGChat(em, db, enable_mcp_tools=False)
```

### Metadata Filtering

```python
# Query papers from a specific year
response = chat.query(
    "What are the main themes in 2025?",
    where={"year": 2025}
)

# Multiple filters
response = chat.query(
    "Explain recent attention mechanisms",
    where={
        "year": {"$gte": 2024},
    },
    n_results=5
)
```

## Conversation Management

### Reset Conversation

```python
# Clear conversation history
chat.reset_conversation()

# Start a fresh conversation
response = chat.query("New topic...")
```

### Export Conversation

```python
# Export to JSON file
chat.export_conversation("conversation.json")

# Export returns the conversation data
conversation_data = chat.export_conversation("chat_history.json")
```

### Conversation Format

Exported conversations include:

```python
{
    "timestamp": "2025-11-26T10:00:00",
    "model": "gemma-3-4b-it-qat",
    "messages": [
        {
            "role": "user",
            "content": "What is a transformer?",
            "timestamp": "2025-11-26T10:00:00"
        },
        {
            "role": "assistant",
            "content": "A transformer is...",
            "papers_used": [
                {"id": "123", "title": "Attention Is All You Need"}
            ],
            "timestamp": "2025-11-26T10:00:05"
        }
    ]
}
```

## LLM Backend Configuration

### Supported Backends

The module is designed for LM Studio but works with any OpenAI-compatible API:

- **LM Studio** (default)
- OpenAI API
- LocalAI
- Ollama (with OpenAI compatibility)

### Authentication

```python
# With an authentication token
chat = RAGChat(
    em,
    lm_studio_url="https://api.example.com",
    model="gpt-4",
    auth_token="sk-..."
)
```

### Custom Endpoints

```python
# Different backend URL
chat = RAGChat(
    em,
    lm_studio_url="http://localhost:8080",
    model="custom-model"
)
```

## Response Generation

### How RAG Works

1. **Retrieve**: Search for relevant papers using embeddings
2. **Augment**: Build context from paper abstracts
3. **Generate**: Send context + query to the LLM
4. **Return**: Get an AI-generated response with citations
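To make these four steps concrete, here is a minimal hand-rolled sketch of the same pipeline. It is illustrative only: the `em.query(...)` call and the paper fields it returns are hypothetical stand-ins for the real `EmbeddingsManager` API, while the HTTP request targets the standard OpenAI-compatible `/v1/chat/completions` endpoint that LM Studio exposes.

```python
import requests

LM_STUDIO_URL = "http://localhost:1234"

def manual_rag_query(em, question, n_results=5):
    """Hand-rolled retrieve -> augment -> generate -> return loop (sketch)."""
    # 1. Retrieve: embedding search for relevant papers
    #    (em.query is a hypothetical stand-in; adapt to the real API).
    papers = em.query(question, n_results=n_results)

    # 2. Augment: build a context block from titles, years, and abstracts.
    context = "\n\n".join(
        f"Title: {p['title']}\nYear: {p.get('year', 'n/a')}\nAbstract: {p['abstract']}"
        for p in papers
    )

    # 3. Generate: send context + question to the OpenAI-compatible endpoint.
    payload = {
        "model": "gemma-3-4b-it-qat",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful research assistant specializing in NeurIPS papers.",
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
    resp = requests.post(f"{LM_STUDIO_URL}/v1/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()

    # 4. Return: the generated answer text.
    return resp.json()["choices"][0]["message"]["content"]
```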
### Context Building

The RAG system builds context from retrieved papers:

```
Context includes:
- Paper titles
- Paper abstracts
- Paper years
- Relevance scores

Formatted for optimal LLM comprehension
```

### System Prompts

Default system prompt:

```
You are a helpful research assistant specializing in NeurIPS papers.
Answer questions based on the provided paper abstracts.
Cite papers by title when referencing them.
```

Custom system prompt:

```python
chat.query(
    "Your question",
    system_prompt="You are an expert in computer vision..."
)
```

## Error Handling

```python
import requests

try:
    response = chat.query("What is deep learning?")
except requests.RequestException:
    print("LLM backend connection failed")
except ValueError as e:
    print(f"Invalid response: {e}")
except Exception as e:
    print(f"Error: {e}")
```

## Performance Considerations

### Response Time

Factors affecting response time:

- Number of papers retrieved (`n_results`)
- LLM model size and speed
- Token generation length (`max_tokens`)
- Network latency to the LLM backend

### Memory Usage

- Conversation history is stored in memory
- Each message adds to the context
- Use `reset_conversation()` for long sessions

### Optimization Tips

```python
# Faster responses
chat.query(query, n_results=3, max_tokens=500)

# More comprehensive but slower
chat.query(query, n_results=10, max_tokens=2000)

# Balance quality and speed
chat.query(query, n_results=5, max_tokens=1000)
```

## Best Practices

1. **Start specific** - Focused queries get better results
2. **Use filters** - Narrow the search space with metadata filters
3. **Manage history** - Reset the conversation when changing topics
4. **Export important conversations** - Save valuable interactions
5. **Adjust parameters** - Tune `n_results` and temperature for your needs
6. **Monitor backend** - Ensure LM Studio/your LLM is running and responsive
7. **Handle errors** - Wrap calls in try-except for production use (combined in the sketch below)
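The sketch below pulls several of these practices together, using only the calls shown earlier on this page (metadata filters, a bounded response length, export, reset, and error handling):

```python
import requests

try:
    # Narrow the search space and keep the response concise.
    response = chat.query(
        "Summarize recent attention-mechanism papers",
        where={"year": {"$gte": 2024}},
        n_results=5,
        max_tokens=1000,
    )
    print(response)

    # Save the exchange, then reset before switching topics.
    chat.export_conversation("attention_papers.json")
    chat.reset_conversation()
except requests.RequestException:
    print("LLM backend connection failed - is LM Studio running?")
```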