Contributing

Thank you for your interest in contributing to Abstracts Explorer!

Development Setup

1. Clone Repository

git clone <repository-url>
cd abstracts

2. Install uv

If you don’t have uv installed yet:

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

3. Install Dependencies

# Install all dependencies including dev, web, and docs
uv sync --all-extras

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

4. Configure Environment

cp .env.example .env
# Edit .env with your settings
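
As a rough illustration, a minimal .env might contain entries like the following. PAPER_DB is described under Database Backend Support later in this guide; the LM Studio variable name here is a placeholder, so check .env.example for the authoritative names:

# Database location: SQLite path or PostgreSQL URL
PAPER_DB=data/abstracts.db

# LM Studio endpoint used by the slow tests (placeholder name)
LM_STUDIO_URL=http://localhost:1234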

Docker Development Setup

For containerized development with Podman or Docker:

Note: The Docker image uses pre-built static vendor files (CSS/JS libraries) that are committed to the repository. Node.js is not required for production containers; it is only needed for local development if you want to rebuild these vendor files.

Quick Start

# Build and start with Podman
podman-compose up -d

# Or with Docker
docker compose up -d

# Access at http://localhost:5000

Running Commands in Container

# Execute CLI commands
podman-compose exec abstracts-explorer abstracts-explorer download --year 2025

# Run tests
podman-compose exec abstracts-explorer pytest

# Interactive shell
podman-compose exec -it abstracts-explorer /bin/bash

Development Workflow

  1. Make code changes locally

  2. Rebuild container: podman-compose build

  3. Restart services: podman-compose up -d

  4. Test changes in container

See the Docker Guide for more details.

Code Style

Python Style

  • Follow PEP 8 style guide

  • Use NumPy-style docstrings

  • Maximum line length: 88 characters (Black default)

  • Use type hints where appropriate

Example Function

def search_papers(
    query: str,
    limit: int = 10,
    year: int | None = None
) -> list[dict]:
    """
    Search for papers matching the query.

    Parameters
    ----------
    query : str
        Search query string
    limit : int, optional
        Maximum number of results (default: 10)
    year : int or None, optional
        Filter by conference year (default: None)

    Returns
    -------
    list of dict
        List of paper dictionaries matching the query

    Examples
    --------
    >>> results = search_papers("transformer", limit=5)
    >>> print(len(results))
    5
    """
    # Implementation here
    pass

Testing

Python Testing

Running Python Tests

# Run all tests (excludes slow tests by default)
uv run pytest

# Run with coverage
uv run pytest --cov=src/abstracts_explorer

# Run specific test file
uv run pytest tests/test_database.py

# Run specific test
uv run pytest tests/test_database.py::test_add_paper

# Verbose output
uv run pytest -v

# Show print statements
uv run pytest -s

# Run only slow tests (requires LM Studio)
uv run pytest -m slow

# Run all tests including slow ones
uv run pytest -m ""

Note about slow tests: Tests that require LM Studio are marked as slow and skipped by default, which keeps development cycles fast. To run them, you need:

  • LM Studio running at the configured URL (default: http://localhost:1234)

  • A chat model loaded in LM Studio

  • Once those are in place, run pytest -m slow to execute only the slow tests (a sketch of the marker follows this list)
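
A test opts into this behavior with a standard pytest marker. A minimal sketch (the test name and body are illustrative):

import pytest

@pytest.mark.slow
def test_chat_backend_live():
    """Hypothetical example of a test that talks to the live LM Studio server."""
    # Marked slow, so it is skipped unless selected with -m slow or -m ""
    ...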

Python Test Organization

One test file per module: Each source module should have exactly one corresponding test file. This makes tests easy to find and maintain.

Examples:

  • src/abstracts_explorer/database.py → tests/test_database.py

  • src/abstracts_explorer/plugin.py → tests/test_plugin.py

  • src/abstracts_explorer/web_ui/app.py → tests/test_web_ui.py

Shared test code:

  • tests/conftest.py - Shared pytest fixtures

  • tests/helpers.py - Shared helper functions

Exception for test types: Different test types may have separate files:

  • test_integration.py - Cross-module integration tests

  • test_web_integration.py - Web UI integration tests

  • test_web_e2e.py - End-to-end browser tests

Writing Python Tests

  • Use pytest framework

  • Follow the one test file per module principle

  • Create unit tests for all new functions

  • Use fixtures for common setup (defined in conftest.py)

  • Mock external dependencies such as API calls and LLM backends (see the sketch after this list)

  • Aim for >80% code coverage
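
A minimal mocking sketch using pytest's built-in monkeypatch fixture. The module and function names (downloader, fetch_papers) are hypothetical; patch whatever performs the real network or LLM call:

def test_download_offline(monkeypatch):
    """Replace a (hypothetical) network call with a canned response."""
    import abstracts_explorer.downloader as downloader  # hypothetical module

    fake = [{"openreview_id": "test123", "title": "Test Paper"}]
    monkeypatch.setattr(downloader, "fetch_papers", lambda **kwargs: fake)

    # A real test would call code that uses fetch_papers internally
    papers = downloader.fetch_papers(year=2025)
    assert papers[0]["title"] == "Test Paper"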

Example Python Test

import pytest
from abstracts_explorer.database import DatabaseManager

@pytest.fixture
def db(tmp_path):
    """Create a temporary test database."""
    db_path = tmp_path / "test.db"
    return DatabaseManager(str(db_path))

def test_add_paper(db):
    """Test adding a paper to the database."""
    paper_data = {
        'openreview_id': 'test123',
        'title': 'Test Paper',
        'abstract': 'Test abstract',
        'year': 2025,
    }
    
    paper_id = db.add_paper(paper_data)
    assert paper_id is not None
    
    # Verify paper was added
    paper = db.get_paper_by_id(paper_id)
    assert paper['title'] == 'Test Paper'

JavaScript Testing

The web UI uses Jest for JavaScript unit testing with jsdom for DOM simulation.

Running JavaScript Tests

# Install Node.js dependencies (first time only)
npm install

# Run all JavaScript tests
npm test

# Run with coverage report
npm run test:coverage

# Run tests in watch mode (for development)
npm run test:watch

# Run specific test file
npm test -- chat.test.js

# Run with verbose output
npm test -- --verbose

JavaScript Test Coverage

Current JavaScript test coverage (excluding vendor files):

  • Overall: ~86% line coverage

  • Target: >90% coverage for all modules

Coverage by module:

  • Utility modules: 100% coverage (api-utils, cluster-utils, constants, dom-utils, sort-utils)

  • Core modules: 70-100% coverage (state, search, chat, tabs, clustering, filters)

  • Integration modules: Tested via end-to-end browser tests

JavaScript Test Organization

Test files are located in src/abstracts_explorer/web_ui/tests/:

  • setup.js - Jest configuration and global mocks

  • app.test.js - Tests for main app initialization

  • chat.test.js - Chat module tests

  • clustering.test.js - Clustering visualization tests

  • clustering-hierarchy.test.js - Hierarchical clustering tests

  • filters.test.js - Filter panel tests

  • interesting-papers.test.js - Interesting papers management tests

  • modules.test.js - Module loading tests

  • paper-card.test.js - Paper card component tests

  • search.test.js - Search functionality tests

  • state.test.js - State management tests

  • tabs.test.js - Tab navigation tests

  • utils.test.js - Utility function tests

Writing JavaScript Tests

Test Structure:

import { jest, expect, describe, test, beforeEach } from '@jest/globals';
import { myFunction } from '../static/modules/my-module.js';

describe('My Module', () => {
    beforeEach(() => {
        // Setup DOM
        document.body.innerHTML = `
            <div id="test-element"></div>
        `;
        
        // Mock fetch
        global.fetch = jest.fn();
        
        // Reset mocks
        jest.clearAllMocks();
    });

    test('should perform expected behavior', () => {
        // Arrange
        const input = 'test input';
        
        // Act
        const result = myFunction(input);
        
        // Assert
        expect(result).toBe('expected output');
    });
});

Best Practices:

  1. Mock external dependencies:

    • Use global.fetch = jest.fn() to mock API calls

    • Mock Plotly for visualization tests

    • Mock localStorage for state tests

  2. Set up DOM elements:

    • Create minimal DOM structure needed for tests

    • Use document.body.innerHTML in beforeEach()

    • Reset mock state with jest.clearAllMocks()

  3. Test behavior, not implementation:

    • Focus on what the function does, not how

    • Test user-facing behavior

    • Verify DOM changes and state updates

  4. Async testing:

    test('should load data', async () => {
        global.fetch.mockResolvedValueOnce({
            json: async () => ({ data: 'test' })
        });
        
        await loadData();
        
        expect(fetch).toHaveBeenCalled();
    });
    
  5. Event handling:

    test('should handle click event', () => {
        const button = document.getElementById('my-button');
        const mockHandler = jest.fn();
        
        button.addEventListener('click', mockHandler);
        button.click();
        
        expect(mockHandler).toHaveBeenCalled();
    });
    

Example JavaScript Test

import { jest, expect, describe, test, beforeEach } from '@jest/globals';
import { searchPapers } from '../static/modules/search.js';

describe('Search Module', () => {
    beforeEach(() => {
        document.body.innerHTML = `
            <input id="search-input" value="transformers" />
            <div id="search-results"></div>
        `;
        
        global.fetch = jest.fn();
        jest.clearAllMocks();
    });

    test('should search for papers', async () => {
        global.fetch.mockResolvedValueOnce({
            json: async () => ({
                papers: [
                    { id: 1, title: 'Test Paper' }
                ],
                count: 1
            })
        });

        await searchPapers();

        expect(fetch).toHaveBeenCalledWith(
            expect.stringContaining('/api/search'),
            expect.any(Object)
        );
        
        const results = document.getElementById('search-results');
        expect(results.innerHTML).toContain('Test Paper');
    });
});

Viewing Coverage Reports

After running npm run test:coverage, view the detailed HTML report:

# Coverage report is generated in the ./coverage directory
open coverage/index.html

The report shows:

  • Line-by-line coverage highlighting

  • Branch coverage details

  • Function coverage

  • Uncovered code sections

Documentation

Docstrings

All public functions, classes, and methods must have docstrings:

def my_function(param1: str, param2: int = 10) -> bool:
    """
    Brief description of the function.

    More detailed description if needed.

    Parameters
    ----------
    param1 : str
        Description of param1
    param2 : int, optional
        Description of param2 (default: 10)

    Returns
    -------
    bool
        Description of return value

    Raises
    ------
    ValueError
        When param1 is empty
    RuntimeError
        When operation fails

    Examples
    --------
    >>> my_function("test")
    True

    Notes
    -----
    Additional notes about the function.

    See Also
    --------
    related_function : Related functionality
    """
    pass

Building Documentation

# Build HTML documentation
cd docs
uv run make html

# View documentation
open _build/html/index.html

# Clean build
make clean

Updating Documentation

  1. Update docstrings in source code

  2. Update Markdown files in docs/

  3. Rebuild documentation

  4. Review changes in browser

Pull Request Process

1. Create Branch

git checkout -b feature/my-new-feature

2. Make Changes

  • Write code following style guidelines

  • Add tests for new functionality

  • Update documentation

  • Ensure all tests pass

3. Commit Changes

# Stage changes
git add .

# Commit with descriptive message
git commit -m "Add feature: description of changes"

4. Push Branch

git push origin feature/my-new-feature

5. Create Pull Request

  • Provide clear description of changes

  • Reference related issues

  • Include test results

Code Review

What We Look For

  • Correct functionality

  • Adequate test coverage

  • Clear documentation

  • Code style compliance

  • Performance considerations

  • Error handling

Review Process

  1. Automated tests must pass

  2. Code review by maintainer

  3. Address feedback

  4. Final approval and merge

Database Backend Support

Abstracts Explorer supports both SQLite and PostgreSQL backends through SQLAlchemy.

Architecture

Core Components:

  • db_models.py - SQLAlchemy ORM models (Paper, EmbeddingsMetadata)

  • database.py - DatabaseManager with SQLAlchemy session management

  • config.py - Database URL configuration

Configuration:

# SQLite (default)
PAPER_DB=data/abstracts.db

# PostgreSQL
PAPER_DB=postgresql://user:pass@localhost/abstracts
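
Conceptually, config.py turns this setting into a SQLAlchemy URL. The actual logic may differ; this sketch only illustrates the idea:

def to_database_url(paper_db: str) -> str:
    """Illustrative only: values with a scheme are URLs, anything else is a SQLite path."""
    if "://" in paper_db:
        return paper_db  # e.g. postgresql://user:pass@localhost/abstracts
    return f"sqlite:///{paper_db}"  # e.g. sqlite:///data/abstracts.db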

Working with Databases

Using DatabaseManager:

from abstracts_explorer.database import DatabaseManager

# SQLite (legacy)
db = DatabaseManager(db_path="abstracts.db")

# Any backend via URL
db = DatabaseManager(database_url="postgresql://...")

with db:
    db.create_tables()
    # Database operations...

Adding Database Fields:

  1. Update the ORM model in db_models.py (see the sketch after this list)

  2. Create migration if needed (manual for now)

  3. Update DatabaseManager methods if necessary

  4. Add tests for new fields
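
As a rough illustration of step 1 in SQLAlchemy 2.0 style (the venue column is hypothetical, and the real Paper model in db_models.py will differ; match its existing style):

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Paper(Base):
    __tablename__ = "papers"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    venue: Mapped[str | None] = mapped_column(default=None)  # hypothetical new field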

Testing Different Backends:

# SQLite tests (always run)
uv run pytest tests/test_database.py

# PostgreSQL tests (requires server)
export POSTGRES_TEST_URL=postgresql://localhost/test_db
uv run pytest tests/test_multi_database.py

Best Practices:

  • Use SQLAlchemy ORM for new queries

  • Maintain backward compatibility with existing API

  • Test with both SQLite and PostgreSQL when possible

  • Use timezone-aware datetimes; datetime.utcnow() is deprecated as of Python 3.12 (see the sketch below)
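
On the last point, prefer datetime.now(timezone.utc), which yields an aware timestamp:

from datetime import datetime, timezone

created_at = datetime.now(timezone.utc)  # timezone-aware UTC timestamp
print(created_at.isoformat())  # e.g. 2025-01-01T12:00:00+00:00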

Development Guidelines

Adding New Features

  1. Discuss first - Open an issue to discuss major changes

  2. Write tests first - TDD when possible

  3. Document thoroughly - Code and user documentation

  4. Consider backward compatibility - Avoid breaking changes

Fixing Bugs

  1. Add failing test - Reproduce the bug

  2. Fix the bug - Make the test pass

  3. Add regression test - Prevent future recurrence

  4. Document the fix - Update relevant docs

Refactoring

  1. Ensure tests pass - Before starting

  2. Make small changes - Incremental improvements

  3. Run tests frequently - Catch issues early

  4. Update documentation - If interfaces change

Performance

Benchmarking

import time

def benchmark():
    start = time.perf_counter()  # perf_counter is monotonic and high-resolution
    # Code to benchmark
    end = time.perf_counter()
    print(f"Execution time: {end - start:.2f}s")

Profiling

# Profile code
python -m cProfile -o profile.stats script.py

# View results
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative'); p.print_stats(20)"
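
The same can be done from within Python, which is handy when profiling a single function rather than a whole script:

import cProfile
import pstats

def work():
    # Stand-in for the code to profile
    sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)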

Security Considerations

Debug Mode and Production Server

The web UI uses a production-ready WSGI server (Waitress) by default to avoid security risks associated with Flask’s development server.

Key Points:

  • Production mode (default): Uses Waitress WSGI server (secure)

  • Development mode: Use --dev flag to enable Flask development server for easier debugging

  • Debug mode: Use -vv (double verbose) to enable Flask debug mode, which works with either server

  • Debug mode should only be used during development, never in production

Example:

# Production (secure, default)
abstracts-explorer web-ui

# Development with debug mode  
abstracts-explorer web-ui --dev -vv

# Production with debug logging (still secure)
abstracts-explorer web-ui -vv

Security Note: Flask’s debug mode includes an interactive debugger that could allow arbitrary code execution if exposed. The production server (Waitress) is always recommended for deployed applications, even when debugging is needed.
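
For context, running a Flask app under Waitress looks roughly like the snippet below. The project wires this up through its CLI, so this is a generic illustration, not the actual entry point:

from flask import Flask
from waitress import serve

app = Flask(__name__)

@app.route("/")
def index():
    return "ok"

# Unlike app.run(debug=True), Waitress ships no interactive debugger
serve(app, host="127.0.0.1", port=5000)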

Questions?

  • Open an issue for questions

  • Check existing documentation

  • Review test files for examples

Thank you for contributing!