# Contributing

Thank you for your interest in contributing to Abstracts Explorer!

## Development Setup

### 1. Clone Repository

```bash
git clone <repository-url>
cd abstracts
```

### 2. Install uv

If you don't have uv installed yet:

```bash
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

### 3. Install Dependencies

```bash
# Install all dependencies including dev, web, and docs
uv sync --all-extras

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

### 4. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings
```

## Docker Development Setup

For containerized development with Podman or Docker:

**Note:** The Docker image uses pre-built static vendor files (CSS/JS libraries) that are committed to the repository. Node.js is **not required** for production containers; it is only needed for local development if you want to rebuild these vendor files.

### Quick Start

```bash
# Build and start with Podman
podman-compose up -d

# Or with Docker
docker compose up -d

# Access at http://localhost:5000
```

### Running Commands in Container

```bash
# Execute CLI commands
podman-compose exec abstracts-explorer abstracts-explorer download --year 2025

# Run tests
podman-compose exec abstracts-explorer pytest

# Interactive shell
podman-compose exec -it abstracts-explorer /bin/bash
```

### Development Workflow

1. Make code changes locally
2. Rebuild container: `podman-compose build`
3. Restart services: `podman-compose up -d`
4. Test changes in container

See the [Docker Guide](docker.md) for more details.

## Code Style

### Python Style

- Follow PEP 8 style guide
- Use NumPy-style docstrings
- Maximum line length: 88 characters (Black default)
- Use type hints where appropriate

### Example Function

```python
def search_papers(
    query: str,
    limit: int = 10,
    year: int | None = None
) -> list[dict]:
    """
    Search for papers matching the query.

    Parameters
    ----------
    query : str
        Search query string
    limit : int, optional
        Maximum number of results (default: 10)
    year : int or None, optional
        Filter by conference year (default: None)

    Returns
    -------
    list of dict
        List of paper dictionaries matching the query

    Examples
    --------
    >>> results = search_papers("transformer", limit=5)
    >>> print(len(results))
    5
    """
    # Implementation here
    pass
```

## Testing

### Python Testing

#### Running Python Tests

```bash
# Run all tests (excludes slow tests by default)
uv run pytest

# Run with coverage
uv run pytest --cov=src/abstracts_explorer

# Run specific test file
uv run pytest tests/test_database.py

# Run specific test
uv run pytest tests/test_database.py::test_add_paper

# Verbose output
uv run pytest -v

# Show print statements
uv run pytest -s

# Run only slow tests (requires LM Studio)
uv run pytest -m slow

# Run all tests including slow ones
uv run pytest -m ""
```

**Note about slow tests:** Tests requiring LM Studio are marked as `slow` and skipped by default, which keeps development cycles fast. To run the slow tests you need:

- LM Studio running at the configured URL (default: http://localhost:1234)
- A chat model loaded in LM Studio

Then run `uv run pytest -m slow` to execute only the slow tests.
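A slow test might look like this. This is a minimal sketch: only the `slow` marker name comes from the commands above; the test body is illustrative.

```python
import pytest


@pytest.mark.slow
def test_chat_with_lm_studio():
    """Illustrative slow test: requires LM Studio running with a chat model loaded."""
    # Placeholder body; a real slow test would call the LLM backend here and
    # assert on the response. Skipped by default, runs with `uv run pytest -m slow`.
    assert True
```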
#### Python Test Organization

**One test file per module**: Each source module should have exactly one corresponding test file. This makes tests easy to find and maintain.

Examples:

- `src/abstracts_explorer/database.py` → `tests/test_database.py`
- `src/abstracts_explorer/plugin.py` → `tests/test_plugin.py`
- `src/abstracts_explorer/web_ui/app.py` → `tests/test_web_ui.py`

**Shared test code**:

- `tests/conftest.py` - Shared pytest fixtures
- `tests/helpers.py` - Shared helper functions

**Exception for test types**: Different test types may have separate files:

- `test_integration.py` - Cross-module integration tests
- `test_web_integration.py` - Web UI integration tests
- `test_web_e2e.py` - End-to-end browser tests

#### Writing Python Tests

- Use the pytest framework
- Follow the one test file per module principle
- Create unit tests for all new functions
- Use fixtures for common setup (defined in `conftest.py`)
- Mock external dependencies (API calls, LLM backends)
- Aim for >80% code coverage

#### Example Python Test

```python
import pytest

from abstracts_explorer.database import DatabaseManager


@pytest.fixture
def db(tmp_path):
    """Create a temporary test database."""
    db_path = tmp_path / "test.db"
    return DatabaseManager(str(db_path))


def test_add_paper(db):
    """Test adding a paper to the database."""
    paper_data = {
        'openreview_id': 'test123',
        'title': 'Test Paper',
        'abstract': 'Test abstract',
        'year': 2025,
    }

    paper_id = db.add_paper(paper_data)
    assert paper_id is not None

    # Verify paper was added
    paper = db.get_paper_by_id(paper_id)
    assert paper['title'] == 'Test Paper'
```
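The guidelines above call for mocking external dependencies such as LLM backends. A minimal sketch of that pattern is shown below; the `summarize` function and `client.chat` call are illustrative stand-ins, not the project's actual API.

```python
from unittest.mock import Mock


def summarize(abstract: str, client) -> str:
    """Toy stand-in for project code that calls an LLM backend."""
    return client.chat(f"Summarize: {abstract}")


def test_summarize_with_mocked_backend():
    """The LLM client is mocked, so the test never contacts LM Studio."""
    client = Mock()
    client.chat.return_value = "A short summary."

    result = summarize("Some abstract text.", client)

    assert result == "A short summary."
    client.chat.assert_called_once_with("Summarize: Some abstract text.")
```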
### JavaScript Testing

The web UI uses Jest for JavaScript unit testing with jsdom for DOM simulation.

#### Running JavaScript Tests

```bash
# Install Node.js dependencies (first time only)
npm install

# Run all JavaScript tests
npm test

# Run with coverage report
npm run test:coverage

# Run tests in watch mode (for development)
npm run test:watch

# Run specific test file
npm test -- chat.test.js

# Run with verbose output
npm test -- --verbose
```

#### JavaScript Test Coverage

Current JavaScript test coverage (excluding vendor files):

- **Overall**: ~86% line coverage
- **Target**: >90% coverage for all modules

Coverage by module:

- **Utility modules**: 100% coverage (api-utils, cluster-utils, constants, dom-utils, sort-utils)
- **Core modules**: 70-100% coverage (state, search, chat, tabs, clustering, filters)
- **Integration modules**: Tested via end-to-end browser tests

#### JavaScript Test Organization

Test files are located in `src/abstracts_explorer/web_ui/tests/`:

- `setup.js` - Jest configuration and global mocks
- `app.test.js` - Tests for main app initialization
- `chat.test.js` - Chat module tests
- `clustering.test.js` - Clustering visualization tests
- `clustering-hierarchy.test.js` - Hierarchical clustering tests
- `filters.test.js` - Filter panel tests
- `interesting-papers.test.js` - Interesting papers management tests
- `modules.test.js` - Module loading tests
- `paper-card.test.js` - Paper card component tests
- `search.test.js` - Search functionality tests
- `state.test.js` - State management tests
- `tabs.test.js` - Tab navigation tests
- `utils.test.js` - Utility function tests

#### Writing JavaScript Tests

**Test Structure:**

```javascript
import { jest, expect, describe, test, beforeEach } from '@jest/globals';
import { myFunction } from '../static/modules/my-module.js';

describe('My Module', () => {
  beforeEach(() => {
    // Setup DOM
    document.body.innerHTML = `
      <!-- minimal DOM structure needed by the module under test -->
    `;

    // Mock fetch
    global.fetch = jest.fn();

    // Reset mocks
    jest.clearAllMocks();
  });

  test('should perform expected behavior', () => {
    // Arrange
    const input = 'test input';

    // Act
    const result = myFunction(input);

    // Assert
    expect(result).toBe('expected output');
  });
});
```

**Best Practices:**

1. **Mock external dependencies** (see the sketch after this list):
   - Use `global.fetch = jest.fn()` to mock API calls
   - Mock Plotly for visualization tests
   - Mock localStorage for state tests

2. **Setup DOM elements:**
   - Create minimal DOM structure needed for tests
   - Use `document.body.innerHTML` in `beforeEach()`
   - Clean up with `jest.clearAllMocks()`

3. **Test behavior, not implementation:**
   - Focus on what the function does, not how
   - Test user-facing behavior
   - Verify DOM changes and state updates

4. **Async testing:**

   ```javascript
   test('should load data', async () => {
     global.fetch.mockResolvedValueOnce({
       json: async () => ({ data: 'test' })
     });

     await loadData();

     expect(fetch).toHaveBeenCalled();
   });
   ```

5. **Event handling:**

   ```javascript
   test('should handle click event', () => {
     const button = document.getElementById('my-button');
     const mockHandler = jest.fn();

     button.addEventListener('click', mockHandler);
     button.click();

     expect(mockHandler).toHaveBeenCalled();
   });
   ```
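One way to mock Plotly and localStorage in `setup.js` or a `beforeEach()` is sketched below. This is a rough example under stated assumptions: only stub the methods the module under test actually calls, and the stubbed method names here are illustrative.

```javascript
import { jest, beforeEach } from '@jest/globals';

beforeEach(() => {
  // Minimal Plotly stub: only the calls the module under test makes.
  global.Plotly = {
    newPlot: jest.fn(),
    react: jest.fn(),
    purge: jest.fn(),
  };

  // Simple in-memory localStorage replacement for state tests.
  const store = new Map();
  Object.defineProperty(window, 'localStorage', {
    writable: true,
    value: {
      getItem: jest.fn((key) => (store.has(key) ? store.get(key) : null)),
      setItem: jest.fn((key, value) => store.set(key, String(value))),
      removeItem: jest.fn((key) => store.delete(key)),
      clear: jest.fn(() => store.clear()),
    },
  });
});
```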
#### Example JavaScript Test

```javascript
import { jest, expect, describe, test, beforeEach } from '@jest/globals';
import { searchPapers } from '../static/modules/search.js';

describe('Search Module', () => {
  beforeEach(() => {
    // Minimal DOM; the results container is what this test asserts on
    document.body.innerHTML = `
      <div id="search-results"></div>
    `;

    global.fetch = jest.fn();
    jest.clearAllMocks();
  });

  test('should search for papers', async () => {
    global.fetch.mockResolvedValueOnce({
      json: async () => ({
        papers: [
          { id: 1, title: 'Test Paper' }
        ],
        count: 1
      })
    });

    await searchPapers();

    expect(fetch).toHaveBeenCalledWith(
      expect.stringContaining('/api/search'),
      expect.any(Object)
    );

    const results = document.getElementById('search-results');
    expect(results.innerHTML).toContain('Test Paper');
  });
});
```

#### Viewing Coverage Reports

After running `npm run test:coverage`, view the detailed HTML report:

```bash
# Coverage report is generated in ./coverage directory
open coverage/index.html
```

The report shows:

- Line-by-line coverage highlighting
- Branch coverage details
- Function coverage
- Uncovered code sections

## Documentation

### Docstrings

All public functions, classes, and methods must have docstrings:

```python
def my_function(param1: str, param2: int = 10) -> bool:
    """
    Brief description of the function.

    More detailed description if needed.

    Parameters
    ----------
    param1 : str
        Description of param1
    param2 : int, optional
        Description of param2 (default: 10)

    Returns
    -------
    bool
        Description of return value

    Raises
    ------
    ValueError
        When param1 is empty
    RuntimeError
        When operation fails

    Examples
    --------
    >>> my_function("test")
    True

    Notes
    -----
    Additional notes about the function.

    See Also
    --------
    related_function : Related functionality
    """
    pass
```

### Building Documentation

```bash
# Build HTML documentation
cd docs
uv run make html

# View documentation
open _build/html/index.html

# Clean build
make clean
```

### Updating Documentation

1. Update docstrings in source code
2. Update Markdown files in `docs/`
3. Rebuild documentation
4. Review changes in browser

## Pull Request Process

### 1. Create Branch

```bash
git checkout -b feature/my-new-feature
```

### 2. Make Changes

- Write code following style guidelines
- Add tests for new functionality
- Update documentation
- Ensure all tests pass

### 3. Commit Changes

```bash
# Stage changes
git add .

# Commit with descriptive message
git commit -m "Add feature: description of changes"
```

### 4. Push Branch

```bash
git push origin feature/my-new-feature
```

### 5. Create Pull Request

- Provide clear description of changes
- Reference related issues
- Include test results

## Code Review

### What We Look For

- Correct functionality
- Adequate test coverage
- Clear documentation
- Code style compliance
- Performance considerations
- Error handling

### Review Process

1. Automated tests must pass
2. Code review by maintainer
3. Address feedback
4. Final approval and merge

## Database Backend Support

Abstracts Explorer supports both SQLite and PostgreSQL backends through SQLAlchemy.

### Architecture

**Core Components:**

- `db_models.py` - SQLAlchemy ORM models (Paper, EmbeddingsMetadata)
- `database.py` - DatabaseManager with SQLAlchemy session management
- `config.py` - Database URL configuration

**Configuration:**

```bash
# SQLite (default)
PAPER_DB=data/abstracts.db

# PostgreSQL
PAPER_DB=postgresql://user:pass@localhost/abstracts
```

### Working with Databases

**Using DatabaseManager:**

```python
from abstracts_explorer.database import DatabaseManager

# SQLite (legacy)
db = DatabaseManager(db_path="abstracts.db")

# Any backend via URL
db = DatabaseManager(database_url="postgresql://...")

with db:
    db.create_tables()
    # Database operations...
```
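For orientation, the models in `db_models.py` follow standard SQLAlchemy declarative style. The sketch below shows roughly what a `Paper` model might look like; the column names are illustrative (taken from the fields used in the Python test example above), and the real model may define more columns and different types.

```python
from sqlalchemy import Integer, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    """Shared declarative base for all ORM models."""


class Paper(Base):
    """Illustrative sketch only; see db_models.py for the actual model."""

    __tablename__ = "papers"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    openreview_id: Mapped[str] = mapped_column(String, unique=True)
    title: Mapped[str] = mapped_column(String)
    abstract: Mapped[str] = mapped_column(Text)
    year: Mapped[int] = mapped_column(Integer)
```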
**Adding Database Fields:**

1. Update the ORM model in `db_models.py`
2. Create a migration if needed (manual for now)
3. Update `DatabaseManager` methods if necessary
4. Add tests for the new fields

**Testing Different Backends:**

```bash
# SQLite tests (always run)
uv run pytest tests/test_database.py

# PostgreSQL tests (requires server)
export POSTGRES_TEST_URL=postgresql://localhost/test_db
uv run pytest tests/test_multi_database.py
```

**Best Practices:**

- Use SQLAlchemy ORM for new queries
- Maintain backward compatibility with the existing API
- Test with both SQLite and PostgreSQL when possible
- Use timezone-aware datetime (Python 3.12+)

## Development Guidelines

### Adding New Features

1. **Discuss first** - Open an issue to discuss major changes
2. **Write tests first** - TDD when possible
3. **Document thoroughly** - Code and user documentation
4. **Consider backward compatibility** - Avoid breaking changes

### Fixing Bugs

1. **Add failing test** - Reproduce the bug
2. **Fix the bug** - Make the test pass
3. **Add regression test** - Prevent future recurrence
4. **Document the fix** - Update relevant docs

### Refactoring

1. **Ensure tests pass** - Before starting
2. **Make small changes** - Incremental improvements
3. **Run tests frequently** - Catch issues early
4. **Update documentation** - If interfaces change

## Performance

### Benchmarking

```python
import time


def benchmark():
    # perf_counter is preferred over time.time for measuring elapsed time
    start = time.perf_counter()
    # Code to benchmark
    end = time.perf_counter()
    print(f"Execution time: {end - start:.2f}s")
```

### Profiling

```bash
# Profile code
python -m cProfile -o profile.stats script.py

# View results
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative'); p.print_stats(20)"
```

## Security Considerations

### Debug Mode and Production Server

The web UI uses a production-ready WSGI server (Waitress) by default to avoid the security risks associated with Flask's development server.

**Key Points:**

- **Production mode (default)**: Uses the Waitress WSGI server (secure)
- **Development mode**: Use the `--dev` flag to enable the Flask development server for easier debugging
- **Debug mode**: Use `-vv` (double verbose) to enable Flask debug mode, which works with either server
- Debug mode should only be used during development, never in production

**Example:**

```bash
# Production (secure, default)
abstracts-explorer web-ui

# Development with debug mode
abstracts-explorer web-ui --dev -vv

# Production with debug logging (still secure)
abstracts-explorer web-ui -vv
```

**Security Note:** Flask's debug mode includes an interactive debugger that could allow arbitrary code execution if exposed. The production server (Waitress) is always recommended for deployed applications, even when debugging is needed.

## Questions?

- Open an issue for questions
- Check existing documentation
- Review test files for examples

Thank you for contributing!