Config Module

The config module provides configuration management for Abstracts Explorer.

Overview

The configuration system supports:

Environment variables
.env file loading
Type conversion (str, int, float, bool)
Singleton pattern for global configuration
Priority-based configuration loading

Class Reference

Configuration management for the Abstracts Explorer package.

This module loads configuration from environment variables and .env files. It uses only the standard library, so python-dotenv is not required.

abstracts_explorer.config.load_env_file(env_path=None)[source]

Load environment variables from a .env file.

Uses a simple parser that handles basic .env file format without requiring external dependencies.

Parameters: env_path (Path, optional) – Path to .env file. If None, looks for .env in the current directory and parent directories up to the package root.
Returns: Dictionary of environment variables loaded from file.
Return type: dict

Examples

>>> env_vars = load_env_file(Path(".env"))
>>> print(env_vars.get("CHAT_MODEL"))
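
The upward search described for env_path=None can be pictured with a short sketch. This is not the package's code: find_env_file and its stop_at parameter are hypothetical names, and stop_at only stands in for the "package root" boundary mentioned above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, filename: str = ".env",
                  stop_at: Optional[Path] = None) -> Optional[Path]:
    """Return the first `filename` found in `start` or one of its parents."""
    start = start.resolve()
    boundary = stop_at.resolve() if stop_at is not None else None
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
        # Do not climb above the boundary directory (assumed "package root")
        if boundary is not None and directory == boundary:
            break
    return None


# Demo in a throwaway directory tree: <tmp>/.env and <tmp>/a/b
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".env").write_text("CHAT_MODEL=demo\n")
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    found_in_root = find_env_file(nested) == root / ".env"
    stopped_early = find_env_file(nested, stop_at=root / "a") is None
```

Without stop_at the sketch walks all the way to the filesystem root, which is broader than the documented behavior.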

class abstracts_explorer.config.Config(env_path=None)[source]

Bases: object

Configuration manager for the Abstracts Explorer package.

Loads configuration from environment variables with fallback to defaults. Automatically loads from .env file if present.

data_dir

Base directory for data files (databases, embeddings).

Type: str

chat_model

Name of the language model for chat/RAG.

Type: str

embedding_model

Name of the embedding model.

Type: str

llm_backend_url

URL for OpenAI-compatible API endpoint.

Type: str

llm_backend_auth_token

Authentication token for LLM backend (if required).

Type: str

embedding_db

ChromaDB configuration - can be either a URL (e.g., “http://chromadb:8000”) or a file path (e.g., “chroma_db” or “/path/to/chroma_db”).

Type: str

paper_db_path

Path to SQLite paper database.

Type: str

collection_name

Name of the ChromaDB collection.

Type: str

max_context_papers

Default number of papers for RAG context.

Type: int

chat_temperature

Default temperature for chat generation.

Type: float

chat_max_tokens

Default max tokens for chat responses.

Type: int

enable_query_rewriting

Whether to enable query rewriting for better semantic search.

Type: bool

query_similarity_threshold

Similarity threshold for determining when to retrieve new papers (0.0-1.0).

Type: float

database_url

SQLAlchemy database URL (supports SQLite, PostgreSQL, etc.). Automatically constructed from PAPER_DB config variable.

Type: str

log_level

Logging level from environment (WARNING, INFO, DEBUG). Empty string if not set. Used by setup_logging() to set the default log level when verbosity flags are not used.

Type: str

Examples

>>> config = Config()
>>> print(config.chat_model)
diffbot-small-xl-2508
>>> config.llm_backend_url
'http://localhost:1234'
>>> # Using DATABASE_URL for PostgreSQL
>>> config.database_url
'postgresql://user:password@localhost/abstracts'

__init__(env_path=None)[source]

Initialize configuration.

Parameters: env_path (Path, optional) – Path to .env file. If None, searches for .env automatically.

to_dict()[source]

Convert configuration to dictionary.

Returns: Dictionary of all configuration values.
Return type: dict

Examples

>>> config = Config()
>>> config_dict = config.to_dict()
>>> print(config_dict["chat_model"])

__repr__()[source]

String representation of configuration.

Return type: str

abstracts_explorer.config.get_config(reload=False, env_path=None)[source]

Get global configuration instance.

Parameters:

reload (bool, optional) – Force reload configuration from environment, by default False
env_path (Path, optional) – Path to .env file. If provided, loads configuration from this file. Useful for testing to ensure consistent configuration.

Returns: Global configuration instance.
Return type: Config

Examples

>>> config = get_config()
>>> print(config.chat_model)
>>> # In tests, use .env.tests for consistent values
>>> config = get_config(reload=True, env_path=Path(".env.tests"))
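
The cached-instance behavior behind get_config() can be illustrated with a minimal sketch of a module-level singleton with a reload switch. DemoConfig and get_demo_config are hypothetical stand-ins, not the package's implementation, and the real function may handle env_path differently.

```python
from typing import Optional


class DemoConfig:
    """Hypothetical stand-in for Config; it only records its env_path."""

    def __init__(self, env_path: Optional[str] = None):
        self.env_path = env_path


_instance: Optional[DemoConfig] = None


def get_demo_config(reload: bool = False,
                    env_path: Optional[str] = None) -> DemoConfig:
    """Return the cached instance, rebuilding it on first use or on reload."""
    global _instance
    if _instance is None or reload:
        _instance = DemoConfig(env_path)
    return _instance


first = get_demo_config()
second = get_demo_config()              # same cached object as `first`
third = get_demo_config(reload=True)    # a fresh object replaces the cache
```

Under this pattern, changes made to the environment after the first call are only visible once the instance is rebuilt, which is why tests that alter settings typically pass reload=True.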

Usage Examples

Getting Configuration

from abstracts_explorer.config import get_config

# Get singleton instance
config = get_config()

# Access configuration values
print(f"Chat model: {config.chat_model}")
print(f"Backend URL: {config.llm_backend_url}")
print(f"Database URL: {config.database_url}")

Configuration Values

# Chat/LLM settings
config.chat_model                   # str
config.chat_temperature             # float
config.chat_max_tokens              # int
config.llm_backend_url              # str
config.llm_backend_auth_token       # str

# Embedding settings
config.embedding_model              # str
config.embedding_db                 # str (ChromaDB URL or local path)
config.collection_name              # str

# Database settings
config.database_url                 # str (SQLAlchemy-compatible URL)

# RAG settings
config.max_context_papers           # int
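
The class reference describes the ChromaDB setting as either a URL or a file path. One way a caller might tell the two apart is sketched below; is_remote_chroma is a hypothetical helper, and the rule it applies (an http or https scheme means remote) is an assumption rather than the package's documented logic.

```python
from urllib.parse import urlparse


def is_remote_chroma(embedding_db: str) -> bool:
    """Return True if the value looks like an http(s) URL, else treat it as a path."""
    return urlparse(embedding_db).scheme in ("http", "https")


remote = is_remote_chroma("http://chromadb:8000")
relative_path = is_remote_chroma("chroma_db")
absolute_path = is_remote_chroma("/path/to/chroma_db")
```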

Custom .env File

import os
from pathlib import Path

from abstracts_explorer.config import load_env_file

# Load from a specific file (env_path is documented as a Path)
env_vars = load_env_file(Path("/path/to/custom.env"))

# Copy the loaded values into the process environment
for key, value in env_vars.items():
    os.environ[key] = value

Configuration Priority

Settings are loaded in the following order, and each source overrides the ones before it:

1. Built-in defaults - Hardcoded in Config class
2. .env file - In the current directory or a parent directory
3. Environment variables - System environment
4. CLI arguments - Command-line overrides (when applicable)

Example Priority

# 1. Default in code
chat_model = "gemma-3-4b-it-qat"

# 2. .env file (overrides default)
CHAT_MODEL=llama-3.2-3b-instruct

# 3. Environment variable (overrides .env)
export CHAT_MODEL=diffbot-small-xl-2508

# 4. CLI argument (overrides all)
abstracts-explorer chat --model custom-model
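
The layering can also be modeled in a few lines of Python. This is only an illustration of the lookup order using collections.ChainMap, not how the Config class is implemented; the four dictionaries are made-up stand-ins for the four sources.

```python
from collections import ChainMap

# Stand-ins for the four sources, lowest priority first
defaults = {"CHAT_MODEL": "gemma-3-4b-it-qat", "CHAT_TEMPERATURE": "0.7"}
env_file = {"CHAT_MODEL": "llama-3.2-3b-instruct"}
environment = {"CHAT_MODEL": "diffbot-small-xl-2508"}
cli_args = {}  # no --model flag was given in this run

# ChainMap returns the value from the first mapping that contains the key,
# so the highest-priority source is listed first.
settings = ChainMap(cli_args, environment, env_file, defaults)

print(settings["CHAT_MODEL"])        # diffbot-small-xl-2508
print(settings["CHAT_TEMPERATURE"])  # 0.7
```

Because no source above the defaults sets CHAT_TEMPERATURE, the lookup falls through to the built-in value.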

.env File Format

The .env file uses simple KEY=VALUE format:

# Comments start with #
CHAT_MODEL=gemma-3-4b-it-qat

# Quotes are optional
LLM_BACKEND_URL=http://localhost:1234

# Empty values allowed
LLM_BACKEND_AUTH_TOKEN=

# No spaces around =
CHAT_TEMPERATURE=0.7

Supported Features

Comments (#)
Empty lines (ignored)
Quoted values ("value" or 'value')
Unquoted values
Empty values

Not Supported

Variable expansion ($VAR)
Multi-line values
Export statements (export VAR=value)
Inline comments (VAR=value # comment)
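
A parser with this feature set fits in a few lines of standard-library Python. The sketch below is an approximation written from the lists above, not the body of load_env_file; parse_env_text is a hypothetical name, and its handling of malformed lines (silently skipped) is an assumption.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping comments and empty lines."""
    result: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip empty lines and full-line comments
        if not line or line.startswith("#"):
            continue
        # Skip lines that have no KEY=VALUE shape (assumed behavior)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key] = value
    return result


sample = """
# Comments start with #
CHAT_MODEL=gemma-3-4b-it-qat
LLM_BACKEND_URL="http://localhost:1234"
LLM_BACKEND_AUTH_TOKEN=
"""
parsed = parse_env_text(sample)
```

In this sketch an inline comment is not stripped: VAR=value # comment yields the literal value "value # comment", which matches the unsupported-features list.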

Type Conversion

Values read from the environment or a .env file are always strings. The Config class converts them to the type of each attribute:

# String values
config.chat_model          # str: "gemma-3-4b-it-qat"
config.llm_backend_url     # str: "http://localhost:1234"

# Integer values
config.chat_max_tokens     # int: 1000
config.max_context_papers  # int: 5

# Float values
config.chat_temperature    # float: 0.7
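
Since environment values arrive as strings, the conversion step might look like the following sketch. The helper names, the ENABLE_QUERY_REWRITING variable name, and the accepted boolean spellings are all assumptions made for illustration and are not taken from the package.

```python
import os
from typing import Mapping

# Assumed spellings for "true"; the package may accept a different set
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_int(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting, falling back to the default if unset or empty."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def env_float(name: str, default: float, env: Mapping[str, str] = os.environ) -> float:
    """Read a float setting, falling back to the default if unset or empty."""
    raw = env.get(name, "").strip()
    return float(raw) if raw else default


def env_bool(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean setting, falling back to the default if unset or empty."""
    raw = env.get(name, "").strip().lower()
    return raw in TRUE_VALUES if raw else default


# A plain dict stands in for os.environ so the demo has no side effects
env = {"CHAT_MAX_TOKENS": "2000", "CHAT_TEMPERATURE": "0.2",
       "ENABLE_QUERY_REWRITING": "false"}
max_tokens = env_int("CHAT_MAX_TOKENS", 1000, env)
temperature = env_float("CHAT_TEMPERATURE", 0.7, env)
rewriting = env_bool("ENABLE_QUERY_REWRITING", True, env)
papers = env_int("MAX_CONTEXT_PAPERS", 5, env)  # unset, so the default applies
```

A value that cannot be parsed, such as CHAT_MAX_TOKENS=many, raises ValueError in this sketch; whether the real class raises or falls back to the default is not stated here.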

Default Values

Default values when not configured:

DATA_DIR = "data"

CHAT_MODEL = "diffbot-small-xl-2508"
CHAT_TEMPERATURE = 0.7
CHAT_MAX_TOKENS = 1000

EMBEDDING_MODEL = "text-embedding-qwen3-embedding-4b"
EMBEDDING_DB_PATH = "chroma_db"
EMBEDDING_DB_URL = ""

LLM_BACKEND_URL = "http://localhost:1234"
LLM_BACKEND_AUTH_TOKEN = ""

PAPER_DB = "abstracts.db"  # Converted to database_url internally
COLLECTION_NAME = "papers"
MAX_CONTEXT_PAPERS = 5
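
The comment on PAPER_DB says the value is converted to database_url internally. A plausible form of that conversion, following SQLAlchemy's sqlite:/// URL convention, is sketched below. to_database_url is a hypothetical helper; whether the package also joins the path with DATA_DIR is not covered and would be an additional assumption.

```python
def to_database_url(paper_db: str) -> str:
    """Turn a SQLite file path into a SQLAlchemy URL; pass full URLs through."""
    if "://" in paper_db:
        # Already a URL, for example a PostgreSQL connection string
        return paper_db
    # SQLAlchemy uses sqlite:/// followed by the path, so an absolute
    # POSIX path ends up with four slashes in total.
    return f"sqlite:///{paper_db}"


relative_url = to_database_url("abstracts.db")
absolute_url = to_database_url("/data/abstracts.db")
postgres_url = to_database_url("postgresql://user:password@localhost/abstracts")
```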

Configuration in Tests

Tests read configuration from the environment:

from abstracts_explorer.config import get_config

def test_with_config():
    config = get_config()

    # Tests use configured values
    assert config.chat_model is not None
    assert config.llm_backend_url is not None

Overriding in Tests

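import os

import pytest

from abstracts_explorer.config import get_config

@pytest.fixture
def custom_config():
    # Save the original value (None if the variable is unset)
    original = os.environ.get('CHAT_MODEL')

    # Override
    os.environ['CHAT_MODEL'] = 'test-model'

    yield

    # Restore; compare against None so an empty original value is preserved
    if original is not None:
        os.environ['CHAT_MODEL'] = original
    else:
        del os.environ['CHAT_MODEL']

def test_with_custom_config(custom_config):
    # reload=True rebuilds the global instance so the override is picked up
    config = get_config(reload=True)
    assert config.chat_model == 'test-model'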
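
The fixture above saves and restores the variable by hand. The standard library's unittest.mock.patch.dict does the same bookkeeping automatically, as the sketch below shows. It deliberately touches only os.environ so that it runs without the package installed; in a real test, the get_config call would go inside the with block.

```python
import os
from unittest import mock

before = os.environ.get("CHAT_MODEL")

with mock.patch.dict(os.environ, {"CHAT_MODEL": "test-model"}):
    # The override is visible only inside this block. A test would read
    # the configuration here, after forcing a reload of the global instance.
    inside = os.environ["CHAT_MODEL"]

# On exit, patch.dict restores os.environ to its previous contents,
# including removing the key if it was not set before.
after = os.environ.get("CHAT_MODEL")
```

pytest's built-in monkeypatch fixture offers an equivalent monkeypatch.setenv call for projects that prefer to stay within pytest.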

Security Best Practices

Do Not Commit Secrets

# .gitignore
.env
*.env
!.env.example

Use Environment Variables in Production

# Production environment
export LLM_BACKEND_AUTH_TOKEN="secret-token"
export LLM_BACKEND_URL="https://production-api.example.com"

Provide Template

# .env.example (commit this)
CHAT_MODEL=gemma-3-4b-it-qat
LLM_BACKEND_URL=http://localhost:1234
LLM_BACKEND_AUTH_TOKEN=

# Users copy and customize
cp .env.example .env

Best Practices

Use .env for development - Easy local configuration
Use environment variables in production - Secure and flexible
Document all settings - Keep .env.example up to date
Validate configuration - Check required settings exist
Use defaults wisely - Provide sensible defaults
Don’t commit secrets - Use .gitignore properly
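
The "validate configuration" practice can be as small as a check run at startup. The sketch below works on a plain dictionary of the kind to_dict() is documented to return; missing_settings and the choice of required keys are hypothetical, and the sample values are made up.

```python
from typing import Dict, Iterable, List


def missing_settings(values: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty in `values`."""
    return [key for key in required if values.get(key) in (None, "")]


# Stand-in for the dictionary returned by Config.to_dict()
values = {
    "chat_model": "gemma-3-4b-it-qat",
    "llm_backend_url": "",
    "chat_max_tokens": 1000,
}
missing = missing_settings(values, ["chat_model", "llm_backend_url", "embedding_model"])
print(missing)  # ['llm_backend_url', 'embedding_model']
```

An application could raise an error listing the missing names when the result is non-empty, so misconfiguration surfaces before the first request to the backend.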