Plugin Module

The plugin module provides the framework for extending Abstracts Explorer with custom conference data downloaders.

Overview

The framework includes:

Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)
Schema conversion utilities for standardizing paper data
Plugin registry for managing and discovering plugins
Data validation via the LightweightPaper Pydantic model

Available Plugins

Built-in plugins for downloading conference data:

Plugin	Conference	Years	Schema
`neurips`	NeurIPS	2020–2025	Full Schema
`iclr`	ICLR	2024–2026	Lightweight
`icml`	ICML	2024–2025	Lightweight
`ml4ps`	ML4PS Workshop	2025	Lightweight

Quick Start

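from abstracts_explorer.plugin import (
    get_plugin,
    list_plugins,
    LightweightPaper,
)

# List available plugins
for plugin_info in list_plugins():
    print(f"{plugin_info['name']}: {plugin_info['description']}")

# Download papers using a plugin; get_plugin returns None for unknown names
plugin = get_plugin("neurips")
if plugin is None:
    raise ValueError("The 'neurips' plugin is not registered")

papers: list[LightweightPaper] = plugin.download(year=2025)
print(f"Downloaded {len(papers)} papers")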

Creating a Custom Plugin

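from abstracts_explorer.plugin import (
    LightweightDownloaderPlugin,
    register_plugin,
    validate_lightweight_papers,
)

class MyConferencePlugin(LightweightDownloaderPlugin):
    plugin_name = "myconf"
    plugin_description = "My Conference Downloader"
    supported_years = [2025]

    def download(self, year=None, output_path=None, force_download=False, **kwargs):
        self.validate_year(year)
        year = year or self.supported_years[-1]  # default to the latest supported year

        # Fetch raw records from your data source here; this literal stands in for them
        raw_papers = [
            {
                "title": "An Example Paper",
                "authors": ["Jane Smith", "John Doe"],
                "abstract": "A short abstract.",
                "session": "Poster Session 1",
                "poster_position": "101",
                "year": year,
                "conference": "MyConf",
            },
        ]

        # Validate against the lightweight schema; invalid records are logged and skipped
        return validate_lightweight_papers(raw_papers)

    def get_metadata(self):
        return {
            "name": self.plugin_name,
            "description": self.plugin_description,
            "supported_years": self.supported_years,
        }

register_plugin(MyConferencePlugin())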

See the Plugins Guide for detailed instructions.

API Reference

Plugin Framework

This module provides the plugin framework for extending Abstracts Explorer with custom data downloaders.

The framework consists of:

Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)
Schema conversion utilities (convert_to_lightweight_schema)
Plugin registry for managing plugins (PluginRegistry)
Pydantic models for data validation (LightweightPaper)

class abstracts_explorer.plugin.DownloaderPlugin

Bases: ABC

Base class for all downloader plugins.

Each plugin must implement the download method and provide metadata about its capabilities.

Subclasses should set _start_year and override get_url(year); supported_years is then computed automatically, including a check of whether the current year's data is already available.

plugin_name: str = 'base'

plugin_description: str = 'Base downloader plugin'

property supported_years: List[int]

Dynamically computed supported years.

Builds the range [_start_year, current_year) and appends the current year when its data URL is already accessible (checked via a HEAD request to get_url(current_year)).

Returns: Supported conference years.
Return type: list of int
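The year computation described above can be sketched as a standalone function. This is a hedged illustration, not the library's code: compute_supported_years, url_is_available and the is_available parameter are hypothetical names, the 10-second timeout is an assumption, and any HTTP or network error is assumed to mean "not yet available".

```python
import datetime
import urllib.request
from typing import Callable, List, Optional


def url_is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` completes without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # Network failures, timeouts and HTTP errors are OSError subclasses;
        # ValueError covers URLs without a recognised scheme
        return False


def compute_supported_years(
    start_year: int,
    get_url: Callable[[int], str],
    current_year: Optional[int] = None,
    is_available: Callable[[str], bool] = url_is_available,
) -> List[int]:
    """Hypothetical re-implementation of DownloaderPlugin.supported_years."""
    if current_year is None:
        current_year = datetime.date.today().year
    # Half-open range: every year from start_year up to, but excluding, current_year
    years = list(range(start_year, current_year))
    # Append the current year only if its data URL already responds
    if is_available(get_url(current_year)):
        years.append(current_year)
    return years
```

Injecting a stub is_available keeps the logic testable without network access; in a real plugin, get_url would be the subclass's override.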

get_url(year)

Get the data URL for a specific year.

Override in subclasses to enable automatic current-year availability checking in supported_years.

Parameters: year (int) – Conference year
Returns: URL used for downloading or probing availability.
Return type: str
Raises: NotImplementedError – If the subclass does not override this method.

abstractmethod download(year=None, output_path=None, force_download=False, **kwargs)

Download papers from the data source.

Parameters:

year (int, optional) – Year to download papers for (if applicable)
output_path (str, optional) – Path to save the downloaded data
force_download (bool) – Force re-download even if data exists
**kwargs (Any) – Additional plugin-specific parameters

Returns:

List of validated paper objects ready for database insertion

Return type:

list of LightweightPaper

abstractmethod get_metadata()

Get plugin metadata.

Returns: Plugin metadata including name, description, supported years, etc.
Return type: dict

validate_year(year)

Validate that the requested year is supported.

Parameters: year (int or None) – Year to validate
Raises: ValueError – If year is not supported by this plugin
Return type: None

class abstracts_explorer.plugin.LightweightDownloaderPlugin

Bases: DownloaderPlugin

Lightweight base class for downloader plugins using simplified schema.

This plugin type uses a simpler data format that only requires essential fields, making it easier to implement new plugins. The data is automatically converted to the full NeurIPS schema when loaded into the database.

Required fields per paper:

title (str): Paper title
authors (list): List of author names (strings) or author dicts with ‘fullname’
abstract (str): Paper abstract
session (str): Session/workshop/track name
poster_position (str): Poster position identifier
year (int): Conference year (e.g., 2025)
conference (str): Conference name (e.g., “NeurIPS”, “ICLR”)

Optional fields per paper:

paper_pdf_url (str): URL to paper PDF
poster_image_url (str): URL to poster image
url (str): General URL (e.g., OpenReview, ArXiv)
room_name (str): Room name for presentation
keywords (list): List of keywords/tags
starttime (str): Start time (ISO format or readable string)
endtime (str): End time (ISO format or readable string)
id (int): Paper ID (auto-generated if not provided)
award (str): Award name (e.g., “Best Paper Award”)
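A paper record that satisfies the field lists above might look like the following. The helpers missing_required_fields and author_names are hypothetical and only illustrate the contract; the library performs the real checks through LightweightPaper. Normalizing author dicts to their 'fullname' is assumed from the authors description above.

```python
from typing import Any, Dict, List

# Required keys for each paper record, as listed above
REQUIRED_FIELDS = (
    "title", "authors", "abstract", "session",
    "poster_position", "year", "conference",
)


def missing_required_fields(paper: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from a paper record."""
    return [field for field in REQUIRED_FIELDS if field not in paper]


def author_names(authors: List[Any]) -> List[str]:
    """Accept author names as strings or as dicts carrying a 'fullname' key."""
    return [a["fullname"] if isinstance(a, dict) else a for a in authors]


paper = {
    "title": "Example Paper",
    "authors": ["Jane Smith", {"fullname": "John Doe"}],
    "abstract": "A short abstract.",
    "session": "Poster Session 1",
    "poster_position": "101",
    "year": 2025,
    "conference": "ICLR",
    # Optional fields may be omitted entirely; this one is included for illustration
    "url": "https://example.com/paper",
}
```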

plugin_name: str = 'lightweight_base'

plugin_description: str = 'Lightweight base downloader plugin'

class abstracts_explorer.plugin.PluginRegistry

Bases: object

Registry for managing downloader plugins.

__init__()

register(plugin)

Register a new plugin.

Parameters: plugin (DownloaderPlugin) – Plugin instance to register
Return type: None

unregister(plugin_name)

Unregister a plugin.

Parameters: plugin_name (str) – Name of plugin to unregister
Return type: None

get(plugin_name)

Get a plugin by name.

Parameters: plugin_name (str) – Name of plugin to retrieve
Returns: Plugin instance or None if not found
Return type: DownloaderPlugin or None

list_plugins()

List all registered plugins with their metadata.

Returns: List of plugin metadata dictionaries
Return type: list

list_plugin_names()

List names of all registered plugins.

Returns: List of plugin names
Return type: list

abstracts_explorer.plugin.register_plugin(plugin)

Register a plugin with the global registry.

Parameters: plugin (DownloaderPlugin) – Plugin instance to register
Return type: None

abstracts_explorer.plugin.get_plugin(plugin_name)

Get a plugin from the global registry.

Parameters: plugin_name (str) – Name of plugin to retrieve
Returns: Plugin instance or None if not found
Return type: DownloaderPlugin or None

abstracts_explorer.plugin.list_plugins()

List all registered plugins.

Returns: List of plugin metadata dictionaries
Return type: list

abstracts_explorer.plugin.list_plugin_names()

List names of all registered plugins.

Returns: List of plugin names
Return type: list

abstracts_explorer.plugin.convert_to_lightweight_schema(papers)

Convert papers from the full NeurIPS schema to the lightweight format.

This function keeps only the fields needed by the lightweight schema, making papers in the full NeurIPS format easier to work with.

Parameters: papers (list) – List of papers in full NeurIPS format, with fields such as:

id (int)
title or name (str): uses ‘title’, falling back to ‘name’
authors (list of dict with ‘fullname’, or list of str)
abstract (str)
session (str)
poster_position (str)
paper_pdf_url (str, optional)
poster_image_url (str, optional)
url (str, optional)
room_name (str, optional)
keywords (list or str, optional)
starttime (str, optional)
endtime (str, optional)
award or decision (str, optional)
year (int, optional)
conference (str, optional)

Returns: Papers in lightweight format with only essential fields. Authors are returned as lists of strings.
Return type: list of dict

Examples

>>> papers = [
...     {
...         'id': 123,
...         'title': 'Deep Learning',
...         'authors': [
...             {'id': 1, 'fullname': 'John Doe', 'institution': 'MIT'},
...             {'id': 2, 'fullname': 'Jane Smith', 'institution': 'Stanford'}
...         ],
...         'abstract': 'A paper about deep learning',
...         'session': 'Session A',
...         'poster_position': 'A-42',
...         'paper_pdf_url': 'https://example.com/paper.pdf',
...         'year': 2025,
...         'conference': 'NeurIPS'
...     }
... ]
>>> lightweight = convert_to_lightweight_schema(papers)
>>> lightweight[0]['authors']
['John Doe', 'Jane Smith']

Notes

Author objects are converted to lists of name strings
Extra NeurIPS-specific fields are dropped
‘name’ field is converted to ‘title’ if needed
Keywords are converted from string to list if needed
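The conversion rules in these Notes can be sketched as follows. This is a hypothetical re-implementation, not the library function: to_lightweight, convert_papers and the pass-through key list are illustrative, keyword strings are assumed to be comma-separated, 'decision' is assumed to fill 'award' when 'award' is absent, and 'id' is kept under its original key.

```python
from typing import Any, Dict, List

# Keys copied unchanged when present (selection based on the parameter list above)
_PASSTHROUGH_KEYS = (
    "id", "abstract", "session", "poster_position", "paper_pdf_url",
    "poster_image_url", "url", "room_name", "starttime", "endtime",
    "year", "conference",
)


def to_lightweight(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical single-paper version of convert_to_lightweight_schema."""
    # Any key not listed above (e.g. institution data) is dropped
    result = {key: paper[key] for key in _PASSTHROUGH_KEYS if key in paper}
    # Prefer 'title', fall back to 'name'
    result["title"] = paper.get("title") or paper.get("name")
    # Author dicts become plain name strings
    result["authors"] = [
        a["fullname"] if isinstance(a, dict) else a
        for a in paper.get("authors", [])
    ]
    keywords = paper.get("keywords")
    if isinstance(keywords, str):
        # Assumption: keyword strings are comma-separated
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    if keywords is not None:
        result["keywords"] = keywords
    # Assumption: 'decision' is used when no 'award' is given
    award = paper.get("award") or paper.get("decision")
    if award:
        result["award"] = award
    return result


def convert_papers(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [to_lightweight(p) for p in papers]
```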

class abstracts_explorer.plugin.LightweightPaper(**data)

Bases: BaseModel

Lightweight paper model for plugin data validation.

This model validates the simplified schema used by LightweightDownloaderPlugin. It requires only essential fields and optionally supports additional metadata.

Required Fields

title (str): Paper title
authors (list): List of author names (strings)
abstract (str): Paper abstract
session (str): Session/workshop/track name
poster_position (str): Poster position identifier
year (int): Conference year (e.g., 2025)
conference (str): Conference name (e.g., “NeurIPS”, “ML4PS”)

Optional Fields

original_id (int): Paper ID from the original source
paper_pdf_url (str): URL to paper PDF
poster_image_url (str): URL to poster image
url (str): General URL (OpenReview, ArXiv, etc.)
room_name (str): Room name for presentation
keywords (list): List of keywords/tags
starttime (str): Start time
endtime (str): End time
award (str): Award name (e.g., “Best Paper Award”)

title: str

authors: List[str]

abstract: str

session: str

poster_position: str

year: int

conference: str

original_id: Optional[int]

paper_pdf_url: Optional[str]

poster_image_url: Optional[str]

url: Optional[str]

room_name: Optional[str]

keywords: Optional[List[str]]

starttime: Optional[str]

endtime: Optional[str]

award: Optional[str]

classmethod validate_title(v)

Ensure title is not empty.

Return type: str

classmethod validate_authors(v)

Ensure authors list is not empty and properly formatted.

Empty or whitespace-only author entries are silently filtered out so that a single malformed entry does not cause the entire paper to be rejected. A ValueError is still raised when the list is empty after filtering, or when any remaining entry contains a semicolon.

Return type: List[str]
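The filtering rule for authors can be expressed in plain Python. This is a hedged sketch rather than the model's validator: check_authors is a hypothetical name, it runs without pydantic, and it keeps surviving names unchanged because this reference does not say whether they are stripped.

```python
from typing import List


def check_authors(authors: List[str]) -> List[str]:
    """Hypothetical plain-Python version of LightweightPaper.validate_authors."""
    # Drop empty or whitespace-only entries instead of rejecting the paper
    kept = [name for name in authors if name.strip()]
    if not kept:
        raise ValueError("authors must contain at least one non-empty name")
    # Semicolons are reserved as the separator used when storing authors
    bad = [name for name in kept if ";" in name]
    if bad:
        raise ValueError(f"author names must not contain semicolons: {bad}")
    return kept
```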

classmethod validate_abstract(v)

Ensure abstract is not empty.

Return type: str

classmethod validate_session(v)

Ensure session is not empty.

Return type: str

classmethod validate_conference(v)

Ensure conference is not empty.

Return type: str

classmethod validate_year(v)

Ensure year is reasonable.

Return type: int

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstracts_explorer.plugin.sanitize_author_names(authors)

Replace semicolons in author names with spaces.

Semicolons are not allowed in author names because they would interfere with the semicolon-separated format used to store authors in the database. This function replaces semicolons with spaces and normalizes whitespace.

Parameters: authors (list of str) – List of author names to sanitize
Returns: List of author names with semicolons replaced by spaces
Return type: list of str

Examples

>>> sanitize_author_names(["John Doe", "Jane; Smith", "Bob;Johnson"])
['John Doe', 'Jane Smith', 'Bob Johnson']

>>> sanitize_author_names(["Alice"])
['Alice']

>>> sanitize_author_names([])
[]

>>> sanitize_author_names(["Multi;;Semicolons"])
['Multi Semicolons']

Notes

This function is useful when importing data from sources that may contain semicolons in author names. The LightweightPaper model will reject author names containing semicolons during validation.

Multiple consecutive spaces are normalized to a single space.
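The behaviour shown in the Examples can be reproduced with str.replace followed by a split/join, which collapses runs of whitespace. sanitize_names is a hypothetical re-implementation; it also trims leading and trailing whitespace, which this reference does not state explicitly.

```python
from typing import List


def sanitize_names(authors: List[str]) -> List[str]:
    """Hypothetical re-implementation of sanitize_author_names."""
    # Replace each semicolon with a space, then collapse whitespace runs;
    # split() with no argument also drops leading/trailing whitespace
    return [" ".join(name.replace(";", " ").split()) for name in authors]
```

Running the sanitized names through the author validator afterwards avoids the semicolon rejection described for LightweightPaper.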

abstracts_explorer.plugin.validate_lightweight_paper(paper)

Validate a paper dict against the lightweight schema.

Parameters: paper (dict) – Paper data to validate
Returns: Validated paper model
Return type: LightweightPaper
Raises: ValidationError – If the paper data is invalid

abstracts_explorer.plugin.validate_lightweight_papers(papers)

Validate a list of papers against the lightweight schema.

Papers that fail validation are logged as warnings and skipped rather than aborting the entire import.

Parameters: papers (list) – List of paper dicts to validate
Returns: List of validated paper models (papers that failed validation are excluded)
Return type: list of LightweightPaper
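The skip-and-log behaviour can be sketched generically. validate_all and its validate_one parameter are hypothetical, as is the log message format. In the library the per-paper validator would be the LightweightPaper model; pydantic's ValidationError subclasses ValueError, so catching ValueError is assumed to cover it.

```python
import logging
from typing import Any, Callable, Dict, List, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def validate_all(
    papers: List[Dict[str, Any]],
    validate_one: Callable[[Dict[str, Any]], T],
) -> List[T]:
    """Validate each paper, logging and skipping the ones that fail."""
    valid: List[T] = []
    for index, paper in enumerate(papers):
        try:
            valid.append(validate_one(paper))
        except ValueError as exc:
            # One bad record should not abort the whole import
            logger.warning("Skipping paper %d (%r): %s", index, paper.get("title"), exc)
    return valid
```

Passing something like lambda p: LightweightPaper(**p) as validate_one would approximate the documented function.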