Plugin Module

The plugin module provides the framework for extending Abstracts Explorer with custom conference data downloaders.

Overview

The framework includes:

  • Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)

  • Schema conversion utilities for standardizing paper data

  • Plugin registry for managing and discovering plugins

  • Data validation via the LightweightPaper Pydantic model

Available Plugins

Built-in plugins for downloading conference data:

Plugin    Conference        Years        API
neurips   NeurIPS           2020–2025    Full Schema
iclr      ICLR              2024–2026    Lightweight
icml      ICML              2024–2025    Lightweight
ml4ps     ML4PS Workshop    2025         Lightweight

Quick Start

from abstracts_explorer.plugin import (
    get_plugin,
    list_plugins,
    LightweightPaper,
)

# List available plugins
for plugin_info in list_plugins():
    print(f"{plugin_info['name']}: {plugin_info['description']}")

# Download papers using a plugin
plugin = get_plugin("neurips")  # returns None if no plugin has this name
if plugin is not None:
    papers = plugin.download(year=2025)

Creating a Custom Plugin

from abstracts_explorer.plugin import (
    LightweightDownloaderPlugin,
    register_plugin,
)

class MyConferencePlugin(LightweightDownloaderPlugin):
    plugin_name = "myconf"
    plugin_description = "My Conference Downloader"
    supported_years = [2025]

    def download(self, year=None, output_path=None, force_download=False, **kwargs):
        self.validate_year(year)
        # ... fetch paper data and build the list of papers here
        papers = []  # placeholder so the example runs; replace with fetched data
        return papers

    def get_metadata(self):
        return {
            "name": self.plugin_name,
            "description": self.plugin_description,
            "supported_years": self.supported_years,
        }

register_plugin(MyConferencePlugin())

See the Plugins Guide for detailed instructions.

API Reference

Plugin Framework

This module provides the plugin framework for extending Abstracts Explorer with custom data downloaders.

The framework consists of:

  • Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)

  • Schema conversion utilities (convert_to_lightweight_schema)

  • Plugin registry for managing plugins (PluginRegistry)

  • Pydantic models for data validation (LightweightPaper)

class abstracts_explorer.plugin.DownloaderPlugin[source]

Bases: ABC

Base class for all downloader plugins.

Each plugin must implement the download method and provide metadata about its capabilities.

Subclasses should set _start_year and override get_url(year) to get automatic supported_years computation with a current-year availability check.

plugin_name: str = 'base'
plugin_description: str = 'Base downloader plugin'
property supported_years: List[int]

Dynamically computed supported years.

Builds the range [_start_year, current_year) and appends the current year when its data URL is already accessible (checked via a HEAD request to get_url(current_year)).

Returns:

Supported conference years.

Return type:

list of int
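The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the names compute_supported_years and head_ok are hypothetical, and the availability probe is passed in as a parameter so the logic can be exercised without network access.

```python
from datetime import date
from typing import Callable, List, Optional
import urllib.error
import urllib.request


def head_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when a HEAD request to `url` succeeds (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Network errors, HTTP errors, and malformed URLs all count as "not available"
        return False


def compute_supported_years(
    start_year: int,
    get_url: Callable[[int], str],
    is_available: Callable[[str], bool] = head_ok,
    current_year: Optional[int] = None,
) -> List[int]:
    """Build [start_year, current_year) and append current_year if its URL responds."""
    if current_year is None:
        current_year = date.today().year
    years = list(range(start_year, current_year))
    if is_available(get_url(current_year)):
        years.append(current_year)
    return years
```

Injecting is_available keeps the year arithmetic separate from the network probe, which makes the half-open range easy to verify in isolation.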

get_url(year)[source]

Get the data URL for a specific year.

Override in subclasses to enable automatic current-year availability checking in supported_years.

Parameters:

year (int) – Conference year

Returns:

URL used for downloading or probing availability.

Return type:

str

Raises:

NotImplementedError – When the subclass does not provide an implementation.
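A subclass override might look like the sketch below. To keep the example self-contained it uses a stand-in base class rather than importing DownloaderPlugin; the class names, the _start_year value, and the URL pattern are all hypothetical.

```python
class BaseSketch:
    """Stand-in for the documented base class; not the real DownloaderPlugin."""

    def get_url(self, year: int) -> str:
        # Mirrors the documented default: no implementation in the base class
        raise NotImplementedError("Subclasses must implement get_url(year)")


class ExampleConfSketch(BaseSketch):
    """Hypothetical plugin that supplies a per-year data URL."""

    _start_year = 2023  # first year with data (illustrative value)

    def get_url(self, year: int) -> str:
        # Hypothetical URL pattern, for illustration only
        return f"https://example.org/conf/{year}/papers.json"
```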

abstractmethod download(year=None, output_path=None, force_download=False, **kwargs)[source]

Download papers from the data source.

Parameters:
  • year (int, optional) – Year to download papers for (if applicable)

  • output_path (str, optional) – Path to save the downloaded data

  • force_download (bool) – Force re-download even if data exists

  • **kwargs (Any) – Additional plugin-specific parameters

Returns:

List of validated paper objects ready for database insertion

Return type:

list of LightweightPaper

abstractmethod get_metadata()[source]

Get plugin metadata.

Returns:

Plugin metadata including name, description, supported years, etc.

Return type:

dict

validate_year(year)[source]

Validate that the requested year is supported.

Parameters:

year (int or None) – Year to validate

Raises:

ValueError – If year is not supported by this plugin

Return type:

None
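The check can be pictured roughly as below. This is a sketch, not the library's code: check_year is a hypothetical standalone function, and treating None as "no specific year requested" is an assumption based on the year parameter being optional.

```python
from typing import Iterable, Optional


def check_year(year: Optional[int], supported_years: Iterable[int]) -> None:
    """Raise ValueError when `year` is given but is not in `supported_years`."""
    if year is None:
        # Assumption: None means no specific year was requested, so nothing to check
        return
    supported = sorted(supported_years)
    if year not in supported:
        raise ValueError(f"Year {year} is not supported; supported years: {supported}")
```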

class abstracts_explorer.plugin.LightweightDownloaderPlugin[source]

Bases: DownloaderPlugin

Lightweight base class for downloader plugins using simplified schema.

This plugin type uses a simpler data format that only requires essential fields, making it easier to implement new plugins. The data is automatically converted to the full NeurIPS schema when loaded into the database.

Required fields per paper:
  • title (str): Paper title

  • authors (list): List of author names (strings) or author dicts with ‘fullname’

  • abstract (str): Paper abstract

  • session (str): Session/workshop/track name

  • poster_position (str): Poster position identifier

  • year (int): Conference year (e.g., 2025)

  • conference (str): Conference name (e.g., “NeurIPS”, “ICLR”)

Optional fields per paper:
  • paper_pdf_url (str): URL to paper PDF

  • poster_image_url (str): URL to poster image

  • url (str): General URL (e.g., OpenReview, ArXiv)

  • room_name (str): Room name for presentation

  • keywords (list): List of keywords/tags

  • starttime (str): Start time (ISO format or readable string)

  • endtime (str): End time (ISO format or readable string)

  • id (int): Paper ID (auto-generated if not provided)

  • award (str): Award name (e.g., “Best Paper Award”)
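As an illustration of the field lists above, a single paper record could look like the following dict. All values are made up, and missing_required_fields is a hypothetical helper for spotting gaps before validation, not part of the library.

```python
REQUIRED_FIELDS = (
    "title", "authors", "abstract", "session",
    "poster_position", "year", "conference",
)

# Example record with every required field and two optional ones (values are invented)
paper = {
    "title": "An Example Paper",
    "authors": ["Ada Lovelace", "Alan Turing"],
    "abstract": "A short abstract.",
    "session": "Poster Session 1",
    "poster_position": "A-12",
    "year": 2025,
    "conference": "ExampleConf",
    "keywords": ["example", "schema"],
    "url": "https://example.org/paper/1",
}


def missing_required_fields(record: dict) -> list:
    """Return the required field names that are absent from `record`."""
    return [name for name in REQUIRED_FIELDS if name not in record]
```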

plugin_name: str = 'lightweight_base'
plugin_description: str = 'Lightweight base downloader plugin'
class abstracts_explorer.plugin.PluginRegistry[source]

Bases: object

Registry for managing downloader plugins.

__init__()[source]
register(plugin)[source]

Register a new plugin.

Parameters:

plugin (DownloaderPlugin) – Plugin instance to register

Return type:

None

unregister(plugin_name)[source]

Unregister a plugin.

Parameters:

plugin_name (str) – Name of plugin to unregister

Return type:

None

get(plugin_name)[source]

Get a plugin by name.

Parameters:

plugin_name (str) – Name of plugin to retrieve

Returns:

Plugin instance or None if not found

Return type:

DownloaderPlugin or None

list_plugins()[source]

List all registered plugins with their metadata.

Returns:

List of plugin metadata dictionaries

Return type:

list

list_plugin_names()[source]

List names of all registered plugins.

Returns:

List of plugin names

Return type:

list
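The registry's behaviour can be approximated with a small stand-in like the one below. It is illustrative only: RegistrySketch and DemoPlugin are hypothetical, and keying plugins by plugin_name, overwriting on duplicate registration, and ignoring unknown names in unregister are assumptions rather than documented behaviour.

```python
from typing import Any, Dict, List, Optional


class RegistrySketch:
    """Illustrative stand-in; the real PluginRegistry may differ in details."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Any] = {}

    def register(self, plugin: Any) -> None:
        # Assumption: plugins are keyed by their plugin_name attribute
        self._plugins[plugin.plugin_name] = plugin

    def unregister(self, plugin_name: str) -> None:
        # Assumption: unregistering an unknown name is a no-op
        self._plugins.pop(plugin_name, None)

    def get(self, plugin_name: str) -> Optional[Any]:
        # Returns None when no plugin has this name, as documented for get()
        return self._plugins.get(plugin_name)

    def list_plugins(self) -> List[dict]:
        return [plugin.get_metadata() for plugin in self._plugins.values()]

    def list_plugin_names(self) -> List[str]:
        return list(self._plugins)


class DemoPlugin:
    """Hypothetical plugin exposing only what the sketch needs."""

    plugin_name = "demo"

    def get_metadata(self) -> dict:
        return {"name": self.plugin_name, "description": "Demo plugin"}
```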

abstracts_explorer.plugin.register_plugin(plugin)[source]

Register a plugin with the global registry.

Parameters:

plugin (DownloaderPlugin) – Plugin instance to register

Return type:

None

abstracts_explorer.plugin.get_plugin(plugin_name)[source]

Get a plugin from the global registry.

Parameters:

plugin_name (str) – Name of plugin to retrieve

Returns:

Plugin instance or None if not found

Return type:

DownloaderPlugin or None

abstracts_explorer.plugin.list_plugins()[source]

List all registered plugins.

Returns:

List of plugin metadata dictionaries

Return type:

list

abstracts_explorer.plugin.list_plugin_names()[source]

List names of all registered plugins.

Returns:

List of plugin names

Return type:

list

abstracts_explorer.plugin.convert_to_lightweight_schema(papers)[source]

Convert full NeurIPS schema to lightweight paper format.

This function extracts only the fields needed for the lightweight schema from papers in the full NeurIPS format, making it easier to work with simplified data structures.

Parameters:

papers (list) – List of papers in full NeurIPS format with fields like:

  • id (int)

  • title or name (str): uses ‘title’, falling back to ‘name’

  • authors (list of dict with ‘fullname’ or list of str)

  • abstract (str)

  • session (str)

  • poster_position (str)

  • paper_pdf_url (str, optional)

  • poster_image_url (str, optional)

  • url (str, optional)

  • room_name (str, optional)

  • keywords (list or str, optional)

  • starttime (str, optional)

  • endtime (str, optional)

  • award or decision (str, optional)

  • year (int, optional)

  • conference (str, optional)

Returns:

Papers in lightweight format with only essential fields. Authors are returned as lists of strings.

Return type:

list of dict

Examples

>>> papers = [
...     {
...         'id': 123,
...         'title': 'Deep Learning',
...         'authors': [
...             {'id': 1, 'fullname': 'John Doe', 'institution': 'MIT'},
...             {'id': 2, 'fullname': 'Jane Smith', 'institution': 'Stanford'}
...         ],
...         'abstract': 'A paper about deep learning',
...         'session': 'Session A',
...         'poster_position': 'A-42',
...         'paper_pdf_url': 'https://example.com/paper.pdf',
...         'year': 2025,
...         'conference': 'NeurIPS'
...     }
... ]
>>> lightweight = convert_to_lightweight_schema(papers)
>>> lightweight[0]['authors']
['John Doe', 'Jane Smith']

Notes

  • Author objects are converted to lists of name strings

  • Extra NeurIPS-specific fields are dropped

  • ‘name’ field is converted to ‘title’ if needed

  • Keywords are converted from string to list if needed
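Two of the conversions listed above, author flattening and the ‘name’ to ‘title’ fallback, can be sketched as below. The helper names are hypothetical and the code is not taken from convert_to_lightweight_schema; it assumes every author dict carries a ‘fullname’ key.

```python
from typing import Any, Dict, List


def to_author_names(authors: List[Any]) -> List[str]:
    """Flatten author dicts (with 'fullname') and plain strings into a list of names."""
    names = []
    for author in authors:
        if isinstance(author, dict):
            # Assumption: author dicts always include 'fullname'
            names.append(author["fullname"])
        else:
            names.append(str(author))
    return names


def pick_title(paper: Dict[str, Any]) -> str:
    """Prefer 'title'; fall back to 'name' when 'title' is missing or empty."""
    return paper.get("title") or paper.get("name", "")
```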

class abstracts_explorer.plugin.LightweightPaper(**data)[source]

Bases: BaseModel

Lightweight paper model for plugin data validation.

This model validates the simplified schema used by LightweightDownloaderPlugin. It requires only essential fields and optionally supports additional metadata.

Required Fields

  • title (str): Paper title

  • authors (list): List of author names (strings)

  • abstract (str): Paper abstract

  • session (str): Session/workshop/track name

  • poster_position (str): Poster position identifier

  • year (int): Conference year (e.g., 2025)

  • conference (str): Conference name (e.g., “NeurIPS”, “ML4PS”)

Optional Fields

  • original_id (int): Paper ID from the original source

  • paper_pdf_url (str): URL to paper PDF

  • poster_image_url (str): URL to poster image

  • url (str): General URL (OpenReview, ArXiv, etc.)

  • room_name (str): Room name for presentation

  • keywords (list): List of keywords/tags

  • starttime (str): Start time

  • endtime (str): End time

  • award (str): Award name (e.g., “Best Paper Award”)

title: str
authors: List[str]
abstract: str
session: str
poster_position: str
year: int
conference: str
original_id: Optional[int]
paper_pdf_url: Optional[str]
poster_image_url: Optional[str]
url: Optional[str]
room_name: Optional[str]
keywords: Optional[List[str]]
starttime: Optional[str]
endtime: Optional[str]
award: Optional[str]
classmethod validate_title(v)[source]

Ensure title is not empty.

Return type:

str

classmethod validate_authors(v)[source]

Ensure authors list is not empty and properly formatted.

Empty or whitespace-only author entries are silently filtered out so that a single malformed entry does not cause the entire paper to be rejected. A ValueError is still raised when the list is empty after filtering, or when any remaining entry contains a semicolon.

Return type:

List[str]
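The filtering and rejection rules described above can be sketched as a plain function. check_authors is a hypothetical stand-in for the Pydantic validator, and returning the surviving entries unmodified (without stripping) is an assumption.

```python
from typing import List


def check_authors(authors: List[str]) -> List[str]:
    """Drop blank entries, then reject an empty result or any name containing ';'."""
    # Empty or whitespace-only entries are filtered out rather than rejected
    kept = [name for name in authors if name.strip()]
    if not kept:
        raise ValueError("authors list is empty after filtering blank entries")
    for name in kept:
        if ";" in name:
            raise ValueError(f"author name contains a semicolon: {name!r}")
    return kept
```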

classmethod validate_abstract(v)[source]

Ensure abstract is not empty.

Return type:

str

classmethod validate_session(v)[source]

Ensure session is not empty.

Return type:

str

classmethod validate_conference(v)[source]

Ensure conference is not empty.

Return type:

str

classmethod validate_year(v)[source]

Ensure year is reasonable.

Return type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

abstracts_explorer.plugin.sanitize_author_names(authors)[source]

Filter out semicolons from author names.

Semicolons are not allowed in author names because they would interfere with the semicolon-separated format used to store authors in the database. This function replaces semicolons with spaces and normalizes whitespace.

Parameters:

authors (list of str) – List of author names to sanitize

Returns:

List of author names with semicolons replaced by spaces

Return type:

list of str

Examples

>>> sanitize_author_names(["John Doe", "Jane; Smith", "Bob;Johnson"])
['John Doe', 'Jane Smith', 'Bob Johnson']
>>> sanitize_author_names(["Alice"])
['Alice']
>>> sanitize_author_names([])
[]
>>> sanitize_author_names(["Multi;;Semicolons"])
['Multi Semicolons']

Notes

This function is useful when importing data from sources that may contain semicolons in author names. The LightweightPaper model will reject author names containing semicolons during validation.

Multiple consecutive spaces are normalized to a single space.
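One way to get the documented behaviour is shown below. This is a sketch that reproduces the examples above, not necessarily how sanitize_author_names is written; the function name sanitize_names_sketch is hypothetical.

```python
from typing import List


def sanitize_names_sketch(authors: List[str]) -> List[str]:
    """Replace semicolons with spaces and collapse runs of whitespace."""
    cleaned = []
    for name in authors:
        # split() with no arguments drops leading/trailing whitespace and
        # treats any run of whitespace as a single separator
        cleaned.append(" ".join(name.replace(";", " ").split()))
    return cleaned
```

Running names through such a step before validation avoids the rejection that LightweightPaper applies to names containing semicolons.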

abstracts_explorer.plugin.validate_lightweight_paper(paper)[source]

Validate a paper dict against the lightweight schema.

Parameters:

paper (dict) – Paper data to validate

Returns:

Validated paper model

Return type:

LightweightPaper

Raises:

ValidationError – If the paper data is invalid

abstracts_explorer.plugin.validate_lightweight_papers(papers)[source]

Validate a list of papers against the lightweight schema.

Papers that fail validation are logged as warnings and skipped rather than aborting the entire import.

Parameters:

papers (list) – List of paper dicts to validate

Returns:

List of validated paper models (papers that failed validation are excluded)

Return type:

list of LightweightPaper
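The skip-and-log pattern described above can be sketched generically. validate_all and require_title are hypothetical names, and the per-paper validator is passed in so the example runs without the library. The sketch catches ValueError, which Pydantic's ValidationError subclasses.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def validate_all(
    papers: List[Dict[str, Any]],
    validate_one: Callable[[Dict[str, Any]], Any],
) -> List[Any]:
    """Validate each paper; log and skip failures instead of aborting the batch."""
    valid = []
    for index, paper in enumerate(papers):
        try:
            valid.append(validate_one(paper))
        except ValueError as exc:
            logger.warning("Skipping paper %d: %s", index, exc)
    return valid


def require_title(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Toy validator used only for this example: reject papers without a title."""
    if not paper.get("title"):
        raise ValueError("title must not be empty")
    return paper
```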