Plugin Module
The plugin module provides the framework for extending Abstracts Explorer with custom conference data downloaders.
Overview
The framework includes:
Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)
Schema conversion utilities for standardizing paper data
Plugin registry for managing and discovering plugins
Data validation via the LightweightPaper Pydantic model
Available Plugins
Built-in plugins for downloading conference data:
| Plugin | Conference | Years | API |
|---|---|---|---|
| | NeurIPS | 2020–2025 | Full Schema |
| | ICLR | 2024–2026 | Lightweight |
| | ICML | 2024–2025 | Lightweight |
| | ML4PS Workshop | 2025 | Lightweight |
Quick Start
from abstracts_explorer.plugin import (
    get_plugin,
    list_plugins,
    LightweightPaper,
)

# List available plugins
for plugin_info in list_plugins():
    print(f"{plugin_info['name']}: {plugin_info['description']}")

# Download papers using a plugin
plugin = get_plugin("neurips")
papers = plugin.download(year=2025)
Creating a Custom Plugin
from abstracts_explorer.plugin import (
    LightweightDownloaderPlugin,
    register_plugin,
)


class MyConferencePlugin(LightweightDownloaderPlugin):
    plugin_name = "myconf"
    plugin_description = "My Conference Downloader"
    supported_years = [2025]

    def download(self, year=None, output_path=None, force_download=False, **kwargs):
        self.validate_year(year)
        papers = []  # ... fetch paper data and append it here
        return papers

    def get_metadata(self):
        return {
            "name": self.plugin_name,
            "description": self.plugin_description,
            "supported_years": self.supported_years,
        }


register_plugin(MyConferencePlugin())
See the Plugins Guide for detailed instructions.
API Reference
Plugin Framework
This module provides the plugin framework for extending Abstracts Explorer with custom data downloaders.
The framework consists of:
Base classes for plugin implementation (DownloaderPlugin, LightweightDownloaderPlugin)
Schema conversion utilities (convert_to_lightweight_schema)
Plugin registry for managing plugins (PluginRegistry)
Pydantic models for data validation (LightweightPaper)
- class abstracts_explorer.plugin.DownloaderPlugin[source]
Bases: ABC
Base class for all downloader plugins.
Each plugin must implement the download method and provide metadata about its capabilities.
Subclasses should set _start_year and override get_url(year) to get automatic supported_years computation with a current-year availability check.
- property supported_years: List[int]
Dynamically computed supported years.
Builds the range [_start_year, current_year) and appends the current year when its data URL is already accessible (checked via a HEAD request to get_url(current_year)).
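The year computation described above can be sketched with the standard library alone. This is a standalone illustration, not the package's implementation: the function names `url_is_live` and `compute_supported_years`, the injectable `probe` argument, the timeout, and the "status below 400" success rule are all assumptions made for the example.

```python
import datetime
import urllib.request
from typing import Callable, List, Optional


def url_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True when a HEAD request to ``url`` answers with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def compute_supported_years(
    start_year: int,
    get_url: Callable[[int], str],
    probe: Callable[[str], bool] = url_is_live,
    current_year: Optional[int] = None,
) -> List[int]:
    """Build [start_year, current_year) and append current_year if its URL is live."""
    if current_year is None:
        current_year = datetime.date.today().year
    years = list(range(start_year, current_year))
    # Assumption: the current year is only probed when it is not before start_year.
    if start_year <= current_year and probe(get_url(current_year)):
        years.append(current_year)
    return years
```

Passing `probe` explicitly keeps the logic testable without network access, for example `compute_supported_years(2020, make_url, probe=lambda url: False, current_year=2023)` with a hypothetical `make_url` helper.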
- get_url(year)[source]
Get the data URL for a specific year.
Override in subclasses to enable automatic current-year availability checking in supported_years.
- Parameters:
year (int) – Conference year
- Returns:
URL used for downloading or probing availability.
- Return type:
- Raises:
NotImplementedError – When the subclass does not provide an implementation.
- abstractmethod download(year=None, output_path=None, force_download=False, **kwargs)[source]
Download papers from the data source.
- Parameters:
- Returns:
List of validated paper objects ready for database insertion
- Return type:
- abstractmethod get_metadata()[source]
Get plugin metadata.
- Returns:
Plugin metadata including name, description, supported years, etc.
- Return type:
- validate_year(year)[source]
Validate that the requested year is supported.
- Parameters:
year (int or None) – Year to validate
- Raises:
ValueError – If year is not supported by this plugin
- Return type:
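As a rough illustration of the validate_year contract, the check can be written as a free function. This is a sketch under assumptions, not the method's actual body: treating `None` as "no specific year requested" and the wording of the error message are both guesses, and `check_year` is a hypothetical name.

```python
from typing import List, Optional


def check_year(year: Optional[int], supported_years: List[int]) -> None:
    """Raise ValueError when ``year`` is given but not in ``supported_years``."""
    # Assumption: None means the caller did not request a specific year.
    if year is None:
        return
    if year not in supported_years:
        raise ValueError(
            f"Year {year} is not supported; supported years: {supported_years}"
        )
```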
- class abstracts_explorer.plugin.LightweightDownloaderPlugin[source]
Bases: DownloaderPlugin
Lightweight base class for downloader plugins using simplified schema.
This plugin type uses a simpler data format that only requires essential fields, making it easier to implement new plugins. The data is automatically converted to the full NeurIPS schema when loaded into the database.
- Required fields per paper:
title (str): Paper title
authors (list): List of author names (strings) or author dicts with ‘fullname’
abstract (str): Paper abstract
session (str): Session/workshop/track name
poster_position (str): Poster position identifier
year (int): Conference year (e.g., 2025)
conference (str): Conference name (e.g., “NeurIPS”, “ICLR”)
- Optional fields per paper:
paper_pdf_url (str): URL to paper PDF
poster_image_url (str): URL to poster image
url (str): General URL (e.g., OpenReview, ArXiv)
room_name (str): Room name for presentation
keywords (list): List of keywords/tags
starttime (str): Start time (ISO format or readable string)
endtime (str): End time (ISO format or readable string)
id (int): Paper ID (auto-generated if not provided)
award (str): Award name (e.g., “Best Paper Award”)
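To make the field lists concrete, here is a sample paper dict together with a small completeness check. The helper `missing_required_fields` is hypothetical and only mirrors the required-field list above; the sample values are invented for illustration and do not come from any real conference data.

```python
from typing import Any, Dict, List

# Required keys, copied from the field list above.
REQUIRED_FIELDS = (
    "title",
    "authors",
    "abstract",
    "session",
    "poster_position",
    "year",
    "conference",
)


def missing_required_fields(paper: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent or None in ``paper``."""
    return [field for field in REQUIRED_FIELDS if paper.get(field) is None]


# Illustrative record: seven required fields plus two optional ones.
sample_paper = {
    "title": "An Example Paper",
    "authors": ["Ada Lovelace", "Alan Turing"],
    "abstract": "A short abstract.",
    "session": "Poster Session 1",
    "poster_position": "A-12",
    "year": 2025,
    "conference": "MyConf",
    "keywords": ["example", "sketch"],
    "url": "https://example.com/paper",
}
```

A plugin author could run such a check inside `download` before returning records, so that incomplete entries are caught early.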
- class abstracts_explorer.plugin.PluginRegistry[source]
Bases: object
Registry for managing downloader plugins.
- register(plugin)[source]
Register a new plugin.
- Parameters:
plugin (DownloaderPlugin) – Plugin instance to register
- Return type:
- get(plugin_name)[source]
Get a plugin by name.
- Parameters:
plugin_name (str) – Name of plugin to retrieve
- Returns:
Plugin instance or None if not found
- Return type:
DownloaderPlugin or None
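The register/get behavior documented above resembles a dictionary keyed by plugin name. The following is a minimal standalone sketch, not the real PluginRegistry: keying on a `plugin_name` attribute and letting a later registration replace an earlier one with the same name are assumptions, and `SketchRegistry` and `DemoPlugin` are hypothetical names.

```python
from typing import Any, Dict, Optional


class SketchRegistry:
    """Dictionary-backed stand-in for a plugin registry."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Any] = {}

    def register(self, plugin: Any) -> None:
        # Assumption: plugins expose their lookup key as ``plugin_name``.
        self._plugins[plugin.plugin_name] = plugin

    def get(self, plugin_name: str) -> Optional[Any]:
        # Unknown names yield None rather than raising.
        return self._plugins.get(plugin_name)


class DemoPlugin:
    """Hypothetical plugin carrying only the attribute the registry needs."""

    plugin_name = "demo"


registry = SketchRegistry()
demo = DemoPlugin()
registry.register(demo)
```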
- abstracts_explorer.plugin.register_plugin(plugin)[source]
Register a plugin with the global registry.
- Parameters:
plugin (DownloaderPlugin) – Plugin instance to register
- Return type:
- abstracts_explorer.plugin.get_plugin(plugin_name)[source]
Get a plugin from the global registry.
- Parameters:
plugin_name (str) – Name of plugin to retrieve
- Returns:
Plugin instance or None if not found
- Return type:
DownloaderPlugin or None
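Because the lookup returns None for unknown names, calling a method on the result without a check fails with an AttributeError. One possible guard is sketched below; `require_plugin` is a hypothetical helper, and `lookup` stands in for get_plugin so that the example runs without the package installed.

```python
from typing import Any, Callable, Optional


def require_plugin(lookup: Callable[[str], Optional[Any]], name: str) -> Any:
    """Return the plugin found by ``lookup`` or raise KeyError with a clear message."""
    plugin = lookup(name)
    if plugin is None:
        raise KeyError(f"No plugin registered under the name {name!r}")
    return plugin


# Stand-in lookup table; with the real package, ``lookup`` would be get_plugin.
known_plugins = {"demo": object()}
found = require_plugin(known_plugins.get, "demo")
```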
- abstracts_explorer.plugin.list_plugins()[source]
List all registered plugins.
- Returns:
List of plugin metadata dictionaries
- Return type:
- abstracts_explorer.plugin.list_plugin_names()[source]
List names of all registered plugins.
- Returns:
List of plugin names
- Return type:
- abstracts_explorer.plugin.convert_to_lightweight_schema(papers)[source]
Convert full NeurIPS schema to lightweight paper format.
This function extracts only the fields needed for the lightweight schema from papers in the full NeurIPS format, making it easier to work with simplified data structures.
- Parameters:
papers (list) – List of papers in full NeurIPS format with fields like:
id (int)
title or name (str): uses ‘title’, falling back to ‘name’
authors (list of dict with ‘fullname’, or list of str)
abstract (str)
session (str)
poster_position (str)
paper_pdf_url (str, optional)
poster_image_url (str, optional)
url (str, optional)
room_name (str, optional)
keywords (list or str, optional)
starttime (str, optional)
endtime (str, optional)
award or decision (str, optional)
year (int, optional)
conference (str, optional)
- Returns:
Papers in lightweight format with only essential fields. Authors are returned as lists of strings.
- Return type:
Examples
>>> papers = [
...     {
...         'id': 123,
...         'title': 'Deep Learning',
...         'authors': [
...             {'id': 1, 'fullname': 'John Doe', 'institution': 'MIT'},
...             {'id': 2, 'fullname': 'Jane Smith', 'institution': 'Stanford'}
...         ],
...         'abstract': 'A paper about deep learning',
...         'session': 'Session A',
...         'poster_position': 'A-42',
...         'paper_pdf_url': 'https://example.com/paper.pdf',
...         'year': 2025,
...         'conference': 'NeurIPS'
...     }
... ]
>>> lightweight = convert_to_lightweight_schema(papers)
>>> lightweight[0]['authors']
['John Doe', 'Jane Smith']
Notes
Author objects are converted to lists of name strings
Extra NeurIPS-specific fields are dropped
‘name’ field is converted to ‘title’ if needed
Keywords are converted from string to list if needed
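The conversion rules in the notes can be sketched for a single paper as follows. This is an illustration only, not the library's code: the function name `to_lightweight` is hypothetical, the keyword delimiter (a comma) is an assumption the source does not state, and omitting optional keys that are absent is likewise a guess.

```python
from typing import Any, Dict

# Optional keys copied through unchanged when present.
PASSTHROUGH_FIELDS = (
    "paper_pdf_url", "poster_image_url", "url", "room_name",
    "starttime", "endtime", "year", "conference",
)


def to_lightweight(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one full-schema paper dict to a lightweight dict."""
    result: Dict[str, Any] = {
        # 'name' is used only when 'title' is missing or empty.
        "title": paper.get("title") or paper.get("name"),
        # Author dicts collapse to their 'fullname'; plain strings pass through.
        "authors": [
            a["fullname"] if isinstance(a, dict) else str(a)
            for a in paper.get("authors", [])
        ],
        "abstract": paper.get("abstract"),
        "session": paper.get("session"),
        "poster_position": paper.get("poster_position"),
    }
    for key in PASSTHROUGH_FIELDS:
        if paper.get(key) is not None:
            result[key] = paper[key]
    keywords = paper.get("keywords")
    if isinstance(keywords, str):
        # Assumption: keyword strings are comma-separated.
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    if keywords is not None:
        result["keywords"] = keywords
    award = paper.get("award") or paper.get("decision")
    if award:
        result["award"] = award
    return result
```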
- class abstracts_explorer.plugin.LightweightPaper(**data)[source]
Bases: BaseModel
Lightweight paper model for plugin data validation.
This model validates the simplified schema used by LightweightDownloaderPlugin. It requires only essential fields and optionally supports additional metadata.
Required Fields
title (str): Paper title
authors (list): List of author names (strings)
abstract (str): Paper abstract
session (str): Session/workshop/track name
poster_position (str): Poster position identifier
year (int): Conference year (e.g., 2025)
conference (str): Conference name (e.g., “NeurIPS”, “ML4PS”)
Optional Fields
original_id (int): Paper ID from the original source
paper_pdf_url (str): URL to paper PDF
poster_image_url (str): URL to poster image
url (str): General URL (OpenReview, ArXiv, etc.)
room_name (str): Room name for presentation
keywords (list): List of keywords/tags
starttime (str): Start time
endtime (str): End time
award (str): Award name (e.g., “Best Paper Award”)
- classmethod validate_authors(v)[source]
Ensure authors list is not empty and properly formatted.
Empty or whitespace-only author entries are silently filtered out so that a single malformed entry does not cause the entire paper to be rejected. A ValueError is still raised when the list is empty after filtering, or when any remaining entry contains a semicolon.
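The filtering and rejection rules for authors can be sketched as a plain function. This is not the Pydantic validator itself: `clean_authors` is a hypothetical name, and stripping surrounding whitespace from the kept entries is an assumption that goes beyond what the description states.

```python
from typing import List


def clean_authors(authors: List[str]) -> List[str]:
    """Drop blank entries, then reject empty lists and names containing ';'."""
    # Assumption: kept names are also stripped of surrounding whitespace.
    kept = [name.strip() for name in authors if name and name.strip()]
    if not kept:
        raise ValueError("authors must contain at least one non-empty name")
    for name in kept:
        if ";" in name:
            raise ValueError(f"author name must not contain a semicolon: {name!r}")
    return kept
```

Names that may contain semicolons would need to be cleaned before this check, since the rule rejects them rather than repairing them.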
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- abstracts_explorer.plugin.sanitize_author_names(authors)[source]
Filter out semicolons from author names.
Semicolons are not allowed in author names because they would interfere with the semicolon-separated format used to store authors in the database. This function replaces semicolons with spaces and normalizes whitespace.
- Parameters:
- Returns:
List of author names with semicolons replaced by spaces
- Return type:
Examples
>>> sanitize_author_names(["John Doe", "Jane; Smith", "Bob;Johnson"])
['John Doe', 'Jane Smith', 'Bob Johnson']
>>> sanitize_author_names(["Alice"])
['Alice']
>>> sanitize_author_names([])
[]
>>> sanitize_author_names(["Multi;;Semicolons"])
['Multi Semicolons']
Notes
This function is useful when importing data from sources that may contain semicolons in author names. The LightweightPaper model will reject author names containing semicolons during validation.
Multiple consecutive spaces are normalized to a single space.
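The behavior described above (semicolons replaced by spaces, whitespace runs collapsed) fits in one list comprehension. This is a sketch that reproduces the documented examples, not the function's actual source; `sanitize_names_sketch` is a hypothetical name, and trimming leading and trailing whitespace is an assumption.

```python
from typing import List


def sanitize_names_sketch(authors: List[str]) -> List[str]:
    """Replace ';' with a space, then collapse whitespace runs to single spaces."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with ' ' normalizes spacing.
    return [" ".join(name.replace(";", " ").split()) for name in authors]
```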
- abstracts_explorer.plugin.validate_lightweight_paper(paper)[source]
Validate a paper dict against the lightweight schema.
- Parameters:
paper (dict) – Paper data to validate
- Returns:
Validated paper model
- Return type:
- Raises:
ValidationError – If the paper data is invalid
- abstracts_explorer.plugin.validate_lightweight_papers(papers)[source]
Validate a list of papers against the lightweight schema.
Papers that fail validation are logged as warnings and skipped rather than aborting the entire import.
- Parameters:
papers (list) – List of paper dicts to validate
- Returns:
List of validated paper models (papers that failed validation are excluded)
- Return type:
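The "log and skip" policy described for validate_lightweight_papers can be sketched as a loop around a per-paper validator. This is an illustration under assumptions: `validate_all` and `require_title` are hypothetical names, the stand-in validator only checks for a title, and the sketch catches ValueError rather than the library's ValidationError.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def require_title(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in validator: accept a paper only if it has a non-empty title."""
    if not paper.get("title"):
        raise ValueError("missing title")
    return paper


def validate_all(
    papers: List[Dict[str, Any]],
    validate_one: Callable[[Dict[str, Any]], Any] = require_title,
) -> List[Any]:
    """Validate each paper, logging and skipping the ones that fail."""
    valid = []
    for index, paper in enumerate(papers):
        try:
            valid.append(validate_one(paper))
        except ValueError as exc:
            # A failed paper is reported but does not abort the import.
            logger.warning("Skipping paper %d: %s", index, exc)
    return valid
```

The trade-off of this policy is that a partially broken source still yields a usable import, at the cost of failures being visible only in the log.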