PROVESID

A comprehensive Python package for chemical identifier resolution and experimental property extraction from multiple chemical databases and APIs.

Overview

PROVESID (PROVenance and Experimental Structuring of Identifier Data) provides unified interfaces for accessing chemical information from the following sources:

  • PubChem: Comprehensive chemical database with compound, substance, and bioassay information
  • PubChem PUG View: Experimental property data extraction with full reference information
  • NCI Chemical Identifier Resolver: Chemical structure identifier conversion service
  • CAS Common Chemistry: Open chemistry database from the Chemical Abstracts Service
  • ClassyFire: Chemical taxonomy and classification
  • OPSIN: Chemical name to structure conversion

Key Features

🔍 Multi-Source Chemical Data Access

  • Unified interfaces to major chemical databases
  • Consistent error handling and response formats
  • Automatic rate limiting and retry mechanisms
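
The retry behaviour mentioned above can be pictured with a small, generic helper. This is only a sketch of exponential backoff under stated assumptions: `call_with_retry` and its parameters are hypothetical names chosen for illustration and are not part of the PROVESID API, whose internal retry logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; not part of PROVESID.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` argument is injectable so the waiting behaviour can be checked without real delays.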

🧪 Experimental Property Extraction

  • Extract experimental properties from PubChem PUG View
  • Parse values, units, and full reference information
  • Generate structured DataFrames for analysis
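
To make "parse values and units" concrete, here is a minimal sketch that splits a temperature string into a number and a unit. The function name `split_value_unit` and the regular expression are assumptions made for this example; PROVESID's own parser is not shown here and may handle many more formats.

```python
import re
from typing import Optional, Tuple

# Leading number (optional sign and decimals) followed by a temperature unit.
_VALUE_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(°[CFK]|K)\b")


def split_value_unit(text: str) -> Optional[Tuple[float, str]]:
    """Return (value, unit) for strings like '139 °C at 760 mmHg', else None.

    Hypothetical helper for illustration; not part of PROVESID.
    """
    match = _VALUE_UNIT.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```

Trailing conditions such as "at 760 mmHg" are ignored by this sketch; a fuller parser would capture them as a separate field.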

🔄 Chemical Identifier Conversion

  • Convert between SMILES, InChI, CAS numbers, and names
  • Resolve chemical identifiers across different formats
  • Validate and standardize chemical structures
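
One kind of validation that needs no network access is the CAS Registry Number check digit: the digits before the check digit are weighted 1, 2, 3, ... from the right, and the sum modulo 10 must equal the final digit. The sketch below applies that rule; `is_valid_cas` is a hypothetical name used for illustration and is not taken from the PROVESID API.

```python
import re

# CAS numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
_CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    Hypothetical helper for illustration; not part of PROVESID.
    """
    match = _CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit
```

A passing check digit only shows that the number is well formed; it does not confirm that the number is assigned to a substance.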

📊 Data Processing & Analysis

  • Batch processing capabilities for large datasets
  • DataFrame output for easy integration with pandas
  • Comprehensive error handling and logging
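
A common pattern behind batch processing with error handling is to collect failures per identifier instead of aborting the whole run. The sketch below shows that pattern in plain Python; `run_batch` and the `lookup` callable are hypothetical and stand in for whichever PROVESID call is being applied to each identifier.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_batch(
    identifiers: Iterable[Any],
    lookup: Callable[[Any], Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Apply lookup to each identifier, collecting failures instead of stopping.

    Hypothetical helper for illustration; not part of PROVESID.
    """
    records: List[Dict[str, Any]] = []
    errors: List[Dict[str, Any]] = []
    for identifier in identifiers:
        try:
            # Merge the identifier into the result so each row is self-describing.
            records.append({"id": identifier, **lookup(identifier)})
        except Exception as exc:
            errors.append({"id": identifier, "error": str(exc)})
    return records, errors
```

Both returned lists are lists of dictionaries, so each can be passed to `pandas.DataFrame(...)` for further analysis.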

Quick Start

Installation

pip install provesid

Basic Usage

from provesid import PubChemAPI, NCIChemicalIdentifierResolver, PubChemView

# Get compound information from PubChem
api = PubChemAPI()
compound = api.get_compound_by_cid(2244)  # Aspirin
properties = api.get_compound_properties([2244], ['MolecularWeight', 'MolecularFormula'])

# Convert chemical identifiers
resolver = NCIChemicalIdentifierResolver()
smiles = resolver.resolve('aspirin', 'smiles')
inchi = resolver.resolve('CCO', 'stdinchi')  # Ethanol SMILES to InChI

# Extract experimental properties
view = PubChemView()
melting_points = view.get_experimental_properties(2244, 'Melting Point')
df = view.experimental_properties_to_dataframe(2244, 'Melting Point')
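
For readers curious about what an identifier conversion looks like at the web-service level, the public NCI Chemical Identifier Resolver uses URLs of the form `/chemical/structure/<identifier>/<representation>`. The sketch below only builds such a URL; `build_cir_url` is a hypothetical helper, and it is an assumption that PROVESID's resolver class issues requests of this shape internally.

```python
from urllib.parse import quote

# Base path of the public NCI Chemical Identifier Resolver service.
CIR_BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_cir_url(identifier: str, representation: str) -> str:
    """Build a resolver URL for an identifier and a target representation.

    Hypothetical helper for illustration; not part of PROVESID.
    """
    # Percent-encode so characters such as '#' or '/' in SMILES survive the URL.
    return (
        f"{CIR_BASE_URL}/{quote(identifier, safe='')}"
        f"/{quote(representation, safe='')}"
    )
```

Encoding matters for SMILES input in particular, because characters such as `#` (triple bond) would otherwise be read as a URL fragment.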

Property Extraction Example

from provesid.pubchemview import get_experimental_properties_table

# Get a comprehensive table of experimental properties
table = get_experimental_properties_table(2244, 'Boiling Point')
print(table)
#    CID   StringWithMarkup    Value  Unit  Reference
# 0  2244  139 °C at 760 mmHg  139    °C    J. Chem. Eng. Data 1996, 41, 1190-1193
# 1  2244  140 °C              140    °C    Lange's Handbook of Chemistry, 1985

API Documentation

Comprehensive API documentation is available for all modules:

  • PubChem API - Access to PubChem compound, substance, and bioassay data
  • PubChem View - Experimental property extraction and reference parsing
  • NCI Resolver - Chemical identifier conversion and validation
  • Common Chemistry - CAS Common Chemistry database access
  • ClassyFire - Chemical classification and taxonomy
  • OPSIN - Chemical name to structure conversion

Examples

Tutorials are available for each API.

Development

PROVESID is actively developed and welcomes contributions.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use PROVESID in your research, please cite:

@software{provesid,
  title={PROVESID: A Python Package for Chemical Identifier Resolution and Property Extraction},
  author={PROVESID Team},
  year={2024},
  url={https://github.com/provesid/provesid}
}