API Reference

This section provides comprehensive documentation for all PROVESID API modules.

Overview

PROVESID provides Python interfaces to several major chemical databases and web services:

  • PubChem API - Standard PubChem REST API with enhanced data access ✨ Enhanced
  • PubChem PUG View - Advanced property extraction from PubChem ✨ Enhanced
  • ChEMBL - Local SQLite database for bioactive drug-like compounds 🆕
  • ChEBI - Chemical Entities of Biological Interest database
  • NCI Resolver - Chemical identifier resolution
  • CAS Common Chemistry - CAS Registry data access
  • OPSIN - IUPAC name to structure conversion
  • ClassyFire - Chemical structure classification

Quick Reference

Basic Usage Pattern

All API classes follow a similar pattern:

# Import the desired API
from provesid import PubChemAPI  # Enhanced standard API
from provesid import CheMBL  # Local ChEMBL database
from provesid import ChEBI  # ChEBI database
from provesid.pubchemview import PubChemPUGViewAPI  # Advanced properties
from provesid.cascommonchem import CASCommonChem
from provesid.opsin import OPSIN
from provesid.classyfire import ClassyFireAPI

# Initialize APIs
pc = PubChemAPI()  # Standard PubChem API
pug_view = PubChemPUGViewAPI()  # Advanced PubChem properties
chembl = CheMBL()  # Local ChEMBL database (auto-downloads if needed)
chebi = ChEBI()  # ChEBI API
cas_api = CASCommonChem()
opsin = OPSIN()

# ClassyFireAPI uses static methods
result = ClassyFireAPI.submit_query("label", "CCO")

Common Data Flow

# Start with a compound name
compound_name = "caffeine"

# 1. Convert name to structure (if needed)
opsin = OPSIN()
structure = opsin.get_id(compound_name)

if structure['status'] == 'SUCCESS':
    smiles = structure['smiles']

    # 2. Get basic compound info (Enhanced PubChem API)
    pc = PubChemAPI()
    cids = pc.get_cids_by_name(compound_name)
    if cids:
        basic_info = pc.get_basic_compound_info(cids[0])
        # Direct access - no wrapper needed!
        compound = pc.get_compound_by_cid(cids[0])

    # 3. Get detailed experimental properties
    pug_view = PubChemPUGViewAPI() 
    properties = pug_view.get_compound_properties_by_smiles(smiles)

Recent API Enhancements ✨

PubChem API Improvements

Enhanced Data Access: Removed redundant wrapper structures for cleaner code:

# Before: compound["PC_Compounds"][0]
# Now: compound (direct access!)
compound = pc.get_compound_by_cid(cid)

New Search Methods:

  • get_cids_by_inchikey() - Search by InChI Key
  • Improved get_cids_by_smiles() - Returns clean lists
  • Multi-domain search capabilities

Enhanced Pattern Recognition in PubChem View:

  • Support for "log Kow = 1.19" format
  • Comprehensive solubility pattern matching
  • Improved vapor pressure extraction
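To make the "log Kow = 1.19" format concrete, here is a minimal sketch of how such a value could be pulled out of free text. The regular expression and the function name `parse_log_kow` are hypothetical and written only for this page; they are not PROVESID's internal implementation, which may handle more variants.

```python
import re
from typing import Optional

# Hypothetical pattern: "log Kow", an optional "=" or ":", then a signed decimal.
_LOG_KOW = re.compile(r"log\s*Kow\s*[=:]?\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_log_kow(text: str) -> Optional[float]:
    """Return the first log Kow value found in free text, or None if absent."""
    match = _LOG_KOW.search(text)
    return float(match.group(1)) if match else None
```

Returning None for unmatched text keeps the caller's "no data found" check simple.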

Continuing the Common Data Flow example, the remaining steps use the `smiles` and `compound_name` values obtained there:

# 4. Get CAS information
cas_api = CASCommonChem()
cas_data = cas_api.smiles_to_detail(smiles)

# 5. Get classification
response = ClassyFireAPI.submit_query("caffeine_classification", smiles)

# 6. Get ChEBI information
chebi = ChEBI()
chebi_results = chebi.search_by_name(compound_name)
if chebi_results:
    chebi_id = chebi_results[0]['chebiId']
    chebi_entity = chebi.get_complete_entity(chebi_id)

Module Comparison

| Feature | PubChem API | PubChem View | ChEMBL | ChEBI | CAS Common | OPSIN | ClassyFire | NCI Resolver |
|---|---|---|---|---|---|---|---|---|
| Primary Use | Standard data | Properties | Drug-like compounds | Biological entities | Registry data | Name→Structure | Classification | ID conversion |
| Input Types | CID, Name, SMILES, InChI Key | CID, Name, SMILES | ChEMBL ID, Name, InChI, SMILES | ChEBI ID, Name | CAS, Name, SMILES | IUPAC names | SMILES, InChI | Various IDs |
| Data Source | REST API | REST API | Local SQLite | REST API + SDF | REST API | Web service | Web service | Web service |
| Database Size | 110M+ | 110M+ | 2.3M | 190K+ | 500K+ | N/A | N/A | N/A |
| Output Format | JSON ✨ Clean | JSON, DataFrame | Dict | JSON, XML | JSON | JSON | JSON, SDF, CSV | JSON |
| Rate Limits | Yes | Yes | None (local) | Yes | Unofficial | Unofficial | Unofficial | Yes |
| Offline Mode | No | No | Yes 🆕 | Partial (SDF) | No | No | No | No |
| Batch Support | Yes ✨ Enhanced | Yes | Yes | Yes | Manual | Yes | Manual | Yes |
| Recent Updates | Major | Enhanced | 🆕 New | - | - | - | - | - |

Authentication Requirements

| Service | Authentication | Notes |
|---|---|---|
| PubChem API | None | Rate limits apply, enhanced data access ✨ |
| PubChem PUG View | None | Rate limits apply, advanced properties |
| ChEMBL | None | Local database, no limits 🆕 |
| ChEBI | None | Rate limits apply |
| CAS Common Chemistry | None | Free tier available |
| OPSIN | None | Cambridge University service |
| ClassyFire | None | Long processing times |
| NCI Resolver | None | Rate limits apply |

Error Handling Best Practices

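All modules implement consistent error handling, so the same pattern applies to any call: check for an empty result, then catch network exceptions.

import requests

try:
    # `api.method(parameter)` stands for any PROVESID call
    result = api.method(parameter)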

    if result:  # Check if result exists
        # Process result
        pass
    else:
        print("No data found")

except requests.exceptions.Timeout:
    print("Request timed out")
except requests.exceptions.ConnectionError:
    print("Connection error")
except Exception as e:
    print(f"Unexpected error: {e}")

Performance Considerations

Rate Limiting

  • Implement delays between requests
  • Use caching for repeated queries
  • Batch process when supported
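The first two points above can be combined in one small wrapper. This is a sketch, not part of PROVESID: `Throttle` and `make_cached_lookup` are hypothetical names, and the 0.2 second default interval is an illustrative value, so check each service's usage policy before choosing one.

```python
import time
from functools import lru_cache


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def make_cached_lookup(fetch, min_interval: float = 0.2, maxsize: int = 1024):
    """Wrap `fetch` so repeated queries hit a cache and new ones are throttled."""
    throttle = Throttle(min_interval)

    @lru_cache(maxsize=maxsize)
    def lookup(query):
        throttle.wait()  # only reached on a cache miss
        return fetch(query)

    return lookup
```

For example, `get_cids = make_cached_lookup(pc.get_cids_by_name)` would delay only uncached names. Note that `lru_cache` needs hashable arguments and hands back the same cached object on each hit, so copy a returned list before modifying it.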

Memory Management

  • Process large datasets in chunks
  • Clear unnecessary variables
  • Use generators for large result sets
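Chunked processing with a generator might look like the following sketch. The helper name `chunked` is hypothetical (it is not a PROVESID function) and relies only on the standard library.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items without loading everything."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk
```

A loop such as `for batch in chunked(smiles_list, 100): ...` then holds only one batch of results in memory at a time.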

Network Optimization

  • Set appropriate timeouts
  • Implement retry logic
  • Handle network failures gracefully
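Retry logic is often written as exponential backoff. The sketch below is one way to do it; `call_with_retries` and its parameters are hypothetical names, not a PROVESID API. The default `retry_on=(OSError,)` is an assumption, chosen because the `requests` exceptions derive from `OSError`; pass the exact exception classes if you want narrower behavior.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      retry_on=(OSError,), **kwargs):
    """Call `func`, retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * (2 ** attempt))
```

A call such as `call_with_retries(pug_view.get_compound_properties_by_smiles, smiles)` would then survive brief outages while still raising after the final attempt.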

Integration Examples

Cross-Platform Validation

def validate_compound_across_platforms(identifier, id_type="name"):
    """Validate compound information across multiple platforms"""

    results = {}

    # Get structure
    if id_type == "name":
        opsin = OPSIN()
        structure = opsin.get_id(identifier)
        if structure['status'] == 'SUCCESS':
            smiles = structure['smiles']
            results['opsin'] = structure
        else:
            return None
    else:
        smiles = identifier

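    # Get data from each platform (each lambda defers its call until the loop below).
    # Note: NCIResolverAPI is not among the imports shown under Basic Usage Pattern
    # and must be imported separately before this function is called.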
    platforms = {
        'pubchem': lambda: PubChemPUGViewAPI().get_compound_properties_by_smiles(smiles),
        'cas': lambda: CASCommonChem().smiles_to_detail(smiles),
        'nci': lambda: NCIResolverAPI().smiles_to_names(smiles),
        'chebi': lambda: ChEBI().search_by_name(identifier) if id_type == "name" else None
    }

    for platform, method in platforms.items():
        try:
            data = method()
            if data:
                results[platform] = data
        except Exception as e:
            print(f"Error with {platform}: {e}")
            results[platform] = None

    return results

Comprehensive Compound Report

from datetime import datetime  # required for the report timestamp

def generate_compound_report(compound_name):
    """Generate comprehensive compound report"""

    report = {
        'compound_name': compound_name,
        'timestamp': datetime.now().isoformat(),
        'structure': None,
        'properties': None,
        'classification': None,
        'registry_data': None,
        'chebi_data': None
    }

    # Structure information
    opsin = OPSIN()
    structure = opsin.get_id(compound_name)
    if structure['status'] == 'SUCCESS':
        report['structure'] = structure
        smiles = structure['smiles']

        # Properties
        pubchem = PubChemPUGViewAPI()
        properties = pubchem.get_compound_properties_by_smiles(smiles)
        report['properties'] = properties

        # Registry data
        cas_api = CASCommonChem()
        cas_data = cas_api.smiles_to_detail(smiles)
        report['registry_data'] = cas_data

        # Classification (submit query - results may take time)
        response = ClassyFireAPI.submit_query(f"{compound_name}_report", smiles)
        if response and response.status_code == 200:
            query_id = response.json()['id']
            report['classification_query_id'] = query_id

        # ChEBI data
        chebi = ChEBI()
        chebi_results = chebi.search_by_name(compound_name)
        if chebi_results:
            report['chebi_data'] = chebi_results[0]
            # Get detailed entity information
            chebi_id = chebi_results[0]['chebiId']
            entity = chebi.get_complete_entity(chebi_id)
            if entity:
                report['chebi_entity'] = entity

    return report

Troubleshooting

Common Issues

  1. Timeout Errors
       • Increase timeout parameters
       • Check network connectivity
       • Verify service availability

  2. Rate Limiting
       • Implement request delays
       • Use batch operations where available
       • Monitor response headers

  3. Data Not Found
       • Verify input format
       • Try alternative identifiers
       • Check service coverage

  4. Invalid Responses
       • Validate input data
       • Check API documentation for changes
       • Implement response validation
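Response validation can start as a simple check that the expected keys are present before indexing into a result. The helper below is a hypothetical sketch, not a PROVESID function. The `'chebiId'` key mentioned afterwards comes from the ChEBI examples on this page.

```python
from typing import Any, Iterable, List


def missing_keys(record: Any, required: Iterable[str]) -> List[str]:
    """Return the required keys absent from `record` (all of them if not a dict)."""
    required = list(required)
    if not isinstance(record, dict):
        return required
    return [key for key in required if key not in record]
```

For instance, checking `missing_keys(chebi_results[0], ["chebiId"])` before reading `chebi_results[0]['chebiId']` turns a potential `KeyError` into an explicit, reportable condition.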

Debug Mode

Enable debug logging for detailed troubleshooting:

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Use APIs normally - detailed logs will be shown
api = PubChemPUGViewAPI()
result = api.get_compound_properties_by_name("aspirin")
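Root-level DEBUG output can be noisy because HTTP libraries log every connection. The sketch below keeps DEBUG on while quieting selected loggers. `enable_debug_logging` is a hypothetical helper, and silencing `urllib3` assumes the HTTP layer is `requests`, whose transport logs under that name.

```python
import logging


def enable_debug_logging(quiet=("urllib3",)):
    """Turn on DEBUG logging but keep the named noisy loggers at WARNING."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    for name in quiet:
        logging.getLogger(name).setLevel(logging.WARNING)
```

Call `enable_debug_logging()` once at startup. `logging.basicConfig` has no effect if the root logger is already configured, so call the helper before anything else sets up logging.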

Support and Resources

  • GitHub Issues: Report bugs and request features
  • Documentation: Comprehensive guides and examples
  • Community: Discussions and community support
  • API Updates: Monitor upstream API changes

For specific API documentation, see the individual module pages linked above.