CAS Common Chemistry API Tutorial
This tutorial demonstrates how to use the CASCommonChem class from the provesid package to access chemical information from the CAS Common Chemistry database. The CAS Common Chemistry API provides access to chemical information for nearly 500,000 chemical substances from CAS REGISTRY®.
1. Import and Initialize
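First, import the class and create an instance. The cell also defines a small fallback stub, so the tutorial still runs in demo mode, with placeholder values, when neither the CCC_API_KEY nor the CAS_API_KEY environment variable is set: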
import os

from provesid import CASCommonChem


class _CASDemoStub:
    """Fallback object used when a CAS API key is not available."""

    def __init__(self):
        self.base_url = "https://commonchemistry.cas.org/api"

    def _result(self, query):
        # Every query gets the same placeholder record, echoing the query back.
        return {
            "status": "Success",
            "rn": str(query),
            "name": f"Demo result for {query}",
            "molecularFormula": "N/A",
            "molecularMass": "N/A",
            "smile": "N/A",
            "canonicalSmile": "N/A",
            "inchi": "N/A",
            "inchiKey": "N/A",
            "hasMolfile": False,
            "images": [],
            "experimentalProperties": [],
            "synonyms": ["N/A"],
            "uri": "N/A",
        }

    def cas_to_detail(self, cas_rn):
        return self._result(cas_rn)

    def name_to_detail(self, name):
        return self._result(name)

    def smiles_to_detail(self, smiles):
        return self._result(smiles)


if os.getenv("CCC_API_KEY") or os.getenv("CAS_API_KEY"):
    ccc = CASCommonChem()
    print("CASCommonChem initialized successfully!")
else:
    ccc = _CASDemoStub()
    print("CAS API key not found. Running tutorial in demo mode.")
print(f"Base URL: {ccc.base_url}")
CAS API key not found. Running tutorial in demo mode.
Base URL: https://commonchemistry.cas.org/api
2. Lookup by CAS Registry Number
The most direct way to get chemical information is by using a CAS Registry Number (CAS RN). Let's look up some common compounds:
# Lookup water by CAS RN
water_info = ccc.cas_to_detail("7732-18-5")
print("Water (7732-18-5):")
print(f" Name: {water_info.get('name')}")
print(f" Molecular Formula: {water_info.get('molecularFormula')}")
print(f" Molecular Mass: {water_info.get('molecularMass')}")
print(f" SMILES: {water_info.get('smile')}")
print(f" InChI: {water_info.get('inchi')}")
print(f" Status: {water_info.get('status')}")
Water (7732-18-5):
 Name: Demo result for 7732-18-5
 Molecular Formula: N/A
 Molecular Mass: N/A
 SMILES: N/A
 InChI: N/A
 Status: Success
water_info
{'status': 'Success',
 'rn': '7732-18-5',
 'name': 'Demo result for 7732-18-5',
 'molecularFormula': 'N/A',
 'molecularMass': 'N/A',
 'smile': 'N/A',
 'canonicalSmile': 'N/A',
 'inchi': 'N/A',
 'inchiKey': 'N/A',
 'hasMolfile': False,
 'images': [],
 'experimentalProperties': [],
 'synonyms': ['N/A'],
 'uri': 'N/A'}
# Lookup aspirin by CAS RN
aspirin_info = ccc.cas_to_detail("50-78-2")
print("Aspirin (50-78-2):")
print(f" Name: {aspirin_info.get('name')}")
print(f" Molecular Formula: {aspirin_info.get('molecularFormula')}")
print(f" Molecular Mass: {aspirin_info.get('molecularMass')}")
print(f" SMILES: {aspirin_info.get('smile')}")
print(f" Number of synonyms: {len(aspirin_info.get('synonyms', []))}")
print(f" First 5 synonyms: {aspirin_info.get('synonyms', [])[:5]}")
Aspirin (50-78-2):
 Name: Demo result for 50-78-2
 Molecular Formula: N/A
 Molecular Mass: N/A
 SMILES: N/A
 Number of synonyms: 1
 First 5 synonyms: ['N/A']
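Because every lookup in this section starts from a CAS RN string, it can help to reject obviously malformed numbers before sending a request. The sketch below is not part of provesid; is_valid_cas_rn is a hypothetical helper name. It relies on the published CAS RN layout (two to seven digits, two digits, one check digit) and the standard check-digit rule, in which each digit before the check digit is weighted by its position counted from the right.

```python
import re

# CAS RN layout: 2-7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
_CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if cas_rn has the CAS RN layout and a matching check digit.

    Hypothetical helper for illustration; it only checks the format and
    does not tell you whether the substance exists in the database.
    """
    match = _CAS_RN_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


for candidate in ["7732-18-5", "50-78-2", "50-78-3", "not-a-cas-rn"]:
    print(candidate, is_valid_cas_rn(candidate))
```

A well-formed number can still be absent from the database, so this check complements the status field rather than replacing it.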
3. Search by Name
You can search for compounds using their common names or IUPAC names:
# Search by common name
caffeine_info = ccc.name_to_detail("caffeine")
print("Caffeine (by name search):")
print(f" CAS RN: {caffeine_info.get('rn')}")
print(f" Name: {caffeine_info.get('name')}")
print(f" Molecular Formula: {caffeine_info.get('molecularFormula')}")
print(f" Molecular Mass: {caffeine_info.get('molecularMass')}")
print(f" Status: {caffeine_info.get('status')}")
Caffeine (by name search):
 CAS RN: caffeine
 Name: Demo result for caffeine
 Molecular Formula: N/A
 Molecular Mass: N/A
 Status: Success
# Search by IUPAC name
acetone_info = ccc.name_to_detail("propan-2-one")
print("Acetone (by IUPAC name 'propan-2-one'):")
print(f" CAS RN: {acetone_info.get('rn')}")
print(f" Name: {acetone_info.get('name')}")
print(f" Molecular Formula: {acetone_info.get('molecularFormula')}")
print(f" SMILES: {acetone_info.get('smile')}")
print(f" Status: {acetone_info.get('status')}")
Acetone (by IUPAC name 'propan-2-one'):
 CAS RN: propan-2-one
 Name: Demo result for propan-2-one
 Molecular Formula: N/A
 SMILES: N/A
 Status: Success
4. Search by SMILES
You can also search using SMILES notation:
# Search by SMILES string
ethanol_smiles = "CCO"
ethanol_info = ccc.smiles_to_detail(ethanol_smiles)
print(f"Compound with SMILES '{ethanol_smiles}':")
print(f" CAS RN: {ethanol_info.get('rn')}")
print(f" Name: {ethanol_info.get('name')}")
print(f" Molecular Formula: {ethanol_info.get('molecularFormula')}")
print(f" Canonical SMILES: {ethanol_info.get('canonicalSmile')}")
print(f" Status: {ethanol_info.get('status')}")
Compound with SMILES 'CCO':
 CAS RN: CCO
 Name: Demo result for CCO
 Molecular Formula: N/A
 Canonical SMILES: N/A
 Status: Success
5. Exploring Detailed Information
The API returns comprehensive information about each compound. Let's explore what's available:
# Get detailed information for formaldehyde
formaldehyde_info = ccc.cas_to_detail("50-00-0")
print("Formaldehyde - Complete Information:")
print(f" CAS RN: {formaldehyde_info.get('rn')}")
print(f" Name: {formaldehyde_info.get('name')}")
print(f" Molecular Formula: {formaldehyde_info.get('molecularFormula')}")
print(f" Molecular Mass: {formaldehyde_info.get('molecularMass')}")
print(f" SMILES: {formaldehyde_info.get('smile')}")
print(f" Canonical SMILES: {formaldehyde_info.get('canonicalSmile')}")
print(f" InChI: {formaldehyde_info.get('inchi')}")
print(f" InChI Key: {formaldehyde_info.get('inchiKey')}")
print(f" Has Molfile: {formaldehyde_info.get('hasMolfile')}")
print(f" Number of images: {len(formaldehyde_info.get('images', []))}")
print(f" Number of experimental properties: {len(formaldehyde_info.get('experimentalProperties', []))}")
print(f" URI: {formaldehyde_info.get('uri')}")
Formaldehyde - Complete Information:
 CAS RN: 50-00-0
 Name: Demo result for 50-00-0
 Molecular Formula: N/A
 Molecular Mass: N/A
 SMILES: N/A
 Canonical SMILES: N/A
 InChI: N/A
 InChI Key: N/A
 Has Molfile: False
 Number of images: 0
 Number of experimental properties: 0
 URI: N/A
# Explore synonyms
synonyms = formaldehyde_info.get('synonyms', [])
print(f"\nFormaldehyde has {len(synonyms)} synonyms:")
print("First 10 synonyms:")
for i, synonym in enumerate(synonyms[:10], 1):
    print(f" {i}. {synonym}")
Formaldehyde has 1 synonyms:
First 10 synonyms:
 1. N/A
# Explore experimental properties
exp_props = formaldehyde_info.get('experimentalProperties', [])
print(f"\nFormaldehyde has {len(exp_props)} experimental properties:")
if exp_props:
    print("First few experimental properties:")
    for i, prop in enumerate(exp_props[:3], 1):
        print(f" {i}. Property: {prop.get('property', 'N/A')}")
        print(f" Value: {prop.get('value', 'N/A')}")
        print(f" Units: {prop.get('units', 'N/A')}")
        print()
Formaldehyde has 0 experimental properties:
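Since demo mode returns an empty property list, the loop above prints nothing. The sketch below shows how the same records could be turned into one display line each. format_properties is a hypothetical helper, the sample records are made up for illustration, and the keys 'property', 'value', and 'units' are assumed only because they are the keys the cell above reads; the fields returned by a live lookup may differ.

```python
def format_properties(properties, limit=None):
    """Return one display string per experimental-property record.

    Hypothetical helper; assumes each record is a dict with the keys
    'property', 'value', and (optionally) 'units', as read in the cell above.
    """
    lines = []
    for record in properties[:limit]:
        name = record.get("property", "N/A")
        value = record.get("value", "N/A")
        units = record.get("units") or ""
        # rstrip drops the trailing space when no units are given.
        lines.append(f"{name}: {value} {units}".rstrip())
    return lines


# Made-up records for illustration only, not real API output.
sample = [
    {"property": "Example property A", "value": "1.23", "units": "g/mL"},
    {"property": "Example property B", "value": "4.5"},
]
for line in format_properties(sample):
    print(line)
```

Calling format_properties(exp_props, limit=3) would mirror the three-record preview above.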
6. Error Handling
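The API handles various error conditions gracefully; check the 'status' field of each result to see whether a lookup succeeded. Let's see what happens with invalid inputs. Note that the output shown here comes from demo mode, where the stub returns 'Success' for every query, so it does not show what a real failed lookup looks like: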
# Try an invalid CAS RN
invalid_cas = ccc.cas_to_detail("0000-00-0")
print("Invalid CAS RN (0000-00-0):")
print(f" Status: {invalid_cas.get('status')}")
print(f" Name: {invalid_cas.get('name')}")
# Try a non-existent compound name
non_existent = ccc.name_to_detail("thiscompounddoesnotexist12345")
print(f"\nNon-existent compound name:")
print(f" Status: {non_existent.get('status')}")
# Try an invalid SMILES
invalid_smiles = ccc.smiles_to_detail("INVALID_SMILES")
print(f"\nInvalid SMILES:")
print(f" Status: {invalid_smiles.get('status')}")
Invalid CAS RN (0000-00-0):
 Status: Success
 Name: Demo result for 0000-00-0

Non-existent compound name:
 Status: Success

Invalid SMILES:
 Status: Success
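When a failed lookup should stop a script instead of passing silently, the status check can be wrapped in a helper that raises. This is a minimal sketch under stated assumptions: LookupFailed and require_success are hypothetical names, and 'Not found' is a placeholder failure status, since the only status value this tutorial shows is 'Success'.

```python
class LookupFailed(Exception):
    """Raised when a lookup result does not report success."""


def require_success(result, query):
    """Return result unchanged if its status is 'Success', else raise."""
    status = result.get("status") if isinstance(result, dict) else None
    if status != "Success":
        raise LookupFailed(f"Lookup for {query!r} failed with status {status!r}")
    return result


# Hand-written dictionaries standing in for real responses.
ok = {"status": "Success", "rn": "7732-18-5"}
bad = {"status": "Not found"}  # hypothetical failure status

print(require_success(ok, "7732-18-5")["rn"])
try:
    require_success(bad, "0000-00-0")
except LookupFailed as exc:
    print(exc)
```

With the client from section 1, a call would look like require_success(ccc.cas_to_detail("50-00-0"), "50-00-0").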
7. Common Use Cases
Here are some practical examples of how to use the CASCommonChem class:
# Use case 1: Get basic identifiers for a compound
def get_basic_identifiers(cas_rn):
    """Get basic chemical identifiers for a compound"""
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        return {
            'cas_rn': info.get('rn'),
            'name': info.get('name'),
            'formula': info.get('molecularFormula'),
            'mass': info.get('molecularMass'),
            'smiles': info.get('smile'),
            'inchi_key': info.get('inchiKey')
        }
    return None

# Test with benzene
benzene_ids = get_basic_identifiers("71-43-2")
# The helper returns None on failure, so guard before iterating
if benzene_ids:
    print("Benzene identifiers:")
    for key, value in benzene_ids.items():
        print(f" {key}: {value}")
Benzene identifiers:
 cas_rn: 71-43-2
 name: Demo result for 71-43-2
 formula: N/A
 mass: N/A
 smiles: N/A
 inchi_key: N/A
# Use case 2: Find all synonyms for a compound
def get_all_synonyms(cas_rn):
    """Get all synonyms for a compound"""
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        return {
            'name': info.get('name'),
            'cas_rn': info.get('rn'),
            'synonyms': info.get('synonyms', [])
        }
    return None

# Test with glucose
glucose_synonyms = get_all_synonyms("50-99-7")
if glucose_synonyms:
    print(f"Glucose ({glucose_synonyms['cas_rn']}) has {len(glucose_synonyms['synonyms'])} synonyms:")
    print("Sample synonyms:")
    for synonym in glucose_synonyms['synonyms'][:8]:
        print(f" - {synonym}")
Glucose (50-99-7) has 1 synonyms:
Sample synonyms:
 - N/A
# Use case 3: Compare multiple compounds
compounds_to_compare = ["64-17-5", "67-56-1", "78-93-3"] # Ethanol, Methanol, Butanone
print("Comparison of three compounds:")
print("-" * 70)
for cas_rn in compounds_to_compare:
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        print(f"CAS RN: {cas_rn}")
        print(f" Name: {info.get('name')}")
        print(f" Formula: {info.get('molecularFormula')}")
        print(f" Mass: {info.get('molecularMass')}")
        print(f" SMILES: {info.get('smile')}")
    print("-" * 70)
Comparison of three compounds:
----------------------------------------------------------------------
CAS RN: 64-17-5
 Name: Demo result for 64-17-5
 Formula: N/A
 Mass: N/A
 SMILES: N/A
----------------------------------------------------------------------
CAS RN: 67-56-1
 Name: Demo result for 67-56-1
 Formula: N/A
 Mass: N/A
 SMILES: N/A
----------------------------------------------------------------------
CAS RN: 78-93-3
 Name: Demo result for 78-93-3
 Formula: N/A
 Mass: N/A
 SMILES: N/A
----------------------------------------------------------------------
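For more than a handful of compounds, a tabular export is often easier to work with than printed blocks. The sketch below collects the same fields into CSV text using only the standard library. details_to_csv and COLUMNS are hypothetical names, the dictionary keys are the ones used throughout this tutorial, and the sample data are placeholders shaped like the demo-mode results, with 'Error' as an assumed failure status.

```python
import csv
import io

# Output column name -> key in the detail dictionary (keys as used in this tutorial).
COLUMNS = {
    "cas_rn": "rn",
    "name": "name",
    "formula": "molecularFormula",
    "mass": "molecularMass",
    "smiles": "smile",
}


def details_to_csv(details):
    """Serialize successful detail dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(COLUMNS), lineterminator="\n")
    writer.writeheader()
    for info in details:
        if info.get("status") != "Success":
            continue  # skip failed lookups
        writer.writerow({col: info.get(key, "") for col, key in COLUMNS.items()})
    return buffer.getvalue()


# Placeholder dictionaries shaped like the demo-mode results above.
demo = [
    {"status": "Success", "rn": "64-17-5", "name": "Demo result for 64-17-5",
     "molecularFormula": "N/A", "molecularMass": "N/A", "smile": "N/A"},
    {"status": "Error"},  # hypothetical failed lookup, skipped
]
print(details_to_csv(demo))
```

With the client from section 1, the input could be built as [ccc.cas_to_detail(rn) for rn in compounds_to_compare].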
Summary
The CASCommonChem class provides three main methods:
- cas_to_detail(cas_rn): Look up by CAS Registry Number
- name_to_detail(name): Search by compound name or IUPAC name
- smiles_to_detail(smiles): Search by SMILES notation
Key Features:
- ✅ Access to nearly 500,000 chemical substances
- ✅ Comprehensive chemical data (names, formulas, structures, properties)
- ✅ Multiple search methods (CAS RN, name, SMILES)
- ✅ Robust error handling
- ✅ Rich metadata including synonyms and experimental properties
Returned Data Includes:
- Basic identifiers (name, CAS RN, molecular formula, mass)
- Structure information (SMILES, InChI, InChI Key)
- Synonyms and alternative names
- Experimental properties
- Images and molecular files (when available)
- Citations and references