CAS Common Chemistry API Tutorial
This tutorial demonstrates how to use the CASCommonChem class from the provesid package to access chemical information from the CAS Common Chemistry database. The CAS Common Chemistry API provides access to chemical information for more than 500,000 chemical substances from CAS REGISTRY®.
1. Import and Initialize
First, import the class and create an instance:
from provesid import CASCommonChem
ccc = CASCommonChem()
print("CASCommonChem initialized successfully!")
print(f"Base URL: {ccc.base_url}")
CASCommonChem initialized successfully!
Base URL: https://commonchemistry.cas.org/api
2. Lookup by CAS Registry Number
The most direct way to get chemical information is by using a CAS Registry Number (CAS RN). Let's look up some common compounds:
# Lookup water by CAS RN
water_info = ccc.cas_to_detail("7732-18-5")
print("Water (7732-18-5):")
print(f" Name: {water_info.get('name')}")
print(f" Molecular Formula: {water_info.get('molecularFormula')}")
print(f" Molecular Mass: {water_info.get('molecularMass')}")
print(f" SMILES: {water_info.get('smile')}")
print(f" InChI: {water_info.get('inchi')}")
print(f" Status: {water_info.get('status')}")
Water (7732-18-5):
 Name: Water
 Molecular Formula: H<sub>2</sub>O
 Molecular Mass: 18.02
 SMILES: O
 InChI: InChI=1S/H2O/h1H2
 Status: Success
# Lookup aspirin by CAS RN
aspirin_info = ccc.cas_to_detail("50-78-2")
print("Aspirin (50-78-2):")
print(f" Name: {aspirin_info.get('name')}")
print(f" Molecular Formula: {aspirin_info.get('molecularFormula')}")
print(f" Molecular Mass: {aspirin_info.get('molecularMass')}")
print(f" SMILES: {aspirin_info.get('smile')}")
print(f" Number of synonyms: {len(aspirin_info.get('synonyms', []))}")
print(f" First 5 synonyms: {aspirin_info.get('synonyms', [])[:5]}")
Aspirin (50-78-2):
 Name: Aspirin
 Molecular Formula: C<sub>9</sub>H<sub>8</sub>O<sub>4</sub>
 Molecular Mass: 180.16
 SMILES: C(O)(=O)C1=C(OC(C)=O)C=CC=C1
 Number of synonyms: 143
 First 5 synonyms: ['Benzoic acid, 2-(acetyloxy)-', 'Rhodine', 'Salicylic acid acetate', '2-(Acetyloxy)benzoic acid', 'Aceticyl']
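Because a lookup by CAS RN costs a network request, it can be useful to reject malformed numbers locally first. The sketch below checks the standard CAS RN layout and check digit; the function name `is_valid_cas_rn` is our own and is not part of provesid.

```python
import re

# CAS RN layout: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
_CAS_RN_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if cas_rn has the CAS RN layout and a matching check digit."""
    match = _CAS_RN_PATTERN.fullmatch(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas_rn("7732-18-5"))  # water, as used above
print(is_valid_cas_rn("7732-18-4"))  # same number with a wrong check digit
```

A passing check only means the string is well formed. It does not mean the substance is present in the database, so the `status` field of the lookup result still needs to be inspected.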
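3. Search by Name
You can search for compounds by name. In the examples below, the common name "caffeine" resolves, while the IUPAC name "propan-2-one" returns a status of "Not found", so check the status field before using a result: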
# Search by common name
caffeine_info = ccc.name_to_detail("caffeine")
print("Caffeine (by name search):")
print(f" CAS RN: {caffeine_info.get('rn')}")
print(f" Name: {caffeine_info.get('name')}")
print(f" Molecular Formula: {caffeine_info.get('molecularFormula')}")
print(f" Molecular Mass: {caffeine_info.get('molecularMass')}")
print(f" Status: {caffeine_info.get('status')}")
Caffeine (by name search):
 CAS RN: 58-08-2
 Name: Caffeine
 Molecular Formula: C<sub>8</sub>H<sub>10</sub>N<sub>4</sub>O<sub>2</sub>
 Molecular Mass: 194.19
 Status: Success
# Search by IUPAC name
acetone_info = ccc.name_to_detail("propan-2-one")
print("Acetone (by IUPAC name 'propan-2-one'):")
print(f" CAS RN: {acetone_info.get('rn')}")
print(f" Name: {acetone_info.get('name')}")
print(f" Molecular Formula: {acetone_info.get('molecularFormula')}")
print(f" SMILES: {acetone_info.get('smile')}")
print(f" Status: {acetone_info.get('status')}")
Acetone (by IUPAC name 'propan-2-one'):
 CAS RN:
 Name:
 Molecular Formula:
 SMILES:
 Status: Not found
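Since the IUPAC name above was not found, one option is to try several candidate names in order and keep the first successful result. The sketch below assumes only what the tutorial shows: a lookup returns a dictionary with a `status` key. The helper `first_successful` and the stand-in `fake_name_lookup` (with its sample data) are hypothetical; in real use you would pass `ccc.name_to_detail` instead.

```python
from typing import Callable, Iterable, Optional


def first_successful(
    lookup: Callable[[str], dict], candidates: Iterable[str]
) -> Optional[dict]:
    """Try each candidate in order; return the first result with status 'Success'."""
    for name in candidates:
        info = lookup(name)
        if info.get("status") == "Success":
            return info
    return None


# Offline stand-in for ccc.name_to_detail (hypothetical data, for illustration only).
def fake_name_lookup(name: str) -> dict:
    known = {"acetone": {"status": "Success", "rn": "67-64-1"}}
    return known.get(name.lower(), {"status": "Not found"})


result = first_successful(fake_name_lookup, ["propan-2-one", "acetone"])
print(result["rn"] if result else "no match")
```

Each miss costs one request against the real API, so it helps to order the candidates from most to least likely.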
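4. Search by SMILES
You can also search using SMILES notation. Note that in the run shown here, the lookup for "CCO" returned "Not found", so check the status field before using the result: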
# Search by SMILES string
ethanol_smiles = "CCO"
ethanol_info = ccc.smiles_to_detail(ethanol_smiles)
print(f"Compound with SMILES '{ethanol_smiles}':")
print(f" CAS RN: {ethanol_info.get('rn')}")
print(f" Name: {ethanol_info.get('name')}")
print(f" Molecular Formula: {ethanol_info.get('molecularFormula')}")
print(f" Canonical SMILES: {ethanol_info.get('canonicalSmile')}")
print(f" Status: {ethanol_info.get('status')}")
Compound with SMILES 'CCO':
 CAS RN:
 Name:
 Molecular Formula:
 Canonical SMILES:
 Status: Not found
5. Exploring Detailed Information
The API returns comprehensive information about each compound. Let's explore what's available:
# Get detailed information for formaldehyde
formaldehyde_info = ccc.cas_to_detail("50-00-0")
print("Formaldehyde - Complete Information:")
print(f" CAS RN: {formaldehyde_info.get('rn')}")
print(f" Name: {formaldehyde_info.get('name')}")
print(f" Molecular Formula: {formaldehyde_info.get('molecularFormula')}")
print(f" Molecular Mass: {formaldehyde_info.get('molecularMass')}")
print(f" SMILES: {formaldehyde_info.get('smile')}")
print(f" Canonical SMILES: {formaldehyde_info.get('canonicalSmile')}")
print(f" InChI: {formaldehyde_info.get('inchi')}")
print(f" InChI Key: {formaldehyde_info.get('inchiKey')}")
print(f" Has Molfile: {formaldehyde_info.get('hasMolfile')}")
print(f" Number of images: {len(formaldehyde_info.get('images', []))}")
print(f" Number of experimental properties: {len(formaldehyde_info.get('experimentalProperties', []))}")
print(f" URI: {formaldehyde_info.get('uri')}")
Formaldehyde - Complete Information:
 CAS RN: 50-00-0
 Name: Formaldehyde
 Molecular Formula: CH<sub>2</sub>O
 Molecular Mass: 30.03
 SMILES: C=O
 Canonical SMILES: O=C
 InChI: InChI=1S/CH2O/c1-2/h1H2
 InChI Key: InChIKey=WSFSSNUMVMOOMR-UHFFFAOYSA-N
 Has Molfile: True
 Number of images: 1
 Number of experimental properties: 3
 URI: substance/pt/50000
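As the output above shows, the molecular formula arrives with HTML `<sub>` tags and the InChI Key carries an `InChIKey=` prefix. If you need plain strings, for example to compare against another data source, a small cleanup step can help. The helper names below are our own, not part of provesid.

```python
import re
from typing import Optional

_TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_markup(text: Optional[str]) -> str:
    """Remove HTML tags such as <sub>...</sub>, keeping the enclosed text."""
    return _TAG_PATTERN.sub("", text or "")


def bare_inchi_key(inchi_key: Optional[str]) -> str:
    """Drop a leading 'InChIKey=' prefix if one is present."""
    prefix = "InChIKey="
    key = inchi_key or ""
    return key[len(prefix):] if key.startswith(prefix) else key


# Values copied from the formaldehyde output above.
print(strip_markup("CH<sub>2</sub>O"))
print(bare_inchi_key("InChIKey=WSFSSNUMVMOOMR-UHFFFAOYSA-N"))
```

The tag pattern is deliberately simple and is meant for the short inline tags seen in this tutorial's output, not for arbitrary HTML.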
# Explore synonyms
synonyms = formaldehyde_info.get('synonyms', [])
print(f"\nFormaldehyde has {len(synonyms)} synonyms:")
print("First 10 synonyms:")
for i, synonym in enumerate(synonyms[:10], 1):
    print(f" {i}. {synonym}")
Formaldehyde has 30 synonyms:
First 10 synonyms:
 1. Formaldehyde
 2. BFV
 3. Fannoform
 4. Formalin
 5. Formalith
 6. Formic aldehyde
 7. Formol
 8. Fyde
 9. Methanal
 10. Methyl aldehyde
# Explore experimental properties
exp_props = formaldehyde_info.get('experimentalProperties', [])
print(f"\nFormaldehyde has {len(exp_props)} experimental properties:")
if exp_props:
    print("First few experimental properties:")
    for i, prop in enumerate(exp_props[:3], 1):
        print(f" {i}. Property: {prop.get('property', 'N/A')}")
        print(f" Value: {prop.get('value', 'N/A')}")
        print(f" Units: {prop.get('units', 'N/A')}")
        print()
Formaldehyde has 3 experimental properties:
First few experimental properties:
 1. Property: -19.5 °C
 Value: N/A
 Units: N/A

 2. Property: -92 °C
 Value: N/A
 Units: N/A

 3. Property: 0.8 g/cm³
 Value: N/A
 Units: N/A
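In the output above, the measured value and its unit appear together under the `property` key, while `value` and `units` print as N/A, which suggests those keys are absent from this response. If you need the number and the unit separately, you can split the string yourself. This is a minimal sketch that assumes the simple "number, space, unit" shape seen here; ranges or strings with measurement conditions would need extra handling. The function name is our own.

```python
import re
from typing import Optional, Tuple

# A signed decimal number, optional whitespace, then the rest as the unit.
_VALUE_UNIT = re.compile(r"\s*([-+]?\d+(?:\.\d+)?)\s*(.*?)\s*")


def split_value_and_unit(text: str) -> Optional[Tuple[float, str]]:
    """Split a string such as '-19.5 °C' into (-19.5, '°C'); None if it does not fit."""
    match = _VALUE_UNIT.fullmatch(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Strings copied from the formaldehyde output above.
for raw in ["-19.5 °C", "-92 °C", "0.8 g/cm³"]:
    print(raw, "->", split_value_and_unit(raw))
```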
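6. Error Handling
In the examples below, failed lookups do not raise an exception. They return a dictionary whose status field describes the problem ("Invalid Request" or "Not found") and whose other fields are empty. Let's see what happens with invalid inputs: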
# Try an invalid CAS RN
invalid_cas = ccc.cas_to_detail("0000-00-0")
print("Invalid CAS RN (0000-00-0):")
print(f" Status: {invalid_cas.get('status')}")
print(f" Name: {invalid_cas.get('name')}")
# Try a non-existent compound name
non_existent = ccc.name_to_detail("thiscompounddoesnotexist12345")
print(f"\nNon-existent compound name:")
print(f" Status: {non_existent.get('status')}")
# Try an invalid SMILES
invalid_smiles = ccc.smiles_to_detail("INVALID_SMILES")
print(f"\nInvalid SMILES:")
print(f" Status: {invalid_smiles.get('status')}")
Invalid CAS RN (0000-00-0):
 Status: Invalid Request
 Name:

Non-existent compound name:
 Status: Not found

Invalid SMILES:
 Status: Not found
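Because failures come back as a status string, it is easy to forget the check and carry on with an empty result. If you prefer failures to be loud, you can convert them into exceptions at the call site. The sketch below is one way to do that; `LookupFailed` and `require_success` are hypothetical names and not part of provesid.

```python
class LookupFailed(Exception):
    """Raised when a lookup result does not report status 'Success'."""


def require_success(info: dict, query: str) -> dict:
    """Return info unchanged if the lookup succeeded, otherwise raise LookupFailed."""
    status = info.get("status")
    if status != "Success":
        raise LookupFailed(f"Lookup for {query!r} failed with status: {status}")
    return info


# Dictionaries shaped like the results shown above, so the sketch runs offline.
try:
    require_success({"status": "Not found"}, "thiscompounddoesnotexist12345")
except LookupFailed as exc:
    print(exc)
```

With a real client, the call would look like `require_success(ccc.name_to_detail(name), name)`.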
7. Common Use Cases
Here are some practical examples of how to use the CASCommonChem class:
# Use case 1: Get basic identifiers for a compound
def get_basic_identifiers(cas_rn):
    """Get basic chemical identifiers for a compound"""
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        return {
            'cas_rn': info.get('rn'),
            'name': info.get('name'),
            'formula': info.get('molecularFormula'),
            'mass': info.get('molecularMass'),
            'smiles': info.get('smile'),
            'inchi_key': info.get('inchiKey')
        }
    return None
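# Test with benzene
benzene_ids = get_basic_identifiers("71-43-2")
print("Benzene identifiers:")
# get_basic_identifiers returns None when the lookup fails, so guard before iterating
if benzene_ids:
    for key, value in benzene_ids.items():
        print(f" {key}: {value}")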
Benzene identifiers:
 cas_rn: 71-43-2
 name: Benzene
 formula: C<sub>6</sub>H<sub>6</sub>
 mass: 78.11
 smiles: C=1C=CC=CC1
 inchi_key: InChIKey=UHOVQNZJYSORNB-UHFFFAOYSA-N
# Use case 2: Find all synonyms for a compound
def get_all_synonyms(cas_rn):
    """Get all synonyms for a compound"""
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        return {
            'name': info.get('name'),
            'cas_rn': info.get('rn'),
            'synonyms': info.get('synonyms', [])
        }
    return None
# Test with glucose
glucose_synonyms = get_all_synonyms("50-99-7")
if glucose_synonyms:
    print(f"Glucose ({glucose_synonyms['cas_rn']}) has {len(glucose_synonyms['synonyms'])} synonyms:")
    print("Sample synonyms:")
    for synonym in glucose_synonyms['synonyms'][:8]:
        print(f" - {synonym}")
Glucose (50-99-7) has 54 synonyms:
Sample synonyms:
 - <span class="text-smallcaps">D</span>-Glucose
 - Anhydrous dextrose
 - Cartose
 - Cerelose
 - Corn sugar
 - Dextropur
 - Dextrose
 - Dextrosol
# Use case 3: Compare multiple compounds
compounds_to_compare = ["64-17-5", "67-56-1", "78-93-3"] # Ethanol, Methanol, Butanone
print("Comparison of three compounds:")
print("-" * 70)
for cas_rn in compounds_to_compare:
    info = ccc.cas_to_detail(cas_rn)
    if info.get('status') == 'Success':
        print(f"CAS RN: {cas_rn}")
        print(f" Name: {info.get('name')}")
        print(f" Formula: {info.get('molecularFormula')}")
        print(f" Mass: {info.get('molecularMass')}")
        print(f" SMILES: {info.get('smile')}")
        print("-" * 70)
Comparison of three compounds:
----------------------------------------------------------------------
CAS RN: 64-17-5
 Name: Ethanol
 Formula: C<sub>2</sub>H<sub>6</sub>O
 Mass: 46.07
 SMILES: C(C)O
----------------------------------------------------------------------
CAS RN: 67-56-1
 Name: Methanol
 Formula: CH<sub>4</sub>O
 Mass: 32.04
 SMILES: CO
----------------------------------------------------------------------
CAS RN: 78-93-3
 Name: Methyl ethyl ketone
 Formula: C<sub>4</sub>H<sub>8</sub>O
 Mass: 72.11
 SMILES: C(CC)(C)=O
----------------------------------------------------------------------
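Once several records are collected, a comparison often means ordering them by one field. The sketch below sorts records by molecular mass. It converts `molecularMass` with `float()` because we have not confirmed whether the value arrives as a string or a number, and it skips records that failed or lack a usable mass. The function name is our own, and the sample records are trimmed copies of the output above.

```python
from typing import Iterable, List, Tuple


def rows_sorted_by_mass(records: Iterable[dict]) -> List[Tuple[str, str, float]]:
    """Return (rn, name, mass) rows for successful records, lightest first."""
    rows = []
    for info in records:
        if info.get("status") != "Success":
            continue
        try:
            mass = float(info.get("molecularMass"))
        except (TypeError, ValueError):
            continue  # no usable mass in this record
        rows.append((info.get("rn"), info.get("name"), mass))
    return sorted(rows, key=lambda row: row[2])


# Trimmed records using the values printed above.
sample_records = [
    {"status": "Success", "rn": "64-17-5", "name": "Ethanol", "molecularMass": "46.07"},
    {"status": "Success", "rn": "67-56-1", "name": "Methanol", "molecularMass": "32.04"},
    {"status": "Success", "rn": "78-93-3", "name": "Methyl ethyl ketone", "molecularMass": "72.11"},
    {"status": "Not found"},
]

for rn, name, mass in rows_sorted_by_mass(sample_records):
    print(f"{rn:<10} {name:<22} {mass:>7.2f}")
```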
Summary
The CASCommonChem class provides three main methods:
- cas_to_detail(cas_rn): Look up by CAS Registry Number
- name_to_detail(name): Search by compound name or IUPAC name
- smiles_to_detail(smiles): Search by SMILES notation
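When the kind of identifier is not known in advance, the three methods can be combined behind one entry point. The sketch below routes anything shaped like a CAS RN to `cas_to_detail`, and otherwise tries `name_to_detail` before falling back to `smiles_to_detail`. The function `lookup_any` and the offline `FakeClient` (including its sample data) are hypothetical; only the three method names come from this tutorial.

```python
import re

# Shape of a CAS RN: 2-7 digits, 2 digits, 1 check digit.
_CAS_RN_SHAPE = re.compile(r"\d{2,7}-\d{2}-\d")


def lookup_any(client, query: str) -> dict:
    """Route a query to the CAS RN lookup, else try name, then SMILES."""
    query = query.strip()
    if _CAS_RN_SHAPE.fullmatch(query):
        return client.cas_to_detail(query)
    info = client.name_to_detail(query)
    if info.get("status") == "Success":
        return info
    return client.smiles_to_detail(query)


class FakeClient:
    """Offline stand-in that records which method was called (hypothetical data)."""

    def __init__(self):
        self.calls = []

    def cas_to_detail(self, cas_rn):
        self.calls.append("cas_to_detail")
        return {"status": "Success", "rn": cas_rn}

    def name_to_detail(self, name):
        self.calls.append("name_to_detail")
        if name.lower() == "caffeine":
            return {"status": "Success", "rn": "58-08-2"}
        return {"status": "Not found"}

    def smiles_to_detail(self, smiles):
        self.calls.append("smiles_to_detail")
        return {"status": "Not found"}


client = FakeClient()
lookup_any(client, "58-08-2")
lookup_any(client, "caffeine")
lookup_any(client, "CCO")
print(client.calls)
```

The trade-off is that a query which is neither a CAS RN nor a known name costs two requests, so call the specific method directly when you already know the identifier type.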
Key Features:
- ✅ Access to 500,000+ chemical substances
- ✅ Comprehensive chemical data (names, formulas, structures, properties)
- ✅ Multiple search methods (CAS RN, name, SMILES)
- ✅ Lookup failures reported through the status field ("Invalid Request", "Not found")
- ✅ Rich metadata including synonyms and experimental properties
Returned Data Includes:
- Basic identifiers (name, CAS RN, molecular formula, mass)
- Structure information (SMILES, InChI, InChI Key)
- Synonyms and alternative names
- Experimental properties
- Images and molecular files (when available)
- Citations and references