NCI Chemical Identifier Resolver API

The `provesid.resolver` module provides a Python interface to the NCI CADD Group's Chemical Identifier Resolver web service, which converts between chemical structure identifiers such as names, SMILES, InChI, and CAS Registry Numbers.

`provesid.resolver`

Classes

`NCIResolverError`

Bases: Exception

Custom exception for NCI Chemical Identifier Resolver errors

Source code in src/provesid/resolver.py

class NCIResolverError(Exception):
    """Custom exception for NCI Chemical Identifier Resolver errors"""
    pass

`NCIResolverNotFoundError`

Bases: NCIResolverError

Exception raised when chemical identifier is not found

Source code in src/provesid/resolver.py

class NCIResolverNotFoundError(NCIResolverError):
    """Exception raised when chemical identifier is not found"""
    pass

`NCIResolverTimeoutError`

Bases: NCIResolverError

Exception raised when request times out

Source code in src/provesid/resolver.py

class NCIResolverTimeoutError(NCIResolverError):
    """Exception raised when request times out"""
    pass
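Because both specific exceptions derive from `NCIResolverError`, the order of `except` clauses matters: the specific handlers have to come before the base one. The sketch below illustrates this offline. The three classes are local stand-ins that mirror the hierarchy above, and `classify_failure` and `raising` are hypothetical helpers, not part of `provesid`.

```python
class NCIResolverError(Exception):
    """Local stand-in mirroring the base exception above."""


class NCIResolverNotFoundError(NCIResolverError):
    """Local stand-in: identifier not found."""


class NCIResolverTimeoutError(NCIResolverError):
    """Local stand-in: request timed out."""


def classify_failure(action):
    """Run `action` and report which kind of resolver failure occurred (hypothetical helper)."""
    try:
        action()
    except NCIResolverNotFoundError:
        return "not_found"   # most specific handlers first
    except NCIResolverTimeoutError:
        return "timeout"
    except NCIResolverError:
        return "other"       # base class catches everything else
    return "ok"


def raising(exc):
    """Return a zero-argument callable that raises `exc` (demo helper only)."""
    def action():
        raise exc
    return action
```

If the base-class handler were listed first, it would also catch the not-found and timeout cases, and the two specific branches would never run.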

`NCIChemicalIdentifierResolver`

A Python interface to the NCI Chemical Identifier Resolver web service

This class provides methods to interact with the NCI CADD Group's Chemical Identifier Resolver service for converting between different chemical structure identifiers.

The service can resolve various types of chemical identifiers and convert them into different representations.

URL API scheme: https://cactus.nci.nih.gov/chemical/structure/{identifier}/{representation}
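The URL scheme above can be sketched as a small standalone function. This mirrors the logic of the `_build_url` method shown in the source listing, but `build_url` and `BASE_URL` are illustrative names used only here. The identifier is percent-encoded with no safe characters, so SMILES symbols such as `=`, `#`, and `/` do not break the path.

```python
from urllib.parse import quote

# Default base URL, as given in the constructor signature.
BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier, representation, xml_format=False, base_url=BASE_URL):
    """Compose {base}/{identifier}/{representation}[/xml] (illustrative sketch)."""
    # safe='' also encodes '/', which would otherwise split the path segment.
    encoded = quote(identifier, safe="")
    parts = [base_url.rstrip("/"), encoded, representation]
    if xml_format:
        parts.append("xml")
    return "/".join(parts)


print(build_url("CCO", "stdinchi"))
print(build_url("C#N", "smiles"))
```

The second call encodes `#` as `%23`; left unencoded, it would be read as the start of a URL fragment and the rest of the path would be dropped.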

Usage examples:

resolver = NCIChemicalIdentifierResolver()

# Convert SMILES to InChI
inchi = resolver.resolve('CCO', 'stdinchi')

# Get all names for a compound
names = resolver.resolve('aspirin', 'names')

# Get molecular weight
mw = resolver.resolve('caffeine', 'mw')

# Get comprehensive molecular data
mol_data = resolver.get_molecular_data('50-00-0')  # formaldehyde CAS

Source code in src/provesid/resolver.py

class NCIChemicalIdentifierResolver:
    """
    A Python interface to the NCI Chemical Identifier Resolver web service

    This class provides methods to interact with the NCI CADD Group's Chemical Identifier
    Resolver service for converting between different chemical structure identifiers.

    The service can resolve various types of chemical identifiers and convert them
    into different representations.

    URL API scheme: https://cactus.nci.nih.gov/chemical/structure/{identifier}/{representation}

    Usage examples:
        resolver = NCIChemicalIdentifierResolver()

        # Convert SMILES to InChI
        inchi = resolver.resolve('CCO', 'stdinchi')

        # Get all names for a compound
        names = resolver.resolve('aspirin', 'names')

        # Get molecular weight
        mw = resolver.resolve('caffeine', 'mw')

        # Get comprehensive molecular data
        mol_data = resolver.get_molecular_data('50-00-0')  # formaldehyde CAS
    """

    def __init__(self, base_url: str = "https://cactus.nci.nih.gov/chemical/structure", 
                 timeout: int = 30, pause_time: float = 0.1, use_cache: bool = True):
        """
        Initialize NCI Chemical Identifier Resolver client

        Args:
            base_url: Base URL for the NCI resolver service
            timeout: Request timeout in seconds
            pause_time: Minimum time between API calls in seconds
            use_cache: Whether to use cache for lookups (default: True). 
                      When False, skips cache lookup but still stores results.
        """
        self.base_url = base_url.rstrip('/')
        self.timeout = timeout
        self.pause_time = pause_time
        self.last_request_time = 0
        self.use_cache = use_cache

        # Available representation methods
        self.representations = {
            # Structure identifiers
            'stdinchi': 'Standard InChI',
            'stdinchikey': 'Standard InChIKey', 
            'smiles': 'Unique SMILES',
            'ficts': 'NCI/CADD FICTS identifier',
            'ficus': 'NCI/CADD FICuS identifier',
            'uuuuu': 'NCI/CADD uuuuu identifier',
            'hashisy': 'CACTVS HASHISY hashcode',
            # File formats
            'sdf': 'SD file format',
            # Names and properties
            'names': 'Chemical names list',
            'iupac_name': 'IUPAC name',
            'cas': 'CAS Registry Number',
            'mw': 'Molecular weight',
            'formula': 'Molecular formula',
            # Images
            'image': 'Chemical structure image',
            # Additional properties (may vary by compound)
            'exactmass': 'Exact mass',
            'charge': 'Formal charge',
            'h_bond_acceptor_count': 'Hydrogen bond acceptor count',
            'h_bond_donor_count': 'Hydrogen bond donor count',
            'rotor_count': 'Rotatable bond count',
            'effective_rotor_count': 'Effective rotor count',
            'ring_count': 'Ring count',
            'ringsys_count': 'Ring system count'
        }

    def clear_cache(self):
        """Clear all cached results for NCI Chemical Identifier Resolver"""
        from .cache import clear_nci_cache
        clear_nci_cache()

    def get_cache_info(self):
        """Get cache statistics for NCI Chemical Identifier Resolver cached methods"""
        from .cache import get_nci_cache_info
        return get_nci_cache_info()

    def _rate_limit(self):
        """Enforce rate limiting between requests"""
        if self.pause_time > 0:
            current_time = time.time()
            time_since_last = current_time - self.last_request_time
            if time_since_last < self.pause_time:
                time.sleep(self.pause_time - time_since_last)
            self.last_request_time = time.time()

    def _make_request(self, url: str) -> str:
        """
        Make HTTP request with error handling and rate limiting

        Args:
            url: Request URL

        Returns:
            Response text

        Raises:
            NCIResolverTimeoutError: If request times out
            NCIResolverNotFoundError: If identifier not found (404)
            NCIResolverError: For other HTTP errors
        """
        self._rate_limit()

        try:
            response = requests.get(url, timeout=self.timeout)

            if response.status_code == 200:
                return response.text.strip()
            elif response.status_code == 404:
                raise NCIResolverNotFoundError("Chemical identifier not found")
            elif response.status_code == 500:
                raise NCIResolverError("Internal server error")
            else:
                raise NCIResolverError(f"HTTP error {response.status_code}: {response.text}")

        except requests.Timeout:
            raise NCIResolverTimeoutError("Request timed out")
        except requests.RequestException as e:
            raise NCIResolverError(f"Request failed: {str(e)}")

    def _build_url(self, identifier: str, representation: str, xml_format: bool = False) -> str:
        """
        Build URL for the NCI resolver service

        Args:
            identifier: Chemical structure identifier
            representation: Desired representation
            xml_format: Whether to request XML format

        Returns:
            Complete URL string
        """
        # URL-encode the identifier for special characters
        encoded_identifier = quote(identifier, safe='')

        # Build URL components
        url_parts = [self.base_url, encoded_identifier, representation]

        if xml_format:
            url_parts.append('xml')

        return '/'.join(url_parts)

    @cached(service='nci')
    def resolve(self, identifier: str, representation: str, xml_format: bool = False) -> str:
        """
        Resolve a chemical identifier to another representation

        Args:
            identifier: Input chemical identifier (name, SMILES, InChI, CAS, etc.)
            representation: Target representation (see self.representations for options)
            xml_format: Whether to request XML format response

        Returns:
            Resolved representation as string

        Raises:
            ValueError: If representation is not supported
            NCIResolverNotFoundError: If identifier cannot be resolved
            NCIResolverError: For other resolver errors
        """
        if not identifier or not identifier.strip():
            raise NCIResolverError("Empty or invalid identifier provided")

        if representation not in self.representations:
            available = ', '.join(self.representations.keys())
            raise ValueError(f"Unsupported representation '{representation}'. "
                           f"Available: {available}")

        url = self._build_url(identifier, representation, xml_format)
        return self._make_request(url)

    def get_available_representations(self) -> List[str]:
        """
        Get list of available representation types

        Returns:
            List of available representation keys
        """
        return list(self.representations.keys())

    @cached(service='nci')
    def resolve_multiple(self, identifier: str, representations: List[str]) -> Dict[str, str]:
        """
        Resolve a single identifier to multiple representations

        Args:
            identifier: Input chemical identifier
            representations: List of target representations

        Returns:
            Dictionary mapping representation to resolved value
        """
        results = {}
        for representation in representations:
            try:
                results[representation] = self.resolve(identifier, representation)
            except NCIResolverError as e:
                results[representation] = None
                logging.warning(f"Failed to resolve {identifier} to {representation}: {e}")

        return results

    @cached(service='nci')
    def get_molecular_data(self, identifier: str) -> Dict[str, Any]:
        """
        Get comprehensive molecular data for a chemical identifier

        This method attempts to retrieve multiple common properties and identifiers
        for a given chemical, similar to the original nci_cas_to_mol function.

        Args:
            identifier: Input chemical identifier

        Returns:
            Dictionary with molecular data and metadata
        """
        # Standard representations to retrieve
        standard_reps = [
            'stdinchi', 'stdinchikey', 'smiles', 'names', 'iupac_name', 
            'cas', 'mw', 'formula', 'ficts', 'ficus', 'uuuuu', 'hashisy'
        ]

        result = {
            'found_by': identifier,
            'success': True,
            'error': None,
            'available_data': {}
        }

        success_count = 0

        for rep in standard_reps:
            try:
                value = self.resolve(identifier, rep)

                # Process specific data types
                if rep == 'names':
                    # Split names by newline and filter empty strings
                    names_list = [name.strip() for name in value.split('\n') if name.strip()]
                    result['available_data'][rep] = names_list
                elif rep == 'mw':
                    # Try to convert molecular weight to float
                    try:
                        result['available_data'][rep] = float(value)
                    except ValueError:
                        result['available_data'][rep] = value
                else:
                    result['available_data'][rep] = value

                success_count += 1

            except NCIResolverError as e:
                result['available_data'][rep] = None
                logging.debug(f"Could not resolve {identifier} to {rep}: {e}")

        # Set overall success status
        if success_count == 0:
            result['success'] = False
            result['error'] = "No representations could be resolved"

        # Add convenience accessors for backwards compatibility
        data = result['available_data']
        result.update({
            'stdinchi': data.get('stdinchi'),
            'stdinchikey': data.get('stdinchikey'), 
            'smiles': data.get('smiles'),
            'names': data.get('names'),
            'iupac_name': data.get('iupac_name'),
            'cas': data.get('cas'),
            'mw': data.get('mw'),
            'formula': data.get('formula'),
            'ficts': data.get('ficts'),
            'ficus': data.get('ficus'), 
            'uuuuu': data.get('uuuuu'),
            'hashisy': data.get('hashisy'),
            'note': 'OK' if result['success'] else 'Error calling the NCI web API'
        })

        return result

    def get_image_url(self, identifier: str, image_format: str = 'gif', 
                     width: int = 200, height: int = 200) -> str:
        """
        Get URL for chemical structure image

        Args:
            identifier: Chemical identifier
            image_format: Image format ('gif' or 'png')
            width: Image width in pixels
            height: Image height in pixels

        Returns:
            URL for the structure image
        """
        url = self._build_url(identifier, 'image')

        # Add image format and size parameters
        params = []
        if image_format.lower() in ['gif', 'png']:
            params.append(f"format={image_format.lower()}")
        if width != 200 or height != 200:
            params.append(f"width={width}")
            params.append(f"height={height}")

        if params:
            url += '?' + '&'.join(params)

        return url

    def download_image(self, identifier: str, filename: str, 
                      image_format: str = 'gif', width: int = 200, height: int = 200) -> bool:
        """
        Download chemical structure image to file

        Args:
            identifier: Chemical identifier
            filename: Output filename
            image_format: Image format ('gif' or 'png')
            width: Image width in pixels
            height: Image height in pixels

        Returns:
            True if download successful, False otherwise
        """
        try:
            image_url = self.get_image_url(identifier, image_format, width, height)
            self._rate_limit()

            response = requests.get(image_url, timeout=self.timeout)
            response.raise_for_status()

            with open(filename, 'wb') as f:
                f.write(response.content)

            return True

        except Exception as e:
            logging.error(f"Failed to download image for {identifier}: {e}")
            return False

    @cached(service='nci')
    def batch_resolve(self, identifiers: List[str], representation: str) -> Dict[str, str]:
        """
        Resolve multiple identifiers to a single representation

        Args:
            identifiers: List of chemical identifiers
            representation: Target representation

        Returns:
            Dictionary mapping identifier to resolved value (None if failed)
        """
        results = {}

        for identifier in identifiers:
            try:
                results[identifier] = self.resolve(identifier, representation)
            except NCIResolverError as e:
                results[identifier] = None
                logging.warning(f"Failed to resolve {identifier}: {e}")

        return results

    @cached(service='nci')
    def is_valid_identifier(self, identifier: str) -> bool:
        """
        Check if an identifier can be resolved by the service

        Args:
            identifier: Chemical identifier to test

        Returns:
            True if identifier can be resolved, False otherwise
        """
        try:
            # Try to get SMILES as a basic test
            self.resolve(identifier, 'smiles')
            return True
        except NCIResolverError:
            return False

    @cached(service='nci')
    def search_by_partial_name(self, partial_name: str) -> List[str]:
        """
        Search for compounds by partial name match
        Note: This is a basic implementation - the NCI service doesn't have
        a dedicated partial matching endpoint, so this tries the exact name first.

        Args:
            partial_name: Partial chemical name

        Returns:
            List of matching names (may be empty)
        """
        try:
            names = self.resolve(partial_name, 'names')
            return [name.strip() for name in names.split('\n') if name.strip()]
        except NCIResolverError:
            return []

Functions

`__init__(base_url='https://cactus.nci.nih.gov/chemical/structure', timeout=30, pause_time=0.1, use_cache=True)`

Initialize NCI Chemical Identifier Resolver client

Parameters:

Name	Type	Description	Default
`base_url`	`str`	Base URL for the NCI resolver service	`'https://cactus.nci.nih.gov/chemical/structure'`
`timeout`	`int`	Request timeout in seconds	`30`
`pause_time`	`float`	Minimum time between API calls in seconds	`0.1`
`use_cache`	`bool`	Whether to use cache for lookups (default: True). When False, skips cache lookup but still stores results.	`True`

Source code in src/provesid/resolver.py

def __init__(self, base_url: str = "https://cactus.nci.nih.gov/chemical/structure", 
             timeout: int = 30, pause_time: float = 0.1, use_cache: bool = True):
    """
    Initialize NCI Chemical Identifier Resolver client

    Args:
        base_url: Base URL for the NCI resolver service
        timeout: Request timeout in seconds
        pause_time: Minimum time between API calls in seconds
        use_cache: Whether to use cache for lookups (default: True). 
                  When False, skips cache lookup but still stores results.
    """
    self.base_url = base_url.rstrip('/')
    self.timeout = timeout
    self.pause_time = pause_time
    self.last_request_time = 0
    self.use_cache = use_cache

    # Available representation methods
    self.representations = {
        # Structure identifiers
        'stdinchi': 'Standard InChI',
        'stdinchikey': 'Standard InChIKey', 
        'smiles': 'Unique SMILES',
        'ficts': 'NCI/CADD FICTS identifier',
        'ficus': 'NCI/CADD FICuS identifier',
        'uuuuu': 'NCI/CADD uuuuu identifier',
        'hashisy': 'CACTVS HASHISY hashcode',
        # File formats
        'sdf': 'SD file format',
        # Names and properties
        'names': 'Chemical names list',
        'iupac_name': 'IUPAC name',
        'cas': 'CAS Registry Number',
        'mw': 'Molecular weight',
        'formula': 'Molecular formula',
        # Images
        'image': 'Chemical structure image',
        # Additional properties (may vary by compound)
        'exactmass': 'Exact mass',
        'charge': 'Formal charge',
        'h_bond_acceptor_count': 'Hydrogen bond acceptor count',
        'h_bond_donor_count': 'Hydrogen bond donor count',
        'rotor_count': 'Rotatable bond count',
        'effective_rotor_count': 'Effective rotor count',
        'ring_count': 'Ring count',
        'ringsys_count': 'Ring system count'
    }
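The `pause_time` parameter sets a minimum spacing between requests. As a rough illustration of that idea, and not the library's own code, the sketch below sleeps only for the part of the interval that has not yet elapsed. `MinIntervalLimiter` is a hypothetical name, and it uses `time.monotonic()` where the listing above uses `time.time()`, an assumption made here so that system clock adjustments cannot affect the spacing.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum interval between successive calls to wait() (illustrative sketch)."""

    def __init__(self, pause_time=0.1):
        self.pause_time = pause_time
        self._last = None  # no request has been made yet

    def wait(self):
        if self.pause_time <= 0:
            return  # rate limiting disabled
        if self._last is not None:
            remaining = self.pause_time - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


limiter = MinIntervalLimiter(pause_time=0.05)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # the first call returns immediately, the next two wait
elapsed = time.monotonic() - start
```

With three calls and a 0.05 s pause, the loop takes roughly 0.1 s in total, because only the gaps between calls are padded.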

`clear_cache()`

Clear all cached results for NCI Chemical Identifier Resolver

Source code in src/provesid/resolver.py

def clear_cache(self):
    """Clear all cached results for NCI Chemical Identifier Resolver"""
    from .cache import clear_nci_cache
    clear_nci_cache()

`get_cache_info()`

Get cache statistics for NCI Chemical Identifier Resolver cached methods

Source code in src/provesid/resolver.py

def get_cache_info(self):
    """Get cache statistics for NCI Chemical Identifier Resolver cached methods"""
    from .cache import get_nci_cache_info
    return get_nci_cache_info()

`resolve(identifier, representation, xml_format=False)`

Resolve a chemical identifier to another representation

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Input chemical identifier (name, SMILES, InChI, CAS, etc.)	required
`representation`	`str`	Target representation (see self.representations for options)	required
`xml_format`	`bool`	Whether to request XML format response	`False`

Returns:

Type	Description
`str`	Resolved representation as string

Raises:

Type	Description
`ValueError`	If representation is not supported
`NCIResolverNotFoundError`	If identifier cannot be resolved
`NCIResolverError`	For other resolver errors

Source code in src/provesid/resolver.py

@cached(service='nci')
def resolve(self, identifier: str, representation: str, xml_format: bool = False) -> str:
    """
    Resolve a chemical identifier to another representation

    Args:
        identifier: Input chemical identifier (name, SMILES, InChI, CAS, etc.)
        representation: Target representation (see self.representations for options)
        xml_format: Whether to request XML format response

    Returns:
        Resolved representation as string

    Raises:
        ValueError: If representation is not supported
        NCIResolverNotFoundError: If identifier cannot be resolved
        NCIResolverError: For other resolver errors
    """
    if not identifier or not identifier.strip():
        raise NCIResolverError("Empty or invalid identifier provided")

    if representation not in self.representations:
        available = ', '.join(self.representations.keys())
        raise ValueError(f"Unsupported representation '{representation}'. "
                       f"Available: {available}")

    url = self._build_url(identifier, representation, xml_format)
    return self._make_request(url)
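Both input checks in `resolve` run before any request is sent, and they raise different exception types: an empty identifier raises `NCIResolverError`, while an unknown representation raises `ValueError`. Since `ValueError` is not a subclass of `NCIResolverError`, a handler for the resolver error alone will not catch it. The offline sketch below reproduces only the check order. `check_request`, `outcome`, and the `SUPPORTED` subset are illustrative names, and the exception class is a local stand-in.

```python
class NCIResolverError(Exception):
    """Local stand-in for the library's base exception."""


# A subset of the representation keys listed in the constructor.
SUPPORTED = {"stdinchi", "stdinchikey", "smiles", "names", "mw", "formula"}


def check_request(identifier, representation, supported=SUPPORTED):
    """Apply the two validation steps in the same order as resolve() (sketch)."""
    if not identifier or not identifier.strip():
        raise NCIResolverError("Empty or invalid identifier provided")
    if representation not in supported:
        raise ValueError(f"Unsupported representation '{representation}'")


def outcome(identifier, representation):
    """Report which check, if any, rejected the input."""
    try:
        check_request(identifier, representation)
    except NCIResolverError:
        return "resolver_error"
    except ValueError:
        return "value_error"
    return "ok"
```

Callers that want to treat a mistyped representation key like any other failure would need to catch `ValueError` explicitly alongside `NCIResolverError`.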

`get_available_representations()`

Get list of available representation types

Returns:

Type	Description
`List[str]`	List of available representation keys

Source code in src/provesid/resolver.py

def get_available_representations(self) -> List[str]:
    """
    Get list of available representation types

    Returns:
        List of available representation keys
    """
    return list(self.representations.keys())

`resolve_multiple(identifier, representations)`

Resolve a single identifier to multiple representations

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Input chemical identifier	required
`representations`	`List[str]`	List of target representations	required

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping each representation to its resolved value (`None` if that representation could not be resolved)

Source code in src/provesid/resolver.py

@cached(service='nci')
def resolve_multiple(self, identifier: str, representations: List[str]) -> Dict[str, str]:
    """
    Resolve a single identifier to multiple representations

    Args:
        identifier: Input chemical identifier
        representations: List of target representations

    Returns:
        Dictionary mapping representation to resolved value
    """
    results = {}
    for representation in representations:
        try:
            results[representation] = self.resolve(identifier, representation)
        except NCIResolverError as e:
            results[representation] = None
            logging.warning(f"Failed to resolve {identifier} to {representation}: {e}")

    return results
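As the listing shows, a representation that fails to resolve is stored as `None` rather than raising, so callers usually need to separate the two cases afterwards. A minimal sketch of that post-processing follows. `split_results` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not actual service output.

```python
def split_results(results):
    """Separate resolved values from keys whose value is None (hypothetical helper)."""
    resolved = {key: value for key, value in results.items() if value is not None}
    failed = [key for key, value in results.items() if value is None]
    return resolved, failed


# Hand-written example of the shape returned by resolve_multiple().
sample = {"smiles": "CCO", "formula": "C2H6O", "cas": None}
resolved, failed = split_results(sample)
```

Testing with `is not None` instead of truthiness keeps legitimately empty strings in the resolved set.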

`get_molecular_data(identifier)`

Get comprehensive molecular data for a chemical identifier

This method attempts to retrieve multiple common properties and identifiers for a given chemical, similar to the original nci_cas_to_mol function.

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Input chemical identifier	required

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with molecular data and metadata

Source code in src/provesid/resolver.py

@cached(service='nci')
def get_molecular_data(self, identifier: str) -> Dict[str, Any]:
    """
    Get comprehensive molecular data for a chemical identifier

    This method attempts to retrieve multiple common properties and identifiers
    for a given chemical, similar to the original nci_cas_to_mol function.

    Args:
        identifier: Input chemical identifier

    Returns:
        Dictionary with molecular data and metadata
    """
    # Standard representations to retrieve
    standard_reps = [
        'stdinchi', 'stdinchikey', 'smiles', 'names', 'iupac_name', 
        'cas', 'mw', 'formula', 'ficts', 'ficus', 'uuuuu', 'hashisy'
    ]

    result = {
        'found_by': identifier,
        'success': True,
        'error': None,
        'available_data': {}
    }

    success_count = 0

    for rep in standard_reps:
        try:
            value = self.resolve(identifier, rep)

            # Process specific data types
            if rep == 'names':
                # Split names by newline and filter empty strings
                names_list = [name.strip() for name in value.split('\n') if name.strip()]
                result['available_data'][rep] = names_list
            elif rep == 'mw':
                # Try to convert molecular weight to float
                try:
                    result['available_data'][rep] = float(value)
                except ValueError:
                    result['available_data'][rep] = value
            else:
                result['available_data'][rep] = value

            success_count += 1

        except NCIResolverError as e:
            result['available_data'][rep] = None
            logging.debug(f"Could not resolve {identifier} to {rep}: {e}")

    # Set overall success status
    if success_count == 0:
        result['success'] = False
        result['error'] = "No representations could be resolved"

    # Add convenience accessors for backwards compatibility
    data = result['available_data']
    result.update({
        'stdinchi': data.get('stdinchi'),
        'stdinchikey': data.get('stdinchikey'), 
        'smiles': data.get('smiles'),
        'names': data.get('names'),
        'iupac_name': data.get('iupac_name'),
        'cas': data.get('cas'),
        'mw': data.get('mw'),
        'formula': data.get('formula'),
        'ficts': data.get('ficts'),
        'ficus': data.get('ficus'), 
        'uuuuu': data.get('uuuuu'),
        'hashisy': data.get('hashisy'),
        'note': 'OK' if result['success'] else 'Error calling the NCI web API'
    })

    return result
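Two of the values receive special handling in the listing above: `names` is split into a list of non-empty lines, and `mw` is converted to a `float` when possible. The standalone sketch below isolates those two steps so they can be tried without a network call. `parse_names` and `parse_mw` are illustrative names, and the input strings are made up for the example.

```python
def parse_names(raw):
    """Split a newline-separated response into a list of stripped, non-empty names."""
    return [line.strip() for line in raw.split("\n") if line.strip()]


def parse_mw(raw):
    """Return the molecular weight as a float, or the raw string if it is not numeric."""
    try:
        return float(raw)
    except ValueError:
        return raw


names = parse_names("aspirin\n\n acetylsalicylic acid \n")
weight = parse_mw("180.16")
fallback = parse_mw("n/a")
```

Because of the fallback branch, code that consumes `mw` should be prepared for either a `float` or a `str`.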

`get_image_url(identifier, image_format='gif', width=200, height=200)`

Get URL for chemical structure image

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier	required
`image_format`	`str`	Image format ('gif' or 'png')	`'gif'`
`width`	`int`	Image width in pixels	`200`
`height`	`int`	Image height in pixels	`200`

Returns:

Type	Description
`str`	URL for the structure image

Source code in src/provesid/resolver.py

def get_image_url(self, identifier: str, image_format: str = 'gif', 
                 width: int = 200, height: int = 200) -> str:
    """
    Get URL for chemical structure image

    Args:
        identifier: Chemical identifier
        image_format: Image format ('gif' or 'png')
        width: Image width in pixels
        height: Image height in pixels

    Returns:
        URL for the structure image
    """
    url = self._build_url(identifier, 'image')

    # Add image format and size parameters
    params = []
    if image_format.lower() in ['gif', 'png']:
        params.append(f"format={image_format.lower()}")
    if width != 200 or height != 200:
        params.append(f"width={width}")
        params.append(f"height={height}")

    if params:
        url += '?' + '&'.join(params)

    return url
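The query-string rules in the listing above have two consequences that are easy to miss: an unrecognized `image_format` is silently omitted, and `width` and `height` are always emitted together as soon as either differs from 200. The sketch below reproduces just that parameter logic on top of a given image URL. `image_query_url` is an illustrative name, and the base URL is written out by hand from the documented scheme.

```python
def image_query_url(image_url, image_format="gif", width=200, height=200):
    """Append format and size parameters following the rules shown above (sketch)."""
    params = []
    fmt = image_format.lower()
    if fmt in ("gif", "png"):
        params.append(f"format={fmt}")  # other formats are dropped, not rejected
    if width != 200 or height != 200:
        params.append(f"width={width}")
        params.append(f"height={height}")
    return image_url + ("?" + "&".join(params) if params else "")


base = "https://cactus.nci.nih.gov/chemical/structure/aspirin/image"
default_url = image_query_url(base)
resized_url = image_query_url(base, "PNG", width=300)
unknown_format_url = image_query_url(base, "svg")
```

Here `default_url` ends in `?format=gif`, `resized_url` carries all three parameters, and `unknown_format_url` is the bare image URL with no query string.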

`download_image(identifier, filename, image_format='gif', width=200, height=200)`

Download chemical structure image to file

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier	required
`filename`	`str`	Output filename	required
`image_format`	`str`	Image format ('gif' or 'png')	`'gif'`
`width`	`int`	Image width in pixels	`200`
`height`	`int`	Image height in pixels	`200`

Returns:

Type	Description
`bool`	True if download successful, False otherwise

Source code in src/provesid/resolver.py

def download_image(self, identifier: str, filename: str, 
                  image_format: str = 'gif', width: int = 200, height: int = 200) -> bool:
    """
    Download chemical structure image to file

    Args:
        identifier: Chemical identifier
        filename: Output filename
        image_format: Image format ('gif' or 'png')
        width: Image width in pixels
        height: Image height in pixels

    Returns:
        True if download successful, False otherwise
    """
    try:
        image_url = self.get_image_url(identifier, image_format, width, height)
        self._rate_limit()

        response = requests.get(image_url, timeout=self.timeout)
        response.raise_for_status()

        with open(filename, 'wb') as f:
            f.write(response.content)

        return True

    except Exception as e:
        logging.error(f"Failed to download image for {identifier}: {e}")
        return False

`batch_resolve(identifiers, representation)`

Resolve multiple identifiers to a single representation

Parameters:

Name	Type	Description	Default
`identifiers`	`List[str]`	List of chemical identifiers	required
`representation`	`str`	Target representation	required

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping identifier to resolved value (None if failed)

Source code in src/provesid/resolver.py

@cached(service='nci')
def batch_resolve(self, identifiers: List[str], representation: str) -> Dict[str, str]:
    """
    Resolve multiple identifiers to a single representation

    Args:
        identifiers: List of chemical identifiers
        representation: Target representation

    Returns:
        Dictionary mapping identifier to resolved value (None if failed)
    """
    results = {}

    for identifier in identifiers:
        try:
            results[identifier] = self.resolve(identifier, representation)
        except NCIResolverError as e:
            results[identifier] = None
            logging.warning(f"Failed to resolve {identifier}: {e}")

    return results
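Because the result is a dictionary keyed by identifier, repeated identifiers in the input collapse into a single entry. The offline sketch below shows the same loop shape with an injected lookup function in place of the web service. `batch_with`, `fake_resolve`, and the lookup table are placeholders, and `LookupError` stands in for `NCIResolverError`.

```python
def batch_with(resolve_one, identifiers):
    """Map each identifier to resolve_one(identifier), or None when the lookup fails."""
    results = {}
    for identifier in identifiers:
        try:
            results[identifier] = resolve_one(identifier)
        except LookupError:  # stand-in for NCIResolverError in this offline sketch
            results[identifier] = None
    return results


# Placeholder lookup table standing in for the web service.
FAKE_SMILES = {"ethanol": "CCO", "methane": "C"}


def fake_resolve(name):
    return FAKE_SMILES[name]  # raises KeyError (a LookupError) for unknown names


results = batch_with(fake_resolve, ["ethanol", "unknown", "ethanol", "methane"])
```

Four inputs produce three entries here, so the length of the result cannot be assumed to match the length of the input list.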

`is_valid_identifier(identifier)`

Check if an identifier can be resolved by the service

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier to test	required

Returns:

Type	Description
`bool`	True if identifier can be resolved, False otherwise

Source code in src/provesid/resolver.py

@cached(service='nci')
def is_valid_identifier(self, identifier: str) -> bool:
    """
    Check if an identifier can be resolved by the service

    Args:
        identifier: Chemical identifier to test

    Returns:
        True if identifier can be resolved, False otherwise
    """
    try:
        # Try to get SMILES as a basic test
        self.resolve(identifier, 'smiles')
        return True
    except NCIResolverError:
        return False

`search_by_partial_name(partial_name)`

Search for compounds by partial name match.

Note: This is a basic implementation. The NCI service has no dedicated partial-matching endpoint, so this method looks up the given name as-is and returns the names registered for the matching compound.

Parameters:

Name	Type	Description	Default
`partial_name`	`str`	Partial chemical name	required

Returns:

Type	Description
`List[str]`	List of matching names (may be empty)

Source code in src/provesid/resolver.py

@cached(service='nci')
def search_by_partial_name(self, partial_name: str) -> List[str]:
    """
    Search for compounds by partial name match
    Note: This is a basic implementation - the NCI service doesn't have
    a dedicated partial matching endpoint, so this tries the exact name first.

    Args:
        partial_name: Partial chemical name

    Returns:
        List of matching names (may be empty)
    """
    try:
        names = self.resolve(partial_name, 'names')
        return [name.strip() for name in names.split('\n') if name.strip()]
    except NCIResolverError:
        return []

Functions

`nci_cas_to_mol(cas_rn)`

Convert a CAS RN to a molecule data structure using the NCI web API

This function maintains compatibility with the original nci_cas_to_mol function while using the new NCIChemicalIdentifierResolver class.

Parameters:

Name	Type	Description	Default
`cas_rn`	`str`	CAS Registry Number	required

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with molecular data

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_cas_to_mol(cas_rn: str) -> Dict[str, Any]:
    """
    Convert a CAS RN to a molecule data structure using the NCI web API

    This function maintains compatibility with the original nci_cas_to_mol function
    while using the new NCIChemicalIdentifierResolver class.

    Args:
        cas_rn: CAS Registry Number

    Returns:
        Dictionary with molecular data
    """
    resolver = NCIChemicalIdentifierResolver()
    return resolver.get_molecular_data(cas_rn)
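
Since each lookup is a network request, it can be worth rejecting malformed CAS numbers locally before calling `nci_cas_to_mol`. The sketch below checks the standard CAS check digit; `is_valid_cas_rn` is a hypothetical helper and is not provided by `provesid`. Passing this check only means the number is well formed, not that the service knows the compound.

```python
import re


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if cas_rn has the CAS layout and a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for candidate in ["50-78-2", "64-17-5", "50-78-3", "aspirin"]:
    print(candidate, is_valid_cas_rn(candidate))
```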

`nci_id_to_mol(identifier)` ¶

Convert any chemical identifier to a molecule data structure

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier (CAS, name, SMILES, InChI, etc.)	required

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with molecular data

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_id_to_mol(identifier: str) -> Dict[str, Any]:
    """
    Convert any chemical identifier to a molecule data structure

    Args:
        identifier: Chemical identifier (CAS, name, SMILES, InChI, etc.)

    Returns:
        Dictionary with molecular data
    """
    resolver = NCIChemicalIdentifierResolver()
    return resolver.get_molecular_data(identifier)

`nci_resolver(input_value, output_type, timeout=30)` ¶

Simple resolver function for converting between identifier types

This function maintains compatibility with the original nci_resolver function.

Parameters:

Name	Type	Description	Default
`input_value`	`str`	Input chemical identifier	required
`output_type`	`str`	Desired output representation	required
`timeout`	`int`	Request timeout in seconds	`30`

Returns:

Type	Description
`Optional[str]`	Resolved representation as string, None if failed

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_resolver(input_value: str, output_type: str, timeout: int = 30) -> Optional[str]:
    """
    Simple resolver function for converting between identifier types

    This function maintains compatibility with the original nci_resolver function.

    Args:
        input_value: Input chemical identifier
        output_type: Desired output representation
        timeout: Request timeout in seconds

    Returns:
        Resolved representation as string, None if failed
    """
    try:
        resolver = NCIChemicalIdentifierResolver(timeout=timeout)
        return resolver.resolve(input_value, output_type)
    except NCIResolverError:
        return None
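
Because `nci_resolver` returns `None` on failure instead of raising, it fits a simple fallback loop over alternative identifiers. The sketch below is an illustration only: `first_resolved` is a hypothetical helper, and a dictionary lookup stands in for the real call so the example runs offline.

```python
from typing import Callable, Iterable, Optional


def first_resolved(
    candidates: Iterable[str],
    resolve_one: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the first non-None result of resolve_one over candidates."""
    for candidate in candidates:
        result = resolve_one(candidate)
        if result is not None:
            return result
    return None


# Stand-in for something like: lambda x: nci_resolver(x, 'smiles')
fake_table = {"ethanol": "CCO"}
smiles = first_resolved(["ethanol (misspelled)", "ethanol"], fake_table.get)
print(smiles)
```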

`nci_smiles_to_names(smiles)` ¶

Get chemical names for a SMILES string

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string	required

Returns:

Type	Description
`List[str]`	List of chemical names

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_smiles_to_names(smiles: str) -> List[str]:
    """
    Get chemical names for a SMILES string

    Args:
        smiles: SMILES string

    Returns:
        List of chemical names
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        names_str = resolver.resolve(smiles, 'names')
        return [name.strip() for name in names_str.split('\n') if name.strip()]
    except NCIResolverError:
        return []

`nci_name_to_smiles(name)` ¶

Convert chemical name to SMILES

Parameters:

Name	Type	Description	Default
`name`	`str`	Chemical name	required

Returns:

Type	Description
`Optional[str]`	SMILES string or None if not found

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_name_to_smiles(name: str) -> Optional[str]:
    """
    Convert chemical name to SMILES

    Args:
        name: Chemical name

    Returns:
        SMILES string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(name, 'smiles')
    except NCIResolverError:
        return None

`nci_inchi_to_smiles(inchi)` ¶

Convert InChI to SMILES

Parameters:

Name	Type	Description	Default
`inchi`	`str`	InChI string	required

Returns:

Type	Description
`Optional[str]`	SMILES string or None if not found

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_inchi_to_smiles(inchi: str) -> Optional[str]:
    """
    Convert InChI to SMILES

    Args:
        inchi: InChI string

    Returns:
        SMILES string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(inchi, 'smiles')
    except NCIResolverError:
        return None

`nci_cas_to_inchi(cas_rn)` ¶

Convert CAS Registry Number to Standard InChI

Parameters:

Name	Type	Description	Default
`cas_rn`	`str`	CAS Registry Number	required

Returns:

Type	Description
`Optional[str]`	Standard InChI string or None if not found

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_cas_to_inchi(cas_rn: str) -> Optional[str]:
    """
    Convert CAS Registry Number to Standard InChI

    Args:
        cas_rn: CAS Registry Number

    Returns:
        Standard InChI string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(cas_rn, 'stdinchi')
    except NCIResolverError:
        return None

`nci_get_molecular_weight(identifier)` ¶

Get molecular weight for any chemical identifier

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier	required

Returns:

Type	Description
`Optional[float]`	Molecular weight as float or None if not found

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_get_molecular_weight(identifier: str) -> Optional[float]:
    """
    Get molecular weight for any chemical identifier

    Args:
        identifier: Chemical identifier

    Returns:
        Molecular weight as float or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        mw_str = resolver.resolve(identifier, 'mw')
        return float(mw_str)
    except (NCIResolverError, ValueError):
        return None

`nci_get_formula(identifier)` ¶

Get molecular formula for any chemical identifier

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Chemical identifier	required

Returns:

Type	Description
`Optional[str]`	Molecular formula string or None if not found

Source code in src/provesid/resolver.py

@cached(service='nci')
def nci_get_formula(identifier: str) -> Optional[str]:
    """
    Get molecular formula for any chemical identifier

    Args:
        identifier: Chemical identifier

    Returns:
        Molecular formula string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(identifier, 'formula')
    except NCIResolverError:
        return None

Quick Start¶

from provesid import NCIChemicalIdentifierResolver

# Initialize the resolver
resolver = NCIChemicalIdentifierResolver()

# Convert name to SMILES
smiles = resolver.resolve('aspirin', 'smiles')
print(f"Aspirin SMILES: {smiles}")

# Convert SMILES to InChI
inchi = resolver.resolve('CCO', 'stdinchi')  # Ethanol
print(f"Ethanol InChI: {inchi}")

# Get comprehensive molecular data
mol_data = resolver.get_molecular_data('caffeine')
print(f"Caffeine data: {mol_data}")

Supported Representations¶

The NCI Resolver supports conversion between numerous chemical identifier formats:

Structure Identifiers¶

smiles - Unique SMILES strings
stdinchi - Standard InChI identifiers
stdinchikey - Standard InChI Keys
ficts - NCI/CADD FICTS identifiers
ficus - NCI/CADD FICuS identifiers
uuuuu - NCI/CADD uuuuu identifiers
hashisy - CACTVS HASHISY hashcodes

Chemical Names and Properties¶

names - Chemical names list
iupac_name - IUPAC systematic names
cas - CAS Registry Numbers
mw - Molecular weight
formula - Molecular formula
exactmass - Exact molecular mass

Physical Properties¶

charge - Formal charge
h_bond_acceptor_count - Hydrogen bond acceptor count
h_bond_donor_count - Hydrogen bond donor count
rotor_count - Rotatable bond count
ring_count - Ring count

File Formats¶

sdf - Structure Data File format
image - Chemical structure images
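
Each representation name above is the last path segment of the service's URL scheme, `{base_url}/{identifier}/{representation}`. The following sketch shows how such a URL could be assembled by hand. `build_resolver_url` is a hypothetical helper, and the percent-encoding step is an assumption about what a client needs (SMILES and InChI strings can contain `#` and `/`); it is not taken from the `provesid` source.

```python
from urllib.parse import quote

BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_resolver_url(identifier: str, representation: str, base_url: str = BASE_URL) -> str:
    """Build a request URL following {base_url}/{identifier}/{representation}."""
    # Encode every reserved character, including '/' and '#'.
    return f"{base_url}/{quote(identifier, safe='')}/{representation}"


print(build_resolver_url("CCO", "smiles"))
print(build_resolver_url("C#N", "stdinchikey"))
```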

Core Methods¶

Basic Resolution¶

# Single identifier conversion
smiles = resolver.resolve('aspirin', 'smiles')
inchi = resolver.resolve('50-78-2', 'stdinchi')  # CAS to InChI
formula = resolver.resolve('CCO', 'formula')     # SMILES to formula

Multiple Representations¶

# Get multiple representations at once
representations = ['smiles', 'stdinchi', 'mw', 'formula']
results = resolver.resolve_multiple('caffeine', representations)

print(results)
# {
#     'smiles': 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C',
#     'stdinchi': 'InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3',
#     'mw': '194.1906',
#     'formula': 'C8H10N4O2'
# }
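
Note that the values in the example output are strings, including `mw`. If numbers are needed downstream, a small conversion step can help. This is a sketch under assumptions: `coerce_numeric` is a hypothetical helper, and the choice of which representations count as float or integer fields is a guess based on their descriptions, not something the library defines.

```python
from typing import Any, Dict, Optional

# Assumption: these representations contain numeric text.
FLOAT_FIELDS = {"mw", "exactmass"}
INT_FIELDS = {"charge", "h_bond_acceptor_count", "h_bond_donor_count", "rotor_count", "ring_count"}


def coerce_numeric(results: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Convert known numeric fields to float/int and leave other values untouched."""
    typed: Dict[str, Any] = {}
    for key, value in results.items():
        try:
            if value is not None and key in FLOAT_FIELDS:
                typed[key] = float(value)
            elif value is not None and key in INT_FIELDS:
                typed[key] = int(value)
            else:
                typed[key] = value
        except ValueError:
            typed[key] = None  # Unparseable text is treated as missing.
    return typed


typed = coerce_numeric({"mw": "194.1906", "formula": "C8H10N4O2"})
print(typed)
```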

Comprehensive Molecular Data¶

# Get extensive molecular information
mol_data = resolver.get_molecular_data('aspirin')

# Access individual properties
print(f"SMILES: {mol_data['smiles']}")
print(f"Formula: {mol_data['formula']}")
print(f"Molecular Weight: {mol_data['mw']}")
print(f"Names: {mol_data['names']}")
print(f"CAS Number: {mol_data['cas']}")

Convenience Functions¶

The module provides simple functions for common conversions:

CAS Number Conversions¶

from provesid.resolver import nci_cas_to_mol, nci_cas_to_inchi

# Convert CAS to comprehensive molecular data
mol_data = nci_cas_to_mol('50-78-2')  # Aspirin CAS
print(mol_data['formula'])  # C9H8O4

# Convert CAS to InChI
inchi = nci_cas_to_inchi('64-17-5')  # Ethanol CAS

Name-Based Conversions¶

from provesid.resolver import nci_name_to_smiles, nci_get_molecular_weight

# Convert name to SMILES
smiles = nci_name_to_smiles('caffeine')

# Get molecular weight from name
mw = nci_get_molecular_weight('water')
print(f"Water MW: {mw}")  # ~18.015

SMILES Conversions¶

from provesid.resolver import nci_smiles_to_names, nci_inchi_to_smiles

# Get names for a SMILES string
names = nci_smiles_to_names('CCO')  # Ethanol
print(names)  # ['ethanol', 'ethyl alcohol', ...]

# Convert InChI to SMILES
smiles = nci_inchi_to_smiles('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')

Batch Processing¶

Process Multiple Identifiers¶

# Batch resolve multiple compounds to SMILES
compounds = ['aspirin', 'caffeine', 'ibuprofen', 'water']
smiles_results = resolver.batch_resolve(compounds, 'smiles')

for compound, smiles in smiles_results.items():
    if smiles:
        print(f"{compound}: {smiles}")
    else:
        print(f"{compound}: Not found")
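
For larger batches it is often handier to separate hits from misses once than to branch inside the loop. The sketch below assumes, as the loop above does, that the result is a dictionary mapping each identifier to a string or a falsy value. `split_batch_results` is a hypothetical helper and the sample dictionary is made up.

```python
from typing import Dict, List, Optional, Tuple


def split_batch_results(
    results: Dict[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split a batch result into resolved entries and unresolved identifiers."""
    found = {key: value for key, value in results.items() if value}
    missing = [key for key, value in results.items() if not value]
    return found, missing


# Made-up batch result for illustration.
sample_results = {"water": "O", "not_a_compound": None, "ethanol": "CCO"}
found, missing = split_batch_results(sample_results)
print(f"Resolved {len(found)} of {len(sample_results)}; missing: {missing}")
```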

Validation and Filtering¶

# Check which identifiers are valid
test_compounds = ['aspirin', 'invalid_name_xyz', 'caffeine', 'fake_compound']
valid_compounds = []

for compound in test_compounds:
    if resolver.is_valid_identifier(compound):
        valid_compounds.append(compound)
        print(f"✓ {compound}")
    else:
        print(f"✗ {compound}")

print(f"Valid compounds: {valid_compounds}")

Working with Images¶

Generate Structure Images¶

# Get image URL for a compound
image_url = resolver.get_image_url('aspirin', image_format='png', width=400, height=400)
print(f"Aspirin structure image: {image_url}")

# Download image to file
success = resolver.download_image('caffeine', 'caffeine_structure.png')
if success:
    print("Image downloaded successfully")

Error Handling¶

The module provides specific exception types for different error conditions:

from provesid.resolver import (
    NCIResolverError, 
    NCIResolverNotFoundError, 
    NCIResolverTimeoutError
)

try:
    smiles = resolver.resolve('definitely_not_a_chemical', 'smiles')
except NCIResolverNotFoundError:
    print("Chemical identifier not found")
except NCIResolverTimeoutError:
    print("Request timed out")
except NCIResolverError as e:
    print(f"General resolver error: {e}")
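
Timeouts are often transient, so one possible way to use the dedicated timeout exception is to retry with a growing delay. The sketch below is generic and offline: `retry_on` is a hypothetical helper, and `FakeTimeout` stands in for `NCIResolverTimeoutError` so that no network access is needed.

```python
import time
from typing import Callable, List, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[Exception],
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on exc_type with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except exc_type:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the error.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class FakeTimeout(Exception):
    """Stand-in for NCIResolverTimeoutError so the demo runs offline."""


calls = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    """Fail twice with a simulated timeout, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeTimeout("simulated timeout")
    return "CCO"


# Delays are recorded instead of slept so the demo finishes instantly.
result = retry_on(FakeTimeout, flaky, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

With the real resolver, the call would look something like `retry_on(NCIResolverTimeoutError, lambda: resolver.resolve('aspirin', 'smiles'))`.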

Advanced Usage¶

Custom Configuration¶

# Configure with custom settings
custom_resolver = NCIChemicalIdentifierResolver(
    base_url="https://cactus.nci.nih.gov/chemical/structure",
    timeout=60,  # Longer timeout
    pause_time=0.5  # Slower requests
)

# Use custom resolver
result = custom_resolver.resolve('aspirin', 'smiles')

Partial Name Searching¶

# Search by name (the service has no partial-match endpoint, so the string is resolved as an exact name)
matches = resolver.search_by_partial_name('acetyl')
print(f"Found compounds with 'acetyl': {matches}")

Available Representations¶

# Get list of all available representations
representations = resolver.get_available_representations()
print(f"Available formats: {representations}")

Integration Examples¶

Combine with PubChem Data¶

from provesid import PubChemAPI, NCIChemicalIdentifierResolver

def cross_validate_identifiers(compound_name):
    """Validate identifier across multiple services"""
    resolver = NCIChemicalIdentifierResolver()
    api = PubChemAPI()

    # Get SMILES from NCI
    nci_smiles = resolver.resolve(compound_name, 'smiles')

    # Get compound from PubChem using the same name
    pubchem_cids = api.get_cids_by_name(compound_name)

    if pubchem_cids and nci_smiles:
        # Get PubChem SMILES for comparison
        cid = pubchem_cids['IdentifierList']['CID'][0]
        pubchem_props = api.get_compound_properties([cid], ['ConnectivitySMILES'])
        pubchem_smiles = pubchem_props['PropertyTable']['Properties'][0]['ConnectivitySMILES']

        return {
            'name': compound_name,
            'nci_smiles': nci_smiles,
            'pubchem_smiles': pubchem_smiles,
            'pubchem_cid': cid,
            # Plain string comparison: the same structure can be written as different SMILES strings
            'match': nci_smiles == pubchem_smiles
        }

    return None

# Cross-validate aspirin
validation = cross_validate_identifiers('aspirin')
print(validation)

Data Pipeline Integration¶

def chemical_identifier_pipeline(identifiers, target_format='smiles'):
    """Process multiple identifiers through resolution pipeline"""
    resolver = NCIChemicalIdentifierResolver()
    results = []

    for identifier in identifiers:
        try:
            # Attempt resolution
            result = resolver.resolve(identifier, target_format)

            # Get additional data if successful
            mol_data = resolver.get_molecular_data(identifier)

            results.append({
                'input': identifier,
                'output': result,
                'formula': mol_data.get('formula'),
                'mw': mol_data.get('mw'),
                'status': 'success'
            })

        except Exception as e:
            results.append({
                'input': identifier,
                'output': None,
                'formula': None,
                'mw': None,
                'status': f'error: {str(e)}'
            })

    return results

# Process a mixed list of identifiers
identifiers = ['aspirin', 'CCO', '50-78-2', 'invalid_name']
pipeline_results = chemical_identifier_pipeline(identifiers)
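
The list of dictionaries returned by the pipeline maps directly onto tabular output. As a sketch, the rows could be serialized to CSV with the standard library. `results_to_csv` is a hypothetical helper, and the sample rows use made-up placeholder values rather than real service output.

```python
import csv
import io
from typing import Dict, Iterable, Optional

FIELDS = ["input", "output", "formula", "mw", "status"]


def results_to_csv(rows: Iterable[Dict[str, Optional[str]]]) -> str:
    """Serialize pipeline result rows to CSV text; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows shaped like the pipeline output above.
sample_rows = [
    {"input": "CCO", "output": "CCO", "formula": "C2H6O", "mw": "46.07", "status": "success"},
    {"input": "invalid_name", "output": None, "formula": None, "mw": None, "status": "error: not found"},
]
csv_text = results_to_csv(sample_rows)
print(csv_text)
```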

Performance Considerations¶

Rate Limiting¶

The resolver includes automatic rate limiting to respect server limits:

# Default rate limiting (pause_time=0.1, i.e. a 0.1 s pause between requests)
resolver = NCIChemicalIdentifierResolver()

# Slower rate for large batch jobs
slow_resolver = NCIChemicalIdentifierResolver(pause_time=1.0)

# Faster rate for development (use with caution)
fast_resolver = NCIChemicalIdentifierResolver(pause_time=0.05)
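
To make the effect of `pause_time` concrete, here is a minimal sketch of pause-based throttling. It illustrates the general technique only and is not the library's actual implementation. `MinIntervalThrottle` is a hypothetical class, and its `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Ensure at least pause_time seconds pass between consecutive wait() calls."""

    def __init__(
        self,
        pause_time: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.pause_time = pause_time
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until pause_time has elapsed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.pause_time - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


throttle = MinIntervalThrottle(pause_time=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # A request would be sent here.
print(f"3 throttled calls took {time.monotonic() - start:.2f}s")
```

The first call goes through immediately; only the gaps between calls are padded, so a job of N requests takes roughly (N - 1) × `pause_time` plus the request time itself.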

Caching Strategies¶

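The resolver accepts a `use_cache` option (enabled by default). To keep a separate JSON cache on disk for frequently accessed data, a small wrapper class like the following can be used: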

import json
from pathlib import Path

class CachedResolver:
    def __init__(self, cache_file='resolver_cache.json'):
        self.resolver = NCIChemicalIdentifierResolver()
        self.cache_file = Path(cache_file)
        self.cache = self._load_cache()

    def _load_cache(self):
        if self.cache_file.exists():
            with open(self.cache_file, 'r') as f:
                return json.load(f)
        return {}

    def _save_cache(self):
        with open(self.cache_file, 'w') as f:
            json.dump(self.cache, f, indent=2)

    def resolve(self, identifier, representation):
        # Encode the pair as JSON so underscores in either part cannot make two keys collide
        cache_key = json.dumps([identifier, representation])

        if cache_key in self.cache:
            return self.cache[cache_key]

        # Resolve and cache
        result = self.resolver.resolve(identifier, representation)
        self.cache[cache_key] = result
        self._save_cache()

        return result

# Use cached resolver
cached = CachedResolver()
smiles = cached.resolve('aspirin', 'smiles')  # First call - API request
smiles2 = cached.resolve('aspirin', 'smiles')  # Second call - cached
