NCI Chemical Identifier Resolver API

The provesid.resolver module provides a Python interface to the NCI CADD Group's Chemical Identifier Resolver web service, which converts between different chemical structure identifiers.

provesid.resolver

Classes

NCIResolverError

Bases: Exception

Custom exception for NCI Chemical Identifier Resolver errors

Source code in src/provesid/resolver.py
class NCIResolverError(Exception):
    """Custom exception for NCI Chemical Identifier Resolver errors"""
    pass

NCIResolverNotFoundError

Bases: NCIResolverError

Exception raised when chemical identifier is not found

Source code in src/provesid/resolver.py
class NCIResolverNotFoundError(NCIResolverError):
    """Exception raised when chemical identifier is not found"""
    pass

NCIResolverTimeoutError

Bases: NCIResolverError

Exception raised when request times out

Source code in src/provesid/resolver.py
class NCIResolverTimeoutError(NCIResolverError):
    """Exception raised when request times out"""
    pass
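
Because both specific exceptions derive from NCIResolverError, a caller can test for the specific cases first and fall back to the base class. The sketch below is illustrative only: it redefines stand-in copies of the three classes so that it runs without provesid installed, and `describe_failure` is a hypothetical helper, not part of the module.

```python
# Stand-in copies of the exception hierarchy documented above (assumption:
# in real code these would be imported from provesid.resolver instead).
class NCIResolverError(Exception):
    """Base class for resolver errors."""


class NCIResolverNotFoundError(NCIResolverError):
    """Identifier was not found."""


class NCIResolverTimeoutError(NCIResolverError):
    """Request timed out."""


def describe_failure(exc: Exception) -> str:
    """Hypothetical helper: classify an exception, most specific type first."""
    if isinstance(exc, NCIResolverNotFoundError):
        return "not-found"
    if isinstance(exc, NCIResolverTimeoutError):
        return "timeout"
    if isinstance(exc, NCIResolverError):
        return "resolver-error"
    return "unexpected"
```

Checking the subclasses before the base class matters: a test against NCIResolverError alone would also match the not-found and timeout cases.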

NCIChemicalIdentifierResolver

A Python interface to the NCI Chemical Identifier Resolver web service

This class provides methods to interact with the NCI CADD Group's Chemical Identifier Resolver service for converting between different chemical structure identifiers.

The service can resolve various types of chemical identifiers and convert them into different representations.

URL API scheme: https://cactus.nci.nih.gov/chemical/structure/{identifier}/{representation}

Usage examples:

    resolver = NCIChemicalIdentifierResolver()

    # Convert SMILES to InChI
    inchi = resolver.resolve('CCO', 'stdinchi')

    # Get all names for a compound
    names = resolver.resolve('aspirin', 'names')

    # Get molecular weight
    mw = resolver.resolve('caffeine', 'mw')

    # Get comprehensive molecular data
    mol_data = resolver.get_molecular_data('50-00-0')  # formaldehyde CAS
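
The URL scheme above can be sketched without touching the network. The function below mirrors the logic of the class's `_build_url` method as shown in the source listing; the names `build_url` and `BASE_URL` are chosen for this sketch only.

```python
from urllib.parse import quote

# Default base URL taken from the constructor signature in this document.
BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier: str, representation: str, xml_format: bool = False) -> str:
    """Assemble {base}/{identifier}/{representation}[/xml]."""
    # safe='' percent-encodes every reserved character, including '/',
    # so an InChI or SMILES string stays inside a single path segment.
    parts = [BASE_URL, quote(identifier, safe=""), representation]
    if xml_format:
        parts.append("xml")
    return "/".join(parts)


print(build_url("CCO", "stdinchi"))
print(build_url("C#N", "smiles"))
print(build_url("acetic acid", "names", xml_format=True))
```

Encoding matters most for SMILES and InChI input: an unescaped `#` would otherwise be read as a URL fragment, and an unescaped `/` as a path separator.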

Source code in src/provesid/resolver.py
class NCIChemicalIdentifierResolver:
    """
    A Python interface to the NCI Chemical Identifier Resolver web service

    This class provides methods to interact with the NCI CADD Group's Chemical Identifier
    Resolver service for converting between different chemical structure identifiers.

    The service can resolve various types of chemical identifiers and convert them
    into different representations.

    URL API scheme: https://cactus.nci.nih.gov/chemical/structure/{identifier}/{representation}

    Usage examples:
        resolver = NCIChemicalIdentifierResolver()

        # Convert SMILES to InChI
        inchi = resolver.resolve('CCO', 'stdinchi')

        # Get all names for a compound
        names = resolver.resolve('aspirin', 'names')

        # Get molecular weight
        mw = resolver.resolve('caffeine', 'mw')

        # Get comprehensive molecular data
        mol_data = resolver.get_molecular_data('50-00-0')  # formaldehyde CAS
    """

    def __init__(self, base_url: str = "https://cactus.nci.nih.gov/chemical/structure", 
                 timeout: int = 30, pause_time: float = 0.1):
        """
        Initialize NCI Chemical Identifier Resolver client

        Args:
            base_url: Base URL for the NCI resolver service
            timeout: Request timeout in seconds
            pause_time: Minimum time between API calls in seconds
        """
        self.base_url = base_url.rstrip('/')
        self.timeout = timeout
        self.pause_time = pause_time
        self.last_request_time = 0

        # Available representation methods
        self.representations = {
            # Structure identifiers
            'stdinchi': 'Standard InChI',
            'stdinchikey': 'Standard InChIKey', 
            'smiles': 'Unique SMILES',
            'ficts': 'NCI/CADD FICTS identifier',
            'ficus': 'NCI/CADD FICuS identifier',
            'uuuuu': 'NCI/CADD uuuuu identifier',
            'hashisy': 'CACTVS HASHISY hashcode',
            # File formats
            'sdf': 'SD file format',
            # Names and properties
            'names': 'Chemical names list',
            'iupac_name': 'IUPAC name',
            'cas': 'CAS Registry Number',
            'mw': 'Molecular weight',
            'formula': 'Molecular formula',
            # Images
            'image': 'Chemical structure image',
            # Additional properties (may vary by compound)
            'exactmass': 'Exact mass',
            'charge': 'Formal charge',
            'h_bond_acceptor_count': 'Hydrogen bond acceptor count',
            'h_bond_donor_count': 'Hydrogen bond donor count',
            'rotor_count': 'Rotatable bond count',
            'effective_rotor_count': 'Effective rotor count',
            'ring_count': 'Ring count',
            'ringsys_count': 'Ring system count'
        }

    def _rate_limit(self):
        """Enforce rate limiting between requests"""
        if self.pause_time > 0:
            current_time = time.time()
            time_since_last = current_time - self.last_request_time
            if time_since_last < self.pause_time:
                time.sleep(self.pause_time - time_since_last)
            self.last_request_time = time.time()

    def _make_request(self, url: str) -> str:
        """
        Make HTTP request with error handling and rate limiting

        Args:
            url: Request URL

        Returns:
            Response text

        Raises:
            NCIResolverTimeoutError: If request times out
            NCIResolverNotFoundError: If identifier not found (404)
            NCIResolverError: For other HTTP errors
        """
        self._rate_limit()

        try:
            response = requests.get(url, timeout=self.timeout)

            if response.status_code == 200:
                return response.text.strip()
            elif response.status_code == 404:
                raise NCIResolverNotFoundError("Chemical identifier not found")
            elif response.status_code == 500:
                raise NCIResolverError("Internal server error")
            else:
                raise NCIResolverError(f"HTTP error {response.status_code}: {response.text}")

        except requests.Timeout:
            raise NCIResolverTimeoutError("Request timed out")
        except requests.RequestException as e:
            raise NCIResolverError(f"Request failed: {str(e)}")

    def _build_url(self, identifier: str, representation: str, xml_format: bool = False) -> str:
        """
        Build URL for the NCI resolver service

        Args:
            identifier: Chemical structure identifier
            representation: Desired representation
            xml_format: Whether to request XML format

        Returns:
            Complete URL string
        """
        # URL-encode the identifier for special characters
        encoded_identifier = quote(identifier, safe='')

        # Build URL components
        url_parts = [self.base_url, encoded_identifier, representation]

        if xml_format:
            url_parts.append('xml')

        return '/'.join(url_parts)

    def resolve(self, identifier: str, representation: str, xml_format: bool = False) -> str:
        """
        Resolve a chemical identifier to another representation

        Args:
            identifier: Input chemical identifier (name, SMILES, InChI, CAS, etc.)
            representation: Target representation (see self.representations for options)
            xml_format: Whether to request XML format response

        Returns:
            Resolved representation as string

        Raises:
            ValueError: If representation is not supported
            NCIResolverNotFoundError: If identifier cannot be resolved
            NCIResolverError: For other resolver errors
        """
        if not identifier or not identifier.strip():
            raise NCIResolverError("Empty or invalid identifier provided")

        if representation not in self.representations:
            available = ', '.join(self.representations.keys())
            raise ValueError(f"Unsupported representation '{representation}'. "
                           f"Available: {available}")

        url = self._build_url(identifier, representation, xml_format)
        return self._make_request(url)

    def get_available_representations(self) -> List[str]:
        """
        Get list of available representation types

        Returns:
            List of available representation keys
        """
        return list(self.representations.keys())

    def resolve_multiple(self, identifier: str, representations: List[str]) -> Dict[str, str]:
        """
        Resolve a single identifier to multiple representations

        Args:
            identifier: Input chemical identifier
            representations: List of target representations

        Returns:
            Dictionary mapping representation to resolved value
        """
        results = {}
        for representation in representations:
            try:
                results[representation] = self.resolve(identifier, representation)
            except NCIResolverError as e:
                results[representation] = None
                logging.warning(f"Failed to resolve {identifier} to {representation}: {e}")

        return results

    def get_molecular_data(self, identifier: str) -> Dict[str, Any]:
        """
        Get comprehensive molecular data for a chemical identifier

        This method attempts to retrieve multiple common properties and identifiers
        for a given chemical, similar to the original nci_cas_to_mol function.

        Args:
            identifier: Input chemical identifier

        Returns:
            Dictionary with molecular data and metadata
        """
        # Standard representations to retrieve
        standard_reps = [
            'stdinchi', 'stdinchikey', 'smiles', 'names', 'iupac_name', 
            'cas', 'mw', 'formula', 'ficts', 'ficus', 'uuuuu', 'hashisy'
        ]

        result = {
            'found_by': identifier,
            'success': True,
            'error': None,
            'available_data': {}
        }

        success_count = 0

        for rep in standard_reps:
            try:
                value = self.resolve(identifier, rep)

                # Process specific data types
                if rep == 'names':
                    # Split names by newline and filter empty strings
                    names_list = [name.strip() for name in value.split('\n') if name.strip()]
                    result['available_data'][rep] = names_list
                elif rep == 'mw':
                    # Try to convert molecular weight to float
                    try:
                        result['available_data'][rep] = float(value)
                    except ValueError:
                        result['available_data'][rep] = value
                else:
                    result['available_data'][rep] = value

                success_count += 1

            except NCIResolverError as e:
                result['available_data'][rep] = None
                logging.debug(f"Could not resolve {identifier} to {rep}: {e}")

        # Set overall success status
        if success_count == 0:
            result['success'] = False
            result['error'] = "No representations could be resolved"

        # Add convenience accessors for backwards compatibility
        data = result['available_data']
        result.update({
            'stdinchi': data.get('stdinchi'),
            'stdinchikey': data.get('stdinchikey'), 
            'smiles': data.get('smiles'),
            'names': data.get('names'),
            'iupac_name': data.get('iupac_name'),
            'cas': data.get('cas'),
            'mw': data.get('mw'),
            'formula': data.get('formula'),
            'ficts': data.get('ficts'),
            'ficus': data.get('ficus'), 
            'uuuuu': data.get('uuuuu'),
            'hashisy': data.get('hashisy'),
            'note': 'OK' if result['success'] else 'Error calling the NCI web API'
        })

        return result

    def get_image_url(self, identifier: str, image_format: str = 'gif', 
                     width: int = 200, height: int = 200) -> str:
        """
        Get URL for chemical structure image

        Args:
            identifier: Chemical identifier
            image_format: Image format ('gif' or 'png')
            width: Image width in pixels
            height: Image height in pixels

        Returns:
            URL for the structure image
        """
        url = self._build_url(identifier, 'image')

        # Add image format and size parameters
        params = []
        if image_format.lower() in ['gif', 'png']:
            params.append(f"format={image_format.lower()}")
        if width != 200 or height != 200:
            params.append(f"width={width}")
            params.append(f"height={height}")

        if params:
            url += '?' + '&'.join(params)

        return url

    def download_image(self, identifier: str, filename: str, 
                      image_format: str = 'gif', width: int = 200, height: int = 200) -> bool:
        """
        Download chemical structure image to file

        Args:
            identifier: Chemical identifier
            filename: Output filename
            image_format: Image format ('gif' or 'png')
            width: Image width in pixels
            height: Image height in pixels

        Returns:
            True if download successful, False otherwise
        """
        try:
            image_url = self.get_image_url(identifier, image_format, width, height)
            self._rate_limit()

            response = requests.get(image_url, timeout=self.timeout)
            response.raise_for_status()

            with open(filename, 'wb') as f:
                f.write(response.content)

            return True

        except Exception as e:
            logging.error(f"Failed to download image for {identifier}: {e}")
            return False

    def batch_resolve(self, identifiers: List[str], representation: str) -> Dict[str, str]:
        """
        Resolve multiple identifiers to a single representation

        Args:
            identifiers: List of chemical identifiers
            representation: Target representation

        Returns:
            Dictionary mapping identifier to resolved value (None if failed)
        """
        results = {}

        for identifier in identifiers:
            try:
                results[identifier] = self.resolve(identifier, representation)
            except NCIResolverError as e:
                results[identifier] = None
                logging.warning(f"Failed to resolve {identifier}: {e}")

        return results

    def is_valid_identifier(self, identifier: str) -> bool:
        """
        Check if an identifier can be resolved by the service

        Args:
            identifier: Chemical identifier to test

        Returns:
            True if identifier can be resolved, False otherwise
        """
        try:
            # Try to get SMILES as a basic test
            self.resolve(identifier, 'smiles')
            return True
        except NCIResolverError:
            return False

    def search_by_partial_name(self, partial_name: str) -> List[str]:
        """
        Look up a name and return the names the service lists for the matching compound.

        Note: This is a basic implementation. The NCI service doesn't have a dedicated
        partial matching endpoint, so the string is resolved exactly as given.

        Args:
            partial_name: Partial chemical name

        Returns:
            List of matching names (may be empty)
        """
        try:
            names = self.resolve(partial_name, 'names')
            return [name.strip() for name in names.split('\n') if name.strip()]
        except NCIResolverError:
            return []
Functions
__init__(base_url='https://cactus.nci.nih.gov/chemical/structure', timeout=30, pause_time=0.1)

Initialize NCI Chemical Identifier Resolver client

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| base_url | str | Base URL for the NCI resolver service | 'https://cactus.nci.nih.gov/chemical/structure' |
| timeout | int | Request timeout in seconds | 30 |
| pause_time | float | Minimum time between API calls in seconds | 0.1 |

Source code in src/provesid/resolver.py
def __init__(self, base_url: str = "https://cactus.nci.nih.gov/chemical/structure", 
             timeout: int = 30, pause_time: float = 0.1):
    """
    Initialize NCI Chemical Identifier Resolver client

    Args:
        base_url: Base URL for the NCI resolver service
        timeout: Request timeout in seconds
        pause_time: Minimum time between API calls in seconds
    """
    self.base_url = base_url.rstrip('/')
    self.timeout = timeout
    self.pause_time = pause_time
    self.last_request_time = 0

    # Available representation methods
    self.representations = {
        # Structure identifiers
        'stdinchi': 'Standard InChI',
        'stdinchikey': 'Standard InChIKey', 
        'smiles': 'Unique SMILES',
        'ficts': 'NCI/CADD FICTS identifier',
        'ficus': 'NCI/CADD FICuS identifier',
        'uuuuu': 'NCI/CADD uuuuu identifier',
        'hashisy': 'CACTVS HASHISY hashcode',
        # File formats
        'sdf': 'SD file format',
        # Names and properties
        'names': 'Chemical names list',
        'iupac_name': 'IUPAC name',
        'cas': 'CAS Registry Number',
        'mw': 'Molecular weight',
        'formula': 'Molecular formula',
        # Images
        'image': 'Chemical structure image',
        # Additional properties (may vary by compound)
        'exactmass': 'Exact mass',
        'charge': 'Formal charge',
        'h_bond_acceptor_count': 'Hydrogen bond acceptor count',
        'h_bond_donor_count': 'Hydrogen bond donor count',
        'rotor_count': 'Rotatable bond count',
        'effective_rotor_count': 'Effective rotor count',
        'ring_count': 'Ring count',
        'ringsys_count': 'Ring system count'
    }
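
The `pause_time` argument sets a minimum interval between requests. The sketch below shows that idea in isolation. `MinIntervalLimiter` is a hypothetical class written for this example, and unlike the source it uses an injectable clock and sleep function (defaulting to `time.monotonic` and `time.sleep`) so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum interval between calls."""

    def __init__(self, pause_time: float = 0.1,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.pause_time = pause_time
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self.pause_time > 0 and self._last is not None:
            elapsed = self._clock() - self._last
            if elapsed < self.pause_time:
                slept = self.pause_time - elapsed
                self._sleep(slept)
        # Record the time after any sleep, as the documented method does.
        self._last = self._clock()
        return slept
```

Calling `wait()` before each request reproduces the pacing: the first call never sleeps, and later calls sleep only for the remainder of the interval.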
resolve(identifier, representation, xml_format=False)

Resolve a chemical identifier to another representation

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Input chemical identifier (name, SMILES, InChI, CAS, etc.) | required |
| representation | str | Target representation (see self.representations for options) | required |
| xml_format | bool | Whether to request XML format response | False |

Returns:

| Type | Description |
| --- | --- |
| str | Resolved representation as string |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If representation is not supported |
| NCIResolverNotFoundError | If identifier cannot be resolved |
| NCIResolverError | For other resolver errors, including an empty identifier or a request timeout |


Source code in src/provesid/resolver.py
def resolve(self, identifier: str, representation: str, xml_format: bool = False) -> str:
    """
    Resolve a chemical identifier to another representation

    Args:
        identifier: Input chemical identifier (name, SMILES, InChI, CAS, etc.)
        representation: Target representation (see self.representations for options)
        xml_format: Whether to request XML format response

    Returns:
        Resolved representation as string

    Raises:
        ValueError: If representation is not supported
        NCIResolverNotFoundError: If identifier cannot be resolved
        NCIResolverError: For other resolver errors
    """
    if not identifier or not identifier.strip():
        raise NCIResolverError("Empty or invalid identifier provided")

    if representation not in self.representations:
        available = ', '.join(self.representations.keys())
        raise ValueError(f"Unsupported representation '{representation}'. "
                       f"Available: {available}")

    url = self._build_url(identifier, representation, xml_format)
    return self._make_request(url)
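
The two argument checks in `resolve` raise different exception types: an empty identifier raises NCIResolverError, while an unsupported representation raises ValueError, which is not a resolver error. The sketch below reproduces only those checks to make the distinction concrete. `check_arguments`, `explain_problem`, `SUPPORTED`, and the stand-in exception class are assumptions for illustration.

```python
from typing import Optional


class NCIResolverError(Exception):
    """Stand-in for provesid.resolver.NCIResolverError."""


# A subset of the representation keys listed in the constructor.
SUPPORTED = {"smiles", "stdinchi", "stdinchikey", "names", "mw", "formula"}


def check_arguments(identifier: str, representation: str) -> None:
    """Mirror the validation order used by resolve()."""
    if not identifier or not identifier.strip():
        raise NCIResolverError("Empty or invalid identifier provided")
    if representation not in SUPPORTED:
        raise ValueError(f"Unsupported representation '{representation}'")


def explain_problem(identifier: str, representation: str) -> Optional[str]:
    """Return a short message for bad arguments, or None if they pass."""
    try:
        check_arguments(identifier, representation)
    except (ValueError, NCIResolverError) as exc:
        # Both types are caught explicitly; catching NCIResolverError
        # alone would let the ValueError propagate.
        return f"{type(exc).__name__}: {exc}"
    return None
```

A practical consequence: helpers that catch only NCIResolverError, such as `resolve_multiple` and `batch_resolve` in the source listing, let the ValueError for an unsupported representation propagate to the caller.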
get_available_representations()

Get list of available representation types

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of available representation keys |


Source code in src/provesid/resolver.py
def get_available_representations(self) -> List[str]:
    """
    Get list of available representation types

    Returns:
        List of available representation keys
    """
    return list(self.representations.keys())
resolve_multiple(identifier, representations)

Resolve a single identifier to multiple representations

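Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Input chemical identifier | required |
| representations | List[str] | List of target representations | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, str] | Dictionary mapping representation to resolved value (None if that representation could not be resolved) |
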

Source code in src/provesid/resolver.py
def resolve_multiple(self, identifier: str, representations: List[str]) -> Dict[str, str]:
    """
    Resolve a single identifier to multiple representations

    Args:
        identifier: Input chemical identifier
        representations: List of target representations

    Returns:
        Dictionary mapping representation to resolved value
    """
    results = {}
    for representation in representations:
        try:
            results[representation] = self.resolve(identifier, representation)
        except NCIResolverError as e:
            results[representation] = None
            logging.warning(f"Failed to resolve {identifier} to {representation}: {e}")

    return results
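
Since failed lookups are stored as None rather than raised, callers usually need to separate the resolved entries from the failed ones. A minimal sketch follows. `split_results` is a hypothetical helper, and the sample dictionary is illustrative data shaped like a `resolve_multiple` result, not real service output.

```python
from typing import Dict, List, Optional, Tuple


def split_results(
    results: Dict[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Separate resolved values from keys whose lookup failed (value None)."""
    resolved = {key: value for key, value in results.items() if value is not None}
    failed = [key for key, value in results.items() if value is None]
    return resolved, failed


# Illustrative input only: two successful lookups and one failure.
sample = {"smiles": "CCO", "formula": "C2H6O", "cas": None}
resolved, failed = split_results(sample)
print(resolved)
print(failed)
```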
get_molecular_data(identifier)

Get comprehensive molecular data for a chemical identifier

This method attempts to retrieve multiple common properties and identifiers for a given chemical, similar to the original nci_cas_to_mol function.

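Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Input chemical identifier | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with the keys found_by, success, error, and available_data, plus a top-level copy of each retrieved representation and a note field |
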

Source code in src/provesid/resolver.py
def get_molecular_data(self, identifier: str) -> Dict[str, Any]:
    """
    Get comprehensive molecular data for a chemical identifier

    This method attempts to retrieve multiple common properties and identifiers
    for a given chemical, similar to the original nci_cas_to_mol function.

    Args:
        identifier: Input chemical identifier

    Returns:
        Dictionary with molecular data and metadata
    """
    # Standard representations to retrieve
    standard_reps = [
        'stdinchi', 'stdinchikey', 'smiles', 'names', 'iupac_name', 
        'cas', 'mw', 'formula', 'ficts', 'ficus', 'uuuuu', 'hashisy'
    ]

    result = {
        'found_by': identifier,
        'success': True,
        'error': None,
        'available_data': {}
    }

    success_count = 0

    for rep in standard_reps:
        try:
            value = self.resolve(identifier, rep)

            # Process specific data types
            if rep == 'names':
                # Split names by newline and filter empty strings
                names_list = [name.strip() for name in value.split('\n') if name.strip()]
                result['available_data'][rep] = names_list
            elif rep == 'mw':
                # Try to convert molecular weight to float
                try:
                    result['available_data'][rep] = float(value)
                except ValueError:
                    result['available_data'][rep] = value
            else:
                result['available_data'][rep] = value

            success_count += 1

        except NCIResolverError as e:
            result['available_data'][rep] = None
            logging.debug(f"Could not resolve {identifier} to {rep}: {e}")

    # Set overall success status
    if success_count == 0:
        result['success'] = False
        result['error'] = "No representations could be resolved"

    # Add convenience accessors for backwards compatibility
    data = result['available_data']
    result.update({
        'stdinchi': data.get('stdinchi'),
        'stdinchikey': data.get('stdinchikey'), 
        'smiles': data.get('smiles'),
        'names': data.get('names'),
        'iupac_name': data.get('iupac_name'),
        'cas': data.get('cas'),
        'mw': data.get('mw'),
        'formula': data.get('formula'),
        'ficts': data.get('ficts'),
        'ficus': data.get('ficus'), 
        'uuuuu': data.get('uuuuu'),
        'hashisy': data.get('hashisy'),
        'note': 'OK' if result['success'] else 'Error calling the NCI web API'
    })

    return result
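
The per-representation post-processing in `get_molecular_data` can be isolated into a small function. The sketch below mirrors the two special cases in the source (`names` becomes a list, `mw` becomes a float when possible); `parse_value` is a name chosen for this example and the inputs are made-up strings.

```python
from typing import List, Union


def parse_value(rep: str, raw: str) -> Union[str, float, List[str]]:
    """Convert a raw response string the way get_molecular_data does."""
    if rep == "names":
        # One name per line; drop blank lines and surrounding whitespace.
        return [line.strip() for line in raw.split("\n") if line.strip()]
    if rep == "mw":
        try:
            return float(raw)
        except ValueError:
            # Keep the original text if it is not a plain number.
            return raw
    return raw


print(parse_value("names", "ethanol\n\n Ethyl alcohol \n"))
print(parse_value("mw", "46.07"))
print(parse_value("smiles", "CCO"))
```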
get_image_url(identifier, image_format='gif', width=200, height=200)

Get URL for chemical structure image

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Chemical identifier | required |
| image_format | str | Image format ('gif' or 'png') | 'gif' |
| width | int | Image width in pixels | 200 |
| height | int | Image height in pixels | 200 |

Returns:

| Type | Description |
| --- | --- |
| str | URL for the structure image |


Source code in src/provesid/resolver.py
def get_image_url(self, identifier: str, image_format: str = 'gif', 
                 width: int = 200, height: int = 200) -> str:
    """
    Get URL for chemical structure image

    Args:
        identifier: Chemical identifier
        image_format: Image format ('gif' or 'png')
        width: Image width in pixels
        height: Image height in pixels

    Returns:
        URL for the structure image
    """
    url = self._build_url(identifier, 'image')

    # Add image format and size parameters
    params = []
    if image_format.lower() in ['gif', 'png']:
        params.append(f"format={image_format.lower()}")
    if width != 200 or height != 200:
        params.append(f"width={width}")
        params.append(f"height={height}")

    if params:
        url += '?' + '&'.join(params)

    return url
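
The query-string rules in `get_image_url` are easy to miss: an unrecognised format is silently dropped, and the size parameters are only added when at least one differs from 200. The sketch below reproduces that behaviour using `urllib.parse.urlencode` in place of manual string joining; `image_url` and `BASE_URL` are names assumed for this example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier: str, image_format: str = "gif",
              width: int = 200, height: int = 200) -> str:
    """Build an image URL following the same rules as get_image_url."""
    url = f"{BASE_URL}/{quote(identifier, safe='')}/image"
    params = {}
    if image_format.lower() in ("gif", "png"):
        params["format"] = image_format.lower()
    # Size is only sent when it differs from the 200x200 default.
    if width != 200 or height != 200:
        params["width"] = width
        params["height"] = height
    return f"{url}?{urlencode(params)}" if params else url


print(image_url("CCO"))
print(image_url("CCO", "PNG", 300, 300))
```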
download_image(identifier, filename, image_format='gif', width=200, height=200)

Download chemical structure image to file

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Chemical identifier | required |
| filename | str | Output filename | required |
| image_format | str | Image format ('gif' or 'png') | 'gif' |
| width | int | Image width in pixels | 200 |
| height | int | Image height in pixels | 200 |

Returns:

| Type | Description |
| --- | --- |
| bool | True if download successful, False otherwise |


Source code in src/provesid/resolver.py
def download_image(self, identifier: str, filename: str, 
                  image_format: str = 'gif', width: int = 200, height: int = 200) -> bool:
    """
    Download chemical structure image to file

    Args:
        identifier: Chemical identifier
        filename: Output filename
        image_format: Image format ('gif' or 'png')
        width: Image width in pixels
        height: Image height in pixels

    Returns:
        True if download successful, False otherwise
    """
    try:
        image_url = self.get_image_url(identifier, image_format, width, height)
        self._rate_limit()

        response = requests.get(image_url, timeout=self.timeout)
        response.raise_for_status()

        with open(filename, 'wb') as f:
            f.write(response.content)

        return True

    except Exception as e:
        logging.error(f"Failed to download image for {identifier}: {e}")
        return False
batch_resolve(identifiers, representation)

Resolve multiple identifiers to a single representation

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifiers | List[str] | List of chemical identifiers | required |
| representation | str | Target representation | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, str] | Dictionary mapping identifier to resolved value (None if failed) |


Source code in src/provesid/resolver.py
def batch_resolve(self, identifiers: List[str], representation: str) -> Dict[str, str]:
    """
    Resolve multiple identifiers to a single representation

    Args:
        identifiers: List of chemical identifiers
        representation: Target representation

    Returns:
        Dictionary mapping identifier to resolved value (None if failed)
    """
    results = {}

    for identifier in identifiers:
        try:
            results[identifier] = self.resolve(identifier, representation)
        except NCIResolverError as e:
            results[identifier] = None
            logging.warning(f"Failed to resolve {identifier}: {e}")

    return results
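
Because `batch_resolve` keys its result by identifier and issues one request per list element, repeated identifiers cost extra requests but still occupy a single key, and blank entries are recorded as None after a logged warning. Cleaning the input first avoids both. The helper below is a hypothetical pre-processing step, not part of the module; note that it also strips surrounding whitespace, which changes the keys of the eventual result.

```python
from typing import Iterable, List


def unique_identifiers(identifiers: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blanks, and remove duplicates, keeping order."""
    cleaned = (item.strip() for item in identifiers)
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(item for item in cleaned if item))


print(unique_identifiers(["aspirin", " caffeine ", "aspirin", "", "CCO"]))
```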
is_valid_identifier(identifier)

Check if an identifier can be resolved by the service

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Chemical identifier to test | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if identifier can be resolved, False otherwise |


Source code in src/provesid/resolver.py
def is_valid_identifier(self, identifier: str) -> bool:
    """
    Check if an identifier can be resolved by the service

    Args:
        identifier: Chemical identifier to test

    Returns:
        True if identifier can be resolved, False otherwise
    """
    try:
        # Try to get SMILES as a basic test
        self.resolve(identifier, 'smiles')
        return True
    except NCIResolverError:
        return False
search_by_partial_name(partial_name)

Look up a name and return the names the service lists for the matching compound.

Note: This is a basic implementation. The NCI service doesn't have a dedicated partial matching endpoint, so the string is resolved exactly as given, and an empty list is returned if the lookup fails.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| partial_name | str | Partial chemical name | required |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of matching names (may be empty) |


Source code in src/provesid/resolver.py
def search_by_partial_name(self, partial_name: str) -> List[str]:
    """
    Search for compounds by partial name match
    Note: This is a basic implementation - the NCI service doesn't have
    a dedicated partial matching endpoint, so this tries the exact name first.

    Args:
        partial_name: Partial chemical name

    Returns:
        List of matching names (may be empty)
    """
    try:
        names = self.resolve(partial_name, 'names')
        return [name.strip() for name in names.split('\n') if name.strip()]
    except NCIResolverError:
        return []
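
The parsing step in the method above treats the `names` response as newline-separated text. The sketch below isolates that step and adds a client-side substring filter, since the service itself does no partial matching. Both `parse_names` and `filter_names` are hypothetical helpers written for illustration, not provesid functions:

```python
from typing import List


def parse_names(raw: str) -> List[str]:
    """Split a newline-separated names response into a clean list."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def filter_names(names: List[str], fragment: str) -> List[str]:
    """Keep only the names that contain the fragment, ignoring case."""
    needle = fragment.lower()
    return [name for name in names if needle in name.lower()]


sample = "ethanol\nethyl alcohol\n\n  Alcohol  \n"
names = parse_names(sample)
print(names)                           # ['ethanol', 'ethyl alcohol', 'Alcohol']
print(filter_names(names, "alcohol"))  # ['ethyl alcohol', 'Alcohol']
```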

Functions

nci_cas_to_mol(cas_rn)

Convert a CAS RN to a molecule data structure using the NCI web API

This function maintains compatibility with the original nci_cas_to_mol function while using the new NCIChemicalIdentifierResolver class.

Parameters:

Name    Type  Description          Default
cas_rn  str   CAS Registry Number  required

Returns:

Type            Description
Dict[str, Any]  Dictionary with molecular data

Source code in src/provesid/resolver.py
def nci_cas_to_mol(cas_rn: str) -> Dict[str, Any]:
    """
    Convert a CAS RN to a molecule data structure using the NCI web API

    This function maintains compatibility with the original nci_cas_to_mol function
    while using the new NCIChemicalIdentifierResolver class.

    Args:
        cas_rn: CAS Registry Number

    Returns:
        Dictionary with molecular data
    """
    resolver = NCIChemicalIdentifierResolver()
    return resolver.get_molecular_data(cas_rn)

nci_id_to_mol(identifier)

Convert any chemical identifier to a molecule data structure

Parameters:

Name        Type  Description                                           Default
identifier  str   Chemical identifier (CAS, name, SMILES, InChI, etc.)  required

Returns:

Type            Description
Dict[str, Any]  Dictionary with molecular data

Source code in src/provesid/resolver.py
def nci_id_to_mol(identifier: str) -> Dict[str, Any]:
    """
    Convert any chemical identifier to a molecule data structure

    Args:
        identifier: Chemical identifier (CAS, name, SMILES, InChI, etc.)

    Returns:
        Dictionary with molecular data
    """
    resolver = NCIChemicalIdentifierResolver()
    return resolver.get_molecular_data(identifier)

nci_resolver(input_value, output_type, timeout=30)

Simple resolver function for converting between identifier types

This function maintains compatibility with the original nci_resolver function.

Parameters:

Name         Type  Description                    Default
input_value  str   Input chemical identifier      required
output_type  str   Desired output representation  required
timeout      int   Request timeout in seconds     30

Returns:

Type           Description
Optional[str]  Resolved representation as string, None if failed

Source code in src/provesid/resolver.py
def nci_resolver(input_value: str, output_type: str, timeout: int = 30) -> Optional[str]:
    """
    Simple resolver function for converting between identifier types

    This function maintains compatibility with the original nci_resolver function.

    Args:
        input_value: Input chemical identifier
        output_type: Desired output representation
        timeout: Request timeout in seconds

    Returns:
        Resolved representation as string, None if failed
    """
    try:
        resolver = NCIChemicalIdentifierResolver(timeout=timeout)
        return resolver.resolve(input_value, output_type)
    except NCIResolverError:
        return None

nci_smiles_to_names(smiles)

Get chemical names for a SMILES string

Parameters:

Name    Type  Description    Default
smiles  str   SMILES string  required

Returns:

Type       Description
List[str]  List of chemical names

Source code in src/provesid/resolver.py
def nci_smiles_to_names(smiles: str) -> List[str]:
    """
    Get chemical names for a SMILES string

    Args:
        smiles: SMILES string

    Returns:
        List of chemical names
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        names_str = resolver.resolve(smiles, 'names')
        return [name.strip() for name in names_str.split('\n') if name.strip()]
    except NCIResolverError:
        return []

nci_name_to_smiles(name)

Convert chemical name to SMILES

Parameters:

Name  Type  Description    Default
name  str   Chemical name  required

Returns:

Type           Description
Optional[str]  SMILES string or None if not found

Source code in src/provesid/resolver.py
def nci_name_to_smiles(name: str) -> Optional[str]:
    """
    Convert chemical name to SMILES

    Args:
        name: Chemical name

    Returns:
        SMILES string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(name, 'smiles')
    except NCIResolverError:
        return None

nci_inchi_to_smiles(inchi)

Convert InChI to SMILES

Parameters:

Name   Type  Description   Default
inchi  str   InChI string  required

Returns:

Type           Description
Optional[str]  SMILES string or None if not found

Source code in src/provesid/resolver.py
def nci_inchi_to_smiles(inchi: str) -> Optional[str]:
    """
    Convert InChI to SMILES

    Args:
        inchi: InChI string

    Returns:
        SMILES string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(inchi, 'smiles')
    except NCIResolverError:
        return None

nci_cas_to_inchi(cas_rn)

Convert CAS Registry Number to Standard InChI

Parameters:

Name    Type  Description          Default
cas_rn  str   CAS Registry Number  required

Returns:

Type           Description
Optional[str]  Standard InChI string or None if not found

Source code in src/provesid/resolver.py
def nci_cas_to_inchi(cas_rn: str) -> Optional[str]:
    """
    Convert CAS Registry Number to Standard InChI

    Args:
        cas_rn: CAS Registry Number

    Returns:
        Standard InChI string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(cas_rn, 'stdinchi')
    except NCIResolverError:
        return None

nci_get_molecular_weight(identifier)

Get molecular weight for any chemical identifier

Parameters:

Name        Type  Description          Default
identifier  str   Chemical identifier  required

Returns:

Type             Description
Optional[float]  Molecular weight as float or None if not found

Source code in src/provesid/resolver.py
def nci_get_molecular_weight(identifier: str) -> Optional[float]:
    """
    Get molecular weight for any chemical identifier

    Args:
        identifier: Chemical identifier

    Returns:
        Molecular weight as float or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        mw_str = resolver.resolve(identifier, 'mw')
        return float(mw_str)
    except (NCIResolverError, ValueError):
        return None

nci_get_formula(identifier)

Get molecular formula for any chemical identifier

Parameters:

Name        Type  Description          Default
identifier  str   Chemical identifier  required

Returns:

Type           Description
Optional[str]  Molecular formula string or None if not found

Source code in src/provesid/resolver.py
def nci_get_formula(identifier: str) -> Optional[str]:
    """
    Get molecular formula for any chemical identifier

    Args:
        identifier: Chemical identifier

    Returns:
        Molecular formula string or None if not found
    """
    try:
        resolver = NCIChemicalIdentifierResolver()
        return resolver.resolve(identifier, 'formula')
    except NCIResolverError:
        return None

Quick Start

from provesid import NCIChemicalIdentifierResolver

# Initialize the resolver
resolver = NCIChemicalIdentifierResolver()

# Convert name to SMILES
smiles = resolver.resolve('aspirin', 'smiles')
print(f"Aspirin SMILES: {smiles}")

# Convert SMILES to InChI
inchi = resolver.resolve('CCO', 'stdinchi')  # Ethanol
print(f"Ethanol InChI: {inchi}")

# Get comprehensive molecular data
mol_data = resolver.get_molecular_data('caffeine')
print(f"Caffeine data: {mol_data}")

Supported Representations

The NCI Resolver supports conversion between numerous chemical identifier formats:

Structure Identifiers

  • smiles - Unique SMILES strings
  • stdinchi - Standard InChI identifiers
  • stdinchikey - Standard InChI Keys
  • ficts - NCI/CADD FICTS identifiers
  • ficus - NCI/CADD FICuS identifiers
  • uuuuu - NCI/CADD uuuuu identifiers
  • hashisy - CACTVS HASHISY hashcodes

Chemical Names and Properties

  • names - Chemical names list
  • iupac_name - IUPAC systematic names
  • cas - CAS Registry Numbers
  • mw - Molecular weight
  • formula - Molecular formula
  • exactmass - Exact molecular mass

Physical Properties

  • charge - Formal charge
  • h_bond_acceptor_count - Hydrogen bond acceptor count
  • h_bond_donor_count - Hydrogen bond donor count
  • rotor_count - Rotatable bond count
  • ring_count - Ring count

File Formats

  • sdf - Structure Data File format
  • image - Chemical structure images
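
Each representation name above is the last path segment of the URL scheme given in the class documentation, `https://cactus.nci.nih.gov/chemical/structure/{identifier}/{representation}`. The sketch below shows how such a URL could be assembled with the standard library. `build_resolver_url` is a hypothetical helper, and the choice to percent-encode the whole identifier is an assumption; this page does not show how provesid encodes identifiers internally.

```python
from urllib.parse import quote

BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_resolver_url(
    identifier: str, representation: str, base_url: str = BASE_URL
) -> str:
    """Assemble {base_url}/{identifier}/{representation}."""
    # Percent-encode the identifier so characters such as '#', '/' or spaces
    # cannot be mistaken for URL syntax (assumed encoding policy)
    return f"{base_url}/{quote(identifier, safe='')}/{representation}"


print(build_resolver_url("aspirin", "smiles"))
# https://cactus.nci.nih.gov/chemical/structure/aspirin/smiles
print(build_resolver_url("C#N", "stdinchikey"))
# https://cactus.nci.nih.gov/chemical/structure/C%23N/stdinchikey
```

Without encoding, the `#` in a SMILES triple bond would be read as the start of a URL fragment and the rest of the path would never reach the server.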

Core Methods

Basic Resolution

# Single identifier conversion
smiles = resolver.resolve('aspirin', 'smiles')
inchi = resolver.resolve('50-78-2', 'stdinchi')  # CAS to InChI
formula = resolver.resolve('CCO', 'formula')     # SMILES to formula

Multiple Representations

# Get multiple representations at once
representations = ['smiles', 'stdinchi', 'mw', 'formula']
results = resolver.resolve_multiple('caffeine', representations)

print(results)
# {
#     'smiles': 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C',
#     'stdinchi': 'InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3',
#     'mw': '194.1906',
#     'formula': 'C8H10N4O2'
# }
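
In the output above every value is a string, including `mw`. A small post-processing step can convert the numeric fields before they are used in calculations. This is a sketch only: `coerce_numeric` is a hypothetical helper, and the set of keys treated as numeric is an assumption based on the representation names listed earlier.

```python
from typing import Any, Dict, Iterable

# Assumption: these representations come back as numeric text
NUMERIC_KEYS = ("mw", "exactmass")


def coerce_numeric(
    results: Dict[str, Any], numeric_keys: Iterable[str] = NUMERIC_KEYS
) -> Dict[str, Any]:
    """Return a copy of results with the numeric fields converted to float."""
    converted = dict(results)
    for key in numeric_keys:
        value = converted.get(key)
        if isinstance(value, str):
            try:
                converted[key] = float(value)
            except ValueError:
                converted[key] = None  # unparseable text becomes None
    return converted


raw = {"mw": "194.1906", "formula": "C8H10N4O2"}
print(coerce_numeric(raw))  # {'mw': 194.1906, 'formula': 'C8H10N4O2'}
```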

Comprehensive Molecular Data

# Get extensive molecular information
mol_data = resolver.get_molecular_data('aspirin')

# Access individual properties
print(f"SMILES: {mol_data['smiles']}")
print(f"Formula: {mol_data['formula']}")
print(f"Molecular Weight: {mol_data['mw']}")
print(f"Names: {mol_data['names']}")
print(f"CAS Number: {mol_data['cas']}")

Convenience Functions

The module provides simple functions for common conversions:

CAS Number Conversions

from provesid.resolver import nci_cas_to_mol, nci_cas_to_inchi

# Convert CAS to comprehensive molecular data
mol_data = nci_cas_to_mol('50-78-2')  # Aspirin CAS
print(mol_data['formula'])  # C9H8O4

# Convert CAS to InChI
inchi = nci_cas_to_inchi('64-17-5')  # Ethanol CAS

Name-Based Conversions

from provesid.resolver import nci_name_to_smiles, nci_get_molecular_weight

# Convert name to SMILES
smiles = nci_name_to_smiles('caffeine')

# Get molecular weight from name
mw = nci_get_molecular_weight('water')
print(f"Water MW: {mw}")  # ~18.015

SMILES and InChI Conversions

from provesid.resolver import nci_smiles_to_names, nci_inchi_to_smiles

# Get names for a SMILES string
names = nci_smiles_to_names('CCO')  # Ethanol
print(names)  # ['ethanol', 'ethyl alcohol', ...]

# Convert InChI to SMILES
smiles = nci_inchi_to_smiles('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')

Batch Processing

Process Multiple Identifiers

# Batch resolve multiple compounds to SMILES
compounds = ['aspirin', 'caffeine', 'ibuprofen', 'water']
smiles_results = resolver.batch_resolve(compounds, 'smiles')

for compound, smiles in smiles_results.items():
    if smiles:
        print(f"{compound}: {smiles}")
    else:
        print(f"{compound}: Not found")

Validation and Filtering

# Check which identifiers are valid
test_compounds = ['aspirin', 'invalid_name_xyz', 'caffeine', 'fake_compound']
valid_compounds = []

for compound in test_compounds:
    if resolver.is_valid_identifier(compound):
        valid_compounds.append(compound)
        print(f"✓ {compound}")
    else:
        print(f"✗ {compound}")

print(f"Valid compounds: {valid_compounds}")

Working with Images

Generate Structure Images

# Get image URL for a compound
image_url = resolver.get_image_url('aspirin', image_format='png', width=400, height=400)
print(f"Aspirin structure image: {image_url}")

# Download image to file
success = resolver.download_image('caffeine', 'caffeine_structure.png')
if success:
    print("Image downloaded successfully")

Error Handling

The module provides specific exception types for different error conditions:

from provesid.resolver import (
    NCIResolverError, 
    NCIResolverNotFoundError, 
    NCIResolverTimeoutError
)

try:
    smiles = resolver.resolve('definitely_not_a_chemical', 'smiles')
except NCIResolverNotFoundError:
    print("Chemical identifier not found")
except NCIResolverTimeoutError:
    print("Request timed out")
except NCIResolverError as e:
    print(f"General resolver error: {e}")
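
A timeout is often transient, so one common response is to retry with an increasing delay while letting other errors propagate. The sketch below is generic and does not import provesid: `call_with_retry` is a hypothetical helper, and the built-in `TimeoutError` stands in for `NCIResolverTimeoutError` so the example runs offline.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Stand-in for a flaky network call that succeeds on the third try
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

With the real resolver, the call would presumably look like `call_with_retry(lambda: resolver.resolve('aspirin', 'smiles'), retry_on=(NCIResolverTimeoutError,))`. Leaving `NCIResolverNotFoundError` out of `retry_on` avoids repeating lookups that cannot succeed.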

Advanced Usage

Custom Configuration

# Configure with custom settings
custom_resolver = NCIChemicalIdentifierResolver(
    base_url="https://cactus.nci.nih.gov/chemical/structure",
    timeout=60,  # Longer timeout
    pause_time=0.5  # Slower requests
)

# Use custom resolver
result = custom_resolver.resolve('aspirin', 'smiles')

Partial Name Searching

# Search for compounds by partial name
matches = resolver.search_by_partial_name('acetyl')
print(f"Found compounds with 'acetyl': {matches}")

Available Representations

# Get list of all available representations
representations = resolver.get_available_representations()
print(f"Available formats: {representations}")

Integration Examples

Combine with PubChem Data

from provesid import PubChemAPI, NCIChemicalIdentifierResolver

def cross_validate_identifiers(compound_name):
    """Validate identifier across multiple services"""
    resolver = NCIChemicalIdentifierResolver()
    api = PubChemAPI()

    # Get SMILES from NCI
    nci_smiles = resolver.resolve(compound_name, 'smiles')

    # Get compound from PubChem using the same name
    pubchem_cids = api.get_cids_by_name(compound_name)

    if pubchem_cids and nci_smiles:
        # Get PubChem SMILES for comparison
        cid = pubchem_cids['IdentifierList']['CID'][0]
        pubchem_props = api.get_compound_properties([cid], ['ConnectivitySMILES'])
        pubchem_smiles = pubchem_props['PropertyTable']['Properties'][0]['ConnectivitySMILES']

        return {
            'name': compound_name,
            'nci_smiles': nci_smiles,
            'pubchem_smiles': pubchem_smiles,
            'pubchem_cid': cid,
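            # Plain string equality: two valid SMILES for the same structure
            # can be written differently, so False does not prove a mismatch
            'match': nci_smiles == pubchem_smiles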
        }

    return None

# Cross-validate aspirin
validation = cross_validate_identifiers('aspirin')
print(validation)
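
Comparing SMILES strings character by character can report a mismatch for identical structures, because different toolkits write the same molecule differently. A more robust check is to compare Standard InChIKeys, which the NCI service offers as `stdinchikey`. The sketch below works on plain strings and makes no API calls. `compare_inchikeys` is a hypothetical helper, and stripping an `InChIKey=` prefix is a defensive assumption about how a service might format its response.

```python
def normalize_inchikey(key: str) -> str:
    """Strip whitespace and an optional 'InChIKey=' prefix."""
    key = key.strip()
    prefix = "InChIKey="
    if key.startswith(prefix):
        key = key[len(prefix):]
    return key.upper()


def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify two InChIKeys as identical, same_connectivity or different."""
    a, b = normalize_inchikey(key_a), normalize_inchikey(key_b)
    if a == b:
        return "identical"
    # The first 14-character block hashes the connectivity layer only
    if a.split("-")[0] == b.split("-")[0]:
        return "same_connectivity"
    return "different"


aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(compare_inchikeys("InChIKey=" + aspirin, aspirin))  # identical
print(compare_inchikeys("QNAYBMKLOCPYGJ-REOHSRBNSA-N",
                        "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"))   # same_connectivity
```

The `same_connectivity` result covers cases such as one service returning a specific stereoisomer and the other the stereo-unspecified form.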

Data Pipeline Integration

def chemical_identifier_pipeline(identifiers, target_format='smiles'):
    """Process multiple identifiers through resolution pipeline"""
    resolver = NCIChemicalIdentifierResolver()
    results = []

    for identifier in identifiers:
        try:
            # Attempt resolution
            result = resolver.resolve(identifier, target_format)

            # Get additional data if successful
            mol_data = resolver.get_molecular_data(identifier)

            results.append({
                'input': identifier,
                'output': result,
                'formula': mol_data.get('formula'),
                'mw': mol_data.get('mw'),
                'status': 'success'
            })

        except Exception as e:
            results.append({
                'input': identifier,
                'output': None,
                'formula': None,
                'mw': None,
                'status': f'error: {str(e)}'
            })

    return results

# Process a mixed list of identifiers
identifiers = ['aspirin', 'CCO', '50-78-2', 'invalid_name']
pipeline_results = chemical_identifier_pipeline(identifiers)

Performance Considerations

Rate Limiting

The resolver includes automatic rate limiting to respect server limits:

# Default rate limiting (3 requests per second)
resolver = NCIChemicalIdentifierResolver()

# Slower rate for large batch jobs
slow_resolver = NCIChemicalIdentifierResolver(pause_time=1.0)

# Faster rate for development (use with caution)
fast_resolver = NCIChemicalIdentifierResolver(pause_time=0.05)
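
A pause between requests translates directly into a maximum request rate: a pause of about 0.33 s allows roughly 3 requests per second, and `pause_time=1.0` allows one. How provesid applies `pause_time` internally is not shown on this page; the sketch below illustrates the general idea with a hypothetical `PauseLimiter` class and a fake clock so that it runs instantly.

```python
import time
from typing import Callable, Optional


class PauseLimiter:
    """Enforce a minimum pause between consecutive calls to wait()."""

    def __init__(
        self,
        pause_time: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.pause_time = pause_time
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.pause_time - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


class FakeClock:
    """Deterministic clock for the demonstration: sleeping advances time."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
limiter = PauseLimiter(0.5, clock=fake.time, sleep=fake.sleep)
first = limiter.wait()   # no previous call, so no sleep
second = limiter.wait()  # called immediately, so it sleeps the full pause
print(first, second)     # 0.0 0.5
```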

Caching Strategies

Implement caching for frequently accessed data:

import json
from pathlib import Path

from provesid import NCIChemicalIdentifierResolver

class CachedResolver:
    def __init__(self, cache_file='resolver_cache.json'):
        self.resolver = NCIChemicalIdentifierResolver()
        self.cache_file = Path(cache_file)
        self.cache = self._load_cache()

    def _load_cache(self):
        if self.cache_file.exists():
            with open(self.cache_file, 'r') as f:
                return json.load(f)
        return {}

    def _save_cache(self):
        with open(self.cache_file, 'w') as f:
            json.dump(self.cache, f, indent=2)

    def resolve(self, identifier, representation):
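        # Representation first, followed by a separator that no representation
        # name contains, so two different lookups cannot share a key
        cache_key = f"{representation}|{identifier}"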

        if cache_key in self.cache:
            return self.cache[cache_key]

        # Resolve and cache
        result = self.resolver.resolve(identifier, representation)
        self.cache[cache_key] = result
        self._save_cache()

        return result

# Use cached resolver
cached = CachedResolver()
smiles = cached.resolve('aspirin', 'smiles')  # First call - API request
smiles2 = cached.resolve('aspirin', 'smiles')  # Second call - cached
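
When persistence across runs is not needed, an in-memory cache from the standard library is a lighter alternative to a JSON file. The sketch below wraps any `resolve(identifier, representation)` callable in `functools.lru_cache`. `make_cached` is a hypothetical helper, and `fake_resolve` stands in for the real resolver so the example runs offline.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached(resolve: Callable[[str, str], str], maxsize: int = 1024):
    """Wrap a resolve(identifier, representation) callable in an LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(identifier: str, representation: str) -> str:
        return resolve(identifier, representation)

    return cached


# Stand-in resolver that records each real call
call_log: List[Tuple[str, str]] = []


def fake_resolve(identifier: str, representation: str) -> str:
    call_log.append((identifier, representation))
    return f"{representation}:{identifier}"


cached_resolve = make_cached(fake_resolve)
cached_resolve("aspirin", "smiles")  # first call reaches the resolver
cached_resolve("aspirin", "smiles")  # second call is served from the cache
print(len(call_log))                     # 1
print(cached_resolve.cache_info().hits)  # 1
```

With the real class this would presumably be `make_cached(resolver.resolve)`. `lru_cache` does not store exceptions, so an identifier that failed once is looked up again on the next call.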

See Also