Batch Methods Implementation Notes¶
Summary¶
Three batch methods have been implemented for the PyCompTox Chemical Search API:
search_by_exact_batch_values(values)- Batch search by exact valuessearch_ms_ready_by_mass_range_batch(masses, error)- Batch search by mass rangessearch_ms_ready_by_dtxcid_batch(dtxcids)- Batch search MS-ready by DTXCIDs
Implementation Details¶
1. search_by_exact_batch_values¶
Endpoint: POST /chemical/search/equal/
Input Format:
- List of strings (chemical names, DTXSIDs, CAS numbers, etc.)
- Maximum 200 values per batch
- Values are joined with \\n separator
Current Behavior: - The API appears to treat the batch as a single search string - May need further investigation on the correct format
Usage:
values = ["DTXSID7020182", "Bisphenol A", "80-05-7"]
results = client.search_by_exact_batch_values(values)
2. search_ms_ready_by_mass_range_batch¶
Endpoint: POST /chemical/msready/search/by-mass/
Input Format:
Output Format: - Dictionary with mass values as keys - Each key maps to a list of DTXSID identifiers
Usage:
masses = [200.9, 201.0, 201.1]
results = client.search_ms_ready_by_mass_range_batch(masses, error=0.01)
# Returns: {"200.9": ["DTXSID1...", ...], "201.0": [...], ...}
3. search_ms_ready_by_dtxcid_batch¶
Endpoint: POST /chemical/msready/search/by-dtxcid/
Input Format: - List of DTXCID strings
Output Format: - List of MS-ready DTXSID identifiers - The API returns all MS-ready forms for all input DTXCIDs
Usage:
dtxcids = ["DTXCID30182", "DTXCID20182"]
results = client.search_ms_ready_by_dtxcid_batch(dtxcids)
# Returns: ["DTXSID1...", "DTXSID2...", ...]
Testing Results¶
Test 1: Batch Exact Search¶
- ✓ Method implemented and working
- ⚠ API behavior may not match documentation
- The API appears to concatenate values and search as one string
Test 2: Mass Range Batch¶
- ✓ Implemented correctly
- ✓ Returns dictionary with mass keys
- ✓ Each mass maps to list of DTXSIDs
Test 3: DTXCID Batch¶
- ✓ Implemented correctly
- ✓ Returns flat list of all MS-ready forms
- ✓ Successfully found 24 MS-ready forms for 3 input DTXCIDs
Known Issues¶
- search_by_exact_batch_values: The API may not support true batch searching as documented. It appears to treat the newline-separated string as a single search query. This may require:
- Further API documentation review
- Contact with API maintainers
- Alternative implementation approach
Recommendations¶
- For search_by_exact_batch_values: Consider implementing a wrapper that:
- Calls the single-value search endpoint for each value
- Aggregates results
-
Applies rate limiting between calls
-
Rate Limiting: When using batch methods, consider setting appropriate delays:
-
Error Handling: Implement retry logic for batch operations
-
Validation: All three methods include input validation for empty lists
Future Enhancements¶
- Add retry logic for failed batch operations
- Implement chunking for large batches
- Add progress callbacks for long-running batch operations
- Consider async implementation for better performance
- Add result caching to avoid duplicate API calls
Code Location¶
- Implementation:
src/pycomptox/search.py - Tests:
tests/test_batch_methods.py - Lines: ~470-600 in search.py