Skip to content

Advanced Caching in PROVESID

PROVESID now features an advanced caching system with unlimited storage, persistent caching across sessions, size monitoring, and import/export functionality.

Key Features

🚀 Unlimited Caching

  • No more 512-entry limits: Cache as many API calls as you need
  • Persistent storage: Cache survives restarts and reinstalls
  • Automatic: Zero configuration required - just import and use

📊 Size Monitoring

  • Smart warnings: Get notified when cache exceeds 5GB (configurable)
  • Size tracking: Monitor cache size in bytes, MB, and GB
  • File counting: Track number of cached entries

💾 Export/Import

  • Backup your cache: Export to pickle or JSON files
  • Share cache files: Import cache data from shared files
  • Team collaboration: Share expensive API results with team members
  • Offline mode: Use cached data when APIs are down

🛠️ Cache Management

  • Clear when needed: Remove all cached data
  • Get statistics: Detailed cache information
  • Configure warnings: Adjust size thresholds
  • Enable/disable monitoring: Control warning behavior

Quick Start

import provesid

# All APIs now use unlimited caching automatically
pubchem_api = provesid.PubChemAPI()
nci_resolver = provesid.NCIChemicalIdentifierResolver()
cas_api = provesid.CASCommonChem()

# All API calls are cached forever
result1 = pubchem_api.get_compound_by_cid(2244)  # Cached
result2 = nci_resolver.resolve('aspirin', 'smiles')  # Cached  
result3 = cas_api.cas_to_detail('50-00-0')  # Cached

# Cache management works across all APIs
info = provesid.get_cache_info()
print(f"Cache size: {info['total_size_mb']:.2f} MB")

# Export your valuable cache (includes all API data)
provesid.export_cache('my_research_cache.pkl')

# Import shared cache (works for all APIs)
provesid.import_cache('shared_cache.pkl')

# Clear when needed (clears all API caches)
provesid.clear_cache()

Migration from Previous Versions

Before (limited cache):

api = PubChemAPI(cache_size=512)  # Limited to 512 entries

Now (unlimited cache):

api = PubChemAPI()  # Unlimited cache automatically
# No cache_size parameter needed!

Your existing code will work unchanged, but now with unlimited caching!

Cache Functions Reference

provesid.get_cache_info() -> dict

Get comprehensive cache statistics:

info = provesid.get_cache_info()
print(info)
# {
#     'cache_directory': '/path/to/cache',
#     'memory_entries': 42,
#     'disk_entries': 42, 
#     'total_size_bytes': 1048576,
#     'total_size_mb': 1.0,
#     'total_size_gb': 0.001,
#     'file_count': 42,
#     'warning_threshold_gb': 5.0,
#     'warnings_enabled': True
# }

provesid.get_cache_size() -> dict

Get detailed size information:

size = provesid.get_cache_size()
print(f"Cache: {size['mb']:.2f} MB ({size['files']} files)")

provesid.export_cache(path, format='pickle') -> bool

Export cache to file:

# Export as pickle (recommended)
success = provesid.export_cache('cache_backup.pkl')

# Export as JSON (human-readable, but limited data types)
success = provesid.export_cache('cache_backup.json', format='json')

provesid.import_cache(path, merge=True) -> bool

Import cache from file:

# Merge with existing cache
success = provesid.import_cache('cache_backup.pkl')

# Replace existing cache
success = provesid.import_cache('cache_backup.pkl', merge=False)

provesid.clear_cache()

Clear all cached data:

provesid.clear_cache()

provesid.set_cache_warning_threshold(size_gb)

Set size warning threshold:

# Warn when cache exceeds 10 GB
provesid.set_cache_warning_threshold(10.0)

provesid.enable_cache_warnings(enabled=True)

Enable/disable size warnings:

# Disable warnings
provesid.enable_cache_warnings(False)

# Re-enable warnings  
provesid.enable_cache_warnings(True)

Use Cases

1. Long Research Projects

# Start your research project
api = provesid.PubChemAPI()

# Make expensive API calls - all cached automatically
for compound in my_compound_list:
    data = api.get_compound_by_cid(compound)
    properties = api.get_compound_properties(compound, ['MolecularWeight', 'LogP'])

# Export cache at end of day
provesid.export_cache('research_day1.pkl')

# Next day: import and continue
provesid.import_cache('research_day1.pkl')
# All previous calls are cached!

2. Team Collaboration

# Team member 1: Gather data from multiple APIs
pubchem_api = provesid.PubChemAPI()
nci_resolver = provesid.NCIChemicalIdentifierResolver()

for cid in expensive_compound_list:
    pubchem_api.get_compound_by_cid(cid)

for cas in cas_number_list:
    nci_resolver.get_molecular_data(cas)

# Share the cache (includes data from all APIs)
provesid.export_cache('team_shared_cache.pkl')

# Team member 2: Use shared data
provesid.import_cache('team_shared_cache.pkl')
# Instant access to all the data without API calls!

3. Offline Development

# When online: gather data
api = provesid.PubChemAPI()
test_data = [api.get_compound_by_cid(cid) for cid in test_compounds]
provesid.export_cache('offline_cache.pkl')

# When offline: use cached data
provesid.import_cache('offline_cache.pkl')
api = provesid.PubChemAPI()
# All test compounds available from cache
result = api.get_compound_by_cid(2244)  # Works offline!

4. Cache Monitoring

import provesid

# Monitor cache size during processing
def process_compounds(compound_list):
    api = provesid.PubChemAPI()

    for i, compound in enumerate(compound_list):
        result = api.get_compound_by_cid(compound)

        # Check cache size every 100 compounds
        if i % 100 == 0:
            size = provesid.get_cache_size()
            print(f"Processed {i} compounds, cache: {size['mb']:.1f} MB")

            # Export backup every 1000 compounds
            if i % 1000 == 0 and i > 0:
                provesid.export_cache(f'backup_{i}.pkl')

Performance Benefits

Before (Limited Cache)

First 512 calls: Fast (cached)
Call 513+: Slow (cache full, LRU eviction)
After restart: Slow (cache lost)

Now (Unlimited Cache)

All calls: Fast after first time
After restart: Fast (persistent storage)
Across sessions: Fast (cache preserved)
Team sharing: Instant (import cache)

Cache Storage Location

Cache files are stored in: - Windows: %TEMP%\provesid_cache\ - macOS/Linux: /tmp/provesid_cache/

Each cached API call is stored as a separate file with metadata tracking.

Best Practices

1. Regular Backups

# Export cache regularly during long-running processes
if batch_count % 10 == 0:
    provesid.export_cache(f'backup_batch_{batch_count}.pkl')

2. Share Team Caches

# At end of data collection phase
provesid.export_cache('project_phase1_cache.pkl')
# Share this file with team members

3. Monitor Size

# Check cache size for large projects
size = provesid.get_cache_size()
if size['gb'] > 2.0:
    print(f"Large cache: {size['gb']:.2f} GB - consider archiving")

4. Clean Up When Done

# Clear cache when switching projects
provesid.clear_cache()

Troubleshooting

Cache Warnings

If you see cache size warnings:

# Option 1: Increase threshold
provesid.set_cache_warning_threshold(10.0)  # 10 GB

# Option 2: Export and clear
provesid.export_cache('archive.pkl')
provesid.clear_cache()

# Option 3: Disable warnings
provesid.enable_cache_warnings(False)

Import/Export Failures

# Always check return values
success = provesid.export_cache('backup.pkl')
if not success:
    print("Export failed - check disk space and permissions")

success = provesid.import_cache('backup.pkl') 
if not success:
    print("Import failed - check file exists and is valid")

Cache Location Issues

# Check cache location
info = provesid.get_cache_info()
print(f"Cache directory: {info['cache_directory']}")

# Verify directory is writable
import os
cache_dir = info['cache_directory']
print(f"Directory writable: {os.access(cache_dir, os.W_OK)}")

Technical Details

Cache Implementation

  • Storage: Pickle serialization for Python objects
  • Indexing: SHA256 hashes of function calls + arguments
  • Metadata: JSON tracking for size and timestamps
  • Memory: LRU memory cache backed by persistent disk storage

Security Considerations

  • Cache files contain API response data
  • Exported cache files are not encrypted
  • Consider security when sharing cache files
  • Cache directory permissions follow system defaults

Performance

  • Memory: Fast lookup for recently accessed items
  • Disk: Automatic persistence with minimal overhead
  • Network: Eliminates repeated API calls entirely
  • Startup: Quick loading of existing cache metadata