Advanced Caching in PROVESID¶

PROVESID now features an advanced caching system with unlimited storage, persistent caching across sessions, size monitoring, and import/export functionality.

Key Features¶

🚀 Unlimited Caching¶

No more 512-entry limits: Cache as many API calls as you need
Persistent storage: Cache survives restarts and reinstalls
Automatic: Zero configuration required - just import and use

📊 Size Monitoring¶

Smart warnings: Get notified when cache exceeds 5GB (configurable)
Size tracking: Monitor cache size in bytes, MB, and GB
File counting: Track number of cached entries

💾 Export/Import¶

Backup your cache: Export to pickle or JSON files
Share cache files: Import cache data from shared files
Team collaboration: Share expensive API results with team members
Offline mode: Use cached data when APIs are down

🛠️ Cache Management¶

Clear when needed: Remove all cached data
Get statistics: Detailed cache information
Configure warnings: Adjust size thresholds
Enable/disable monitoring: Control warning behavior

Quick Start¶

import provesid

# All APIs now use unlimited caching automatically
pubchem_api = provesid.PubChemAPI()
nci_resolver = provesid.NCIChemicalIdentifierResolver()
cas_api = provesid.CASCommonChem()

# All API calls are cached forever
result1 = pubchem_api.get_compound_by_cid(2244)  # Cached
result2 = nci_resolver.resolve('aspirin', 'smiles')  # Cached  
result3 = cas_api.cas_to_detail('50-00-0')  # Cached

# Cache management works across all APIs
info = provesid.get_cache_info()
print(f"Cache size: {info['total_size_mb']:.2f} MB")

# Export your valuable cache (includes all API data)
provesid.export_cache('my_research_cache.pkl')

# Import shared cache (works for all APIs)
provesid.import_cache('shared_cache.pkl')

# Clear when needed (clears all API caches)
provesid.clear_cache()

Migration from Previous Versions¶

Before (limited cache):

api = PubChemAPI(cache_size=512)  # Limited to 512 entries

Now (unlimited cache):

api = PubChemAPI()  # Unlimited cache automatically
# No cache_size parameter needed!

Your existing code will work unchanged, but now with unlimited caching!

Cache Functions Reference¶

`provesid.get_cache_info() -> dict`¶

Get comprehensive cache statistics:

info = provesid.get_cache_info()
print(info)
# {
#     'cache_directory': '/path/to/cache',
#     'memory_entries': 42,
#     'disk_entries': 42, 
#     'total_size_bytes': 1048576,
#     'total_size_mb': 1.0,
#     'total_size_gb': 0.001,
#     'file_count': 42,
#     'warning_threshold_gb': 5.0,
#     'warnings_enabled': True
# }

`provesid.get_cache_size() -> dict`¶

Get detailed size information:

size = provesid.get_cache_size()
print(f"Cache: {size['mb']:.2f} MB ({size['files']} files)")

`provesid.export_cache(path, format='pickle') -> bool`¶

Export cache to file:

# Export as pickle (recommended)
success = provesid.export_cache('cache_backup.pkl')

# Export as JSON (human-readable, but limited data types)
success = provesid.export_cache('cache_backup.json', format='json')

`provesid.import_cache(path, merge=True) -> bool`¶

Import cache from file:

# Merge with existing cache
success = provesid.import_cache('cache_backup.pkl')

# Replace existing cache
success = provesid.import_cache('cache_backup.pkl', merge=False)

`provesid.clear_cache()`¶

Clear all cached data:

provesid.clear_cache()

`provesid.set_cache_warning_threshold(size_gb)`¶

Set size warning threshold:

# Warn when cache exceeds 10 GB
provesid.set_cache_warning_threshold(10.0)

`provesid.enable_cache_warnings(enabled=True)`¶

Enable/disable size warnings:

# Disable warnings
provesid.enable_cache_warnings(False)

# Re-enable warnings  
provesid.enable_cache_warnings(True)

Use Cases¶

1. Long Research Projects¶

# Start your research project
api = provesid.PubChemAPI()

# Make expensive API calls - all cached automatically
for compound in my_compound_list:
    data = api.get_compound_by_cid(compound)
    properties = api.get_compound_properties(compound, ['MolecularWeight', 'LogP'])

# Export cache at end of day
provesid.export_cache('research_day1.pkl')

# Next day: import and continue
provesid.import_cache('research_day1.pkl')
# All previous calls are cached!

2. Team Collaboration¶

# Team member 1: Gather data from multiple APIs
pubchem_api = provesid.PubChemAPI()
nci_resolver = provesid.NCIChemicalIdentifierResolver()

for cid in expensive_compound_list:
    pubchem_api.get_compound_by_cid(cid)

for cas in cas_number_list:
    nci_resolver.get_molecular_data(cas)

# Share the cache (includes data from all APIs)
provesid.export_cache('team_shared_cache.pkl')

# Team member 2: Use shared data
provesid.import_cache('team_shared_cache.pkl')
# Instant access to all the data without API calls!

3. Offline Development¶

# When online: gather data
api = provesid.PubChemAPI()
test_data = [api.get_compound_by_cid(cid) for cid in test_compounds]
provesid.export_cache('offline_cache.pkl')

# When offline: use cached data
provesid.import_cache('offline_cache.pkl')
api = provesid.PubChemAPI()
# All test compounds available from cache
result = api.get_compound_by_cid(2244)  # Works offline!

4. Cache Monitoring¶

import provesid

# Monitor cache size during processing
def process_compounds(compound_list):
    api = provesid.PubChemAPI()

    for i, compound in enumerate(compound_list):
        result = api.get_compound_by_cid(compound)

        # Check cache size every 100 compounds
        if i % 100 == 0:
            size = provesid.get_cache_size()
            print(f"Processed {i} compounds, cache: {size['mb']:.1f} MB")

            # Export backup every 1000 compounds
            if i % 1000 == 0 and i > 0:
                provesid.export_cache(f'backup_{i}.pkl')

Performance Benefits¶

Before (Limited Cache)¶

First 512 calls: Fast (cached)
Call 513+: Slow (cache full, LRU eviction)
After restart: Slow (cache lost)

Now (Unlimited Cache)¶

All calls: Fast after first time
After restart: Fast (persistent storage)
Across sessions: Fast (cache preserved)
Team sharing: Instant (import cache)

Cache Storage Location¶

Cache files are stored in: - Windows: %TEMP%\provesid_cache\ - macOS/Linux: /tmp/provesid_cache/

Each cached API call is stored as a separate file with metadata tracking.

Best Practices¶

1. Regular Backups¶

# Export cache regularly during long-running processes
if batch_count % 10 == 0:
    provesid.export_cache(f'backup_batch_{batch_count}.pkl')

# At end of data collection phase
provesid.export_cache('project_phase1_cache.pkl')
# Share this file with team members

3. Monitor Size¶

# Check cache size for large projects
size = provesid.get_cache_size()
if size['gb'] > 2.0:
    print(f"Large cache: {size['gb']:.2f} GB - consider archiving")

4. Clean Up When Done¶

# Clear cache when switching projects
provesid.clear_cache()

Troubleshooting¶

Cache Warnings¶

If you see cache size warnings:

# Option 1: Increase threshold
provesid.set_cache_warning_threshold(10.0)  # 10 GB

# Option 2: Export and clear
provesid.export_cache('archive.pkl')
provesid.clear_cache()

# Option 3: Disable warnings
provesid.enable_cache_warnings(False)

Import/Export Failures¶

# Always check return values
success = provesid.export_cache('backup.pkl')
if not success:
    print("Export failed - check disk space and permissions")

success = provesid.import_cache('backup.pkl') 
if not success:
    print("Import failed - check file exists and is valid")

Cache Location Issues¶

# Check cache location
info = provesid.get_cache_info()
print(f"Cache directory: {info['cache_directory']}")

# Verify directory is writable
import os
cache_dir = info['cache_directory']
print(f"Directory writable: {os.access(cache_dir, os.W_OK)}")

Technical Details¶

Cache Implementation¶

Storage: Pickle serialization for Python objects
Indexing: SHA256 hashes of function calls + arguments
Metadata: JSON tracking for size and timestamps
Memory: LRU memory cache backed by persistent disk storage

Security Considerations¶

Cache files contain API response data
Exported cache files are not encrypted
Consider security when sharing cache files
Cache directory permissions follow system defaults

Performance¶

Memory: Fast lookup for recently accessed items
Disk: Automatic persistence with minimal overhead
Network: Eliminates repeated API calls entirely
Startup: Quick loading of existing cache metadata