LiSSA implements a sophisticated caching system to improve performance and ensure reproducibility of results. The caching system consists of the following components:
- Cache Interface (`cache` package):
  - `Cache`: Core generic interface defining cache operations, parameterized by cache key type
  - `CacheKey`: Base interface for cache keys with JSON serialization support and local key generation
  - `CacheParameter`: Interface defining cache configuration and key creation logic
- Specialized Cache Keys:
  - `ClassifierCacheKey`: Cache key for classifier operations (model name, seed, temperature, mode, content)
  - `EmbeddingCacheKey`: Cache key for embedding operations (model name, content)
- Cache Parameters:
  - `ClassifierCacheParameter`: Configuration for classifier caches (model name, seed, temperature)
  - `EmbeddingCacheParameter`: Configuration for embedding caches (model name)
- Cache Implementations:
  - `LocalCache`: File-based cache implementation that stores data in JSON format
    - Implements dirty tracking to optimize writes
    - Automatically saves changes on shutdown
    - Supports atomic writes using temporary files
  - `RedisCache`: Redis-based cache implementation with fallback to local cache
    - Uses Redis for high-performance caching
    - Falls back to local cache if Redis is unavailable
    - Supports both string and object serialization
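The `LocalCache` behavior described above (dirty tracking, save on shutdown, atomic writes via a temporary file) can be sketched roughly as follows. All names in this snippet (`LocalCacheSketch`, `save`, the hand-rolled serialization) are illustrative stand-ins, not LiSSA's actual implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: a file-backed map with dirty tracking and atomic writes.
class LocalCacheSketch {
    private final Path file;
    private final Map<String, String> data = new LinkedHashMap<>();
    private boolean dirty = false; // only touch the disk if something changed

    LocalCacheSketch(Path file) {
        this.file = file;
        // A real implementation would load existing JSON here and register a
        // shutdown hook (Runtime.getRuntime().addShutdownHook(...)) to save on exit.
    }

    void put(String key, String value) {
        data.put(key, value);
        dirty = true;
    }

    String get(String key) {
        return data.get(key);
    }

    // Atomic write: serialize to a temporary file first, then move it over the
    // target, so a crash mid-write never leaves a half-written cache file.
    void save() {
        if (!dirty) {
            return; // dirty tracking: skip the write entirely if nothing changed
        }
        // Naive JSON rendering without escaping; a real cache uses a JSON library.
        StringBuilder json = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, String> e : data.entrySet()) {
            json.append(sep).append('"').append(e.getKey()).append("\": \"").append(e.getValue()).append('"');
            sep = ", ";
        }
        json.append("}");
        try {
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.writeString(tmp, json.toString(), StandardCharsets.UTF_8);
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            dirty = false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The temporary-file-plus-move pattern is what makes the write atomic: readers either see the complete old file or the complete new one, never a partial write.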
- Cache Management:
  - `CacheManager`: Central manager for cache instances
    - Manages cache directory configuration
    - Provides singleton access to cache instances
    - Handles cache creation and retrieval based on origin and cache parameters
    - Ensures cache uniqueness by validating parameters
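In sketch form, the manager's responsibilities look roughly like this. The method names and the string-typed "origin"/"parameters" are simplifications for illustration, not LiSSA's actual `CacheManager` API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a central cache manager.
final class CacheManagerSketch {
    private static final CacheManagerSketch INSTANCE = new CacheManagerSketch();

    private final Map<String, Map<String, String>> caches = new HashMap<>();
    private final Map<String, String> parametersByOrigin = new HashMap<>();

    private CacheManagerSketch() {}

    // Singleton access, mirroring "provides singleton access to cache instances".
    static CacheManagerSketch getInstance() {
        return INSTANCE;
    }

    // Creates the cache on first request and returns the same instance afterwards.
    // Requesting an existing origin with different parameters fails, which is the
    // "ensures cache uniqueness by validating parameters" behavior in sketch form.
    Map<String, String> getCache(String origin, String parameters) {
        String known = parametersByOrigin.putIfAbsent(origin, parameters);
        if (known != null && !known.equals(parameters)) {
            throw new IllegalStateException(
                    "Cache for '" + origin + "' already exists with different parameters");
        }
        return caches.computeIfAbsent(origin, o -> new HashMap<>());
    }
}
```

Failing fast on a parameter mismatch prevents two components from silently sharing one cache file under incompatible configurations.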
- Caching Usage

The caching system is used in several key components:

- Embedding Creators: Caches vector embeddings to avoid recalculating them
  - Uses `EmbeddingCacheParameter` to identify unique embedding configurations
  - Cache keys are automatically generated based on content using the model name
- Classifiers: Caches LLM responses for classification tasks
  - Uses `ClassifierCacheParameter` to identify unique classifier configurations
  - Cache keys include model name, seed, temperature, and content
- Preprocessors: Caches preprocessing results for text summarization and other operations
  - Uses `ClassifierCacheParameter` for LLM-based preprocessing
Cache keys uniquely identify cached items and consist of two parts:
- JSON Key: Serialized representation including all cache parameters (model, seed, temperature, content, mode)
- Local Key: Generated UUID-based key for in-memory identification and logging
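A minimal sketch of this two-part key, using the `ClassifierCacheKey` fields listed above. The field names, the JSON formatting, and the deterministic UUID derivation are assumptions for illustration; LiSSA's actual serialization and local-key generation may differ:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch of a classifier cache key with a JSON key and a UUID-based local key.
class ClassifierKeySketch {
    final String model;
    final int seed;
    final double temperature;
    final String mode;
    final String content;

    ClassifierKeySketch(String model, int seed, double temperature, String mode, String content) {
        this.model = model;
        this.seed = seed;
        this.temperature = temperature;
        this.mode = mode;
        this.content = content;
    }

    // JSON key: serialized representation of all cache parameters plus the content.
    String jsonKey() {
        return String.format(
                "{\"model\":\"%s\",\"seed\":%d,\"temperature\":%s,\"mode\":\"%s\",\"content\":\"%s\"}",
                model, seed, temperature, mode, content);
    }

    // Local key: a UUID derived from the JSON key, convenient for in-memory
    // identification and logging (here deterministic; a random UUID also works).
    String localKey() {
        return UUID.nameUUIDFromBytes(jsonKey().getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

Because every parameter is part of the JSON key, changing the seed or temperature produces a different key, which is what makes cached LLM results reproducible per configuration.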
Cache parameters define the configuration that makes a cache unique:
- `ClassifierCacheParameter`: Model name, seed, and temperature for reproducible LLM results
- `EmbeddingCacheParameter`: Model name only (embeddings are deterministic)
Parameters are used to:
- Generate unique cache file names (via the `parameters()` method)
- Create cache keys from content (via the `createCacheKey()` method)
- Validate cache consistency when retrieving existing caches
The `Cache` interface provides two API levels:

- String-based API (preferred): Pass content as a string; the cache handles key generation internally
  - `get(String key, Class<T> clazz)`
  - `put(String key, T value)`
  - `containsKey(String key)`
- Internal Key API (DO NOT USE): Direct cache key manipulation for special cases
  - `getViaInternalKey(K key, Class<T> clazz)`
  - `putViaInternalKey(K key, T value)`
  - Only use for backward compatibility or special handling scenarios
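The relationship between the two API levels can be sketched as follows: the string-based API derives the internal key itself and then delegates, so normal callers never construct keys. The hash-based key derivation below is a placeholder for the real `createCacheKey()` logic, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a cache exposing both a string-based API and an internal-key API.
class TwoLevelCacheSketch {
    private final Map<String, String> store = new HashMap<>();

    // Internal key API: direct key manipulation (discouraged for normal use).
    String getViaInternalKey(String internalKey) {
        return store.get(internalKey);
    }

    void putViaInternalKey(String internalKey, String value) {
        store.put(internalKey, value);
    }

    // Placeholder for createCacheKey(): a real key would also encode the
    // model name, seed, temperature, and mode, not just a content hash.
    private String toInternalKey(String content) {
        return Integer.toHexString(content.hashCode());
    }

    // String-based API (preferred): key generation happens inside the cache.
    String get(String content) {
        return getViaInternalKey(toInternalKey(content));
    }

    void put(String content, String value) {
        putViaInternalKey(toInternalKey(content), value);
    }

    boolean containsKey(String content) {
        return store.containsKey(toInternalKey(content));
    }
}
```

Keeping key construction inside the cache is what makes the string-based API safe: callers cannot accidentally build inconsistent keys for the same content.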
- Configuration

  ```jsonc
  {
    "cache_dir": "./cache/path" // Directory for cache storage
  }
  ```
- Redis Setup

  To use Redis for caching, you need to set up a Redis server. Here's a recommended Docker Compose configuration:

  ```yaml
  services:
    redis:
      image: redis/redis-stack:latest
      container_name: redis
      restart: unless-stopped
      ports:
        - "127.0.0.1:6379:6379" # Redis server port
        - "127.0.0.1:5540:8001" # RedisInsight web interface
      volumes:
        - ./redis_data:/data # Persistent storage
  ```
The Redis server will be available at `redis://localhost:6379`. You can also access the RedisInsight web interface at `http://localhost:5540` for monitoring and management.

To use Redis with LiSSA:

- Start the Redis server using Docker Compose
- The system will automatically use Redis if available
- If Redis is unavailable, it will fall back to local file-based caching (useful for replication packages)
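The fallback behavior can be sketched like this. The `Backend` interface stands in for a real Redis client, and the policy of sticking to the local cache after the first failure is an assumption for illustration, not necessarily how LiSSA's `RedisCache` behaves:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a cache backend (e.g. a Redis client or a local file cache).
interface Backend {
    String get(String key);
    void put(String key, String value);
}

// Sketch: try Redis first; on a connection failure, fall back to a local cache.
class FallbackCacheSketch implements Backend {
    private final Backend local = new Backend() {
        private final Map<String, String> map = new HashMap<>();
        public String get(String key) { return map.get(key); }
        public void put(String key, String value) { map.put(key, value); }
    };
    private Backend redis; // set to null once Redis is considered unavailable

    FallbackCacheSketch(Backend redis) {
        this.redis = redis;
    }

    private Backend backend() {
        return redis != null ? redis : local;
    }

    public String get(String key) {
        try {
            return backend().get(key);
        } catch (RuntimeException e) {
            redis = null; // assumed policy: stay on the local cache from now on
            return local.get(key);
        }
    }

    public void put(String key, String value) {
        try {
            backend().put(key, value);
        } catch (RuntimeException e) {
            redis = null;
            local.put(key, value);
        }
    }
}
```

Callers see one cache either way, which is why replication packages work without a Redis server at all.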
- Best Practices
  - Use the cache directory specified in the configuration
  - Clear the cache directory if you encounter issues
  - For production environments:
    - Use Redis for better performance
    - Configure Redis persistence for data durability
    - Monitor Redis memory usage
    - Set up Redis replication for high availability
  - Monitor cache size and implement cleanup strategies if needed