feat(orchestrator): add capacity limit and memory-pressure eviction to template mmap cache #2958
Code Review
Using unix.Sysinfo's Freeram to check available memory is incorrect: Freeram counts only completely unused memory, whereas MemAvailable also includes reclaimable memory such as page cache, so the check will cause premature cache evictions. Additionally, evicting entries in a tight loop without throttling will evict every cache entry, because system memory statistics are not updated instantaneously after an eviction.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f1a5a102c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…dd unit tests
- Replace unix.Sysinfo Freeram with /proc/meminfo MemAvailable to get true available memory (includes reclaimable page cache)
- Evict only one LRU entry per 1s tick instead of looping until threshold is met; mmap.Unmap is not instantaneous so the OS stats lag behind
- Add 6 unit tests covering WithCapacity LRU eviction, MemAvailable parser correctness/error handling, and pressure-eviction logic
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cab034cabc
Changes Summary
model.go
1. Add environment variable TEMPLATE_CACHE_MAX_ENTRIES (integer, default 0 = unlimited).
When a positive value is set, apply ttlcache.WithCapacity.
The cache evicts entries by LRU policy once the entry limit is reached.
2. Add environment variable TEMPLATE_CACHE_MIN_FREE_MEMORY_MB (int64, default 0 = disabled).
When enabled, start a background goroutine to poll host memory via unix.Sysinfo every 5 seconds.
Continuously evict LRU entries one by one until free memory is above the configured threshold.
cache.go
1. In NewCache, conditionally enable ttlcache.WithCapacity according to the new cache config.
2. In Cache.Start, launch the startMemoryPressureEviction goroutine if the memory threshold is non-zero.
Eviction logic: select the entry with the earliest ExpiresAt timestamp. Because the TTL resets on each access, that entry is also the least recently used one.
Stop eviction when memory pressure is relieved or the cache is empty.
Backward Compatibility
Both new environment variables default to 0, which preserves the original runtime behavior.
Existing cloud deployments require no extra configuration or code changes.