Add compact plugin for auto context compression #305

Open
GoDiao wants to merge 1 commit into algorithmicsuperintelligence:main from GoDiao:feature/compact-plugin

Conversation

@GoDiao GoDiao commented May 6, 2026

Summary

Closes #249

Adds a compact plugin that automatically compresses conversation context when it exceeds a configurable token budget, inspired by Claude Code's /compact mechanism.

How it works

  1. Token budget check - Estimates token count of the conversation; if below threshold, passes through unchanged (zero overhead)
  2. Context window detection - Tries to get context window from provider's /models endpoint (e.g. Ollama, vLLM expose context_length), then falls back to config
  3. Split regions - Older turns are compressed via LLM into a structured summary; recent N turns are preserved verbatim
  4. Structured summary - The LLM produces a summary with Scope, Key decisions, User preferences, Pending work, Key files, and Context
  5. Graceful fallback - If LLM compression fails, returns original query unchanged
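The five steps above can be condensed into a short sketch. This is a hypothetical illustration, not the plugin's actual code: the function names, the chars/4 token heuristic, and the stub summarizer are all assumptions.

```python
# Hypothetical sketch of the compact flow; the real plugin calls an LLM
# where summarize() is stubbed out below.

COMPACT_CONTEXT_WINDOW = 128_000
COMPACT_THRESHOLD = 0.75
COMPACT_KEEP_RECENT = 4

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Stand-in for the LLM call that produces the structured summary
    # (Scope, Key decisions, User preferences, Pending work, Key files, Context).
    return {"role": "system", "content": f"[summary of {len(messages)} turns]"}

def compact(messages,
            context_window=COMPACT_CONTEXT_WINDOW,
            threshold=COMPACT_THRESHOLD,
            keep_recent=COMPACT_KEEP_RECENT):
    # 1. Budget check: below the threshold, pass through unchanged.
    if estimate_tokens(messages) < threshold * context_window:
        return messages
    # 2. Split regions: summarize older turns, keep recent turns verbatim.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    try:
        return [summarize(older)] + recent
    except Exception:
        # 3. Graceful fallback: on LLM failure, return the original unchanged.
        return messages
```

The passthrough branch is what makes the plugin zero-overhead for short conversations: nothing is summarized until the estimate crosses `threshold * context_window`.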

Composability

Works with the & operator for pipeline composition:

compact&moa    -> compact first, then Mixture of Agents
compact&bon    -> compact first, then Best of N
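A minimal sketch of how `&` composition could dispatch plugins in sequence. The registry and the toy plugin bodies here are illustrative assumptions, not the project's dispatcher:

```python
# Toy plugin registry: each plugin maps a message list to a message list.
PLUGINS = {
    "compact": lambda msgs: msgs[-2:],  # stand-in: keep only the last 2 turns
    "moa":     lambda msgs: msgs + [{"role": "assistant", "content": "moa"}],
}

def run_pipeline(spec, messages):
    # "compact&moa" -> run compact first, feed its output to moa.
    for name in spec.split("&"):
        messages = PLUGINS[name](messages)
    return messages
```

Because compact runs first, downstream approaches like MoA or Best of N operate on the already-compressed context.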

Configuration

| Config | Env Var | Default | Description |
|---|---|---|---|
| `compact_context_window` | `COMPACT_CONTEXT_WINDOW` | `128000` | Max context tokens |
| `compact_threshold` | `COMPACT_THRESHOLD` | `0.75` | Trigger ratio (0.0-1.0) |
| `compact_keep_recent` | `COMPACT_KEEP_RECENT` | `4` | Turns to preserve verbatim |
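The config priority implied by the table (env var overrides the default, and a malformed env var falls back rather than crashing) can be sketched as follows; the variable names match the table, but the parsing helper is an assumption:

```python
import os

def get_setting(env_var, default, cast):
    raw = os.environ.get(env_var)
    if raw is None:
        return default  # env var unset: use the documented default
    try:
        return cast(raw)
    except ValueError:
        return default  # malformed env var: keep the safe default

context_window = get_setting("COMPACT_CONTEXT_WINDOW", 128000, int)
threshold = get_setting("COMPACT_THRESHOLD", 0.75, float)
keep_recent = get_setting("COMPACT_KEEP_RECENT", 4, int)
```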

Tests

  • 32 unit/integration tests in tests/test_compact_plugin.py
  • Plugin registration test added to tests/test_plugins.py
  • 50 tests passed (18 plugin + 32 compact)

End-to-end verification

Tested with a live API (MiMo):

  • 31-turn conversation (10,119 tokens) compressed to 580 tokens (94% reduction)
  • 29 older turns summarized, 2 recent turns preserved verbatim
  • Context window detection: provider /models returned 400, gracefully fell back to config default

Test plan

  • All existing plugin tests pass (18/18)
  • Unit tests for token estimation, conversation parsing, config priority
  • Integration tests for passthrough, compression, LLM failure fallback
  • Edge cases: summary tag extraction, malformed env vars, embedded tags
  • Manual end-to-end test with live API - plugin loads and executes correctly
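The "summary tag extraction" and "embedded tags" edge cases above could look roughly like this. The tag name and the greedy-match strategy are assumptions about the plugin's behavior:

```python
import re

def extract_summary(text):
    # Greedy match so an embedded </summary> inside the body does not
    # truncate the extraction; fall back to the raw text if no tags exist.
    m = re.search(r"<summary>(.*)</summary>", text, re.DOTALL)
    return m.group(1).strip() if m else text.strip()
```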

CLAassistant commented May 6, 2026

CLA assistant check
All committers have signed the CLA.


GoDiao commented May 7, 2026

Hi! The CI failures in conversation-logging-tests and integration-tests appear to be unrelated to this PR. They're failing due to HuggingFace authentication issues when trying to access the gated model google/gemma-3-270m-it:

openai.InternalServerError: Error code: 500 - You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/google/gemma-3-270m-it.

The unit-tests and test_plugins checks passed successfully. These failures seem to be a CI environment issue (missing HuggingFace token) rather than anything introduced by this change. Happy to fix if there's anything on our side though!


Development

Successfully merging this pull request may close these issues.

Add a compact plugin that does auto compact on context like claude code