Skip to content

[FEATURE] Conversation compaction to manage context limits #48

@jwesleye

Description

@jwesleye

Feature Description

Automatically compact or summarize older parts of long conversations to stay within context window limits while preserving important information.

Problem/Motivation

AWS Strands agents have context window limits (e.g., 200K tokens for Claude Sonnet 4.5). During long sessions:

  • Conversations can exceed the context limit
  • Users may lose access to older parts of the conversation
  • No warning when approaching limits
  • No automatic management of conversation length

Proposed Solution

Core Features

  1. Context monitoring - Track current conversation token usage
  2. Limit warnings - Alert users when approaching context limits (80%, 90%, 95%)
  3. Manual compaction - compact command to summarize older messages
  4. Auto-compaction - Optional automatic compaction when reaching threshold
  5. Smart summarization - Preserve key information while reducing token count

Implementation Options

Option A: Rolling Window

  • Keep last N messages in full detail
  • Summarize or remove older messages
  • Simple, predictable behavior

Option B: Intelligent Summarization

  • Use the agent itself to summarize old conversation chunks
  • Preserve important context (code, decisions, key facts)
  • More sophisticated but higher quality

Option C: Hybrid

  • Keep recent messages (last 20-30) in full
  • Summarize middle sections in chunks
  • Drop very old messages beyond threshold

User Experience

# During conversation - warning appears
⚠️  Context usage: 180K / 200K tokens (90%)
Consider using 'compact' command to summarize older messages.

# Manual compaction
You: compact
Compacting conversation (keeping last 30 messages, summarizing 50 older messages)...
✓ Reduced from 180K to 95K tokens (47% reduction)
✓ Preserved 30 recent messages + summary of earlier conversation

# View context status
You: context
Current usage: 95K / 200K tokens (47%)
Messages: 80 total (30 full + 1 summary block)
Oldest message: 2 hours ago

Configuration

# In ~/.chatrc
context:
  max_tokens: 200000          # Model's context limit
  warning_thresholds: [0.8, 0.9, 0.95]  # Show warnings at 80%, 90%, 95%
  auto_compact: false         # Enable automatic compaction
  auto_compact_threshold: 0.85  # Compact at 85% if auto enabled
  preserve_recent: 30         # Always keep last N messages in full
  compaction_method: hybrid   # rolling | summarize | hybrid

Benefits

  • ✅ Never hit context limits unexpectedly
  • ✅ Maintain usable conversation history
  • ✅ Clear visibility into context usage
  • ✅ User control over compaction strategy
  • ✅ Preserve important information

Related Commands

  • context - Show current context usage and statistics
  • compact - Manually trigger compaction
  • compact --preview - Preview what would be compacted
  • compact --method=<rolling|summarize> - Choose compaction strategy

Technical Considerations

Token Counting:

  • Need accurate token counting for current conversation
  • Use tiktoken or similar for Claude/GPT models
  • Track running total as conversation progresses

Summarization Quality:

  • Test different summarization prompts
  • Ensure key information preserved (code, decisions, facts)
  • Include metadata (timestamp, message count) in summaries

Backward Compatibility:

  • Make all features opt-in initially
  • Graceful degradation if token counting unavailable
  • Don't break existing sessions

Edge Cases:

  • Very first message after compaction (context reset)
  • Session resume after compaction
  • Saving/loading compacted sessions

Priority

  • Critical
  • High - Prevents frustrating context limit errors
  • Medium
  • Low

Dependencies

  • Token counting library (tiktoken, anthropic tokenizer)
  • Optional: summarization prompt engineering
  • Configuration system (already exists)

Testing Plan

  1. Test with conversations of various lengths
  2. Verify token counting accuracy
  3. Test summarization quality
  4. Ensure session save/resume works with compaction
  5. Performance test with very long conversations

Future Enhancements

  • Semantic chunking (group related messages)
  • Important message pinning (never compact)
  • Compaction history tracking
  • Export compacted conversations with summaries

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions