Feature Description
Automatically compact or summarize older parts of long conversations to stay within context window limits while preserving important information.
Problem/Motivation
AWS Strands agents have context window limits (e.g., 200K tokens for Claude Sonnet 4.5). During long sessions:
- Conversations can exceed the context limit
- Users may lose access to older parts of the conversation
- No warning when approaching limits
- No automatic management of conversation length
Proposed Solution
Core Features
- Context monitoring - Track current conversation token usage
- Limit warnings - Alert users when approaching context limits (80%, 90%, 95%)
- Manual compaction -
compact command to summarize older messages
- Auto-compaction - Optional automatic compaction when reaching threshold
- Smart summarization - Preserve key information while reducing token count
Implementation Options
Option A: Rolling Window
- Keep last N messages in full detail
- Summarize or remove older messages
- Simple, predictable behavior
Option B: Intelligent Summarization
- Use the agent itself to summarize old conversation chunks
- Preserve important context (code, decisions, key facts)
- More sophisticated but higher quality
Option C: Hybrid
- Keep recent messages (last 20-30) in full
- Summarize middle sections in chunks
- Drop very old messages beyond threshold
User Experience
# During conversation - warning appears
⚠️ Context usage: 180K / 200K tokens (90%)
Consider using 'compact' command to summarize older messages.
# Manual compaction
You: compact
Compacting conversation (keeping last 30 messages, summarizing 50 older messages)...
✓ Reduced from 180K to 95K tokens (47% reduction)
✓ Preserved 30 recent messages + summary of earlier conversation
# View context status
You: context
Current usage: 95K / 200K tokens (47%)
Messages: 80 total (30 full + 1 summary block)
Oldest message: 2 hours ago
Configuration
# In ~/.chatrc
context:
max_tokens: 200000 # Model's context limit
warning_thresholds: [0.8, 0.9, 0.95] # Show warnings at 80%, 90%, 95%
auto_compact: false # Enable automatic compaction
auto_compact_threshold: 0.85 # Compact at 85% if auto enabled
preserve_recent: 30 # Always keep last N messages in full
compaction_method: hybrid # rolling | summarize | hybrid
Benefits
- ✅ Never hit context limits unexpectedly
- ✅ Maintain usable conversation history
- ✅ Clear visibility into context usage
- ✅ User control over compaction strategy
- ✅ Preserve important information
Related Commands
context - Show current context usage and statistics
compact - Manually trigger compaction
compact --preview - Preview what would be compacted
compact --method=<rolling|summarize> - Choose compaction strategy
Technical Considerations
Token Counting:
- Need accurate token counting for current conversation
- Use tiktoken or similar for Claude/GPT models
- Track running total as conversation progresses
Summarization Quality:
- Test different summarization prompts
- Ensure key information preserved (code, decisions, facts)
- Include metadata (timestamp, message count) in summaries
Backward Compatibility:
- Make all features opt-in initially
- Graceful degradation if token counting unavailable
- Don't break existing sessions
Edge Cases:
- Very first message after compaction (context reset)
- Session resume after compaction
- Saving/loading compacted sessions
Priority
Dependencies
- Token counting library (tiktoken, anthropic tokenizer)
- Optional: summarization prompt engineering
- Configuration system (already exists)
Testing Plan
- Test with conversations of various lengths
- Verify token counting accuracy
- Test summarization quality
- Ensure session save/resume works with compaction
- Performance test with very long conversations
Future Enhancements
- Semantic chunking (group related messages)
- Important message pinning (never compact)
- Compaction history tracking
- Export compacted conversations with summaries
Feature Description
Automatically compact or summarize older parts of long conversations to stay within context window limits while preserving important information.
Problem/Motivation
AWS Strands agents have context window limits (e.g., 200K tokens for Claude Sonnet 4.5). During long sessions:
Proposed Solution
Core Features
compactcommand to summarize older messagesImplementation Options
Option A: Rolling Window
Option B: Intelligent Summarization
Option C: Hybrid
User Experience
Configuration
Benefits
Related Commands
context- Show current context usage and statisticscompact- Manually trigger compactioncompact --preview- Preview what would be compactedcompact --method=<rolling|summarize>- Choose compaction strategyTechnical Considerations
Token Counting:
Summarization Quality:
Backward Compatibility:
Edge Cases:
Priority
Dependencies
Testing Plan
Future Enhancements