
# ProxyMem: Cross-Session Memory

ProxyMem is a proxy-based memory layer that provides cross-session context persistence for LLM agents. By operating at the proxy layer, ProxyMem captures session data, generates structured summaries via LLM analysis, and enriches future requests with relevant historical context.

## Overview

ProxyMem operates through three main mechanisms:

1. **Session Capture**: Records session interactions as they pass through the proxy
2. **Session Analysis**: Uses a designated LLM to generate structured summaries after session completion
3. **Context Injection**: Enriches new session requests with relevant historical context from the database

## Benefits

- **Persistent Context**: Maintain awareness of prior work across sessions
- **Project-Scoped**: Memory is isolated per user and project
- **Automatic Summaries**: LLM-generated summaries capture goals, decisions, modified files, and remaining tasks
- **Privacy Controls**: Configurable redaction patterns and user/client deny lists

## Quick Start

### Enable Memory Feature

To use ProxyMem, you must first enable it globally:

```bash
# Enable the memory feature
python -m src.core.cli --memory-available

# Enable it with memory on by default for all sessions
python -m src.core.cli --memory-available --memory-default-enabled
```

### Interactive Commands

Once memory is globally available, users can control it per-session:

| Command | Description |
|---------|-------------|
| `!/memory-on` | Enable memory capture for the current session |
| `!/memory-off` | Disable memory capture for the current session |
| `!/memory-status` | Show current memory state for the session |
| `!/memory-requeue` | Force the current session to requeue for summary generation |

## Configuration

### CLI Arguments

| Flag | Description | Default |
|------|-------------|---------|
| `--memory-available` | Enable the ProxyMem feature globally | `false` |
| `--memory-default-enabled` | Enable memory by default for new sessions | `false` |
| `--memory-summary-model` | Model for generating summaries (`backend:model`) | `None` |
| `--memory-context-model` | Model for context retrieval (`backend:model`) | `None` |
| `--memory-summary-prompt` | Path to custom summary prompt file | `None` |
| `--memory-context-prompt` | Path to custom context prompt file | `None` |
| `--memory-database-path` | Path to SQLite database | `./var/memory.sqlite3` |
| `--memory-session-timeout` | Minutes of inactivity before session completion | `30` |
| `--memory-summarization-delay` | Seconds to wait before summary generation | `120` |
| `--memory-max-sessions-to-consider` | Max recent sessions to consider for context | `10` |
| `--memory-retention-days` | Days to retain session summaries | `90` |
| `--memory-max-context-tokens` | Maximum tokens for injected context | `2000` |
| `--memory-max-summary-tokens` | Max tokens for summary prompt context | `800` |
| `--memory-max-transcript-chars` | Max transcript length before chunking | `50000` |
| `--memory-summary-completion-tokens` | Max completion tokens for summary generation | `10000` |
| `--memory-context-relevance-threshold` | Minimum relevance score for context | `0.5` |
| `--memory-max-buffer-size-bytes` | Capture buffer size per session | `10485760` |
| `--memory-analysis-queue-maxsize` | Analysis queue max size | `100` |
| `--memory-analysis-timeout` | Summary generation timeout (seconds) | `30` |
| `--memory-max-concurrent-analyses` | Max concurrent analyses | `4` |
| `--memory-context-template` | Template for injected context | `None` |
| `--memory-single-user-mode` | Use a fixed user ID for all sessions | `false` |
| `--memory-fixed-user-id` | Fixed user ID (required with single-user mode) | `None` |
| `--memory-persist-transcript` | Persist transcripts for summaries | `false` |
| `--memory-summary-prompt-version` | Summary prompt version tag | `v1` |
| `--memory-summary-schema-version` | Summary schema version tag | `v1` |
| `--memory-require-project-discovery` | Require project discovery for injection | `true` |
| `--memory-allow-missing-project` | Allow injection without project root | `false` |
| `--memory-project-discovery-mode` | Project discovery mode | `any` |

### Environment Variables

All CLI flags can be set via environment variables with the `MEMORY_` prefix:

```bash
export MEMORY_AVAILABLE=true
export MEMORY_DEFAULT_ENABLED=true
export MEMORY_SUMMARY_MODEL=openai:gpt-4o-mini
export MEMORY_CONTEXT_MODEL=openai:gpt-4o-mini
export MEMORY_DATABASE_PATH=./var/memory.sqlite3
export MEMORY_SESSION_TIMEOUT_MINUTES=30
export MEMORY_RETENTION_DAYS=90
export MEMORY_MAX_CONTEXT_TOKENS=2000
export MEMORY_MAX_SUMMARY_TOKENS=800
export MEMORY_MAX_TRANSCRIPT_CHARS=50000
export MEMORY_SUMMARY_COMPLETION_TOKENS=10000
export MEMORY_CONTEXT_RELEVANCE_THRESHOLD=0.5
export MEMORY_PERSIST_TRANSCRIPT=false
export MEMORY_REQUIRE_PROJECT_DISCOVERY=true
export MEMORY_PROJECT_DISCOVERY_MODE=any
```

**Note:** `MEMORY_SESSION_TIMEOUT` is accepted as a legacy alias for `MEMORY_SESSION_TIMEOUT_MINUTES`.

### Configuration File

Add memory settings to your `config.yaml`:

```yaml
memory:
  available: true
  default_enabled: false
  summary_model: "openai:gpt-4o-mini"
  context_model: "openai:gpt-4o-mini"
  database_path: "./var/memory.sqlite3"
  session_timeout_minutes: 30
  retention_days: 90
  max_context_tokens: 2000
  max_sessions_to_consider: 10
  context_relevance_threshold: 0.5
```

### Configuration Precedence

Configuration values are resolved in the following order (highest priority first):

1. CLI arguments
2. Environment variables
3. Configuration file
4. Default values
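
The precedence order above can be sketched as a single resolver function. This is a minimal illustration, not the proxy's actual implementation; the function and argument names are hypothetical.

```python
import os

def resolve_setting(name, cli_args, config, default):
    """Resolve one setting using the precedence order above (hypothetical helper)."""
    if cli_args.get(name) is not None:              # 1. CLI argument
        return cli_args[name]
    env_value = os.environ.get("MEMORY_" + name.upper())
    if env_value is not None:                       # 2. environment variable
        return env_value
    memory_section = config.get("memory", {})
    if name in memory_section:                      # 3. configuration file
        return memory_section[name]
    return default                                  # 4. built-in default

# A CLI flag wins over an environment variable, which wins over the config file:
os.environ["MEMORY_RETENTION_DAYS"] = "30"
print(resolve_setting("retention_days", {"retention_days": 7}, {}, 90))               # 7
print(resolve_setting("retention_days", {}, {"memory": {"retention_days": 14}}, 90))  # 30
```

Note that values from environment variables arrive as strings; the real resolver would also coerce types per flag.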

## How It Works

### Session Capture

When memory is enabled for a session:

1. User prompts are captured as they pass through the proxy
2. Assistant responses are captured after receiving them from the backend
3. Interactions are buffered in memory (up to a configurable limit)
4. Metadata (model, tokens, timestamps) is recorded with each interaction
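
The buffering behavior can be sketched as follows. The class and field names are assumptions made for illustration; only the size limit (`--memory-max-buffer-size-bytes`) comes from the documented configuration.

```python
class SessionBuffer:
    """Sketch of a per-session capture buffer with a byte limit (hypothetical)."""

    def __init__(self, max_bytes=10_485_760):   # --memory-max-buffer-size-bytes
        self.max_bytes = max_bytes
        self.size = 0
        self.interactions = []
        self.truncated = False                  # session summarized as "partial" if set

    def append(self, role, content, **metadata):
        entry_bytes = len(content.encode("utf-8"))
        if self.size + entry_bytes > self.max_bytes:
            self.truncated = True               # over the limit: drop the interaction
            return False
        self.interactions.append({"role": role, "content": content, **metadata})
        self.size += entry_bytes
        return True

buf = SessionBuffer(max_bytes=20)
buf.append("user", "fix the login bug", model="openai:gpt-4o-mini")   # fits
buf.append("assistant", "Sure, looking at auth.py...")                 # dropped, buffer full
```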

### Session Completion

Sessions are marked complete when:

- **Timeout**: No activity for the configured timeout period (default: 30 minutes)
- **Explicit Close**: The client closes the connection

Upon completion, the session is queued for background analysis.

### Summary Generation

The summary generator:

1. Builds a transcript from captured interactions
2. Applies redaction patterns to remove sensitive data
3. Chunks large transcripts if they exceed limits
4. Calls the configured summary model with an XML-structured prompt
5. Validates the XML response against the schema
6. Parses and persists the summary to the database
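
The chunking in step 3 can be sketched like this. The helper name is hypothetical, and the real implementation may split on interaction boundaries rather than at fixed character offsets; only the `--memory-max-transcript-chars` limit comes from the documented configuration.

```python
def chunk_transcript(transcript, max_chars=50_000):
    """Split an over-long transcript into chunks (sketch; fixed-offset split)."""
    if len(transcript) <= max_chars:
        return [transcript]
    return [transcript[i:i + max_chars]
            for i in range(0, len(transcript), max_chars)]

# A 120k-character transcript becomes three chunks (50k + 50k + 20k):
print([len(c) for c in chunk_transcript("x" * 120_000)])  # [50000, 50000, 20000]
```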

Generated summaries include:

- **Title**: One-sentence description
- **Scope**: High-level area/component description
- **Goals**: Main objectives of the session
- **Key Decisions**: Important architectural or design decisions
- **Modified Files**: Files created, modified, or deleted
- **Git Operations**: Commits, branches, merges, etc.
- **Tests Run**: Test executions with pass/fail status
- **Errors**: Significant errors encountered
- **Remaining Tasks**: Open or blocked tasks
- **Completion Status**: `completed`, `partial`, or `abandoned`
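
For illustration only, a generated summary might look like the following. The element names and structure here are assumptions; the actual layout is defined by the configured summary schema version.

```xml
<session_summary schema_version="v1">
  <title>Added retry logic to the payment client</title>
  <scope>payments service, HTTP client layer</scope>
  <goals>
    <goal>Make outbound payment calls resilient to transient failures</goal>
  </goals>
  <key_decisions>
    <decision>Use exponential backoff with a three-attempt cap</decision>
  </key_decisions>
  <modified_files>
    <file action="modified">src/payments/client.py</file>
  </modified_files>
  <remaining_tasks>
    <task status="open">Add retry metrics to the dashboard</task>
  </remaining_tasks>
  <completion_status>partial</completion_status>
</session_summary>
```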

### Context Injection

For new sessions with memory enabled:

1. Recent session summaries are retrieved for the user and project
2. Summaries are scored by relevance to the current prompt
3. High-relevance context is formatted and injected
4. Context appears as a virtual message before the first user message

Context injection only occurs once per session (on the first request).
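
Step 4 amounts to prepending a message to the outgoing request. In this sketch the function name, the chosen role, and the wrapper text are assumptions; the real proxy formats the context with the template from `--memory-context-template`.

```python
def inject_context(messages, context_block):
    """Prepend retrieved historical context as a virtual message (sketch)."""
    if not context_block:
        return messages          # nothing relevant: request passes through unchanged
    virtual = {
        "role": "system",        # role choice is an assumption
        "content": f"Context from previous sessions:\n{context_block}",
    }
    return [virtual] + messages

messages = [{"role": "user", "content": "Continue the refactor we started yesterday"}]
enriched = inject_context(messages, "Yesterday: extracted AuthService from auth.py")
# enriched now has the virtual context message before the first user message
```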

## Privacy and Security

### Multi-User Isolation

- All memory operations require an authenticated `user_id`
- Summaries are scoped to `user_id` (and optionally `tenant_id`)
- Context retrieval only returns summaries for the requesting user
- Project scoping ensures cross-project isolation

### Redaction Patterns

Configure regex patterns to redact sensitive data:

```yaml
memory:
  redaction_patterns:
    - "(?i)(api[_-]?key|password|secret|token)\\s*[=:]\\s*['\"]?[^\\s'\"]*"
    - "Bearer\\s+[A-Za-z0-9_-]+"
```

Redaction is applied:

- Before calling the summary model
- Before persisting summaries
- Before logging any content
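
You can check what a pattern will match before deploying it. The sketch below applies the two example patterns with Python's `re` module; the placeholder string is an assumption.

```python
import re

# The two example patterns from the configuration above.
REDACTION_PATTERNS = [
    r"(?i)(api[_-]?key|password|secret|token)\s*[=:]\s*['\"]?[^\s'\"]*",
    r"Bearer\s+[A-Za-z0-9_-]+",
]

def redact(text, placeholder="[REDACTED]"):
    """Apply each pattern in turn (sketch; placeholder text is an assumption)."""
    for pattern in REDACTION_PATTERNS:
        text = re.sub(pattern, placeholder, text)
    return text

print(redact("Authorization: Bearer eyJhbGciOiJIUzI1NiJ9"))
# Authorization: [REDACTED]
```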

### User and Client Deny Lists

Block specific users or clients from memory:

```bash
# Via CLI
--memory-disable-user user123 --memory-disable-user user456
--memory-disable-client "Roo Cline"
```

```yaml
# Via config file
memory:
  disabled_users:
    - "user123"
    - "user456"
  disabled_clients:
    - "Roo Cline"
```

### Single-User Mode

For personal deployments, bypass user authentication:

```bash
python -m src.core.cli --memory-available --memory-single-user-mode --memory-fixed-user-id "personal"
```

## Database Schema

ProxyMem uses the proxy's unified database layer for storage. By default, this is SQLite, but PostgreSQL is also supported for production deployments. See Database Configuration for details.

The schema includes:

- **`session_summaries`**: Stores all session summary data
  - Indexed by `user_id`, `session_start`, and `project_id`
  - Full XML analysis preserved for debugging
- **`user_project_dirs`**: Maps user + `project_root` pairs to stable project IDs

The database is automatically created on first use, and migrations are applied automatically on startup.
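
For orientation, the two tables above might look roughly like the following SQLite sketch. The column names and index layout are assumptions; the actual schema and migrations are owned by the proxy's unified database layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative only: real columns/migrations are managed by the proxy.
    CREATE TABLE session_summaries (
        id            INTEGER PRIMARY KEY,
        user_id       TEXT NOT NULL,
        tenant_id     TEXT,
        project_id    TEXT,
        session_start TEXT,
        summary_xml   TEXT    -- full XML analysis, preserved for debugging
    );
    CREATE INDEX idx_summaries_lookup
        ON session_summaries (user_id, session_start, project_id);

    CREATE TABLE user_project_dirs (
        user_id      TEXT NOT NULL,
        project_root TEXT NOT NULL,
        project_id   TEXT NOT NULL,
        PRIMARY KEY (user_id, project_root)
    );
""")
```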

## Retention and Maintenance

- Sessions older than `retention_days` are automatically deleted
- Cleanup runs periodically (default: daily)
- Full transcripts are not persisted by default (only structured summaries are kept), unless `--memory-persist-transcript` is set

## Custom Prompts

### Summary Prompt

Override the default summary prompt:

```bash
--memory-summary-prompt ./config/prompts/custom_summary.md
```

Available template variables:

- `{session_transcript}` - Full session transcript
- `{session_id}`, `{user_id}`, `{tenant_id}`
- `{project_id}`, `{project_root}`
- `{model}`, `{branch}`, `{head_sha}`
- `{analysis_timestamp}`
- `{summary_schema_version}`, `{summary_prompt_version}`
- `{max_tokens}`

### Context Prompt

Override the default context prompt:

```bash
--memory-context-prompt ./config/prompts/custom_context.md
```

Available template variables:

- `{user_prompt}` - Current user message
- `{session_summaries}` - Formatted historical summaries
- `{user_id}`, `{tenant_id}`
- `{project_id}`, `{project_root}`
- `{max_tokens}`

## Troubleshooting

### Memory Not Enabling

Check that:

1. `--memory-available` is set
2. The user is not in the `disabled_users` list
3. The client is not in the `disabled_clients` list
4. A project root is discovered (if `require_project_discovery` is true)
5. A user ID is present (unless in single-user mode)

### No Context Injected

Context may be skipped when:

- No historical sessions exist for the user/project
- No summaries meet the relevance threshold
- A project root is required but not discovered
- Context retrieval times out

Use `!/memory-status` to check the current state.

### Database Issues

If the database becomes corrupted:

1. Stop the proxy
2. Back up or remove `./var/memory.sqlite3`
3. Restart; the schema will be recreated automatically

## Performance Considerations

### Buffer Limits

The capture buffer has a configurable maximum size (default: 10MB per session). If exceeded, the session is marked as partial and remaining interactions are dropped.

### Analysis Queue

Summary generation runs in a background queue with:

- Configurable queue size (default: 100)
- Per-job timeout (default: 30 seconds)
- Maximum concurrent analyses (default: 4)

Backpressure is applied when the queue is full—sessions may be dropped.
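
A queue of this shape can be sketched with `asyncio`: a bounded queue provides backpressure, a semaphore caps concurrency, and `wait_for` enforces the per-job timeout. Everything here is an illustrative assumption except the three documented limits; `summarize` is a stand-in for the real summary-model call.

```python
import asyncio

async def summarize(session, delay=0.01):
    """Stand-in for the real summary-model call (hypothetical)."""
    await asyncio.sleep(delay)
    return f"summary:{session}"

async def analysis_worker(queue, sem, results, timeout=30):
    # Pull completed sessions off the queue; the shared semaphore caps
    # concurrent analyses, and wait_for enforces the per-job timeout.
    while True:
        session = await queue.get()
        async with sem:
            try:
                results.append(await asyncio.wait_for(summarize(session), timeout))
            except asyncio.TimeoutError:
                pass  # job dropped on timeout
        queue.task_done()

async def run(sessions):
    queue = asyncio.Queue(maxsize=100)   # --memory-analysis-queue-maxsize
    sem = asyncio.Semaphore(4)           # --memory-max-concurrent-analyses
    results = []
    workers = [asyncio.create_task(analysis_worker(queue, sem, results))
               for _ in range(4)]
    for s in sessions:
        queue.put_nowait(s)              # raises QueueFull when saturated (backpressure)
    await queue.join()
    for w in workers:
        w.cancel()
    return results

print(asyncio.run(run(["s1", "s2", "s3"])))
```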

### Context Injection Overhead

Context retrieval adds latency to the first request of each session. To minimize the impact:

- Keep `max_sessions_to_consider` reasonable (default: 10)
- Use a fast model for context generation
- Set an appropriate `context_relevance_threshold`

## Related Documentation