perf(sessions): collapse redundant tool_call_update snapshots #3079
Agents re-send the full accumulated tool output on every tool_call_update, so a long-running tool (e.g. a property-based test run: 9.4k updates for one call, each growing to ~230KB) leaves hundreds of MB of near-identical snapshots in a loaded transcript — only the last is ever rendered. Keep just the latest update per toolCallId when converting stored entries to events. Measured across 7 large ai-gateway tasks: retained JS heap after GC dropped from 1220MB to 177MB (~7x), renderer RSS from 2.5GB to 0.9GB. Live streaming is unaffected (collapse runs only on the disk-load path).
…e transfer

The renderer-side collapse capped retained memory, but the full redundant log still crossed IPC and got parsed in the renderer, spiking heap to ~2GB and janking the load. Add readLocalLogsCollapsed on the workspace-server: it drops superseded tool_call_update lines (line-based, no JSON.parse) and returns only the latest per toolCallId, keeping the original line count for resume/gap tracking. fetchSessionLogs uses it when available.

Measured on the 327MB property-based-test task: cold-open peak heap ~2GB -> ~190MB (no spike); renderer receives ~3MB instead of 327MB. Load wall-clock is still gated by Node reading+scanning the 327MB on disk (~2.9s); the source fix (agent emitting tool_call_update deltas) is what shrinks the log itself.
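The line-based collapse described in this commit can be sketched roughly as below. This is a minimal illustration, not the PR's code: the function name `collapseLogLines`, the `CollapsedLog` shape, and both string patterns are assumptions about a newline-delimited JSON log. It keeps the last update per tool call without calling `JSON.parse`, and reports the pre-collapse line count.

```typescript
// Hypothetical result shape; the real procedure's return type is not shown in the PR.
interface CollapsedLog {
  lines: string[];        // surviving lines, in original order
  totalLineCount: number; // non-empty line count before collapsing
}

// Assumed wire format: one JSON object per line. Both patterns are guesses
// made for this sketch.
const UPDATE_MARKER = '"sessionUpdate":"tool_call_update"';
const TOOL_CALL_ID = /"toolCallId":"([^"]+)"/;

function collapseLogLines(raw: string): CollapsedLog {
  const all = raw.split("\n").filter((line) => line.length > 0);

  // Pass 1: remember the index of the last update seen for each tool call.
  const lastIndex = new Map<string, number>();
  all.forEach((line, i) => {
    if (!line.includes(UPDATE_MARKER)) return;
    const id = TOOL_CALL_ID.exec(line)?.[1];
    if (id !== undefined) lastIndex.set(id, i);
  });

  // Pass 2: drop every update that a later one for the same id supersedes.
  // Lines that are not tool updates, or have no recognizable id, pass through.
  const lines = all.filter((line, i) => {
    if (!line.includes(UPDATE_MARKER)) return true;
    const id = TOOL_CALL_ID.exec(line)?.[1];
    return id === undefined || lastIndex.get(id) === i;
  });

  return { lines, totalLineCount: all.length };
}
```

Substring and regex matching of this kind can mis-bucket a line if the marker text happens to appear inside tool output, which is the risk the later commit in this PR removes by parsing tool-update lines.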
…ocal log

The session log writer appended every tool_call_update, and agents re-send the full accumulated tool output on each one, so a long-running tool wrote hundreds of MB of near-identical growing snapshots to the local cache. Buffer the latest in-progress update per toolCallId and write only that, flushed by a terminal (completed/failed) update, any non-tool event, a 2s max-hold window (bounds crash loss), or flushAll on shutdown. The API/S3 path is unchanged; the reader already collapses on load.

New local logs are born proportional to real content instead of O(updates x size). Unit-tested; needs a live agent run to confirm born-small logs end to end.
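The flush rules above can be sketched as a small coalescing writer. Everything here is hypothetical (the class name `CoalescingLogWriter`, the event shapes, the injected `sink` and `now`); it is not the PR's `SessionLogWriter`. The sketch merges buffered updates (shallow, later fields win) and, to stay deterministic, checks the hold window only when the next update arrives, whereas a real writer would presumably also arm a timer.

```typescript
// Hypothetical event shapes for illustration only.
interface ToolUpdate {
  kind: "tool_call_update";
  toolCallId: string;
  status?: string;
  rawOutput?: string;
}
interface OtherEvent {
  kind: "other";
  text: string;
}
type LogEvent = ToolUpdate | OtherEvent;

class CoalescingLogWriter {
  private readonly pending = new Map<string, { update: ToolUpdate; heldSince: number }>();
  private readonly sink: (event: LogEvent) => void;
  private readonly maxHoldMs: number;
  private readonly now: () => number;

  constructor(sink: (event: LogEvent) => void, maxHoldMs = 2000, now: () => number = Date.now) {
    this.sink = sink;
    this.maxHoldMs = maxHoldMs;
    this.now = now;
  }

  append(event: LogEvent): void {
    if (event.kind !== "tool_call_update") {
      // Any non-tool event flushes the buffer first, so ordering is kept.
      this.flushAll();
      this.sink(event);
      return;
    }
    const held = this.pending.get(event.toolCallId);
    const heldSince = held?.heldSince ?? this.now();
    const update: ToolUpdate = { ...held?.update, ...event };
    const terminal = event.status === "completed" || event.status === "failed";
    if (terminal || this.now() - heldSince >= this.maxHoldMs) {
      // Terminal update or expired hold window: write the merged update now.
      this.pending.delete(event.toolCallId);
      this.sink(update);
    } else {
      this.pending.set(event.toolCallId, { update, heldSince });
    }
  }

  // Intended for shutdown: write whatever is still buffered.
  flushAll(): void {
    this.pending.forEach(({ update }) => this.sink(update));
    this.pending.clear();
  }
}
```

Bounding the hold window trades a little extra log volume for a cap on how much in-progress output a crash can lose.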
…, don't drop, superseded tool updates

Two fixes on top of the tool_call_update collapse:

Host-router wiring: the renderer's SessionService reaches logs through the host router, which lacked the readLocalLogsCollapsed procedure. A tRPC proxy client is truthy for any path, so the feature-detect passed and every local read threw NOT_FOUND at call time, silently falling back to S3 and loading empty transcripts for tasks with no cloud logs. Adds the procedure, and fetchSessionLogs now falls back to the plain read when the collapsed call fails instead of misreporting the local log as unreadable.

Merge semantics: the conversation reducer Object.assigns every tool_call_update into the tool call, and updates carry disjoint fields (streamed rawInput snapshots, input-derived title/diff content, then a terminal update with only status/rawOutput). Keeping only the last update left ~half of all tool calls with no rawInput anywhere and stripped Edit/Write diffs on reload. All three layers (core read, workspace-server transfer, agent write buffer) now merge updates per toolCallId (shallow, later fields win), which reproduces exactly the state a full replay builds. The agent's API path still receives every update unmutated; the transfer layer now parses tool-update lines (other lines pass through untouched), which also removes the regex mis-bucketing risk.

Validated replay-equivalent against the five biggest real local logs (up to 88MB): merged output deep-equals a full replay per tool call, with ~95% of the size win retained (88MB -> 17MB).

Generated-By: PostHog Code
Task-Id: fe96a3c1-92c7-41ef-9bd1-f2b890ebe566
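The merge semantics in the commit message above can be illustrated with a short sketch. The `ToolCallUpdate` fields and both function names are assumptions made for the example, not the PR's types. `replay` mimics a reducer that `Object.assign`s every update, and `mergeUpdates` collapses to one update per `toolCallId` with the same shallow, later-fields-win rule.

```typescript
// Hypothetical update shape; real updates carry more fields.
interface ToolCallUpdate {
  toolCallId: string;
  status?: string;
  title?: string;
  rawInput?: unknown;
  rawOutput?: unknown;
}

// Full replay: the state a reducer builds by assigning each update in turn.
function replay(updates: ToolCallUpdate[]): Map<string, ToolCallUpdate> {
  const state = new Map<string, ToolCallUpdate>();
  for (const u of updates) {
    const current = state.get(u.toolCallId) ?? { toolCallId: u.toolCallId };
    state.set(u.toolCallId, Object.assign(current, u));
  }
  return state;
}

// Collapse: one merged update per toolCallId (shallow, later fields win).
// Output follows first-seen order, an arbitrary choice for this sketch.
function mergeUpdates(updates: ToolCallUpdate[]): ToolCallUpdate[] {
  const merged = new Map<string, ToolCallUpdate>();
  for (const u of updates) {
    merged.set(u.toolCallId, { ...merged.get(u.toolCallId), ...u });
  }
  return Array.from(merged.values());
}

const sample: ToolCallUpdate[] = [
  { toolCallId: "t1", rawInput: { path: "a.ts" } },        // streamed input
  { toolCallId: "t1", title: "Edit a.ts" },                // input-derived title
  { toolCallId: "t2", rawInput: { cmd: "ls" } },
  { toolCallId: "t1", status: "completed", rawOutput: "ok" }, // terminal update
];

// Keep-last would leave t1 with only status/rawOutput; the merge keeps
// rawInput and title as well.
const mergedSample = mergeUpdates(sample);
```

Because the last update for `t1` carries no `rawInput` or `title`, a keep-last collapse would lose both, while the merged update matches what replaying all three `t1` updates produces.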
Problem
Opening several large tasks balloons renderer memory (2GB+ observed in prod). Agents re-send the full accumulated tool output on every `tool_call_update`, and the transcript retains all of them. One tool call in a real task had 9,417 updates growing to ~230KB each: 312MB of near-identical snapshots, of which only the merged final state (~3MB) is ever rendered. Connected sessions never evict, so N big transcripts stay fully resident at once.

Changes

Collapse superseded `tool_call_update` snapshots at three layers, merging them into one update per `toolCallId` (shallow, later fields win, matching the renderer reducer's `Object.assign`):

- Read (core `convertStoredEntriesToEvents`): caps retained memory.
- Transfer (`workspace-server` `readLocalLogsCollapsed`, exposed to the renderer via the host router): collapses before IPC so a tool-heavy log doesn't cross at full size; keeps the original line count for resume/gap tracking.
- Write (agent `SessionLogWriter`): buffers in-progress updates and writes one merged update per `toolCallId` (flushed by a terminal update, any non-tool event, a 2s max-hold, or shutdown) so new local logs are born small. The API path still receives every update uncoalesced.

Updates are merged rather than dropped because they carry disjoint fields at different times (streamed `rawInput` snapshots, input-derived title/diff content, then a terminal update with only status/rawOutput); keeping only the last one would strip inputs and Edit/Write diffs from ~half of all tool calls on reload. `fetchSessionLogs` falls back to the plain read if the collapsed procedure fails on a host (tRPC proxy clients can't be feature-detected by presence).

Live streaming display is unchanged (updates still stream via the subscription). No migration required: existing logs collapse on load. Shrinking existing logs on disk (first-load speed / disk reclaim) is an optional follow-up.
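Why presence checks fail on a proxy-style client, and why a call-time fallback is the safer pattern, can be shown with a plain JavaScript `Proxy`. This is a stand-in, not tRPC's actual client API: `makeProxyClient`, `LogClient`, and `fetchLogs` are hypothetical names, and the calls are synchronous only to keep the sketch short (a real RPC call would be asynchronous).

```typescript
type ReadLogs = (taskId: string) => string[];

interface LogClient {
  readLocalLogs: ReadLogs;
  readLocalLogsCollapsed: ReadLogs;
}

type Procedures = { [name: string]: ReadLogs | undefined };

// Stand-in for a proxy-style RPC client: every property access yields a
// function, whether or not the host implements that procedure.
function makeProxyClient(implemented: Procedures): LogClient {
  return new Proxy({} as LogClient, {
    get: (_target, name) =>
      implemented[String(name)] ??
      ((): string[] => {
        throw new Error(`NOT_FOUND: ${String(name)}`);
      }),
  });
}

// Try the collapsed read; if the host rejects it at call time, use the
// plain read instead of treating the local log as unreadable.
function fetchLogs(client: LogClient, taskId: string): string[] {
  try {
    return client.readLocalLogsCollapsed(taskId);
  } catch {
    return client.readLocalLogs(taskId);
  }
}

// A host that only implements the plain read still "has" the collapsed one.
const oldHost = makeProxyClient({ readLocalLogs: (id) => [`plain:${id}`] });
const looksAvailable = typeof oldHost.readLocalLogsCollapsed === "function"; // true
const fallbackResult = fetchLogs(oldHost, "task-1"); // served by the plain read
```

The presence check reports the procedure as available even though the host lacks it, so only the attempt itself can reveal the gap.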
How did you test this?
Unit tests for each layer (merge-union per `toolCallId`, terminal-merge, order preservation, API-path entries unmutated; server-side line collapse + line-count preservation; agent coalescing flush paths incl. the hold-window split). Plus CDP measurement, a real-data replay through the built writer, and a full end-to-end run in the dev app against the real prod tasks that reproduced the balloon:

The 7 tasks include the one whose 9.4k-update tool call alone added ~1.8GB on `main`; it now adds ~130MB.

The merge-collapse was additionally validated for correctness against the five biggest real local session logs (up to 88MB): the merged output deep-equals a full replay of every update, per tool call. The table's numbers were measured on the earlier keep-last variant; merging retains ~95% of the size reduction (e.g. 88MB log → 17MB merged vs 14MB keep-last), so they remain representative.
Honest caveats: the memory win (read + transfer collapse) is validated end-to-end in the app; the agent write-layer's born-small new logs are validated by replaying real agent output, not yet by a fresh app-spawned run. Load wall-clock for already-on-disk 327MB logs is unchanged (Node still scans the file); the memory win applies regardless.