
perf(sessions): collapse redundant tool_call_update snapshots #3079

Merged: adboio merged 6 commits into main from perf/collapse-tool-update-snapshots on Jul 2, 2026.

@richardsolomou (Member) commented Jul 2, 2026

Problem

Opening several large tasks balloons renderer memory (2GB+ observed in prod). Agents re-send the full accumulated tool output on every tool_call_update, and the transcript retains every one of those snapshots. One tool call in a real task had 9,417 updates, each snapshot growing to ~230KB: 312MB of near-identical snapshots, of which only the merged final state (~3MB) is ever rendered. Connected sessions are never evicted, so N big transcripts stay fully resident at once.

Changes

Collapse superseded tool_call_update snapshots at three layers, merging them into one update per toolCallId (shallow merge, later fields win, matching the renderer reducer's Object.assign):

  • Read (renderer, on load, convertStoredEntriesToEvents) — caps retained memory.
  • Transfer (workspace-server readLocalLogsCollapsed, exposed to the renderer via the host router) — collapses before IPC so a tool-heavy log doesn't cross at full size; keeps the original line count for resume/gap tracking.
  • Write (agent SessionLogWriter) — buffers in-progress updates and writes one merged update per toolCallId (flushed by a terminal update, any non-tool event, a 2s max-hold, or shutdown) so new local logs are born small. The API path still receives every update uncoalesced.

Updates are merged rather than dropped because they carry disjoint fields at different times (streamed rawInput snapshots, input-derived title/diff content, then a terminal update with only status/rawOutput); keeping only the last one would strip inputs and Edit/Write diffs from ~half of all tool calls on reload. fetchSessionLogs falls back to the plain read if the collapsed procedure fails on a host, because a tRPC proxy client returns a truthy value for any procedure name and so can't be feature-detected by presence.
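
A minimal sketch of the per-toolCallId merge, assuming a simplified, hypothetical event shape (`SessionEvent`, with a `sessionUpdate` discriminator and an optional `toolCallId`) rather than the real stored-entry type. `mergeToolCallUpdates` is an illustrative name, not the function in this PR, and emitting the merged update at the position of that id's first update is an assumption of the sketch.

```typescript
// Simplified, hypothetical event shape (real stored entries carry more).
type SessionEvent = {
  sessionUpdate: string;
  toolCallId?: string;
  [field: string]: unknown;
};

// Merge every tool_call_update for the same toolCallId into one update
// (shallow, later fields win), emitted where that id's first update appeared.
// Other events pass through unchanged and in order; inputs are not mutated.
export function mergeToolCallUpdates(events: readonly SessionEvent[]): SessionEvent[] {
  const out: SessionEvent[] = [];
  const mergedById = new Map<string, SessionEvent>();
  for (const event of events) {
    const id = event.sessionUpdate === "tool_call_update" ? event.toolCallId : undefined;
    if (id === undefined) {
      out.push(event);
      continue;
    }
    const merged = mergedById.get(id);
    if (merged) {
      Object.assign(merged, event); // later fields overwrite earlier ones
    } else {
      const copy = { ...event }; // copy so the caller's object is never mutated
      mergedById.set(id, copy);
      out.push(copy);
    }
  }
  return out;
}
```

Given updates carrying `rawInput`, then `title` and `content`, then a terminal `status` and `rawOutput`, the merged update carries all five fields, whereas keeping only the last update would leave just `status` and `rawOutput`. Copying before merging leaves the caller's objects untouched, in the spirit of the PR's "API-path entries unmutated" test.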

Live streaming display is unchanged (updates still stream via the subscription). No migration required: existing logs collapse on load. Shrinking existing logs on disk (first-load speed / disk reclaim) is an optional follow-up.

How did you test this?

Unit tests for each layer (merge-union per toolCallId, terminal-merge, order preservation, API-path entries unmutated; server-side line collapse + line-count preservation; agent coalescing flush paths incl. the hold-window split). Plus CDP measurement, a real-data replay through the built writer, and a full end-to-end run in the dev app against the real prod tasks that reproduced the balloon:

| Metric | Before (main) | After |
|---|---|---|
| Retained JS heap, 7 big tasks + forced GC | 1220 MB | 174 MB |
| Peak heap, 7 big tasks | 2785 MB | 307 MB |
| Data over IPC on load | ~327 MB | ~3 MB |
| Agent write (replay of a real 239 MB / 142,893-update log) | 142,893 tool updates | 4,785 tool updates (4.7x smaller on disk) |

The 7 tasks include the one whose 9.4k-update tool call alone added ~1.8GB on main; it now adds ~130MB.

The merge-collapse was additionally validated for correctness against the five biggest real local session logs (up to 88MB): the merged output deep-equals a full replay of every update, per tool call. The table's numbers were measured on the earlier keep-last variant; merging retains ~95% of the size reduction (e.g. 88MB log → 17MB merged vs 14MB keep-last), so they remain representative.
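
The replay-equivalence check can be sketched as follows. It assumes the renderer's reducer shallow-assigns each tool_call and tool_call_update into a per-toolCallId state object; `replayToolCalls`, `assertReplayEquivalent` and the `Ev` type are hypothetical names for illustration. The collapse function is passed in, so the same harness can compare a keep-last collapse with a merging one.

```typescript
import { deepStrictEqual } from "node:assert";

// Hypothetical, simplified event shape.
type Ev = { sessionUpdate: string; toolCallId?: string; [field: string]: unknown };

// Final state per toolCallId, assuming the reducer shallow-assigns every
// tool_call and tool_call_update into that tool call's state object.
function replayToolCalls(events: readonly Ev[]): Map<string, Record<string, unknown>> {
  const state = new Map<string, Record<string, unknown>>();
  for (const e of events) {
    const isToolEvent = e.sessionUpdate === "tool_call" || e.sessionUpdate === "tool_call_update";
    if (!isToolEvent || e.toolCallId === undefined) continue;
    const current = state.get(e.toolCallId) ?? {};
    state.set(e.toolCallId, Object.assign(current, e));
  }
  return state;
}

// Throws an AssertionError if `collapse` changes any tool call's final state.
export function assertReplayEquivalent(
  events: readonly Ev[],
  collapse: (events: Ev[]) => Ev[],
): void {
  const expected = replayToolCalls(events);
  // Give the collapse copies so the caller's events are left untouched.
  const actual = replayToolCalls(collapse(events.map((e) => ({ ...e }))));
  deepStrictEqual(actual, expected);
}
```

A keep-last collapse fails this check as soon as an earlier update carried a field (such as rawInput) that the last update lacks. A shallow, later-fields-win merge passes, because applying the merged object once is the same as applying each update in order.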

Honest caveats: the memory win (read + transfer collapse) is validated end-to-end in the app; the agent write-layer's born-small new logs are validated by replaying real agent output, not yet by a fresh app-spawned run. Load wall-clock for already-on-disk 327MB logs is unchanged (Node still scans the file); the memory win applies regardless.

Agents re-send the full accumulated tool output on every tool_call_update, so a
long-running tool (e.g. a property-based test run: 9.4k updates for one call,
each growing to ~230KB) leaves hundreds of MB of near-identical snapshots in a
loaded transcript — only the last is ever rendered. Keep just the latest update
per toolCallId when converting stored entries to events.

Measured across 7 large ai-gateway tasks: retained JS heap after GC dropped from
1220MB to 177MB (~7x), renderer RSS from 2.5GB to 0.9GB. Live streaming is
unaffected (collapse runs only on the disk-load path).
…e transfer

The renderer-side collapse capped retained memory but the full redundant log
still crossed IPC and got parsed in the renderer, spiking heap to ~2GB and
janking the load. Add readLocalLogsCollapsed on the workspace-server: it drops
superseded tool_call_update lines (line-based, no JSON.parse) and returns only
the latest per toolCallId, keeping the original line count for resume/gap
tracking. fetchSessionLogs uses it when available.

Measured on the 327MB property-based-test task: cold-open peak heap ~2GB -> ~190MB
(no spike); renderer receives ~3MB instead of 327MB. Load wall-clock is still
gated by Node reading+scanning the 327MB on disk (~2.9s) — the source fix
(agent emitting tool_call_update deltas) is what shrinks the log itself.
…ocal log

The session log writer appended every tool_call_update, and agents re-send the
full accumulated tool output on each one, so a long-running tool wrote hundreds
of MB of near-identical growing snapshots to the local cache. Buffer the latest
in-progress update per toolCallId and write only that — flushed by a terminal
(completed/failed) update, any non-tool event, a 2s max-hold window (bounds
crash loss), or flushAll on shutdown. The API/S3 path is unchanged; the reader
already collapses on load. New local logs are born proportional to real content
instead of O(updates x size).
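
A minimal sketch of this write-side buffer, assuming a hypothetical `ToolUpdateCoalescer` class, a simplified `LogEvent` shape, and a `write` callback standing in for the local log append. It merges buffered updates instead of keeping only the latest, following the merge-semantics fix later in this PR, and takes an injectable clock so the 2s max-hold branch can be tested without real timers. Treating every event other than a tool_call_update as a flush trigger is a simplification of "any non-tool event".

```typescript
// Hypothetical, simplified log event shape.
type LogEvent = {
  sessionUpdate: string;
  toolCallId?: string;
  status?: string;
  [field: string]: unknown;
};

const TOOL_UPDATE_MAX_HOLD_MS = 2_000;

// Buffers in-progress tool_call_update events as one merged update per
// toolCallId and hands them to `write` when a flush condition is met.
export class ToolUpdateCoalescer {
  private readonly buffers = new Map<string, { merged: LogEvent; bufferedAt: number }>();
  private readonly write: (event: LogEvent) => void;
  private readonly now: () => number;

  constructor(write: (event: LogEvent) => void, now: () => number = () => Date.now()) {
    this.write = write;
    this.now = now; // injectable clock, so the hold window can be tested
  }

  append(event: LogEvent): void {
    const id = event.sessionUpdate === "tool_call_update" ? event.toolCallId : undefined;
    if (id === undefined) {
      // Any other event: write buffered tool updates first to preserve order.
      this.flushAll();
      this.write(event);
      return;
    }
    let entry = this.buffers.get(id);
    if (entry && this.now() - entry.bufferedAt > TOOL_UPDATE_MAX_HOLD_MS) {
      // Held past the window: persist the older merged update, start fresh.
      this.write(entry.merged);
      this.buffers.delete(id);
      entry = undefined;
    }
    if (entry) {
      Object.assign(entry.merged, event); // shallow merge, later fields win
    } else {
      entry = { merged: { ...event }, bufferedAt: this.now() };
      this.buffers.set(id, entry);
    }
    if (event.status === "completed" || event.status === "failed") {
      // Terminal update: the tool call is done, so write it immediately.
      this.buffers.delete(id);
      this.write(entry.merged);
    }
  }

  // Write everything still buffered (e.g. on shutdown).
  flushAll(): void {
    for (const { merged } of this.buffers.values()) this.write(merged);
    this.buffers.clear();
  }
}
```

In this sketch the hold-window check runs only when another update for the same id arrives, so a call that goes quiet stays buffered until a terminal update, another event, or flushAll writes it. A split window produces two partial merged updates for one toolCallId, which a load-time merge recombines.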

Unit-tested; needs a live agent run to confirm born-small logs end to end.

@github-actions (bot) commented Jul 2, 2026

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit a3bf5ab.

@greptile-apps (bot, Contributor) commented Jul 2, 2026

Comments Outside Diff (1)

  1. packages/agent/src/session-log-writer.test.ts, lines 554-594

    P2: Time-based flush path has no test coverage

    The three new tests cover the "flushed by non-tool event," "flushed by terminal update," and "flushed by flushAll" paths, but the Date.now() - existing.bufferedAt > TOOL_UPDATE_MAX_HOLD_MS branch in appendRawLine (the 2s time-based flush) has zero coverage. That code path — write the OLD existing entry and start a fresh timer when a new update arrives after the hold window — could silently regress. Adding a test that mocks Date.now (or uses vi.useFakeTimers) and sends updates more than 2 s apart would cover it.

Reviews (1): Last reviewed commit: "perf(agent): coalesce in-progress tool_c..."

Comment thread (outdated): packages/agent/src/session-log-writer.ts
Comment thread (outdated): packages/core/src/sessions/sessionService.ts
@richardsolomou requested a review from a team on July 2, 2026 03:27
@charlesvien force-pushed the perf/collapse-tool-update-snapshots branch from f965d67 to c05af11 on July 2, 2026 05:46

…, don't drop, superseded tool updates

Two fixes on top of the tool_call_update collapse:

Host-router wiring: the renderer's SessionService reaches logs through
the host router, which lacked the readLocalLogsCollapsed procedure. A
tRPC proxy client is truthy for any path, so the feature-detect passed
and every local read threw NOT_FOUND at call time, silently falling
back to S3 — and loading empty transcripts for tasks with no cloud
logs. Adds the procedure, and fetchSessionLogs now falls back to the
plain read when the collapsed call fails instead of misreporting the
local log as unreadable.
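
The feature-detection pitfall and the fallback can be illustrated with a small stand-in for a tRPC-style proxy client. `makeProxyClient`, `LogRouter` and `fetchLocalLogs` are hypothetical names, and the procedures here simply return arrays of log lines. The proxy returns a callable for every property name, so a presence check always passes and a missing procedure only fails when it is called.

```typescript
type ReadLogs = (taskId: string) => Promise<string[]>;

// Hypothetical router shape: a plain read and a collapsed read.
type LogRouter = {
  readLocalLogs: ReadLogs;
  readLocalLogsCollapsed: ReadLogs;
};

// Stand-in for a tRPC-style proxy client: any property access yields a
// callable, so a procedure's existence cannot be checked by presence.
function makeProxyClient(procedures: Partial<LogRouter>): LogRouter {
  return new Proxy({} as LogRouter, {
    get(_target, name) {
      return async (taskId: string): Promise<string[]> => {
        const procedure = procedures[String(name) as keyof LogRouter];
        if (!procedure) throw new Error(`NOT_FOUND: ${String(name)}`);
        return procedure(taskId);
      };
    },
  });
}

// Prefer the collapsed read; if the host lacks the procedure (or it fails),
// fall back to the plain local read instead of reporting the log unreadable.
async function fetchLocalLogs(client: LogRouter, taskId: string): Promise<string[]> {
  try {
    return await client.readLocalLogsCollapsed(taskId);
  } catch {
    return await client.readLocalLogs(taskId);
  }
}
```

The try/catch is what lets an older host without the collapsed procedure degrade to the plain read rather than surfacing an error at call time.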

Merge semantics: the conversation reducer applies every tool_call_update
to the tool call with Object.assign, and updates carry disjoint fields
(streamed rawInput snapshots, input-derived title/diff content, then a
terminal update with only status/rawOutput). Keeping only the last
update left ~half of all tool calls with no rawInput anywhere and
stripped Edit/Write diffs on reload. All three layers (core read,
workspace-server transfer, agent write buffer) now merge updates per
toolCallId — shallow, later fields win — which reproduces exactly the
state a full replay builds. The agent's API path still receives every
update unmutated; the transfer layer now parses tool-update lines
(other lines pass through untouched), which also removes the regex
mis-bucketing risk.
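
A sketch of the transfer-layer collapse on raw JSONL text, assuming (hypothetically) one JSON object per line with top-level `sessionUpdate` and `toolCallId` fields; the real log lines may nest these differently. A cheap substring test skips parsing for most lines, only candidate lines are parsed, and lines that turn out not to be tool_call_update entries pass through byte-for-byte. `collapseLogLines` and `totalLineCount` are illustrative names; the count of original non-empty lines stands in for the line count kept for resume/gap tracking.

```typescript
export type CollapsedLog = {
  lines: string[]; // lines to send over IPC
  totalLineCount: number; // original non-empty line count, for resume/gap tracking
};

export function collapseLogLines(raw: string): CollapsedLog {
  const input = raw.split("\n").filter((line) => line.trim() !== "");
  const out: string[] = [];
  // toolCallId -> slot in `out` and the merged update so far
  const merged = new Map<string, { slot: number; value: object }>();

  for (const line of input) {
    // Fast path: a line without this substring cannot be a tool update.
    if (!line.includes("tool_call_update")) {
      out.push(line);
      continue;
    }
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      out.push(line); // unparseable: pass through untouched
      continue;
    }
    const obj = parsed as { sessionUpdate?: unknown; toolCallId?: unknown } | null;
    const id = obj?.toolCallId;
    if (obj === null || typeof obj !== "object" || obj.sessionUpdate !== "tool_call_update" || typeof id !== "string") {
      out.push(line); // mentions the string but is not a tool update
      continue;
    }
    const existing = merged.get(id);
    if (existing) {
      Object.assign(existing.value, obj); // shallow merge, later fields win
    } else {
      merged.set(id, { slot: out.length, value: obj });
      out.push(line); // placeholder, replaced by the merged JSON below
    }
  }
  // Re-serialize each merged update into the slot of its first occurrence.
  for (const { slot, value } of merged.values()) out[slot] = JSON.stringify(value);
  return { lines: out, totalLineCount: input.length };
}
```

Parsing only the candidate lines keeps most of the cost of a line scan while avoiding the mis-bucketing a regex could cause, for example when a message's text happens to contain the string "tool_call_update".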

Validated replay-equivalent against the five biggest real local logs
(up to 88MB): merged output deep-equals a full replay per tool call,
with ~95% of the size win retained (88MB -> 17MB).

Generated-By: PostHog Code
Task-Id: fe96a3c1-92c7-41ef-9bd1-f2b890ebe566

@adboio merged commit ca0787f into main on Jul 2, 2026 (23 checks passed)
@adboio deleted the perf/collapse-tool-update-snapshots branch on July 2, 2026 19:14