perf(sessions): collapse redundant tool_call_update snapshots by richardsolomou · Pull Request #3079 · PostHog/code

richardsolomou · 2026-07-02T03:12:38Z

Problem

Opening several large tasks balloons renderer memory (2GB+ observed in prod). Agents re-send the full accumulated tool output on every tool_call_update, and the transcript retains all of them. One tool call in a real task had 9,417 updates growing to ~230KB each — 312MB of near-identical snapshots, of which only the merged final state (~3MB) is ever rendered. Connected sessions never evict, so N big transcripts stay fully resident at once.

Changes

Collapse superseded tool_call_update snapshots — merged into one update per toolCallId (shallow, later fields win, matching the renderer reducer's Object.assign) — at three layers:

Read (renderer, on load, convertStoredEntriesToEvents) — caps retained memory.
Transfer (workspace-server readLocalLogsCollapsed, exposed to the renderer via the host router) — collapses before IPC so a tool-heavy log doesn't cross at full size; keeps the original line count for resume/gap tracking.
Write (agent SessionLogWriter) — buffers in-progress updates and writes one merged update per toolCallId (flushed by a terminal update, any non-tool event, a 2s max-hold, or shutdown) so new local logs are born small. The API path still receives every update uncoalesced.

Updates are merged rather than dropped because they carry disjoint fields at different times (streamed rawInput snapshots, input-derived title/diff content, then a terminal update with only status/rawOutput); keeping only the last one would strip inputs and Edit/Write diffs from ~half of all tool calls on reload. fetchSessionLogs falls back to the plain read if the collapsed procedure fails on a host (tRPC proxy clients can't be feature-detected by presence).
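The merge described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `SessionEvent` shape, the `sessionUpdate` discriminator, and the function name `collapseToolCallUpdates` are assumptions made for the example, and keeping the merged update at the position of the first update for its toolCallId is also an assumed choice.

```typescript
// Hypothetical event shape; the real stored-entry types in the repo may differ.
type SessionEvent = {
  sessionUpdate: string;
  toolCallId?: string;
  [field: string]: unknown;
};

function isToolCallUpdate(
  event: SessionEvent,
): event is SessionEvent & { toolCallId: string } {
  return (
    event.sessionUpdate === "tool_call_update" &&
    typeof event.toolCallId === "string"
  );
}

// Collapse all tool_call_update events into one merged update per toolCallId.
// The merge is shallow and later fields win, mirroring an Object.assign reducer.
function collapseToolCallUpdates(events: SessionEvent[]): SessionEvent[] {
  const mergedById = new Map<string, SessionEvent>();
  const out: SessionEvent[] = [];

  for (const event of events) {
    if (!isToolCallUpdate(event)) {
      out.push(event); // non-tool events pass through untouched
      continue;
    }
    const existing = mergedById.get(event.toolCallId);
    if (existing) {
      Object.assign(existing, event); // later fields overwrite earlier ones
    } else {
      const copy = { ...event }; // copy so the input entries are not mutated
      mergedById.set(event.toolCallId, copy);
      out.push(copy); // assumed: merged update sits where the first one was
    }
  }
  return out;
}
```

Because earlier updates contribute fields that the terminal update lacks, a call whose first update carries rawInput and whose last carries only status/rawOutput ends up with all three fields in the merged result.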

Live streaming display is unchanged (updates still stream via the subscription). No migration required: existing logs collapse on load. Shrinking existing logs on disk (first-load speed / disk reclaim) is an optional follow-up.

How did you test this?

Unit tests for each layer (merge-union per toolCallId, terminal-merge, order preservation, API-path entries unmutated; server-side line collapse + line-count preservation; agent coalescing flush paths incl. the hold-window split). Plus CDP measurement, a real-data replay through the built writer, and a full end-to-end run in the dev app against the real prod tasks that reproduced the balloon:

Metric	before (main)	after
Retained JS heap, 7 big tasks + forced GC	1220 MB	174 MB
Peak heap, 7 big tasks	2785 MB	307 MB
Data over IPC on load	~327 MB	~3 MB
Agent write (replay of a real 239MB / 142,893-update log)	142,893 tool updates	4,785 (4.7x smaller on disk)

The 7 tasks include the one whose 9.4k-update tool call alone added ~1.8GB on main; it now adds ~130MB.

The merge-collapse was additionally validated for correctness against the five biggest real local session logs (up to 88MB): the merged output deep-equals a full replay of every update, per tool call. The table's numbers were measured on the earlier keep-last variant; merging retains ~95% of the size reduction (e.g. 88MB log → 17MB merged vs 14MB keep-last), so they remain representative.

Honest caveats: the memory win (read + transfer collapse) is validated end-to-end in the app; the agent write-layer's born-small new logs are validated by replaying real agent output, not yet by a fresh app-spawned run. Load wall-clock for already-on-disk 327MB logs is unchanged (Node still scans the file); the memory win applies regardless.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

Agents re-send the full accumulated tool output on every tool_call_update, so a long-running tool (e.g. a property-based test run: 9.4k updates for one call, each growing to ~230KB) leaves hundreds of MB of near-identical snapshots in a loaded transcript — only the last is ever rendered. Keep just the latest update per toolCallId when converting stored entries to events. Measured across 7 large ai-gateway tasks: retained JS heap after GC dropped from 1220MB to 177MB (~7x), renderer RSS from 2.5GB to 0.9GB. Live streaming is unaffected (collapse runs only on the disk-load path).

…e transfer

The renderer-side collapse capped retained memory, but the full redundant log still crossed IPC and got parsed in the renderer, spiking heap to ~2GB and janking the load. Add readLocalLogsCollapsed on the workspace-server: it drops superseded tool_call_update lines (line-based, no JSON.parse) and returns only the latest per toolCallId, keeping the original line count for resume/gap tracking. fetchSessionLogs uses it when available.

Measured on the 327MB property-based-test task: cold-open peak heap ~2GB -> ~190MB (no spike); renderer receives ~3MB instead of 327MB. Load wall-clock is still gated by Node reading and scanning the 327MB on disk (~2.9s); the source fix (agent emitting tool_call_update deltas) is what shrinks the log itself.
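A rough sketch of collapsing a log before it crosses IPC while still reporting the original line count. Note that it follows the merge variant this PR ends up with (parse only lines that look like tool updates, then merge per toolCallId) rather than the line-based keep-latest variant in this commit message. The flat one-JSON-object-per-line layout, the `sessionUpdate` field, and the names `collapseLogLines` and `originalLineCount` are assumptions for illustration; the real log entries are likely nested differently.

```typescript
type CollapsedLog = {
  lines: string[]; // collapsed lines to send to the renderer
  originalLineCount: number; // line count before collapsing, for resume/gap tracking
};

function collapseLogLines(raw: string): CollapsedLog {
  const input = raw.split("\n").filter((line) => line.trim().length > 0);
  // Each slot is either an untouched raw line or a merged tool update object.
  const slots: Array<string | Record<string, unknown>> = [];
  const mergedById = new Map<string, Record<string, unknown>>();

  for (const line of input) {
    // Cheap substring check so lines that are clearly not tool updates are never parsed.
    if (!line.includes('"tool_call_update"')) {
      slots.push(line);
      continue;
    }
    let parsed: Record<string, unknown>;
    try {
      parsed = JSON.parse(line) as Record<string, unknown>;
    } catch {
      slots.push(line); // malformed line: pass through untouched
      continue;
    }
    const id = parsed.toolCallId;
    if (parsed.sessionUpdate !== "tool_call_update" || typeof id !== "string") {
      slots.push(line); // the substring matched inside some other payload
      continue;
    }
    const existing = mergedById.get(id);
    if (existing) {
      Object.assign(existing, parsed); // shallow merge, later fields win
    } else {
      mergedById.set(id, parsed);
      slots.push(parsed);
    }
  }

  return {
    lines: slots.map((slot) =>
      typeof slot === "string" ? slot : JSON.stringify(slot),
    ),
    originalLineCount: input.length,
  };
}
```

Returning the pre-collapse count alongside the lines lets a caller that tracks positions by line number keep working against the on-disk log even though fewer lines were transferred.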

…ocal log

The session log writer appended every tool_call_update, and agents re-send the full accumulated tool output on each one, so a long-running tool wrote hundreds of MB of near-identical growing snapshots to the local cache. Buffer the latest in-progress update per toolCallId and write only that. The buffer is flushed by a terminal (completed/failed) update, any non-tool event, a 2s max-hold window (bounds crash loss), or flushAll on shutdown.

The API/S3 path is unchanged; the reader already collapses on load. New local logs are born proportional to real content instead of O(updates x size). Unit-tested; needs a live agent run to confirm born-small logs end to end.
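The flush rules above could look something like the sketch below. It is not the real SessionLogWriter: the class name `CoalescingLogWriter`, the `Update` shape, and the injected `now` clock are hypothetical, and the sketch merges buffered updates (as the later commit in this PR does) instead of keeping only the latest. The hold window is checked lazily when the next update for the same toolCallId arrives, which matches the `Date.now() - existing.bufferedAt > TOOL_UPDATE_MAX_HOLD_MS` branch mentioned in the review.

```typescript
type Update = {
  sessionUpdate: string;
  toolCallId?: string;
  status?: string;
  [field: string]: unknown;
};

const TOOL_UPDATE_MAX_HOLD_MS = 2_000;

class CoalescingLogWriter {
  private readonly pending = new Map<string, { merged: Update; bufferedAt: number }>();
  private readonly write: (line: string) => void;
  private readonly now: () => number;

  // `now` is injectable so the hold window can be exercised without real timers.
  constructor(write: (line: string) => void, now: () => number = () => Date.now()) {
    this.write = write;
    this.now = now;
  }

  append(update: Update): void {
    const id = update.toolCallId;
    if (update.sessionUpdate !== "tool_call_update" || typeof id !== "string") {
      this.flushAll(); // any non-tool event flushes buffered tool updates first
      this.write(JSON.stringify(update));
      return;
    }

    const existing = this.pending.get(id);
    if (existing && this.now() - existing.bufferedAt > TOOL_UPDATE_MAX_HOLD_MS) {
      // Hold window elapsed: write the old entry and start a fresh one.
      this.write(JSON.stringify(existing.merged));
      this.pending.delete(id);
    }

    const entry = this.pending.get(id);
    if (entry) {
      Object.assign(entry.merged, update); // shallow merge, later fields win
    } else {
      this.pending.set(id, { merged: { ...update }, bufferedAt: this.now() });
    }

    if (update.status === "completed" || update.status === "failed") {
      this.flushOne(id); // terminal update: write the merged entry immediately
    }
  }

  // Intended for shutdown: write everything still buffered.
  flushAll(): void {
    for (const id of Array.from(this.pending.keys())) this.flushOne(id);
  }

  private flushOne(id: string): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    this.pending.delete(id);
    this.write(JSON.stringify(entry.merged));
  }
}
```

When the hold window splits one tool call across two written lines, the sketch relies on a reader that merges per toolCallId on load to reassemble them.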

github-actions · 2026-07-02T03:13:04Z

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit a3bf5ab.

greptile-apps · 2026-07-02T03:17:59Z

Comments Outside Diff (1)

packages/agent/src/session-log-writer.test.ts, lines 554-594

Time-based flush path has no test coverage

The three new tests cover the "flushed by non-tool event," "flushed by terminal update," and "flushed by flushAll" paths, but the Date.now() - existing.bufferedAt > TOOL_UPDATE_MAX_HOLD_MS branch in appendRawLine (the 2s time-based flush) has zero coverage. That code path — write the OLD existing entry and start a fresh timer when a new update arrives after the hold window — could silently regress. Adding a test that mocks Date.now (or uses vi.useFakeTimers) and sends updates more than 2 s apart would cover it.


Reviews (1): Last reviewed commit: "perf(agent): coalesce in-progress tool_c..."

…, don't drop, superseded tool updates

Two fixes on top of the tool_call_update collapse:

Host-router wiring: the renderer's SessionService reaches logs through the host router, which lacked the readLocalLogsCollapsed procedure. A tRPC proxy client is truthy for any path, so the feature-detect passed and every local read threw NOT_FOUND at call time, silently falling back to S3 and loading empty transcripts for tasks with no cloud logs. Adds the procedure, and fetchSessionLogs now falls back to the plain read when the collapsed call fails instead of misreporting the local log as unreadable.

Merge semantics: the conversation reducer Object.assigns every tool_call_update into the tool call, and updates carry disjoint fields (streamed rawInput snapshots, input-derived title/diff content, then a terminal update with only status/rawOutput). Keeping only the last update left ~half of all tool calls with no rawInput anywhere and stripped Edit/Write diffs on reload. All three layers (core read, workspace-server transfer, agent write buffer) now merge updates per toolCallId (shallow, later fields win), which reproduces exactly the state a full replay builds. The agent's API path still receives every update unmutated; the transfer layer now parses tool-update lines (other lines pass through untouched), which also removes the regex mis-bucketing risk.

Validated replay-equivalent against the five biggest real local logs (up to 88MB): merged output deep-equals a full replay per tool call, with ~95% of the size win retained (88MB -> 17MB).

Generated-By: PostHog Code
Task-Id: fe96a3c1-92c7-41ef-9bd1-f2b890ebe566
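The host-router pitfall can be reproduced with a toy proxy. The sketch below is only an illustration: `makeProxyClient` is a hypothetical stand-in built on the standard `Proxy` object, not the real tRPC client, and `fetchLocalLog` is an assumed simplification of what fetchSessionLogs does. It shows why checking for the procedure's presence proves nothing and why a call-time fallback is needed.

```typescript
type Procedure = (taskId: string) => Promise<string>;
type HostClient = {
  readLocalLogs: Procedure;
  readLocalLogsCollapsed: Procedure;
};

// Toy proxy client: every property access yields a callable, whether or not
// the host actually implements that procedure. Missing ones fail at call time.
function makeProxyClient(implemented: Partial<HostClient>): HostClient {
  const handlers = implemented as Record<string, Procedure | undefined>;
  return new Proxy({}, {
    get(_target, prop) {
      const name = String(prop);
      return (taskId: string) => {
        const handler = handlers[name];
        return handler
          ? handler(taskId)
          : Promise.reject(new Error(`NOT_FOUND: ${name}`));
      };
    },
  }) as unknown as HostClient;
}

// Try the collapsed read first; if the call itself fails, use the plain read
// instead of treating the local log as unreadable.
async function fetchLocalLog(client: HostClient, taskId: string): Promise<string> {
  try {
    return await client.readLocalLogsCollapsed(taskId);
  } catch {
    return client.readLocalLogs(taskId);
  }
}
```

With a host that only implements readLocalLogs, `typeof client.readLocalLogsCollapsed` is still `"function"`, so only the try/catch around the actual call distinguishes the two cases.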

richardsolomou added 3 commits July 2, 2026 05:43

greptile-apps Bot reviewed Jul 2, 2026


Comment thread packages/agent/src/session-log-writer.ts Outdated

Comment thread packages/core/src/sessions/sessionService.ts Outdated

docs(sessions): trim repeated tool_call_update-collapse rationale

6a78d43

richardsolomou requested a review from a team July 2, 2026 03:27

docs(agent): correct tool-update hold-window comment (not a crash bound)

c05af11

charlesvien force-pushed the perf/collapse-tool-update-snapshots branch from f965d67 to c05af11 Compare July 2, 2026 05:46

adboio approved these changes Jul 2, 2026


adboio merged commit ca0787f into main Jul 2, 2026
23 checks passed

adboio deleted the perf/collapse-tool-update-snapshots branch July 2, 2026 19:14

adboio merged 6 commits into main from perf/collapse-tool-update-snapshots