
perf(sessions): faster task open, less render churn and memory #3063

Merged: charlesvien merged 28 commits into main from perf/streaming-runtime on Jul 2, 2026

richardsolomou (Member) commented Jul 1, 2026

Problem

The agent session transcript is the app's hot path, and the renderer did redundant per-token work and grew memory unbounded over a session:

  • Every streamed event fired 2–4 separate store notifications — roughly one render cycle per token.
  • The root-mounted sidebar re-rendered on every token, and several hot components (the transcript container, the session view) re-rendered every event just to read one field off the whole session.
  • immer's autofreeze deep-walked the whole, ever-growing events array on every append.
  • A session's transcript stayed fully resident after you navigated away.
  • On each turn end the conversation builder discarded its state and re-parsed every event.
  • Code-block highlighting re-parsed on every remount; streaming-markdown re-split each render; a 20 Hz elapsed-timer re-rendered twice as often as it displayed.
  • Resizing a panel wrote the whole layout to localStorage on every drag frame.
  • The code-review surface was pulled into the initial bundle even when never opened.

Changes

Each fix is its own commit:

  • Batch streamed events into one flush per frame instead of processing (and re-rendering) per token. Order preserved; flushes before permission handling and on teardown.
  • Sidebar: subscribe to a signature of just the fields it renders (render-count test: 20 → 0). Plus narrow primitive selectors (useSessionIsCloud, useSessionHandoffInProgress) so the transcript container and session view stop re-rendering every event to read one flag.
  • Disable immer autofreeze on the session store. Autofreeze re-walked the entire events array on every append (O(n) per event); events stay individually frozen for immutability, which is O(1) each.
  • Evict backgrounded transcripts ~20s after their view unmounts, rehydrating from disk on return. Only disconnected, idle sessions are eligible; rehydration only restores when the transcript is still empty (a reconnect wins) and clears the evicted flag only on success so a transient read failure retries.
  • Finalize the conversation builder in place on turn end instead of re-parsing every event, falling back to a full rebuild when the append-only prefix is invalid or events are out of ts-order.
  • Cache code-block syntax highlighting across mounts (bounded LRU, collision-checked) so virtualized scroll stops re-parsing.
  • Memoize the streaming-markdown block split.
  • Halve the generating-indicator tick (20 → 10 updates/sec).
  • Debounce panel-layout persistence (~60 writes/drag → 1; flushes on pagehide).
  • Cap diff tokenization line length to guard minified/single-giant-line files.
  • Lazy-load the code-review surface so it splits out of the initial bundle.
  • Open a big task by painting the log tail first. Previously, opening a task read, transferred, and parsed the entire ndjson log before any paint (~540ms for an 18.5MB real task, dominated by disk read + IPC transfer). Now the open path paints the last ~1.5MB immediately (tens of ms), and the full read + reconnect then replaces it with the authoritative session. Adds a readLocalLogsTail RPC and a throwaway fast-paint; the full connect path is unchanged.

Overlaps with, and is intended to supersede, @KarloAldrete's open PRs #2957 (pre-freeze + eviction) and #2710 (sidebar).

Considered and left out: the active-turn row-context split — largely subsumed by event batching (rows now re-render per frame, not per token).

How did you test this?

  • Unit tests added: event batching (order + flush-on-teardown), sidebar render-count + signature purity, highlight-cache reuse, debounced storage, evict/rehydrate residency (incl. rehydrate-from-disk and retry-after-failed-read), finalize-in-place equivalence across every scenario, the log tail read, and the tail-first paint.

  • Full suites pass: @posthog/core 1859, @posthog/ui 1135; all 22 packages typecheck.

  • Measured impact — isolated deterministic micro-benchmarks (main behavior vs branch), plus an on-app before/after over CDP:

    | Fix | Before | After | Delta |
    | --- | --- | --- | --- |
    | Open a big task: time to first content (18.5MB / 9k-event task, on-app cold) | 3527ms | 114ms | ~31x |
    | Append 10k events (immer autofreeze walk) | 2290ms | 56ms | 41x |
    | Re-highlight a 400-line block on remount (cache) | 19.2ms | 0.04ms | ~520x |
    | Store updates per 100-event in-frame burst | 100 | 1 | 100x |
    | Turn-end rebuild (3k events) | 0.30ms | 0.04ms | 6.9x (sub-ms) |

    Notes: fast-open v1 improves perceived latency (the tail paints at 114ms) — the full read still completes behind it (~3.5s cold), so total work isn't reduced yet (tail-only + backfill is the follow-up). The batching figure is best-case for a full in-frame burst; sparse streams coalesce less. Idle transcript scroll held 120fps / 0 dropped frames.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

splitMarkdownBlocks ran unmemoized in the render body, re-scanning the
whole message string on every render (~10-20/sec while text smooths in).
Memoize on content so the linear re-scan only happens when the text
actually changes.

Finding #6.
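
The memoization above amounts to caching the last split keyed on the content string. A minimal sketch outside React, assuming a hypothetical `splitBlocks` stand-in (it is not the real splitMarkdownBlocks) and a hypothetical `memoizeLast` helper; inside a component, a `useMemo` keyed on the content would play the same role:

```typescript
// Hypothetical stand-in for the real splitter: splits on blank lines.
let scans = 0;
function splitBlocks(content: string): string[] {
  scans += 1; // count how often the linear scan actually runs
  return content.split(/\n{2,}/).filter((block) => block.length > 0);
}

// Cache only the most recent argument/result pair, which is all a single
// component instance needs while its text is unchanged between renders.
function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let last: { arg: A; result: R } | undefined;
  return (arg: A): R => {
    if (last !== undefined && Object.is(last.arg, arg)) return last.result;
    last = { arg, result: fn(arg) };
    return last.result;
  };
}

const splitMemo = memoizeLast(splitBlocks);
```

Calling `splitMemo` twice with the same string returns the same array without a second scan; a changed string triggers exactly one new scan.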
The elapsed-time interval fired every 50ms: 20 state updates/sec for the
whole duration a prompt is pending (minutes for cloud tasks). The display
shows tenths of a second, so a 100ms interval still updates every rendered
digit while halving the re-render rate.

Finding #8.
DIFFS_HIGHLIGHTER_OPTIONS set only the theme. A minified or single-giant-
line file in a tool-call diff makes the diff worker tokenize the entire
line, stalling that diff. Cap at 1000 chars, matching the guard the diffs
library exposes for exactly this case.

Finding #13.
Each streamed ACP event ran handleSessionEvent immediately in the tRPC
onData callback. Electron IPC delivers each event as its own task, so a
fast turn produced one processing pass — and roughly one React commit —
per token.

Buffer events per taskRunId and flush on a ~16ms timer, replaying the
existing per-event handler in arrival order. Logic and ordering are
unchanged; only when a burst is processed changes, coalescing it into one
pass. The buffer is flushed synchronously before a permission request is
handled and on channel teardown/reset so nothing observes a stale
transcript or drops trailing events.

Finding #1.
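
The buffering described above can be sketched as a small per-key batcher. This is an illustration under assumptions, not the code in this PR: `createEventBatcher`, `Scheduler`, and `realTimers` are hypothetical names, and the timer is injectable only so that a test can drive the flush deterministically.

```typescript
// Injectable timer so tests can trigger the flush without waiting.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createEventBatcher<E>(
  handleEvent: (key: string, event: E) => void, // the existing per-event handler
  delayMs = 16, // roughly one frame
  scheduler: Scheduler = realTimers,
) {
  const buffers = new Map<string, E[]>();
  const timers = new Map<string, unknown>();

  // Replays buffered events in arrival order. Safe to call synchronously,
  // e.g. before handling a permission request or on channel teardown.
  function flush(key: string): void {
    if (timers.has(key)) {
      scheduler.clear(timers.get(key));
      timers.delete(key);
    }
    const pending = buffers.get(key);
    buffers.delete(key);
    for (const event of pending ?? []) handleEvent(key, event);
  }

  // Buffers the event and arms at most one timer per key.
  function push(key: string, event: E): void {
    const pending = buffers.get(key);
    if (pending) pending.push(event);
    else buffers.set(key, [event]);
    if (!timers.has(key)) {
      timers.set(key, scheduler.set(() => flush(key), delayMs));
    }
  }

  return { push, flush };
}
```

In this sketch the key would be the taskRunId, so a burst on one run never delays or reorders events on another.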
highlightSyntax parses with Lezer on the main thread. The only cache was
each HighlightedCode instance's useMemo, so a code block scrolled out of
and back into the virtualized transcript re-parsed every time. Add a
bounded module-level LRU keyed on (theme, language, content) so remounts
reuse the parsed segments.

Finding #4.
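
A bounded LRU of this kind can be sketched with a plain `Map`, which iterates in insertion order. The sketch below is an assumption about the shape, not the module in this PR: `createHighlightCache`, `hash32`, and `Segment` are hypothetical names, and the hash is an FNV-1a-style function chosen for illustration. Storing the code next to the segments and comparing it on lookup makes a hash collision fall through to a re-parse.

```typescript
type Segment = { text: string; className?: string };

// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative choice).
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function createHighlightCache(maxEntries: number) {
  // Map keeps insertion order, so the first key is the least recently used.
  const entries = new Map<string, { code: string; segments: Segment[] }>();

  return function highlightCached(
    theme: string,
    language: string,
    code: string,
    parse: (code: string, language: string) => Segment[],
  ): Segment[] {
    const key = `${theme}|${language}|${hash32(code)}`;
    const hit = entries.get(key);
    if (hit && hit.code === code) {
      // Collision-checked hit: move the entry to the most-recent position.
      entries.delete(key);
      entries.set(key, hit);
      return hit.segments;
    }
    const segments = parse(code, language);
    entries.delete(key); // replace a colliding entry, if any
    entries.set(key, { code, segments });
    if (entries.size > maxEntries) {
      const oldest = entries.keys().next().value;
      if (oldest !== undefined) entries.delete(oldest);
    }
    return segments;
  };
}
```

Because the cache lives at module level rather than in a component, a code block that scrolls out of and back into a virtualized list reuses its segments instead of re-parsing.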
react-resizable-panels fires onLayout every frame during a divider drag,
and persist serialized the whole layout tree to localStorage on each one.
Panels are uncontrolled (defaultSize) and in-memory state is untouched, so
debouncing only the write keeps live resize instant while collapsing a
drag's ~60 synchronous writes into one. Pending writes flush on pagehide.

Finding #12.
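
The write debounce can be sketched as a trailing-edge debouncer with an explicit flush. The names (`createDebouncedWriter`, `Timers`) and the delay value are assumptions for illustration; the PR does not state the actual delay.

```typescript
// Injectable timer so the behavior can be tested without real delays.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const defaultTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createDebouncedWriter<T>(
  write: (value: T) => void, // e.g. serialize and store the layout
  delayMs: number,
  timers: Timers = defaultTimers,
) {
  let pending: { value: T } | undefined;
  let timer: unknown;
  let armed = false;

  // Writes the latest pending value, if any, and cancels the timer.
  function flush(): void {
    if (armed) {
      timers.clear(timer);
      armed = false;
    }
    if (pending === undefined) return;
    const { value } = pending;
    pending = undefined;
    write(value);
  }

  // Keeps only the newest value and restarts the timer (trailing edge).
  function schedule(value: T): void {
    pending = { value };
    if (armed) timers.clear(timer);
    timer = timers.set(flush, delayMs);
    armed = true;
  }

  return { schedule, flush };
}
```

In a browser, `flush` would be registered as a `pagehide` listener so a write still pending when the page goes away is not lost; in-memory layout state is never touched by the debounce.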
useSidebarData consumed the whole sessions record via useSessions(), which
immer replaces on every appended event. Because the sidebar is root-mounted,
that re-rendered the tree on every token during a turn — even though
deriveTaskData only reads isPromptPending, pendingPermissions.size,
cloudStatus and cloudOutput.pr_url.

Add computeSidebarSessionSignature (a primitive digest of just those fields)
and useSidebarSessionMap, which subscribes with a signature-based equality so
the taskId to session map only changes when a rendered field does. A
render-count test covers the streamed-token case.

Part of Finding #2 (sidebar instance).
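
The signature idea can be sketched as a pure function over the four fields listed above, plus an equality function for the subscription. The session shape, `sidebarSignature`, and `sidebarEqual` below are assumptions for illustration, not the actual computeSidebarSessionSignature or useSidebarSessionMap.

```typescript
// Assumed, simplified session shape; only the first four fields are rendered.
interface SessionLike {
  isPromptPending: boolean;
  pendingPermissions: Map<string, unknown>;
  cloudStatus?: string;
  cloudOutput?: { pr_url?: string };
  events: unknown[]; // grows on every token; the sidebar never reads it
}

// Primitive digest of exactly the fields the sidebar renders.
function sidebarSignature(session: SessionLike): string {
  return JSON.stringify([
    session.isPromptPending,
    session.pendingPermissions.size,
    session.cloudStatus ?? null,
    session.cloudOutput?.pr_url ?? null,
  ]);
}

// Equality for a store subscription: two session records count as equal when
// they hold the same task ids and every signature matches.
function sidebarEqual(
  a: Record<string, SessionLike>,
  b: Record<string, SessionLike>,
): boolean {
  const ids = Object.keys(a);
  if (ids.length !== Object.keys(b).length) return false;
  return ids.every(
    (id) => id in b && sidebarSignature(a[id]) === sidebarSignature(b[id]),
  );
}
```

With an equality function like this, a streamed token that replaces the session object but only grows `events` compares equal, so the subscriber is not notified.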
immer autofreezes produced state, which for the append-only events array
meant walking a growing array on every streamed event. Freezing each event
at creation (hydration factory) and on append lets immer stop at the first
frozen node, so per-append cost no longer grows with transcript length.

Verified no code mutates a stored event (full core + builder suites pass
with frozen events).

Part of Finding #3 (autofreeze cost).
Two existing tests asserted synchronous side effects that are now deferred:
- sessionServiceHost steer-echo routing asserted appendEvents synchronously
  after an onData event, which Finding #1 now batches onto a frame flush.
- panelLayoutStore persistence read localStorage synchronously after a write,
  which Finding #12 now debounces.

Flush the frame timer / pagehide to observe the same behavior.

Follow-up to Findings #1 and #12.
A session's events array only grew and stayed resident after you navigated
away, so several big chats open at once climbed the renderer heap toward the
memory-eviction crash reason.

Free a transcript ~20s after its view unmounts and reload it from disk on
return (useSessionEventsResidency + SessionService.ensureEventsLoaded /
scheduleEventEviction). Only disconnected, idle sessions are eligible, so no
streamed event can append to an evicted transcript, and rehydration only
restores when the transcript is still empty (a reconnect that refilled it
wins). Reuses the existing log fetch/parse; state is cleared on reset.

Part of Finding #3 (unbounded memory).
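
The eligibility and rehydration rules can be sketched as below. This is a simplified, synchronous illustration: `createResidency`, the `Session` shape, and `readLog` are hypothetical, the real read is asynchronous, and the ~20s delay is left to the caller. The sketch also keeps a session evicted when a read comes back empty, which is the retry behavior the PR description lists.

```typescript
// Assumed, simplified session shape.
interface Session {
  events: string[];
  connected: boolean;
  isPromptPending: boolean;
}

function createResidency(
  sessions: Map<string, Session>,
  // Stand-in for the log fetch; returns [] when the read fails.
  readLog: (taskRunId: string) => string[],
) {
  const evicted = new Set<string>();

  // Call once the eviction delay has elapsed after the view unmounted.
  function evict(taskRunId: string): boolean {
    const session = sessions.get(taskRunId);
    // Only disconnected, idle sessions are eligible.
    if (!session || session.connected || session.isPromptPending) return false;
    session.events = [];
    evicted.add(taskRunId);
    return true;
  }

  // Call when the view mounts again.
  function ensureLoaded(taskRunId: string): void {
    const session = sessions.get(taskRunId);
    if (!session || !evicted.has(taskRunId)) return;
    const events = readLog(taskRunId);
    if (session.events.length > 0) {
      // A reconnect refilled the transcript in the meantime: it wins.
      // (Clearing the flag here is an assumption of this sketch.)
      evicted.delete(taskRunId);
      return;
    }
    if (events.length === 0) return; // stay evicted so a later visit retries
    session.events = events;
    evicted.delete(taskRunId);
  }

  return { evict, ensureLoaded, isEvicted: (id: string) => evicted.has(id) };
}
```

The ordering in `ensureLoaded` matters: the emptiness check runs after the read, so whatever a reconnect wrote during the read is never overwritten by stale disk contents.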

github-actions (Bot) commented Jul 1, 2026

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit 5c8aa00.

greptile-apps (Bot, Contributor) commented Jul 1, 2026

Reviews (1): Last reviewed commit: "perf(sessions): evict backgrounded trans..."

Comment thread packages/ui/src/utils/syntax-highlight.ts Outdated
Comment thread packages/ui/src/utils/syntax-highlight.ts
Comment thread packages/ui/src/utils/syntax-highlight.ts Outdated
Comment thread packages/core/src/sessions/sessionService.ts
Comment thread packages/ui/src/features/sessions/components/ConversationView.tsx Outdated
On the idle transition the builder discarded its state and re-parsed every
event via buildConversationItems. Finalize the persistent builder in place
instead — append the remaining events, then run the same finalization — so
the cost is proportional to what's new, not the whole transcript.

Falls back to the full rebuild when the append-only prefix is no longer valid
or events are out of ts-order (a full rebuild sorts; the incremental builder
processes in arrival order), keeping output identical. New tests stream then
finalize in place across every scenario, incl. resume-after-finalize.

Finding #7.
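
The finalize-in-place path with its fallback can be sketched as follows. Everything here is a simplified assumption: `Ev`, `parseEvent`, `buildAll`, and `createBuilder` are hypothetical, items are plain strings, and the prefix check is a cheap last-element identity test rather than whatever the real builder verifies.

```typescript
interface Ev {
  ts: number;
  text: string;
}

// Stand-in for per-event parsing.
function parseEvent(event: Ev): string {
  return event.text.trim();
}

// Full rebuild: sorts by timestamp, then parses every event.
function buildAll(events: Ev[]): string[] {
  return [...events].sort((a, b) => a.ts - b.ts).map(parseEvent);
}

function createBuilder() {
  let items: string[] = [];
  let count = 0; // events consumed so far
  let lastEvent: Ev | undefined;
  let maxTs = -Infinity;
  let inOrder = true; // false once an event arrives with an older ts

  // Incremental path: processes events in arrival order.
  function append(event: Ev): void {
    if (event.ts < maxTs) inOrder = false;
    else maxTs = event.ts;
    items.push(parseEvent(event));
    lastEvent = event;
    count += 1;
  }

  function finalize(all: Ev[]): { items: string[]; rebuilt: boolean } {
    // The consumed events must still be a prefix of the transcript.
    const prefixValid =
      count <= all.length && (count === 0 || all[count - 1] === lastEvent);
    if (prefixValid) {
      for (let i = count; i < all.length; i++) append(all[i]);
    }
    if (prefixValid && inOrder) return { items, rebuilt: false };

    // Fallback: full sorted rebuild, then resync state so appends can resume.
    items = buildAll(all);
    count = all.length;
    lastEvent = all[all.length - 1];
    maxTs = all.reduce((max, e) => Math.max(max, e.ts), -Infinity);
    inOrder = true;
    return { items, rebuilt: true };
  }

  return { append, finalize };
}
```

In the common case only the events past `count` are parsed at turn end; the sort-based rebuild runs only when arrival order and timestamp order disagree or the prefix no longer matches, which keeps the output identical to a full rebuild.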
ReviewPage/CloudReviewPage were static-imported by TaskDetail and
TabContentRenderer, pulling the whole code-review UI (ReviewShell, diff rows,
comment UI, review hooks) into the initial bundle. Route both through a
Suspense-wrapped lazy import so that code splits into its own chunk, loaded
on first review open.

The shared diff/highlight libraries stay eager (the always-visible transcript
uses them), so this splits the review-specific UI, not those libs.

Finding #14.
ChatThread (the transcript container), SessionView, and the queued-message
hook each read a single field (isCloud / handoffInProgress) off the whole
session via useSessionForTask, which has no equality fn and returns a new
reference on every streamed event — re-rendering those components every frame.

Add primitive useSessionIsCloud / useSessionHandoffInProgress selectors (same
pattern as the existing useAdapterForTask) so they only re-render when that
field actually changes.

Part of Finding #2 (coarse selector), remaining consumers.

richardsolomou marked this pull request as ready for review July 1, 2026 16:11
greptile-apps (Bot, Contributor) commented Jul 1, 2026

Reviews (2): Last reviewed commit: "perf(sessions): narrow hot single-field ..."

Address review feedback on the perf changes:

- Highlight cache: store the code alongside the segments and compare it on
  lookup, so a 32-bit hash collision re-parses instead of returning another
  snippet's segments (which would render the wrong code text).
- Transcript rehydration: clear the evicted flag only once the transcript is
  actually populated. fetchSessionLogs swallows read errors and returns empty
  rather than throwing, so deleting the flag up front stranded the transcript
  empty on a transient failure; now an empty read stays evicted and a later
  visit retries.
- Extract the duplicated DIFFS_HIGHLIGHTER_OPTIONS into a shared module.

Adds a residency test for the retry-after-failed-read path.
Measured follow-up to the pre-freeze change. Pre-freezing events barely
helped (~1%): immer autofreeze re-walks the whole events array on every
append regardless of whether the elements are already frozen, so appending
10k events took ~2.3s. Disabling autofreeze drops that to ~56ms (41x). The
per-event Object.freeze stays (O(1) each) so events remain immutable.

Autofreeze is a dev-time mutation guard with no runtime value; full core
(1859) and ui (1135) suites pass with it off.

Refines Finding #3.
readLocalLogsTail(taskRunId, maxBytes) reads only the last maxBytes of a
session's ndjson log (dropping the partial first line) so a big transcript
can open by reading a small tail instead of the whole file. Wired end to
end: LocalLogsService -> workspace-server + host-router routers -> apps/code
LOGS_SERVICE forward -> core trpc deps (optional; core feature-detects).

Not yet used; the open-path switch follows.

Part of fast task open.
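
A tail read that drops the partial first line can be sketched with Node's synchronous `fs` calls. `readLogTail` below is a hypothetical helper taking a file path, not the readLocalLogsTail RPC (which takes a taskRunId); it only illustrates the byte-window and newline handling.

```typescript
import * as fs from "node:fs";

// Returns the complete ndjson lines found in the last `maxBytes` of the file.
function readLogTail(path: string, maxBytes: number): string[] {
  const fd = fs.openSync(path, "r");
  try {
    const size = fs.fstatSync(fd).size;
    const start = Math.max(0, size - maxBytes);
    const buffer = Buffer.alloc(size - start);
    fs.readSync(fd, buffer, 0, buffer.length, start);

    let from = 0;
    if (start > 0) {
      // The window most likely begins mid-line, so skip to just past the
      // first newline. Searching bytes (not decoded text) also avoids
      // decoding a multi-byte character that was cut in half.
      const newline = buffer.indexOf(0x0a);
      from = newline === -1 ? buffer.length : newline + 1;
    }
    return buffer
      .toString("utf8", from)
      .split("\n")
      .filter((line) => line.length > 0);
  } finally {
    fs.closeSync(fd);
  }
}
```

When the window happens to start exactly on a line boundary, this sketch still drops that first line; that is conservative and harmless for a throwaway first paint that the full read later replaces.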
Opening a task read + transferred + parsed the entire ndjson log before any
paint — measured ~540ms for an 18.5MB log, dominated by disk read + IPC
transfer (both scale with log bytes; parse was ~30ms). Now the open path
paints the last ~1.5MB of the log immediately, so the latest turns show in
tens of ms, and the existing full read + reconnect replaces it with the
authoritative session (correct processed-line tracking + live connect).

Safe by construction: the full reconnect path is unchanged and always wins;
tail-first only adds an earlier throwaway paint. Falls back cleanly when the
host lacks the tail read or there's no local log.

Part of fast task open.

richardsolomou changed the title from "perf(sessions): cut streaming render churn and transcript memory" to "perf(sessions): faster task open, less render churn and memory" on Jul 1, 2026
richardsolomou and others added 6 commits July 2, 2026 00:43
The autofreeze note carried before/after benchmark numbers that read as
change-commentary; the sessionEvents freeze comment credited an immer
deep-freeze walk that no longer runs now that the store disables autofreeze.

charlesvien merged commit 90473e4 into main on Jul 2, 2026 (23 checks passed)
charlesvien deleted the perf/streaming-runtime branch July 2, 2026 01:00
2 participants