feat: LLM streaming with UX improvements #29
base: main
Conversation
feat: LLM streaming with UX improvements

- Render plain <Text> during streaming instead of Markdown to eliminate visual glitches from incomplete markup (unclosed fences, bold, etc.)
- Add copy button with "✓ Copied" feedback to fenced code blocks
- Remove auto-scroll during streaming and on completion so users can read at their own pace (matches Claude/ChatGPT behavior)
- Scroll to bottom only when user sends a new message
Replace the plain <Text> streaming renderer with the existing <Markdown> component so formatting (bold, code blocks, lists, etc.) appears progressively as tokens arrive, eliminating the visual "pop" when the stream completes. Add normalizeStreamingMarkdown() to close unclosed code fences mid-stream, and stripThinkTags() to hide <think> blocks during streaming. Remove now-dead streamingTextStyle.
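In sketch form, the two helpers might look something like this (hypothetical implementations based on the commit description; the PR's actual code may differ):

```ts
// Close an unterminated fenced code block so the Markdown renderer never
// sees a half-open fence mid-stream.
export function normalizeStreamingMarkdown(text: string): string {
  const fences = text.match(/^```/gm) ?? [];
  // An odd number of fence markers means the last block is still open.
  return fences.length % 2 === 1 ? text + "\n```" : text;
}

// Hide <think> blocks: completed ones anywhere in the text, plus a
// trailing block the stream has not closed yet.
export function stripThinkTags(text: string): string {
  return text
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/<think>[\s\S]*$/, "");
}
```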
```ts
  }
}

async function sendMessageStreaming(newMessage: ChatMessage) {
```
requestCompletionStreaming and sendMessageStreaming seem like very similar functions. It would be beneficial to extract a shared helper instead of having duplicate code.
Good call — refactored this. Extracted a shared streamCompletion(messages) helper that contains all the streaming callback logic (delta accumulation, RAF-throttled UI updates, tool call handling, KA content gathering, error handling). Both requestCompletionStreaming and sendMessageStreaming now delegate to it. This also fixed a subtle bug where sendMessageStreaming was missing the KA content gathering logic that requestCompletionStreaming had.
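A rough sketch of the shape this refactor describes (callback names, types, and the state setter here are assumptions, not the PR's exact code):

```ts
type ChatMessage = { role: "user" | "assistant"; content: string };

interface StreamCallbacks {
  onDelta: (delta: string) => void;
  onDone: (finalContent: string) => void;
  onError: (message: string) => void;
}

// Assumed to exist elsewhere in the PR: the SSE client in shared/chat.ts
// and the component's streaming-state setter.
declare function makeStreamingCompletionRequest(
  messages: ChatMessage[],
  callbacks: StreamCallbacks
): Promise<void>;
declare function setStreamingContent(content: string | null): void;

// Shared helper: all streaming callback logic lives here once.
// (Tool call handling and KA content gathering elided for brevity.)
async function streamCompletion(messages: ChatMessage[]): Promise<void> {
  let accumulated = "";
  let rafId: number | null = null;
  try {
    await makeStreamingCompletionRequest(messages, {
      onDelta: (delta) => {
        accumulated += delta;
        // rAF throttle: at most one UI update per animation frame.
        if (rafId === null) {
          rafId = requestAnimationFrame(() => {
            rafId = null;
            setStreamingContent(accumulated);
          });
        }
      },
      onDone: () => setStreamingContent(null),
      onError: () => setStreamingContent(null),
    });
  } finally {
    // Cancel a pending frame so a stale update can't land after an error.
    if (rafId !== null) cancelAnimationFrame(rafId);
  }
}
```

Both entry points then reduce to thin wrappers, e.g. sendMessageStreaming appending the new message before calling streamCompletion([...messages, newMessage]).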
```ts
      currentData = "";
    }
  }
}
```
If the server stream ends without sending an explicit `event: done` (e.g. server crash, network drop), onDone is never called. This leaves the UI stuck: streamingContent is never cleared, isGenerating stays true, and the input remains disabled. Consider calling onDone() as a fallback after the while loop exits, or at minimum calling onError so the UI can recover.
Great catch — fixed. Added a streamFinalized flag in makeStreamingCompletionRequest that gets set when either a done or error SSE event is received. After the read loop exits, if the flag is still false (server crash, network drop), we call onError("Connection lost — the server stopped responding") as a fallback so the UI can recover. Also added a finally block in the caller to cancel any pending requestAnimationFrame on unexpected errors, preventing stale UI updates.
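In sketch form, the fallback sits just after the read loop (event parsing is simplified here; parseSSEEvents is a hypothetical stand-in for the PR's buffered parser):

```ts
type SSEEvent = { type: "delta" | "done" | "error"; data: string };

// Assumed helper that buffers partial chunks and yields complete SSE events.
declare function parseSSEEvents(chunk: string): SSEEvent[];

async function readSSE(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  callbacks: { onDelta(d: string): void; onDone(): void; onError(msg: string): void }
): Promise<void> {
  const decoder = new TextDecoder();
  let streamFinalized = false;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const event of parseSSEEvents(decoder.decode(value, { stream: true }))) {
      if (event.type === "done") {
        streamFinalized = true;
        callbacks.onDone();
      } else if (event.type === "error") {
        streamFinalized = true;
        callbacks.onError(event.data);
      } else {
        callbacks.onDelta(event.data);
      }
    }
  }

  // Fallback: the stream ended without an explicit done/error event
  // (server crash, network drop), so surface an error to unstick the UI.
  if (!streamFinalized) {
    callbacks.onError("Connection lost — the server stopped responding");
  }
}
```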
```tsx
{isGenerating && streamingContent === null && <Chat.Thinking />}
{streamingContent !== null && (
  <Chat.Message icon="assistant">
    <Markdown>
      {normalizeStreamingMarkdown(stripThinkTags(streamingContent))}
    </Markdown>
  </Chat.Message>
)}
```
There's no auto-scroll to follow the streaming content as it arrives. If the response is long enough to overflow the view, the user has to scroll down manually. Consider adding a throttled scrollToEnd in the onDelta callback, and/or a scrollToEnd in onDone.
Good catch, but this is actually intentional! Both Claude and ChatGPT follow the same pattern — they don't force auto-scroll during streaming. The reasoning is that if a user scrolls up to re-read earlier content while a response is still generating, auto-scrolling would yank them back down, which is disruptive. The user stays in control of their scroll position, and can scroll down manually when they're ready to see the latest output.
…eam drop recovery

- Extract shared `streamCompletion()` helper to eliminate duplication between `requestCompletionStreaming` and `sendMessageStreaming` (also fixes missing KA content gathering in the latter)
- Add `streamFinalized` flag in SSE client parser so the UI recovers when the server stream ends without an explicit done/error event (crash, network drop)
- Add `finally` block to cancel pending requestAnimationFrame on unexpected errors, preventing stale UI updates
Add viewport-relative spacers to the chat ScrollView so that scrollToEnd() positions the user's new message near the top of the viewport instead of the bottom, giving the AI response a full page of space to stream into.

- Track ScrollView height via onLayout
- Bottom spacer (85% viewport) gives scrollToEnd room
- Top spacer (15% viewport) matches spacing for the first message
- Spacers hidden on landing screen (messages.length === 0)
…positioning

- Replace 15%/85% viewport spacers with contentContainerStyle.minHeight for scroll-to-top on message send — eliminates over-scroll into blank space and viewport jump after streaming completes
- Use two-phase scroll: onLayout captures Y + sets minHeight, onContentSizeChange fires scrollTo once content is tall enough
- Increase inter-message spacing (16→28px) and top padding to 28px for consistent gap between nav bar and user message across all screen sizes
- Change Messages.tsx prop type from ViewProps to ScrollViewProps so onContentSizeChange and contentContainerStyle are properly typed
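A condensed sketch of the two-phase approach (the component shape and names are illustrative, not the PR's exact code):

```tsx
import { useRef, useState, type ReactNode } from "react";
import { ScrollView } from "react-native";

// Phase 1: when the user sends a message, its onLayout reports a Y offset;
// grow the content container so that Y can become the top of the viewport.
// Phase 2: onContentSizeChange confirms the layout grew, then scrolls.
function ChatScroll({
  children,
}: {
  children: (onMessageLayout: (y: number) => void) => ReactNode;
}) {
  const scrollRef = useRef<ScrollView>(null);
  const [viewportH, setViewportH] = useState(0);
  const [minHeight, setMinHeight] = useState<number>();
  const pendingY = useRef<number | null>(null);

  const onMessageLayout = (y: number) => {
    pendingY.current = y;
    setMinHeight(y + viewportH); // room to put the message at the top
  };

  return (
    <ScrollView
      ref={scrollRef}
      onLayout={(e) => setViewportH(e.nativeEvent.layout.height)}
      contentContainerStyle={{ minHeight }}
      onContentSizeChange={(_w, h) => {
        const y = pendingY.current;
        // Scroll only once the content is actually tall enough.
        if (y !== null && h >= y + viewportH) {
          pendingY.current = null;
          scrollRef.current?.scrollTo({ y, animated: true });
        }
      }}
    >
      {children(onMessageLayout)}
    </ScrollView>
  );
}
```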

Summary
SSE streaming
Server-sent events transport for real-time LLM responses on web (native path unchanged).
Includes server-side processStreamingCompletion with a .stream() → .invoke() fallback, a client-side SSE parser, and rAF-throttled UI updates.
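For reference, the wire format a writeSSE helper would emit is simple. A minimal sketch, assuming a Node-style response object (the actual helper may differ):

```ts
import type { ServerResponse } from "node:http";

// One SSE message = an `event:` line plus a `data:` line, terminated by a
// blank line. The client parser keys off the event name (delta/done/error).
function writeSSE(res: ServerResponse, event: string, data: unknown): void {
  res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}
```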
Plain text during streaming
Renders raw <Text> instead of Markdown while tokens arrive, eliminating visual glitches from incomplete markup (unclosed fences, bold, partial links). Switches to full Markdown on completion.
Code block copy button
Fenced code blocks display a copy icon (top-right) that swaps to ✓ Copied feedback for 2 seconds.
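A minimal sketch of such a button, assuming expo-clipboard for the clipboard write (the PR's actual component and icon handling may differ):

```tsx
import { useState } from "react";
import { Pressable, Text } from "react-native";
import * as Clipboard from "expo-clipboard";

function CopyCodeButton({ code }: { code: string }) {
  const [copied, setCopied] = useState(false);

  const onPress = async () => {
    await Clipboard.setStringAsync(code);
    setCopied(true);
    // Revert the feedback after 2 seconds.
    setTimeout(() => setCopied(false), 2000);
  };

  return (
    <Pressable onPress={onPress} accessibilityLabel="Copy code">
      <Text>{copied ? "✓ Copied" : "Copy"}</Text>
    </Pressable>
  );
}
```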
Scroll UX — user message to top on send
No auto-scroll during streaming (matches Claude/ChatGPT behavior).
When the user sends a message, viewport-relative spacers inside the ScrollView cause scrollToEnd() to position the user's message near the top of the viewport, giving the AI response a full page of space to stream into. The top spacer (15% viewport) ensures consistent spacing from the nav bar for both the first and subsequent messages; the bottom spacer (85% viewport) provides the scroll target. Spacers are hidden on the landing screen.
Files changed
- `apps/agent/src/shared/chat.ts`: `processStreamingCompletion` (server), `makeStreamingCompletionRequest` (client), `writeSSE` helper
- `apps/agent/src/server/index.ts`: `Accept: text/event-stream` handling on `/llm`
- `apps/agent/src/app/(protected)/chat.tsx`: `streamingContent` state, streaming functions, plain text rendering, scroll-on-send, viewport spacers
- `apps/agent/src/components/Markdown.tsx`: `CopyCodeButton` component, custom `fence` render rule

Test plan