Skip to content

Python: fix: buffer out-of-order tool results in _sanitize_tool_history#4946

Open
ranst91 wants to merge 2 commits intomicrosoft:mainfrom
ranst91:fix/buffer-out-of-order-tool-results
Open

Python: fix: buffer out-of-order tool results in _sanitize_tool_history#4946
ranst91 wants to merge 2 commits intomicrosoft:mainfrom
ranst91:fix/buffer-out-of-order-tool-results

Conversation

@ranst91
Copy link
Copy Markdown

@ranst91 ranst91 commented Mar 27, 2026

Problem

When CopilotKit sends conversation history to the agent, tool results sometimes arrive before their corresponding assistant message due to how CopilotKit merges MESSAGES_SNAPSHOT events with locally-tracked messages. For
example:

[tool result for pieChart] ← arrives first, pending=None
[assistant: called pieChart] ← arrives after

_sanitize_tool_history was silently dropping any tool result that arrived when no pending call IDs were tracked (i.e. before its assistant message). This left pieChart (or any other frontend/declaration-only tool) unresolved
in subsequent turns, causing OpenAI to reject the request with:

▎ No tool output found for function call call_xxx

Fix

Instead of dropping orphaned tool results, buffer them by call_id. When the matching assistant message is later seen, re-inject the buffered results immediately after it — in the correct position for the Responses API to
resolve them.

Why this matters

This surfaces on any multi-turn conversation after using a generative UI component (pieChart, barChart, etc.) or any other frontend tool whose result is synthesized by the client rather than the agent. The second user message
after such a turn would always fail.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings March 27, 2026 10:06
@github-actions github-actions bot changed the title fix: buffer out-of-order tool results in _sanitize_tool_history Python: fix: buffer out-of-order tool results in _sanitize_tool_history Mar 27, 2026
@ranst91
Copy link
Copy Markdown
Author

ranst91 commented Mar 27, 2026

@microsoft-github-policy-service agree [company="CopilotKit"]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the AG-UI message normalization pipeline to handle out-of-order tool results (tool function_result messages arriving before the corresponding assistant function_call message), preventing unresolved tool-call histories that can cause provider validation failures.

Changes:

  • Buffer “orphaned” tool results (by call_id) when they arrive before an assistant tool-call message.
  • When the matching assistant message is later processed, re-inject the buffered tool results immediately after it.

Comment on lines +86 to +91
for call_id in list(tool_ids):
if call_id in orphaned_tool_results:
sanitized.append(orphaned_tool_results.pop(call_id))
if pending_tool_call_ids:
pending_tool_call_ids.discard(call_id)

Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orphaned_tool_results stores the entire tool Message per call_id, and the reinjection loop appends a buffered message once per matching call_id. If a single tool message contains multiple function_result contents (this happens in core where a tool message can carry a list of results), this can re-insert the same message multiple times and/or re-introduce unrelated function_result entries, which can violate provider validation. Consider buffering/splitting at the function_result content level (e.g., call_id -> Content) and re-emitting tool messages that include only the results matching the current assistant’s tool_ids, ensuring each buffered message/content is appended at most once.

Suggested change
for call_id in list(tool_ids):
if call_id in orphaned_tool_results:
sanitized.append(orphaned_tool_results.pop(call_id))
if pending_tool_call_ids:
pending_tool_call_ids.discard(call_id)
if tool_ids:
# Group buffered tool messages by underlying Message object so that:
# - Each original tool message is re-emitted at most once.
# - We can filter contents to only the function_result entries matching
# the current assistant message's tool_ids.
grouped_by_message: dict[int, dict[str, Any]] = {}
for call_id in list(tool_ids):
msg_for_call = orphaned_tool_results.get(call_id)
if not msg_for_call:
continue
msg_key = id(msg_for_call)
group = grouped_by_message.setdefault(
msg_key, {"message": msg_for_call, "call_ids": set()}
)
group["call_ids"].add(call_id)
for group in grouped_by_message.values():
msg_for_group: Message = cast(Message, group["message"])
call_ids_for_msg: set[str] = cast(set[str], group["call_ids"])
# Only keep function_result contents whose call_id matches one of the
# tool_ids for this assistant message. This avoids re-emitting unrelated
# function_result entries that belong to other tool calls.
filtered_contents = [
c
for c in (msg_for_group.contents or [])
if getattr(c, "type", None) == "function_result"
and getattr(c, "call_id", None) is not None
and str(c.call_id) in call_ids_for_msg
]
if filtered_contents:
sanitized.append(
Message(role=msg_for_group.role, contents=filtered_contents)
)
# Mark these call_ids as consumed from both orphaned_tool_results and
# pending_tool_call_ids so they are not processed again.
for consumed_call_id in call_ids_for_msg:
orphaned_tool_results.pop(consumed_call_id, None)
if pending_tool_call_ids:
pending_tool_call_ids.discard(consumed_call_id)

Copilot uses AI. Check for mistakes.
@ranst91
Copy link
Copy Markdown
Author

ranst91 commented Mar 27, 2026

@microsoft-github-policy-service agree company="CopilotKit"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants