Copilot Lies! #3279
Open
Labels
area:agents: Sub-agents, fleet, autopilot, plan mode, background agents, and custom agents
area:models: Model selection, availability, switching, rate limits, and model-specific behavior
Describe the bug
The agent makes verbal commitments about its future behavior ("from now on",
"I'll only X", "going forward", "every X will Y", "I'll be more careful")
without taking any action to enforce them: no rule written, no persistent
state changed, no tool call. The agent has no memory across turns, so these
phrases cannot change its future behavior, and the user receives a false
signal of alignment.
The pattern is most common after a user correction, where the phrasing
acknowledges the correction without addressing it.
Fix: every verbal commitment about future behavior must be paired with the
concrete enforcement action that makes it real (file write, persistent rule,
todo, code change, tool invocation) in the same turn. If no enforcement
action is possible, the agent must stop and raise a flag instead of emitting
the commitment.
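
To make the proposed fix concrete, here is a minimal sketch of a per-turn check, not a description of how Copilot works internally. It assumes the agent's turn can be inspected as its text plus the names of the tool calls it made. The `Turn` type, the phrase patterns (taken from the examples above), and the tool names in `ENFORCING_ACTIONS` are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical phrase list, based on the examples quoted in this report.
COMMITMENT_PATTERNS = [
    r"\bfrom now on\b",
    r"\bgoing forward\b",
    r"\bI(?:['’]ll| will) only\b",
    r"\bI(?:['’]ll| will) be more careful\b",
    r"\bevery \w+ will\b",
]
COMMITMENT_RE = re.compile("|".join(COMMITMENT_PATTERNS), re.IGNORECASE)

# Hypothetical names for tool calls that persist something beyond the turn.
ENFORCING_ACTIONS = {"write_file", "add_rule", "add_todo", "edit_code"}


@dataclass
class Turn:
    """One agent turn: the emitted text and the tool calls made in it."""
    text: str
    tool_calls: list[str] = field(default_factory=list)


def find_commitments(text: str) -> list[str]:
    """Return every commitment phrase found in the text, in order."""
    return [m.group(0) for m in COMMITMENT_RE.finditer(text)]


def check_turn(turn: Turn) -> str:
    """Return "ok" if the turn is acceptable, "flag" otherwise.

    A turn is flagged when it contains a commitment phrase but no
    enforcing tool call was made in the same turn.
    """
    if not find_commitments(turn.text):
        return "ok"
    if any(call in ENFORCING_ACTIONS for call in turn.tool_calls):
        return "ok"
    return "flag"
```

For example, `check_turn(Turn("From now on I'll run the tests first."))` returns `"flag"`, while the same text paired with an `"add_rule"` tool call returns `"ok"`. The check only confirms that some enforcing action happened in the turn; verifying that the action actually implements the stated commitment would need a stronger mechanism than phrase matching.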
Affected version
No response
Steps to reproduce the behavior
No response
Expected behavior
No response
Additional context
No response