feat: add CancellationToken for graceful agent execution cancellation #1772
Codecov Report: ❌ Patch coverage is 74.4%.
Issue: Code coverage gap noted by Codecov (74.4% patch coverage). The Codecov report indicates 11 lines missing coverage.

While the integration tests cover many scenarios, targeted unit tests with mock signals would improve coverage and make it easier to verify each checkpoint works correctly in isolation.
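For illustration, a minimal sketch of one such unit test (the `agent` fixture, the stop-reason value, and the pytest-asyncio setup are assumptions, not the PR's actual tests):

```python
import pytest

@pytest.mark.asyncio
async def test_cancel_signal_set_before_invocation(agent):
    # Hypothetical fixture: an agent wired to a mocked model provider.
    agent.cancel()                               # set the cancel signal up front
    result = await agent.invoke_async("hello")
    assert result.stop_reason != "end_turn"      # assumed: the invocation ends early

    # The signal is cleared when the invocation completes,
    # so the agent should be usable again afterwards.
    await agent.invoke_async("hello again")
```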
Review Summary
Assessment: Request Changes

This is a well-designed feature that adds graceful cancellation support to agents. The implementation is thread-safe, has clear checkpoints, and includes comprehensive tests. However, there's a critical issue that needs to be addressed before merging.
The core implementation is solid - the four-checkpoint design is sensible and the thread-safety approach is sound.
Force-pushed from ae46812 to c9535ce
Force-pushed from c9535ce to 05bc0bf
🔬 Adversarial Testing: Cancel + Interrupt Interactions

I ran additional adversarial tests focusing on how cancellation interacts with the interrupt state.

Result: ✅ No critical bugs found — 2 design observations documented below.
🔍 Finding 1: Cancel is NOT checked during sequential tool execution

Observation (Design Choice — Not a Bug): once tool execution for a turn has started, cancel() is not re-checked between tools, so every tool in the batch runs to completion.

Evidence:

```python
# Cancel called after first tool starts → BOTH tools complete fully
execution_log = ['start_0.1', 'end_0.1', 'start_0.5', 'end_0.5']
```

Impact: for long-running tools, cancellation does not take effect until the whole batch has finished.

Is this a bug? No — this appears intentional. Stopping a running tool mid-execution could leave state inconsistent.

🔍 Finding 2: Cancel during interrupt state doesn't clear interrupt

Observation (UX Question — May Need Documentation): when the agent is in an interrupted state, calling cancel() does not clear the interrupt, so the next invocation must still resume from the interrupt.
Reproduction:

```python
result = await agent.invoke_async("start")  # Triggers interrupt
assert result.stop_reason == "interrupt"

agent.cancel()  # User wants to cancel

# This FAILS with TypeError!
await agent.invoke_async("normal prompt")  # Must resume from interrupt

# To recover, must manually:
agent._interrupt_state.deactivate()
agent._cancel_signal.clear()
```

Is this a bug? Probably not — but it's a UX question:
The current behavior is defensible but should be documented.

✅ Confirmed: No Race Conditions

Thread safety verified with 50+ concurrent threads calling cancel() simultaneously.
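A minimal sketch of that kind of stress test (the helper name and thread count are illustrative, not the actual test code):

```python
import threading

def hammer_cancel(agent, n_threads: int = 50) -> None:
    """Call agent.cancel() from many threads at once; it should never raise."""
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        barrier.wait()      # release all threads at the same moment
        agent.cancel()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```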
✅ Confirmed: No Information Loss

The implementation correctly adds cancel toolResults for pending toolUses, keeping the conversation state valid.

🎯 State Machine Analysis

Conclusion: The implementation is solid. The two findings are design observations, not bugs — both behaviors are defensible and appear intentional. Consider documenting the cancel + interrupt interaction for users.
Move cancel_signal out of invocation_state to avoid leaking internal implementation details to hooks, tools, and model providers. The signal is now passed as a dedicated parameter to stream_messages and process_stream, while event_loop continues accessing it directly via agent._cancel_signal.
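A rough sketch of the wiring described above (signatures are simplified and assumed, not the library's actual code):

```python
import threading
from typing import AsyncIterator, Optional

async def stream_messages(
    model,
    messages,
    cancel_signal: Optional[threading.Event] = None,
) -> AsyncIterator[dict]:
    # The signal is an explicit parameter rather than an entry in
    # invocation_state, so hooks, tools, and providers never see it.
    async for event in model.stream(messages):
        if cancel_signal is not None and cancel_signal.is_set():
            break                  # stop consuming the stream once cancelled
        yield event
```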
Force-pushed from f6ff47f to 6057e55
@jgoyani1 Hello, thank you very much for contributing this feature. I tested it as soon as the new version was released. However, I noticed an issue with the token usage statistics when a request is canceled. I did a quick test and found that after cancellation, the reported token count is inaccurate. It doesn't include the size of each prompt, and it also misses the tokens that have already been generated. Right now, it seems to be counting only a very small portion of the total.

Normal output without interruption: Total: 12087
Interrupted near the end of output: Total: 35

Accurate token statistics would be very helpful for billing-related applications.
The root cause is in process_stream (streaming.py). When cancellation is detected mid-stream, the function returns immediately with usage=Usage(inputTokens=0, outputTokens=0, totalTokens=0), because the usage metadata from the model provider has not been received yet at the point where the stream is cut short.

Would investigate more on how to fetch these values, and will post an update here if it's possible to get the token count on the fly while streaming.
thx, here's my current approach: after the output is interrupted, I can't get the complete usage data returned by the model provider, so for now I'm doing a local rough estimate with tiktoken using cl100k_base.

For the input side, I count tokens by concatenating the full context sent to the model each time, including the system prompt, conversation history, tool results, and so on. For the output side, I count tokens by concatenating the text chunks that were already streamed back.

There are two issues with this approach:

- cl100k_base is the tokenizer used for OpenAI models, but its tokenization rules differ from those used by Claude, DeepSeek, Kimi, Grok, and others, so the numbers will inevitably be off.
- The estimated results are only for display and reference. They do not represent the actual token count used for billing by the provider.
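Roughly, that estimate could look like this (the message shapes and function names are assumptions; cl100k_base only approximates non-OpenAI tokenizers):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_input_tokens(system_prompt: str, history: list[str], tool_results: list[str]) -> int:
    # Concatenate everything sent to the model and count tokens locally.
    full_context = "\n".join([system_prompt, *history, *tool_results])
    return len(enc.encode(full_context))

def estimate_output_tokens(streamed_chunks: list[str]) -> int:
    # Only the text chunks received before cancellation are counted.
    return len(enc.encode("".join(streamed_chunks)))
```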
Motivation
Agents need a way to be stopped from external contexts — web request handlers, background threads, timeout logic. Currently there is no graceful cancellation mechanism, so callers have no way to interrupt a running agent without killing the process.
Resolves: #81
Public API Changes
New `agent.cancel()` method for graceful cancellation. Cancellation is checked at two strategic checkpoints.

A `{"text": "Cancelled by user"}` `toolResult` is added for each pending `toolUse` to maintain valid conversation state. The agent is reusable after cancellation — the cancel signal is automatically cleared when the invocation completes.
Use Cases
`agent.cancel()` can be called from a web request handler, a background thread, or timeout logic to stop a running invocation, as in the sketch below.
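A background-thread timeout, for instance, could look like this (the helper is hypothetical; `agent.cancel()` is the only call taken from the PR):

```python
import threading

def start_cancel_watchdog(agent, timeout_seconds: float) -> threading.Timer:
    # agent.cancel() is thread-safe, so it can be triggered from a timer thread
    # while the invocation runs elsewhere (e.g. in a web request handler).
    timer = threading.Timer(timeout_seconds, agent.cancel)
    timer.start()
    return timer        # call .cancel() on the timer if the agent finishes first
```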