.NET: Fix RequestInfoEvent lost when resuming workflow from checkpoint #4955

Open
peibekwe wants to merge 2 commits into main from peibekwe/requestinfo-fix

Conversation

@peibekwe
Contributor

Motivation and Context

When resuming a workflow from a JSON checkpoint, RestoreCheckpointAsync restored state and republished pending RequestInfoEvents immediately, before any event stream subscriber was attached. The events reached the EventSink, but no observer was listening yet, so they were silently dropped. As a result, WatchStreamAsync never yielded the pending requests and GetStatusAsync remained NotStarted.
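
The failure mode can be reproduced in miniature. The sketch below is a language-neutral illustration in Python, not the .NET implementation: the `EventSink` class and its `subscribe`/`publish` methods are hypothetical stand-ins, and the only assumption carried over from the description is that the sink does not buffer events for late subscribers.

```python
from typing import Callable, List


class EventSink:
    """Hypothetical stand-in for a sink that does not buffer events."""

    def __init__(self) -> None:
        self._observers: List[Callable[[str], None]] = []

    def subscribe(self, observer: Callable[[str], None]) -> None:
        self._observers.append(observer)

    def publish(self, event: str) -> None:
        # With no observers attached, the loop body never runs
        # and the event is lost without any error.
        for observer in self._observers:
            observer(event)


sink = EventSink()
received: List[str] = []

# The restore step republishes the pending request too early...
sink.publish("RequestInfoEvent(req-1)")
# ...and the event stream only attaches afterwards.
sink.subscribe(received.append)
```

After this sequence `received` is still empty, which mirrors a resumed run whose stream never yields the pending request.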

Fixes #2485

Description

This fix splits checkpoint restoration into a state-only restore step and a deferred republish step, so pending events are now republished only after the event stream subscribes, ensuring they are always delivered to the consumer. The fix covers both OffThread and Lockstep execution modes, runtime mid-flight restores, and subworkflow scenarios with qualified request ports.
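
One way to picture the split is the sketch below. It is an illustration under stated assumptions rather than the code in this PR: `Runner`, `restore_state`, `republish_pending_events`, and `watch_stream` are hypothetical Python names that only loosely mirror the state-only restore, the deferred republish step, and the event stream's subscription.

```python
from typing import Callable, Dict, List


class EventSink:
    """Hypothetical non-buffering sink, as in the failure scenario."""

    def __init__(self) -> None:
        self._observers: List[Callable[[str], None]] = []

    def subscribe(self, observer: Callable[[str], None]) -> None:
        self._observers.append(observer)

    def publish(self, event: str) -> None:
        for observer in self._observers:
            observer(event)


class Runner:
    """Hypothetical runner that separates restore from republish."""

    def __init__(self, sink: EventSink) -> None:
        self.sink = sink
        self.pending_requests: List[str] = []
        self._republish_needed = False

    def restore_state(self, checkpoint: Dict[str, List[str]]) -> None:
        # Step 1: restore state only; nothing is published here.
        self.pending_requests = list(checkpoint["pending_requests"])
        self._republish_needed = bool(self.pending_requests)

    def republish_pending_events(self) -> int:
        # Step 2: deferred republish, guarded so it runs at most once.
        if not self._republish_needed:
            return 0
        self._republish_needed = False
        for request_id in self.pending_requests:
            self.sink.publish(request_id)
        return len(self.pending_requests)


def watch_stream(runner: Runner) -> List[str]:
    # The stream subscribes first, then asks the runner to republish.
    received: List[str] = []
    runner.sink.subscribe(received.append)
    runner.republish_pending_events()
    return received


runner = Runner(EventSink())
runner.restore_state({"pending_requests": ["req-1", "req-2"]})
events = watch_stream(runner)
repeat_count = runner.republish_pending_events()
```

The one-shot guard in this sketch is a design choice of the illustration: it keeps a second subscriber or a repeated call from delivering the same pending requests twice.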

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@markwallace-microsoft added the .NET and workflows (Related to Workflows in agent-framework) labels Mar 27, 2026
@github-actions bot changed the title from "Fix RequestInfoEvent lost when resuming workflow from checkpoint" to ".NET: Fix RequestInfoEvent lost when resuming workflow from checkpoint" Mar 27, 2026
@peibekwe marked this pull request as ready for review March 27, 2026 20:05
Copilot AI review requested due to automatic review settings March 27, 2026 20:05
Contributor

Copilot AI left a comment

Pull request overview

Fixes a checkpoint-resume regression where pending RequestInfoEvents were republished before any event-stream subscriber was attached, causing resumed runs to silently drop those events and remain stuck (e.g., WatchStreamAsync yields nothing and status stays NotStarted).

Changes:

  • Split checkpoint restore into “state restore” vs. “deferred republish” for initial resumes; republish now occurs after event stream subscription.
  • Added a republish hook (ISuperStepRunner.RepublishPendingEventsAsync) and invoked it from both off-thread and lockstep event streams at subscription time.
  • Added/updated state persistence for subworkflow/forwarding scenarios (WorkflowHostExecutor pending response port mapping, RequestInfoExecutor wrapped requests) and introduced new regression tests.
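
The persisted port mapping from the last item can be sketched as follows. This is a hypothetical Python illustration, not the WorkflowHostExecutor code: the `HostExecutor` class, its method names, the JSON layout, and the qualified port name `sub1.approval` are all assumptions made for the example.

```python
import json
from typing import Dict


class HostExecutor:
    """Hypothetical executor that remembers which port each request used."""

    def __init__(self) -> None:
        # request id -> original (possibly qualified) port id
        self.pending_ports: Dict[str, str] = {}

    def track_request(self, request_id: str, original_port: str) -> None:
        self.pending_ports[request_id] = original_port

    def save_state(self) -> str:
        # Include the mapping in the checkpoint payload.
        return json.dumps({"pending_ports": self.pending_ports})

    @classmethod
    def load_state(cls, payload: str) -> "HostExecutor":
        executor = cls()
        executor.pending_ports = dict(json.loads(payload)["pending_ports"])
        return executor

    def route_response(self, request_id: str) -> str:
        # Look up the port the response should be delivered to,
        # and stop tracking the request once it is serviced.
        return self.pending_ports.pop(request_id)


before = HostExecutor()
before.track_request("req-1", "sub1.approval")

# Round-trip through a JSON checkpoint, then route the response.
after = HostExecutor.load_state(before.save_state())
port = after.route_response("req-1")
```

Without the mapping in the checkpoint, the restored executor in this sketch would have no way to tell which port `req-1` originally came from.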

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointResumeTests.cs | Adds regression coverage for resume/restore with pending requests across OffThread/Lockstep and subworkflow forwarding. |
| dotnet/src/Microsoft.Agents.AI.Workflows/WorkflowSession.cs | Uses an internal resume path to suppress republishing to avoid duplicate consumer-visible events in session scenarios. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Specialized/WorkflowHostExecutor.cs | Persists request-id → original port mapping so qualified/unqualified response routing survives checkpoint restore. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Specialized/RequestInfoExecutor.cs | Persists wrapped-request mapping across checkpoints so forwarded external requests can be correctly rewired after restore. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunner.cs | Defers pending-request republish on initial resume via a gated flag; keeps runtime restore republish behavior. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessExecutionEnvironment.cs | Adds an internal resume API that can suppress pending-event republish. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs | Calls RepublishPendingEventsAsync immediately after subscribing to runner events. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Execution/LockstepRunEventStream.cs | Buffers events across subscription timing and adds early-drain behavior for “pending requests only” resumes. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Execution/ISuperStepRunner.cs | Adds the RepublishPendingEventsAsync contract for event streams. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Execution/AsyncRunHandle.cs | Signals the run loop on resume when there are unserviced requests; clears buffered events for lockstep restores. |
| dotnet/src/Microsoft.Agents.AI.Workflows/Checkpointing/ICheckpointingHandle.cs | Clarifies runtime-restore vs. initial-resume expectations around replaying pending requests. |

Development

Successfully merging this pull request may close these issues.

.NET: Dotnet - RequestInfo objects are not re-emitted when rehydrating a workflow from JSON.
