[STF] Add re-launchable popped graphs to stackable_ctx #9178

Draft
caugonnet wants to merge 2 commits into NVIDIA:main from caugonnet:stf_launchable_graphs

@caugonnet caugonnet commented May 29, 2026

Splits graph_ctx_node finalization into phases so a popped nested graph can be instantiated once and launched many times before the matching epilogue runs. Adds three public surfaces on stackable_ctx:

  • pop_prologue() / pop_epilogue() returning a launchable_graph_handle that exposes exec(), stream(), graph(), and launch();
  • launchable_graph_scope, an RAII guard that pairs push() with a lazy pop_prologue() and runs pop_epilogue() in its destructor;
  • pop_prologue_shared() returning a copyable/storable launchable_graph whose destructor runs pop_epilogue() when the last copy dies.
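
The three surfaces listed above can be pictured with a small, CUDA-free toy model. This is only a sketch of the lifetime rules the description states; it is not the CUDASTF implementation. Every name here (`toy_ctx`, `toy_scope`, `toy_shared_graph`, `toy_pop_prologue_shared`) is hypothetical, and the real `stackable_ctx` signatures may differ.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a stackable context (NOT the CUDASTF API).
// It only tracks nesting depth, launches, and whether an epilogue is pending.
struct toy_ctx {
  int depth = 0;
  bool pending_epilogue = false;
  int launches = 0;
  int epilogues = 0;

  void push() {
    if (pending_epilogue) throw std::logic_error("push() while an epilogue is pending");
    ++depth;
  }
  // Phase 1: the nested graph becomes launchable (stands in for instantiation).
  void pop_prologue() {
    if (depth == 0 || pending_epilogue) throw std::logic_error("invalid pop_prologue()");
    pending_epilogue = true;
  }
  void launch() {
    if (!pending_epilogue) throw std::logic_error("launch() outside prologue/epilogue");
    ++launches;
  }
  // Phase 2: leave the nested level; the graph is no longer launchable.
  void pop_epilogue() {
    if (!pending_epilogue) throw std::logic_error("pop_epilogue() without a prologue");
    pending_epilogue = false;
    --depth;
    ++epilogues;
  }
};

// RAII guard: push() on construction, lazy prologue on first launch,
// epilogue in the destructor (the zero-launch case is handled too).
class toy_scope {
  toy_ctx& ctx_;
  bool prologue_done_ = false;

public:
  explicit toy_scope(toy_ctx& ctx) : ctx_(ctx) { ctx_.push(); }
  toy_scope(const toy_scope&) = delete;
  toy_scope& operator=(const toy_scope&) = delete;

  void launch() {
    if (!prologue_done_) {
      ctx_.pop_prologue();
      prologue_done_ = true;
    }
    ctx_.launch();
  }
  ~toy_scope() {
    if (!prologue_done_) ctx_.pop_prologue();
    ctx_.pop_epilogue();
  }
};

// Copyable handle: a shared_ptr with a custom deleter runs the epilogue
// exactly once, when the last copy is destroyed. It does not own the context.
class toy_shared_graph {
  std::shared_ptr<toy_ctx> state_;

public:
  explicit toy_shared_graph(toy_ctx& ctx)
      : state_(&ctx, [](toy_ctx* c) { c->pop_epilogue(); }) {}
  void launch() const { state_->launch(); }
};

// Hypothetical free-function analogue of pop_prologue_shared().
inline toy_shared_graph toy_pop_prologue_shared(toy_ctx& ctx) {
  ctx.pop_prologue();
  return toy_shared_graph(ctx);
}
```

In this model, a `toy_scope` launched three times still records a single epilogue, and copies of a `toy_shared_graph` can be stored in a container without triggering the epilogue until the last one is destroyed.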

The non-nested finalize path now flows through prepare_graph -> ensure_instantiated -> launch_once -> finalize_after_launch; the existing nested-graph behavior is preserved verbatim in finalize_nested(). push() / pop() guard against being called while a pop_prologue is still pending its matching pop_epilogue.
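
As a rough illustration of the phase split described above, the sketch below models the four steps as methods on a toy node. The method names are borrowed from the description, but the bodies, the `trace` member, and the assumption that `ensure_instantiated` is idempotent are illustrative guesses, not the actual `graph_ctx_node` code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph context node (NOT the CUDASTF class).
struct toy_graph_node {
  bool prepared = false;
  bool instantiated = false;
  bool finalized = false;
  int instantiations = 0;
  int launches = 0;
  std::vector<std::string> trace;  // records the order in which phases ran

  void prepare_graph() {
    if (prepared) return;
    prepared = true;
    trace.push_back("prepare_graph");
  }
  // Assumed idempotent: instantiating happens once, however often it is asked for.
  void ensure_instantiated() {
    if (!prepared) throw std::logic_error("graph not prepared");
    if (instantiated) return;
    instantiated = true;
    ++instantiations;
    trace.push_back("ensure_instantiated");
  }
  void launch_once() {
    if (!instantiated || finalized) throw std::logic_error("graph not launchable");
    ++launches;
    trace.push_back("launch_once");
  }
  void finalize_after_launch() {
    if (finalized) throw std::logic_error("already finalized");
    finalized = true;
    trace.push_back("finalize_after_launch");
  }

  // One-shot path: the four phases run back to back, as in a plain pop().
  void finalize() {
    prepare_graph();
    ensure_instantiated();
    launch_once();
    finalize_after_launch();
  }
};
```

Splitting the path this way is what would let a prologue stop after `ensure_instantiated`, allow any number of `launch_once` calls, and defer `finalize_after_launch` to the epilogue.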

Coverage lives in the stackable_ctx.cuh inline UNITTESTs: repeated launch, manual cudaGraphLaunch via exec()/stream(), zero-launch, handle invalidation, RAII scope, shared basic/copies/container/manual epilogue, and a CTK-12.4 pop_prologue + repeat_graph_scope test.

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@caugonnet caugonnet self-assigned this May 29, 2026
@caugonnet caugonnet added the stf Sequential Task Flow programming model label May 29, 2026
copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 29, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 29, 2026
@caugonnet
Contributor Author

/ok to test 43486c3

@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 45m 34s: Pass: 98%/55 | Total: 18h 01m | Max: 45m 29s | Hits: 24%/108141

See results here.
