[STF] Add re-launchable popped graphs to stackable_ctx #9178
Draft
caugonnet wants to merge 2 commits into
Splits graph_ctx_node finalization into phases so a popped nested graph
can be instantiated once and launched many times before the matching
epilogue runs. Adds three public surfaces on stackable_ctx:
* pop_prologue() / pop_epilogue() returning a launchable_graph_handle
that exposes exec(), stream(), graph(), and launch();
* launchable_graph_scope, an RAII guard that pairs push() with a
lazy pop_prologue() and runs pop_epilogue() in its destructor;
* pop_prologue_shared() returning a copyable/storable launchable_graph
whose destructor runs pop_epilogue() when the last copy dies.
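The ownership rule of the shared variant (the epilogue runs exactly once, when the last copy is destroyed) can be illustrated with a small host-only mock. This is a sketch under stated assumptions, not the PR's implementation: `mock_ctx`, `mock_launchable_graph`, and `run_shared_demo` are hypothetical names, no CUDA calls are made, and using `std::shared_ptr` with a custom deleter is only one plausible way to get this behavior.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the context; it only records what happened.
struct mock_ctx {
  int launches = 0;
  int epilogues = 0;
  bool prologue_pending = false;
};

// Copyable, storable handle. All copies share one control block, and the
// custom deleter plays the role of pop_epilogue() once the last copy dies.
class mock_launchable_graph {
public:
  explicit mock_launchable_graph(mock_ctx& ctx)
      : state_(&ctx, [](mock_ctx* c) {
          ++c->epilogues;               // stands in for pop_epilogue()
          c->prologue_pending = false;
        }) {
    ctx.prologue_pending = true;        // stands in for pop_prologue_shared()
  }

  // Stands in for launching the already-instantiated graph.
  void launch() const { ++state_->launches; }

private:
  std::shared_ptr<mock_ctx> state_;     // non-owning pointer plus deleter
};

// Launch through several stored copies, then let every copy go out of scope.
mock_ctx run_shared_demo() {
  mock_ctx ctx;
  {
    mock_launchable_graph g(ctx);
    std::vector<mock_launchable_graph> stored(3, g);  // three more copies
    for (const auto& h : stored) {
      h.launch();
    }
    g.launch();
    assert(ctx.epilogues == 0);         // copies are alive: no epilogue yet
    assert(ctx.prologue_pending);
  }
  return ctx;                           // the epilogue has run exactly once
}
```

Calling `run_shared_demo()` in this mock records four launches and a single epilogue, which mirrors the "copies" and "container" cases listed in the test coverage.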
The non-nested finalize path now flows through prepare_graph ->
ensure_instantiated -> launch_once -> finalize_after_launch; the
existing nested-graph behavior is preserved verbatim in
finalize_nested(). push() / pop() guard against being called while a
pop_prologue is still pending its matching pop_epilogue.
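The phase split and the push()/pop() guard can be sketched with a host-only mock. Everything here is an assumption made for illustration: `mock_stackable_ctx`, `phase_demo_result`, and `run_phase_demo` are hypothetical, the guard is modeled as a `std::logic_error` (the PR does not say how the real guard reports the error), and `prepare_graph` is omitted because the mock has no graph to prepare.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical mock of the phased finalization; it only counts events.
class mock_stackable_ctx {
public:
  void push() {
    require_not_pending("push");
    ++depth_;
  }

  // One-shot path: instantiate, launch once, finalize.
  void pop() {
    require_not_pending("pop");
    require_open_scope("pop");
    ensure_instantiated();
    launch_once();
    finalize_after_launch();
  }

  // Split path, first half: instantiate and leave the graph launchable.
  void pop_prologue() {
    require_not_pending("pop_prologue");
    require_open_scope("pop_prologue");
    ensure_instantiated();
    pending_ = true;
  }

  // May be called any number of times (including zero) between the halves.
  void launch() {
    if (!pending_) throw std::logic_error("launch without pending pop_prologue");
    launch_once();
  }

  // Split path, second half: finalize after the launches are done.
  void pop_epilogue() {
    if (!pending_) throw std::logic_error("pop_epilogue without pop_prologue");
    pending_ = false;
    finalize_after_launch();
  }

  int instantiations() const { return instantiations_; }
  int launches() const { return launches_; }
  int depth() const { return depth_; }

private:
  void ensure_instantiated() {
    if (!instantiated_) { instantiated_ = true; ++instantiations_; }
  }
  void launch_once() { ++launches_; }
  void finalize_after_launch() { instantiated_ = false; --depth_; }

  void require_not_pending(const char* what) const {
    if (pending_) {
      throw std::logic_error(std::string(what) + " while pop_prologue is pending");
    }
  }
  void require_open_scope(const char* what) const {
    if (depth_ == 0) throw std::logic_error(std::string(what) + " without push");
  }

  int depth_ = 0;
  int instantiations_ = 0;
  int launches_ = 0;
  bool instantiated_ = false;
  bool pending_ = false;
};

struct phase_demo_result {
  int instantiations;
  int launches;
  int depth;
  bool guard_fired;
};

phase_demo_result run_phase_demo() {
  mock_stackable_ctx ctx;
  ctx.push();
  ctx.pop_prologue();                       // instantiate once
  for (int i = 0; i < 10; ++i) ctx.launch();  // launch many times

  bool guard_fired = false;
  try {
    ctx.push();                             // rejected: epilogue still pending
  } catch (const std::logic_error&) {
    guard_fired = true;
  }

  ctx.pop_epilogue();
  ctx.push();                               // allowed again after the epilogue
  ctx.pop();                                // one-shot path still works
  return {ctx.instantiations(), ctx.launches(), ctx.depth(), guard_fired};
}
```

In this mock, the ten launches between `pop_prologue()` and `pop_epilogue()` share a single instantiation, and the rejected `push()` shows why the guard matters: a new scope opened at that point would sit on top of a graph that has not been finalized.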
Coverage lives in the stackable_ctx.cuh inline UNITTESTs: repeated
launch, manual cudaGraphLaunch via exec()/stream(), zero-launch,
handle invalidation, RAII scope, shared basic/copies/container/manual
epilogue, and a CTK-12.4 pop_prologue + repeat_graph_scope test.
/ok to test 43486c3
😬 CI Workflow Results: 🟥 Finished in 45m 34s: Pass: 98%/55 | Total: 18h 01m | Max: 45m 29s | Hits: 24%/108141