cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try by andralex · Pull Request #9165 · NVIDIA/cccl

andralex · 2026-05-28T15:02:02Z

Summary

Migration PR3 of the cuda_safe_call → cuda_try rollout for cudax/__stf/. Targets cudax/include/cuda/experimental/__stf/stackable/ (15 sites). 13 sites converted, 2 KEEPs documented.

Companion to PR #9146 (allocators) and PR #9150 (utility).

Changes

`stackable_ctx_impl.cuh` — `graph_ctx_node` constructor + `finalize()`

13 cuda_safe_call sites → cuda_try. The constructor builds a CUDA graph in stages, so transactional cleanup is added:

Nested non-conditional branch (453, 457, 460, 465). The freshly created dummy_graph is destroyed intentionally mid-block. A SCOPE(fail) guarded by dummy_graph_owned frees it on early throw; the flag is disarmed right after the intentional destroy.
Outer graph (472). Owned by us only in the non-nested case; in nested cases graph is either parent_graph or a child of parent_graph (both owned upstream). A SCOPE(fail) destroys it on early throw, gated by bool graph_owned_by_us = !nested_graph;. The flag is disarmed the instant graph_ctx adopts it via auto gctx = graph_ctx(sub_graph, ...); — matching graph_ctx's documented ownership contract:

Constructor taking a user-provided graph. User code is not supposed to destroy the graph later.
Conditional branch (483, 496, 498, 518). cudaGraphConditionalHandleCreate, the two CTK variants of cudaGraphAddNode, and cudaGraphAddKernelNode. The handle and any added nodes live inside graph, so they are implicitly cleaned up by the outer SCOPE(fail).
finalize() (587, 590, 606, 616). cudaGraphAddDependencies (both CTK branches), cudaGraphDebugDotPrint, cudaGraphLaunch. Straight cuda_try conversion; no rollback applies.

Two `cuda_safe_call`s intentionally remain in `SCOPE(fail)` bodies

Lines 463 and 506. SCOPE bodies are noexcept, so cuda_safe_call (abort-on-failure) is the correct tool there.

`stackable_ctx.cuh` — 2 KEEPs in test fixtures

The two cuda_safe_call(cudaStreamSynchronize(stream)) calls inside UNITTEST lambdas passed to task(exec_place::host(), ...)->*lambda are kept and annotated. The host-task dispatch path's exception safety has not been audited, so an abort there remains safer than an unannotated throw escaping into the runtime.

Residual hazards intentionally documented inline

Both are no worse than the prior behavior (which aborted the entire process). Both are deferred:

Orphaned child node in parent_graph if a cuda_try throws after cudaGraphAddChildGraphNode. Clean removal would need cudaGraphDestroyNode plus dependency rewiring. Harmless until parent_graph is destroyed.
Stale conditional handle in *config.conditional_handle if a cuda_try throws after cudaGraphConditionalHandleCreate. CUDA has no destroy API for conditional handles (they are tied to their graph, which the SCOPE(fail) destroys). Caller must treat the handle as invalid in the catch-block.

Test plan

CI green on the cudax matrix entries that build STF stackable tests
No new functional behavior on the success path — all changes are throw-vs-abort and rollback on throw

In stackable_ctx_impl.cuh, replace cuda_safe_call with cuda_try in the graph_ctx_node constructor and finalize() so CUDA errors are reported as exceptions rather than aborting the process. The constructor builds a CUDA graph in stages, so add transactional cleanup: - In the nested non-conditional branch, the freshly created dummy_graph is destroyed intentionally mid-block. Guard it with a SCOPE(fail) that frees it only while dummy_graph_owned is true, and disarm the flag right after the intentional destroy. - The outer `graph` is owned by us only in the non-nested case (in the nested cases it is either parent_graph or a child of parent_graph, both owned upstream). A SCOPE(fail) destroys it on early throw and is disarmed the instant graph_ctx adopts it via `auto gctx = graph_ctx(sub_graph, ...);`, matching graph_ctx's documented ownership contract ("User code is not supposed to destroy the graph later"). - The conditional handle (cudaGraphConditionalHandleCreate) and any nodes added to `graph` (cudaGraphAddNode, cudaGraphAddKernelNode) are implicitly cleaned up by the outer SCOPE(fail) destroying `graph`. Two residual hazards are intentionally documented inline rather than fixed in this commit: - cudaGraphAddChildGraphNode leaves an orphaned child node inside parent_graph on later throw; cleanly removing it would need cudaGraphDestroyNode and dependency rewiring. - cudaGraphConditionalHandleCreate writes a handle into a caller-owned pointer; CUDA has no destroy API for conditional handles, so on throw the handle is left invalid (its backing graph is destroyed). Both are no worse than the prior behavior (which aborted). The four cuda_safe_call sites in finalize() (cudaGraphAddDependencies on both CTK branches, cudaGraphDebugDotPrint, cudaGraphLaunch) become plain cuda_try; no resource rollback applies. The two cuda_safe_call sites inside the new SCOPE(fail) bodies are intentional: SCOPE bodies are noexcept, so cuda_safe_call is the correct tool there. In stackable_ctx.cuh, the two cuda_safe_call sites inside UNITTEST host-task lambdas are kept and annotated. Those lambdas are dispatched by the STF host-task path, whose exception-safety has not been audited, so an abort remains safer than an unannotated throw.

copy-pr-bot · 2026-05-28T15:02:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-28T15:06:10Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Improved error handling in CUDA graph operations with refined exception path auditing.
- Enhanced resource cleanup and lifetime management for nested graph contexts.
Documentation
- Added clarifying comments explaining error handling strategies.

Walkthrough

CUDA Graph API error handling in stackable graph context construction, finalization, and node wiring is refactored to use cuda_try instead of cuda_safe_call. Nested-graph construction introduces explicit ownership tracking and RAII scope guards to ensure cleanup on failure. Graph ownership is disarmed after successful construction to prevent incorrect destruction of parent-owned graphs. Test host-task dispatch paths are documented to clarify exception-safety expectations.

Changes

Stackable graph context error handling

Layer / File(s)	Summary
Scope guard integration and nested-graph RAII construction `cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh`	Adds scope-guard include and refactors nested-graph construction to use `cuda_try` for `cudaGraphCreate`, `cudaGraphAddChildGraphNode`, and `cudaGraphChildGraphNodeGetGraph`. Introduces `graph_owned_by_us` tracking and `SCOPE(fail)` guard to ensure `dummy_graph` destruction on error during child-graph wiring.
Graph node wiring updates `cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh`	Conditional node addition and reset-kernel node addition switched from `cuda_safe_call` to `cuda_try`, preserving version-dependent call-site differences under `#if _CCCL_CTK_AT_LEAST(13, 0)`.
Graph ownership lifecycle management `cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh`	After successful `graph_ctx` construction, explicitly disarms ownership guard by setting `graph_owned_by_us = false` to prevent incorrect destruction of parent/nested-owned graphs.
Graph finalization and launch `cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh`	`cudaGraphAddDependencies`, `cudaGraphDebugDotPrint`, and `cudaGraphLaunch` updated to use `cuda_try` instead of `cuda_safe_call`, including version-dependent argument differences.
Test exception-handling documentation `cudax/include/cuda/experimental/__stf/stackable/stackable_ctx.cuh`	Comments added to two unit test host-task lambdas explaining that `cuda_safe_call(cudaStreamSynchronize(...))` is intentionally preserved because exceptions are not audited along the dispatch path and abort is preferable to unannotated throws escaping into the runtime.

Possibly related PRs

NVIDIA/cccl#8891: The switch from cuda_safe_call to cuda_try for CUDA Graph API calls is directly tied to enhancements in cuda_try output-parameter inference.

Suggested labels

stf

Suggested reviewers

caugonnet
alliepiper

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh (1)
40-40: 💤 Low value

suggestion: Include uses quote syntax instead of angle brackets per coding guideline ("All header inclusions must use angle bracket syntax"). However, this matches the existing pattern in lines 33-39, so fixing would be a file-wide refactor.
-#include "cuda/experimental/__stf/utility/scope_guard.cuh"
+#include <cuda/experimental/__stf/utility/scope_guard.cuh>
As per coding guidelines: "All header inclusions must use angle bracket syntax, e.g.,
"

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5eedaee-c34f-4310-b104-9770f494cea2

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3613f and 1f791e5.

📒 Files selected for processing (2)

cudax/include/cuda/experimental/__stf/stackable/stackable_ctx.cuh
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh

andralex · 2026-05-28T15:20:14Z

/ok to test 1f791e5

github-actions · 2026-05-28T16:53:14Z

🥳 CI Workflow Results

🟩 Finished in 1h 31m: Pass: 100%/55 | Total: 1d 14h | Max: 1h 31m | Hits: 12%/296793

See results here.

andralex requested a review from a team as a code owner May 28, 2026 15:02

andralex requested a review from caugonnet May 28, 2026 15:02

github-project-automation Bot added this to CCCL May 28, 2026

github-project-automation Bot moved this to Todo in CCCL May 28, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 28, 2026

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165

cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165
andralex wants to merge 1 commit into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-stackable

andralex commented May 28, 2026

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Summary by CodeRabbit

Walkthrough

Changes

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

andralex commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andralex commented May 28, 2026

Summary

Changes

stackable_ctx_impl.cuh — graph_ctx_node constructor + finalize()

Two cuda_safe_calls intentionally remain in SCOPE(fail) bodies

stackable_ctx.cuh — 2 KEEPs in test fixtures

Residual hazards intentionally documented inline

Test plan

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Summary by CodeRabbit

Walkthrough

Changes

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

andralex commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 31m: Pass: 100%/55 | Total: 1d 14h | Max: 1h 31m | Hits: 12%/296793

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`stackable_ctx_impl.cuh` — `graph_ctx_node` constructor + `finalize()`

Two `cuda_safe_call`s intentionally remain in `SCOPE(fail)` bodies

`stackable_ctx.cuh` — 2 KEEPs in test fixtures