Skip to content

cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165

Open
andralex wants to merge 1 commit into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-stackable
Open

cudax/stf: migrate stackable/ from cuda_safe_call to cuda_try#9165
andralex wants to merge 1 commit into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-stackable

Conversation

@andralex
Copy link
Copy Markdown
Contributor

Summary

Migration PR3 of the cuda_safe_callcuda_try rollout for cudax/__stf/. Targets cudax/include/cuda/experimental/__stf/stackable/ (15 sites). 13 sites converted, 2 KEEPs documented.

Companion to PR #9146 (allocators) and PR #9150 (utility).

Changes

stackable_ctx_impl.cuhgraph_ctx_node constructor + finalize()

13 cuda_safe_call sites → cuda_try. The constructor builds a CUDA graph in stages, so transactional cleanup is added:

  • Nested non-conditional branch (453, 457, 460, 465). The freshly created dummy_graph is destroyed intentionally mid-block. A SCOPE(fail) guarded by dummy_graph_owned frees it on early throw; the flag is disarmed right after the intentional destroy.

  • Outer graph (472). Owned by us only in the non-nested case; in nested cases graph is either parent_graph or a child of parent_graph (both owned upstream). A SCOPE(fail) destroys it on early throw, gated by bool graph_owned_by_us = !nested_graph;. The flag is disarmed the instant graph_ctx adopts it via auto gctx = graph_ctx(sub_graph, ...); — matching graph_ctx's documented ownership contract:

    Constructor taking a user-provided graph. User code is not supposed to destroy the graph later.

  • Conditional branch (483, 496, 498, 518). cudaGraphConditionalHandleCreate, the two CTK variants of cudaGraphAddNode, and cudaGraphAddKernelNode. The handle and any added nodes live inside graph, so they are implicitly cleaned up by the outer SCOPE(fail).

  • finalize() (587, 590, 606, 616). cudaGraphAddDependencies (both CTK branches), cudaGraphDebugDotPrint, cudaGraphLaunch. Straight cuda_try conversion; no rollback applies.

Two cuda_safe_calls intentionally remain in SCOPE(fail) bodies

Lines 463 and 506. SCOPE bodies are noexcept, so cuda_safe_call (abort-on-failure) is the correct tool there.

stackable_ctx.cuh — 2 KEEPs in test fixtures

The two cuda_safe_call(cudaStreamSynchronize(stream)) calls inside UNITTEST lambdas passed to task(exec_place::host(), ...)->*lambda are kept and annotated. The host-task dispatch path's exception safety has not been audited, so an abort there remains safer than an unannotated throw escaping into the runtime.

Residual hazards intentionally documented inline

Both are no worse than the prior behavior (which aborted the entire process). Both are deferred:

  1. Orphaned child node in parent_graph if a cuda_try throws after cudaGraphAddChildGraphNode. Clean removal would need cudaGraphDestroyNode plus dependency rewiring. Harmless until parent_graph is destroyed.

  2. Stale conditional handle in *config.conditional_handle if a cuda_try throws after cudaGraphConditionalHandleCreate. CUDA has no destroy API for conditional handles (they are tied to their graph, which the SCOPE(fail) destroys). Caller must treat the handle as invalid in the catch-block.

Test plan

  • CI green on the cudax matrix entries that build STF stackable tests
  • No new functional behavior on the success path — all changes are throw-vs-abort and rollback on throw

In stackable_ctx_impl.cuh, replace cuda_safe_call with cuda_try in the
graph_ctx_node constructor and finalize() so CUDA errors are reported as
exceptions rather than aborting the process.

The constructor builds a CUDA graph in stages, so add transactional
cleanup:

  - In the nested non-conditional branch, the freshly created
    dummy_graph is destroyed intentionally mid-block.  Guard it with a
    SCOPE(fail) that frees it only while dummy_graph_owned is true, and
    disarm the flag right after the intentional destroy.

  - The outer `graph` is owned by us only in the non-nested case (in the
    nested cases it is either parent_graph or a child of parent_graph,
    both owned upstream).  A SCOPE(fail) destroys it on early throw and
    is disarmed the instant graph_ctx adopts it via
    `auto gctx = graph_ctx(sub_graph, ...);`, matching graph_ctx's
    documented ownership contract ("User code is not supposed to destroy
    the graph later").

  - The conditional handle (cudaGraphConditionalHandleCreate) and any
    nodes added to `graph` (cudaGraphAddNode, cudaGraphAddKernelNode)
    are implicitly cleaned up by the outer SCOPE(fail) destroying
    `graph`.

Two residual hazards are intentionally documented inline rather than
fixed in this commit:

  - cudaGraphAddChildGraphNode leaves an orphaned child node inside
    parent_graph on later throw; cleanly removing it would need
    cudaGraphDestroyNode and dependency rewiring.

  - cudaGraphConditionalHandleCreate writes a handle into a caller-owned
    pointer; CUDA has no destroy API for conditional handles, so on
    throw the handle is left invalid (its backing graph is destroyed).

Both are no worse than the prior behavior (which aborted).

The four cuda_safe_call sites in finalize() (cudaGraphAddDependencies
on both CTK branches, cudaGraphDebugDotPrint, cudaGraphLaunch) become
plain cuda_try; no resource rollback applies.

The two cuda_safe_call sites inside the new SCOPE(fail) bodies are
intentional: SCOPE bodies are noexcept, so cuda_safe_call is the
correct tool there.

In stackable_ctx.cuh, the two cuda_safe_call sites inside
UNITTEST host-task lambdas are kept and annotated.  Those lambdas
are dispatched by the STF host-task path, whose exception-safety has
not been audited, so an abort remains safer than an unannotated throw.
@andralex andralex requested a review from a team as a code owner May 28, 2026 15:02
@andralex andralex requested a review from caugonnet May 28, 2026 15:02
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 28, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling in CUDA graph operations with refined exception path auditing.
    • Enhanced resource cleanup and lifetime management for nested graph contexts.
  • Documentation

    • Added clarifying comments explaining error handling strategies.

Walkthrough

CUDA Graph API error handling in stackable graph context construction, finalization, and node wiring is refactored to use cuda_try instead of cuda_safe_call. Nested-graph construction introduces explicit ownership tracking and RAII scope guards to ensure cleanup on failure. Graph ownership is disarmed after successful construction to prevent incorrect destruction of parent-owned graphs. Test host-task dispatch paths are documented to clarify exception-safety expectations.

Changes

Stackable graph context error handling

Layer / File(s) Summary
Scope guard integration and nested-graph RAII construction
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh
Adds scope-guard include and refactors nested-graph construction to use cuda_try for cudaGraphCreate, cudaGraphAddChildGraphNode, and cudaGraphChildGraphNodeGetGraph. Introduces graph_owned_by_us tracking and SCOPE(fail) guard to ensure dummy_graph destruction on error during child-graph wiring.
Graph node wiring updates
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh
Conditional node addition and reset-kernel node addition switched from cuda_safe_call to cuda_try, preserving version-dependent call-site differences under #if _CCCL_CTK_AT_LEAST(13, 0).
Graph ownership lifecycle management
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh
After successful graph_ctx construction, explicitly disarms ownership guard by setting graph_owned_by_us = false to prevent incorrect destruction of parent/nested-owned graphs.
Graph finalization and launch
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh
cudaGraphAddDependencies, cudaGraphDebugDotPrint, and cudaGraphLaunch updated to use cuda_try instead of cuda_safe_call, including version-dependent argument differences.
Test exception-handling documentation
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx.cuh
Comments added to two unit test host-task lambdas explaining that cuda_safe_call(cudaStreamSynchronize(...)) is intentionally preserved because exceptions are not audited along the dispatch path and abort is preferable to unannotated throws escaping into the runtime.

Possibly related PRs

  • NVIDIA/cccl#8891: The switch from cuda_safe_call to cuda_try for CUDA Graph API calls is directly tied to enhancements in cuda_try output-parameter inference.

Suggested labels

stf

Suggested reviewers

  • caugonnet
  • alliepiper

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh (1)

40-40: 💤 Low value

suggestion: Include uses quote syntax instead of angle brackets per coding guideline ("All header inclusions must use angle bracket syntax"). However, this matches the existing pattern in lines 33-39, so fixing would be a file-wide refactor.

-#include "cuda/experimental/__stf/utility/scope_guard.cuh"
+#include <cuda/experimental/__stf/utility/scope_guard.cuh>

As per coding guidelines: "All header inclusions must use angle bracket syntax, e.g.,

"


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5eedaee-c34f-4310-b104-9770f494cea2

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3613f and 1f791e5.

📒 Files selected for processing (2)
  • cudax/include/cuda/experimental/__stf/stackable/stackable_ctx.cuh
  • cudax/include/cuda/experimental/__stf/stackable/stackable_ctx_impl.cuh

@andralex
Copy link
Copy Markdown
Contributor Author

/ok to test 1f791e5

@github-actions
Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 31m: Pass: 100%/55 | Total: 1d 14h | Max: 1h 31m | Hits: 12%/296793

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant