[STF] Add extended C context creation API #9162

Open
caugonnet wants to merge 10 commits into NVIDIA:main from caugonnet:stf_c_ctx_create_ex
Conversation

@caugonnet
Contributor

@caugonnet caugonnet commented May 28, 2026

Expose stream/backend/resource options for C STF contexts so callers can reuse async resources and bind user streams explicitly.

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Expose stream/backend/resource options for C STF contexts so callers can reuse async resources and bind user streams explicitly.
@caugonnet caugonnet self-assigned this May 28, 2026
@caugonnet caugonnet added the stf Sequential Task Flow programming model label May 28, 2026
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 28, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 28, 2026
@caugonnet caugonnet marked this pull request as ready for review May 28, 2026 09:47
@caugonnet caugonnet requested a review from a team as a code owner May 28, 2026 09:47
@caugonnet caugonnet enabled auto-merge (squash) May 28, 2026 09:47
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 28, 2026
@caugonnet
Contributor Author

/ok to test 05cc825

@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a C API and implementation to create STF contexts with selectable backend, optional caller-stream binding, and a shareable async-resources handle; adds tests for stream-chaining and handle reuse and updates docs for non-blocking finalize semantics.

Changes

STF context creation with backend selection and async resource sharing

| Layer | File(s) | Summary |
|---|---|---|
| C API contracts: async resources and context options | c/experimental/stf/include/cccl/c/experimental/stf/stf.h | New stf_async_resources_handle opaque type and stf_async_resources_create/stf_async_resources_destroy; stf_backend_kind enum; stf_ctx_options with backend, has_stream + cudaStream_t, optional shared handle; stf_ctx_create_ex declaration and updated stf_ctx_finalize docs. |
| Async resources lifecycle implementation | c/experimental/stf/src/stf.cu | Conversion helpers via reinterpret_cast; stf_async_resources_create allocates via stf_try_allocate; stf_async_resources_destroy deletes the handle. |
| Context creation factory with backend and stream selection | c/experimental/stf/src/stf.cu | stf_ctx_create_ex interprets stf_ctx_options (NULL-safe), extracts has_stream/stream/handle, branches on backend (STREAM vs GRAPH), and dispatches to constructors using provided stream, provided handle, or defaults. |
| Stream override tests (stream-backend) | c/experimental/stf/test/test_stream_ctx_override.cu | Adds slow_set_kernel, submit_set, run_ctx_k_concurrent, run_ctx_k_chains, and three C2H_TESTs verifying back-to-back contexts on a caller-provided stream yield correct ordered results. |
| Async-resources handle tests and lifetime smoke | c/experimental/stf/test/test_async_resources_handle.cu | Adds tests validating shared stf_async_resources_handle across contexts (STREAM and GRAPH), non-blocking stf_ctx_finalize with user stream, back-to-back ordering using a shared handle, and smoke/lifetime checks including destroy(nullptr) and sync-before-destroy requirements. |
| Documentation: caller-stream finalize & async handle lifetime | cudax/include/cuda/experimental/__stf/graph/graph_ctx.cuh, .../internal/async_resources_handle.cuh, .../stream/stream_ctx.cuh | Expanded Doxygen comments clarifying non-blocking finalize when contexts use a caller stream and describing destructor/release semantics and required caller synchronization for async resource lifetimes. |
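
The option handling summarized for stf_ctx_create_ex can be pictured with a small host-only model. This is a sketch under stated assumptions, not the code in stf.cu: every name prefixed with mock_ or MOCK_ is hypothetical, a void* stands in for cudaStream_t, and the real struct layout, enum spellings, and NULL-options defaults may differ (the stream-backend default below is an assumption). It shows only the decisions the walkthrough names: treat NULL options as defaults, honor the stream only when has_stream is set, and carry the backend and optional shared handle through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the declarations in stf.h. */
typedef enum { MOCK_BACKEND_STREAM = 0, MOCK_BACKEND_GRAPH = 1 } mock_backend_kind;

typedef struct {
  mock_backend_kind backend; /* which backend the context should use */
  int has_stream;            /* nonzero: `stream` was supplied by the caller */
  void* stream;              /* stand-in for cudaStream_t */
  void* handle;              /* optional shared async-resources handle */
} mock_ctx_options;

/* Records what the factory decided, so the decision can be inspected. */
typedef struct {
  mock_backend_kind backend;
  int uses_caller_stream;
  void* stream;
  int uses_shared_handle;
  void* handle;
} mock_ctx_plan;

mock_ctx_plan mock_plan_from_options(const mock_ctx_options* opts)
{
  /* Assumed defaults when opts is NULL: stream backend, no stream, no handle. */
  mock_ctx_options o = {MOCK_BACKEND_STREAM, 0, NULL, NULL};
  if (opts != NULL) {
    o = *opts;
  }
  assert(o.backend == MOCK_BACKEND_STREAM || o.backend == MOCK_BACKEND_GRAPH);

  mock_ctx_plan p;
  p.backend            = o.backend;
  p.uses_caller_stream = (o.has_stream != 0);
  /* The stream field is ignored unless has_stream says it is meaningful. */
  p.stream             = p.uses_caller_stream ? o.stream : NULL;
  p.uses_shared_handle = (o.handle != NULL);
  p.handle             = o.handle;
  return p;
}
```

A separate has_stream flag is plausibly needed because a null cudaStream_t is itself a valid stream (the default stream), so the pointer value alone cannot signal that no stream was supplied.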

Possibly related PRs

  • NVIDIA/cccl#8919: Related STF/stream backend and caller-stream finalize/ordering changes.

Suggested reviewers

  • andralex
  • alliepiper


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
c/experimental/stf/src/stf.cu (1)

447-448: 💤 Low value

suggestion: The variable ah is declared non-const but never modified after initialization. Per coding guidelines, unmodified variables should be const.

-  const bool has_stream     = (o.has_stream != 0);
-  async_resources_handle ah = o.handle ? *async_resources_from_opaque(o.handle) : async_resources_handle{nullptr};
+  const bool has_stream           = (o.has_stream != 0);
+  const async_resources_handle ah = o.handle ? *async_resources_from_opaque(o.handle) : async_resources_handle{nullptr};

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 33bd160d-b0b8-4343-a324-41bc60a17de2

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3f881 and 05cc825.

📒 Files selected for processing (3)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h
  • c/experimental/stf/src/stf.cu
  • c/experimental/stf/test/test_stream_ctx_override.cu

Address review feedback by keeping the derived C context creation resource handle immutable after initialization.
@caugonnet
Contributor Author

/ok to test 5bac735


Comment thread c/experimental/stf/test/test_stream_ctx_override.cu Outdated
Comment thread c/experimental/stf/include/cccl/c/experimental/stf/stf.h
Comment thread c/experimental/stf/include/cccl/c/experimental/stf/stf.h
…time

stf_ctx_finalize() is non-blocking when a context is created with a
caller-provided CUDA stream, but the C API docs claimed it always blocked.
Clarify the conditional behavior in stf.h, document the matching contract on
async_resources_handle / stream_ctx / graph_ctx in cudax, fix the
stf_ctx_create_ex() example to cudaStreamSynchronize() the caller stream
before stf_async_resources_destroy(), and add a focused C API test covering
stf_ctx_create_ex() with an explicit shared async resources handle on the
stream and graph backends.
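
The ordering contract in this commit message can be modeled on the host without CUDA. This is a toy sketch under loud assumptions: every mock_ name is hypothetical, an integer counter stands in for work queued on the caller's stream, and mock_stream_sync plays the role of cudaStreamSynchronize(). The check inside mock_resources_destroy exists only to make the rule testable; nothing here suggests the real stf_async_resources_destroy() performs such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int pending; } mock_stream;   /* work queued but not yet complete */
typedef struct { bool alive; } mock_resources; /* stand-in for the shared handle */

/* Submitting a task through a context enqueues work on the caller's stream. */
void mock_submit(mock_stream* s)
{
  assert(s != NULL);
  s->pending += 1;
}

/* Models the non-blocking finalize: it returns without draining the stream. */
void mock_ctx_finalize(mock_stream* s)
{
  (void) s;
}

/* Stand-in for cudaStreamSynchronize(): all queued work completes. */
void mock_stream_sync(mock_stream* s)
{
  s->pending = 0;
}

/* Model-only guard: refuses to destroy while queued work could still use the
   resources, returning false and leaving the handle alive. */
bool mock_resources_destroy(mock_resources* r, const mock_stream* s)
{
  if (s->pending != 0) {
    return false;
  }
  r->alive = false;
  return true;
}
```

In this model the safe sequence is submit, finalize, synchronize the stream, then destroy; skipping the synchronize step makes the destroy call report failure.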
@caugonnet caugonnet requested a review from a team as a code owner May 28, 2026 22:06
@caugonnet caugonnet requested a review from andralex May 28, 2026 22:06
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
c/experimental/stf/test/test_async_resources_handle.cu (1)

47-47: 💤 Low value

suggestion: The (acc & 0) trick to defeat dead-code elimination is fine but opaque. A one-line comment clarifying intent would help future readers.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f12fcef3-a702-4b55-acda-5d89f49d2ba3

📥 Commits

Reviewing files that changed from the base of the PR and between 5bac735 and 2a88f36.

📒 Files selected for processing (5)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h
  • c/experimental/stf/test/test_async_resources_handle.cu
  • cudax/include/cuda/experimental/__stf/graph/graph_ctx.cuh
  • cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh
  • cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
✅ Files skipped from review due to trivial changes (3)
  • cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh
  • cudax/include/cuda/experimental/__stf/graph/graph_ctx.cuh
  • cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
🚧 Files skipped from review as they are similar to previous changes (1)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h

caugonnet and others added 2 commits May 29, 2026 00:18
Clarify in the C API (stf.h) and cudax docs (async_resources_handle,
stream_ctx, graph_ctx) that stf_ctx_finalize() is non-blocking for contexts
created with a caller-provided stream, and that a shared async resources
handle must outlive the work those caller streams complete. Fix the
stf_ctx_create_ex() example to synchronize the caller stream before
stf_async_resources_destroy(), add a focused C API test for
stf_ctx_create_ex() with an explicit shared handle, and fix the
slow_set_kernel busy loop (acc & 0 was always 0, letting the compiler elide
the loop) to publish via an atomic sink with well-defined unsigned arithmetic.
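
The busy-loop pitfall mentioned above can be reproduced in plain host C. This is a minimal sketch, not the slow_set_kernel code from the test: the function names and the loop body (a linear congruential step) are illustrative assumptions, and a store to a volatile variable is swapped in for the device-side atomic sink the commit describes. It contrasts a result masked with `& 0`, which never depends on the loop, with a result that is published and therefore has to be produced.

```c
#include <assert.h>
#include <stdint.h>

/* Stores to a volatile object are observable, so the compiler must perform them. */
static volatile uint32_t sink;

/* Pitfall: the returned value is 0 whatever the loop computes, so an optimizer
   is free to delete the loop entirely. */
uint32_t spin_masked(uint32_t iters)
{
  uint32_t acc = 1u;
  for (uint32_t i = 0; i < iters; ++i) {
    acc = acc * 1664525u + 1013904223u;
  }
  assert((acc & 0u) == 0u); /* holds for every possible acc */
  return acc & 0u;
}

/* Alternative: publish the accumulated value so the final acc must be produced. */
uint32_t spin_published(uint32_t iters)
{
  uint32_t acc = 1u;
  for (uint32_t i = 0; i < iters; ++i) {
    /* Unsigned overflow wraps modulo 2^32, which is well defined in C. */
    acc = acc * 1664525u + 1013904223u;
  }
  sink = acc; /* observable side effect that depends on the loop */
  return acc;
}
```

Using unsigned arithmetic matters here because signed overflow is undefined behavior in C, which would give the optimizer further license to transform the loop.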
@caugonnet
Contributor Author

/ok to test bfdad44

@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@caugonnet
Contributor Author

/ok to test ba04eb8

@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0


Contributor

@andralex andralex left a comment


lgtm

@caugonnet
Contributor Author

/ok to test f301863

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 45m 53s: Pass: 100%/59 | Total: 12h 23m | Max: 45m 53s | Hits: 91%/166735

See results here.


Labels

stf Sequential Task Flow programming model

Projects

Status: In Review


3 participants