Add managed-memory advise, prefetch, and discard-prefetch free functions #1775

Open
rparolin wants to merge 17 commits into NVIDIA:main from rparolin:rparolin/managed_mem_advise_prefetch

@rparolin
Collaborator

@rparolin rparolin commented Mar 17, 2026

Summary

Add managed-memory advise(), prefetch(), and discard_prefetch() as free functions under the new cuda.core.managed_memory namespace, wrapping the CUDA driver APIs cuMemAdvise, cuMemPrefetchAsync, and cuMemDiscardAndPrefetchBatchAsync.

Closes #1332

Details

New public API: a cuda.core.managed_memory module with three functions:

  • advise(target, advice, location, *, size, location_type) — apply managed-memory advice to a range
  • prefetch(target, location, *, stream, size, location_type) — prefetch a range to a target location
  • discard_prefetch(target, location, *, stream, size, location_type) — discard and prefetch a range

Each function accepts either a Buffer (size inferred) or a raw pointer (requires size=). Location can be specified as a Device, int ordinal, -1 for host, or with an explicit location_type ("device", "host", "host_numa", "host_numa_current"). Advice can be a CUmem_advise enum value or a string alias like "set_read_mostly". The stream parameter on prefetch and discard_prefetch also accepts a GraphBuilder.
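The location rules described above can be sketched in plain Python. This is a minimal illustration, not the PR's `_normalize_managed_location`: the `Location` dataclass, the `normalize_location` name, and the duck-typed `device_id` attribute are assumptions made for the example, as is the choice to ignore the id for "host" and "host_numa_current".

```python
from dataclasses import dataclass
from typing import Optional

VALID_LOCATION_TYPES = frozenset(
    {"device", "host", "host_numa", "host_numa_current"}
)


@dataclass(frozen=True)
class Location:
    """Hypothetical normalized result: a location type plus an optional id."""

    kind: str
    ident: Optional[int] = None


def normalize_location(location, location_type=None):
    """Resolve (location, location_type) into a Location or raise ValueError."""
    # Accept a Device-like object by reading an assumed `device_id` attribute;
    # plain ints and None pass through unchanged.
    ident = getattr(location, "device_id", location)

    if location_type is None:
        # Inference path: -1 means host, a non-negative int means that device.
        if ident is None:
            raise ValueError("location is required")
        if not isinstance(ident, int):
            raise ValueError(f"cannot infer a location from {location!r}")
        if ident == -1:
            return Location("host")
        if ident >= 0:
            return Location("device", ident)
        raise ValueError(f"invalid location: {location!r}")

    if location_type not in VALID_LOCATION_TYPES:
        raise ValueError(f"unknown location_type: {location_type!r}")
    if location_type in ("device", "host_numa"):
        # These two types need an explicit device ordinal or NUMA node id.
        if not isinstance(ident, int) or ident < 0:
            raise ValueError(f"{location_type} requires a non-negative id")
        return Location(location_type, ident)
    # "host" and "host_numa_current" carry no id in this sketch.
    return Location(location_type)
```

Every branch returns or raises directly, which mirrors the "no dead fallthrough" structure mentioned in the implementation notes.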

Location validation matches the CUDA driver spec:

  • set_read_mostly, unset_read_mostly, unset_preferred_location — location is optional; allowed types are device, host, host_numa
  • set_preferred_location — all four location types valid
  • set_accessed_by, unset_accessed_by — only device and host (rejects host_numa and host_numa_current)
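The per-advice rules listed above lend themselves to a lookup table. The sketch below is an illustration under stated assumptions: the names `ALLOWED_LOCATION_TYPES`, `LOCATION_OPTIONAL`, and `check_advice_location` are hypothetical, and the real module may structure its table differently.

```python
_THREE = frozenset({"device", "host", "host_numa"})
_ALL_FOUR = _THREE | {"host_numa_current"}
_DEVICE_OR_HOST = frozenset({"device", "host"})

# Hypothetical table: advice alias -> location types it accepts.
ALLOWED_LOCATION_TYPES = {
    "set_read_mostly": _THREE,
    "unset_read_mostly": _THREE,
    "unset_preferred_location": _THREE,
    "set_preferred_location": _ALL_FOUR,
    "set_accessed_by": _DEVICE_OR_HOST,
    "unset_accessed_by": _DEVICE_OR_HOST,
}

# Advice values for which the caller may omit the location entirely.
LOCATION_OPTIONAL = frozenset(
    {"set_read_mostly", "unset_read_mostly", "unset_preferred_location"}
)


def check_advice_location(advice, location_type):
    """Raise ValueError if location_type is not valid for the given advice."""
    try:
        allowed = ALLOWED_LOCATION_TYPES[advice]
    except KeyError:
        raise ValueError(f"unknown advice: {advice!r}") from None
    if location_type is None:
        if advice in LOCATION_OPTIONAL:
            return
        raise ValueError(f"{advice} requires a location")
    if location_type not in allowed:
        raise ValueError(
            f"{advice} does not accept location_type={location_type!r}; "
            f"allowed: {sorted(allowed)}"
        )
```

A single table keeps the validation in one place, so adding a new advice value means adding one entry instead of another boolean parameter.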

Backward compatibility — when cuda.bindings < 13.0, the functions fall back to the legacy cuMemAdvise(ptr, size, advice, device_int) / cuMemPrefetchAsync(ptr, size, device_int, stream) signatures. Enum lookups for the legacy path are cached to avoid repeated hasattr/getattr calls.
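A rough sketch of the version-gated dispatch and cached lookup described above, in plain Python. Everything here is a stand-in: `FAKE_ADVISE_ENUM`, `lookup_advice`, and `build_advise_args` are hypothetical names, the location is modeled as a `(kind, id)` tuple instead of a driver struct, and rejecting NUMA location types on the legacy path is an assumption of this example.

```python
import functools
from types import SimpleNamespace

# Hypothetical stand-in for the bindings' advice enum namespace.
FAKE_ADVISE_ENUM = SimpleNamespace(CU_MEM_ADVISE_SET_READ_MOSTLY=1)

# The driver uses -1 as the device ordinal that denotes the CPU.
CU_DEVICE_CPU = -1


@functools.lru_cache(maxsize=None)
def lookup_advice(name):
    """Resolve an enum member once; later calls hit the cache."""
    # A single getattr with a default replaces a hasattr + getattr pair.
    return getattr(FAKE_ADVISE_ENUM, name, None)


def build_advise_args(binding_version, ptr, size, advice, kind, ident=0):
    """Return the argument tuple for the signature the bindings expect."""
    if binding_version >= (13, 0):
        # Newer signature: the location travels as a (type, id) pair.
        return (ptr, size, advice, (kind, ident))
    # Legacy signature: the location collapses to a single device integer.
    if kind == "host":
        return (ptr, size, advice, CU_DEVICE_CPU)
    if kind == "device":
        return (ptr, size, advice, ident)
    raise RuntimeError(f"location type {kind!r} needs cuda.bindings >= 13.0")
```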

Implementation notes:

  • All managed-memory helpers, validation logic, and public API functions live in a dedicated _managed_memory_ops.pyx module under cuda.core._memory
  • _buffer.pxd exposes _init_mem_attrs, _query_memory_attrs, and the _MemAttrs struct (with a new is_managed field) for use by the ops module
  • _normalize_managed_location handles all location inference and constraint checking; each branch returns directly with no dead fallthrough code
  • Managed-memory detection uses cuPointerGetAttributes (the existing _MemAttrs infrastructure)
  • The public cuda.core.managed_memory module re-exports the three functions from the Cython implementation
  • A backward-compatibility shim is also registered under cuda.core.experimental.managed_memory

Tests

Adds coverage for:

  • advise/prefetch/discard_prefetch on managed-memory pool buffers and externally wrapped managed allocations
  • advise with CUmem_advise enum values (not just string aliases)
  • Location validation: all four location types for set_preferred_location; host_numa/host_numa_current rejection for set_accessed_by
  • Inferred location from int (-1 → host, 0 → device)
  • prefetch with location=None raises ValueError
  • size= rejection when target is a Buffer (TypeError)
  • Invalid advice: bad string and wrong type
  • Rejection on non-managed buffers
  • Legacy bindings path (monkeypatched get_binding_version)
  • Raw pointer ranges with explicit size=
  • Observable driver-side effects via cuMemRangeGetAttribute
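The target rules exercised by these tests (size inferred from a Buffer, `size=` required for a raw pointer, `TypeError` when `size=` accompanies a Buffer) can be sketched as follows. `FakeBuffer` and `resolve_range` are hypothetical stand-ins, and raising `ValueError` for a raw pointer without a size is an assumption; the PR description does not say which exception that case uses.

```python
class FakeBuffer:
    """Hypothetical stand-in for a Buffer: a pointer plus a byte size."""

    def __init__(self, handle, size):
        self.handle = handle
        self.size = size


def resolve_range(target, size=None):
    """Return (pointer, nbytes) for either a buffer or a raw pointer."""
    if isinstance(target, FakeBuffer):
        # The buffer already knows its extent, so an explicit size is rejected.
        if size is not None:
            raise TypeError("size= is not accepted when target is a Buffer")
        return int(target.handle), int(target.size)
    # Raw pointer: the caller must say how many bytes the range covers.
    if size is None:
        raise ValueError("size= is required when target is a raw pointer")
    return int(target), int(size)
```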

@copy-pr-bot
Contributor

copy-pr-bot bot commented Mar 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rparolin rparolin requested a review from Andy-Jost March 17, 2026 00:41
@rparolin rparolin self-assigned this Mar 17, 2026
@rparolin rparolin added this to the cuda.core v0.7.0 milestone Mar 17, 2026
@rparolin rparolin marked this pull request as ready for review March 17, 2026 00:45
@rparolin rparolin marked this pull request as draft March 17, 2026 00:45
@rparolin rparolin changed the title wip Add managed-memory advise, prefetch, and discard-prefetch on Buffer Mar 17, 2026
@rparolin rparolin marked this pull request as ready for review March 17, 2026 00:57
@rparolin
Collaborator Author

/ok to test

@jrhemstad

question: Does making these member functions of the Buffer type preclude this functionality for allocations that weren't created through the Buffer type? Did we consider making these free functions instead of member functions on the Buffer type?

@rparolin
Collaborator Author

rparolin commented Mar 17, 2026

> question: Does making these member functions of the Buffer type preclude this functionality for allocations that weren't created through the Buffer type? Did we consider making these free functions instead of member functions on the Buffer type?

I'm moving this back into draft. We discussed this in our team meeting because I was already hesitant: Buffer is becoming a 'God object' with the functionality it is gaining. We were going to explore alternatives, and free functions sound like a good one to explore.

@rparolin rparolin marked this pull request as draft March 17, 2026 19:35
@rparolin rparolin marked this pull request as ready for review March 17, 2026 23:46
rparolin and others added 7 commits March 17, 2026 17:30
…ups, fix docs

- Remove duplicate long-form "cu_mem_advise_*" string aliases from
  _MANAGED_ADVICE_ALIASES; users pass short strings or the enum directly
- Replace 4 boolean allow_* params in _normalize_managed_location with a
  single allowed_loctypes frozenset driven by _MANAGED_ADVICE_ALLOWED_LOCTYPES
- Cache immutable runtime checks: CU_DEVICE_CPU, v2 bindings flag,
  discard_prefetch support, and advice enum-to-alias reverse map
- Collapse hasattr+getattr to single getattr in _managed_location_enum
- Move _require_managed_discard_prefetch_support to top of discard_prefetch
  for fail-fast behavior
- Fix docs build: reset Sphinx module scope after managed_memory section in
  api.rst so subsequent sections resolve under cuda.core
- Add discard_prefetch pool-allocation test and comment on _get_mem_range_attr

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e legacy path

The _V2_BINDINGS cache in _buffer.pyx persists across tests, so
monkeypatching get_binding_version alone is insufficient when earlier
tests have already populated the cache with the v2 value. Promote
_V2_BINDINGS from cdef int to a Python-level variable so tests can
monkeypatch it directly via monkeypatch.setattr, and reset it to -1
in both legacy-signature tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
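The cache problem described in this commit message can be reproduced with a small stand-in. The sketch below is illustrative only: `mod` is a `SimpleNamespace` playing the role of the module, `is_v2` is a hypothetical helper, and only the `_V2_BINDINGS` sentinel value of -1 and the `get_binding_version` name come from the message itself.

```python
from types import SimpleNamespace

# Stand-in module: -1 means "not computed yet", as in the commit message.
mod = SimpleNamespace(
    _V2_BINDINGS=-1,
    get_binding_version=lambda: (13, 0),
)


def is_v2(module):
    """Compute the flag once, then serve it from the module-level cache."""
    if module._V2_BINDINGS == -1:
        module._V2_BINDINGS = int(module.get_binding_version() >= (13, 0))
    return bool(module._V2_BINDINGS)


first = is_v2(mod)  # populates the cache from the (13, 0) version

# Patching only the version getter has no effect: the cached flag wins.
mod.get_binding_version = lambda: (12, 9)
stale = is_v2(mod)

# Resetting the cache as well forces a fresh lookup on the legacy version.
mod._V2_BINDINGS = -1
fresh = is_v2(mod)
```

In a pytest test the two assignments would typically go through `monkeypatch.setattr`, which also restores the original values afterwards.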
…t real hardware

These three tests call cuMemAdvise on real CUDA devices and verify
memory range attributes. On devices without concurrent_managed_access
(e.g. Windows/WDDM), set_read_mostly silently no-ops and
set_preferred_location fails with CUDA_ERROR_INVALID_DEVICE. Use the
stricter _skip_if_managed_location_ops_unsupported guard, matching the
pattern already used by test_managed_memory_functions_accept_raw_pointer_ranges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s support

Reorder checks in discard_prefetch so _normalize_managed_target_range
runs before _require_managed_discard_prefetch_support. This ensures
non-managed buffers raise ValueError before the RuntimeError for missing
cuMemDiscardAndPrefetchBatchAsync support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
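The ordering this commit establishes can be illustrated with a minimal sketch. `check_discard_prefetch` and its two boolean parameters are hypothetical; they stand in for the managed-memory check and the driver-support check named in the message.

```python
def check_discard_prefetch(is_managed, driver_supports_discard):
    """Validate the target first, then the driver capability."""
    # Argument errors come first so callers see them on every platform.
    if not is_managed:
        raise ValueError("target is not a managed-memory allocation")
    # Only a valid request can hit the capability error.
    if not driver_supports_discard:
        raise RuntimeError(
            "cuMemDiscardAndPrefetchBatchAsync is not available"
        )
```

With this order a non-managed buffer produces the same `ValueError` whether or not the installed bindings support discard-prefetch, which keeps the rejection tests independent of the environment.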
…ps module

Move advise, prefetch, and discard_prefetch functions and their helpers
out of _buffer.pyx into a new _managed_memory_ops Cython module to
improve separation of concerns. Expose _init_mem_attrs and
_query_memory_attrs as non-inline cdef functions in _buffer.pxd so the
new module can reuse them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rparolin rparolin changed the title Add managed-memory advise, prefetch, and discard-prefetch on Buffer Add managed-memory advise, prefetch, and discard-prefetch free functions Mar 18, 2026
Development

Successfully merging this pull request may close these issues.

Support managed memory advise, prefetch, and discard-prefetch

2 participants