Add managed-memory advise, prefetch, and discard-prefetch free functions #1775

rparolin wants to merge 17 commits into NVIDIA:main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test
question: Does making these member functions of the
I'm moving this back into draft. We discussed this in our team meeting; I was already hesitant because Buffer is becoming a 'God object' with the functionality it is gaining. We were going to explore alternatives, and free functions sound like a good one to explore.
…ns in the cuda.core.managed_memory namespace
…ups, fix docs

- Remove duplicate long-form "cu_mem_advise_*" string aliases from _MANAGED_ADVICE_ALIASES; users pass short strings or the enum directly
- Replace 4 boolean allow_* params in _normalize_managed_location with a single allowed_loctypes frozenset driven by _MANAGED_ADVICE_ALLOWED_LOCTYPES
- Cache immutable runtime checks: CU_DEVICE_CPU, v2 bindings flag, discard_prefetch support, and advice enum-to-alias reverse map
- Collapse hasattr+getattr to single getattr in _managed_location_enum
- Move _require_managed_discard_prefetch_support to top of discard_prefetch for fail-fast behavior
- Fix docs build: reset Sphinx module scope after managed_memory section in api.rst so subsequent sections resolve under cuda.core
- Add discard_prefetch pool-allocation test and comment on _get_mem_range_attr

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
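The frozenset-driven validation this commit describes can be sketched as follows. The mapping contents come from the location rules in the PR description; the helper name and shape are illustrative, not the actual cuda.core source:

```python
# Sketch (not the real cuda.core implementation): one frozenset per advice
# alias listing the location types it may legally target, replacing four
# separate allow_* boolean parameters.

_MANAGED_ADVICE_ALLOWED_LOCTYPES = {
    "set_read_mostly": frozenset({"device", "host", "host_numa"}),
    "unset_read_mostly": frozenset({"device", "host", "host_numa"}),
    "unset_preferred_location": frozenset({"device", "host", "host_numa"}),
    "set_preferred_location": frozenset(
        {"device", "host", "host_numa", "host_numa_current"}
    ),
    "set_accessed_by": frozenset({"device", "host"}),
    "unset_accessed_by": frozenset({"device", "host"}),
}


def check_location_type(advice, location_type):
    """Raise ValueError if location_type is not valid for this advice."""
    allowed = _MANAGED_ADVICE_ALLOWED_LOCTYPES[advice]
    if location_type not in allowed:
        raise ValueError(
            f"advice {advice!r} does not accept location type "
            f"{location_type!r}; allowed: {sorted(allowed)}"
        )
```

A table like this keeps the per-advice constraints in one place, so adding a new advice value means adding one dictionary entry rather than threading another boolean through the normalization helper.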
…e legacy path The _V2_BINDINGS cache in _buffer.pyx persists across tests, so monkeypatching get_binding_version alone is insufficient when earlier tests have already populated the cache with the v2 value. Promote _V2_BINDINGS from cdef int to a Python-level variable so tests can monkeypatch it directly via monkeypatch.setattr, and reset it to -1 in both legacy-signature tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t real hardware These three tests call cuMemAdvise on real CUDA devices and verify memory range attributes. On devices without concurrent_managed_access (e.g. Windows/WDDM), set_read_mostly silently no-ops and set_preferred_location fails with CUDA_ERROR_INVALID_DEVICE. Use the stricter _skip_if_managed_location_ops_unsupported guard, matching the pattern already used by test_managed_memory_functions_accept_raw_pointer_ranges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s support Reorder checks in discard_prefetch so _normalize_managed_target_range runs before _require_managed_discard_prefetch_support. This ensures non-managed buffers raise ValueError before the RuntimeError for missing cuMemDiscardAndPrefetchBatchAsync support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
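A minimal sketch of the check ordering this commit establishes, using stand-in helpers rather than the real cuda.core internals:

```python
# Sketch: validate the target range before probing driver support, so a
# caller passing a non-managed buffer sees ValueError (a usage error)
# rather than RuntimeError (a missing-driver-feature error).

def _normalize_target_range(ptr, size, is_managed):
    if not is_managed:
        raise ValueError("target is not managed memory")
    return ptr, size


def _require_discard_prefetch_support(supported):
    if not supported:
        raise RuntimeError("cuMemDiscardAndPrefetchBatchAsync not available")


def discard_prefetch(ptr, size, *, is_managed, supported):
    # ValueError for bad input comes first...
    ptr, size = _normalize_target_range(ptr, size, is_managed)
    # ...then RuntimeError for missing driver support.
    _require_discard_prefetch_support(supported)
    return ptr, size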
…ps module Move advise, prefetch, and discard_prefetch functions and their helpers out of _buffer.pyx into a new _managed_memory_ops Cython module to improve separation of concerns. Expose _init_mem_attrs and _query_memory_attrs as non-inline cdef functions in _buffer.pxd so the new module can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Add managed-memory `advise()`, `prefetch()`, and `discard_prefetch()` as free functions under the new `cuda.core.managed_memory` namespace, wrapping the CUDA driver APIs `cuMemAdvise`, `cuMemPrefetchAsync`, and `cuMemDiscardAndPrefetchBatchAsync`.

Closes #1332
Details
New public API — `cuda.core.managed_memory` module with three functions:

- `advise(target, advice, location, *, size, location_type)` — apply managed-memory advice to a range
- `prefetch(target, location, *, stream, size, location_type)` — prefetch a range to a target location
- `discard_prefetch(target, location, *, stream, size, location_type)` — discard and prefetch a range

Each function accepts either a `Buffer` (size inferred) or a raw pointer (requires `size=`). Location can be specified as a `Device`, an int ordinal, `-1` for host, or with an explicit `location_type` (`"device"`, `"host"`, `"host_numa"`, `"host_numa_current"`). Advice can be a `CUmem_advise` enum value or a string alias like `"set_read_mostly"`. The `stream` parameter on `prefetch` and `discard_prefetch` also accepts a `GraphBuilder`.

Location validation matches the CUDA driver spec:
- `set_read_mostly`, `unset_read_mostly`, `unset_preferred_location` — location is optional; allowed types are `device`, `host`, `host_numa`
- `set_preferred_location` — all four location types are valid
- `set_accessed_by`, `unset_accessed_by` — only `device` and `host` (rejects `host_numa` and `host_numa_current`)

Backward compatibility — when `cuda.bindings < 13.0`, the functions fall back to the legacy `cuMemAdvise(ptr, size, advice, device_int)` / `cuMemPrefetchAsync(ptr, size, device_int, stream)` signatures. Enum lookups for the legacy path are cached to avoid repeated `hasattr`/`getattr` calls.

Implementation notes:
- New `_managed_memory_ops.pyx` module under `cuda.core._memory`
- `_buffer.pxd` exposes `_init_mem_attrs`, `_query_memory_attrs`, and the `_MemAttrs` struct (with a new `is_managed` field) for use by the ops module
- `_normalize_managed_location` handles all location inference and constraint checking; each branch returns directly with no dead fallthrough code
- Managed-memory detection via `cuPointerGetAttributes` (the existing `_MemAttrs` infrastructure)
- `cuda.core.managed_memory` module re-exports the three functions from the Cython implementation
- Also available as `cuda.core.experimental.managed_memory`

Tests
Adds coverage for:

- `advise`/`prefetch`/`discard_prefetch` on managed-memory pool buffers and externally wrapped managed allocations
- `advise` with `CUmem_advise` enum values (not just string aliases)
- Location-type validation for `set_preferred_location`; `host_numa`/`host_numa_current` rejection for `set_accessed_by`
- Location shorthand (`-1` → host, `0` → device)
- `prefetch` with `location=None` raises `ValueError`
- `size=` rejection when target is a `Buffer` (`TypeError`)
- Legacy-signature fallback (monkeypatched `get_binding_version`)
- Raw pointer ranges with explicit `size=`
- Memory-range attribute verification via `cuMemRangeGetAttribute`
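The backward-compatibility dispatch described under Details can be sketched roughly as follows, with stubbed driver calls standing in for the real `cuda.bindings` API surface; the function names and the version tuple comparison here are illustrative only:

```python
# Sketch: version-gated dispatch between the v2 location-struct signature
# and the legacy (ptr, size, advice, device_int) signature.

def _legacy_mem_advise(ptr, size, advice, device_int):
    # Stand-in for the legacy cuMemAdvise call shape.
    return ("legacy", ptr, size, advice, device_int)


def _v2_mem_advise(ptr, size, advice, location):
    # Stand-in for the v2 call shape that takes a location descriptor.
    return ("v2", ptr, size, advice, location)


def advise(ptr, size, advice, location, *, bindings_version):
    if bindings_version >= (13, 0):
        return _v2_mem_advise(ptr, size, advice, location)
    # Legacy path: collapse the location down to a plain device ordinal,
    # with -1 conventionally meaning the host.
    device_int = -1 if location == "host" else location
    return _legacy_mem_advise(ptr, size, advice, device_int)
```

The real implementation caches the version check (see the `_V2_BINDINGS` commit above) instead of passing it as a parameter; the parameter here just keeps the sketch self-contained.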