feat: Unified HPC Toolchain — merge Nexa_Vortex runtime + Nexa_Inference apps + CUTLASS kernels into PyC #12
DarkStarStrix merged 2 commits into `main` on Mar 2, 2026.
2 commits added on March 2, 2026 at 09:52:
…C toolchain
This PR integrates three previously separate projects into a single,
vertically integrated HPC toolchain under the PyC banner.
## What's included
### Runtime layer (from Nexa_Vortex)
- `runtime/vortex_core/`: Rust async execution engine
  - Asynchronous CPU→GPU conveyor-belt pipeline (`pipeline.rs`)
  - Lock-free crossbeam-channel dispatcher (`cpu_dispatch.rs`)
  - NUMA-aware pinned-memory allocator (`allocator.rs`)
  - Hardware topology profiler (`hw_profile.rs`)
  - Telemetry broadcaster (`telemetry.rs`)
  - Safe C-ABI FFI wrappers for the PyC compiler (`ffi/mod.rs`)
  - `build.rs`: auto-generates Rust bindings from the PyC headers via bindgen
  - Mesocarp lock-free primitive integrations (`integrations/`)
- `python/pyc/runtime/`: Python wrappers (`control_plane`, `telemetry_manager`)
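The conveyor-belt pipeline itself lives in Rust (`pipeline.rs`), and its source is not part of this description. As an illustration only, the Python sketch below shows the general pattern under one assumption: a CPU stage prepares items and hands them over a bounded queue to a second stage that stands in for GPU work, so the two stages run concurrently. `run_conveyor` and its parameters are hypothetical names, not part of the Vortex or `pyc.runtime` API.

```python
import queue
import threading
from typing import Callable, Iterable, List

_SENTINEL = object()  # marks the end of the stream


def run_conveyor(items: Iterable[int],
                 cpu_stage: Callable[[int], int],
                 device_stage: Callable[[int], int],
                 depth: int = 4) -> List[int]:
    """Hypothetical two-stage pipeline: CPU prep feeds a simulated device stage.

    The bounded queue applies backpressure: the producer blocks once
    `depth` prepared items are waiting, so memory use stays bounded.
    """
    belt: queue.Queue = queue.Queue(maxsize=depth)
    results: List[int] = []

    def producer() -> None:
        for item in items:
            belt.put(cpu_stage(item))  # blocks while the belt is full
        belt.put(_SENTINEL)

    def consumer() -> None:
        while True:
            batch = belt.get()
            if batch is _SENTINEL:
                break
            results.append(device_stage(batch))  # stand-in for GPU work

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The bounded queue is the main design choice: when the device stage falls behind, `put()` blocks the CPU stage instead of letting prepared batches pile up in memory. With one producer and one consumer on a FIFO queue, output order matches input order.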
### CUTLASS kernel library (new)
- `compiler/cutlass_kernels/`: high-performance GPU kernels
  - `cutlass_gemm.cu`: FP16/BF16 Tensor Core and FP32 SIMT GEMM
  - `cutlass_conv2d.cu`: FP16/BF16 Tensor Core Conv2d
  - `cutlass_attention.cu`: FP16/BF16 fused attention
  - `cutlass_registry_init.cu`: auto-registers all kernels at library load
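The PR states that `cutlass_registry_init.cu` registers every kernel when the library loads, but the registry ABI itself is not shown. The Python sketch below illustrates that pattern under this assumption: decorators run at import time and fill a table keyed by operation and dtype, mirroring the coverage listed above (Tensor Core paths for FP16/BF16, a SIMT path for FP32 GEMM). `register_kernel`, `select_kernel`, and the kernel names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry keyed by (op, dtype); not the PyC registry ABI.
_REGISTRY: Dict[Tuple[str, str], Callable[..., None]] = {}


def register_kernel(op: str, dtypes: Tuple[str, ...]) -> Callable:
    """Record the decorated kernel for each dtype when the module is imported."""
    def wrap(fn: Callable[..., None]) -> Callable[..., None]:
        for dtype in dtypes:
            _REGISTRY[(op, dtype)] = fn
        return fn
    return wrap


# Placeholder kernels; in a real library these would launch CUTLASS code.
@register_kernel("gemm", ("fp16", "bf16"))
def gemm_tensor_core() -> None: ...


@register_kernel("gemm", ("fp32",))
def gemm_simt() -> None: ...


@register_kernel("conv2d", ("fp16", "bf16"))
def conv2d_tensor_core() -> None: ...


@register_kernel("attention", ("fp16", "bf16"))
def attention_fused() -> None: ...


def select_kernel(op: str, dtype: str) -> Optional[Callable[..., None]]:
    """Return the registered kernel, or None so the caller can fall back."""
    return _REGISTRY.get((op, dtype))
```

Returning `None` for an unregistered combination (for example FP32 Conv2d) leaves the fallback decision with the caller rather than failing inside the registry.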
### Application layer (from Nexa_Inference)
- `apps/inference_api/`: FastAPI SciML inference server
  - `main.py`, `inference.py`, `engines.py`, `pipelines.py`
  - `models/schemas.py`, `auth.py`, `config.py`
### Python SDK
- `python/pyc/`: unified Python package
  - `pyc.compiler`: ctypes wrapper for the C compiler ABI, plus kernel policy
  - `pyc.runtime`: PyO3/pure-Python pipeline and hardware detection
  - `pyc.__init__`: top-level convenience API (`pyc.init()`, `pyc.detect_hardware()`)
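`pyc.compiler` is described as a ctypes wrapper over the C compiler ABI, but the PyC shared-library name and its exported signatures are not listed here. The sketch below therefore shows only the general ctypes binding pattern, using the C math library's `cos` as a stand-in; `load_c_function` is a hypothetical helper, and the fallback to `CDLL(None)` assumes a POSIX platform.

```python
import ctypes
import ctypes.util
from typing import Callable, Optional, Sequence


def load_c_function(lib_name: str, symbol: str,
                    argtypes: Sequence[type], restype: Optional[type]) -> Callable:
    """Bind one C symbol with explicit argument and return types.

    Hypothetical helper: a real pyc.compiler binding would load the PyC
    shared library instead of the stand-in library used below.
    """
    path = ctypes.util.find_library(lib_name)  # e.g. "libm.so.6" on glibc Linux
    lib = ctypes.CDLL(path)  # path=None opens the current process (POSIX only)
    fn = getattr(lib, symbol)
    fn.argtypes = list(argtypes)
    fn.restype = restype
    return fn


# Stand-in for a PyC entry point: double cos(double)
c_cos = load_c_function("m", "cos", [ctypes.c_double], ctypes.c_double)
```

Declaring `restype` matters: ctypes assumes a foreign function returns a C `int` unless told otherwise, so a function returning `double` would be read back incorrectly without it.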
### Build system
- `pyproject.toml`: Maturin-based Python package (replaces the previous `pyproject.toml`)
- `runtime/CMakeLists.txt`: ExternalProject integration for the Cargo build
- `compiler/CMakeLists.txt`: updated with the CUTLASS kernel target
- `include/pyc/cuda_backend.h`: new public header for the CUDA dispatch ABI
- `scripts/migrate_sources.sh`: helper that pulls sources from the original repos
## Architecture
The stack, from top to bottom:
- Nexa_Inference API: `apps/inference_api/`
- PyC Compiler: `compiler/` (IR, passes, CUTLASS kernels)
- Vortex Runtime: `runtime/vortex_core/` (async, NUMA, telemetry)
- Hardware: GPU (CUDA/CUTLASS), CPU, NVLink
## Next steps
- Run `scripts/migrate_sources.sh` to pull the remaining C sources from PyC
- `cmake -B build -DPYC_BUILD_CUDA=ON && cmake --build build`
- `maturin develop --features python_ext`
- `pytest tests/`
The new `cuda_backend.h` introduced in the merger PR defined a `pyc_cuda_dispatch_trace` struct with different field names than the ones the existing `cuda_backend.c` and `compiler_api.c` actually use, causing 20+ compile errors across all three CI platforms.

Changes:
- Replace the incorrect struct fields (`kernel_symbol`, `used_tensor_cores`, `fallback_reason`, etc.) with the real fields from the original PyC codebase (`cuda_requested`, `cuda_available`, `fallback_to_cpu`, `reason`).
- Add `#define PYC_CUDA_REASON_MAX 128` (it was missing, which caused an undeclared-identifier error at `cuda_backend.c:544`).
- Change the includes from `pyc/ir.h` + `pyc/kernel_registry.h` to `pyc/compiler_api.h`, which transitively provides `pyc_tensor`, `pyc_ir_module`, and all other required types (fixes the `unknown type name 'pyc_tensor'` errors at `cuda_backend.h:53,55,70,72`).
- Add the `void pyc_cuda_dispatch_trace_init()` declaration (it is called at `compiler_api.c:1036` and `:1279` but was not declared in the header).
- Change the return type of `pyc_cuda_dispatch()` from `int` to `pyc_cuda_dispatch_status` and align its parameter names with the real implementation.
- Change `PYC_CUDA_DISPATCH_ERROR` from `-1` to `2` to match the enum.

Fixes CI on Ubuntu, macOS, and Windows.
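To show why the header has to match the implementation field for field, here is a Python ctypes mirror of the corrected trace struct. The field names, `PYC_CUDA_REASON_MAX = 128`, and `PYC_CUDA_DISPATCH_ERROR = 2` come from the commit message; the C field types (`int` flags and a `char[PYC_CUDA_REASON_MAX]` reason buffer), the assumption that these four fields make up the whole struct, and the helpers `trace_init`, `record_fallback`, and `is_dispatch_error` are hypothetical.

```python
import ctypes

# Values stated in the commit message.
PYC_CUDA_REASON_MAX = 128
PYC_CUDA_DISPATCH_ERROR = 2  # was wrongly -1 before this fix


class PycCudaDispatchTrace(ctypes.Structure):
    """Hypothetical ctypes mirror of pyc_cuda_dispatch_trace.

    Field names follow the commit message; the C types are assumed.
    """
    _fields_ = [
        ("cuda_requested", ctypes.c_int),
        ("cuda_available", ctypes.c_int),
        ("fallback_to_cpu", ctypes.c_int),
        ("reason", ctypes.c_char * PYC_CUDA_REASON_MAX),
    ]


def trace_init(trace: PycCudaDispatchTrace) -> None:
    """Stand-in for pyc_cuda_dispatch_trace_init(); assumed to zero the trace."""
    ctypes.memset(ctypes.addressof(trace), 0, ctypes.sizeof(trace))


def record_fallback(trace: PycCudaDispatchTrace, reason: str) -> None:
    """Hypothetical helper: flag a CPU fallback and store the reason."""
    trace.fallback_to_cpu = 1
    # Truncate so a terminating NUL byte always fits in the buffer.
    trace.reason = reason.encode("utf-8")[: PYC_CUDA_REASON_MAX - 1]


def is_dispatch_error(status: int) -> bool:
    """Compare against the corrected enum value (2, not -1)."""
    return status == PYC_CUDA_DISPATCH_ERROR
```

A binding that still assumed the old layout or the old `-1` error value would read the wrong bytes or never recognize an error status, which is the same class of mismatch this commit fixes in the C header.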
## Summary
This PR evolves PyC into a vertically integrated HPC toolchain by merging the best components from `Nexa_Vortex` and `Nexa_Inference` directly into this repository and adding a new CUTLASS-backed kernel library.
## Files Added / Changed
### Runtime (from Nexa_Vortex)
- `runtime/vortex_core/src/pipeline.rs`
- `runtime/vortex_core/src/cpu_dispatch.rs`
- `runtime/vortex_core/src/allocator.rs`
- `runtime/vortex_core/src/hw_profile.rs`
- `runtime/vortex_core/src/ffi/mod.rs`
- `runtime/vortex_core/build.rs`
- `include/pyc/` headers
- `runtime/vortex_core/src/integrations/`
- `python/pyc/runtime/control_plane.py`
- `python/pyc/runtime/telemetry_manager.py`
### CUTLASS Kernels (new)
- `compiler/cutlass_kernels/cutlass_gemm.cu`
- `compiler/cutlass_kernels/cutlass_conv2d.cu`
- `compiler/cutlass_kernels/cutlass_attention.cu`
- `compiler/cutlass_kernels/cutlass_registry_init.cu` (registers kernels at `.so` load time)
### Application Layer (from Nexa_Inference)
- `apps/inference_api/src/main.py`
- `apps/inference_api/src/inference.py`
- `apps/inference_api/src/engines.py`
- `apps/inference_api/src/pipelines.py`
- `apps/inference_api/models/schemas.py`
- `apps/inference_api/src/auth.py`
- `apps/inference_api/src/config.py`
### Python SDK
- `python/pyc/__init__.py`: `pyc.init()`, `pyc.detect_hardware()`
- `python/pyc/compiler/__init__.py`
- `python/pyc/runtime/hw_profile.py`
- `python/pyc/runtime/pipeline.py`
### Build System
- `pyproject.toml`
- `runtime/CMakeLists.txt`
- `compiler/CMakeLists.txt`
- `include/pyc/cuda_backend.h`
- `scripts/migrate_sources.sh`
## Checklist
- `pyc` package