Fix nvshmem build #2815

Open
GaetanLepage wants to merge 1 commit into NVIDIA:main from GaetanLepage:fix-nvshmem-build

@GaetanLepage

Description

When building TE with NVTE_ENABLE_NVSHMEM=1, the build fails with:

FAILED: [code=2] nvshmem_api/CMakeFiles/nvshmemapi.dir/nvshmem_waitkernel.cu.o
/nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/c++  -I/build/source/transformer_eng>
/build/source/transformer_engine/common/nvshmem_api/nvshmem_waitkernel.cu(42): error: identifier "NVTE_CHECK_CUDA_DRIVER" is undefined
        NVTE_CHECK_CUDA_DRIVER(
        ^

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add missing include in transformer_engine/common/nvshmem_api/nvshmem_waitkernel.cu
  • Add ${CMAKE_CURRENT_SOURCE_DIR}/../include to target_include_directories in transformer_engine/common/nvshmem_api/CMakeLists.txt

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes a build failure when compiling TransformerEngine with NVTE_ENABLE_NVSHMEM=1 by adding two complementary, minimal corrections.

  • nvshmem_waitkernel.cu: Adds #include "../util/cuda_driver.h", which defines the NVTE_CHECK_CUDA_DRIVER macro used on several lines in the NVSHMEM_WAIT and STREAM_WAIT cases of nvshmem_wait_on_stream. This was the direct source of the reported compile error.
  • CMakeLists.txt: Adds ${CMAKE_CURRENT_SOURCE_DIR}/../include (i.e., transformer_engine/common/include/) to the PRIVATE include directories for the nvshmemapi target. This is required by the transitive dependency chain cuda_driver.h → common.h → #include <transformer_engine/transformer_engine.h>. The last step is an angle-bracket include, which can only be resolved if the include/ directory is on the compiler's search path. Without this CMake change, adding the header to the .cu file would only move the failure one level deeper, to the point where common.h includes <transformer_engine/transformer_engine.h>.
  • CMakeLists.txt: Also fixes a missing newline at end of file.
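The wording of the reported error ("identifier ... is undefined") follows from how function-like macros behave: when the header that defines the macro is never included, the preprocessor leaves the name unexpanded and the compiler sees what looks like a call to an undeclared identifier. The sketch below illustrates the general shape of such an error-checking macro. Everything in it (DEMO_CHECK_DRIVER, demo_driver_call, demo_call_fails) is a hypothetical stand-in, not Transformer Engine's actual definition in cuda_driver.h.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a driver API call: returns 0 on success.
inline int demo_driver_call(int status) { return status; }

// Hypothetical error-checking macro (NOT the real NVTE_CHECK_CUDA_DRIVER).
// If this definition were missing, DEMO_CHECK_DRIVER(...) below would be
// parsed as a call to an undeclared identifier and fail to compile.
#define DEMO_CHECK_DRIVER(expr)                                          \
  do {                                                                   \
    const int demo_status_ = (expr);                                     \
    if (demo_status_ != 0) {                                             \
      throw std::runtime_error(std::string("driver error in ") + #expr); \
    }                                                                    \
  } while (false)

// Returns true if the checked call reported an error.
inline bool demo_call_fails(int status) {
  try {
    DEMO_CHECK_DRIVER(demo_driver_call(status));
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}
```

Commenting out the #define in this sketch should reproduce the same class of diagnostic at the DEMO_CHECK_DRIVER call site, which is why adding the include is the direct fix.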

Confidence Score: 5/5

Safe to merge — both changes are minimal, narrowly scoped, and together fully resolve the reported build failure with no side effects.

The fix is a two-line correction: adding the missing #include and the corresponding CMake include-path entry. The dependency chain is verified — cuda_driver.h transitively requires transformer_engine/common/include/ on the include path, and the .cu file genuinely lacked the include that defines NVTE_CHECK_CUDA_DRIVER. No logic changes, no interface changes, and the scope is limited to the NVSHMEM build path.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| transformer_engine/common/nvshmem_api/nvshmem_waitkernel.cu | Adds missing #include "../util/cuda_driver.h" that provides the NVTE_CHECK_CUDA_DRIVER macro used on lines 43–52. |
| transformer_engine/common/nvshmem_api/CMakeLists.txt | Adds ${CMAKE_CURRENT_SOURCE_DIR}/../include to PRIVATE include dirs so that the transitive <transformer_engine/transformer_engine.h> include (pulled in via common.h) resolves correctly; also fixes a missing trailing newline. |
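One way to check whether an angle-bracket include would resolve under a given set of include directories is the C++17 __has_include preprocessor operator. The following is a minimal sketch, assuming a C++17-capable host compiler; the DEMO_* macros and demo_* functions are hypothetical names introduced only for illustration. Compiled without transformer_engine/common/include on the search path, the first probe is expected to report that the header was not found.

```cpp
// Probe at preprocessing time whether headers resolve on the current
// include search path. Assumes the compiler supports __has_include (C++17).
#if defined(__has_include)
#  if __has_include(<transformer_engine/transformer_engine.h>)
#    define DEMO_TE_HEADER_FOUND 1
#  else
#    define DEMO_TE_HEADER_FOUND 0
#  endif
#  if __has_include(<cstddef>)
#    define DEMO_STD_HEADER_FOUND 1
#  else
#    define DEMO_STD_HEADER_FOUND 0
#  endif
#else
// Fallback for compilers without __has_include: report nothing as found.
#  define DEMO_TE_HEADER_FOUND 0
#  define DEMO_STD_HEADER_FOUND 0
#endif

// True only if the Transformer Engine public header is on the search path,
// e.g. after an include directory such as .../common/include is added.
constexpr bool demo_te_header_found() { return DEMO_TE_HEADER_FOUND != 0; }

// Control probe: a standard library header should always resolve.
constexpr bool demo_std_header_found() { return DEMO_STD_HEADER_FOUND != 0; }
```

The control probe is there to distinguish "header not on the path" from "compiler lacks __has_include"; only the former is relevant to the include-directory change described here.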

Sequence Diagram

sequenceDiagram
    participant cu as nvshmem_waitkernel.cu
    participant cdr as cuda_driver.h
    participant cmn as common.h
    participant te as transformer_engine.h

    cu->>cdr: include ../util/cuda_driver.h (NEW)
    cdr->>cmn: include ../common.h
    cmn->>te: include transformer_engine/transformer_engine.h
    Note over cmn,te: Resolved via ../include added to CMakeLists.txt
    te-->>cmn: DType, NVTETensor definitions
    cmn-->>cdr: transformer_engine namespace, NVTE_ERROR
    cdr-->>cu: NVTE_CHECK_CUDA_DRIVER macro defined

Reviews (2): Last reviewed commit: "Fix nvshmem build"

Signed-off-by: Gaetan Lepage <gaetan@glepage.com>