feat(gpu): disable NFD/GFD and remove nodeAffinity from device plugin chart #497

Draft
elezar wants to merge 1 commit into main from feat/simplify-device-plugin-deployment

Conversation

elezar (Member) commented Mar 20, 2026

Summary

  • Disables GPU Feature Discovery (GFD) and Node Feature Discovery (NFD) DaemonSets in the NVIDIA device plugin HelmChart
  • Overrides the device plugin's default nodeAffinity to {} so the DaemonSet schedules unconditionally on the single-node gateway without requiring NFD/GFD labels (feature.node.kubernetes.io/pci-10de.present=true or nvidia.com/gpu.present=true)
  • Updates architecture docs and debug skill to reflect the change

Related Issue

N/A — pure declarative simplification with no new runtime code paths.

Changes

  • deploy/kube/gpu-manifests/nvidia-device-plugin-helmchart.yaml: disable gfd/nfd, add affinity: {} override, update comment block
  • architecture/gateway-single-node.md: update GPU Enablement section to explain NFD/GFD are disabled and why
  • .agents/skills/debug-openshell-cluster/SKILL.md: add troubleshooting entry for lingering NFD/GFD DaemonSets on clusters deployed before this change

Testing

  • openshell gateway start --gpu: device plugin DaemonSet reaches Ready (1/1)
  • kubectl get daemonset -A | grep -E 'nfd|gfd|node-feature': no output
  • kubectl get node -o jsonpath='{.items[0].status.allocatable}': nvidia.com/gpu key present
  • mise run test: no regressions
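
For reviewers who want to repeat the cluster checks above, they could be wrapped roughly as follows. This is a minimal sketch, not part of the PR: the function names are hypothetical, each function reads kubectl output on stdin, and has_node_affinity is an extra check that is not in the list above. The grep pattern for lingering DaemonSets is the one used in the Testing entry.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and do not exist in the repo.

# Reads `kubectl get daemonset -A` output on stdin.
# Prints any NFD/GFD-looking lines and succeeds only if there are none.
no_discovery_daemonsets() {
  ! grep -E 'nfd|gfd|node-feature'
}

# Reads the node's allocatable map on stdin (JSON or the older map[...] form).
# Succeeds if an nvidia.com/gpu key is present.
has_gpu_allocatable() {
  grep -Eq 'nvidia\.com/gpu[":]'
}

# Extra check (assumption, not from the PR): reads a pod template's affinity
# on stdin and succeeds if a nodeAffinity block is still rendered.
has_node_affinity() {
  grep -q 'nodeAffinity'
}
```

Possible usage against a running gateway: pipe kubectl get daemonset -A into no_discovery_daemonsets, and pipe kubectl get node -o jsonpath='{.items[0].status.allocatable}' into has_gpu_allocatable. For the affinity check, the device plugin DaemonSet's name and namespace depend on the chart release, so they would need to be filled in by hand.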

Checklist

  • Conventional commit message
  • mise run pre-commit passed
  • Architecture docs updated
  • Debug skill updated per cluster infra change instructions in AGENTS.md

… chart

Disables GPU Feature Discovery and Node Feature Discovery DaemonSets and
overrides the device plugin's default nodeAffinity to empty so it schedules
unconditionally on the single-node gateway without requiring NFD/GFD labels.