Description
Summary
The setup_dns_proxy() function in deploy/docker/cluster-entrypoint.sh writes nameserver <container-IP> to /etc/rancher/k3s/resolv.conf and sets up iptables PREROUTING DNAT rules to forward port 53 to Docker's internal DNS (127.0.0.11). However, the DNAT rules are ineffective for builds running inside k3s, causing all image builds (and any DNS resolution during builds) to fail with "Temporary failure resolving" errors.
Environment
- Host OS: Ubuntu with systemd-resolved (default on Ubuntu 20.04+, including DGX Spark)
- OpenShell CLI: openshell 0.0.10
- Image: ghcr.io/nvidia/openshell/cluster:0.0.10
Steps to Reproduce
- On a Linux host with systemd-resolved, start an OpenShell gateway
- Create a sandbox with a Dockerfile that runs apt-get update
- The build fails:
Err:1 http://deb.debian.org/debian bookworm InRelease Temporary failure resolving 'deb.debian.org'
Root Cause
The setup_dns_proxy() function:
- Discovers Docker DNS ports from the iptables DOCKER_OUTPUT chain
- Gets the container's eth0 IP (e.g., 172.18.0.2)
- Adds iptables PREROUTING DNAT rules to forward :53 → 127.0.0.11:<high-port>
- Writes nameserver 172.18.0.2 to /etc/rancher/k3s/resolv.conf
- Verifies DNS from the container's own namespace (this succeeds)
- But k3s builds (containerd) run in a different network context where the PREROUTING DNAT rules don't apply
- DNS queries to 172.18.0.2:53 get "connection refused"
The fallback to 8.8.8.8/8.8.4.4 only triggers if setup_dns_proxy returns non-zero, but it can "succeed" (write the rules, pass self-verification) even though the rules don't work for k3s pods/builds.
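For readers unfamiliar with the first step in the list above, here is a minimal sketch of how the Docker DNS ports could be read from the DOCKER_OUTPUT chain. It is not the code in cluster-entrypoint.sh: parse_docker_dns_port is a hypothetical helper, and the sample rules (including the port numbers) are illustrative text in the format that iptables -t nat -S DOCKER_OUTPUT typically prints.

```shell
# Hypothetical helper (not taken from cluster-entrypoint.sh).
# Reads nat rules on stdin and prints the DNAT target port for the
# given protocol ("udp" or "tcp"). Prints nothing if no rule matches.
parse_docker_dns_port() {
    proto="$1"
    awk -v proto="$proto" '
        $0 ~ ("-p " proto " ") && /--to-destination/ {
            for (i = 1; i < NF; i++) {
                if ($i == "--to-destination") {
                    # Target looks like 127.0.0.11:<high-port>; keep the port.
                    n = split($(i + 1), parts, ":")
                    print parts[n]
                    exit
                }
            }
        }
    '
}

# Illustrative rules only; real port numbers differ per container.
# On a live container the input would come from:
#   iptables -t nat -S DOCKER_OUTPUT
SAMPLE_RULES='-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:38457
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:51203'

UDP_PORT=$(printf '%s\n' "$SAMPLE_RULES" | parse_docker_dns_port udp)
TCP_PORT=$(printf '%s\n' "$SAMPLE_RULES" | parse_docker_dns_port tcp)
echo "udp=$UDP_PORT tcp=$TCP_PORT"
```

The ports found this way are the <high-port> values used as the DNAT target in the PREROUTING rules.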
Workaround
docker exec <gateway-container> sh -c 'echo "nameserver 8.8.8.8" > /etc/rancher/k3s/resolv.conf'
Suggested Fix
After setting up the iptables DNAT rules, verify DNS from a network namespace that mirrors k3s pod networking (not just the container's own namespace). If verification fails, fall back to public DNS (8.8.8.8/8.8.4.4).
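A rough sketch of that fallback logic, under stated assumptions: write_k3s_resolv_conf and its probe-command convention are hypothetical names for illustration, not existing functions in cluster-entrypoint.sh. The probe is passed in as a command so that the pod-like verification can be swapped in; the demo below uses true and false as stand-ins for a passing and a failing probe.

```shell
# Hypothetical helper (not part of cluster-entrypoint.sh).
#   $1    path of the resolv.conf handed to k3s
#   $2    proxy IP (the container's eth0 address)
#   $3... probe command; it is called with the proxy IP appended as its
#         last argument and must exit 0 only if DNS resolves through
#         that IP from a pod-like network namespace
write_k3s_resolv_conf() {
    resolv_path="$1"
    proxy_ip="$2"
    shift 2
    if "$@" "$proxy_ip"; then
        printf 'nameserver %s\n' "$proxy_ip" > "$resolv_path"
    else
        echo "DNS proxy at $proxy_ip failed pod-like verification; using public DNS" >&2
        printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > "$resolv_path"
    fi
}

# Demo with stub probes: "true" simulates a working proxy,
# "false" simulates the failure described in this issue.
DEMO_DIR=$(mktemp -d)
write_k3s_resolv_conf "$DEMO_DIR/ok.conf" 172.18.0.2 true
write_k3s_resolv_conf "$DEMO_DIR/fallback.conf" 172.18.0.2 false
cat "$DEMO_DIR/ok.conf" "$DEMO_DIR/fallback.conf"
```

In the entrypoint, the probe would need to run outside the container's own namespace. One option, assuming a scratch namespace has been created and wired up like a pod, is something along the lines of ip netns exec <namespace> nslookup deb.debian.org, which would receive the proxy IP as the server argument under the convention above.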
Related
- bug: k3s fails to start in cluster container due to missing default route (flannel auto-detection) #125 (k3s fails to start due to missing default route / DNS proxy failures)
- k3s-io/k3s#4087 (coredns forward /etc/resolv.conf not working)