Replace DNS resolver with NM dnsmasq plugin + /etc/hosts fallback #187

@bussyjd

Problem

The current wildcard DNS resolver for *.obol.stack uses a complex stack of Docker dnsmasq + NetworkManager bridge + veth slave + systemd-resolved drop-in. This approach has several issues:

Bugs

  • Global DNSOverTLS downgrade (HIGH): The resolved drop-in sets DNSOverTLS=opportunistic system-wide, weakening ALL DNS security — not just obol.stack queries. Users with DNSOverTLS=yes (valid hardened config) get silently downgraded.
  • Fails on Ubuntu 20.04: nmcli type veth requires NetworkManager >= 1.30. Ubuntu 20.04 ships NM 1.22.
  • Fails on Ubuntu Server: NetworkManager is not installed by default.
  • Fails on Debian, openSUSE, RHEL: These distros don't use systemd-resolved by default.
  • apk add on every container start: The dnsmasq Docker container runs apk add --no-cache dnsmasq on every restart. If Alpine's CDN is unreachable, DNS breaks silently.
  • Startup race: dnsmasq may not be ready when resolved starts forwarding queries.

Over-engineering

  • Creates 3 persistent NM connections (bridge + veth + slave config) + 1 resolved drop-in + 1 Docker container — all for a single DNS entry.
  • The bridge+veth exists solely to give systemd-resolved an interface with active carrier and non-loopback IP so it activates DNS scope. This is a workaround for resolved's operstate requirements.

Compatibility Matrix (current)

| Distro | Works? |
|---|---|
| Ubuntu 22.04+ Desktop | Yes |
| Ubuntu 20.04 Desktop | No (NM 1.22) |
| Ubuntu Server (any) | No (no NM) |
| Fedora 33+ | Yes |
| Debian 12 Desktop | No (no resolved) |
| Arch (NM + resolved) | Yes |
| RHEL 9 | No (no resolved) |
| openSUSE Desktop | No (no resolved) |

Proposed Solution: Tiered Approach

Tier 1 — NetworkManager systems (Linux Desktop)

Use NM's built-in dns=dnsmasq plugin. Two config files, zero Docker containers, zero bridges:

# /etc/NetworkManager/conf.d/obol-dns.conf
[main]
dns=dnsmasq

# /etc/NetworkManager/dnsmasq.d/obol-stack.conf
address=/obol.stack/127.0.0.1

This resolves *.obol.stack → 127.0.0.1 directly — no Docker container, no bridge, no veth, no resolved drop-in, no DNSOverTLS changes.
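To make the wildcard behaviour concrete, here is a minimal Python sketch of which names an address=/obol.stack/127.0.0.1 rule is expected to cover: the domain itself and any subdomain, matched on label boundaries. The function name `matches_address_rule` is hypothetical, and the code only illustrates the matching semantics; it is not dnsmasq's implementation.

```python
def matches_address_rule(hostname: str, domain: str = "obol.stack") -> bool:
    """Return True if hostname is the domain itself or one of its subdomains.

    Illustrative only: mimics the label-boundary suffix match applied to
    address=/<domain>/<ip> rules. Names are compared case-insensitively
    and a trailing root dot is ignored.
    """
    host = hostname.rstrip(".").lower()
    dom = domain.strip(".").lower()
    return host == dom or host.endswith("." + dom)


if __name__ == "__main__":
    for name in ("obol.stack", "openclaw-default.obol.stack", "notobol.stack"):
        print(name, matches_address_rule(name))
```

A name such as notobol.stack is not covered, because the suffix must start at a dot.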

Detection flow:

  1. Check if nmcli is available
  2. Check current dns= mode in NM config
  3. If not already dns=dnsmasq, set it and drop in the obol-stack.conf
  4. Restart NetworkManager
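The first three steps of the detection flow could look roughly like the Python sketch below. The helper names are hypothetical, and the sketch inspects the text of a single NetworkManager config file; a real implementation would need to account for every file NetworkManager merges from its conf.d directories.

```python
import configparser
import shutil
from typing import Optional


def nmcli_available() -> bool:
    """Step 1: treat nmcli on PATH as a proxy for 'NetworkManager is installed'."""
    return shutil.which("nmcli") is not None


def current_dns_mode(nm_conf_text: str) -> Optional[str]:
    """Step 2: return the dns= value from the [main] section, or None if unset."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(nm_conf_text)
    value = parser.get("main", "dns", fallback=None)
    return value.strip() if value else None


def needs_dnsmasq_setup(nm_conf_text: str) -> bool:
    """Step 3: True when the config does not already select dns=dnsmasq."""
    return current_dns_mode(nm_conf_text) != "dnsmasq"


# Step 4 (not performed here) restarts NetworkManager, which requires root,
# for example via `systemctl restart NetworkManager`.
```

Keeping the parsing separate from the file I/O makes the check easy to test against sample config text.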

Tier 2 — Non-NM systems (Server, minimal installs)

Managed /etc/hosts entries with markers:

127.0.0.1 obol.stack                        # obol-stack-managed
127.0.0.1 openclaw-default.obol.stack       # obol-stack-managed
127.0.0.1 ethereum-nervous-otter.obol.stack # obol-stack-managed

Entries are added and removed programmatically during network install/delete and openclaw setup/teardown. /etc/hosts does not support wildcards, but obol tracks all deployments, so every hostname can be listed explicitly.
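A minimal Python sketch of marker-based management follows. It operates on the text of a hosts file rather than on /etc/hosts itself (writing that file requires root), and the function names `add_host` and `remove_host` are hypothetical. Only lines ending in the marker are ever touched, so user-written entries survive.

```python
MARKER = "# obol-stack-managed"


def _names(line: str) -> list:
    """Hostnames on a hosts line: fields after the IP, before any comment."""
    return line.split("#", 1)[0].split()[1:]


def _is_managed(line: str) -> bool:
    """True for lines that carry the obol marker at the end."""
    return line.rstrip().endswith(MARKER)


def add_host(hosts_text: str, hostname: str, ip: str = "127.0.0.1") -> str:
    """Append a marked entry for hostname unless a managed one already exists."""
    lines = hosts_text.splitlines()
    if any(_is_managed(l) and hostname in _names(l) for l in lines):
        return hosts_text
    lines.append(f"{ip} {hostname} {MARKER}")
    return "\n".join(lines) + "\n"


def remove_host(hosts_text: str, hostname: str) -> str:
    """Drop only managed lines that name hostname; other lines are kept."""
    kept = [
        l for l in hosts_text.splitlines()
        if not (_is_managed(l) and hostname in _names(l))
    ]
    return "\n".join(kept) + "\n" if kept else ""
```

Because both functions are idempotent on their input text, repeated install or teardown calls do not accumulate duplicate lines.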

Tier 3 — macOS (unchanged)

Keep the existing /etc/resolver/obol.stack file, which sets nameserver 127.0.0.1 and port 5553.
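Taken together, the three tiers amount to a small decision function. The Python sketch below is one way to express it; `select_tier` is a hypothetical name, and using the presence of nmcli as the NetworkManager check is an assumption carried over from the Tier 1 detection flow.

```python
import platform
import shutil


def select_tier(system: str, has_nmcli: bool) -> int:
    """Map a platform to the tier described in this proposal.

    system is a platform.system() value such as "Linux" or "Darwin".
    """
    if system == "Darwin":
        return 3  # macOS: /etc/resolver/obol.stack (unchanged)
    if system == "Linux" and has_nmcli:
        return 1  # NetworkManager dnsmasq plugin
    if system == "Linux":
        return 2  # managed /etc/hosts entries
    raise ValueError(f"unsupported platform: {system}")


if __name__ == "__main__":
    tier = select_tier(platform.system(), shutil.which("nmcli") is not None)
    print(f"selected tier {tier}")
```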

Benefits

  • Eliminates Docker container, bridge, veth, resolved drop-in
  • No DNSOverTLS downgrade
  • Works on Ubuntu 20.04+, Fedora, Arch, Mint, Pop!_OS, Debian (desktop), RHEL 9
  • Universal fallback via /etc/hosts for any Linux system
  • NM's dnsmasq plugin is well-established and documented

Compatibility Matrix (proposed)

| Distro | Tier | Works? |
|---|---|---|
| Ubuntu 20.04+ Desktop | 1 (NM dnsmasq) | Yes |
| Ubuntu Server | 2 (/etc/hosts) | Yes |
| Fedora 33+ | 1 | Yes |
| Debian 12 Desktop | 1 | Yes |
| Arch (NM) | 1 | Yes |
| RHEL 9 | 1 | Yes |
| openSUSE Desktop | 1 (if NM) | Yes |
| Any Linux (no NM) | 2 (/etc/hosts) | Yes |
| macOS | 3 (/etc/resolver) | Yes |
