Skip to content

Fix SNI check failing when one IP family is undetectable#557

Merged
9seconds merged 1 commit into
multiple-ip-detectorsfrom
sni-graceful-degradation
Jun 5, 2026
Merged

Fix SNI check failing when one IP family is undetectable#557
9seconds merged 1 commit into
multiple-ip-detectorsfrom
sni-graceful-degradation

Conversation

@dolonet

@dolonet dolonet commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Follow-up to the review on #542 — the per-family error-poisoning in runSNICheck you OK'd fixing. Opening it as a PR into the branch so CI runs on it; merge, rework, or close as you see fit — your call on the shape, the regression test is the part that matters and it passes against any correct fix.

The bug

On a dual-stack secret host (A + AAAA) running on an IPv4-only-egress server, the tcp6 getIP fails and calls cancelCause(err) on the shared context, so runSNICheck returns a non-nil error even though tcp4 detection succeeded and matched. Both callers treat that as fatal — a server that is actually fine on IPv4 fails the SNI check outright (the exact audience of #529).

The fix

Mirror the graceful per-family handling access.go already uses (ip, _ = getIP(...)): discard the per-family getIP error and report an undetectable family through an empty OurIP4/OurIP6. Both callers already surface that via their existing cannot detect public IP address branch, so:

  • the error return is now reserved for genuine DNS-resolution failure (the callers' err != nilcannot resolve DNS name branch stays accurate);
  • the both-families-undetectable case now hits cannot detect public IP address instead of misreporting a DNS failure (the misleading-message point from the review, fixed for free);
  • removing the shared cancel also makes the two families independent — a fast-failing family can no longer abort the other family's in-flight detection.

One deliberate trade-off: the per-family getIP error is dropped, so on a both-families-fail server the user no longer sees the per-service dial errors, just cannot detect public IP address. That matches access.go's existing behavior and the merged 4-service error string was noisy, but flagging it in case you'd rather log it at debug.

Test

internal/cli/sni_check_test.go drives the real runSNICheck/getIP over a loopback UDP DNS fake (dual-stack) and an IPv4-only-egress mtglib.Network fake — no network, no new deps (x/net/dns/dnsmessage is already direct). It fails on the branch as-is (cannot resolve tcp6 address …) and passes with the fix. It's the first *_test.go in internal/cli and uses plain func Test… + require (matching the dcprobe/network style); happy to convert to a testify suite if you prefer.

Local checks green: go test -race, go vet, go build ./..., gofmt/gofumpt, golangci-lint (repo config).

runSNICheck wired each family's getIP failure through a shared
context.WithCancelCause, so a single family's detection failure (for
example tcp6 on an IPv4-only-egress server) made the whole check return
an error even when the other family was detected and matched. Both
callers treat that error as fatal, so a server that is fine on IPv4
failed the SNI check outright -- the exact audience of #529.

Mirror the graceful per-family handling access.go already uses: discard
the per-family getIP error and report an undetectable family through an
empty OurIP4/OurIP6, which both callers already surface via their
"cannot detect public IP address" branch. The error return is now
reserved for genuine DNS-resolution failure. Removing the shared cancel
also makes the two families independent, so a fast-failing family can no
longer abort the other family's in-flight detection.

Add a regression test that drives the real runSNICheck over a loopback
DNS fake and an IPv4-only-egress network fake.
@dolonet

dolonet commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

CI note: the two red checks are pre-existing / environmental, not from this change —

  • Test vulnerabilities: GO-2026-5039 (net/textproto) + GO-2026-5037 (crypto/x509), both Go stdlib, fixed in go1.26.4 (CI runs go1.26.3). Repo-wide — hits master too.
  • Test: only network/TestRealHTTPRequest failed, and it's a live GET https://httpbin.org/headers that timed out. internal/cli (this PR's package, including the new test) passed.

Locally on the touched package: go test -race ./internal/cli/, golangci-lint (repo config), go build ./... all green.

@9seconds

9seconds commented Jun 5, 2026

Copy link
Copy Markdown
Owner

I'll merge them now. Will make vulnerabilities verification optional in a separate PR

@9seconds 9seconds merged commit feb5a0a into multiple-ip-detectors Jun 5, 2026
8 of 14 checks passed
@9seconds 9seconds deleted the sni-graceful-degradation branch June 5, 2026 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants