Skip to content

Resolve URLs by using multiple services#542

Merged
9seconds merged 3 commits into
masterfrom
multiple-ip-detectors
Jun 9, 2026
Merged

Resolve URLs by using multiple services#542
9seconds merged 3 commits into
masterfrom
multiple-ip-detectors

Conversation

@9seconds

Copy link
Copy Markdown
Owner

This PR has an intention of resolving URLs by using multiple endpoints that identify an IP address of the service. This is handy if one service is blocked for some reason.

The detection mechanism follows this logic:

  1. It tries to access all services in parallel
  2. If service respond with some error (like, no route to host for IPv6), then we accurately collect those errors and return a merged one
  3. In case of the first IP resolved, we immediately return it.

Also, this PR refactors how access and SNI check are performed.

Closes #529

This PR has an intention of resolving URLs by using multiple endpoints
that identify an IP address of the service. This is handy if one service
is blocked for some reason.

The detection mechanism follows this logic:

1. It tries to access all services in parallel
2. If service respond with some error (like, no route to host for IPv6),
   then we accurately collect those errors and return a merged one
3. In case of the first IP resolved, we immediately return it.

Also, this PR refactors how access and SNI check are performed.
@9seconds 9seconds requested a review from dolonet May 27, 2026 14:13
@9seconds 9seconds force-pushed the multiple-ip-detectors branch 2 times, most recently from 069fb08 to 2145159 Compare May 27, 2026 14:18
@dolonet

dolonet commented May 29, 2026

Copy link
Copy Markdown
Collaborator

CI is green across the board and this Closes #529 as intended — multiple detection endpoints with merged-error collection is exactly what that issue asked for. The mtg access and getIP parts read well.

One behavioral regression I want to flag before this lands, in runSNICheck (internal/cli/sni_check.go), because it bites precisely the users #529 targets — those who never set public-ipv4/public-ipv6.

Both family goroutines now share a single ctx from context.WithCancelCause (sni_check.go:28), and a per-family detection failure calls cancelCause(err) on it (:58 for tcp4, :76 for tcp6):

ip, err = getIP(ctx, ntw, "tcp6")
if err != nil {
    cancelCause(err)
}

Consider a secret host with both A and AAAA records on a server with IPv4-only egress (a common setup). The tcp6 goroutine's getIP fails and calls cancelCause(err). The load-bearing consequence is unconditional:

wg.Wait(); return res, context.Cause(ctx) (:88) then returns a non-nil error regardless of whether the tcp4 detection succeeded — any single family's failure poisons the whole result. (Secondarily, since the ctx is shared, the cancel can also abort the still-in-flight tcp4 HTTP request — but the error return alone is enough.)

Both callers treat that error as fatal: checkSecretHost (doctor.go:396) prints cannot resolve DNS name of <host> and returns false, and warnSNIMismatch (run_proxy.go:221) logs SNI-DNS check: cannot resolve secret hostname and skips the check. So a server that is actually fine on IPv4 now fails the SNI check outright.

On master this was graceful per-family degradation (the behavior #505/#528 settled on): getIP returned nil rather than erroring, an undetected family was simply non-blocking — warnSNIMismatch had v6Match := res.OurIPv6 == nil || res.IPv6Match, and checkSecretHost keyed off IPv4Match || IPv6Match. This PR turns "one family undetectable" into a hard failure.

I checked this against both branches with a small harness driving the real runSNICheck (faked resolver + Network): for a dual-stack host whose A record matches the server's public IPv4 on IPv4-only egress, master returns a clean ✅ match while this branch returns the hard cannot resolve DNS name failure. It also fires when only public-ipv4 is set (the unset v6 family still trips cancelCause), so it is not limited to the no-config case.

Minimal fix, your call on the shape: give each family its own derived context (so one family's getIP failure can't cancel the other), or simply don't let a per-family getIP error call cancelCause on the shared ctx — and only return an error when both families fail to produce an IP. The OurIP4 == "" && OurIP6 == "" condition the callers already have is the natural place to land that.

Two smaller notes:

  • The doctor message is misleading on this path. When DNS resolved fine and only public-IP detection failed, it still prints cannot resolve DNS name of <host>. Even with the fix above, "cannot detect public IP" would be the accurate framing there.

  • Non-blocking nit, your design call: the new per-family tplEDNSSNIMatch ({{ .ip }}) drops the combined IPv4+IPv6 context that doctor: surface both public IPs in SNI-DNS mismatch message #505 added (... public IP is <v4> (IPv4) / <v6> (IPv6) ...). Reasonable to drop if you prefer one line per family — just flagging it's an intentional-looking output change.

@9seconds

9seconds commented Jun 3, 2026

Copy link
Copy Markdown
Owner Author

Thanks, makes sense.

runSNICheck wired each family's getIP failure through a shared
context.WithCancelCause, so a single family's detection failure (for
example tcp6 on an IPv4-only-egress server) made the whole check return
an error even when the other family was detected and matched. Both
callers treat that error as fatal, so a server that is fine on IPv4
failed the SNI check outright -- the exact audience of #529.

Mirror the graceful per-family handling access.go already uses: discard
the per-family getIP error and report an undetectable family through an
empty OurIP4/OurIP6, which both callers already surface via their
"cannot detect public IP address" branch. The error return is now
reserved for genuine DNS-resolution failure. Removing the shared cancel
also makes the two families independent, so a fast-failing family can no
longer abort the other family's in-flight detection.

Add a regression test that drives the real runSNICheck over a loopback
DNS fake and an IPv4-only-egress network fake.
@dolonet

dolonet commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Took a swing at the fix + a regression test for the IPv4-only-egress poisoning — opened #557 into multiple-ip-detectors so CI runs on it. Your call on the shape; merge, rework, or close as you like.

@dolonet dolonet left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multi-endpoint detection and the mtg access / getIP refactor read well — nice cleanup.

One thing to land before this merges: the per-family getIP failure in runSNICheck poisoning the whole result on IPv4-only-egress dual-stack hosts (details in the thread above, which you OK'd). I've put the fix + a regression test in #557 into this branch — merge that and this is good to go from my side.

Leaving this as a comment rather than an approval only because the branch still carries that regression; once #557 is in, consider it 👍.

@dolonet dolonet left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formalizing the verdict so this stops sitting in your review queue: one blocker — the runSNICheck regression we discussed above — with the fix + a regression test ready in #557 against this branch. Merge #557 and I'll flip this to approve; everything else here looks good.

Fix SNI check failing when one IP family is undetectable

@dolonet dolonet left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after #557 landed — the runSNICheck per-family poisoning is fixed and TestRunSNICheckIPv4OnlyEgressGraceful covers the IPv4-only-egress case. Flipping to approve as promised.

The earlier red Test was the httpbin /headers flake (only TestProxy/TestHTTPSRequest + TestNetwork/TestRealHTTPRequest; all unit tests pass) — I re-ran the job and it's green now. Good to merge whenever you're ready; this closes #529.

@9seconds 9seconds merged commit d095108 into master Jun 9, 2026
18 of 20 checks passed
@9seconds 9seconds deleted the multiple-ip-detectors branch June 9, 2026 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Public IP detection depends on a single hardcoded endpoint (ifconfig.co)

2 participants