Resolve URLs by using multiple services#542
Conversation
This PR has an intention of resolving URLs by using multiple endpoints that identify an IP address of the service. This is handy if one service is blocked for some reason. The detection mechanism follows this logic: 1. It tries to access all services in parallel 2. If service respond with some error (like, no route to host for IPv6), then we accurately collect those errors and return a merged one 3. In case of the first IP resolved, we immediately return it. Also, this PR refactors how access and SNI check are performed.
069fb08 to
2145159
Compare
|
CI is green across the board and this One behavioral regression I want to flag before this lands, in Both family goroutines now share a single ip, err = getIP(ctx, ntw, "tcp6")
if err != nil {
cancelCause(err)
}Consider a secret host with both A and AAAA records on a server with IPv4-only egress (a common setup). The tcp6 goroutine's
Both callers treat that error as fatal: On master this was graceful per-family degradation (the behavior #505/#528 settled on): I checked this against both branches with a small harness driving the real Minimal fix, your call on the shape: give each family its own derived context (so one family's Two smaller notes:
|
|
Thanks, makes sense. |
runSNICheck wired each family's getIP failure through a shared context.WithCancelCause, so a single family's detection failure (for example tcp6 on an IPv4-only-egress server) made the whole check return an error even when the other family was detected and matched. Both callers treat that error as fatal, so a server that is fine on IPv4 failed the SNI check outright -- the exact audience of #529. Mirror the graceful per-family handling access.go already uses: discard the per-family getIP error and report an undetectable family through an empty OurIP4/OurIP6, which both callers already surface via their "cannot detect public IP address" branch. The error return is now reserved for genuine DNS-resolution failure. Removing the shared cancel also makes the two families independent, so a fast-failing family can no longer abort the other family's in-flight detection. Add a regression test that drives the real runSNICheck over a loopback DNS fake and an IPv4-only-egress network fake.
|
Took a swing at the fix + a regression test for the IPv4-only-egress poisoning — opened #557 into |
dolonet
left a comment
There was a problem hiding this comment.
The multi-endpoint detection and the mtg access / getIP refactor read well — nice cleanup.
One thing to land before this merges: the per-family getIP failure in runSNICheck poisoning the whole result on IPv4-only-egress dual-stack hosts (details in the thread above, which you OK'd). I've put the fix + a regression test in #557 into this branch — merge that and this is good to go from my side.
Leaving this as a comment rather than an approval only because the branch still carries that regression; once #557 is in, consider it 👍.
Fix SNI check failing when one IP family is undetectable
dolonet
left a comment
There was a problem hiding this comment.
Re-reviewed after #557 landed — the runSNICheck per-family poisoning is fixed and TestRunSNICheckIPv4OnlyEgressGraceful covers the IPv4-only-egress case. Flipping to approve as promised.
The earlier red Test was the httpbin /headers flake (only TestProxy/TestHTTPSRequest + TestNetwork/TestRealHTTPRequest; all unit tests pass) — I re-ran the job and it's green now. Good to merge whenever you're ready; this closes #529.
This PR has an intention of resolving URLs by using multiple endpoints that identify an IP address of the service. This is handy if one service is blocked for some reason.
The detection mechanism follows this logic:
Also, this PR refactors how access and SNI check are performed.
Closes #529