Add egress monitoring to e2e tests#28409
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds egress monitoring for end-to-end tests: a DNS sidecar that logs outbound DNS queries, an EgressMonitor service that manages and reads those logs with cursor-based collection and allowlist filtering, Ghost/EnvironmentManager wiring to route container DNS through the monitor, and Playwright fixtures that record browser/server egress, attach JSON artifacts to test reports, and optionally enforce failures via environment toggles. Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
e2e/helpers/playwright/fixture.ts (1)
188-213:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInstall browser egress capture before the first navigation.
The listener is attached in the
pagefixture, but by thenpageWithAuthenticatedUserhas already runloginToGetAuthenticatedSession()orpage.goto('/ghost/#/'). Any external requests during login or the initial admin page load — including thechangelog.json/avatar-style calls this feature is meant to surface — are invisible. Move the collection to page/context creation time and carry the recorded data through teardown.Also applies to: 555-575, 579-605
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/helpers/playwright/fixture.ts` around lines 188 - 213, The egress capture is being attached too late (after login/navigation) so requests during login or initial load are missed; in setupAuthenticatedPageFromStorageState (and similarly in the other blocks noted), initialize/install the browser egress capture immediately when creating the BrowserContext (right after browser.newContext) or before the first page.goto, attach listeners to the context/page there, and return or attach the recorded data (egress events) alongside the returned {page, context, ghostAccountOwner} so teardown can access and persist the captured requests; ensure the same change is applied to the page fixture and pageWithAuthenticatedUser/loginToGetAuthenticatedSession flows referenced in the review so collection begins at context creation time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/helpers/environment/dns-logger.cjs`:
- Around line 127-141: The TCP handler assumes a single data event contains a
full length-prefixed DNS frame; change the logic in the tcpServer connection
handler (currently using client.once('data') and immediate upstream.write) to
accumulate bytes from client.on('data') into a buffer, parse the 2-byte length
prefix to wait until the full frame is available (handle cases where multiple
frames arrive back-to-back or frames are split across chunks), call logQuery on
each complete message body (data.subarray(2)), and only then forward those exact
frames to the upstream (using upstream.write for each complete frame) and switch
to bidirectional proxying for the rest of the connection (e.g., piping or
symmetric data handlers) while preserving timeouts (FORWARD_TIMEOUT_MS) and
existing error/close handling for upstream and client; ensure UPSTREAM and
UPSTREAM_PORT are used as before.
In `@e2e/helpers/environment/service-managers/egress-monitor.ts`:
- Around line 72-76: start() currently returns as soon as Docker assigns an IP
(after getOrCreate() and resolveIp()), but the sidecar (dns-logger.cjs) may not
have bound its UDP/TCP listeners yet; modify start() to wait for the sidecar to
be ready before setting dnsServerIp and returning: after this.container is
available (from getOrCreate()) and serverIp is resolved (resolveIp()), poll the
container logs (this.container.logs or equivalent) for the sidecar's "listening"
events or perform a lightweight healthcheck (e.g., attempt a UDP/TCP probe to
serverIp:port) with a short timeout/retry loop, and only set/export dnsServerIp
and finish start() once the bind is observed; keep the existing error handling
and timeout to avoid hanging, and ensure GhostManager.setup() can rely on
start() completing only when the sidecar is actually bound.
- Around line 175-188: collectSince() stops as soon as one short poll shows no
growth instead of ensuring the log has been stable for the full settleMs; change
the loop to track the timestamp of the last time the log grew (e.g.,
lastGrowthTs), update lastGrowthTs whenever current.length > previous.length,
and only break and return once (Date.now() - lastGrowthTs) >= settleMs or the
deadline is reached; keep using the short polling interval (Math.min(settleMs,
150)) but compare stability against settleMs, and return
previous.slice(startCursor) once the stability window has passed.
In `@e2e/helpers/playwright/fixture.ts`:
- Around line 603-605: The EGRESS_MONITOR_ENABLED guard currently prevents
registering the Playwright browser request listener (page.on('request',
onRequest)), removing browser-initiated egress coverage; fix by decoupling
browser capture from the DNS sidecar toggle: either always register
page.on('request', onRequest) (so browser egress is always monitored) or
introduce a separate flag (e.g., BROWSER_EGRESS_MONITOR or
EGRESS_MONITOR_BROWSER) and use it in place of EGRESS_MONITOR_ENABLED; apply the
same change to the other similar Playwright listener registrations in this file
so browser monitoring is not accidentally disabled by the sidecar flag.
---
Outside diff comments:
In `@e2e/helpers/playwright/fixture.ts`:
- Around line 188-213: The egress capture is being attached too late (after
login/navigation) so requests during login or initial load are missed; in
setupAuthenticatedPageFromStorageState (and similarly in the other blocks
noted), initialize/install the browser egress capture immediately when creating
the BrowserContext (right after browser.newContext) or before the first
page.goto, attach listeners to the context/page there, and return or attach the
recorded data (egress events) alongside the returned {page, context,
ghostAccountOwner} so teardown can access and persist the captured requests;
ensure the same change is applied to the page fixture and
pageWithAuthenticatedUser/loginToGetAuthenticatedSession flows referenced in the
review so collection begins at context creation time.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7bb51890-7ec5-462e-8afa-8dd2c31f5aad
📒 Files selected for processing (7)
e2e/helpers/environment/constants.tse2e/helpers/environment/dns-logger.cjse2e/helpers/environment/environment-manager.tse2e/helpers/environment/service-managers/egress-monitor.tse2e/helpers/environment/service-managers/ghost-manager.tse2e/helpers/environment/service-managers/index.tse2e/helpers/playwright/fixture.ts
87444c3 to
2cd34db
Compare
The Playwright e2e suite boots Ghost with real network access (no nock), so outbound HTTP from the app — IndexNow, webmentions, geolocation, gravatar, update checks — can reach external services during tests, with no way to see it. Add two complementary monitors: - Server-side: a DNS-forwarding sidecar that Ghost's resolver is pointed at. Docker resolves internal service names itself and forwards only external lookups to the sidecar, so its log is a clean record of every external host the container reached, regardless of HTTP client and including arbitrary webmention targets pulled from post content. - Browser-side: a page request listener that flags external requests the Playwright browser makes itself (changelog, embeds, avatars). Both are recording-only by default and share an allowlist. Set E2E_EGRESS_ENFORCE=1 to fail tests on unexpected egress once the allowlist is built. The monitor is fail-open: if the sidecar cannot start, Ghost's DNS is left untouched and the suite runs normally. Each worker prints the external hosts it saw at teardown, to seed the allowlist from CI logs.
With enforcement on, the suite failed on every real external host. This allowlists the ones the suite legitimately contacts (Stripe.js/Elements, reCAPTCHA, Bunny Fonts, Transistor embeds, Unsplash, Ghost's CDN + changelog/update-check, the billing mock) and geojs geolocation, each with a comment explaining why it's allowed. Two are deliberately handled differently: - gravatar.com is gated off in e2e (privacy__useGravatar=false) rather than allowlisted — it's a real external call with no e2e coverage (gravatar is unit-tested in ghost/core), so gating removes the leak while keeping the feature. - api.stripe.com is intentionally left out. Ghost should hit the fake Stripe server; it only leaks because a test connects Stripe without `stripeEnabled`. The fix is that test, not an allowlist entry — hence the specific Stripe subdomains here rather than a blanket stripe.com.
The test connected Stripe with settingsService.setStripeConnected() but
didn't enable the fake Stripe server, so there was no STRIPE_API_HOST
override. Creating a paid tier then made Ghost call the real
api.stripe.com to create the Stripe product/price.
Use test.use({stripeEnabled: true}) instead — which connects Stripe and
points it at the fake server — matching the sibling members/tier tests.
The manual setStripeConnected() is now redundant.
2cd34db to
dadc3c0
Compare
Summary
The Playwright e2e suite boots Ghost with real network access (unlike the nock-isolated unit/integration suites), so outbound HTTP from the app — IndexNow, webmentions, member/staff geolocation, gravatar, update checks — can reach external services during tests. There was no visibility into this.
This adds two complementary monitors and enforces an allowlist:
Server-side — DNS sidecar
A tiny DNS-forwarding sidecar runs on the dev network; the Ghost container's resolver (
HostConfig.Dns) is pointed at it. Docker keeps the sidecar as the upstream of its embedded DNS, so internal service names (mysql, redis, …) still resolve normally and only external lookups are forwarded to the sidecar. Its log is therefore a clean record of every external host the container reached — independent of which HTTP client made the call, and including arbitrary hostnames (e.g. webmention targets pulled from post content). No Ghost code changes.Browser-side — page request listener
A
page.on('request')listener flags external requests the Playwright-driven browser makes itself (changelog.json, embeds, avatar services) — the layer the DNS sidecar can't see.Behaviour
EGRESS_ALLOWLIST(inhelpers/environment/constants.ts) fails — so a new integration, or a service that should be mocked, surfaces immediately.E2E_EGRESS_ENFORCE=0drops back to record-only (attach + annotate, no failure) — useful while expanding the allowlist.E2E_EGRESS_MONITOR=0disables all egress monitoring (both layers).Notably absent from the captured egress
api.indexnow.org(no test enables the flag + publishes), webmention targets, and dicebear don't fire in the current suite — so they're not allowlisted, and enforcement will catch them if they ever start.