Skip to content

Add egress monitoring to e2e tests#28409

Open
ErisDS wants to merge 3 commits into
mainfrom
e2e-egress-monitor
Open

Add egress monitoring to e2e tests#28409
ErisDS wants to merge 3 commits into
mainfrom
e2e-egress-monitor

Conversation

@ErisDS

@ErisDS ErisDS commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary

The Playwright e2e suite boots Ghost with real network access (unlike the nock-isolated unit/integration suites), so outbound HTTP from the app — IndexNow, webmentions, member/staff geolocation, gravatar, update checks — can reach external services during tests. There was no visibility into this.

This adds two complementary monitors and enforces an allowlist:

Server-side — DNS sidecar

A tiny DNS-forwarding sidecar runs on the dev network; the Ghost container's resolver (HostConfig.Dns) is pointed at it. Docker keeps the sidecar as the upstream of its embedded DNS, so internal service names (mysql, redis, …) still resolve normally and only external lookups are forwarded to the sidecar. Its log is therefore a clean record of every external host the container reached — independent of which HTTP client made the call, and including arbitrary hostnames (e.g. webmention targets pulled from post content). No Ghost code changes.

Browser-side — page request listener

A page.on('request') listener flags external requests the Playwright-driven browser makes itself (changelog.json, embeds, avatar services) — the layer the DNS sidecar can't see.

Behaviour

  • Enforced by default. A test that triggers egress to a host not on EGRESS_ALLOWLIST (in helpers/environment/constants.ts) fails — so a new integration, or a service that should be mocked, surfaces immediately.
  • The allowlist covers the external services the suite legitimately reaches today (Stripe/reCAPTCHA, Bunny Fonts, Transistor embeds, Unsplash, Ghost's own CDN, the billing mock) plus two privacy-sensitive server-side lookups (gravatar, geojs) that are candidates for a config gate in a follow-up.
  • E2E_EGRESS_ENFORCE=0 drops back to record-only (attach + annotate, no failure) — useful while expanding the allowlist.
  • E2E_EGRESS_MONITOR=0 disables all egress monitoring (both layers).
  • Fail-open: if the sidecar can't start, Ghost's DNS is left untouched and the suite runs normally.
  • Each worker prints the external hosts it saw at teardown, so additions to the allowlist can be read straight out of CI logs.

Notably absent from the captured egress

api.indexnow.org (no test enables the flag + publishes), webmention targets, and dicebear don't fire in the current suite — so they're not allowlisted, and enforcement will catch them if they ever start.

@ErisDS ErisDS requested a review from 9larsons as a code owner June 8, 2026 11:26
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds egress monitoring for end-to-end tests: a DNS sidecar that logs outbound DNS queries, an EgressMonitor service that manages and reads those logs with cursor-based collection and allowlist filtering, Ghost/EnvironmentManager wiring to route container DNS through the monitor, and Playwright fixtures that record browser/server egress, attach JSON artifacts to test reports, and optionally enforce failures via environment toggles.

Suggested reviewers

  • 9larsons
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add egress monitoring to e2e tests' accurately and clearly describes the main change: introducing egress monitoring infrastructure to the e2e test suite.
Description check ✅ Passed The description comprehensively explains the motivation, implementation details, behavior, and controls for the egress monitoring feature, directly relating to all file changes in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-egress-monitor

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
e2e/helpers/playwright/fixture.ts (1)

188-213: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Install browser egress capture before the first navigation.

The listener is attached in the page fixture, but by then pageWithAuthenticatedUser has already run loginToGetAuthenticatedSession() or page.goto('/ghost/#/'). Any external requests during login or the initial admin page load — including the changelog.json/avatar-style calls this feature is meant to surface — are invisible. Move the collection to page/context creation time and carry the recorded data through teardown.

Also applies to: 555-575, 579-605

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/helpers/playwright/fixture.ts` around lines 188 - 213, The egress capture
is being attached too late (after login/navigation) so requests during login or
initial load are missed; in setupAuthenticatedPageFromStorageState (and
similarly in the other blocks noted), initialize/install the browser egress
capture immediately when creating the BrowserContext (right after
browser.newContext) or before the first page.goto, attach listeners to the
context/page there, and return or attach the recorded data (egress events)
alongside the returned {page, context, ghostAccountOwner} so teardown can access
and persist the captured requests; ensure the same change is applied to the page
fixture and pageWithAuthenticatedUser/loginToGetAuthenticatedSession flows
referenced in the review so collection begins at context creation time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/helpers/environment/dns-logger.cjs`:
- Around line 127-141: The TCP handler assumes a single data event contains a
full length-prefixed DNS frame; change the logic in the tcpServer connection
handler (currently using client.once('data') and immediate upstream.write) to
accumulate bytes from client.on('data') into a buffer, parse the 2-byte length
prefix to wait until the full frame is available (handle cases where multiple
frames arrive back-to-back or frames are split across chunks), call logQuery on
each complete message body (data.subarray(2)), and only then forward those exact
frames to the upstream (using upstream.write for each complete frame) and switch
to bidirectional proxying for the rest of the connection (e.g., piping or
symmetric data handlers) while preserving timeouts (FORWARD_TIMEOUT_MS) and
existing error/close handling for upstream and client; ensure UPSTREAM and
UPSTREAM_PORT are used as before.

In `@e2e/helpers/environment/service-managers/egress-monitor.ts`:
- Around line 72-76: start() currently returns as soon as Docker assigns an IP
(after getOrCreate() and resolveIp()), but the sidecar (dns-logger.cjs) may not
have bound its UDP/TCP listeners yet; modify start() to wait for the sidecar to
be ready before setting dnsServerIp and returning: after this.container is
available (from getOrCreate()) and serverIp is resolved (resolveIp()), poll the
container logs (this.container.logs or equivalent) for the sidecar's "listening"
events or perform a lightweight healthcheck (e.g., attempt a UDP/TCP probe to
serverIp:port) with a short timeout/retry loop, and only set/export dnsServerIp
and finish start() once the bind is observed; keep the existing error handling
and timeout to avoid hanging, and ensure GhostManager.setup() can rely on
start() completing only when the sidecar is actually bound.
- Around line 175-188: collectSince() stops as soon as one short poll shows no
growth instead of ensuring the log has been stable for the full settleMs; change
the loop to track the timestamp of the last time the log grew (e.g.,
lastGrowthTs), update lastGrowthTs whenever current.length > previous.length,
and only break and return once (Date.now() - lastGrowthTs) >= settleMs or the
deadline is reached; keep using the short polling interval (Math.min(settleMs,
150)) but compare stability against settleMs, and return
previous.slice(startCursor) once the stability window has passed.

In `@e2e/helpers/playwright/fixture.ts`:
- Around line 603-605: The EGRESS_MONITOR_ENABLED guard currently prevents
registering the Playwright browser request listener (page.on('request',
onRequest)), removing browser-initiated egress coverage; fix by decoupling
browser capture from the DNS sidecar toggle: either always register
page.on('request', onRequest) (so browser egress is always monitored) or
introduce a separate flag (e.g., BROWSER_EGRESS_MONITOR or
EGRESS_MONITOR_BROWSER) and use it in place of EGRESS_MONITOR_ENABLED; apply the
same change to the other similar Playwright listener registrations in this file
so browser monitoring is not accidentally disabled by the sidecar flag.

---

Outside diff comments:
In `@e2e/helpers/playwright/fixture.ts`:
- Around line 188-213: The egress capture is being attached too late (after
login/navigation) so requests during login or initial load are missed; in
setupAuthenticatedPageFromStorageState (and similarly in the other blocks
noted), initialize/install the browser egress capture immediately when creating
the BrowserContext (right after browser.newContext) or before the first
page.goto, attach listeners to the context/page there, and return or attach the
recorded data (egress events) alongside the returned {page, context,
ghostAccountOwner} so teardown can access and persist the captured requests;
ensure the same change is applied to the page fixture and
pageWithAuthenticatedUser/loginToGetAuthenticatedSession flows referenced in the
review so collection begins at context creation time.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7bb51890-7ec5-462e-8afa-8dd2c31f5aad

📥 Commits

Reviewing files that changed from the base of the PR and between fb010cc and 5f9c64a.

📒 Files selected for processing (7)
  • e2e/helpers/environment/constants.ts
  • e2e/helpers/environment/dns-logger.cjs
  • e2e/helpers/environment/environment-manager.ts
  • e2e/helpers/environment/service-managers/egress-monitor.ts
  • e2e/helpers/environment/service-managers/ghost-manager.ts
  • e2e/helpers/environment/service-managers/index.ts
  • e2e/helpers/playwright/fixture.ts

Comment thread e2e/helpers/environment/dns-logger.cjs Outdated
Comment thread e2e/helpers/environment/service-managers/egress-monitor.ts
Comment thread e2e/helpers/environment/service-managers/egress-monitor.ts
Comment thread e2e/helpers/playwright/fixture.ts
@ErisDS ErisDS force-pushed the e2e-egress-monitor branch 7 times, most recently from 87444c3 to 2cd34db Compare June 8, 2026 15:24
ErisDS added 3 commits June 10, 2026 10:37
The Playwright e2e suite boots Ghost with real network access (no
nock), so outbound HTTP from the app — IndexNow, webmentions,
geolocation, gravatar, update checks — can reach external services
during tests, with no way to see it.

Add two complementary monitors:

- Server-side: a DNS-forwarding sidecar that Ghost's resolver is
  pointed at. Docker resolves internal service names itself and
  forwards only external lookups to the sidecar, so its log is a
  clean record of every external host the container reached,
  regardless of HTTP client and including arbitrary webmention
  targets pulled from post content.

- Browser-side: a page request listener that flags external requests
  the Playwright browser makes itself (changelog, embeds, avatars).

Both are recording-only by default and share an allowlist. Set
E2E_EGRESS_ENFORCE=1 to fail tests on unexpected egress once the
allowlist is built. The monitor is fail-open: if the sidecar cannot
start, Ghost's DNS is left untouched and the suite runs normally.

Each worker prints the external hosts it saw at teardown, to seed the
allowlist from CI logs.
With enforcement on, the suite failed on every real external host. This
allowlists the ones the suite legitimately contacts (Stripe.js/Elements,
reCAPTCHA, Bunny Fonts, Transistor embeds, Unsplash, Ghost's CDN +
changelog/update-check, the billing mock) and geojs geolocation, each
with a comment explaining why it's allowed.

Two are deliberately handled differently:
- gravatar.com is gated off in e2e (privacy__useGravatar=false) rather
  than allowlisted — it's a real external call with no e2e coverage
  (gravatar is unit-tested in ghost/core), so gating removes the leak
  while keeping the feature.
- api.stripe.com is intentionally left out. Ghost should hit the fake
  Stripe server; it only leaks because a test connects Stripe without
  `stripeEnabled`. The fix is that test, not an allowlist entry — hence
  the specific Stripe subdomains here rather than a blanket stripe.com.
The test connected Stripe with settingsService.setStripeConnected() but
didn't enable the fake Stripe server, so there was no STRIPE_API_HOST
override. Creating a paid tier then made Ghost call the real
api.stripe.com to create the Stripe product/price.

Use test.use({stripeEnabled: true}) instead — which connects Stripe and
points it at the fake server — matching the sibling members/tier tests.
The manual setStripeConnected() is now redundant.
@ErisDS ErisDS force-pushed the e2e-egress-monitor branch from 2cd34db to dadc3c0 Compare June 10, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant