autobrowse: optional vendor-neutral inbox-provider hook by aq17 · Pull Request #119 · browserbase/skills

aq17 · 2026-05-26T21:29:17Z

Merge order: this lands first, then browserbase/browse.sh#159 + a BB_SKILLS_SHA bump.

Summary

Adds a generic, vendor-neutral, off-by-default inbox-provider hook to autobrowse so it can build signup/login/MFA skills — without shipping any email-vendor code (AgentMail lives only in the internal browse.sh repo). This is the public half.

skills/autobrowse/scripts/evaluate.mjs:

--inbox-cmd <path> / AUTOBROWSE_INBOX_CMD configures an optional external "inbox provider" command. When unset there is no inbox feature (default). The file documents the provider contract (create / wait-otp / wait-link / latest / release + the .inbox.json { email, inbox_id } schema).
The inner agent may only call read subcommands (wait-otp/wait-link/latest); create/release are orchestrator-only.
forceInboxScope pins the provider to the run's own --workspace/--task (sibling-task isolation in parallel runs).
Address resolves from .inbox.json (email preferred), with --inbox-email as a fallback; {{inbox_email}} in task.md is substituted.
Vendor-neutral throughout — git grep -i agentmail is empty.
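
The scope pinning described above can be sketched roughly as follows. This is a minimal illustration, not the code in evaluate.mjs: only the name forceInboxScope and the --workspace/--task flags come from this PR, and the argv-array shape and the handling of the `--flag=value` form are assumptions.

```javascript
// Hypothetical sketch of forceInboxScope; the real signature may differ.
// argv is assumed to be an already-tokenized command, e.g.
// ["node", "/path/provider.mjs", "wait-otp", "--within", "60"].
function forceInboxScope(argv, workspace, task) {
  const scoped = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    // Drop "--workspace <v>" / "--task <v>" supplied by the inner agent.
    if (arg === "--workspace" || arg === "--task") {
      i++; // also skip the flag's value
      continue;
    }
    // Drop the "--workspace=<v>" / "--task=<v>" spelling as well.
    if (arg.startsWith("--workspace=") || arg.startsWith("--task=")) continue;
    scoped.push(arg);
  }
  // Always append the run's own scope so a sibling task cannot be targeted.
  return [...scoped, "--workspace", workspace, "--task", task];
}
```

Stripping and re-appending, rather than validating, means an agent-supplied foreign --task is silently replaced by the run's own value instead of failing the command.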

No inbox.mjs and no SKILL.md changes — the feature is intentionally undocumented/experimental for now (the flag is in --help + the contract comment).

Test plan

Stub provider: hook injects the Agent-Inbox section, substitutes the address, scope-forces --workspace/--task; with no --inbox-cmd there's no section and the allowlist is browse-only.
release/create BLOCKED for the inner agent; wait-otp/wait-link/latest allowed.
readInboxState resolves to email when email≠inbox_id.
Full browse.sh sandbox pipeline (with #159): real Substack signup created + read its inbox and received the verification email.

🤖 Generated with Claude Code

Lets an autobrowse loop provision a throwaway inbox so the inner agent can register accounts and complete email verification. A new scripts/inbox.mjs CLI (create / wait-otp / wait-link / latest / release) talks to the browse.sh inbox endpoint, which owns the AgentMail key; the agent only ever sees the address. evaluate.mjs gains --inbox-email, injects the inbox into the system prompt, and allows the agent to shell out to inbox.mjs. SKILL.md documents the opt-in provision/release steps, graduation note (inbox is loop-only), and the 3-concurrent-loop free-tier cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Consolidates all inbox-provisioning logic into the autobrowse skill so the feature is self-contained with nothing browse.sh-specific. inbox.mjs now calls api.agentmail.to directly using AGENTMAIL_API_KEY from the env (sweep-on-create and the ab- prefix guard move into the CLI). Browserbase deployments inject a pooled key; regular users provide their own (free at agentmail.to) and get a clear setup error if it's unset. The inner agent still only ever sees the inbox address; the key is read by inbox.mjs and never printed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Hardening found by a live Substack magic-link signup run end-to-end:
- wait-link returned an open-tracking pixel (.gif) because it grabbed the first URL anywhere in the body. Now extract <a href> anchors with a reject-list (unsubscribe/mailto/tel/preferences/.gif), which skips img-src pixels; --match matches the href OR the visible link text so "confirm"/"sign in" finds the CTA even when the href is a tracking redirect (browse open follows it).
- latest only showed list-summary metadata (the list endpoint omits the body). It now fetches the full single message by id so text/html/links are visible.
- partsOf prefers AgentMail's cleaned extracted_text/extracted_html.
- evaluate.mjs killed wait-otp/wait-link at the fixed 30s exec cap (ETIMEDOUT on --within 60/90). exec timeout for inbox wait commands is now --within + 15s.

Verified end-to-end: signup → wait-link returns the real "Confirm your email" CTA → browse open → signed-in Substack home. Sweep still proven to never touch non-ab- inboxes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… truth)
- create now releases the inbox the task already tracks before minting a new one; a re-create within the 1h sweep window otherwise orphaned a live inbox (leaked AND unreachable by release). (#2)
- evaluate.mjs resolves the inbox address from .inbox.json (what wait-otp/wait-link actually poll); --inbox-email is a fallback and a mismatch now warns instead of silently polling a different inbox. (#4)
- {{inbox_email}} in task.md is now substituted with the resolved address. (#3)
- executeCommand pins inbox.mjs to the run's own --workspace/--task, so a sub-agent can't read or release a sibling task's inbox (parallel runs share a workspace, isolated only by --task). (#5)

The 30s exec-timeout issue (#1) was already fixed by execTimeoutFor in 2d091fc.

Verified: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced; an agent passing a foreign --task is overridden back to its own.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aq17 · 2026-06-01T21:03:18Z

Addressed the Bugbot findings in 47f740f (lifecycle + single-source-of-truth) — #1 was already handled in 2d091fc:

| Bugbot issue | Resolution |
| --- | --- |
| Inbox wait killed by timeout | Already fixed in `2d091fc`: `execTimeoutFor` gives `wait-otp`/`wait-link` `--within + 15s` instead of the fixed 30s cap. |
| Recreate leaves orphan inboxes | `cmdCreate` now DELETEs the inbox the task already tracks before minting a new one (sweep can't catch a <1h-old inbox, and overwriting `.inbox.json` would orphan it). |
| Task placeholder never substituted | `evaluate.mjs` now substitutes `{{inbox_email}}` (optional inner whitespace) in `task.md` with the resolved address. |
| Prompt email ignores inbox state | `.inbox.json` is now the single source of truth (it's what `wait-*` poll). `--inbox-email` is a fallback; a mismatch logs a `WARNING` and the `.inbox.json` value wins. |
| Inbox CLI args unsandboxed | `executeCommand` pins `inbox.mjs` to the run's own `--workspace`/`--task`, stripping any agent-supplied values; a sub-agent can no longer read/release a sibling task's inbox. |

Verified each end-to-end against a live AgentMail org: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced (no literal/flag leak); an agent passing a foreign --task is overridden back to its own ([scope] ... overridden).
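
Two of the fixes in the table are small enough to sketch. The snippet below is an illustration under assumptions, not the code from 2d091fc or 47f740f: execTimeoutFor is named in this thread, but its signature, the argv-array input, the fallback to the 30s cap when --within is absent, and the helper name substituteInboxEmail are all hypothetical.

```javascript
const DEFAULT_EXEC_TIMEOUT_MS = 30000; // the fixed 30s exec cap
const WAIT_GRACE_MS = 15000;           // the "+ 15s" headroom
const WAIT_SUBCOMMANDS = new Set(["wait-otp", "wait-link"]);

// Hypothetical shape: argv is the tokenized command the agent wants to run.
function execTimeoutFor(argv) {
  const isWait = argv.some((arg) => WAIT_SUBCOMMANDS.has(arg));
  const idx = argv.indexOf("--within");
  const within = idx === -1 ? NaN : Number(argv[idx + 1]);
  // Assumption: anything that is not a wait command with a valid --within
  // keeps the default cap.
  if (!isWait || !Number.isFinite(within) || within <= 0) {
    return DEFAULT_EXEC_TIMEOUT_MS;
  }
  return within * 1000 + WAIT_GRACE_MS;
}

// Replace {{inbox_email}} (optional inner whitespace) with the resolved address.
// A replacer function is used so "$" sequences in the address are not
// interpreted as replacement patterns.
function substituteInboxEmail(taskText, email) {
  return taskText.replace(/\{\{\s*inbox_email\s*\}\}/g, () => email);
}
```

With these definitions, a `wait-otp --within 60` call gets a 75s exec timeout, so the provider's own deadline expires before the harness kills the process.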

aq17 · 2026-06-01T22:08:35Z

Validation summary — ready for review

Tested at HEAD 47f740f:

Standalone loop (twice): full Substack magic-link signup → wait-link returned the real CTA → browse open → signed-in state confirmed ({"success": true, "signed_in": true} + screenshot). 12 turns / ~$0.63.
Bugbot fixes (each verified live against AgentMail):
- re-create releases the prior inbox (no orphan)
- {{inbox_email}} substituted in task.md (no literal/flag leak)
- .inbox.json is the single source of truth; divergent --inbox-email warns and the resolved address wins
- executeCommand pins inbox.mjs to the run's own --workspace/--task (a foreign --task is overridden)
- the 30s exec-timeout issue was already handled by execTimeoutFor
Full browse.sh sandbox pipeline (with browserbase/browse.sh#159): a real Vercel Sandbox cloned bb-skills @ this commit, ran inbox.mjs create, and Substack delivered the verification email to the minted inbox — proving the feature works in the actual generation pipeline, not just the standalone harness.
Secret hygiene: the AgentMail key never appears in any trace artifact.

No leaked inboxes after any run. Ready to merge.

shubh24 · 2026-06-02T00:49:47Z

Reviewed this — the core idea is solid and the security spine is genuinely well done: the AgentMail key never reaches the inner agent, only the throwaway address does. Nice.

Three things worth tightening before merge. Framing them simply:

1. Leftover inboxes pile up (the big one).
Releasing the inbox is currently a note in SKILL.md telling the orchestrator to run inbox.mjs release — not something the code guarantees. Robots (LLMs) routinely skip trailing cleanup steps. That's literally the lesson #123 just learned for browser sessions: it moved teardown into code for exactly this reason. On any non-happy exit (inner-loop error, max iterations, a crash, Ctrl-C) the inbox is never deleted. The free tier caps at 3 inboxes, so a few forgotten ones and the next create hard-fails. The 1h sweep only helps if created_at parses (see below), so it's not a reliable backstop.
→ Release the run's own inbox in code on both the success and error paths, right alongside the session teardown — don't rely on the agent remembering.
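
A minimal sketch of what "release in code on both paths" could look like. Everything here is hypothetical: `provider` stands in for whatever wraps the create/release subcommands, `runLoop` for the inner loop, and the synchronous shape is chosen only for brevity (the same try/finally structure applies with async/await).

```javascript
// Hypothetical helper; not a function from evaluate.mjs.
function runWithInbox(provider, runLoop) {
  const inbox = provider.create(); // orchestrator-only step
  try {
    // May return normally, throw, or stop at max iterations.
    return runLoop(inbox);
  } finally {
    // Runs on both the success path and the error path.
    try {
      provider.release(inbox);
    } catch (err) {
      // A failed release is logged but must not mask the run's own outcome.
      console.error(`inbox release failed: ${err.message}`);
    }
  }
}
```

Note that a `finally` block does not run when the process is terminated by a signal, so the Ctrl-C case would additionally need a signal handler (for example on SIGINT) that calls the same release step.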

2. It can grab the wrong verification code.
DEFAULT_OTP_RE = \b\d{4,8}\b returns the first 4–8 digit run anywhere in the email, in document order. Verification emails are full of other numbers — a year (2026), a price, a zip — and any of those can come before the real code and get returned instead. The agent is told this "prints just the extracted code," so a wrong number flows straight into the form and the run fails confusingly.
→ Prefer a code that sits next to a keyword, e.g. /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i (capture group 1), falling back to the bare regex.
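
The suggestion above, sketched as code. The two regexes are the ones quoted in this comment; the function name extractOtp and its null return for "no code found" are assumptions for illustration.

```javascript
// Keyword-anchored pattern from the suggestion; the code is capture group 1.
const KEYWORD_OTP_RE = /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i;
// The existing bare default, kept as a fallback.
const BARE_OTP_RE = /\b\d{4,8}\b/;

// Hypothetical helper name.
function extractOtp(text) {
  const keyed = KEYWORD_OTP_RE.exec(text);
  if (keyed) return keyed[1];
  const bare = BARE_OTP_RE.exec(text);
  return bare ? bare[0] : null;
}

// The bare regex alone would return "2026" for this body.
console.log(extractOtp("© 2026 Acme Inc. Your verification code is 483920.")); // prints "483920"
```

This is still a heuristic: a keyword followed within 20 non-digit characters by an unrelated number would win over the real code, so an explicit --regex remains the safer option for a known sender.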

3. It'll open links sent by strangers.
A throwaway inbox can receive mail from anyone who learns the address, and the prompt steers the agent to browse open whatever wait-link returns. So an attacker-delivered link gets auto-opened — including internal hosts (http://169.254.169.254/, http://localhost), open-redirects, or phishing. REJECT_LINK_RE only filters unsubscribe/tracking/gif, and plain http:// to internal IPs passes. Related: --from is a substring match, so --from stripe.com also matches stripe.com.evil.com.
→ Restrict to https, reject RFC1918/loopback/link-local hosts, drop the bare-text-URL fallback, and make --from an exact domain-boundary match.
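
A rough sketch of the https-only, private-host, and domain-boundary checks. The function names are hypothetical and the host list is not exhaustive: it does not cover IPv4-mapped IPv6 addresses or public hostnames that resolve to private IPs, and it only inspects the URL string.

```javascript
// True for loopback, link-local (incl. 169.254.169.254), RFC1918, and *.local hosts.
function isPrivateHost(hostname) {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local")) return true;
  if (host.startsWith("[")) {
    // URL.hostname keeps the brackets around IPv6 literals.
    const v6 = host.slice(1, -1);
    return v6 === "::1" || /^fe[89ab][0-9a-f]:/.test(v6) || /^f[cd][0-9a-f]{2}:/.test(v6);
  }
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Accept only well-formed https URLs pointing at non-private hosts.
function isSafeLink(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  return url.protocol === "https:" && !isPrivateHost(url.hostname);
}

// Exact domain-boundary match: stripe.com matches mail.stripe.com,
// but not stripe.com.evil.com or notstripe.com.
function senderMatches(fromAddress, domain) {
  const at = fromAddress.lastIndexOf("@");
  if (at === -1) return false;
  const senderDomain = fromAddress.slice(at + 1).trim().replace(/>$/, "").toLowerCase();
  const want = domain.toLowerCase();
  return senderDomain === want || senderDomain.endsWith("." + want);
}
```

Parsing with `new URL` before checking the host means the comparison runs on the normalized hostname rather than on the raw string from the email.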

Everything else I found is low/nit (a possibly-dead --inbox-email flag, stripHtml/stripTags duplication, a --within parsed in two places). Happy to expand on any of these.

Removes AgentMail from the public skill entirely and replaces the bundled inbox.mjs with a generic, off-by-default provider contract. autobrowse no longer ships an email provider or names any vendor; it only knows how to *call* one.
- evaluate.mjs: `--inbox-cmd <path>` / AUTOBROWSE_INBOX_CMD configures an optional inbox-provider command. Allowlist, exec-timeout, force-scope, and the (now vendor-neutral) Agent Inbox prompt key off it; all are inert when unset. Documents the provider contract (create/wait-otp/wait-link/latest/release + the .inbox.json {email,inbox_id} schema) as the explicit boundary.
- Deleted scripts/inbox.mjs (AgentMail-specific; moves to the internal caller).
- Scrubbed AGENTMAIL_API_KEY/agentmail.to from .env.example, SKILL.md (silent on the feature), and example-task.md.

Kept generic mechanics: .inbox.json single-source-of-truth, {{inbox_email}} substitution, --workspace/--task force-scoping, wait-command exec timeout.

Verified: with a throwaway stub provider the hook injects the section, substitutes the address, and forces scope; with no --inbox-cmd there is no inbox section and the allowlist is browse-only. `git grep -i agentmail` → no matches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0593523.

aq17 · 2026-06-02T01:18:42Z

Reworked per team feedback — AgentMail is now fully out of this public repo

Keeping AgentMail browse.sh-internal (skills is public), but without forking autobrowse. The inbox capability is now a generic, off-by-default provider hook; the AgentMail implementation + secrets live only in the internal browse.sh repo and are injected into the sandbox at runtime.

This PR (public) now:

Deletes scripts/inbox.mjs (AgentMail-specific).
evaluate.mjs gains --inbox-cmd <path> / AUTOBROWSE_INBOX_CMD — an optional, vendor-neutral inbox-provider command. Allowlist, exec-timeout, force-scope, and the (vendor-neutral) Agent Inbox prompt all key off it and are inert when unset. Documents the explicit provider contract.
Scrubs AgentMail from .env.example, SKILL.md (silent on the feature), example-task.md. git grep -i agentmail → no matches.
Keeps generic mechanics: .inbox.json SSOT, {{inbox_email}} substitution, --workspace/--task force-scope, wait-command timeout.

Verified: with a throwaway stub provider the hook injects the section + substitutes the address + forces scope; with no --inbox-cmd there's no inbox section and the allowlist is browse-only; the browse.sh provider (separate repo) drives create/release against real AgentMail through --inbox-cmd.

Pairs with the internal browse.sh PR (provider injection). Divergence stays minimal: one shared autobrowse core; browse.sh owns only a swappable provider script + a few prompt lines.

aq17 · 2026-06-02T01:28:20Z

✅ Re-validated end-to-end on the reworked architecture (full browse.sh sandbox pipeline, local): /api/skills/generate → sandbox cloned the public skill @ 0593523 (no AgentMail, --inbox-cmd hook only) → browse.sh injected /vercel/sandbox/inbox-provider.mjs and passed --inbox-cmd → the injected provider minted ab-…@agentmail.to via the edge-injected key → Substack delivered "Create your account on Substack" to it. Every new seam exercised with real external email; the dotenv-in-sandbox bug was caught and fixed. Inbox released after; no leaks.

shubh24 · 2026-06-02T02:14:15Z

Reviewed this alongside the AgentMail provider in browserbase/browse.sh#159. The vendor-neutral split is great — this PR ships zero email-vendor code, just a clean create / wait-otp / wait-link / latest / release contract and a swappable --inbox-cmd. A few small things, nothing blocking.

🟡 Medium

The allowlist lets the inner agent run more than it should. isAllowedCommand only checks that the command is node <the configured provider> — it never looks at the subcommand, so the agent can call create, release, and latest, not just the wait-otp / wait-link it's actually told about. Sibling-task isolation still holds (forceInboxScope), but the agent can shoot itself in the foot: a mid-run release kills its own live inbox, and a create overwrites .inbox.json so the address baked into the prompt no longer matches the one being polled. Consider restricting the allowlist to the read-only subcommands and keeping create / release orchestrator-only.
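
One way the suggested restriction could look, as a sketch only. The name isAllowedCommand comes from this review, but the variant name, its argv-array input, and the exact comparison against the configured provider path are assumptions.

```javascript
// Read-only subcommands the inner agent may call; create/release are
// reserved for the orchestrator.
const READ_SUBCOMMANDS = new Set(["wait-otp", "wait-link", "latest"]);

// Hypothetical variant of isAllowedCommand for the inbox provider only.
// argv is assumed to be tokenized: ["node", "<provider path>", "<subcommand>", ...].
function isAllowedInboxCommand(argv, inboxCmd) {
  if (!inboxCmd) return false; // hook unset: no inbox commands at all
  if (argv[0] !== "node" || argv[1] !== inboxCmd) return false;
  return READ_SUBCOMMANDS.has(argv[2]);
}
```

Checking the subcommand position explicitly (rather than searching the whole argument list) avoids a command such as `release --note wait-otp` slipping through.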

⚪ Low

The PR description doesn't match the diff. It describes a self-contained inbox.mjs that talks directly to AgentMail, plus SKILL.md changes — none of that is in this PR (that's the browse.sh side). Only the Cursor auto-summary below it is accurate. Worth rewriting so reviewers aren't chasing code that isn't here, and so the merge order (this lands first, then browse.sh#159) stays clear.
The public --inbox-cmd hook isn't documented. The body mentions SKILL.md docs, but there's no SKILL.md change in the diff. Runtime usage works (the injected prompt section covers it), but the new flag is undocumented for anyone driving autobrowse directly.
readInboxState returns inbox_id but it's used as the email address. The contract designates email for that. It only works because the browse.sh provider happens to set email === inbox_id; a provider that distinguishes them would break. Safer: return email || inbox_id || null.
The OTP default text is vendor-specific. buildInboxSection tells the agent the default matches "a 4–8 digit code" — but that default actually lives in the provider, not here. Since this file is meant to be vendor-neutral, better to say "the provider's default pattern; pass --regex to override."
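
The `email || inbox_id || null` suggestion, combined with the --inbox-email fallback described earlier in this thread, could be sketched like this. The helper name resolveInboxEmail and its parameters are hypothetical; the real readInboxState reads .inbox.json from disk, which is left out here so the resolution order is visible on its own.

```javascript
// state: the parsed contents of .inbox.json, or null if the file is absent.
// flagEmail: the value of --inbox-email, if any.
// warn: injectable logger (defaults to console.warn) so the mismatch is observable.
function resolveInboxEmail(state, flagEmail, warn = console.warn) {
  // Prefer the contract's `email` field; fall back to inbox_id for providers
  // that only set that.
  const fromState = (state && (state.email || state.inbox_id)) || null;
  if (fromState && flagEmail && flagEmail !== fromState) {
    warn(`WARNING: --inbox-email (${flagEmail}) differs from .inbox.json (${fromState}); using .inbox.json`);
  }
  // Resolution order: .inbox.json first, then the flag.
  return fromState || flagEmail || null;
}
```

With a state of `{ email: "a@x.io", inbox_id: "inb_123" }` this returns the email rather than the id, which is the case that broke when a provider distinguishes the two.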

✅ What's done well

The contract design is clean, and the isolation instinct — forceInboxScope stripping any agent-supplied --workspace / --task and pinning the run's real ones — is exactly right for the shared-workspace, parallel-sub-agent setup.

(Review was AI-assisted.)

Review fixes (skills#119):
- Inner allowlist now permits only the read subcommands (wait-otp / wait-link / latest) of the configured --inbox-cmd provider; create/release are orchestrator-only. Stops the inner agent from killing its own inbox (`release`) or rewriting .inbox.json (`create`) mid-run so the polled inbox no longer matches the address in its prompt.
- readInboxState prefers the contract's `email` field (`email || inbox_id`); it was using inbox_id as the address, which only worked while a provider set them equal.
- buildInboxSection OTP text is vendor-neutral now ("the provider's default code match; pass --regex to override"); the digit default lives in the provider, not this file.

Verified (stub provider, no network): release is BLOCKED while wait-otp is allowed and scope-forced to the run's task; an email≠inbox_id .inbox.json resolves to email; {{inbox_email}} substituted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aq17 · 2026-06-03T00:24:06Z

Thanks @shubh24 — addressed in 2c4be94 (public) + browserbase/browse.sh@befc60d (provider). Mapping your review:

🟡 Allowlist let the agent run more than it should. Fixed — the inner allowlist now permits only the read subcommands (wait-otp/wait-link/latest); create/release are orchestrator-only, so the agent can't release its own live inbox or create over .inbox.json mid-run. (Verified: release blocked, wait-otp allowed + scope-forced.)

⚪ readInboxState returns inbox_id, used as email. Fixed → email || inbox_id || null. (Verified with an email≠inbox_id .inbox.json.)

⚪ OTP default text vendor-specific. buildInboxSection now says "the provider's default code match; pass --regex to override" — no digit-count claim in the vendor-neutral file.

⚪ PR description didn't match diff. Rewritten above to describe the actual hook + merge order.

⚪ --inbox-cmd undocumented in SKILL.md. Intentionally keeping SKILL.md silent (team call: don't advertise the experimental feature publicly); the flag is in evaluate.mjs --help + the provider-contract comment. --inbox-email stays as the documented resolution fallback (order: .inbox.json → flag).

Link-safety from your earlier comment (open-redirect/internal-IP, substring --from) — that code now lives in the browse.sh provider; fixed there in befc60d: https-only + reject loopback/link-local(metadata)/RFC1918/*.local, anchors-only (dropped bare-text URLs), and exact domain-boundary --from.

The Cursor inline threads (orphan-inbox, placeholder-substitution, prompt-vs-state, args-unsandboxed) are stale — all resolved in the rework; the one current Cursor HIGH (readInboxState) is fixed above.

cursor Bot reviewed May 26, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

Comment thread skills/autobrowse/scripts/inbox.mjs Outdated

Comment thread skills/autobrowse/scripts/evaluate.mjs

aq17 requested review from shrey150 and shubh24 May 27, 2026 21:31

cursor Bot reviewed Jun 1, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

cursor Bot reviewed Jun 1, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

aq17 mentioned this pull request Jun 1, 2026

autobrowse: tear down self-owned browser session on exit #123

Open

cursor Bot reviewed Jun 2, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs Outdated

aq17 changed the title ~~autobrowse: autonomous email inbox for signup/login/MFA tasks~~ autobrowse: optional vendor-neutral inbox-provider hook Jun 2, 2026

aq17 wants to merge 6 commits into main from autobrowse-agentmail-inbox
