autobrowse: optional vendor-neutral inbox-provider hook #119

Open
aq17 wants to merge 6 commits into main from autobrowse-agentmail-inbox

Conversation

aq17 commented May 26, 2026

Contributor

Merge order: this lands first, then browserbase/browse.sh#159 + a BB_SKILLS_SHA bump.

Summary

Adds a generic, vendor-neutral, off-by-default inbox-provider hook to autobrowse so it can build signup/login/MFA skills — without shipping any email-vendor code (AgentMail lives only in the internal browse.sh repo). This is the public half.

skills/autobrowse/scripts/evaluate.mjs:

  • --inbox-cmd <path> / AUTOBROWSE_INBOX_CMD configures an optional external "inbox provider" command. When unset there is no inbox feature (default). The file documents the provider contract (create / wait-otp / wait-link / latest / release + the .inbox.json { email, inbox_id } schema).
  • The inner agent may only call read subcommands (wait-otp/wait-link/latest); create/release are orchestrator-only.
  • forceInboxScope pins the provider to the run's own --workspace/--task (sibling-task isolation in parallel runs).
  • Address resolves from .inbox.json (email preferred), with --inbox-email as a fallback; {{inbox_email}} in task.md is substituted.
  • Vendor-neutral throughout — git grep -i agentmail is empty.

No inbox.mjs and no SKILL.md changes — the feature is intentionally undocumented/experimental for now (the flag is in --help + the contract comment).
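
For readers who want a picture of "off by default", here is a minimal sketch of how the hook could resolve its configuration. The helper names (`resolveInboxCmd`, `inboxFeatures`) and the flag-over-env precedence are assumptions for illustration, not the code in evaluate.mjs.

```javascript
// Hypothetical helpers; not the actual evaluate.mjs implementation.
// Resolve the optional inbox-provider command from CLI args or the environment.
// Assumption: the --inbox-cmd flag takes precedence over AUTOBROWSE_INBOX_CMD.
function resolveInboxCmd(argv, env) {
  const i = argv.indexOf("--inbox-cmd");
  const fromFlag = i !== -1 ? argv[i + 1] : undefined;
  const cmd = (fromFlag || env.AUTOBROWSE_INBOX_CMD || "").trim();
  return cmd === "" ? null : cmd;
}

// Everything inbox-related keys off the resolved command and is inert when unset.
function inboxFeatures(inboxCmd) {
  const enabled = inboxCmd !== null;
  return {
    enabled,
    // Read subcommands the inner agent may call; empty when the hook is off.
    agentSubcommands: enabled ? ["wait-otp", "wait-link", "latest"] : [],
    injectPromptSection: enabled,
  };
}
```

With no flag and no environment variable, `resolveInboxCmd` returns `null` and `inboxFeatures(null)` reports nothing enabled, which matches the browse-only allowlist described in the test plan.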

Test plan

  • Stub provider: hook injects the Agent-Inbox section, substitutes the address, scope-forces --workspace/--task; with no --inbox-cmd there's no section and the allowlist is browse-only.
  • release/create BLOCKED for the inner agent; wait-otp/wait-link/latest allowed.
  • readInboxState resolves to email when email ≠ inbox_id.
  • Full browse.sh sandbox pipeline (with #159): real Substack signup created + read its inbox and received the verification email.

🤖 Generated with Claude Code

Lets an autobrowse loop provision a throwaway inbox so the inner agent can
register accounts and complete email verification. A new scripts/inbox.mjs CLI
(create / wait-otp / wait-link / latest / release) talks to the browse.sh
inbox endpoint, which owns the AgentMail key — the agent only ever sees the
address. evaluate.mjs gains --inbox-email, injects the inbox into the system
prompt, and allows the agent to shell out to inbox.mjs. SKILL.md documents the
opt-in provision/release steps, graduation note (inbox is loop-only), and the
3-concurrent-loop free-tier cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread skills/autobrowse/scripts/evaluate.mjs
Comment thread skills/autobrowse/scripts/inbox.mjs Outdated
Comment thread skills/autobrowse/scripts/evaluate.mjs
aq17 requested review from shrey150 and shubh24 on May 27, 2026 21:31
Consolidates all inbox-provisioning logic into the autobrowse skill so the
feature is self-contained with nothing browse.sh-specific. inbox.mjs now calls
api.agentmail.to directly using AGENTMAIL_API_KEY from the env (sweep-on-create
and the ab- prefix guard move into the CLI). Browserbase deployments inject a
pooled key; regular users provide their own (free at agentmail.to) and get a
clear setup error if it's unset. The inner agent still only ever sees the inbox
address — the key is read by inbox.mjs and never printed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread skills/autobrowse/scripts/evaluate.mjs
Hardening found by a live Substack magic-link signup run end-to-end:

- wait-link returned an open-tracking pixel (.gif) because it grabbed the first
  URL anywhere in the body. Now extract <a href> anchors with a reject-list
  (unsubscribe/mailto/tel/preferences/.gif), which skips img-src pixels; --match
  matches the href OR the visible link text so "confirm"/"sign in" finds the CTA
  even when the href is a tracking redirect (browse open follows it).
- latest only showed list-summary metadata (the list endpoint omits the body).
  It now fetches the full single message by id so text/html/links are visible.
- partsOf prefers AgentMail's cleaned extracted_text/extracted_html.
- evaluate.mjs killed wait-otp/wait-link at the fixed 30s exec cap (ETIMEDOUT
  on --within 60/90). exec timeout for inbox wait commands is now --within + 15s.

Verified end-to-end: signup → wait-link returns the real "Confirm your email"
CTA → browse open → signed-in Substack home. Sweep still proven to never touch
non-ab- inboxes.
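
The anchor-based extraction described above can be sketched roughly as follows. The reject pattern is a guess built from the list in this message, and `extractAnchors` / `pickLink` are hypothetical names rather than the functions in inbox.mjs. A regex is used only to keep the sketch short; a real HTML parser would be more robust.

```javascript
// Reject-list named in the commit message; the exact pattern is an assumption.
const REJECT_LINK_RE = /unsubscribe|^mailto:|^tel:|preferences|\.gif(\?|$)/i;

// Collect { href, text } for every <a href="..."> in the HTML body.
// <img src> tracking pixels are not anchors, so they never show up here.
function extractAnchors(html) {
  const anchors = [];
  const re = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    // Strip nested tags and collapse whitespace to get the visible link text.
    const text = m[2].replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
    anchors.push({ href: m[1], text });
  }
  return anchors;
}

// Return the first acceptable link whose href OR visible text contains `match`.
function pickLink(html, match) {
  const needle = (match || "").toLowerCase();
  const hit = extractAnchors(html).find(
    (a) =>
      !REJECT_LINK_RE.test(a.href) &&
      (a.href.toLowerCase().includes(needle) ||
        a.text.toLowerCase().includes(needle))
  );
  return hit ? hit.href : null;
}
```

Matching on the visible text is what lets `--match confirm` find a CTA whose href is an opaque tracking redirect.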

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread skills/autobrowse/scripts/evaluate.mjs
… truth)

- create now releases the inbox the task already tracks before minting a new
  one — a re-create within the 1h sweep window otherwise orphaned a live inbox
  (leaked AND unreachable by release). (#2)
- evaluate.mjs resolves the inbox address from .inbox.json (what wait-otp/
  wait-link actually poll); --inbox-email is a fallback and a mismatch now warns
  instead of silently polling a different inbox. (#4)
- {{inbox_email}} in task.md is now substituted with the resolved address. (#3)
- executeCommand pins inbox.mjs to the run's own --workspace/--task, so a
  sub-agent can't read or release a sibling task's inbox (parallel runs share a
  workspace, isolated only by --task). (#5)

The 30s exec-timeout issue (#1) was already fixed by execTimeoutFor in 2d091fc.
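
For reference, a minimal sketch of the timeout rule. `execTimeoutFor` is the name used in this message, but the signature, the constant names, and the assumption that `--within` is given in seconds as `--within <n>` are illustrative guesses.

```javascript
const DEFAULT_EXEC_TIMEOUT_MS = 30_000; // the fixed cap applied to ordinary commands
const WAIT_GRACE_MS = 15_000; // headroom added on top of --within

// Hypothetical reconstruction: `args` is the tokenized command line.
function execTimeoutFor(args) {
  const isWait = args.includes("wait-otp") || args.includes("wait-link");
  if (!isWait) return DEFAULT_EXEC_TIMEOUT_MS;
  const i = args.indexOf("--within");
  const within = i !== -1 ? Number(args[i + 1]) : NaN;
  // No usable --within value: fall back to the default cap.
  if (!Number.isFinite(within) || within <= 0) return DEFAULT_EXEC_TIMEOUT_MS;
  return within * 1000 + WAIT_GRACE_MS;
}
```

Under these assumptions `--within 60` yields a 75s exec timeout, so the wait command can finish on its own before the harness kills it.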

Verified: re-create deletes the prior inbox (no orphan); a divergent
--inbox-email warns and the resolved address wins; {{inbox_email}} is replaced;
an agent passing a foreign --task is overridden back to its own.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aq17 commented Jun 1, 2026

Contributor Author

Addressed the Bugbot findings in 47f740f (lifecycle + single-source-of-truth) — #1 was already handled in 2d091fc:

| Bugbot issue | Resolution |
| --- | --- |
| Inbox wait killed by timeout | Already fixed in 2d091fc: execTimeoutFor gives wait-otp/wait-link --within + 15s instead of the fixed 30s cap. |
| Recreate leaves orphan inboxes | cmdCreate now DELETEs the inbox the task already tracks before minting a new one (sweep can't catch a <1h-old inbox, and overwriting .inbox.json would orphan it). |
| Task placeholder never substituted | evaluate.mjs now substitutes {{inbox_email}} (optional inner whitespace) in task.md with the resolved address. |
| Prompt email ignores inbox state | .inbox.json is now the single source of truth (it's what wait-* poll). --inbox-email is a fallback; a mismatch logs a WARNING and the .inbox.json value wins. |
| Inbox CLI args unsandboxed | executeCommand pins inbox.mjs to the run's own --workspace/--task, stripping any agent-supplied values, so a sub-agent can no longer read/release a sibling task's inbox. |

Verified each end-to-end against a live AgentMail org: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced (no literal/flag leak); an agent passing a foreign --task is overridden back to its own ([scope] ... overridden).
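
A rough sketch of the resolution order and placeholder substitution described above. The function names and the injectable `warn` parameter are hypothetical, and `stateEmail` stands for whatever address was read from .inbox.json (or `null` when the file is absent).

```javascript
// Hypothetical helper: .inbox.json wins, --inbox-email is only a fallback.
function resolveInboxEmail(stateEmail, flagEmail, warn = console.warn) {
  if (stateEmail && flagEmail && flagEmail !== stateEmail) {
    warn(
      `WARNING: --inbox-email (${flagEmail}) differs from .inbox.json ` +
        `(${stateEmail}); using the .inbox.json value`
    );
  }
  return stateEmail || flagEmail || null;
}

// Replace {{inbox_email}} (optional inner whitespace) with the resolved address.
function substituteInboxEmail(taskText, email) {
  if (!email) return taskText; // nothing resolved: leave the task untouched
  // A replacer function avoids "$" in the address being read as a pattern.
  return taskText.replace(/\{\{\s*inbox_email\s*\}\}/g, () => email);
}
```

Keeping the warning separate from the return value means a divergent flag is surfaced in the logs but never changes which inbox the wait commands poll.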

aq17 commented Jun 1, 2026

Contributor Author

Validation summary — ready for review

Tested at HEAD 47f740f:

  • Standalone loop (twice): full Substack magic-link signup → wait-link returned the real CTA → browse open → signed-in state confirmed ({"success": true, "signed_in": true} + screenshot). 12 turns / ~$0.63.
  • Bugbot fixes (each verified live against AgentMail):
    • re-create releases the prior inbox (no orphan)
    • {{inbox_email}} substituted in task.md (no literal/flag leak)
    • .inbox.json is the single source of truth; divergent --inbox-email warns and the resolved address wins
    • executeCommand pins inbox.mjs to the run's own --workspace/--task (a foreign --task is overridden)
    • the 30s exec-timeout issue was already handled by execTimeoutFor
  • Full browse.sh sandbox pipeline (with browserbase/browse.sh#159): a real Vercel Sandbox cloned bb-skills @ this commit, ran inbox.mjs create, and Substack delivered the verification email to the minted inbox — proving the feature works in the actual generation pipeline, not just the standalone harness.
  • Secret hygiene: the AgentMail key never appears in any trace artifact.

No leaked inboxes after any run. Ready to merge.

shubh24 commented Jun 2, 2026

Contributor

Reviewed this — the core idea is solid and the security spine is genuinely well done: the AgentMail key never reaches the inner agent, only the throwaway address does. Nice.

Three things worth tightening before merge. Framing them simply:

1. Leftover inboxes pile up (the big one).
Releasing the inbox is currently a note in SKILL.md telling the orchestrator to run inbox.mjs release — not something the code guarantees. Robots (LLMs) routinely skip trailing cleanup steps. That's literally the lesson #123 just learned for browser sessions: it moved teardown into code for exactly this reason. On any non-happy exit (inner-loop error, max iterations, a crash, Ctrl-C) the inbox is never deleted. The free tier caps at 3 inboxes, so a few forgotten ones and the next create hard-fails. The 1h sweep only helps if created_at parses (see below), so it's not a reliable backstop.
→ Release the run's own inbox in code on both the success and error paths, right alongside the session teardown — don't rely on the agent remembering.
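
To make that concrete, a minimal sketch of release-in-code. It is synchronous for brevity, and `provider.create` / `provider.release` are hypothetical stand-ins for invoking the inbox CLI, not functions that exist in this PR.

```javascript
// Hypothetical orchestrator wrapper; names are illustrative only.
function runWithInbox(provider, runLoop) {
  const inbox = provider.create();
  try {
    return runLoop(inbox);
  } finally {
    // Runs on success, on thrown errors, and on max-iteration exits alike.
    // Ctrl-C would additionally need a signal handler, which is not shown.
    try {
      provider.release(inbox);
    } catch (err) {
      // A failed release should not mask the loop's own result or error.
      console.error(`inbox release failed: ${err.message}`);
    }
  }
}
```

The point of the `finally` is that cleanup no longer depends on the agent reaching a trailing step in its instructions.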

2. It can grab the wrong verification code.
DEFAULT_OTP_RE = \b\d{4,8}\b returns the first 4–8 digit run anywhere in the email, in document order. Verification emails are full of other numbers — a year (2026), a price, a zip — and any of those can come before the real code and get returned instead. The agent is told this "prints just the extracted code," so a wrong number flows straight into the form and the run fails confusingly.
→ Prefer a code that sits next to a keyword, e.g. /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i (capture group 1), falling back to the bare regex.
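
Something like the following, as a sketch. `extractOtp` and `KEYWORD_OTP_RE` are names made up here; only `DEFAULT_OTP_RE` and the two patterns come from the comment above.

```javascript
// Keyword-anchored pattern first (capture group 1), bare digit run second.
const KEYWORD_OTP_RE = /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i;
const DEFAULT_OTP_RE = /\b\d{4,8}\b/;

function extractOtp(text) {
  const keyed = KEYWORD_OTP_RE.exec(text);
  if (keyed) return keyed[1];
  // Fallback: first 4 to 8 digit run anywhere in the text.
  const bare = DEFAULT_OTP_RE.exec(text);
  return bare ? bare[0] : null;
}
```

This is still a heuristic: a phrase like "zip code 94107" ahead of the real code would fool it, so `--regex` remains the escape hatch for awkward senders.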

3. It'll open links sent by strangers.
A throwaway inbox can receive mail from anyone who learns the address, and the prompt steers the agent to browse open whatever wait-link returns. So an attacker-delivered link gets auto-opened — including internal hosts (http://169.254.169.254/, http://localhost), open-redirects, or phishing. REJECT_LINK_RE only filters unsubscribe/tracking/gif, and plain http:// to internal IPs passes. Related: --from is a substring match, so --from stripe.com also matches stripe.com.evil.com.
→ Restrict to https, reject RFC1918/loopback/link-local hosts, drop the bare-text-URL fallback, and make --from an exact domain-boundary match.
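
A sketch of what those checks might look like. All function names are hypothetical, treating subdomains as a match for `--from` is an assumption, and the check only inspects the URL as written: a public hostname that resolves to a private address is not caught here.

```javascript
// Loopback, link-local (incl. the 169.254.169.254 metadata host) and RFC1918.
function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isSafeLink(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: rejected outright in this sketch
  return !isPrivateIPv4(host);
}

// --from as a domain-boundary match: equal to `domain` or a subdomain of it.
function senderMatches(fromAddress, domain) {
  const at = fromAddress.lastIndexOf("@");
  if (at === -1) return false;
  const senderDomain = fromAddress.slice(at + 1).toLowerCase().replace(/>$/, "");
  const d = domain.toLowerCase();
  return senderDomain === d || senderDomain.endsWith("." + d);
}
```

With a boundary match, `--from stripe.com` accepts `no-reply@stripe.com` but not `x@stripe.com.evil.com`.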

Everything else I found is low/nit (a possibly-dead --inbox-email flag, stripHtml/stripTags duplication, a --within parsed in two places). Happy to expand on any of these.

Removes AgentMail from the public skill entirely and replaces the bundled
inbox.mjs with a generic, off-by-default provider contract. autobrowse no longer
ships an email provider or names any vendor; it only knows how to *call* one.

- evaluate.mjs: `--inbox-cmd <path>` / AUTOBROWSE_INBOX_CMD configures an optional
  inbox-provider command. Allowlist, exec-timeout, force-scope, and the (now
  vendor-neutral) Agent Inbox prompt key off it; all are inert when unset.
  Documents the provider contract (create/wait-otp/wait-link/latest/release +
  the .inbox.json {email,inbox_id} schema) as the explicit boundary.
- Deleted scripts/inbox.mjs (AgentMail-specific — moves to the internal caller).
- Scrubbed AGENTMAIL_API_KEY/agentmail.to from .env.example, SKILL.md (silent on
  the feature), and example-task.md.

Kept generic mechanics: .inbox.json single-source-of-truth, {{inbox_email}}
substitution, --workspace/--task force-scoping, wait-command exec timeout.

Verified: with a throwaway stub provider the hook injects the section,
substitutes the address, and forces scope; with no --inbox-cmd there is no inbox
section and the allowlist is browse-only. `git grep -i agentmail` → no matches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0593523.

Comment thread skills/autobrowse/scripts/evaluate.mjs Outdated
aq17 changed the title from "autobrowse: autonomous email inbox for signup/login/MFA tasks" to "autobrowse: optional vendor-neutral inbox-provider hook" on Jun 2, 2026

aq17 commented Jun 2, 2026

Contributor Author

Reworked per team feedback — AgentMail is now fully out of this public repo

Keeping AgentMail browse.sh-internal (skills is public), but without forking autobrowse. The inbox capability is now a generic, off-by-default provider hook; the AgentMail implementation + secrets live only in the internal browse.sh repo and are injected into the sandbox at runtime.

This PR (public) now:

  • Deletes scripts/inbox.mjs (AgentMail-specific).
  • evaluate.mjs gains --inbox-cmd <path> / AUTOBROWSE_INBOX_CMD — an optional, vendor-neutral inbox-provider command. Allowlist, exec-timeout, force-scope, and the (vendor-neutral) Agent Inbox prompt all key off it and are inert when unset. Documents the explicit provider contract.
  • Scrubs AgentMail from .env.example, SKILL.md (silent on the feature), example-task.md. git grep -i agentmail → no matches.
  • Keeps generic mechanics: .inbox.json SSOT, {{inbox_email}} substitution, --workspace/--task force-scope, wait-command timeout.

Verified: with a throwaway stub provider the hook injects the section + substitutes the address + forces scope; with no --inbox-cmd there's no inbox section and the allowlist is browse-only; the browse.sh provider (separate repo) drives create/release against real AgentMail through --inbox-cmd.

Pairs with the internal browse.sh PR (provider injection). Divergence stays minimal: one shared autobrowse core; browse.sh owns only a swappable provider script + a few prompt lines.

aq17 commented Jun 2, 2026

Contributor Author

Re-validated end-to-end on the reworked architecture (full browse.sh sandbox pipeline, local): /api/skills/generate → sandbox cloned the public skill @ 0593523 (no AgentMail, --inbox-cmd hook only) → browse.sh injected /vercel/sandbox/inbox-provider.mjs and passed --inbox-cmd → the injected provider minted ab-…@agentmail.to via the edge-injected key → Substack delivered "Create your account on Substack" to it. Every new seam exercised with real external email; the dotenv-in-sandbox bug was caught and fixed. Inbox released after; no leaks.

shubh24 commented Jun 2, 2026

Contributor

Reviewed this alongside the AgentMail provider in browserbase/browse.sh#159. The vendor-neutral split is great — this PR ships zero email-vendor code, just a clean create / wait-otp / wait-link / latest / release contract and a swappable --inbox-cmd. A few small things, nothing blocking.

🟡 Medium

The allowlist lets the inner agent run more than it should. isAllowedCommand only checks that the command is node <the configured provider> — it never looks at the subcommand, so the agent can call create, release, and latest, not just the wait-otp / wait-link it's actually told about. Sibling-task isolation still holds (forceInboxScope), but the agent can shoot itself in the foot: a mid-run release kills its own live inbox, and a create overwrites .inbox.json so the address baked into the prompt no longer matches the one being polled. Consider restricting the allowlist to the read-only subcommands and keeping create / release orchestrator-only.
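
Roughly what I have in mind, as a sketch. `isAllowedInboxCommand` and the tokenized-argument shape are assumptions for illustration; the real isAllowedCommand may parse its input differently.

```javascript
// Read-only subcommands the inner agent may call.
const AGENT_INBOX_SUBCOMMANDS = new Set(["wait-otp", "wait-link", "latest"]);

// `tokens` is the already-tokenized command, e.g.
// ["node", "/path/provider.mjs", "wait-otp", "--within", "60"].
function isAllowedInboxCommand(tokens, inboxCmd) {
  if (!inboxCmd) return false; // hook unset: no inbox commands at all
  if (tokens[0] !== "node" || tokens[1] !== inboxCmd) return false;
  // The subcommand itself must be on the read-only list.
  return AGENT_INBOX_SUBCOMMANDS.has(tokens[2]);
}
```

create and release then stay reachable only from the orchestrator, which calls the provider outside this check.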

⚪ Low

  • The PR description doesn't match the diff. It describes a self-contained inbox.mjs that talks directly to AgentMail, plus SKILL.md changes — none of that is in this PR (that's the browse.sh side). Only the Cursor auto-summary below it is accurate. Worth rewriting so reviewers aren't chasing code that isn't here, and so the merge order (this lands first, then browse.sh#159) stays clear.
  • The public --inbox-cmd hook isn't documented. The body mentions SKILL.md docs, but there's no SKILL.md change in the diff. Runtime usage works (the injected prompt section covers it), but the new flag is undocumented for anyone driving autobrowse directly.
  • readInboxState returns inbox_id but it's used as the email address. The contract designates email for that. It only works because the browse.sh provider happens to set email === inbox_id; a provider that distinguishes them would break. Safer: return email || inbox_id || null.
  • The OTP default text is vendor-specific. buildInboxSection tells the agent the default matches "a 4–8 digit code" — but that default actually lives in the provider, not here. Since this file is meant to be vendor-neutral, better to say "the provider's default pattern; pass --regex to override."
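
For the readInboxState point, a minimal sketch of the safer fallback. Taking the raw file text as an argument is a simplification made here so the example stays self-contained; the real function presumably reads .inbox.json itself.

```javascript
// `raw` is the text of .inbox.json, or null/undefined if the file is absent.
function readInboxState(raw) {
  if (!raw) return null;
  let state;
  try {
    state = JSON.parse(raw);
  } catch {
    return null; // unreadable state is treated as "no inbox"
  }
  if (!state || typeof state !== "object") return null;
  // Prefer the contract's `email` field; fall back to inbox_id only if needed.
  return state.email || state.inbox_id || null;
}
```

A provider that keeps `email` and `inbox_id` distinct then resolves to the address rather than the opaque id.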

✅ What's done well

The contract design is clean, and the isolation instinct — forceInboxScope stripping any agent-supplied --workspace / --task and pinning the run's real ones — is exactly right for the shared-workspace, parallel-sub-agent setup.
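
For anyone reading along, the scope-forcing idea can be sketched like this. The signature and the returned `overridden` flag are guesses for illustration, not the actual forceInboxScope in evaluate.mjs.

```javascript
// Drop any agent-supplied --workspace/--task (both "--flag value" and
// "--flag=value" forms), then append the run's real values.
function forceInboxScope(args, workspace, task) {
  const scoped = [];
  let overridden = false;
  for (let i = 0; i < args.length; i++) {
    const a = args[i];
    if (a === "--workspace" || a === "--task") {
      i++; // also skip the value that follows the flag
      overridden = true;
      continue;
    }
    if (a.startsWith("--workspace=") || a.startsWith("--task=")) {
      overridden = true;
      continue;
    }
    scoped.push(a);
  }
  scoped.push("--workspace", workspace, "--task", task);
  return { args: scoped, overridden };
}
```

Because the real values are appended after stripping, a sub-agent that passes a sibling's `--task` ends up polling its own inbox anyway.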

(Review was AI-assisted.)

Review fixes (skills#119):
- Inner allowlist now permits only the read subcommands (wait-otp / wait-link /
  latest) of the configured --inbox-cmd provider; create/release are
  orchestrator-only. Stops the inner agent from killing its own inbox
  (`release`) or rewriting .inbox.json (`create`) mid-run so the polled inbox
  no longer matches the address in its prompt.
- readInboxState prefers the contract's `email` field (`email || inbox_id`);
  it was using inbox_id as the address, which only worked while a provider set
  them equal.
- buildInboxSection OTP text is vendor-neutral now ("the provider's default
  code match; pass --regex to override") — the digit default lives in the
  provider, not this file.

Verified (stub provider, no network): release is BLOCKED while wait-otp is
allowed and scope-forced to the run's task; an email≠inbox_id .inbox.json
resolves to email; {{inbox_email}} substituted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aq17 commented Jun 3, 2026

Contributor Author

Thanks @shubh24 — addressed in 2c4be94 (public) + browserbase/browse.sh@befc60d (provider). Mapping your review:

🟡 Allowlist let the agent run more than it should. Fixed — the inner allowlist now permits only the read subcommands (wait-otp/wait-link/latest); create/release are orchestrator-only, so the agent can't release its own live inbox or create over .inbox.json mid-run. (Verified: release blocked, wait-otp allowed + scope-forced.)

readInboxState returns inbox_id, used as email. Fixed → email || inbox_id || null. (Verified with an email ≠ inbox_id .inbox.json.)

⚪ OTP default text vendor-specific. buildInboxSection now says "the provider's default code match; pass --regex to override" — no digit-count claim in the vendor-neutral file.

⚪ PR description didn't match diff. Rewritten above to describe the actual hook + merge order.

--inbox-cmd undocumented in SKILL.md. Intentionally keeping SKILL.md silent (team call: don't advertise the experimental feature publicly); the flag is in evaluate.mjs --help + the provider-contract comment. --inbox-email stays as the documented resolution fallback (order: .inbox.json → flag).

Link-safety from your earlier comment (open-redirect/internal-IP, substring --from) — that code now lives in the browse.sh provider; fixed there in befc60d: https-only + reject loopback/link-local(metadata)/RFC1918/*.local, anchors-only (dropped bare-text URLs), and exact domain-boundary --from.

The Cursor inline threads (orphan-inbox, placeholder-substitution, prompt-vs-state, args-unsandboxed) are stale — all resolved in the rework; the one current Cursor HIGH (readInboxState) is fixed above.
