Skip to content

feat: resume agents by provider session id#1976

Closed
janburzinski wants to merge 10 commits into
generalaction:mainfrom
janburzinski:emdash/codex-hook-fix-x8mvv
Closed

feat: resume agents by provider session id#1976
janburzinski wants to merge 10 commits into
generalaction:mainfrom
janburzinski:emdash/codex-hook-fix-x8mvv

Conversation

@janburzinski

@janburzinski janburzinski commented May 11, 2026

Copy link
Copy Markdown
Collaborator
  • improves codex and opencode resume

@janburzinski

Copy link
Copy Markdown
Collaborator Author

resolves ENG-1159 ENG-1147

@greptile-apps

greptile-apps Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR overhauls how Emdash resumes Codex and OpenCode agent sessions: instead of relying on positional --last flags or project-local TOML hooks, it captures provider-specific session IDs (Codex thread IDs, OpenCode sessionID) and stores them in the conversation's config JSON column so each session can be resumed precisely with codex resume <id> or opencode --session <id>.

  • Provider session ID capture is now two-pronged: hook events from OpenCode (via the updated opencode-notifications-plugin.js) propagate session IDs through event-enricher.ts → saveConversationProviderSessionId, while Codex (which no longer uses project-local hook config) captures thread IDs by polling the Codex SQLite state DB after the first user message is recorded.
  • buildAgentCommand was updated with provider-specific resume branches; Codex and OpenCode use providerSessionId when available and gracefully fall back to no-op (Codex) or --continue (OpenCode) when not.
  • A new injectAgentNotificationHooks setting (default true) was added to allow disabling hook config injection without removing other functionality.

Confidence Score: 5/5

Safe to merge; the new session-ID-capture paths are well-guarded, the frontend silently discards the spurious session.created notification, and the fallback behavior (no-op for Codex, --continue for OpenCode) is explicit and tested.

All critical paths (resume, hook delivery, DB writes) have appropriate fallbacks. The two noted issues are defensive-coding gaps that are unlikely to trigger given how the config column is always written by the app via JSON.stringify, and neither would cause data loss or incorrect resume behavior. The previous thread concerns (full-table scan, timer accumulation, claimedCodexThreadIds leak) are all addressed in this version of the code.

provider-session-id.ts (unguarded JSON.parse) and codex-session-store.ts (pre-save mutation) are the only files with minor quality gaps worth a second look.

Important Files Changed

Filename Overview
src/main/core/conversations/provider-session-id.ts New file: saves provider session IDs to the conversation config column with a LIKE-based pre-filter for uniqueness; missing try-catch for the target row's JSON.parse call, inconsistent with the candidate-loop guard in the same function.
src/main/core/conversations/impl/codex-session-store.ts New file: matches Codex threads to conversations via SQLite polling; scheduleCodexThreadCapture now guards against timer accumulation with captureScheduled, but resolveCodexSessionIdForResume mutates the conversation parameter in-place before awaiting the DB save.
src/main/core/conversations/impl/agent-command.ts Added providerSessionId parameter; Codex resume now uses explicit thread ID (filtering --last from custom flags), OpenCode uses --session, others fall back to existing resumeFlag logic.
src/main/core/agent-hooks/event-enricher.ts Added session ID extraction from hook event bodies for codex/opencode providers; session.created events safely fire as notification with empty payload that the frontend silently discards.
src/main/core/agent-hooks/opencode-notifications-plugin.js Added session.created handler that sends session_id; refactored sendPayload out of event handler; added readSessionId with broad field cascade to handle varying OpenCode event schemas.
src/main/core/agent-hooks/hook-config.ts Removed Codex TOML hook config writing (writeCodexNotify); codex is removed from the writeAll provider list, replaced by SQLite-based session capture.
src/main/core/conversations/impl/local-conversation.ts Integrates codex-session-store registration on PTY start and cleanup on exit; resolves providerSessionId for Codex resume; added early return for injectAgentNotificationHooks setting.
src/shared/agent-provider-registry.ts Codex resumeFlag changed from 'resume --last' to 'resume'; supportsHooks removed (Codex no longer uses hook-based session capture).

Sequence Diagram

sequenceDiagram
    participant UI as Renderer
    participant PTY as PTY Controller
    participant LC as LocalConversation
    participant CSS as CodexSessionStore
    participant OCPlugin as OpenCode Plugin
    participant EE as EventEnricher
    participant PSI as ProviderSessionId
    participant DB as Database

    Note over LC,DB: New Codex Session Start
    LC->>DB: resolveCodexSessionIdForResume
    DB-->>LC: providerSessionId or undefined
    LC->>LC: buildAgentCommand(providerSessionId)
    LC->>CSS: registerPendingCodexSession
    CSS->>CSS: scheduleCodexThreadCapture retry timers

    Note over PTY,CSS: User types first message
    UI->>PTY: sendInput(data)
    PTY->>CSS: recordCodexInput(sessionId, data)
    CSS->>CSS: findCodexThread via Codex SQLite DB
    CSS->>PSI: saveConversationProviderSessionId
    PSI->>DB: UPDATE conversations SET config

    Note over OCPlugin,DB: OpenCode session.created hook
    OCPlugin->>EE: "POST /hook type=notification body={session_id}"
    EE->>PSI: saveConversationProviderSessionId
    PSI->>DB: UPDATE conversations SET config
    EE-->>UI: "AgentEvent type=notification empty payload"
    Note over UI: Frontend silently discards no notificationType

    Note over LC,DB: On Resume
    LC->>DB: resolveCodexSessionIdForResume
    LC->>LC: codex resume ID or opencode --session ID
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
src/main/core/conversations/provider-session-id.ts:35-36
The `JSON.parse` call for the target conversation's config has no error handling, while the candidate-checking loop two lines above wraps the same operation in `try/catch` with an explicit comment about malformed records. If the target row's `config` column ever contains invalid JSON (e.g. written by an older version of the app or manually edited), this line throws an uncaught error that propagates through `enrichEvent` and could break hook processing for that session.

```suggestion
  let config: Record<string, unknown> = {};
  try {
    config = row?.config ? JSON.parse(row.config) : {};
  } catch {
    // Ignore malformed config; proceed to overwrite with valid JSON.
  }
  if (typeof config.providerSessionId === 'string' && config.providerSessionId) return;
```

### Issue 2 of 2
src/main/core/conversations/impl/codex-session-store.ts:126-131
`resolveCodexSessionIdForResume` mutates `conversation.providerSessionId` in-place before the `await saveConversationProviderSessionId(...)` completes. If the async save rejects, the in-memory object now carries a `providerSessionId` that was never persisted — any code that reads the same `conversation` reference after the thrown error will observe a stale, unsaved value. Returning the resolved value is sufficient; the caller in `local-conversation.ts` captures it from the return value, not from the mutated object.

```suggestion
    const providerSessionId = threads[conversationIndex]?.id;
    if (!providerSessionId) return conversation.providerSessionId;

    await saveConversationProviderSessionId(conversation.id, providerSessionId);
    return providerSessionId;
```

Reviews (3): Last reviewed commit: "fix: avoid implicit codex resume fallbac..." | Re-trigger Greptile

Comment on lines +110 to +122
const threads = codexDb
.prepare(
`SELECT id, created_at_ms
FROM threads
WHERE archived = 0
AND cwd = ?
AND created_at_ms >= ?
ORDER BY created_at_ms ASC, id ASC
LIMIT 100`
)
.all(cwd, firstCreatedAtMs) as CodexThreadRow[];

const providerSessionId = threads[conversationIndex]?.id;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Wrong session resumed when multiple tasks share the same worktree

resolveCodexSessionIdForResume queries all Codex threads for the same cwd sorted by time, then picks threads[conversationIndex] — a position-based match. The query has no task-specific filter, only cwd = ?. When two or more Emdash tasks are run inside the same worktree (identical cwd), threads from both tasks are returned together and the positional offset is wrong, causing a resume to open a completely unrelated Codex session silently. The fallback returns conversation.providerSessionId (which may be undefined), so the failure mode produces --last rather than an error, making the mismatch hard to notice.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 110-122

Comment:
**Wrong session resumed when multiple tasks share the same worktree**

`resolveCodexSessionIdForResume` queries all Codex threads for the same `cwd` sorted by time, then picks `threads[conversationIndex]` — a position-based match. The query has no task-specific filter, only `cwd = ?`. When two or more Emdash tasks are run inside the same worktree (identical `cwd`), threads from both tasks are returned together and the positional offset is wrong, causing a resume to open a completely unrelated Codex session silently. The fallback returns `conversation.providerSessionId` (which may be `undefined`), so the failure mode produces `--last` rather than an error, making the mismatch hard to notice.

How can I resolve this? If you propose a fix, please make it concise.

};

const pendingCodexSessions = new Map<string, PendingCodexSession>();
const claimedCodexThreadIds = new Set<string>();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 claimedCodexThreadIds grows without bound

claimedCodexThreadIds is a module-level Set that has entries added on every successful claimCodexThread call but is never pruned. If saveConversationProviderSessionId throws after claimedCodexThreadIds.add(providerSessionId) (line 179), the thread ID is permanently marked as claimed in memory for the lifetime of the app process even though it was never persisted to the DB, and the next existingProviderSessionIds() call will treat it as taken. Over a long session with many Codex conversations this is also a low-level memory leak.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 32

Comment:
**`claimedCodexThreadIds` grows without bound**

`claimedCodexThreadIds` is a module-level `Set` that has entries added on every successful `claimCodexThread` call but is never pruned. If `saveConversationProviderSessionId` throws after `claimedCodexThreadIds.add(providerSessionId)` (line 179), the thread ID is permanently marked as claimed in memory for the lifetime of the app process even though it was never persisted to the DB, and the next `existingProviderSessionIds()` call will treat it as taken. Over a long session with many Codex conversations this is also a low-level memory leak.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +184 to +206
function scheduleFirstMessageCapture(ptySessionId: string, firstUserMessage: string): void {
const tryCapture = async () => {
const pending = pendingCodexSessions.get(ptySessionId);
if (!pending || pending.captured) return;

const providerSessionId = findCodexThreadByFirstMessage({
cwd: pending.cwd,
startedAtMs: pending.startedAtMs,
firstUserMessage,
});
if (providerSessionId) await claimCodexThread(ptySessionId, providerSessionId);
};

for (const delayMs of [250, 1_000, 2_500, 5_000]) {
setTimeout(() => {
tryCapture().catch((error) => {
log.debug('CodexSessionStore: failed to capture Codex first message', {
error: String(error),
});
});
}, delayMs);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Timer accumulation in scheduleFirstMessageCapture

recordCodexInput calls scheduleFirstMessageCapture on every newline it encounters. Each call unconditionally registers 4 new setTimeout callbacks. If the user types several messages before the first capture succeeds — or in error paths where capture keeps failing — the number of pending timers grows without limit for that session. Once pending.captured is set the timers do return early, but they are still scheduled and held in the event loop until their delay expires. A short-circuit like checking if (pending.captured) return before creating any timers would avoid the accumulation.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 184-206

Comment:
**Timer accumulation in `scheduleFirstMessageCapture`**

`recordCodexInput` calls `scheduleFirstMessageCapture` on every newline it encounters. Each call unconditionally registers 4 new `setTimeout` callbacks. If the user types several messages before the first capture succeeds — or in error paths where capture keeps failing — the number of pending timers grows without limit for that session. Once `pending.captured` is set the timers do return early, but they are still scheduled and held in the event loop until their delay expires. A short-circuit like checking `if (pending.captured) return` before creating any timers would avoid the accumulation.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +9 to +21
const otherRows = await db
.select({ config: conversations.config })
.from(conversations)
.where(ne(conversations.id, conversationId));

for (const row of otherRows) {
if (!row.config) continue;
try {
if (JSON.parse(row.config).providerSessionId === providerSessionId) return;
} catch {
// Ignore malformed historical config.
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Full-table scan on every hook event

saveConversationProviderSessionId selects every row from conversations (ne(conversations.id, conversationId) with no provider filter) to check for ID uniqueness. For an OpenCode session that fires frequent hook events (e.g., session.created, session.idle), this scan runs on each event. As the conversation history grows this becomes an O(n) read per event. Adding a provider filter (e.g., only codex and opencode rows) or moving the uniqueness constraint to the database would scope the cost.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/provider-session-id.ts
Line: 9-21

Comment:
**Full-table scan on every hook event**

`saveConversationProviderSessionId` selects every row from `conversations` (`ne(conversations.id, conversationId)` with no provider filter) to check for ID uniqueness. For an OpenCode session that fires frequent hook events (e.g., `session.created`, `session.idle`), this scan runs on each event. As the conversation history grows this becomes an O(n) read per event. Adding a provider filter (e.g., only `codex` and `opencode` rows) or moving the uniqueness constraint to the database would scope the cost.

How can I resolve this? If you propose a fix, please make it concise.

@janburzinski

Copy link
Copy Markdown
Collaborator Author

@greptile

@janburzinski

Copy link
Copy Markdown
Collaborator Author

@greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant