feat: resume agents by provider session id#1976
Conversation
|
resolves ENG-1159 ENG-1147 |
Greptile SummaryThis PR overhauls how Emdash resumes Codex and OpenCode agent sessions: instead of relying on positional
Confidence Score: 5/5Safe to merge; the new session-ID-capture paths are well-guarded, the frontend silently discards the spurious session.created notification, and the fallback behavior (no-op for Codex, --continue for OpenCode) is explicit and tested. All critical paths (resume, hook delivery, DB writes) have appropriate fallbacks. The two noted issues are defensive-coding gaps that are unlikely to trigger given how the config column is always written by the app via JSON.stringify, and neither would cause data loss or incorrect resume behavior. The previous thread concerns (full-table scan, timer accumulation, claimedCodexThreadIds leak) are all addressed in this version of the code. provider-session-id.ts (unguarded JSON.parse) and codex-session-store.ts (pre-save mutation) are the only files with minor quality gaps worth a second look.
|
| Filename | Overview |
|---|---|
| src/main/core/conversations/provider-session-id.ts | New file: saves provider session IDs to the conversation config column with a LIKE-based pre-filter for uniqueness; missing try-catch for the target row's JSON.parse call, inconsistent with the candidate-loop guard in the same function. |
| src/main/core/conversations/impl/codex-session-store.ts | New file: matches Codex threads to conversations via SQLite polling; scheduleCodexThreadCapture now guards against timer accumulation with captureScheduled, but resolveCodexSessionIdForResume mutates the conversation parameter in-place before awaiting the DB save. |
| src/main/core/conversations/impl/agent-command.ts | Added providerSessionId parameter; Codex resume now uses explicit thread ID (filtering --last from custom flags), OpenCode uses --session, others fall back to existing resumeFlag logic. |
| src/main/core/agent-hooks/event-enricher.ts | Added session ID extraction from hook event bodies for codex/opencode providers; session.created events safely fire as notification with empty payload that the frontend silently discards. |
| src/main/core/agent-hooks/opencode-notifications-plugin.js | Added session.created handler that sends session_id; refactored sendPayload out of event handler; added readSessionId with broad field cascade to handle varying OpenCode event schemas. |
| src/main/core/agent-hooks/hook-config.ts | Removed Codex TOML hook config writing (writeCodexNotify); codex is removed from the writeAll provider list, replaced by SQLite-based session capture. |
| src/main/core/conversations/impl/local-conversation.ts | Integrates codex-session-store registration on PTY start and cleanup on exit; resolves providerSessionId for Codex resume; added early return for injectAgentNotificationHooks setting. |
| src/shared/agent-provider-registry.ts | Codex resumeFlag changed from 'resume --last' to 'resume'; supportsHooks removed (Codex no longer uses hook-based session capture). |
Sequence Diagram
sequenceDiagram
participant UI as Renderer
participant PTY as PTY Controller
participant LC as LocalConversation
participant CSS as CodexSessionStore
participant OCPlugin as OpenCode Plugin
participant EE as EventEnricher
participant PSI as ProviderSessionId
participant DB as Database
Note over LC,DB: New Codex Session Start
LC->>DB: resolveCodexSessionIdForResume
DB-->>LC: providerSessionId or undefined
LC->>LC: buildAgentCommand(providerSessionId)
LC->>CSS: registerPendingCodexSession
CSS->>CSS: scheduleCodexThreadCapture retry timers
Note over PTY,CSS: User types first message
UI->>PTY: sendInput(data)
PTY->>CSS: recordCodexInput(sessionId, data)
CSS->>CSS: findCodexThread via Codex SQLite DB
CSS->>PSI: saveConversationProviderSessionId
PSI->>DB: UPDATE conversations SET config
Note over OCPlugin,DB: OpenCode session.created hook
OCPlugin->>EE: "POST /hook type=notification body={session_id}"
EE->>PSI: saveConversationProviderSessionId
PSI->>DB: UPDATE conversations SET config
EE-->>UI: "AgentEvent type=notification empty payload"
Note over UI: Frontend silently discards no notificationType
Note over LC,DB: On Resume
LC->>DB: resolveCodexSessionIdForResume
LC->>LC: codex resume ID or opencode --session ID
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
src/main/core/conversations/provider-session-id.ts:35-36
The `JSON.parse` call for the target conversation's config has no error handling, while the candidate-checking loop two lines above wraps the same operation in `try/catch` with an explicit comment about malformed records. If the target row's `config` column ever contains invalid JSON (e.g. written by an older version of the app or manually edited), this line throws an uncaught error that propagates through `enrichEvent` and could break hook processing for that session.
```suggestion
let config: Record<string, unknown> = {};
try {
config = row?.config ? JSON.parse(row.config) : {};
} catch {
// Ignore malformed config; proceed to overwrite with valid JSON.
}
if (typeof config.providerSessionId === 'string' && config.providerSessionId) return;
```
### Issue 2 of 2
src/main/core/conversations/impl/codex-session-store.ts:126-131
`resolveCodexSessionIdForResume` mutates `conversation.providerSessionId` in-place before the `await saveConversationProviderSessionId(...)` completes. If the async save rejects, the in-memory object now carries a `providerSessionId` that was never persisted — any code that reads the same `conversation` reference after the thrown error will observe a stale, unsaved value. Returning the resolved value is sufficient; the caller in `local-conversation.ts` captures it from the return value, not from the mutated object.
```suggestion
const providerSessionId = threads[conversationIndex]?.id;
if (!providerSessionId) return conversation.providerSessionId;
await saveConversationProviderSessionId(conversation.id, providerSessionId);
return providerSessionId;
```
Reviews (3): Last reviewed commit: "fix: avoid implicit codex resume fallbac..." | Re-trigger Greptile
| const threads = codexDb | ||
| .prepare( | ||
| `SELECT id, created_at_ms | ||
| FROM threads | ||
| WHERE archived = 0 | ||
| AND cwd = ? | ||
| AND created_at_ms >= ? | ||
| ORDER BY created_at_ms ASC, id ASC | ||
| LIMIT 100` | ||
| ) | ||
| .all(cwd, firstCreatedAtMs) as CodexThreadRow[]; | ||
|
|
||
| const providerSessionId = threads[conversationIndex]?.id; |
There was a problem hiding this comment.
Wrong session resumed when multiple tasks share the same worktree
resolveCodexSessionIdForResume queries all Codex threads for the same cwd sorted by time, then picks threads[conversationIndex] — a position-based match. The query has no task-specific filter, only cwd = ?. When two or more Emdash tasks are run inside the same worktree (identical cwd), threads from both tasks are returned together and the positional offset is wrong, causing a resume to open a completely unrelated Codex session silently. The fallback returns conversation.providerSessionId (which may be undefined), so the failure mode produces --last rather than an error, making the mismatch hard to notice.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 110-122
Comment:
**Wrong session resumed when multiple tasks share the same worktree**
`resolveCodexSessionIdForResume` queries all Codex threads for the same `cwd` sorted by time, then picks `threads[conversationIndex]` — a position-based match. The query has no task-specific filter, only `cwd = ?`. When two or more Emdash tasks are run inside the same worktree (identical `cwd`), threads from both tasks are returned together and the positional offset is wrong, causing a resume to open a completely unrelated Codex session silently. The fallback returns `conversation.providerSessionId` (which may be `undefined`), so the failure mode produces `--last` rather than an error, making the mismatch hard to notice.
How can I resolve this? If you propose a fix, please make it concise.| }; | ||
|
|
||
| const pendingCodexSessions = new Map<string, PendingCodexSession>(); | ||
| const claimedCodexThreadIds = new Set<string>(); |
There was a problem hiding this comment.
claimedCodexThreadIds grows without bound
claimedCodexThreadIds is a module-level Set that has entries added on every successful claimCodexThread call but is never pruned. If saveConversationProviderSessionId throws after claimedCodexThreadIds.add(providerSessionId) (line 179), the thread ID is permanently marked as claimed in memory for the lifetime of the app process even though it was never persisted to the DB, and the next existingProviderSessionIds() call will treat it as taken. Over a long session with many Codex conversations this is also a low-level memory leak.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 32
Comment:
**`claimedCodexThreadIds` grows without bound**
`claimedCodexThreadIds` is a module-level `Set` that has entries added on every successful `claimCodexThread` call but is never pruned. If `saveConversationProviderSessionId` throws after `claimedCodexThreadIds.add(providerSessionId)` (line 179), the thread ID is permanently marked as claimed in memory for the lifetime of the app process even though it was never persisted to the DB, and the next `existingProviderSessionIds()` call will treat it as taken. Over a long session with many Codex conversations this is also a low-level memory leak.
How can I resolve this? If you propose a fix, please make it concise.| function scheduleFirstMessageCapture(ptySessionId: string, firstUserMessage: string): void { | ||
| const tryCapture = async () => { | ||
| const pending = pendingCodexSessions.get(ptySessionId); | ||
| if (!pending || pending.captured) return; | ||
|
|
||
| const providerSessionId = findCodexThreadByFirstMessage({ | ||
| cwd: pending.cwd, | ||
| startedAtMs: pending.startedAtMs, | ||
| firstUserMessage, | ||
| }); | ||
| if (providerSessionId) await claimCodexThread(ptySessionId, providerSessionId); | ||
| }; | ||
|
|
||
| for (const delayMs of [250, 1_000, 2_500, 5_000]) { | ||
| setTimeout(() => { | ||
| tryCapture().catch((error) => { | ||
| log.debug('CodexSessionStore: failed to capture Codex first message', { | ||
| error: String(error), | ||
| }); | ||
| }); | ||
| }, delayMs); | ||
| } | ||
| } |
There was a problem hiding this comment.
Timer accumulation in
scheduleFirstMessageCapture
recordCodexInput calls scheduleFirstMessageCapture on every newline it encounters. Each call unconditionally registers 4 new setTimeout callbacks. If the user types several messages before the first capture succeeds — or in error paths where capture keeps failing — the number of pending timers grows without limit for that session. Once pending.captured is set the timers do return early, but they are still scheduled and held in the event loop until their delay expires. A short-circuit like checking if (pending.captured) return before creating any timers would avoid the accumulation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/impl/codex-session-store.ts
Line: 184-206
Comment:
**Timer accumulation in `scheduleFirstMessageCapture`**
`recordCodexInput` calls `scheduleFirstMessageCapture` on every newline it encounters. Each call unconditionally registers 4 new `setTimeout` callbacks. If the user types several messages before the first capture succeeds — or in error paths where capture keeps failing — the number of pending timers grows without limit for that session. Once `pending.captured` is set the timers do return early, but they are still scheduled and held in the event loop until their delay expires. A short-circuit like checking `if (pending.captured) return` before creating any timers would avoid the accumulation.
How can I resolve this? If you propose a fix, please make it concise.| const otherRows = await db | ||
| .select({ config: conversations.config }) | ||
| .from(conversations) | ||
| .where(ne(conversations.id, conversationId)); | ||
|
|
||
| for (const row of otherRows) { | ||
| if (!row.config) continue; | ||
| try { | ||
| if (JSON.parse(row.config).providerSessionId === providerSessionId) return; | ||
| } catch { | ||
| // Ignore malformed historical config. | ||
| } | ||
| } |
There was a problem hiding this comment.
Full-table scan on every hook event
saveConversationProviderSessionId selects every row from conversations (ne(conversations.id, conversationId) with no provider filter) to check for ID uniqueness. For an OpenCode session that fires frequent hook events (e.g., session.created, session.idle), this scan runs on each event. As the conversation history grows this becomes an O(n) read per event. Adding a provider filter (e.g., only codex and opencode rows) or moving the uniqueness constraint to the database would scope the cost.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/core/conversations/provider-session-id.ts
Line: 9-21
Comment:
**Full-table scan on every hook event**
`saveConversationProviderSessionId` selects every row from `conversations` (`ne(conversations.id, conversationId)` with no provider filter) to check for ID uniqueness. For an OpenCode session that fires frequent hook events (e.g., `session.created`, `session.idle`), this scan runs on each event. As the conversation history grows this becomes an O(n) read per event. Adding a provider filter (e.g., only `codex` and `opencode` rows) or moving the uniqueness constraint to the database would scope the cost.
How can I resolve this? If you propose a fix, please make it concise.|
@greptile |
|
@greptile |
Uh oh!
There was an error while loading. Please reload this page.