fix(start-proxy): explicitly fail when proxy retries are exhausted#3531
fix(start-proxy): explicitly fail when proxy retries are exhausted#3531iaadillatif wants to merge 1 commit intogithub:mainfrom
Conversation
…prove startup stability checks
There was a problem hiding this comment.
Pull request overview
This PR hardens start-proxy startup so that exhausting retries results in an explicit failure (instead of potentially continuing after the proxy has already exited), by extracting proxy launch logic into a dedicated launcher module and adding targeted tests.
Changes:
- Extracted proxy startup/retry logic into
src/start-proxy/launcher.tsand updated the action to call it. - Added explicit failure when retries are exhausted, including “last exit code” diagnostics.
- Added unit tests for success, spawn error propagation, and retry exhaustion behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/start-proxy/launcher.ts | New launcher implementing retries, stabilization delay checks, and explicit failure on exhaustion. |
| src/start-proxy/launcher.test.ts | New tests covering retry exhaustion, spawn error propagation, and success path. |
| src/start-proxy-action.ts | Uses the extracted launcher instead of inline startup logic. |
| lib/start-proxy-action.js | Generated build output reflecting the TypeScript changes (not reviewed). |
| test("startProxy throws explicit error when retries are exhausted", async (t) => { | ||
| const fakeSpawn = sinon.stub().callsFake(() => { | ||
| const process = new FakeChildProcess(); | ||
| // Emit exit on next event loop iteration, ensuring it fires during the delay | ||
| setImmediate(() => { |
There was a problem hiding this comment.
This test will wait on the real util.delay(1000) inside startProxy (and retry scenarios will hit it repeatedly), which will slow down the test suite and can make timing-dependent tests flakier. Consider stubbing util.delay (or injecting a delay function into startProxy) so the tests run quickly and deterministically.
| subprocess.unref(); | ||
| if (subprocess.pid) { | ||
| core.saveState("proxy-process-pid", `${subprocess.pid}`); | ||
| } |
There was a problem hiding this comment.
core.saveState("proxy-process-pid", ...) is called immediately after spawn(), before the stabilization delay. If the child exits during the delay (or on a later retry), the post-action step (start-proxy-action-post.ts) will still read this PID and call process.kill(pid), which can end up targeting an unrelated process if the PID is reused. Consider only saving the PID after the proxy has remained alive past the startup delay (i.e., right before returning success), or otherwise persisting a PID only for the final, stabilized process.
This PR hardens proxy startup reliability by converting a silent false-success scenario into an explicit failure.
Fixes #3530 issue
Previously, the startup loop could exhaust retry attempts without throwing a terminal error. In such cases, outputs could still be set and execution would continue even though the proxy process was not running.
This change ensures that proxy initialization fails fast if stabilization does not occur.
Root Cause
The previous control flow:
This allowed workflows to proceed with a non-functional proxy.
Changes
1. Extracted startup logic
Created
src/start-proxy/launcher.ts:2. Explicit retry exhaustion handling
lastExitCode.Error("Failed to start proxy after 5 attempts (last exit code: X)")when no stable process remains.3. Tests
Added
launcher.test.tscovering:Backward Compatibility
Behavioral impact is limited to converting silent false-positive startup into explicit failure.
Testing
Executed:
npm run lintnpm run buildnpm run testAdded tests in:
src/start-proxy/launcher.test.tsAll new launcher tests pass on Node 24+.
Covering:
Why This Matters
This improves workflow reliability by:
Fail-fast behavior improves user debugging experience and system robustness.