Skip to content

fix(transport): add timeout to closeAndWait to prevent hangs#40750

Closed
SebTardif wants to merge 3 commits into
microsoft:mainfrom
SebTardif:fix/transport-close-timeout
Closed

fix(transport): add timeout to closeAndWait to prevent hangs#40750
SebTardif wants to merge 3 commits into
microsoft:mainfrom
SebTardif:fix/transport-close-timeout

Conversation

@SebTardif
Copy link
Copy Markdown
Contributor

@SebTardif SebTardif commented May 9, 2026

Summary

  • closeAndWait awaits the WebSocket close event with no deadline; if the remote end never sends a close frame, the caller hangs indefinitely
  • Add a 30-second timeout via Promise.race so the close path always completes
  • Clear the timer after resolution to avoid keeping the event loop alive

Failure scenario: Called in the error cleanup path of WebSocketTransport._connect() (line 128). If the initial connection fails but the WebSocket never emits a close event (e.g., network partition, unresponsive remote), closeAndWait blocks forever, preventing the connection error from propagating to the caller.

Origin: closeAndWait was introduced in #2350 (2020-05-29) by Dmitry Gozman as part of the Progress concept.

SebTardif added 2 commits May 9, 2026 06:01
closeAndWait awaits the WebSocket close event with no deadline.
If the remote end never sends a close frame, the caller hangs
indefinitely. Add a 30-second timeout via Promise.race so the
close path always completes.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
Clear the setTimeout timer after Promise.race resolves to avoid
keeping the event loop alive unnecessarily. Also remove the
parameter to keep the public signature unchanged.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
@pavelfeldman
Copy link
Copy Markdown
Member

Let's add a test that demonstrates the issue (fails w/o the fix).

Add two tests demonstrating the fix:
- closeAndWait completes within timeout when server never sends close
- closeAndWait returns immediately when already closed

Re-add timeoutMs parameter (default 30s) so tests can use a short
timeout without waiting 30 seconds.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
@SebTardif
Copy link
Copy Markdown
Contributor Author

Added two tests in tests/library/transport.spec.ts:

  1. closeAndWait should not hang when server does not send close frame - creates a WebSocket server that never closes its end, verifies closeAndWait completes within the timeout instead of hanging forever
  2. closeAndWait should return immediately when already closed - verifies the early-return path when the socket is already closed

Re-added the timeoutMs parameter (default 30s) so the test can use a short timeout (1s) without slowing down CI.

@pavelfeldman
Copy link
Copy Markdown
Member

These pass for me w/o the fix.

Running 2 tests using 2 workers

  ✓  1 [chromium-library] › tests/library/transport.spec.ts:39:5 › closeAndWait should return immediately when already closed (28ms)
  ✓  2 [chromium-library] › tests/library/transport.spec.ts:23:5 › closeAndWait should not hang when server does not send close frame (23ms)

@SebTardif
Copy link
Copy Markdown
Contributor Author

You're right, the ws library already has a built-in 30-second close timeout (closeTimeout = 30 * 1000 in websocket.js) that destroys the socket if the closing handshake doesn't complete. Our timeout is redundant with that protection, and we can't write a test that demonstrates a real hang. Closing.

@SebTardif SebTardif closed this May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants