Skip to content

fix(runtime): avoid bounded dispatch channel stalls#5015

Draft
NathanFlurry wants to merge 1 commit into
counter-latency/sqlite-generation-fencefrom
counter-latency/unbounded-dispatch-channels
Draft

fix(runtime): avoid bounded dispatch channel stalls#5015
NathanFlurry wants to merge 1 commit into
counter-latency/sqlite-generation-fencefrom
counter-latency/unbounded-dispatch-channels

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026


Code Review: fix(runtime): avoid bounded dispatch channel stalls

Overview

This PR converts bounded mpsc::channel(capacity) channels to mpsc::unbounded_channel() across several packages to eliminate task stalls caused by .await-ing on full channels. Files affected:

  • pegboard-gateway / pegboard-gateway2 — per-request in-flight message channels (128 capacity)
  • universaldb (postgres + rocksdb) — transaction command channels (100 capacity)
  • rivetkit-python/client — actor event channels (100 capacity)
  • udb/cli.rs — UDB migration changelog channel (10,000 capacity)
  • guard-core/tests — test helper (benign)

The change is mechanically correct and the .await removals follow naturally. A few design and code quality points worth discussing.


Issues

1. forward_tunnel_message is dead-async after the change (pegboard-gateway2/shared_state.rs:854)

The function is still declared async fn but no longer has any .await points in its body after switching to the unbounded sender. Callers still .await it unnecessarily. It should be simplified to a plain fn.

2. CLAUDE.md pattern: prefer try_reserve + actor.overloaded over unbounded channels

CLAUDE.md says: "Actor-owned lifecycle / dispatch / lifecycle-event inbox producers use try_reserve helpers and return actor.overloaded. Do not await bounded mpsc::Sender::send."

The prescribed fix for "do not await a bounded send" is Sender::try_reserve() returning an actor.overloaded error when the buffer is full — not making the channel unbounded. That preserves backpressure and gives clients a meaningful signal. Distinguishing channels by role:

Channel Actor inbox? Verdict
pegboard-gateway in-flight request msg Yes (proxies actor traffic) Should use try_reserve pattern
pegboard-gateway2 in-flight request msg Yes Should use try_reserve pattern
universaldb transaction commands Internal async pipeline Unbounded acceptable
rivetkit-python actor events Actor event delivery Should use try_reserve pattern
udb/cli.rs migration changelog Offline migration tool Unbounded acceptable

3. Unbounded memory growth risk for gateway channels

The gateway channels route live per-request WebSocket traffic. With a slow or stalled consumer task, messages will now accumulate without bound. The old 128-message cap was a safety valve against misbehaving or slow clients causing unbounded memory growth in the gateway process. At minimum, add a comment near each unbounded_channel() call explaining what external constraint prevents runaway growth (e.g. WebSocket flow control, connection-level timeouts, etc.).


Minor

  • guard-core test change: _message_rx was already unused, no real impact — fine as cleanup.
  • udb/cli.rs change from 10,000 to unbounded is reasonable for an offline migration CLI.
  • All .await removals are mechanically correct and follow directly from the type change.

@MasterPtato MasterPtato force-pushed the counter-latency/sqlite-generation-fence branch from 71dd250 to becf8f8 Compare May 19, 2026 19:03
@MasterPtato MasterPtato force-pushed the counter-latency/unbounded-dispatch-channels branch from a499917 to da0205b Compare May 19, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant