dcd/musb: harden & refactor the EP0 control state machine (review follow-up for #3643)#3699
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfde577240
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ser.reset_input_buffer() | ||
| for ch in 'cat README.TXT\r': | ||
| serial_write_all(ser, ch.encode()) | ||
| ser.flush() # drain to the wire so the per-char pacing below actually paces the device |
There was a problem hiding this comment.
Bound serial flushes used for pacing
In the host MSC shell test, if the CDC target stops draining after a byte is queued, ser.write() can return but this new ser.flush() calls pyserial's unbounded drain path and does not honor HIL_SERIAL_WRITE_TIMEOUT, so the pytest worker can hang here instead of failing after the configured serial-write timeout. This reintroduces the unbounded serial-write hang for these per-character loops (also in the dd 1024 loop below); keep the pacing without flush() or wrap the drain in an explicit timeout.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Correct — tcdrain is unbounded and ignores write_timeout, so it would reintroduce the hang serial_write_all exists to prevent. Per maintainer direction the flush is dropped entirely rather than bounded: the per-character loops keep their existing 2 ms inter-byte sleep, which already spaces single-byte writes on the wire (coalescing needs two bytes queued within one ~1 ms USB frame, which the sleep prevents). Net result — the per-char loops are unchanged from #3643; only serial_write_all becomes single-write-timeout-fatal. Fixed in 95d6b75.
There was a problem hiding this comment.
Pull request overview
This PR is a follow-up to #3643 that further hardens the MUSB EP0 control-transfer state machine against deferred-SETUP race conditions, and adjusts HIL serial I/O to avoid data duplication on write timeouts while restoring per-character pacing where required.
Changes:
- Refines EP0 STATUS_OUT deferral handling by splitting pending states and introducing an
rxrdy_parkedgate to prevent stale RXRDY re-entry/replay races. - Refactors EP0 SETUP FIFO draining into a shared helper (
pipe0_read_setup) and adjusts CSRL0 guard ordering per MUSB spec guidance. - Updates HIL serial write behavior to treat write timeouts as fatal (no retry) and restores
flush()in specific per-character pacing loops.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/portable/mentor/musb/dcd_musb.c |
Reworks EP0 state transitions and deferred SETUP replay logic to close IRQ/xfer ordering races; adds RXRDY parking mechanism and shared SETUP read helper. |
test/hil/hil_test.py |
Simplifies serial write to avoid retry-based duplication on timeouts and restores flush-based pacing for specific per-character command loops. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
TinyUSB Average Code Size Metrics
Input files
|
Hardware-in-the-loop (HIL) Test Reporthfp-iar
Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run hfp.json
Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run tinyusb.json
Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run |
|
| target | .text | .rodata | .data | .bss | total | % diff |
|---|---|---|---|---|---|---|
| ek_tm4c123gxl/dfu_runtime | 8,772 → 8,916 (+144) | — | — | — | 8,804 → 8,948 (+144) | +1.6% |
| ek_tm4c123gxl/hid_generic_inout | 9,768 → 9,912 (+144) | — | — | — | 9,800 → 9,944 (+144) | +1.5% |
| msp_exp432e401y/dfu_runtime | 8,816 → 8,948 (+132) | — | — | — | 9,384 → 9,516 (+132) | +1.4% |
| ek_tm4c123gxl/hid_multiple_interface | 10,580 → 10,724 (+144) | — | — | — | 10,612 → 10,756 (+144) | +1.4% |
| ek_tm4c123gxl/hid_boot_interface | 10,592 → 10,736 (+144) | — | — | — | 10,624 → 10,768 (+144) | +1.4% |
| ek_tm4c123gxl/hid_composite | 10,756 → 10,900 (+144) | — | — | — | 10,788 → 10,932 (+144) | +1.3% |
| ek_tm4c123gxl/midi_test | 10,900 → 11,044 (+144) | — | — | — | 10,932 → 11,076 (+144) | +1.3% |
| msp_exp432e401y/hid_generic_inout | 9,868 → 10,000 (+132) | — | — | — | 10,436 → 10,568 (+132) | +1.3% |
| ek_tm4c123gxl/cdc_dual_ports | 11,752 → 11,896 (+144) | — | — | — | 11,784 → 11,928 (+144) | +1.2% |
| ek_tm4c123gxl/printer_to_cdc | 11,816 → 11,960 (+144) | — | — | — | 11,844 → 11,988 (+144) | +1.2% |
|
CI status: 315/317 pass. The two failures are infrastructure, not this PR:
🤖 Generated with Claude Code |
cfde577 to
2c4b905
Compare
|
History rewritten into one commit per review finding (tree is byte-identical to the previously HIL-tested state; each intermediate commit compiles):
🤖 Generated with Claude Code |
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left rxrdy_parked set if the aborted transfer had parked RXRDY via NAK flow control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent SETUP IRQ would then hit the parked-gate early return and be ignored, relying on SentStall/SetupEnd to clear the flag first. Clear it here so recovery never depends on that ordering. Addresses Copilot review on #3699. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The per-character CLI loops (cat README.TXT, dd 1024) used ser.flush() to push each byte to the wire before the pacing sleep, but flush() calls tcdrain which blocks unbounded and ignores write_timeout — a wedged CDC target would hang the worker instead of failing, the exact hang serial_write_all was added to avoid. Replace it with a bounded drain that polls ser.out_waiting until the kernel TX queue empties or SERIAL_WRITE_TIMEOUT elapses, preserving the pacing guarantee. Addresses Codex review on #3699. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The 8-byte EP0 SETUP drain (count0 assert + two FIFO word reads via a union) was duplicated verbatim between the IDLE case and the deferral case; a future fix applied to one copy but not the other would only show up on the rare deferred-race path. Share one helper. count0 is now read inside the only remaining user (DATA OUT drain). Review follow-up for #3643 (dcd_musb.c l.507 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pre-existing comment explaining why the OUT branch does not ack RxPktRdy was dropped when the SETUP handling moved into pipe0_start_setup(). It is load-bearing: acking before edpt0_xfer() arms the drain buffer would let the host send data with nowhere to put it. Restore it with the databook-deviation rationale so the branches don't get "unified" later. Review follow-up for #3643 (dcd_musb.c l.116 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The goto jumped into the csrl==0 completion switch with RXRDY still set, making its "When CSRL0 is zero" guard comment untrue on that path. Handle each deferral state in a self-contained switch instead; the csrl==0 switch is now only reached with csrl==0 and its comment is truthful again. Behavior unchanged. Review follow-up for #3643 (dcd_musb.c l.523 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
MUSBMHDRC 21.1.5 requires the EP0 service routine to check SentStall and SetupEnd first; the early DATAEND return ran before both, and SentStall is most likely to fire exactly while DataEnd may still read back set (auto-STALL after DataEnd, 21.1.7), which would skip the recovery. The guard also moves below the RXRDY block so a coalesced DATAEND|RXRDY read cannot swallow a SETUP on cores where the CPU-set-only DataEnd bit reads back 1; the comment documents the vendor-dependent read-back. Review follow-up for #3643 (dcd_musb.c l.445 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
STATUS_OUT_PENDING conflated "edpt0_xfer(STATUS OUT) called, awaiting confirm IRQ" with "confirm IRQ seen, awaiting edpt0_xfer". The deferral path completed the status and replayed the saved SETUP from the ISR in both flavors; in the IRQ-first one, usbd's still- outstanding edpt0_xfer(STATUS OUT) for the old transfer (queued via status_stage_xact) then landed in the replayed transfer's state and corrupted it: NULL pipe0.buf armed plus RXRDYC, so the host's next DATA OUT drained through a NULL pointer. usbd processes EP0 XFER_COMPLETE events unconditionally, so nothing downstream defuses it. Split the state into STATUS_OUT_PENDING_XFER / _IRQ. The deferral completes and replays only in PENDING_XFER (old transfer already retired); in PENDING_IRQ it only holds the SETUP and the usbd-driven edpt0_xfer fires the completion and replays. The DATA_IN deferral now synthesizes PENDING_IRQ (its remain==0 invariant asserted: a SETUP before DataEnd raises SetupEnd instead), which also makes the old deferred-promotion in the csrl==0 DATA_IN case unreachable - dropped. Assert the drain buffer before the DATA OUT FIFO read as a cheap backstop for this corruption class. Review follow-up for #3643 (dcd_musb.c l.503 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The deferral path drains the SETUP but leaves RxPktRdy set, and the SETUP's IRQ latches after the ISR's clear-on-read intr_tx read - so a second process_ep0 pass (same ISR, via the intr_tx re-read merge) is guaranteed and misreads the leftovers: count0==0 fires a spurious DATA OUT completion, the replay's RXRDYC write turns the second pass into a phantom csrl==0 DATA IN completion, and a zero-length replay re-enters the deferral case on a drained FIFO (count0 assert or garbage saved as a SETUP). The registers cannot expose the staleness: RxPktRdy and count0 read unchanged until ServicedRxPktRdy is written. Track it in software: rxrdy_consumed means "RxPktRdy is set in hw but its packet was already consumed". Set wherever a drained packet's RXRDY is intentionally left set (OUT/zero-length flow-control parks, every DATA OUT drain awaiting the next arm, the deferral path); cleared at every RXRDYC write site (edpt0_xfer arms, dcd_set_address, STALLED/SETEND recovery, bus reset). The RXRDY block returns early while parked. Replayed IN requests skip the RXRDYC in pipe0_start_setup and keep the packet parked until the edpt0_xfer(DATA IN) arm acks it (before loading the shared FIFO), so the stale pass sees RXRDY+parked instead of csrl==0. The normal IDLE path is unchanged - master never re-entered these windows because the single SETUP edge was always consumed by the pass that parked it; the deferral is what introduced a pending second pass. Review follow-up for #3643 (dcd_musb.c l.516 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
dcd_edpt_stall(EP0 OUT) discarded the deferred SETUP and armed SendStall. A deferred SETUP can only exist once the old transfer's status stage was seen on the wire, so the request usbd is rejecting (class callback failing at CONTROL_STAGE_DATA) already succeeded host-side and the hardware already ACKed the next SETUP - the STALL would land on that innocent request, which then fails host-side without any tud callback ever seeing it. Skip the stall and replay the deferred SETUP; the rejected transfer needs no wire-level stall since it is already over. Review follow-up for #3643 (dcd_musb.c l.860 finding). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left rxrdy_consumed set if the aborted transfer had parked RXRDY via NAK flow control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent SETUP IRQ would then hit the parked-gate early return and be ignored, relying on SentStall/SetupEnd to clear the flag first. Clear it here so recovery never depends on that ordering. Addresses Copilot review on #3699. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pyserial's posix write() raises SerialTimeoutException after partial progress with the byte count lost, so the retry loop re-sent from the same offset and could duplicate bytes on the wire — surfacing as bogus data-mismatch failures that look like device firmware bugs. write_timeout is already a total per-call deadline, so the loop added duplication risk without extending the budget: write once and treat a timeout as fatal. Default bumped 2 -> 10 s to keep the old overall bound; HIL_SERIAL_WRITE_DEADLINE removed. The per-character CLI loops keep their existing pacing (the 2 ms sleep between single-byte writes already spaces them on the wire); no unbounded ser.flush()/tcdrain is added. Review follow-up for #3643 (hil_test.py l.257/264 findings). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
48470f4 to
95d6b75
Compare
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left rxrdy_parked set if the aborted transfer had parked RXRDY via NAK flow control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent SETUP IRQ would then hit the parked-gate early return and be ignored, relying on SentStall/SetupEnd to clear the flag first. Clear it here so recovery never depends on that ordering. Addresses Copilot review on #3699. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Tested ok with my DFU stress test. |
|
@HiFiPhile please dont merge this yet. I am still in the middle of reviewing this. Claude is doing good job so far but this still need some tweak. |
95d6b75 to
ff57edb
Compare
Pure cleanup, no behavior change: - Extract the EP0 control-transfer state into a named pipe0_state_t typedef instead of an anonymous nested struct, and access it through a local pipe0_state_t* in the functions that touch it repeatedly. - Group the pipe0 fields so the two bools sit together and the larger tusb_control_request_t deferred_setup is last. - Reword the deferral comments: "coalesced" -> "combined". Note: separating the edpt0_xfer DATA_IN/DATA_OUT case (dispatch on state instead of dir_in) was attempted and reverted — it breaks ADI MUSB enumeration. usbd can arm the opposite-direction status while pipe0 is still in a DATA state, and only dir-dispatch routes that correctly; a comment on the combined case records this. Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A short IN control response (device sends fewer bytes than wLength — e.g. the 18-byte device descriptor answering a 64-byte GET_DESCRIPTOR) left remain_wlength != 0, so the DATA_IN -> STATUS_OUT transition never fired and pipe0 stayed in DATA_IN through the status stage. usbd then armed the status-OUT while state was still DATA_IN. Set DATAEND and transition on the last packet: remain_wlength == 0, or a short packet (incl. a terminating ZLP) which ends the data stage. With state now tracking the stage, split edpt0_xfer's DATA handling into separate DATA_IN / DATA_OUT cases dispatching on state (asserting state == call direction) instead of the combined dir_in branch. Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each), including the #3643 high-CPU-load IRQ-toggle coalescing stress. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rename process_ep0/process_epin/process_epout/process_bus_reset (all invoked only from dcd_int_handler) to *_isr, making their ISR context explicit at every call site. pipe0_process_deferred_setup is left as-is since it also runs from task context (dcd_edpt_stall). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The deferral (RXRDY-combined) and csrl==0 tail paths in process_ep0_isr ran the same per-state status-stage logic. Move all of it into one pipe0_process_status_isr() helper covering every state including DATA_IN, which picks STATUS_OUT vs STATUS_OUT_PENDING_IRQ from deferred_setup_valid (a deferred SETUP means the status confirm was coalesced with it). Both callers now just invoke the helper; the deferral path saves the SETUP and sets deferred_setup_valid first. Also drops the deferral path's TU_ASSERT(remain_wlength == 0), which was wrong for a short last DATA-IN packet, and renames pipe0_process_deferred_setup -> pipe0_try_deferred_setup (it no-ops when nothing is deferred). Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each), including the #3643 high-CPU-load IRQ-toggle coalescing stress. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pipe0_read_setup() copied the FIFO into a local union, then copied that into the caller's struct. Read the two FIFO words straight into the caller's uint32_t[2] (one copy) and cast to tusb_control_request_t* in pipe0_start_setup(). pipe0.deferred_setup becomes uint32_t[2] so the deferral path reads directly into it as well. Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d9468152c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Mirror the IN-side short-packet fix on the OUT drain: end the data stage (-> STATUS_IN) when wLength is received OR a short OUT packet (count0 < CFG_TUD_ENDPOINT0_SIZE) signals the host's end-of-data, not only when remain_wlength hits exactly 0. Also clamp the remain_wlength subtraction so a host that overruns wLength can't underflow it and strand the transfer. Without this, a control-OUT whose host sends fewer bytes than wLength left pipe0 in DATA_OUT; usbd then armed STATUS IN and tripped the split's TU_ASSERT(!dir_in). Found by /code-review; conformant hosts send exactly wLength so HIL was already green. Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…_isr The helper advances the whole EP0 control state machine on a completion/confirmation IRQ — it dispatches on pipe0->state and also fires the DATA_IN completion, not just the status stage — so "process_status" undersold it. Matches the process_*_isr family. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cleanup from a code-review pass, no behavior change: - Replace the open-coded "last DATA packet" test (remain_wlength == 0 || len < CFG_TUD_ENDPOINT0_SIZE), duplicated in the edpt0_xfer DATA IN arm, pipe0_process_xfer_state_isr, and the DATA OUT drain, with one inline pipe0_data_stage_done() so IN and OUT can't drift. - Correct the xact_len comment (only the IN path reports it; OUT reports count0) and the dcd_edpt_stall comment (a deferred SETUP means the old transfer ended on the wire, not that its status stage was "seen"). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@HiFiPhile I am happy the pr-to-pr now, please review and let me know if your stress test still works. |
HiFiPhile
left a comment
There was a problem hiding this comment.
Looks good and retested.
Follow-up to the review of #3643 (the MUSB EP0 deferred-SETUP feature). It started as fixes for the 10 review findings and grew into a correctness + clarity rework of the EP0 control state machine. Base:
musb_ep0_race.Correctness — EP0 deferred-SETUP races (the #3643 review findings)
STATUS_OUT_PENDINGinto_XFER/_IRQso the deferral never completes+replays while usbd'sedpt0_xfer(STATUS OUT)is still outstanding (was a NULLpipe0.bufdrain).rxrdy_consumedflag — RxPktRdy left set for an already-drained packet gates the guaranteed secondprocess_ep0pass; replayed IN requests stay parked until theedpt0_xfer(DATA IN)arm acks them.DATAEND|RXRDYcan't swallow a SETUP on cores where the CPU-set-only DataEnd reads back 1.dcd_edpt_stall(EP0 OUT)replays the deferred SETUP instead of SendStall-ing the host's already-ACKed next request.Correctness — short control packets
remain_wlength == 0. A short IN response (e.g. an 18-byte device descriptor to a 64-byteGET_DESCRIPTOR) otherwise leftpipe0stuck inDATA_INthrough the status stage.remain_wlength -= tu_min16(count0, remain_wlength)against a host overrun.Refactor / clarity (behavior-neutral)
pipe0_state_ttypedef + localpipe0pointer; group struct fields._isrsuffix on the ISR-context handlers (process_ep0_isr,process_epin_isr,process_epout_isr,process_bus_reset_isr).pipe0_read_setup()(single-copy SETUP read intouint32_t[2]),pipe0_data_stage_done()(the one last-packet predicate), andpipe0_process_xfer_state_isr()unifying the two EP0 tail paths;pipe0_try_deferred_setup()rename.goto, restore/repair load-bearing comments.HIL
hil: make serial write timeout fatal— pyserial loses the partial-write count on aSerialTimeoutException, so retrying duplicated bytes; write once and fail on timeout.Verification
pre-commit run --all-filesclean (format, codespell, Ceedling unit tests); full example sets build for both MUSB families.device/dfu, MinSizeRel):dcd_musb.c.text +~180 B vsmusb_ep0_race.A separate buffer-overflow hardening found during this review (usbd copies an OUT data packet that overruns wLength into the caller buffer) is split out as #3705 since it's in the shared control handler, not MUSB-specific.
🤖 Generated with Claude Code