Skip to content

dcd/musb: harden & refactor the EP0 control state machine (review follow-up for #3643)#3699

Merged
HiFiPhile merged 17 commits into
musb_ep0_racefrom
claude/musb-ep0-review-fixes
Jun 16, 2026
Merged

dcd/musb: harden & refactor the EP0 control state machine (review follow-up for #3643)#3699
HiFiPhile merged 17 commits into
musb_ep0_racefrom
claude/musb-ep0-review-fixes

Conversation

@hathach

@hathach hathach commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Follow-up to the review of #3643 (the MUSB EP0 deferred-SETUP feature). It started as fixes for the 10 review findings and grew into a correctness + clarity rework of the EP0 control state machine. Base: musb_ep0_race.

Correctness — EP0 deferred-SETUP races (the #3643 review findings)

  • IRQ-first replay race: split STATUS_OUT_PENDING into _XFER / _IRQ so the deferral never completes+replays while usbd's edpt0_xfer(STATUS OUT) is still outstanding (was a NULL pipe0.buf drain).
  • Stale RXRDY re-entry: new rxrdy_consumed flag — RxPktRdy left set for an already-drained packet gates the guaranteed second process_ep0 pass; replayed IN requests stay parked until the edpt0_xfer(DATA IN) arm acks them.
  • DATAEND guard ordering: check SentStall/SetupEnd first (§21.1.5), and evaluate the DATAEND guard after the RXRDY block so a coalesced DATAEND|RXRDY can't swallow a SETUP on cores where the CPU-set-only DataEnd reads back 1.
  • Stall vs ACKed SETUP: dcd_edpt_stall(EP0 OUT) replays the deferred SETUP instead of SendStall-ing the host's already-ACKed next request.

Correctness — short control packets

  • End the EP0 IN data stage on the last packet — wLength met or a short packet (incl. ZLP) — not only when remain_wlength == 0. A short IN response (e.g. an 18-byte device descriptor to a 64-byte GET_DESCRIPTOR) otherwise left pipe0 stuck in DATA_IN through the status stage.
  • Mirror it on the OUT drain, and clamp remain_wlength -= tu_min16(count0, remain_wlength) against a host overrun.

Refactor / clarity (behavior-neutral)

  • pipe0_state_t typedef + local pipe0 pointer; group struct fields.
  • _isr suffix on the ISR-context handlers (process_ep0_isr, process_epin_isr, process_epout_isr, process_bus_reset_isr).
  • Helpers: pipe0_read_setup() (single-copy SETUP read into uint32_t[2]), pipe0_data_stage_done() (the one last-packet predicate), and pipe0_process_xfer_state_isr() unifying the two EP0 tail paths; pipe0_try_deferred_setup() rename.
  • Drop the goto, restore/repair load-bearing comments.

HIL

  • hil: make serial write timeout fatal — pyserial loses the partial-write count on a SerialTimeoutException, so retrying duplicated bytes; write once and fail on timeout.

Verification

  • HIL full device suite: ek_tm4c123gxl 13/13, max32666fthr 13/13, including dcd/musb: defer EP0 SETUP during DATA_IN/STATUS race #3643's high-CPU-load IRQ-toggle coalescing stress (which exercises the deferred-SETUP/short-packet paths).
  • pre-commit run --all-files clean (format, codespell, Ceedling unit tests); full example sets build for both MUSB families.
  • Code size (max32666fthr device/dfu, MinSizeRel): dcd_musb.c .text +~180 B vs musb_ep0_race.

A separate buffer-overflow hardening found during this review (usbd copies an OUT data packet that overruns wLength into the caller buffer) is split out as #3705 since it's in the shared control handler, not MUSB-specific.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 12, 2026 10:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfde577240

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/hil/hil_test.py Outdated
ser.reset_input_buffer()
for ch in 'cat README.TXT\r':
serial_write_all(ser, ch.encode())
ser.flush() # drain to the wire so the per-char pacing below actually paces the device

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound serial flushes used for pacing

In the host MSC shell test, if the CDC target stops draining after a byte is queued, ser.write() can return but this new ser.flush() calls pyserial's unbounded drain path and does not honor HIL_SERIAL_WRITE_TIMEOUT, so the pytest worker can hang here instead of failing after the configured serial-write timeout. This reintroduces the unbounded serial-write hang for these per-character loops (also in the dd 1024 loop below); keep the pacing without flush() or wrap the drain in an explicit timeout.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — tcdrain is unbounded and ignores write_timeout, so it would reintroduce the hang serial_write_all exists to prevent. Per maintainer direction the flush is dropped entirely rather than bounded: the per-character loops keep their existing 2 ms inter-byte sleep, which already spaces single-byte writes on the wire (coalescing needs two bytes queued within one ~1 ms USB frame, which the sleep prevents). Net result — the per-char loops are unchanged from #3643; only serial_write_all becomes single-write-timeout-fatal. Fixed in 95d6b75.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a follow-up to #3643 that further hardens the MUSB EP0 control-transfer state machine against deferred-SETUP race conditions, and adjusts HIL serial I/O to avoid data duplication on write timeouts while restoring per-character pacing where required.

Changes:

  • Refines EP0 STATUS_OUT deferral handling by splitting pending states and introducing an rxrdy_parked gate to prevent stale RXRDY re-entry/replay races.
  • Refactors EP0 SETUP FIFO draining into a shared helper (pipe0_read_setup) and adjusts CSRL0 guard ordering per MUSB spec guidance.
  • Updates HIL serial write behavior to treat write timeouts as fatal (no retry) and restores flush() in specific per-character pacing loops.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/portable/mentor/musb/dcd_musb.c Reworks EP0 state transitions and deferred SETUP replay logic to close IRQ/xfer ordering races; adds RXRDY parking mechanism and shared SETUP read helper.
test/hil/hil_test.py Simplifies serial write to avoid retry-based duplication on timeouts and restores flush-based pacing for specific per-character command loops.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/portable/mentor/musb/dcd_musb.c
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

TinyUSB Average Code Size Metrics

File .text .rodata .data .bss size %
ehci.c 2763 0 0 6274 7783 4.4%
midi2_host.c 1802 0 0 5921 7723 4.3%
cdc_host.c 6399 487 15 971 7594 4.3%
usbh.c 4967 57 82 1165 6238 3.5%
ncm_device.c 1754 28 815 4393 6160 3.5%
hcd_dwc2.c 5070 25 1 545 5640 3.2%
video_device.c 4432 5 1235 480 4904 2.8%
midi_host.c 1339 7 7 3538 4880 2.7%
audio_device.c 2890 0 1259 1623 4508 2.5%
dcd_dwc2.c 4223 19 0 265 4506 2.5%
ohci.c 1925 0 0 2503 4428 2.5%
dcd_ch32_usbfs.c 1582 0 0 2444 4026 2.3%
usbd.c 3515 57 90 355 3933 2.2%
ecm_rndis_device.c 1059 0 1 2759 3818 2.1%
hcd_stm32_fsdev.c 3248 0 1 420 3670 2.1%
dcd_ft9xx.c 3284 0 0 172 3456 1.9%
msc_device.c 2513 108 2281 804 3318 1.9%
dcd_khci.c 1952 0 0 1290 3242 1.8%
dcd_nrf5x.c 2939 0 0 292 3231 1.8%
hcd_musb.c 3071 0 0 157 3228 1.8%
dcd_ci_fs.c 1924 0 0 1290 3214 1.8%
midi2_device.c 2628 52 1175 561 3211 1.8%
dcd_da146xx.c 3067 0 0 144 3211 1.8%
hcd_rusb2.c 2923 0 0 245 3168 1.8%
dcd_rusb2.c 2918 0 0 156 3074 1.7%
hcd_ch32_usbfs.c 2491 0 0 502 2993 1.7%
hcd_khci.c 2443 0 0 454 2897 1.6%
dcd_stm32_fsdev.c 2553 0 0 291 2844 1.6%
dcd_musb.c 2595 0 0 179 2773 1.6%
dcd_mm32f327x_otg.c 1477 0 0 1290 2767 1.6%
hcd_samd.c 2220 0 0 324 2544 1.4%
usbtmc_device.c 2194 24 68 313 2539 1.4%
dcd_ci_hs.c 1756 0 0 1344 2534 1.4%
dcd_eptri.c 2273 0 0 259 2532 1.4%
hid_host.c 1241 0 0 1270 2511 1.4%
dcd_ch32_usbhs.c 1892 0 0 481 2373 1.3%
hcd_rp2040.c 1996 17 4 321 2338 1.3%
mtp_device.c 1713 22 743 589 2309 1.3%
dcd_rp2040.c 840 0 764 653 2257 1.3%
msc_host.c 1633 0 0 395 2028 1.1%
dcd_msp430x5xx.c 1801 0 0 176 1977 1.1%
cdc_device.c 1236 16 1092 728 1963 1.1%
dcd_lpc17_40.c 1481 0 0 648 1805 1.0%
midi_device.c 1149 0 1007 619 1765 1.0%
dcd_nuc505.c 0 0 1533 157 1690 0.9%
dcd_lpc_ip3511.c 1463 0 0 264 1683 0.9%
hub.c 1384 8 8 30 1419 0.8%
dcd_samg.c 1322 0 0 72 1394 0.8%
printer_device.c 828 0 706 560 1387 0.8%
dcd_samd.c 1036 0 0 266 1302 0.7%
dcd_nuc121.c 1170 0 0 101 1270 0.7%
hid_device.c 1123 44 997 119 1242 0.7%
vendor_device.c 639 0 534 559 1197 0.7%
dcd_nuc120.c 1096 0 0 78 1174 0.7%
rp2040_usb.c 386 35 619 11 1051 0.6%
dfu_device.c 776 28 712 136 912 0.5%
tusb_fifo.c 855 0 486 0 850 0.5%
typec_stm32.c 820 8 2 12 842 0.5%
dwc2_common.c 603 22 0 0 615 0.3%
usbc.c 420 2 20 166 608 0.3%
hcd_pio_usb.c 262 0 240 0 502 0.3%
tusb.c 455 0 387 3 457 0.3%
hcd_ci_hs.c 181 0 0 0 181 0.1%
fsdev_common.c 180 0 0 0 180 0.1%
rusb2_common.c 160 0 16 0 176 0.1%
dfu_rt_device.c 157 0 134 0 157 0.1%
TOTAL 124487 1071 17034 52137 178202 100.0%
Input files
  • cmake-build/cmake-build-adafruit_clue/metrics.json
  • cmake-build/cmake-build-apard32690/metrics.json
  • cmake-build/cmake-build-at32f403a_weact_blackpill/metrics.json
  • cmake-build/cmake-build-at_start_f402/metrics.json
  • cmake-build/cmake-build-at_start_f413/metrics.json
  • cmake-build/cmake-build-at_start_f415/metrics.json
  • cmake-build/cmake-build-at_start_f423/metrics.json
  • cmake-build/cmake-build-at_start_f425/metrics.json
  • cmake-build/cmake-build-at_start_f435/metrics.json
  • cmake-build/cmake-build-at_start_f455/metrics.json
  • cmake-build/cmake-build-b_g474e_dpow1/metrics.json
  • cmake-build/cmake-build-b_u585i_iot2a/metrics.json
  • cmake-build/cmake-build-ch32f205r-r0/metrics.json
  • cmake-build/cmake-build-ch32v103c_bluepill/metrics.json
  • cmake-build/cmake-build-ch32v203c_r0_1v0/metrics.json
  • cmake-build/cmake-build-ch32v307v_r1_1v0/metrics.json
  • cmake-build/cmake-build-cynthion_d11/metrics.json
  • cmake-build/cmake-build-da14695_dk_usb/metrics.json
  • cmake-build/cmake-build-double_m33_express/metrics.json
  • cmake-build/cmake-build-ea4088_quickstart/metrics.json
  • cmake-build/cmake-build-ea4357/metrics.json
  • cmake-build/cmake-build-ek_tm4c123gxl/metrics.json
  • cmake-build/cmake-build-feather_stm32f405/metrics.json
  • cmake-build/cmake-build-fomu/metrics.json
  • cmake-build/cmake-build-frdm_k32l2a4s/metrics.json
  • cmake-build/cmake-build-frdm_k64f/metrics.json
  • cmake-build/cmake-build-frdm_kl25z/metrics.json
  • cmake-build/cmake-build-frdm_mcxa153/metrics.json
  • cmake-build/cmake-build-frdm_rw612/metrics.json
  • cmake-build/cmake-build-hpm6750evk2/metrics.json
  • cmake-build/cmake-build-lpcxpresso11u37/metrics.json
  • cmake-build/cmake-build-lpcxpresso1347/metrics.json
  • cmake-build/cmake-build-lpcxpresso1549/metrics.json
  • cmake-build/cmake-build-lpcxpresso1769/metrics.json
  • cmake-build/cmake-build-lpcxpresso18s37/metrics.json
  • cmake-build/cmake-build-lpcxpresso51u68/metrics.json
  • cmake-build/cmake-build-lpcxpresso54114/metrics.json
  • cmake-build/cmake-build-metro_m0_express/metrics.json
  • cmake-build/cmake-build-metro_m4_express/metrics.json
  • cmake-build/cmake-build-metro_m7_1011/metrics.json
  • cmake-build/cmake-build-mm32f327x_mb39/metrics.json
  • cmake-build/cmake-build-mm900evxb/metrics.json
  • cmake-build/cmake-build-msp_exp430f5529lp/metrics.json
  • cmake-build/cmake-build-msp_exp432e401y/metrics.json
  • cmake-build/cmake-build-nutiny_nuc126v/metrics.json
  • cmake-build/cmake-build-nutiny_sdk_nuc120/metrics.json
  • cmake-build/cmake-build-nutiny_sdk_nuc121/metrics.json
  • cmake-build/cmake-build-nutiny_sdk_nuc505/metrics.json
  • cmake-build/cmake-build-portenta_c33/metrics.json
  • cmake-build/cmake-build-raspberry_pi_pico/metrics.json
  • cmake-build/cmake-build-raspberrypi_cm4/metrics.json
  • cmake-build/cmake-build-raspberrypi_zero/metrics.json
  • cmake-build/cmake-build-samg55_xplained/metrics.json
  • cmake-build/cmake-build-sipeed_longan_nano/metrics.json
  • cmake-build/cmake-build-stlinkv3mini/metrics.json
  • cmake-build/cmake-build-stm32c071nucleo/metrics.json
  • cmake-build/cmake-build-stm32c542nucleo/metrics.json
  • cmake-build/cmake-build-stm32f070rbnucleo/metrics.json
  • cmake-build/cmake-build-stm32f103_bluepill/metrics.json
  • cmake-build/cmake-build-stm32f207nucleo/metrics.json
  • cmake-build/cmake-build-stm32f303disco/metrics.json
  • cmake-build/cmake-build-stm32g0b1nucleo/metrics.json
  • cmake-build/cmake-build-stm32h503nucleo/metrics.json
  • cmake-build/cmake-build-stm32h743eval/metrics.json
  • cmake-build/cmake-build-stm32h7s3nucleo/metrics.json
  • cmake-build/cmake-build-stm32l052dap52/metrics.json
  • cmake-build/cmake-build-stm32l412nucleo/metrics.json
  • cmake-build/cmake-build-stm32n6570dk/metrics.json
  • cmake-build/cmake-build-stm32u083cdk/metrics.json
  • cmake-build/cmake-build-stm32wb55nucleo/metrics.json
  • cmake-build/cmake-build-stm32wba_nucleo/metrics.json
  • cmake-build/cmake-build-xmc4500_relax/metrics.json

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Hardware-in-the-loop (HIL) Test Report

hfp-iar

Board cdc_dual_ports dfu cdc_msc cdc_msc_throughput audio_test_freertos dfu_runtime cdc_msc_freertos hid_boot_interface msc_dual_lun hid_generic_inout printer_to_cdc midi_test mtp
stm32l412nucleo ✅ CDC 619k/396k MSC 796k/720k
stm32f746disco ✅ CDC 12.4M/9.1M MSC 14.7M/29.2M
stm32f746disco-DMA ✅ CDC 13.6M/9.2M MSC 18.3M/29.9M
lpcxpresso43s67 ✅ CDC 11M/10.7M MSC 23.5M/25.3M

Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run

hfp.json

Board cdc_dual_ports dfu cdc_msc cdc_msc_throughput audio_test_freertos dfu_runtime cdc_msc_freertos hid_boot_interface msc_dual_lun hid_generic_inout printer_to_cdc midi_test mtp
stm32l412nucleo ✅ CDC 629k/402k MSC 794k/775k
stm32f746disco ✅ CDC 12.2M/8.6M MSC 13.5M/28M
stm32f746disco-DMA ✅ CDC 12.9M/10.5M MSC 15.6M/31.4M
lpcxpresso43s67 ✅ CDC 12.3M/12.3M MSC 24.3M/24.7M

Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run

tinyusb.json

Board cdc_dual_ports dfu cdc_msc cdc_msc_throughput audio_test_freertos dfu_runtime cdc_msc_freertos hid_boot_interface msc_dual_lun hid_generic_inout printer_to_cdc midi_test mtp host_info_to_device_cdc cdc_msc_hid msc_file_explorer msc_file_explorer_freertos device_info hid_composite_freertos
ek_tm4c123gxl ✅ CDC 735k/780k MSC 1M/1M
espressif_p4_function_ev rd 409KB/s
espressif_p4_function_ev-DMA rd 409KB/s
espressif_s3_devkitm rd 409KB/s
espressif_s3_devkitm-DMA
feather_nrf52840_express ✅ CDC 488k/550k MSC 524k/511k
max32666fthr ✅ CDC 7M/14.3M MSC 17M/20.6M
metro_m4_express ✅ CDC 508k/511k MSC 565k/490k
mimxrt1015_evk ✅ CDC 11M/7.1M MSC 23.8M/17.5M
mimxrt1064_evk ✅ CDC 14.3M/7.1M MSC 25.4M/21.5M rd 1361KB/s rd 1339KB/s
lpcxpresso11u37 ✅ CDC 491k/299k MSC 393k/511k
ra4m1_ek ✅ CDC 508k/511k MSC 511k/511k
raspberry_pi_pico ✅ CDC 680k/496k MSC 845k/1M
raspberry_pi_pico_w rd 1106KB/s rd 1022KB/s
raspberry_pi_pico2 rd 1110KB/s rd 1022KB/s
adafruit_fruit_jam ✅ CDC 959k/556k MSC 996k/1M rd 62KB/s rd 62KB/s
stm32f072disco ✅ CDC 504k/303k MSC 488k/398k
stm32f407disco ✅ CDC 963k/552k MSC 1M/1M
stm32f723disco ✅ CDC 865k/778k MSC 889k/1M rd 17476KB/s rd 4096KB/s
stm32f723disco-DMA ✅ CDC 1M/598k MSC 1.1M/1M rd 19418KB/s rd 4096KB/s
stm32h743nucleo ✅ CDC 508k/511k MSC 537k/538k
stm32h743nucleo-DMA ✅ CDC 516k/344k MSC 576k/597k
stm32g0b1nucleo ✅ CDC 508k/511k MSC 552k/437k
stm32l476disco ✅ CDC 532k/560k MSC 552k/512k
stm32u083nucleo ✅ CDC 595k/511k MSC 888k/547k

Legend: ✅ pass · ❌ fail · ⚪ skipped · blank not run

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

MemBrowse Memory Report

Top 10 targets by memory change (%) (out of 2384 targets) View Project Dashboard →

target .text .rodata .data .bss total % diff
ek_tm4c123gxl/dfu_runtime 8,772 → 8,916 (+144) 8,804 → 8,948 (+144) +1.6%
ek_tm4c123gxl/hid_generic_inout 9,768 → 9,912 (+144) 9,800 → 9,944 (+144) +1.5%
msp_exp432e401y/dfu_runtime 8,816 → 8,948 (+132) 9,384 → 9,516 (+132) +1.4%
ek_tm4c123gxl/hid_multiple_interface 10,580 → 10,724 (+144) 10,612 → 10,756 (+144) +1.4%
ek_tm4c123gxl/hid_boot_interface 10,592 → 10,736 (+144) 10,624 → 10,768 (+144) +1.4%
ek_tm4c123gxl/hid_composite 10,756 → 10,900 (+144) 10,788 → 10,932 (+144) +1.3%
ek_tm4c123gxl/midi_test 10,900 → 11,044 (+144) 10,932 → 11,076 (+144) +1.3%
msp_exp432e401y/hid_generic_inout 9,868 → 10,000 (+132) 10,436 → 10,568 (+132) +1.3%
ek_tm4c123gxl/cdc_dual_ports 11,752 → 11,896 (+144) 11,784 → 11,928 (+144) +1.2%
ek_tm4c123gxl/printer_to_cdc 11,816 → 11,960 (+144) 11,844 → 11,988 (+144) +1.2%

@hathach

hathach commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

CI status: 315/317 pass. The two failures are infrastructure, not this PR:

  • hil-tinyusb (tinyusb.json) — all failures are the raspberry_pi_pico fixture's host-mode tests: the CH340 CDC device enumerates every time, the General UDisk (048d:04d2) on the pico's PIO-USB port never appears. Identical signature on Clean up PVS-Studio static-analysis findings for raspberry_pi_pico #3697 yesterday, on a local rig run, and on two CI runs of this PR (incl. a --failed rerun) — the stick looks genuinely wedged/loose rather than flaky, and there is no software power-cycle path to it (it hangs off the pico's own host port). Needs a physical reseat. Both MUSB boards this PR targets are fully green in the same job: ek_tm4c123gxl 13/13, max32666fthr 13/13 (dfu, cdc_msc, throughput, dual ports, printer_to_cdc, midi, mtp, hid, msc_dual_lun). The hfp.json HIL config also passes.
  • ci/circleci: build---one-random-make-nrf-large-arm-gcc — died in "Restore Toolchain Cache" → build canceled (CircleCI infra; nRF is untouched here). Needs a rerun from the CircleCI UI — no token on the rig to do it via API.

🤖 Generated with Claude Code

@hathach hathach force-pushed the claude/musb-ep0-review-fixes branch from cfde577 to 2c4b905 Compare June 12, 2026 17:22
@hathach

hathach commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

History rewritten into one commit per review finding (tree is byte-identical to the previously HIL-tested state; each intermediate commit compiles):

  • 6ecd5c6 dcd/musb: extract pipe0_read_setup() helper (l.507)
  • bf6f3fc dcd/musb: restore EP0 OUT RXRDY flow-control comment (l.116)
  • b7715a9 dcd/musb: replace deferral goto with per-state handling (l.523)
  • da749ee dcd/musb: check SentStall/SetupEnd before DATAEND guard (l.445)
  • ee3dd01 dcd/musb: fix deferred-SETUP replay racing usbd's status call (l.503)
  • 851ebe6 dcd/musb: gate stale EP0 RXRDY interrupts with rxrdy_parked (l.516)
  • 92da00c dcd/musb: replay deferred SETUP instead of stalling EP0 (l.860)
  • 2c4b905 hil: make serial write timeout fatal, restore per-char flush (l.257/264/800)

🤖 Generated with Claude Code

hathach added a commit that referenced this pull request Jun 13, 2026
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left
rxrdy_parked set if the aborted transfer had parked RXRDY via NAK flow
control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent
SETUP IRQ would then hit the parked-gate early return and be ignored,
relying on SentStall/SetupEnd to clear the flag first. Clear it here
so recovery never depends on that ordering.

Addresses Copilot review on #3699.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hathach added a commit that referenced this pull request Jun 13, 2026
The per-character CLI loops (cat README.TXT, dd 1024) used ser.flush()
to push each byte to the wire before the pacing sleep, but flush()
calls tcdrain which blocks unbounded and ignores write_timeout — a
wedged CDC target would hang the worker instead of failing, the exact
hang serial_write_all was added to avoid. Replace it with a bounded
drain that polls ser.out_waiting until the kernel TX queue empties or
SERIAL_WRITE_TIMEOUT elapses, preserving the pacing guarantee.

Addresses Codex review on #3699.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hathach and others added 9 commits June 13, 2026 23:34
The 8-byte EP0 SETUP drain (count0 assert + two FIFO word reads via a
union) was duplicated verbatim between the IDLE case and the deferral
case; a future fix applied to one copy but not the other would only
show up on the rare deferred-race path. Share one helper. count0 is
now read inside the only remaining user (DATA OUT drain).

Review follow-up for #3643 (dcd_musb.c l.507 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pre-existing comment explaining why the OUT branch does not ack
RxPktRdy was dropped when the SETUP handling moved into
pipe0_start_setup(). It is load-bearing: acking before edpt0_xfer()
arms the drain buffer would let the host send data with nowhere to
put it. Restore it with the databook-deviation rationale so the
branches don't get "unified" later.

Review follow-up for #3643 (dcd_musb.c l.116 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The goto jumped into the csrl==0 completion switch with RXRDY still
set, making its "When CSRL0 is zero" guard comment untrue on that
path. Handle each deferral state in a self-contained switch instead;
the csrl==0 switch is now only reached with csrl==0 and its comment
is truthful again. Behavior unchanged.

Review follow-up for #3643 (dcd_musb.c l.523 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
MUSBMHDRC 21.1.5 requires the EP0 service routine to check SentStall
and SetupEnd first; the early DATAEND return ran before both, and
SentStall is most likely to fire exactly while DataEnd may still read
back set (auto-STALL after DataEnd, 21.1.7), which would skip the
recovery. The guard also moves below the RXRDY block so a coalesced
DATAEND|RXRDY read cannot swallow a SETUP on cores where the
CPU-set-only DataEnd bit reads back 1; the comment documents the
vendor-dependent read-back.

Review follow-up for #3643 (dcd_musb.c l.445 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
STATUS_OUT_PENDING conflated "edpt0_xfer(STATUS OUT) called, awaiting
confirm IRQ" with "confirm IRQ seen, awaiting edpt0_xfer". The
deferral path completed the status and replayed the saved SETUP from
the ISR in both flavors; in the IRQ-first one, usbd's still-
outstanding edpt0_xfer(STATUS OUT) for the old transfer (queued via
status_stage_xact) then landed in the replayed transfer's state and
corrupted it: NULL pipe0.buf armed plus RXRDYC, so the host's next
DATA OUT drained through a NULL pointer. usbd processes EP0
XFER_COMPLETE events unconditionally, so nothing downstream defuses
it.

Split the state into STATUS_OUT_PENDING_XFER / _IRQ. The deferral
completes and replays only in PENDING_XFER (old transfer already
retired); in PENDING_IRQ it only holds the SETUP and the usbd-driven
edpt0_xfer fires the completion and replays. The DATA_IN deferral now
synthesizes PENDING_IRQ (its remain==0 invariant asserted: a SETUP
before DataEnd raises SetupEnd instead), which also makes the old
deferred-promotion in the csrl==0 DATA_IN case unreachable - dropped.
Assert the drain buffer before the DATA OUT FIFO read as a cheap
backstop for this corruption class.

Review follow-up for #3643 (dcd_musb.c l.503 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The deferral path drains the SETUP but leaves RxPktRdy set, and the
SETUP's IRQ latches after the ISR's clear-on-read intr_tx read - so a
second process_ep0 pass (same ISR, via the intr_tx re-read merge) is
guaranteed and misreads the leftovers: count0==0 fires a spurious
DATA OUT completion, the replay's RXRDYC write turns the second pass
into a phantom csrl==0 DATA IN completion, and a zero-length replay
re-enters the deferral case on a drained FIFO (count0 assert or
garbage saved as a SETUP). The registers cannot expose the staleness:
RxPktRdy and count0 read unchanged until ServicedRxPktRdy is written.

Track it in software: rxrdy_consumed means "RxPktRdy is set in hw but
its packet was already consumed". Set wherever a drained packet's
RXRDY is intentionally left set (OUT/zero-length flow-control parks,
every DATA OUT drain awaiting the next arm, the deferral path);
cleared at every RXRDYC write site (edpt0_xfer arms, dcd_set_address,
STALLED/SETEND recovery, bus reset). The RXRDY block returns early
while parked. Replayed IN requests skip the RXRDYC in
pipe0_start_setup and keep the packet parked until the
edpt0_xfer(DATA IN) arm acks it (before loading the shared FIFO), so
the stale pass sees RXRDY+parked instead of csrl==0. The normal IDLE
path is unchanged - master never re-entered these windows because the
single SETUP edge was always consumed by the pass that parked it; the
deferral is what introduced a pending second pass.

Review follow-up for #3643 (dcd_musb.c l.516 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
dcd_edpt_stall(EP0 OUT) discarded the deferred SETUP and armed
SendStall. A deferred SETUP can only exist once the old transfer's
status stage was seen on the wire, so the request usbd is rejecting
(class callback failing at CONTROL_STAGE_DATA) already succeeded
host-side and the hardware already ACKed the next SETUP - the STALL
would land on that innocent request, which then fails host-side
without any tud callback ever seeing it. Skip the stall and replay
the deferred SETUP; the rejected transfer needs no wire-level stall
since it is already over.

Review follow-up for #3643 (dcd_musb.c l.860 finding).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left
rxrdy_consumed set if the aborted transfer had parked RXRDY via NAK flow
control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent
SETUP IRQ would then hit the parked-gate early return and be ignored,
relying on SentStall/SetupEnd to clear the flag first. Clear it here
so recovery never depends on that ordering.

Addresses Copilot review on #3699.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pyserial's posix write() raises SerialTimeoutException after partial
progress with the byte count lost, so the retry loop re-sent from the
same offset and could duplicate bytes on the wire — surfacing as bogus
data-mismatch failures that look like device firmware bugs.
write_timeout is already a total per-call deadline, so the loop added
duplication risk without extending the budget: write once and treat a
timeout as fatal. Default bumped 2 -> 10 s to keep the old overall
bound; HIL_SERIAL_WRITE_DEADLINE removed.

The per-character CLI loops keep their existing pacing (the 2 ms sleep
between single-byte writes already spaces them on the wire); no
unbounded ser.flush()/tcdrain is added.

Review follow-up for #3643 (hil_test.py l.257/264 findings).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@hathach hathach force-pushed the claude/musb-ep0-review-fixes branch from 48470f4 to 95d6b75 Compare June 13, 2026 16:35
hathach added a commit that referenced this pull request Jun 13, 2026
The actual-STALL path (no deferred SETUP) forced EP0 to IDLE but left
rxrdy_parked set if the aborted transfer had parked RXRDY via NAK flow
control (e.g. a rejected OUT-data request in DATA_OUT). A subsequent
SETUP IRQ would then hit the parked-gate early return and be ignored,
relying on SentStall/SetupEnd to clear the flag first. Clear it here
so recovery never depends on that ordering.

Addresses Copilot review on #3699.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@HiFiPhile

Copy link
Copy Markdown
Collaborator

Tested ok with my DFU stress test.

@hathach

hathach commented Jun 14, 2026

Copy link
Copy Markdown
Owner Author

@HiFiPhile please dont merge this yet. I am still in the middle of reviewing this. Claude is doing good job so far but this still need some tweak.

@hathach hathach force-pushed the claude/musb-ep0-review-fixes branch from 95d6b75 to ff57edb Compare June 15, 2026 03:57
hathach and others added 5 commits June 15, 2026 14:53
Pure cleanup, no behavior change:
- Extract the EP0 control-transfer state into a named pipe0_state_t
  typedef instead of an anonymous nested struct, and access it through
  a local pipe0_state_t* in the functions that touch it repeatedly.
- Group the pipe0 fields so the two bools sit together and the larger
  tusb_control_request_t deferred_setup is last.
- Reword the deferral comments: "coalesced" -> "combined".

Note: separating the edpt0_xfer DATA_IN/DATA_OUT case (dispatch on
state instead of dir_in) was attempted and reverted — it breaks ADI
MUSB enumeration. usbd can arm the opposite-direction status while
pipe0 is still in a DATA state, and only dir-dispatch routes that
correctly; a comment on the combined case records this.

Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A short IN control response (device sends fewer bytes than wLength —
e.g. the 18-byte device descriptor answering a 64-byte GET_DESCRIPTOR)
left remain_wlength != 0, so the DATA_IN -> STATUS_OUT transition never
fired and pipe0 stayed in DATA_IN through the status stage. usbd then
armed the status-OUT while state was still DATA_IN. Set DATAEND and
transition on the last packet: remain_wlength == 0, or a short packet
(incl. a terminating ZLP) which ends the data stage.

With state now tracking the stage, split edpt0_xfer's DATA handling
into separate DATA_IN / DATA_OUT cases dispatching on state (asserting
state == call direction) instead of the combined dir_in branch.

Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each),
including the #3643 high-CPU-load IRQ-toggle coalescing stress.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rename process_ep0/process_epin/process_epout/process_bus_reset (all
invoked only from dcd_int_handler) to *_isr, making their ISR context
explicit at every call site. pipe0_process_deferred_setup is left
as-is since it also runs from task context (dcd_edpt_stall).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The deferral (RXRDY-combined) and csrl==0 tail paths in process_ep0_isr
ran the same per-state status-stage logic. Move all of it into one
pipe0_process_status_isr() helper covering every state including
DATA_IN, which picks STATUS_OUT vs STATUS_OUT_PENDING_IRQ from
deferred_setup_valid (a deferred SETUP means the status confirm was
coalesced with it). Both callers now just invoke the helper; the
deferral path saves the SETUP and sets deferred_setup_valid first.

Also drops the deferral path's TU_ASSERT(remain_wlength == 0), which
was wrong for a short last DATA-IN packet, and renames
pipe0_process_deferred_setup -> pipe0_try_deferred_setup (it no-ops
when nothing is deferred).

Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each),
including the #3643 high-CPU-load IRQ-toggle coalescing stress.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pipe0_read_setup() copied the FIFO into a local union, then copied that
into the caller's struct. Read the two FIFO words straight into the
caller's uint32_t[2] (one copy) and cast to tusb_control_request_t* in
pipe0_start_setup(). pipe0.deferred_setup becomes uint32_t[2] so the
deferral path reads directly into it as well.

Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@hathach

hathach commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d9468152c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/hil/hil_test.py
Mirror the IN-side short-packet fix on the OUT drain: end the data
stage (-> STATUS_IN) when wLength is received OR a short OUT packet
(count0 < CFG_TUD_ENDPOINT0_SIZE) signals the host's end-of-data, not
only when remain_wlength hits exactly 0. Also clamp the
remain_wlength subtraction so a host that overruns wLength can't
underflow it and strand the transfer.

Without this, a control-OUT whose host sends fewer bytes than wLength
left pipe0 in DATA_OUT; usbd then armed STATUS IN and tripped the
split's TU_ASSERT(!dir_in). Found by /code-review; conformant hosts
send exactly wLength so HIL was already green.

Verified: HIL pass on ek_tm4c123gxl and max32666fthr (13/13 each).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hathach and others added 2 commits June 15, 2026 23:26
…_isr

The helper advances the whole EP0 control state machine on a
completion/confirmation IRQ — it dispatches on pipe0->state and also
fires the DATA_IN completion, not just the status stage — so
"process_status" undersold it. Matches the process_*_isr family.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cleanup from a code-review pass, no behavior change:
- Replace the open-coded "last DATA packet" test (remain_wlength == 0 ||
  len < CFG_TUD_ENDPOINT0_SIZE), duplicated in the edpt0_xfer DATA IN
  arm, pipe0_process_xfer_state_isr, and the DATA OUT drain, with one
  inline pipe0_data_stage_done() so IN and OUT can't drift.
- Correct the xact_len comment (only the IN path reports it; OUT reports
  count0) and the dcd_edpt_stall comment (a deferred SETUP means the old
  transfer ended on the wire, not that its status stage was "seen").

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@hathach hathach changed the title dcd/musb: fix EP0 deferred-SETUP races + HIL serial write fixes (review follow-up for #3643) dcd/musb: fix EP0 deferred-SETUP races (review follow-up for #3643) Jun 16, 2026
@hathach

hathach commented Jun 16, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@hathach hathach changed the title dcd/musb: fix EP0 deferred-SETUP races (review follow-up for #3643) dcd/musb: harden & refactor the EP0 control state machine (review follow-up for #3643) Jun 16, 2026
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

Reviewed commit: ba3b2453e7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hathach

hathach commented Jun 16, 2026

Copy link
Copy Markdown
Owner Author

@HiFiPhile I am happy the pr-to-pr now, please review and let me know if your stress test still works.

@hathach hathach requested a review from HiFiPhile June 16, 2026 08:34

@HiFiPhile HiFiPhile left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and retested.

@HiFiPhile HiFiPhile merged commit e61c676 into musb_ep0_race Jun 16, 2026
544 of 547 checks passed
@HiFiPhile HiFiPhile deleted the claude/musb-ep0-review-fixes branch June 16, 2026 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants