dwc2: preserve EP0 status completion before SETUP#3634
Conversation
On STM32 DWC2, SETUP phase done and EP0 OUT transfer complete can be reported together. Processing SETUP first can overwrite control state before the previous zero-length OUT status stage is acknowledged, which causes DFU DNLOAD/GETSTATUS traffic to lose the status ACK and stall. Queue the EP0 OUT zero-length transfer completion before queuing the SETUP event when the endpoint has no pending OUT data and total_len is zero. This keeps TinyUSB control-transfer ordering intact for the combined interrupt case.
There was a problem hiding this comment.
Pull request overview
This PR adjusts the DWC2 (STM32) device controller driver to preserve TinyUSB control-transfer event ordering when DWC2 reports EP0 OUT transfer completion and SETUP phase done in the same interrupt, avoiding loss of the EP0 OUT status-stage ACK (notably impacting DFU DNLOAD/GETSTATUS sequences).
Changes:
- Detect the combined
setup_phase_done + xfer_completeinterrupt case on EP0 OUT with zero total length. - Enqueue an EP0 OUT zero-length
xfer_completeevent before enqueuing the SETUP event (in both slave and DMA paths).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Track when an EP0 OUT zero-length transfer is actually armed and require that state before synthesizing a status-stage completion ahead of a co-reported SETUP event. This preserves the validated status-before-SETUP ordering fix while avoiding stale zero-length state from producing spurious EP0 OUT completions.
|
| target | .text | .rodata | .data | .bss | total | % diff |
|---|---|---|---|---|---|---|
| xmc4500_relax/hid_generic_inout | 12,964 → 12,900 (-64) | — | — | — | 25,980 → 25,852 (-128) | -0.5% |
| at_start_f435/dfu_runtime | 11,336 → 11,276 (-60) | — | — | — | 12,372 → 12,312 (-60) | -0.5% |
| at_start_f402/dfu_runtime | 11,908 → 11,848 (-60) | — | — | — | 12,920 → 12,860 (-60) | -0.5% |
| xmc4500_relax/hid_multiple_interface | 13,776 → 13,712 (-64) | — | — | — | 27,604 → 27,476 (-128) | -0.5% |
| xmc4500_relax/hid_boot_interface | 13,796 → 13,732 (-64) | — | — | — | 27,644 → 27,516 (-128) | -0.5% |
| sipeed_longan_nano/dfu_runtime | 12,616 → 12,556 (-60) | 594 → 590 (-4) | — | — | 13,878 → 13,814 (-64) | -0.5% |
| at_start_f435/hid_generic_inout | 12,400 → 12,340 (-60) | — | — | — | 13,264 → 13,204 (-60) | -0.5% |
| at_start_f402/hid_generic_inout | 12,972 → 12,912 (-60) | — | — | — | 13,812 → 13,752 (-60) | -0.4% |
| xmc4500_relax/cdc_dual_ports | 14,972 → 14,908 (-64) | — | — | — | 29,996 → 29,868 (-128) | -0.4% |
| xmc4500_relax/printer_to_cdc | 15,032 → 14,968 (-64) | — | — | — | 30,112 → 29,984 (-128) | -0.4% |
|
Thank you, with the test method in #3643 DFU has no more error. Are you still working on it ? |
|
Nice to hear! I have it currently paused, need to finish a MVP product first, then I can test in more detail and look further into it, might take a week or two. But I can briefly try with the PR branch you linked on Monday if our functionality on STM32U5 with DFU still works and report back. |
|
@HiFiPhile I now had time to take a look: Using the branch from #3643, DFU works. But, the fix is not in this PR, I think the actual fix is in previous #3640 with DWC2 changes, using commit |
|
Well, I now realised my dfu-util I tried with different USB hubs and cables, but currently even the old commit Since I don't have a USB sniffer to actually look into the protocol details, I can only test functionality. I'll see if I can find out what needs to change to trigger the DFU failure again. Maybe with delaying the USB interrupt handling within my application I could force the error again. |
|
please allow maintainer write access to PR, I made a few simplifications. |
|
Due to the PR being created from an organization, it's not possible to give PR write access, but I've invited you with write rights to the fork. You may change it there, or if you better want to create a new clean PR with your suggested implementation, thats also fine for me. I have now created a minimal example for STM32U5G9J-DK2 board and tested different commits from before my fix, with my fix or with your fix, but the results are not conclusive: Only broadly delaying the interrupt handling is not a good test: https://github.com/HKM-Messtechnik/tinyusb-stm32u5-dfu-test/tree/main/logs |
|
Update: There was a sniffer PING filter issue 🤦 The failure was due to in tinyusb/src/portable/synopsys/dwc2/dcd_dwc2.c Line 964 in 4c87db3 Please try latest commit, DMA should not be affected. |
d959ef1 to
cd4ca10
Compare
Signed-off-by: HiFiPhile <admin@hifiphile.com>
cd4ca10 to
c240169
Compare
|
My existing test was insufficient and the results not very conclusive, but I have now updated my test repo with a more targeted test: Instead of broadly delaying every USB IRQ, the repro now wraps Test matrix: DFU, Results:
For failing runs the trace shows This matches your analysis: my guarded I have not looked into any DMA details myself. |
Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
|
With e520cff I've further simplified the fix. Thanks to your test setup, I've checked my DMA fix
|
|
Sorry for the delayed response now. And have you tested with changing Maybe it could make sense to add a test to the regular HIL, to not only test basic DFU functionality but also have one fixed stress test baseline and prevent possible future regression? I guess a single delayed test could already be sufficient and shouldn't take too much time. |
Yep that's how I tested DMA with your setup, have maybe 1GB data transferred. The DCACHE on U5 is only for external memory like FMC. Actually I'm working on move transfer management into
Already working on F4 (GenID 2.81A) but not yet onwards. |
- GRXSTSP register has internal FIFO, receiving events won't mix up (STATUS OUT & next SETUP) - Improve efficiency, remove 2nd IRQ overhead Signed-off-by: HiFiPhile <admin@hifiphile.com>
Get it done and tested on F407 (2.81A), F746 (3.20A), F723 (3.30A), ESP32S3/ESP32P4 (4.00A), U5A5 (4.11A) with major examples. |
Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
To avoid STATUS IN completion of previous control transfer treated as next DATA IN when IRQ latency is high. Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
867e775 to
08381d4
Compare
|
Nice, thanks for the work on this! |
|
Thank you for your professionalism ! |






Problem
This started from a timing-sensitive STM32 DWC2 EP0 failure observed during DFU transfers.
With enough interrupt latency, the host could complete the status stage of one control transfer and send the next SETUP before the device driver has processed all pending DWC2 events. TinyUSB could then report EP0 events to the control/DFU state machine in the wrong order.
The visible DFU symptom for me was that the next
DNLOADSETUP could arrive while DFU was still in the previous transfer state, causing the request to be rejected/stalled anddfu-utilto reportLIBUSB_ERROR_PIPE.A related DMA-mode race was also discussed in #3646: EP0 OUT could be re-primed for the next SETUP before the previous status OUT completion accounting consumed the old
DOEPTSIZ, leading to incorrect transferred length accounting.Final Approach
The final merged fix handles DWC2 EP0 ordering closer to the hardware event source than my first attempts.
For slave mode, OUT transfer management is moved into
RXFLVL/GRXSTSPhandling. TheGRXSTSPreceive-status FIFO preserves the order of RX events, so STATUS OUT and the next SETUP are no longer mixed through later OUT endpoint interrupt handling.The final DWC2 changes include:
SETUP_RXvsSETUP_DONEhandle_epout_slave()dma_setup_prepare()is not called from EP0 IN completion before OUT/status accounting is completeValidation
A minimal STM32U5G9J-DK2 DFU-only repro was used with a targeted EP0 status stress test.
The test wraps
dcd_edpt_xfer()and delays only after arming EP0 OUT status ZLP. It also traces TinyUSB DCD/DFU event ordering so failures can be checked for:DNLOADreceived while EP0 status is still pendingDNLOADrejectedFor failing runs, the trace showed SETUP-before-status ordering and DFU
DNLOADrejection while EP0 status was still pending.The DMA path was also tested with the same setup by enabling
CFG_TUD_DWC2_DMA_ENABLE.The final RXFLVL-based implementation was additionally tested by @HiFiPhile on multiple DWC2 revisions / platforms.