Skip to content

dwc2: preserve EP0 status completion before SETUP#3634

Merged
HiFiPhile merged 12 commits into
hathach:masterfrom
HKM-Messtechnik:fix/Dwc2Stm32U5
May 30, 2026
Merged

dwc2: preserve EP0 status completion before SETUP#3634
HiFiPhile merged 12 commits into
hathach:masterfrom
HKM-Messtechnik:fix/Dwc2Stm32U5

Conversation

@Alex-Schaefer

@Alex-Schaefer Alex-Schaefer commented May 9, 2026

Copy link
Copy Markdown
Contributor

Problem

This started from a timing-sensitive STM32 DWC2 EP0 failure observed during DFU transfers.

With enough interrupt latency, the host could complete the status stage of one control transfer and send the next SETUP before the device driver has processed all pending DWC2 events. TinyUSB could then report EP0 events to the control/DFU state machine in the wrong order.

The visible DFU symptom for me was that the next DNLOAD SETUP could arrive while DFU was still in the previous transfer state, causing the request to be rejected/stalled and dfu-util to report LIBUSB_ERROR_PIPE.

A related DMA-mode race was also discussed in #3646: EP0 OUT could be re-primed for the next SETUP before the previous status OUT completion accounting consumed the old DOEPTSIZ, leading to incorrect transferred length accounting.

Final Approach

The final merged fix handles DWC2 EP0 ordering closer to the hardware event source than my first attempts.

For slave mode, OUT transfer management is moved into RXFLVL / GRXSTSP handling. The GRXSTSP receive-status FIFO preserves the order of RX events, so STATUS OUT and the next SETUP are no longer mixed through later OUT endpoint interrupt handling.

The final DWC2 changes include:

  • handling EP0 OUT completion from the RX status path
  • emitting SETUP from the RX status path, with GenID-specific handling for SETUP_RX vs SETUP_DONE
  • removing slave-mode OUT completion handling from handle_epout_slave()
  • processing IN endpoint interrupts before OUT/RXFLVL handling, avoiding stale STATUS IN completion being treated as a later DATA IN completion under high interrupt latency
  • adjusting DMA EP0 setup re-priming so dma_setup_prepare() is not called from EP0 IN completion before OUT/status accounting is complete

Validation

A minimal STM32U5G9J-DK2 DFU-only repro was used with a targeted EP0 status stress test.

The test wraps dcd_edpt_xfer() and delays only after arming EP0 OUT status ZLP. It also traces TinyUSB DCD/DFU event ordering so failures can be checked for:

  • SETUP emitted before EP0 status completion
  • DFU DNLOAD received while EP0 status is still pending
  • DFU DNLOAD rejected

For failing runs, the trace showed SETUP-before-status ordering and DFU DNLOAD rejection while EP0 status was still pending.

The DMA path was also tested with the same setup by enabling CFG_TUD_DWC2_DMA_ENABLE.

The final RXFLVL-based implementation was additionally tested by @HiFiPhile on multiple DWC2 revisions / platforms.

On STM32 DWC2, SETUP phase done and EP0 OUT transfer complete can be
reported together. Processing SETUP first can overwrite control state
before the previous zero-length OUT status stage is acknowledged, which
causes DFU DNLOAD/GETSTATUS traffic to lose the status ACK and stall.

Queue the EP0 OUT zero-length transfer completion before queuing the
SETUP event when the endpoint has no pending OUT data and total_len is
zero. This keeps TinyUSB control-transfer ordering intact for the
combined interrupt case.
Copilot AI review requested due to automatic review settings May 9, 2026 07:14
@Alex-Schaefer Alex-Schaefer marked this pull request as draft May 9, 2026 07:14

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts the DWC2 (STM32) device controller driver to preserve TinyUSB control-transfer event ordering when DWC2 reports EP0 OUT transfer completion and SETUP phase done in the same interrupt, avoiding loss of the EP0 OUT status-stage ACK (notably impacting DFU DNLOAD/GETSTATUS sequences).

Changes:

  • Detect the combined setup_phase_done + xfer_complete interrupt case on EP0 OUT with zero total length.
  • Enqueue an EP0 OUT zero-length xfer_complete event before enqueuing the SETUP event (in both slave and DMA paths).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/portable/synopsys/dwc2/dcd_dwc2.c Outdated
Comment thread src/portable/synopsys/dwc2/dcd_dwc2.c Outdated
Track when an EP0 OUT zero-length transfer is actually armed and require that state before synthesizing a status-stage completion ahead of a co-reported SETUP event.

This preserves the validated status-before-SETUP ordering fix while avoiding stale zero-length state from producing spurious EP0 OUT completions.
@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown

MemBrowse Memory Report

Top 10 targets by memory change (%) (out of 2346 targets) View Project Dashboard →

target .text .rodata .data .bss total % diff
xmc4500_relax/hid_generic_inout 12,964 → 12,900 (-64) 25,980 → 25,852 (-128) -0.5%
at_start_f435/dfu_runtime 11,336 → 11,276 (-60) 12,372 → 12,312 (-60) -0.5%
at_start_f402/dfu_runtime 11,908 → 11,848 (-60) 12,920 → 12,860 (-60) -0.5%
xmc4500_relax/hid_multiple_interface 13,776 → 13,712 (-64) 27,604 → 27,476 (-128) -0.5%
xmc4500_relax/hid_boot_interface 13,796 → 13,732 (-64) 27,644 → 27,516 (-128) -0.5%
sipeed_longan_nano/dfu_runtime 12,616 → 12,556 (-60) 594 → 590 (-4) 13,878 → 13,814 (-64) -0.5%
at_start_f435/hid_generic_inout 12,400 → 12,340 (-60) 13,264 → 13,204 (-60) -0.5%
at_start_f402/hid_generic_inout 12,972 → 12,912 (-60) 13,812 → 13,752 (-60) -0.4%
xmc4500_relax/cdc_dual_ports 14,972 → 14,908 (-64) 29,996 → 29,868 (-128) -0.4%
xmc4500_relax/printer_to_cdc 15,032 → 14,968 (-64) 30,112 → 29,984 (-128) -0.4%

@HiFiPhile

Copy link
Copy Markdown
Collaborator

Thank you, with the test method in #3643 DFU has no more error. Are you still working on it ?

@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

Nice to hear!

I have it currently paused, need to finish a MVP product first, then I can test in more detail and look further into it, might take a week or two.

But I can briefly try with the PR branch you linked on Monday if our functionality on STM32U5 with DFU still works and report back.

@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

@HiFiPhile I now had time to take a look:

Using the branch from #3643, DFU works. But, the fix is not in this PR, I think the actual fix is in previous #3640 with DWC2 changes, using commit 1f662779c44dd913f2c26f9bddc4f37ee70f16dc my DFU functionality also works.

@HiFiPhile

Copy link
Copy Markdown
Collaborator

Hum #3640 only applies to ISO transfer.

Actually I can see the difference with/without your patch with the stress test posted in #3643

image

@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

Well, I now realised my dfu-util LIBUSB_ERROR_PIPE error is not reproducable reliable.

I tried with different USB hubs and cables, but currently even the old commit 4c7fd70e5, which has previously not worked for my DFU application, works just fine. I guess since this issue is timing related it could make sense that it's not always reproducable.

Since I don't have a USB sniffer to actually look into the protocol details, I can only test functionality. I'll see if I can find out what needs to change to trigger the DFU failure again. Maybe with delaying the USB interrupt handling within my application I could force the error again.

@HiFiPhile

Copy link
Copy Markdown
Collaborator

please allow maintainer write access to PR, I made a few simplifications.

@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

Due to the PR being created from an organization, it's not possible to give PR write access, but I've invited you with write rights to the fork. You may change it there, or if you better want to create a new clean PR with your suggested implementation, thats also fine for me.

I have now created a minimal example for STM32U5G9J-DK2 board and tested different commits from before my fix, with my fix or with your fix, but the results are not conclusive: Only broadly delaying the interrupt handling is not a good test: https://github.com/HKM-Messtechnik/tinyusb-stm32u5-dfu-test/tree/main/logs

@HiFiPhile

Copy link
Copy Markdown
Collaborator

I've tried your repo (had to modify linker as I've Nucleo-U5A5ZJ), with 500 delay count I can reproduce the failure but can't explain the underlying issue 🙌 Maybe I've overlooked something... (or my sniffer is lying to me ?)

To me it looks like no issue !

  • Transfer 0
    Look at transaction 0, it has NAKs between OUT packets due to IRQ delay, which is normal
image image
  • Status
    After transaction 0, we got DFU_BUSY status, 1ms later it becomes DFU_IDLE.
image
  • Transfer 1
    It really bugs me, SETUP packet is acked correctly, but as soon as 1st OUT packet get NAKed libusb returns broken pipe ! Even no retry or 5s timeout !
    What's worst is the symptome is reproducible on both Linux 7.0 and Windows 11, EPIPE is returned directly by OS backend.
image image

@HiFiPhile

HiFiPhile commented May 22, 2026

Copy link
Copy Markdown
Collaborator

Update: There was a sniffer PING filter issue 🤦

The failure was due to inep0_status_complete_before_setup's condition doepint_bm.xfer_complete could be false.
A xfer_complete IRQ fired and reset the flag, but since doepint_bm.setup_packet_rx is also true as SETUP packet is already arrived, no xfer completion event is queued:

if (!doepint_bm.status_phase_rx && !doepint_bm.setup_packet_rx) {

Please try latest commit, DMA should not be affected.

@HiFiPhile HiFiPhile marked this pull request as ready for review May 22, 2026 10:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/portable/synopsys/dwc2/dcd_dwc2.c Outdated
Comment thread src/portable/synopsys/dwc2/dcd_dwc2.c Outdated
@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

My existing test was insufficient and the results not very conclusive, but I have now updated my test repo with a more targeted test:

Instead of broadly delaying every USB IRQ, the repro now wraps dcd_edpt_xfer() and only holds IRQs immediately after arming EP0 OUT status ZLP. I think this should be a good way to trigger the actual issue.
It also traces TinyUSB DCD/DFU event ordering via wrappers, so failures can be checked for SETUP-before-status and DFU DNLOAD rejection while EP0 status is still pending.

Test matrix: DFU, bwPollTimeout=0, holds 10,12,14,16,18,20,22,25 us, 20 transfers each.

Results:

commit 10us 12us 14us 16us 18us 20us 22us 25us
4c7fd70e5 20/20 14/20 0/20 0/20 0/20 0/20 0/20 0/20
16bc0548d 20/20 18/20 6/20 0/20 0/20 0/20 0/20 0/20
c4cd6c85e 20/20 17/20 2/20 0/20 0/20 0/20 0/20 0/20
650e8a194 20/20 20/20 20/20 20/20 20/20 20/20 20/20 20/20

For failing runs the trace shows SETUP before status > 0, DNLOAD while status pending > 0, and DNLOAD rejected > 0. For 650e8a194, status completions match hold triggers and all those bad-ordering counters stay at zero.

This matches your analysis: my guarded xfer_complete condition in 16bc0548d improves the margin slightly but is not sufficient. Handling EP0 status OUT completion from RXFLVL/RX_COMPLETE in your 650e8a194 fixes this tested STM32U5 DWC2 no-DMA DFU case.

I have not looked into any DMA details myself.

@HiFiPhile

Copy link
Copy Markdown
Collaborator

Thank you, it looks nice !

If you are interested in DMA there is a another issue. With DMA dma_setup_prepare() needs to be called each after EP0 RX complete, there is a race condition about it #3646.

608b77d partially fix it and turn it into this xfer_complete and setup_phase_done issue.

HiFiPhile added 2 commits May 27, 2026 00:32
Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
@HiFiPhile

Copy link
Copy Markdown
Collaborator

With e520cff I've further simplified the fix.

Thanks to your test setup, I've checked my DMA fix
a560281 is actually good

EP0 status hold us DFU poll timeout ms Result Passed Failed Status completed / hold triggered SETUP before status DNLOAD pending DNLOAD rejected Trace Failure
0 0 PASS 10 0 53347/53347 0 0 0 final
20 0 PASS 10 0 53347/53347 0 0 0 final
40 0 PASS 10 0 53347/53347 0 0 0 final
60 0 PASS 10 0 53347/53347 0 0 0 final
80 0 PASS 10 0 53347/53347 0 0 0 final
100 0 PASS 10 0 53347/53347 0 0 0 final
120 0 PASS 10 0 53349/53349 0 0 0 final
140 0 PASS 10 0 53347/53347 0 0 0 final
160 0 PASS 10 0 53347/53347 0 0 0 final
180 0 PASS 10 0 53347/53347 0 0 0 final
200 0 PASS 10 0 53347/53347 0 0 0 final

@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

Sorry for the delayed response now.
As I understand it with your latest commit, the DMA problem from #3646 could now be fixed, if there is also some confirmation from kubeqz?

And have you tested with changing CFG_TUD_DWC2_DMA_ENABLE in my test setup? As I understand it, since DCACHE is disabled in my CubeMX config, TinyUSB should then work without further changes in my test setup using DMA.

Maybe it could make sense to add a test to the regular HIL, to not only test basic DFU functionality but also have one fixed stress test baseline and prevent possible future regression? I guess a single delayed test could already be sufficient and shouldn't take too much time.

@HiFiPhile

Copy link
Copy Markdown
Collaborator

As I understand it with your latest commit, the DMA problem from #3646 could now be fixed, if there is also some confirmation from kubeqz?

And have you tested with changing CFG_TUD_DWC2_DMA_ENABLE in my test setup? As I understand it, since DCACHE is disabled in my CubeMX config, TinyUSB should then work without further changes in my test setup using DMA.

Yep that's how I tested DMA with your setup, have maybe 1GB data transferred. The DCACHE on U5 is only for external memory like FMC.

Actually I'm working on move transfer management into handle_rxflvl_irq so we can remove handle_epout_slave with 2 advantages:

  • GRXSTSP register has internal FIFO, receiving events won't mix up (fixes this issue)
  • Improve efficiency, remove 2nd IRQ overhead

Already working on F4 (GenID 2.81A) but not yet onwards.

- GRXSTSP register has internal FIFO, receiving events won't mix up (STATUS OUT & next SETUP)
- Improve efficiency, remove 2nd IRQ overhead

Signed-off-by: HiFiPhile <admin@hifiphile.com>
@HiFiPhile

HiFiPhile commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Actually I'm working on move transfer management into handle_rxflvl_irq so we can remove handle_epout_slave with 2 advantages:

Get it done and tested on F407 (2.81A), F746 (3.20A), F723 (3.30A), ESP32S3/ESP32P4 (4.00A), U5A5 (4.11A) with major examples.

HiFiPhile added 4 commits May 29, 2026 00:23
Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
To avoid STATUS IN completion of previous control transfer treated as next DATA IN  when IRQ latency is high.

Signed-off-by: HiFiPhile <admin@hifiphile.com>
Signed-off-by: HiFiPhile <admin@hifiphile.com>
@HiFiPhile HiFiPhile merged commit d3ae3a0 into hathach:master May 30, 2026
396 of 398 checks passed
@Alex-Schaefer

Copy link
Copy Markdown
Contributor Author

Nice, thanks for the work on this!
I have updated the PR description and would now delete our fork soon, since this is merged.

@HiFiPhile

Copy link
Copy Markdown
Collaborator

Thank you for your professionalism !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants