Skip to content

Support qwen3.5 loss mask for multi-turn SFT#1742

Merged
zhuzilin merged 2 commits intoTHUDM:mainfrom
huang3eng:main
Mar 22, 2026
Merged

Support qwen3.5 loss mask for multi-turn SFT#1742
zhuzilin merged 2 commits intoTHUDM:mainfrom
huang3eng:main

Conversation

@huang3eng
Copy link
Copy Markdown
Contributor

Background

--loss-mask-type defaults to qwen. When Qwen3.5 SFT is launched with the default setting, SFT rollout still goes through the legacy qwen loss-mask path, which is incompatible with Qwen3.5 multi-turn chat-template behavior and can fail with:

jinja2.exceptions.TemplateError: No user query found in messages.

Using --loss-mask-type qwen3 avoids the immediate crash, but it still does not match Qwen3.5 masking semantics on multi-turn conversations. For historical assistant turns, Qwen3.5 usually keeps only the final answer, while the current qwen3 path may reconstruct extra reasoning content and supervise unnecessary thinking tokens. This increases token count and slows down SFT.

Changes

  • add qwen3_5 as a valid --loss-mask-type
  • implement a dedicated Qwen3.5 multi-turn loss-mask generator based on the fully rendered conversation
  • derive token-level supervision from rendered-text character spans via offset_mapping
  • validate rendered-text tokenization against apply_chat_template(..., tokenize=True) output
  • add a defensive check in SFT rollout to ensure token_ids and loss_mask always have the same length
  • add a Qwen3.5 SFT script that explicitly uses --loss-mask-type qwen3_5
  • add unit tests covering single-turn parity, multi-turn divergence, and tool-call flow behavior

Why this PR

  • fixes the Qwen3.5 SFT failure when the default qwen loss-mask path is used
  • makes multi-turn masking match Qwen3.5 chat-template behavior
  • avoids supervising unnecessary historical reasoning tokens
  • reduces wasted training tokens and improves SFT efficiency

Scope

This PR does not change the global default of --loss-mask-type. Instead, it introduces a Qwen3.5-specific option and updates the Qwen3.5 SFT entry script to use it explicitly, which keeps existing Qwen/Qwen3 behavior unchanged.

Testing

  • python -m pytest tests/utils/test_loss_mask_type_qwen35.py

@huang3eng
Copy link
Copy Markdown
Contributor Author

@zhuzilin @Zhuohao-Li Please help review it. 💗

Copy link
Copy Markdown
Contributor

@Zhuohao-Li Zhuohao-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@zhuzilin zhuzilin merged commit 7f2a03b into THUDM:main Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants