Support qwen3.5 loss mask for multi-turn SFT by huang3eng · Pull Request #1742 · THUDM/slime

huang3eng · 2026-03-19T16:21:43Z

Background

--loss-mask-type defaults to qwen. When Qwen3.5 SFT is launched with the default setting, SFT rollout still goes through the legacy qwen loss-mask path, which is incompatible with Qwen3.5 multi-turn chat-template behavior and can fail with:

jinja2.exceptions.TemplateError: No user query found in messages.

Using --loss-mask-type qwen3 avoids the immediate crash, but it still does not match Qwen3.5 masking semantics on multi-turn conversations. For historical assistant turns, Qwen3.5 usually keeps only the final answer, while the current qwen3 path may reconstruct extra reasoning content and supervise unnecessary thinking tokens. This increases token count and slows down SFT.

Changes

add qwen3_5 as a valid --loss-mask-type
implement a dedicated Qwen3.5 multi-turn loss-mask generator based on the fully rendered conversation
derive token-level supervision from rendered-text character spans via offset_mapping
validate rendered-text tokenization against apply_chat_template(..., tokenize=True) output
add a defensive check in SFT rollout to ensure token_ids and loss_mask always have the same length
add a Qwen3.5 SFT script that explicitly uses --loss-mask-type qwen3_5
add unit tests covering single-turn parity, multi-turn divergence, and tool-call flow behavior

Why this PR

fixes the Qwen3.5 SFT failure when the default qwen loss-mask path is used
makes multi-turn masking match Qwen3.5 chat-template behavior
avoids supervising unnecessary historical reasoning tokens
reduces wasted training tokens and improves SFT efficiency

Scope

This PR does not change the global default of --loss-mask-type. Instead, it introduces a Qwen3.5-specific option and updates the Qwen3.5 SFT entry script to use it explicitly, which keeps existing Qwen/Qwen3 behavior unchanged.

Testing

python -m pytest tests/utils/test_loss_mask_type_qwen35.py

huang3eng · 2026-03-19T16:24:09Z

@zhuzilin @Zhuohao-Li Please help review it. 💗

Zhuohao-Li

lgtm, thanks!

Support qwen3.5 loss mask for multi-turn SFT

b4e7661

Zhuohao-Li reviewed Mar 20, 2026

View reviewed changes

style: format qwen3.5 loss mask code

843a5ba

zhuzilin merged commit 7f2a03b into THUDM:main Mar 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support qwen3.5 loss mask for multi-turn SFT#1742

Support qwen3.5 loss mask for multi-turn SFT#1742
zhuzilin merged 2 commits intoTHUDM:mainfrom
huang3eng:main

huang3eng commented Mar 19, 2026

Uh oh!

huang3eng commented Mar 19, 2026

Uh oh!

Zhuohao-Li left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

huang3eng commented Mar 19, 2026

Background

Changes

Why this PR

Scope

Testing

Uh oh!

huang3eng commented Mar 19, 2026

Uh oh!

Zhuohao-Li left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants