fix: quote $MOE_LAYER_FREQ #1689

Merged

zhuzilin merged 1 commit into THUDM:main from lawrence-harmonic:fix/quote_moe_layer_freq on Mar 22, 2026
Conversation

@lawrence-harmonic (Contributor) commented Mar 8, 2026


If a file named `1` exists in the current directory, the value passed to `--moe-layer-freq` will be wrong, because the unquoted `[1,1,...]` list is treated by the shell as a glob pattern.

Explanation:
```
$ echo [1,1,1]
[1,1,1]
$ touch 1
$ echo [1,1,1]
1
$ echo "[1,1,1]"
[1,1,1]
```

In SLIME context:
```
$ source scripts/models/qwen3.5-35B-A3B.sh
$ echo ${MODEL_ARGS[@]}
--spec slime_plugins.models.qwen3_5 get_qwen3_5_spec --disable-bias-linear --qk-layernorm --group-query-attention --num-attention-heads 16 --num-query-groups 2 --kv-channels 256 --num-layers 40 --hidden-size 2048 --ffn-hidden-size 512 --use-gated-attention --normalization RMSNorm --apply-layernorm-1p --position-embedding-type rope --norm-epsilon 1e-6 --rotary-percent 0.25 --swiglu --untie-embeddings-and-output-weights --vocab-size 248320 --rotary-base 10000000 --moe-ffn-hidden-size 512 --moe-shared-expert-intermediate-size 512 --moe-router-score-function softmax --moe-token-dispatcher-type alltoall --moe-router-topk 8 --moe-layer-freq [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] --num-experts 256 --moe-grouped-gemm --moe-token-drop-policy probs --moe-router-dtype fp32 --moe-permute-fusion --moe-aux-loss-coeff 0 --attention-output-gate --moe-shared-expert-gate
$ touch 1
$ source scripts/models/qwen3.5-35B-A3B.sh
$ echo ${MODEL_ARGS[@]}
--spec slime_plugins.models.qwen3_5 get_qwen3_5_spec --disable-bias-linear --qk-layernorm --group-query-attention --num-attention-heads 16 --num-query-groups 2 --kv-channels 256 --num-layers 40 --hidden-size 2048 --ffn-hidden-size 512 --use-gated-attention --normalization RMSNorm --apply-layernorm-1p --position-embedding-type rope --norm-epsilon 1e-6 --rotary-percent 0.25 --swiglu --untie-embeddings-and-output-weights --vocab-size 248320 --rotary-base 10000000 --moe-ffn-hidden-size 512 --moe-shared-expert-intermediate-size 512 --moe-router-score-function softmax --moe-token-dispatcher-type alltoall --moe-router-topk 8 --moe-layer-freq 1 --num-experts 256 --moe-grouped-gemm --moe-token-drop-policy probs --moe-router-dtype fp32 --moe-permute-fusion --moe-aux-loss-coeff 0 --attention-output-gate --moe-shared-expert-gate
```
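The transcripts above come down to bash pathname expansion: `[1,1,1]` is a valid glob (a bracket expression matching a single character from the set `1,`), so when a file named `1` exists, the unquoted expansion collapses to that filename; double-quoting suppresses the expansion. A minimal self-contained sketch of the failure mode and the fix (the variable name `MOE_LAYER_FREQ` is taken from the PR title; the exact layout of `scripts/models/qwen3.5-35B-A3B.sh` may differ):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the demo file doesn't pollute the repo.
workdir=$(mktemp -d)
cd "$workdir"

MOE_LAYER_FREQ="[1,1,1]"
touch 1   # filename "1" matches the glob pattern [1,1,1]

# Unquoted: the value is glob-expanded against the working directory,
# so it collapses to the matching filename "1".
unquoted=$(echo $MOE_LAYER_FREQ)

# Quoted (the fix in this PR): expansion is suppressed and the
# literal value survives.
quoted=$(echo "$MOE_LAYER_FREQ")

echo "unquoted=$unquoted"
echo "quoted=$quoted"
```

The same rule applies where the value is placed into the args array, i.e. writing `--moe-layer-freq "$MOE_LAYER_FREQ"` rather than the bare expansion, so that `MODEL_ARGS` carries the literal list regardless of what files happen to be in the working directory.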
@lawrence-harmonic lawrence-harmonic force-pushed the fix/quote_moe_layer_freq branch from bed48c9 to f776604 Compare March 8, 2026 22:46
@zhuzilin zhuzilin merged commit 10f21d8 into THUDM:main Mar 22, 2026