Add Mistral-small-3.1 and Pixtral vision support #4591
Open
yicycyc wants to merge 1 commit into
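This PR adds model support for Mistral-small-3.1 and Pixtral.
The port follows the corresponding Mistral-small-3.1 and Pixtral implementations in transformers.
The vision head of mistral-small-3.1 reuses Pixtral. In transformers, Pixtral is implemented only as a vision head; the Pixtral-12b model is assembled there LLaVA-style by combining the pixtral vision head with the mistral text model. Paddle lacks these prerequisite components, so this PR implements the complete Mistral-small-3.1 model together with a port of the transformers Pixtral implementation.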
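To make the LLaVA-style combination concrete, here is a minimal pure-Python sketch of the general idea: embeddings at image-placeholder positions are replaced by features from the vision head. It is not the code in this PR; the helper name `merge_image_features`, the placeholder id `10`, and the toy 2-dimensional vectors are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def merge_image_features(
    input_ids: Sequence[int],
    text_embeds: Sequence[Vector],
    image_features: Sequence[Vector],
    image_token_id: int,
) -> List[Vector]:
    """Replace each image-placeholder position with the next image feature.

    Hypothetical helper for illustration only, not part of PaddleFormers.
    """
    if len(input_ids) != len(text_embeds):
        raise ValueError("input_ids and text_embeds must have the same length")
    num_slots = sum(1 for tok in input_ids if tok == image_token_id)
    if num_slots != len(image_features):
        raise ValueError(
            f"{num_slots} placeholder tokens but {len(image_features)} image features"
        )
    features = iter(image_features)
    # Walk the sequence once; placeholders consume image features in order.
    return [
        list(next(features)) if tok == image_token_id else list(emb)
        for tok, emb in zip(input_ids, text_embeds)
    ]


# Toy data (all values are made up for the example).
IMAGE_TOKEN_ID = 10  # hypothetical placeholder id
input_ids = [1, 10, 10, 7]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.7, 0.7]]
image_features = [[1.0, 2.0], [3.0, 4.0]]

merged = merge_image_features(input_ids, text_embeds, image_features, IMAGE_TOKEN_ID)
print(merged)
```

The count check matters in practice: a mismatch between placeholder tokens and image features usually points to a processor and model disagreeing about how many patches an image produces.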
All accuracy validation was performed on layer-reduced models.
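Main changes:
- Added the mistral3 model: `Mistral3Config`, `Mistral3Model`, `Mistral3ForConditionalGeneration`, and a `Mistral3ForCausalLM` alias
- Added the pixtral vision tower: `PixtralVisionConfig`, `PixtralVisionModel`, `PixtralImageProcessor`, `PixtralProcessor`
- Extended the Auto mappings: `AutoConfig`, `AutoModel`, `AutoModelForCausalLM`, `AutoModelForConditionalGeneration`, `AutoProcessor`, `AutoImageProcessor`
- Extended multimodal data processing: `image_sizes` is now passed through the dataset/collate pipeline
- Updated the model list, the capability matrix, and the model unit tests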
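As a rough illustration of what passing `image_sizes` through a collate step can look like, the sketch below pads token ids and gathers per-image sizes into the batch. It is an assumption-laden toy, not the collator from this PR: the function name `collate_with_image_sizes`, the right-padding choice, the pad id `0`, and the sample layout are all hypothetical.

```python
from typing import Any, Dict, List


def collate_with_image_sizes(
    samples: List[Dict[str, Any]], pad_token_id: int = 0
) -> Dict[str, list]:
    """Pad input_ids to a common length and carry image_sizes along.

    Hypothetical helper: right padding and the dict layout are assumptions.
    """
    max_len = max(len(s["input_ids"]) for s in samples)
    batch: Dict[str, list] = {"input_ids": [], "attention_mask": [], "image_sizes": []}
    for s in samples:
        ids = list(s["input_ids"])
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # One (height, width) pair per image; text-only samples add nothing.
        batch["image_sizes"].extend(s.get("image_sizes", []))
    return batch


samples = [
    {"input_ids": [5, 6, 7], "image_sizes": [(224, 224)]},
    {"input_ids": [8, 9]},  # text-only sample
]
batch = collate_with_image_sizes(samples)
print(batch)
```

Keeping `image_sizes` as a flat list of per-image pairs (rather than one entry per sample) lets text-only and multi-image samples share a batch without placeholder entries.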
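Forward alignment validation
Model: layer-reduced mistral-small-3.1
Both sides load exactly the same `.npy` inputs: `input_ids`, `attention_mask`, `pixel_values`, `image_sizes`
The input sample is one 224x224 image plus a prompt; the total token count is 256.
Results
1.01e-06 | 1.72e-05 | 0 | 0
Generation alignment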
Text-only generation
10 tokens generated by Transformers:
[117577, 115201, 83673, 64162, 107744, 111937, 11254, 111937, 119792, 62615]
10 tokens generated by PaddleFormers:
[117577, 115201, 83673, 64162, 107744, 111937, 11254, 111937, 119792, 62615]
Multimodal generation
Input: a 112x112 test image + the text prompt "Describe this image."
10 tokens generated by Transformers:
[64162, 18845, 14124, 16814, 5744, 31026, 33565, 34868, 14456, 61350]
10 tokens generated by PaddleFormers:
[64162, 18845, 14124, 16814, 5744, 31026, 33565, 34868, 14456, 61350]
The two sides match exactly.
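The match can be checked mechanically. The sketch below uses the token ids quoted above; the helper `first_mismatch` is a hypothetical name introduced for illustration, and how the ids were produced on each side is outside its scope.

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first position where two token sequences differ, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# Token ids copied from the generation results in this PR description.
runs = {
    "text-only": (
        [117577, 115201, 83673, 64162, 107744, 111937, 11254, 111937, 119792, 62615],
        [117577, 115201, 83673, 64162, 107744, 111937, 11254, 111937, 119792, 62615],
    ),
    "multimodal": (
        [64162, 18845, 14124, 16814, 5744, 31026, 33565, 34868, 14456, 61350],
        [64162, 18845, 14124, 16814, 5744, 31026, 33565, 34868, 14456, 61350],
    ),
}

for name, (transformers_ids, paddleformers_ids) in runs.items():
    idx = first_mismatch(transformers_ids, paddleformers_ids)
    status = "match" if idx is None else f"first mismatch at position {idx}"
    print(f"{name}: {status}")
```

Reporting the first mismatching position rather than a plain boolean is useful with greedy decoding, since every token after the first divergence is expected to differ anyway.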
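Training validation
1. Text
BF16 full SFT on GSM8K: Paddle ran the full 300 steps on 4 cards with sharding stage3, and the training curve was compared against Torch/ms-swift ZeRO-3.
Shared settings:
- max_seq_len = 512
- global batch size = 4
- max_steps = 300
- learning_rate = 1e-5
- warmup_steps = 0
- weight_decay = 0
- seed = 42
Training results (each row gives the two compared training-curve values, then the first minus the second):
13.359375 | 13.401804 | -0.042429
9.484375 | 9.411279 | 0.073096
4.585938 | 4.784250 | -0.198312
4.011719 | 4.166615 | -0.154897
3.625000 | 3.814764 | -0.189764
3.500000 | 2.874740 | 0.625260
3.425781 | 3.515725 | -0.089944
3.250000 | 3.552930 | -0.302930
3.207031 | 3.151531 | 0.055500
3.089844 | 3.604705 | -0.514861
2. Multimodal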
BF16 full SFT on the dataset from https://github.com/PaddlePaddle/PaddleFormers/blob/develop/docs/zh/dataset_format.md#24-%E5%A4%9A%E6%A8%A1%E6%80%81%E6%8C%87%E4%BB%A4%E5%BE%AE%E8%B0%83sft%E6%95%B0%E6%8D%AE%E6%A0%BC%E5%BC%8F: Paddle ran the full 300 steps on 4 cards with sharding stage3, and the training curve was compared against Torch/ms-swift ZeRO-3.
- max_seq_len / max_length = 4096
- global batch size = 4
- max_steps = 300
- learning_rate = 1e-5
- warmup_steps = 0
- weight_decay = 0
Training results (each row gives the two compared training-curve values, then the first minus the second):
12.687500 | 12.662024 | 0.025476
9.625000 | 9.594838 | 0.030162
9.625000 | 9.894209 | -0.269209
7.515625 | 7.706814 | -0.191189
6.234375 | 6.151044 | 0.083331
1.898438 | 1.336239 | 0.562199
0.507553 | 0.097169 | 0.410384
0.284668 | 0.120010 | 0.164658
0.155029 | 0.010435 | 0.144594
0.042799 | 0.630276 | -0.587477
0.810303 | 0.016147 | 0.794156
0.578127 | 0.103580 | 0.474548
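For a quick summary of how far the two curves drift apart, a sketch along these lines may help. It reuses the multimodal value pairs listed above; because the surviving text does not label which column belongs to which framework, the code simply calls them `first` and `second`. The function name `summarize_gap` is hypothetical.

```python
from typing import Dict, List, Tuple

# (first, second) value pairs copied from the multimodal training results.
rows: List[Tuple[float, float]] = [
    (12.687500, 12.662024),
    (9.625000, 9.594838),
    (9.625000, 9.894209),
    (7.515625, 7.706814),
    (6.234375, 6.151044),
    (1.898438, 1.336239),
    (0.507553, 0.097169),
    (0.284668, 0.120010),
    (0.155029, 0.010435),
    (0.042799, 0.630276),
    (0.810303, 0.016147),
    (0.578127, 0.103580),
]


def summarize_gap(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize the per-row difference (first minus second) of two curves."""
    diffs = [first - second for first, second in pairs]
    abs_diffs = [abs(d) for d in diffs]
    return {
        "mean_signed_diff": sum(diffs) / len(diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "max_abs_diff": max(abs_diffs),
    }


summary = summarize_gap(rows)
for key, value in summary.items():
    print(f"{key}: {value:.6f}")
```

A mean signed difference near zero alongside a larger mean absolute difference would suggest step-to-step noise rather than a systematic offset between the two runs; with only a handful of logged rows, either reading should be treated cautiously.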