
[Model] Add T5Gemma2 model plugin integration#8

Open
akh64bit wants to merge 10 commits into vllm-project:master from akh64bit:ak/t5gemma2

Conversation


akh64bit commented Mar 5, 2026

This PR implements the T5Gemma2 encoder-decoder model as an out-of-tree vLLM plugin, moving the implementation from the core vllm repository as suggested in vllm-project/vllm#32617.

Changes

  • Migrated the model implementation into vllm_bart_plugin/t5gemma2.py.
  • Fixed the q, k, and v tensor reshaping bugs in attention inputs (MMEncoderAttention and vLLM's Attention expect (num_tokens, num_heads, head_dim), not (1, num_tokens, hidden_size)).
  • Ensured residual connections are correctly placed in the T5Gemma2DecoderLayer block.
  • Ensured the RoPE (rotary_emb) is correctly applied during the forward pass.
  • Implemented SupportsMultiModal interface mapping.
  • Updated Attention imports to work with the latest vLLM versions.
  • Registered the model within vllm_bart_plugin/__init__.py.
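The q/k/v reshaping fix described above can be illustrated with a small, self-contained sketch. The shapes and sizes below are hypothetical and chosen only to show the layout change; this is not the PR's actual code:

```python
import numpy as np

# Hypothetical dimensions for illustration only.
num_tokens, num_heads, head_dim = 4, 8, 64
hidden_size = num_heads * head_dim

# Layout produced by the buggy path: (1, num_tokens, hidden_size).
q = np.zeros((1, num_tokens, hidden_size))

# Layout that MMEncoderAttention and vLLM's Attention expect:
# (num_tokens, num_heads, head_dim).
q_fixed = q.reshape(num_tokens, num_heads, head_dim)

print(q_fixed.shape)  # (4, 8, 64)
```

The same reshape applies to the k and v tensors before they are handed to the attention backend.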

How to test locally

To run this plugin alongside vLLM, first install vLLM and the required transformers fork, then install the plugin in editable mode:

```shell
# 1. Clone and install the custom transformers fork locally
git clone https://github.com/akh64bit/transformers.git -b t5gemma2
cd transformers
pip install -e .

# 2. Install the plugin in editable mode
cd ../bart-plugin
pip install -e .

# 3. Test loading the model using standard vLLM offline inference
python -c "
from vllm import LLM
# Will auto-load from the bart-plugin registry
llm = LLM(model='google/t5gemma-2-270m-270m', trust_remote_code=True)
print('Model loaded successfully!')
"
```
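The auto-loading in step 3 relies on vLLM's out-of-tree model registration. A minimal sketch of what the registration in vllm_bart_plugin/__init__.py might look like follows; the architecture string, module path, and class name here are assumptions for illustration, not necessarily the PR's actual identifiers:

```python
def register():
    """Register the T5Gemma2 model with vLLM's model registry.

    Called by vLLM's plugin machinery at startup. The import is kept
    inside the function so merely importing the package does not
    require vLLM to be installed.
    """
    from vllm import ModelRegistry

    # Hypothetical names: the HF architecture string and the
    # "module:class" path may differ in the actual plugin.
    ModelRegistry.register_model(
        "T5Gemma2ForConditionalGeneration",
        "vllm_bart_plugin.t5gemma2:T5Gemma2ForConditionalGeneration",
    )
```

For vLLM to discover the plugin automatically, the package would also expose this function through an entry point (e.g. in the `vllm.general_plugins` group) in its packaging metadata.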

Signed-off-by: Akhilesh Kumar <akhilbussiness@gmail.com>
Author

akh64bit commented Mar 5, 2026

I've added an example_t5gemma2_usage.py script to demonstrate how to use the model with the plugin. You can test it out by running python example_t5gemma2_usage.py after following the setup instructions in the PR description.

akh64bit added 8 commits March 5, 2026 03:49
@Bullish-Design

Awesome! I've been looking for a good way to experiment with T5gemma2. How do you find its performance in comparison with other models of similar size? Are there any issues/limitations/things to be aware of when using it with vLLM?

Collaborator

@NickLucche NickLucche left a comment


Thanks for contributing @akh64bit!
Will look to get this merged after v0.16 support lands, as some of the changes are related to moving imports around (to sync with upstream).
