[Usage]: Can the GPT attention plugin be used for self-attention in the encoder? #11311

@0Unicorn

Description

System Info

  • CUDA version: 13.0
  • GPU model(s): NVIDIA L20
  • TensorRT-LLM version: 1.2.0

How would you like to use TensorRT-LLM

My model's encoder needs rotary position embeddings (RoPE) and sliding-window attention, but the BERT attention plugin doesn't appear to support either feature. Can the GPT attention plugin be used for self-attention in the encoder instead? If not, how can this be worked around? A rough sketch of the configuration I have in mind is below.
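A minimal sketch of the intended layer configuration, assuming the `Attention` layer and the `AttentionMaskType` / `PositionEmbeddingType` enums from recent TensorRT-LLM releases; the exact constructor arguments may differ in 1.2.0, so treat this as an illustration rather than a confirmed API:

```python
# Sketch only: encoder-style self-attention expressed through the layer that
# lowers to the GPT attention plugin. Argument names follow recent
# TensorRT-LLM releases and may not match 1.2.0 exactly.
from tensorrt_llm.functional import AttentionMaskType, PositionEmbeddingType
from tensorrt_llm.layers import Attention

encoder_self_attn = Attention(
    hidden_size=1024,       # example sizes, not tied to a real model
    num_attention_heads=16,
    # Encoder self-attention is bidirectional rather than causal.
    attention_mask_type=AttentionMaskType.bidirectional,
    # RoPE instead of learned absolute positions; the BERT attention
    # plugin path does not seem to honor this setting.
    position_embedding_type=PositionEmbeddingType.rope_gpt_neox,
)
```

If I understand correctly, the sliding-window part is applied by the GPT attention plugin at runtime (e.g. via the `max_attention_window_size` setting used for Mistral-style models), which is another reason the BERT plugin path doesn't seem to fit here.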

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
