[Usage]: Can the GPT attention plugin be used for self-attention in the encoder? #11311

### System Info

**System Information:**
- CUDA version: 13.0
- GPU model(s): NVIDIA L20
- TensorRT-LLM version: 1.2.0

### How would you like to use TensorRT-LLM

My model's encoder needs rotary position embeddings (`rope`) and `sliding window` attention, but the `BERT attention plugin` does not appear to support either feature. Can the `GPT attention plugin` be used for self-attention in the encoder model? If not, how can this be solved?
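
For reference, the attention the encoder needs can be written out in plain Python: RoPE applied to queries and keys, plus a bidirectional (non-causal) sliding-window mask. This is a hypothetical reference sketch of the target semantics, not TensorRT-LLM code. The helper names, the rotate-half RoPE layout, the base of 10000, and the `|i - j| <= window` window definition are all assumptions; the model's actual RoPE variant and window convention may differ.

```python
import math

# Hypothetical reference helpers; NOT part of the TensorRT-LLM API.

def rope(x, pos, base=10000.0):
    """Rotate one head vector x (even length) for position pos.
    Assumes rotate-half pairing: dim i is paired with dim i + d/2."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

def encoder_window_mask(seq_len, window):
    """Bidirectional sliding window (assumed convention): token i may attend
    to token j when |i - j| <= window. A causal decoder mask would also
    require j <= i."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]

def self_attention(q, k, v, window):
    """Single-head encoder self-attention with RoPE and a bidirectional
    sliding-window mask. q, k, v are lists of per-token vectors."""
    n, d = len(q), len(q[0])
    qr = [rope(q[i], i) for i in range(n)]
    kr = [rope(k[i], i) for i in range(n)]
    mask = encoder_window_mask(n, window)
    out = []
    for i in range(n):
        # Masked-out positions get -inf so they receive zero softmax weight.
        scores = [
            sum(a * b for a, b in zip(qr[i], kr[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite: the diagonal is always inside the window
        w = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(n)) / z for c in range(len(v[0]))])
    return out
```

Setting `window >= seq_len - 1` reduces this to full bidirectional attention, while adding `j <= i` to the mask gives the causal (decoder) case; the question is essentially whether a plugin can run the first case with RoPE enabled.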


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
