System Info
System Information:
- CUDA version: 13.0
- GPU model(s): NVIDIA L20
- TensorRT-LLM version: 1.2.0
How would you like to use TensorRT-LLM
My model's encoder needs rotary position embeddings (RoPE) and sliding-window attention, but the BERT attention plugin does not appear to support either feature. Can the GPT attention plugin be used for an encoder model? If not, how can this problem be solved?
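For reference, the attention pattern described above can be sketched in plain NumPy. This is only an illustration of the intended semantics, not TensorRT-LLM code and not the model's actual implementation. The function names, the symmetric (bidirectional) window, the rotate-half RoPE layout, and the base of 10000 are all assumptions made for the sketch.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape [seq, dim] (dim must be even).

    Assumption: rotate-half layout, where the first and second halves of the
    feature dimension form the rotated pairs.
    """
    seq, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)           # one frequency per pair
    angles = np.arange(seq)[:, None] * inv_freq[None, :]   # [seq, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def sliding_window_mask(seq, window):
    """Bidirectional mask: position i may attend to j when |i - j| <= window."""
    idx = np.arange(seq)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def encoder_attention(q, k, v, window):
    """Single-head, non-causal attention with RoPE and a sliding window."""
    q, k = rope(q), rope(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(sliding_window_mask(q.shape[0], window), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Unlike a decoder, the mask here is not causal: each token sees neighbors on both sides, which is the encoder behavior the question is about.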