System Info
NVIDIA Spark
- aarch64, 128 GB, NVIDIA GB10, Driver 580.126.09, CUDA Version: 13.0
Docker images:
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8
...
(likely all other images with Spark support are affected)
Working image:
nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
Who can help?
@yuanjingx87: Is this bug reproducible? How could I find the source of the illegal instruction? Is there something I can try? (I have already built the latest version from source, but it does not help.)
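Things I can try on my side while waiting (a sketch; assumptions: compute-sanitizer is on PATH inside the container, and this build honors the TLLM_LOG_LEVEL variable):

```shell
# Make failing kernel launches surface synchronously instead of at a later sync point.
export CUDA_LAUNCH_BLOCKING=1
# Ask TensorRT-LLM for more verbose logging (assumption: supported by this build).
export TLLM_LOG_LEVEL=DEBUG
# Then run the server under compute-sanitizer to report the first bad access, e.g.
# (same trtllm-serve flags as in the reproduction below):
#   compute-sanitizer --tool memcheck trtllm-serve "$MODEL_HANDLE" ...
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note that device-side errors inside a captured CUDA graph may still only be reported at replay time, so the sanitizer run is most useful combined with CUDA graphs disabled.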
Many thanks for your help,
Thomas.
Information
Tasks
Reproduction
Start a Docker container serving the model on port 8355. Once the model is running, test it from a second container running gpt_oss.evals. After a few minutes and some number of test cases (random; progress reaches 1% or 5%), the crash happens.
export MODEL_HANDLE="openai/gpt-oss-120b"
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2"
docker run \
    -e MODEL_HANDLE=$MODEL_HANDLE \
    -e HF_TOKEN=$HF_TOKEN \
    -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
    -v $HOME/.cache/harmony-reqs/:/root/.cache/harmony-reqs/ \
    --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all --ipc=host --network host \
    $DOCKER_IMAGE \
    bash -c '
export TIKTOKEN_ENCODINGS_BASE="/root/.cache/harmony-reqs" &&
mkdir -p $TIKTOKEN_ENCODINGS_BASE &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken &&
hf download $MODEL_HANDLE &&
cat > /tmp/extra-llm-api-config.yml <<EOF
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
  dtype: "auto"
  max_tokens: 250000
cuda_graph_config:
  enable_padding: true
disable_overlap_scheduler: true
EOF
trtllm-serve "$MODEL_HANDLE" --max_batch_size 4 --trust_remote_code --port 8355 --extra_llm_api_options /tmp/extra-llm-api-config.yml --max_num_tokens 61440 --max_seq_len 61440
'
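To avoid starting the eval too early, I poll the server from the host with a small helper (assuming trtllm-serve exposes the usual /health endpoint; adjust the path if your build differs):

```shell
# Poll a URL until it answers or the attempts run out.
wait_for_server() {
  local url=$1 tries=${2:-60} delay=${3:-5}
  local i
  for i in $(seq "$tries"); do
    if curl -fs "$url" >/dev/null 2>&1; then
      echo "up"
      return 0
    fi
    sleep "$delay"
  done
  echo "timeout"
  return 1
}
# Example: wait_for_server http://0.0.0.0:8355/health
```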
# after the server is up and running, start the test from a second console:
docker run \
    --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all --ipc=host --network host \
    nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2 \
    bash -c '
pip install gpt-oss
export OPENAI_API_KEY=dummy
python -m gpt_oss.evals --model openai/gpt-oss-120b --eval aime25 --base-url http://0.0.0.0:8355/v1/ --sampler=chat_completions --n-threads 64 --reasoning-effort high
'
# after some minutes (timing is random) the crash happens
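A lighter repro attempt without gpt_oss.evals (an assumption on my part that plain concurrent chat-completion traffic is enough to trigger it; the payload uses only standard OpenAI-compatible fields):

```shell
BASE_URL=${BASE_URL:-http://0.0.0.0:8355}
PAYLOAD='{"model":"openai/gpt-oss-120b","messages":[{"role":"user","content":"Count slowly to 200."}],"max_tokens":4096}'
# Only fire requests if the server is actually reachable.
if curl -fs "$BASE_URL/v1/models" >/dev/null 2>&1; then
  # 16 concurrent long generations, roughly mimicking the eval's load pattern.
  for i in $(seq 16); do
    curl -s "$BASE_URL/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d "$PAYLOAD" >/dev/null &
  done
  wait
fi
```

If this loop also triggers the crash, the eval harness can be ruled out as a factor.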
Expected behavior
The model should be served without crashing.
Actual behavior
After some usage the server crashes with a CUDA illegal-instruction error.
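Since the traceback below points at cuda_graph_runner.replay, the config variant I would test first (just a bisection experiment, not a fix) is the same extra config without cuda_graph_config, so decode runs eagerly:

```yaml
# /tmp/extra-llm-api-config.yml variant: cuda_graph_config removed to bypass
# graph capture/replay (slower, but isolates the replay path).
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
  dtype: "auto"
  max_tokens: 250000
disable_overlap_scheduler: true
```

If the crash disappears with graphs off, that narrows it to a kernel captured in the decode graph; if it persists, the graph replay is only where the error gets reported.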
[TensorRT-LLM][TRACE] std::tuple<std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > > tensorrt_llm::batch_manager::MicroBatchScheduler::operator()(tensorrt_llm::batch_manager::RequestVector&, const tensorrt_llm::batch_manager::ReqIdsSet&, SizeType32, std::optional) const stop
[02/05/2026-06:58:48] [TRT-LLM] [V] has 8 active_requests, scheduled 0 context requests and 8 generation requests
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[02/05/2026-06:58:48] [TRT-LLM] [V] Detected use_mrope: False
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2455, in _forward_step
outputs = forward(scheduled_requests, self.resource_manager,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2437, in forward
return self.model_engine.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/pytorch/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/utils.py", line 109, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 3455, in forward
outputs = self.cuda_graph_runner.replay(key, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py", line 397, in replay
self.graphs[key].replay()
File "/root/pytorch/torch/cuda/graphs.py", line 142, in replay
super().replay()
torch.AcceleratorError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Encountered an error in forward function: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[TensorRT-LLM][DEBUG] Set request 8 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 9 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 10 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 11 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 12 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 13 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 14 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 15 from state 13 to 20
[02/05/2026-06:58:48] [TRT-LLM] [V] after gather, rank = 0, responses = [(8, LlmResponse(request_id=8, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=2)), (9, LlmResponse(request_id=9, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=3)), (10, LlmResponse(request_id=10, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=4)), (11, LlmResponse(request_id=11, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=5)), (12, LlmResponse(request_id=12, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=6)), (13, LlmResponse(request_id=13, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable 
device-side assertions.\n", result=None, client_id=7)), (14, LlmResponse(request_id=14, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=8)), (15, LlmResponse(request_id=15, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=9))]
[02/05/2026-06:58:48] [TRT-LLM] [V] Client [worker_result_queue] connecting to tcp://127.0.0.1:35963 in PAIR
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from currentStreamCaptureStatusMayInitCtx at /root/pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xe8 (0xf029b81bdd98 in /root/pytorch/torch/lib/libc10.so)
frame #1: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x290 (0xf029b82c21a0 in /root/pytorch/torch/lib/libc10_cuda.so)
frame #2: + 0x10b9668 (0xf02765ea9668 in /root/pytorch/torch/lib/libtorch_cuda.so)
frame #3: + 0x5cc184 (0xf027a450c184 in /root/pytorch/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x14 (0xf029b8198a38 in /root/pytorch/torch/lib/libc10.so)
frame #5: + 0xbf3414 (0xf027a4b33414 in /root/pytorch/torch/lib/libtorch_python.so)
frame #6: + 0xbf3794 (0xf027a4b33794 in /root/pytorch/torch/lib/libtorch_python.so)
frame #7: /usr/bin/python() [0x4f9d6c]
frame #8: /usr/bin/python() [0x523b30]
frame #9: /usr/bin/python() [0x4f9dcc]
frame #10: /usr/bin/python() [0x523b30]
frame #11: _PyEval_EvalFrameDefault + 0x508 (0x564b3c in /usr/bin/python)
frame #12: /usr/bin/python() [0x4c7024]
frame #13: _PyEval_EvalFrameDefault + 0x3cf8 (0x56832c in /usr/bin/python)
frame #14: /usr/bin/python() [0x4c7024]
frame #15: /usr/bin/python() [0x6e5050]
frame #16: /usr/bin/python() [0x686e10]
frame #17: + 0x8595c (0xf029d988595c in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #18: + 0xeb89c (0xf029d98eb89c in /usr/lib/aarch64-linux-gnu/libc.so.6)
[gx10-c635:47402] *** Process received signal ***
[gx10-c635:47402] Signal: Aborted (6)
[gx10-c635:47402] Signal code: (-6)
[gx10-c635:47402] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xf029d9b32968]
[gx10-c635:47402] [ 1] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x87608)[0xf029d9887608]
[gx10-c635:47402] [ 2] /usr/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xf029d983cb3c]
[gx10-c635:47402] [ 3] /usr/lib/aarch64-linux-gnu/libc.so.6(abort+0xf4)[0xf029d9827e00]
[gx10-c635:47402] [ 4] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x1d8)[0xf029b84ad4d8]
[gx10-c635:47402] [ 5] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xaa570)[0xf029b84aa570]
[gx10-c635:47402] [ 6] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__cxa_call_terminate+0x44)[0xf029b84a0dec]
[gx10-c635:47402] [ 7] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0xa4)[0xf029b84a9964]
[gx10-c635:47402] [ 8] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(+0x17354)[0xf029b83d7354]
[gx10-c635:47402] [ 9] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x84)[0xf029b83d7924]
[gx10-c635:47402] [10] /root/pytorch/torch/lib/libtorch_cuda.so(+0x10ba138)[0xf02765eaa138]
[gx10-c635:47402] [11] /root/pytorch/torch/lib/libtorch_python.so(+0x5cc184)[0xf027a450c184]
[gx10-c635:47402] [12] /root/pytorch/torch/lib/libc10.so(_ZN3c1010TensorImplD0Ev+0x14)[0xf029b8198a38]
[gx10-c635:47402] [13] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3414)[0xf027a4b33414]
[gx10-c635:47402] [14] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3794)[0xf027a4b33794]
[gx10-c635:47402] [15] /usr/bin/python[0x4f9d6c]
[gx10-c635:47402] [16] /usr/bin/python[0x523b30]
[gx10-c635:47402] [17] /usr/bin/python[0x4f9dcc]
[gx10-c635:47402] [18] /usr/bin/python[0x523b30]
[gx10-c635:47402] [19] /usr/bin/python(_PyEval_EvalFrameDefault+0x508)[0x564b3c]
[gx10-c635:47402] [20] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [21] /usr/bin/python(_PyEval_EvalFrameDefault+0x3cf8)[0x56832c]
[gx10-c635:47402] [22] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [23] /usr/bin/python[0x6e5050]
[gx10-c635:47402] [24] /usr/bin/python[0x686e10]
[gx10-c635:47402] [25] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x8595c)[0xf029d988595c]
[gx10-c635:47402] [26] /usr/lib/aarch64-linux-gnu/libc.so.6(+0xeb89c)[0xf029d98eb89c]
[gx10-c635:47402] *** End of error message ***
[02/05/2026-06:58:48] [TRT-LLM] [V] Reset Python GC thresholds to default value: (700, 10, 10)
[02/05/2026-06:58:48] [TRT-LLM] [V] Set Python GC threshold to customized value: 20000
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41676 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41710 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41726 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41716 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41698 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[... the identical RequestError traceback repeats for each remaining in-flight request ...]
INFO: 127.0.0.1:41742 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO: 127.0.0.1:41692 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41698) is disconnected, abort 2
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41676) is disconnected, abort 3
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41692) is disconnected, abort 4
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41666) is disconnected, abort 5
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41716) is disconnected, abort 6
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41710) is disconnected, abort 7
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41726) is disconnected, abort 8
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41742) is disconnected, abort 9
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
Additional notes
The output above was produced with a custom torch build with TORCH_USE_CUDA_DSA enabled; TLLM_LOG_LEVEL=TRACE and CUDA_LAUNCH_BLOCKING=1 were also set. Even so, I cannot see the source of the error.
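For context, the trace above was captured with roughly this environment (a sketch; the custom torch source build itself, configured with TORCH_USE_CUDA_DSA, is not shown):

```shell
# Debug environment used while capturing the trace above (sketch).
# torch itself was a custom source build with TORCH_USE_CUDA_DSA enabled.
export TLLM_LOG_LEVEL=TRACE     # TensorRT-LLM trace-level logging
export CUDA_LAUNCH_BLOCKING=1   # synchronize after each kernel launch so the error surfaces at the faulting call
echo "TLLM_LOG_LEVEL=$TLLM_LOG_LEVEL CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note that CUDA_LAUNCH_BLOCKING may not serialize the individual kernels replayed from a captured CUDA graph (a graph replay is a single launch), which could be why the faulting kernel still cannot be pinpointed here.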
Is the problem reproducible? What can I try next?
Many thanks for your help.
Before submitting a new issue...
System Info
NVIDIA Spark
Docker images:
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8
...
(I think all others with spark support)
But working:
nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
Who can help?
@yuanjingx87: Is this bug reproducable? How could I find the source of the illegal instruction. Can I try somehing? (I have already build the latest version from source... but it does not help).
Many thanks for your help,
Thomas.
Information
Tasks
examplesfolder (such as GLUE/SQuAD, ...)Reproduction
Start a docker image serving the model at port 8355. After the model is running test it via another docker image running gpt_oss.evals. After some minutes and test cases (random, progress to 1% or 5%) the crash happens.
export MODEL_HANDLE="openai/gpt-oss-120b"
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2"
docker run
-e MODEL_HANDLE=$MODEL_HANDLE
-e HF_TOKEN=$HF_TOKEN
-v $HOME/.cache/huggingface/:/root/.cache/huggingface/
-v $HOME/.cache/harmony-reqs/:/root/.cache/harmony-reqs/
--rm -it --ulimit memlock=-1 --ulimit stack=67108864
--gpus=all --ipc=host --network host
$DOCKER_IMAGE
bash -c '
export TIKTOKEN_ENCODINGS_BASE="/root/.cache/harmony-reqs" &&
mkdir -p $TIKTOKEN_ENCODINGS_BASE &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken &&
hf download $MODEL_HANDLE &&
cat > /tmp/extra-llm-api-config.yml <<EOF
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
dtype: "auto"
max_tokens: 250000
cuda_graph_config:
enable_padding: true
disable_overlap_scheduler: true
EOF
trtllm-serve "$MODEL_HANDLE" --max_batch_size 4 --trust_remote_code --port 8355 --extra_llm_api_options /tmp/extra-llm-api-config.yml --max_num_tokens 61440 --max_seq_len 61440
'
#after the server is up and running start the test from a second console:
docker run
--rm -it --ulimit memlock=-1 --ulimit stack=67108864
--gpus=all --ipc=host --network host
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
bash -c '
pip install gpt-oss
export OPENAI_API_KEY=dummy
python -m gpt_oss.evals --model openai/gpt-oss-120b --eval aime25 --base-url http://0.0.0.0:8355/v1/ --sampler=chat_completions --n-threads 64 --reasoning-effort high
'
#after some minutes (random...) the crash happens....
Expected behavior
The model sould be served without crash.
actual behavior
After some usage -> crash with CUDA illegal instruction.
[TensorRT-LLM][TRACE] std::tuple<std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > > tensorrt_llm::batch_manager::MicroBatchScheduler::operator()(tensorrt_llm::batch_manager::RequestVector&, const tensorrt_llm::batch_manager::ReqIdsSet&, SizeType32, std::optional) const stop
[02/05/2026-06:58:48] [TRT-LLM] [V] has 8 active_requests, scheduled 0 context requests and 8 generation requests
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[02/05/2026-06:58:48] [TRT-LLM] [V] Detected use_mrope: False
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2455, in _forward_step
outputs = forward(scheduled_requests, self.resource_manager,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2437, in forward
return self.model_engine.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/pytorch/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/utils.py", line 109, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 3455, in forward
outputs = self.cuda_graph_runner.replay(key, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py", line 397, in replay
self.graphs[key].replay()
File "/root/pytorch/torch/cuda/graphs.py", line 142, in replay
super().replay()
torch.AcceleratorError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Encountered an error in forward function: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[TensorRT-LLM][DEBUG] Set request 8 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 9 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 10 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 11 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 12 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 13 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 14 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 15 from state 13 to 20
[02/05/2026-06:58:48] [TRT-LLM] [V] after gather, rank = 0, responses = [(8, LlmResponse(request_id=8, error_msg="CUDA error: an illegal instruction was encountered\nSearch for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=2)), (9, LlmResponse(request_id=9, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=3)), (10, LlmResponse(request_id=10, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=4)), (11, LlmResponse(request_id=11, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=5)), (12, LlmResponse(request_id=12, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=6)), (13, LlmResponse(request_id=13, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=7)), (14, LlmResponse(request_id=14, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in 
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=8)), (15, LlmResponse(request_id=15, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=9))][02/05/2026-06:58:48] [TRT-LLM] [V] Client [worker_result_queue] connecting to tcp://127.0.0.1:35963 in PAIR
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Exception raised from currentStreamCaptureStatusMayInitCtx at /root/pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xe8 (0xf029b81bdd98 in /root/pytorch/torch/lib/libc10.so)
frame #1: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x290 (0xf029b82c21a0 in /root/pytorch/torch/lib/libc10_cuda.so)
frame #2: + 0x10b9668 (0xf02765ea9668 in /root/pytorch/torch/lib/libtorch_cuda.so)
frame #3: + 0x5cc184 (0xf027a450c184 in /root/pytorch/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x14 (0xf029b8198a38 in /root/pytorch/torch/lib/libc10.so)
frame #5: + 0xbf3414 (0xf027a4b33414 in /root/pytorch/torch/lib/libtorch_python.so)
frame #6: + 0xbf3794 (0xf027a4b33794 in /root/pytorch/torch/lib/libtorch_python.so)
frame #7: /usr/bin/python() [0x4f9d6c]
frame #8: /usr/bin/python() [0x523b30]
frame #9: /usr/bin/python() [0x4f9dcc]
frame #10: /usr/bin/python() [0x523b30]
frame #11: _PyEval_EvalFrameDefault + 0x508 (0x564b3c in /usr/bin/python)
frame #12: /usr/bin/python() [0x4c7024]
frame #13: _PyEval_EvalFrameDefault + 0x3cf8 (0x56832c in /usr/bin/python)
frame #14: /usr/bin/python() [0x4c7024]
frame #15: /usr/bin/python() [0x6e5050]
frame #16: /usr/bin/python() [0x686e10]
frame #17: + 0x8595c (0xf029d988595c in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #18: + 0xeb89c (0xf029d98eb89c in /usr/lib/aarch64-linux-gnu/libc.so.6)
[gx10-c635:47402] *** Process received signal ***
[gx10-c635:47402] Signal: Aborted (6)
[gx10-c635:47402] Signal code: (-6)
[gx10-c635:47402] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xf029d9b32968]
[gx10-c635:47402] [ 1] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x87608)[0xf029d9887608]
[gx10-c635:47402] [ 2] /usr/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xf029d983cb3c]
[gx10-c635:47402] [ 3] /usr/lib/aarch64-linux-gnu/libc.so.6(abort+0xf4)[0xf029d9827e00]
[gx10-c635:47402] [ 4] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x1d8)[0xf029b84ad4d8]
[gx10-c635:47402] [ 5] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xaa570)[0xf029b84aa570]
[gx10-c635:47402] [ 6] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__cxa_call_terminate+0x44)[0xf029b84a0dec]
[gx10-c635:47402] [ 7] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0xa4)[0xf029b84a9964]
[gx10-c635:47402] [ 8] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(+0x17354)[0xf029b83d7354]
[gx10-c635:47402] [ 9] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x84)[0xf029b83d7924]
[gx10-c635:47402] [10] /root/pytorch/torch/lib/libtorch_cuda.so(+0x10ba138)[0xf02765eaa138]
[gx10-c635:47402] [11] /root/pytorch/torch/lib/libtorch_python.so(+0x5cc184)[0xf027a450c184]
[gx10-c635:47402] [12] /root/pytorch/torch/lib/libc10.so(_ZN3c1010TensorImplD0Ev+0x14)[0xf029b8198a38]
[gx10-c635:47402] [13] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3414)[0xf027a4b33414]
[gx10-c635:47402] [14] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3794)[0xf027a4b33794]
[gx10-c635:47402] [15] /usr/bin/python[0x4f9d6c]
[gx10-c635:47402] [16] /usr/bin/python[0x523b30]
[gx10-c635:47402] [17] /usr/bin/python[0x4f9dcc]
[gx10-c635:47402] [18] /usr/bin/python[0x523b30]
[gx10-c635:47402] [19] /usr/bin/python(_PyEval_EvalFrameDefault+0x508)[0x564b3c]
[gx10-c635:47402] [20] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [21] /usr/bin/python(_PyEval_EvalFrameDefault+0x3cf8)[0x56832c]
[gx10-c635:47402] [22] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [23] /usr/bin/python[0x6e5050]
[gx10-c635:47402] [24] /usr/bin/python[0x686e10]
[gx10-c635:47402] [25] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x8595c)[0xf029d988595c]
[gx10-c635:47402] [26] /usr/lib/aarch64-linux-gnu/libc.so.6(+0xeb89c)[0xf029d98eb89c]
[gx10-c635:47402] *** End of error message ***
[02/05/2026-06:58:48] [TRT-LLM] [V] Reset Python GC thresholds to default value: (700, 10, 10)
[02/05/2026-06:58:48] [TRT-LLM] [V] Set Python GC threshold to customized value: 20000
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.INFO: 127.0.0.1:41676 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.INFO: 127.0.0.1:41666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
`cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
    handler(response.error_msg)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
    raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
    response = await create_harmony_response(promise, postproc_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
    await promise.aresult()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
    await self._aresult_step()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
    self._handle_response(response)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
    GenerationResultBase._handle_response(self, response)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
    handler(response.error_msg)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
    raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO:     127.0.0.1:41710 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[... the identical exception and traceback repeat for each remaining in-flight request; only the client port differs ...]
INFO:     127.0.0.1:41726 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41716 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41698 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41742 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41692 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41698) is disconnected, abort 2
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41676) is disconnected, abort 3
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41692) is disconnected, abort 4
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41666) is disconnected, abort 5
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41716) is disconnected, abort 6
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41710) is disconnected, abort 7
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41726) is disconnected, abort 8
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41742) is disconnected, abort 9
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
Additional notes
The output above was produced with a custom torch build compiled with TORCH_USE_CUDA_DSA enabled, and with TLLM_LOG_LEVEL=TRACE and CUDA_LAUNCH_BLOCKING=1 set, but I still cannot see the source of the error.
Is the problem reproducible? What can I try next?
Many thanks for your help.
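For concreteness, the next debugging step I have in mind is to wrap the server in compute-sanitizer to get the name of the offending kernel. This is only a sketch: it assumes compute-sanitizer (part of the CUDA toolkit) is available inside the container, and reuses the trtllm-serve invocation from the reproduction above.

```shell
# Sketch: run the server under compute-sanitizer (memcheck tool) so the
# illegal instruction is attributed to a specific kernel launch.
# Assumption: compute-sanitizer is on PATH inside the TRT-LLM container.
export CUDA_LAUNCH_BLOCKING=1   # serialize launches so the failing kernel is the one reported
export TLLM_LOG_LEVEL=TRACE     # verbose TRT-LLM logging, as in the notes above

compute-sanitizer --tool memcheck \
  trtllm-serve "$MODEL_HANDLE" \
    --max_batch_size 4 --trust_remote_code --port 8355 \
    --extra_llm_api_options /tmp/extra-llm-api-config.yml \
    --max_num_tokens 61440 --max_seq_len 61440
```

This will slow the server down considerably, but if the sanitizer flags a specific kernel it would at least narrow down where the illegal instruction originates.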
Before submitting a new issue...