System Info
NVIDIA Spark
- aarch64, 128 GB, NVIDIA GB10, Driver 580.126.09, CUDA Version: 13.0
Docker images:
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8
...
(likely all other images with Spark support are affected)
Working image:
nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
Who can help?
@yuanjingx87: Is this bug reproducible? How could I find the source of the illegal instruction? Is there something I can try? (I have already built the latest version from source, but it does not help.)
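Things I can try on my side while waiting (a sketch; assumptions: compute-sanitizer is on PATH inside the container, and this build honors the TLLM_LOG_LEVEL variable):

```shell
# Make failing kernel launches surface synchronously instead of at a later sync point.
export CUDA_LAUNCH_BLOCKING=1
# Ask TensorRT-LLM for more verbose logging (assumption: supported by this build).
export TLLM_LOG_LEVEL=DEBUG
# Then run the server under compute-sanitizer to report the first bad access, e.g.
# (same trtllm-serve flags as in the reproduction below):
#   compute-sanitizer --tool memcheck trtllm-serve "$MODEL_HANDLE" ...
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note that device-side errors inside a captured CUDA graph may still only be reported at replay time, so the sanitizer run is most useful combined with CUDA graphs disabled.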
Many thanks for your help,
Thomas.
Information
Tasks
Reproduction
Start a Docker container serving the model on port 8355. Once the model is running, test it from a second container running gpt_oss.evals. After a few minutes and some number of test cases (random; progress reaches 1% or 5%), the crash happens.
export MODEL_HANDLE="openai/gpt-oss-120b"
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2"
docker run \
    -e MODEL_HANDLE=$MODEL_HANDLE \
    -e HF_TOKEN=$HF_TOKEN \
    -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
    -v $HOME/.cache/harmony-reqs/:/root/.cache/harmony-reqs/ \
    --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all --ipc=host --network host \
    $DOCKER_IMAGE \
    bash -c '
export TIKTOKEN_ENCODINGS_BASE="/root/.cache/harmony-reqs" &&
mkdir -p $TIKTOKEN_ENCODINGS_BASE &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken &&
hf download $MODEL_HANDLE &&
cat > /tmp/extra-llm-api-config.yml <<EOF
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
  dtype: "auto"
  max_tokens: 250000
cuda_graph_config:
  enable_padding: true
disable_overlap_scheduler: true
EOF
trtllm-serve "$MODEL_HANDLE" --max_batch_size 4 --trust_remote_code --port 8355 --extra_llm_api_options /tmp/extra-llm-api-config.yml --max_num_tokens 61440 --max_seq_len 61440
'
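To avoid starting the eval too early, I poll the server from the host with a small helper (assuming trtllm-serve exposes the usual /health endpoint; adjust the path if your build differs):

```shell
# Poll a URL until it answers or the attempts run out.
wait_for_server() {
  local url=$1 tries=${2:-60} delay=${3:-5}
  local i
  for i in $(seq "$tries"); do
    if curl -fs "$url" >/dev/null 2>&1; then
      echo "up"
      return 0
    fi
    sleep "$delay"
  done
  echo "timeout"
  return 1
}
# Example: wait_for_server http://0.0.0.0:8355/health
```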
# after the server is up and running, start the test from a second console:
docker run \
    --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all --ipc=host --network host \
    nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2 \
    bash -c '
pip install gpt-oss
export OPENAI_API_KEY=dummy
python -m gpt_oss.evals --model openai/gpt-oss-120b --eval aime25 --base-url http://0.0.0.0:8355/v1/ --sampler=chat_completions --n-threads 64 --reasoning-effort high
'
# after some minutes (timing is random) the crash happens
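A lighter repro attempt without gpt_oss.evals (an assumption on my part that plain concurrent chat-completion traffic is enough to trigger it; the payload uses only standard OpenAI-compatible fields):

```shell
BASE_URL=${BASE_URL:-http://0.0.0.0:8355}
PAYLOAD='{"model":"openai/gpt-oss-120b","messages":[{"role":"user","content":"Count slowly to 200."}],"max_tokens":4096}'
# Only fire requests if the server is actually reachable.
if curl -fs "$BASE_URL/v1/models" >/dev/null 2>&1; then
  # 16 concurrent long generations, roughly mimicking the eval's load pattern.
  for i in $(seq 16); do
    curl -s "$BASE_URL/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d "$PAYLOAD" >/dev/null &
  done
  wait
fi
```

If this loop also triggers the crash, the eval harness can be ruled out as a factor.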
Expected behavior
The model should be served without crashing.
Actual behavior
After some usage the server crashes with a CUDA illegal-instruction error.
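Since the traceback below points at cuda_graph_runner.replay, the config variant I would test first (just a bisection experiment, not a fix) is the same extra config without cuda_graph_config, so decode runs eagerly:

```yaml
# /tmp/extra-llm-api-config.yml variant: cuda_graph_config removed to bypass
# graph capture/replay (slower, but isolates the replay path).
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
  dtype: "auto"
  max_tokens: 250000
disable_overlap_scheduler: true
```

If the crash disappears with graphs off, that narrows it to a kernel captured in the decode graph; if it persists, the graph replay is only where the error gets reported.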
[TensorRT-LLM][TRACE] std::tuple<std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > > tensorrt_llm::batch_manager::MicroBatchScheduler::operator()(tensorrt_llm::batch_manager::RequestVector&, const tensorrt_llm::batch_manager::ReqIdsSet&, SizeType32, std::optional) const stop
[02/05/2026-06:58:48] [TRT-LLM] [V] has 8 active_requests, scheduled 0 context requests and 8 generation requests
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[02/05/2026-06:58:48] [TRT-LLM] [V] Detected use_mrope: False
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2455, in _forward_step
outputs = forward(scheduled_requests, self.resource_manager,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2437, in forward
return self.model_engine.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/pytorch/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/utils.py", line 109, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 3455, in forward
outputs = self.cuda_graph_runner.replay(key, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py", line 397, in replay
self.graphs[key].replay()
File "/root/pytorch/torch/cuda/graphs.py", line 142, in replay
super().replay()
torch.AcceleratorError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Encountered an error in forward function: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[TensorRT-LLM][DEBUG] Set request 8 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 9 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 10 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 11 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 12 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 13 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 14 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 15 from state 13 to 20
[02/05/2026-06:58:48] [TRT-LLM] [V] after gather, rank = 0, responses = [(8, LlmResponse(request_id=8, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=2)), (9, LlmResponse(request_id=9, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=3)), (10, LlmResponse(request_id=10, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=4)), (11, LlmResponse(request_id=11, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=5)), (12, LlmResponse(request_id=12, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=6)), (13, LlmResponse(request_id=13, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable 
device-side assertions.\n", result=None, client_id=7)), (14, LlmResponse(request_id=14, error_msg="CUDA error: an illegal instruction was encountered\nSearch for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=8)), (15, LlmResponse(request_id=15, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n", result=None, client_id=9))]
[02/05/2026-06:58:48] [TRT-LLM] [V] Client [worker_result_queue] connecting to tcp://127.0.0.1:35963 in PAIR
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from currentStreamCaptureStatusMayInitCtx at /root/pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xe8 (0xf029b81bdd98 in /root/pytorch/torch/lib/libc10.so)
frame #1: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x290 (0xf029b82c21a0 in /root/pytorch/torch/lib/libc10_cuda.so)
frame #2: + 0x10b9668 (0xf02765ea9668 in /root/pytorch/torch/lib/libtorch_cuda.so)
frame #3: + 0x5cc184 (0xf027a450c184 in /root/pytorch/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x14 (0xf029b8198a38 in /root/pytorch/torch/lib/libc10.so)
frame #5: + 0xbf3414 (0xf027a4b33414 in /root/pytorch/torch/lib/libtorch_python.so)
frame #6: + 0xbf3794 (0xf027a4b33794 in /root/pytorch/torch/lib/libtorch_python.so)
frame #7: /usr/bin/python() [0x4f9d6c]
frame #8: /usr/bin/python() [0x523b30]
frame #9: /usr/bin/python() [0x4f9dcc]
frame #10: /usr/bin/python() [0x523b30]
frame #11: _PyEval_EvalFrameDefault + 0x508 (0x564b3c in /usr/bin/python)
frame #12: /usr/bin/python() [0x4c7024]
frame #13: _PyEval_EvalFrameDefault + 0x3cf8 (0x56832c in /usr/bin/python)
frame #14: /usr/bin/python() [0x4c7024]
frame #15: /usr/bin/python() [0x6e5050]
frame #16: /usr/bin/python() [0x686e10]
frame #17: + 0x8595c (0xf029d988595c in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #18: + 0xeb89c (0xf029d98eb89c in /usr/lib/aarch64-linux-gnu/libc.so.6)
[gx10-c635:47402] *** Process received signal ***
[gx10-c635:47402] Signal: Aborted (6)
[gx10-c635:47402] Signal code: (-6)
[gx10-c635:47402] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xf029d9b32968]
[gx10-c635:47402] [ 1] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x87608)[0xf029d9887608]
[gx10-c635:47402] [ 2] /usr/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xf029d983cb3c]
[gx10-c635:47402] [ 3] /usr/lib/aarch64-linux-gnu/libc.so.6(abort+0xf4)[0xf029d9827e00]
[gx10-c635:47402] [ 4] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x1d8)[0xf029b84ad4d8]
[gx10-c635:47402] [ 5] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xaa570)[0xf029b84aa570]
[gx10-c635:47402] [ 6] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__cxa_call_terminate+0x44)[0xf029b84a0dec]
[gx10-c635:47402] [ 7] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0xa4)[0xf029b84a9964]
[gx10-c635:47402] [ 8] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(+0x17354)[0xf029b83d7354]
[gx10-c635:47402] [ 9] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x84)[0xf029b83d7924]
[gx10-c635:47402] [10] /root/pytorch/torch/lib/libtorch_cuda.so(+0x10ba138)[0xf02765eaa138]
[gx10-c635:47402] [11] /root/pytorch/torch/lib/libtorch_python.so(+0x5cc184)[0xf027a450c184]
[gx10-c635:47402] [12] /root/pytorch/torch/lib/libc10.so(_ZN3c1010TensorImplD0Ev+0x14)[0xf029b8198a38]
[gx10-c635:47402] [13] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3414)[0xf027a4b33414]
[gx10-c635:47402] [14] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3794)[0xf027a4b33794]
[gx10-c635:47402] [15] /usr/bin/python[0x4f9d6c]
[gx10-c635:47402] [16] /usr/bin/python[0x523b30]
[gx10-c635:47402] [17] /usr/bin/python[0x4f9dcc]
[gx10-c635:47402] [18] /usr/bin/python[0x523b30]
[gx10-c635:47402] [19] /usr/bin/python(_PyEval_EvalFrameDefault+0x508)[0x564b3c]
[gx10-c635:47402] [20] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [21] /usr/bin/python(_PyEval_EvalFrameDefault+0x3cf8)[0x56832c]
[gx10-c635:47402] [22] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [23] /usr/bin/python[0x6e5050]
[gx10-c635:47402] [24] /usr/bin/python[0x686e10]
[gx10-c635:47402] [25] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x8595c)[0xf029d988595c]
[gx10-c635:47402] [26] /usr/lib/aarch64-linux-gnu/libc.so.6(+0xeb89c)[0xf029d98eb89c]
[gx10-c635:47402] *** End of error message ***
[02/05/2026-06:58:48] [TRT-LLM] [V] Reset Python GC thresholds to default value: (700, 10, 10)
[02/05/2026-06:58:48] [TRT-LLM] [V] Set Python GC threshold to customized value: 20000
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41676 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41710 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41726 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41716 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:41698 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[... the identical RequestError traceback repeats for each remaining in-flight request ...]
INFO: 127.0.0.1:41742 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO: 127.0.0.1:41692 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41698) is disconnected, abort 2
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41676) is disconnected, abort 3
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41692) is disconnected, abort 4
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41666) is disconnected, abort 5
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41716) is disconnected, abort 6
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41710) is disconnected, abort 7
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41726) is disconnected, abort 8
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41742) is disconnected, abort 9
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
Additional notes
The output above was produced with a custom torch build with TORCH_USE_CUDA_DSA enabled; TLLM_LOG_LEVEL=TRACE and CUDA_LAUNCH_BLOCKING=1 were also set. Even so, I cannot see the source of the error.
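For context, the trace above was captured with roughly this environment (a sketch; the custom torch source build itself, configured with TORCH_USE_CUDA_DSA, is not shown):

```shell
# Debug environment used while capturing the trace above (sketch).
# torch itself was a custom source build with TORCH_USE_CUDA_DSA enabled.
export TLLM_LOG_LEVEL=TRACE     # TensorRT-LLM trace-level logging
export CUDA_LAUNCH_BLOCKING=1   # synchronize after each kernel launch so the error surfaces at the faulting call
echo "TLLM_LOG_LEVEL=$TLLM_LOG_LEVEL CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note that CUDA_LAUNCH_BLOCKING may not serialize the individual kernels replayed from a captured CUDA graph (a graph replay is a single launch), which could be why the faulting kernel still cannot be pinpointed here.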
Is the problem reproducible? What can I try next?
Many thanks for your help.
Before submitting a new issue...
System Info
NVIDIA Spark
Docker images:
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8
...
(I think all others with spark support)
But working:
nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
Who can help?
@yuanjingx87: Is this bug reproducable? How could I find the source of the illegal instruction. Can I try somehing? (I have already build the latest version from source... but it does not help).
Many thanks for your help,
Thomas.
Information
Tasks
examplesfolder (such as GLUE/SQuAD, ...)Reproduction
Start a docker image serving the model at port 8355. After the model is running test it via another docker image running gpt_oss.evals. After some minutes and test cases (random, progress to 1% or 5%) the crash happens.
export MODEL_HANDLE="openai/gpt-oss-120b"
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2"
docker run
-e MODEL_HANDLE=$MODEL_HANDLE
-e HF_TOKEN=$HF_TOKEN
-v $HOME/.cache/huggingface/:/root/.cache/huggingface/
-v $HOME/.cache/harmony-reqs/:/root/.cache/harmony-reqs/
--rm -it --ulimit memlock=-1 --ulimit stack=67108864
--gpus=all --ipc=host --network host
$DOCKER_IMAGE
bash -c '
export TIKTOKEN_ENCODINGS_BASE="/root/.cache/harmony-reqs" &&
mkdir -p $TIKTOKEN_ENCODINGS_BASE &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken &&
wget -nc -P $TIKTOKEN_ENCODINGS_BASE https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken &&
hf download $MODEL_HANDLE &&
cat > /tmp/extra-llm-api-config.yml <<EOF
print_iter_log: false
enable_iter_perf_stats: false
enable_autotuner: true
enable_chunked_prefill: true
kv_cache_config:
dtype: "auto"
max_tokens: 250000
cuda_graph_config:
enable_padding: true
disable_overlap_scheduler: true
EOF
trtllm-serve "$MODEL_HANDLE" --max_batch_size 4 --trust_remote_code --port 8355 --extra_llm_api_options /tmp/extra-llm-api-config.yml --max_num_tokens 61440 --max_seq_len 61440
'
#after the server is up and running start the test from a second console:
docker run
--rm -it --ulimit memlock=-1 --ulimit stack=67108864
--gpus=all --ipc=host --network host
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc2
bash -c '
pip install gpt-oss
export OPENAI_API_KEY=dummy
python -m gpt_oss.evals --model openai/gpt-oss-120b --eval aime25 --base-url http://0.0.0.0:8355/v1/ --sampler=chat_completions --n-threads 64 --reasoning-effort high
'
#after some minutes (random...) the crash happens....
Expected behavior
The model sould be served without crash.
actual behavior
After some usage -> crash with CUDA illegal instruction.
[TensorRT-LLM][TRACE] std::tuple<std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > > tensorrt_llm::batch_manager::MicroBatchScheduler::operator()(tensorrt_llm::batch_manager::RequestVector&, const tensorrt_llm::batch_manager::ReqIdsSet&, SizeType32, std::optional) const stop
[02/05/2026-06:58:48] [TRT-LLM] [V] has 8 active_requests, scheduled 0 context requests and 8 generation requests
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Created event 0xf00c97274d50
[TensorRT-LLM][TRACE] Created event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c981b27f0
[TensorRT-LLM][TRACE] Destroyed event 0xf00c97274d50
[02/05/2026-06:58:48] [TRT-LLM] [V] Detected use_mrope: False
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2455, in _forward_step
outputs = forward(scheduled_requests, self.resource_manager,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2437, in forward
return self.model_engine.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/pytorch/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/utils.py", line 109, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 3455, in forward
outputs = self.cuda_graph_runner.replay(key, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py", line 397, in replay
self.graphs[key].replay()
File "/root/pytorch/torch/cuda/graphs.py", line 142, in replay
super().replay()
torch.AcceleratorError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Encountered an error in forward function: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[TensorRT-LLM][DEBUG] Set request 8 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 9 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 10 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 11 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 12 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 13 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 14 from state 13 to 20
[TensorRT-LLM][DEBUG] Set request 15 from state 13 to 20
[02/05/2026-06:58:48] [TRT-LLM] [V] after gather, rank = 0, responses = [(8, LlmResponse(request_id=8, error_msg="CUDA error: an illegal instruction was encountered\nSearch for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=2)), (9, LlmResponse(request_id=9, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=3)), (10, LlmResponse(request_id=10, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=4)), (11, LlmResponse(request_id=11, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=5)), (12, LlmResponse(request_id=12, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=6)), (13, LlmResponse(request_id=13, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=7)), (14, LlmResponse(request_id=14, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in 
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=8)), (15, LlmResponse(request_id=15, error_msg="CUDA error: an illegal instruction was encountered\nSearch forcudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\nCompile withTORCH_USE_CUDA_DSAto enable device-side assertions.\n", result=None, client_id=9))][02/05/2026-06:58:48] [TRT-LLM] [V] Client [worker_result_queue] connecting to tcp://127.0.0.1:35963 in PAIR
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Exception raised from currentStreamCaptureStatusMayInitCtx at /root/pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xe8 (0xf029b81bdd98 in /root/pytorch/torch/lib/libc10.so)
frame #1: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x290 (0xf029b82c21a0 in /root/pytorch/torch/lib/libc10_cuda.so)
frame #2: + 0x10b9668 (0xf02765ea9668 in /root/pytorch/torch/lib/libtorch_cuda.so)
frame #3: + 0x5cc184 (0xf027a450c184 in /root/pytorch/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x14 (0xf029b8198a38 in /root/pytorch/torch/lib/libc10.so)
frame #5: + 0xbf3414 (0xf027a4b33414 in /root/pytorch/torch/lib/libtorch_python.so)
frame #6: + 0xbf3794 (0xf027a4b33794 in /root/pytorch/torch/lib/libtorch_python.so)
frame #7: /usr/bin/python() [0x4f9d6c]
frame #8: /usr/bin/python() [0x523b30]
frame #9: /usr/bin/python() [0x4f9dcc]
frame #10: /usr/bin/python() [0x523b30]
frame #11: _PyEval_EvalFrameDefault + 0x508 (0x564b3c in /usr/bin/python)
frame #12: /usr/bin/python() [0x4c7024]
frame #13: _PyEval_EvalFrameDefault + 0x3cf8 (0x56832c in /usr/bin/python)
frame #14: /usr/bin/python() [0x4c7024]
frame #15: /usr/bin/python() [0x6e5050]
frame #16: /usr/bin/python() [0x686e10]
frame #17: + 0x8595c (0xf029d988595c in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #18: + 0xeb89c (0xf029d98eb89c in /usr/lib/aarch64-linux-gnu/libc.so.6)
[gx10-c635:47402] *** Process received signal ***
[gx10-c635:47402] Signal: Aborted (6)
[gx10-c635:47402] Signal code: (-6)
[gx10-c635:47402] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xf029d9b32968]
[gx10-c635:47402] [ 1] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x87608)[0xf029d9887608]
[gx10-c635:47402] [ 2] /usr/lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c)[0xf029d983cb3c]
[gx10-c635:47402] [ 3] /usr/lib/aarch64-linux-gnu/libc.so.6(abort+0xf4)[0xf029d9827e00]
[gx10-c635:47402] [ 4] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x1d8)[0xf029b84ad4d8]
[gx10-c635:47402] [ 5] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xaa570)[0xf029b84aa570]
[gx10-c635:47402] [ 6] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__cxa_call_terminate+0x44)[0xf029b84a0dec]
[gx10-c635:47402] [ 7] /usr/lib/aarch64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0xa4)[0xf029b84a9964]
[gx10-c635:47402] [ 8] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(+0x17354)[0xf029b83d7354]
[gx10-c635:47402] [ 9] /usr/lib/aarch64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x84)[0xf029b83d7924]
[gx10-c635:47402] [10] /root/pytorch/torch/lib/libtorch_cuda.so(+0x10ba138)[0xf02765eaa138]
[gx10-c635:47402] [11] /root/pytorch/torch/lib/libtorch_python.so(+0x5cc184)[0xf027a450c184]
[gx10-c635:47402] [12] /root/pytorch/torch/lib/libc10.so(_ZN3c1010TensorImplD0Ev+0x14)[0xf029b8198a38]
[gx10-c635:47402] [13] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3414)[0xf027a4b33414]
[gx10-c635:47402] [14] /root/pytorch/torch/lib/libtorch_python.so(+0xbf3794)[0xf027a4b33794]
[gx10-c635:47402] [15] /usr/bin/python[0x4f9d6c]
[gx10-c635:47402] [16] /usr/bin/python[0x523b30]
[gx10-c635:47402] [17] /usr/bin/python[0x4f9dcc]
[gx10-c635:47402] [18] /usr/bin/python[0x523b30]
[gx10-c635:47402] [19] /usr/bin/python(_PyEval_EvalFrameDefault+0x508)[0x564b3c]
[gx10-c635:47402] [20] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [21] /usr/bin/python(_PyEval_EvalFrameDefault+0x3cf8)[0x56832c]
[gx10-c635:47402] [22] /usr/bin/python[0x4c7024]
[gx10-c635:47402] [23] /usr/bin/python[0x6e5050]
[gx10-c635:47402] [24] /usr/bin/python[0x686e10]
[gx10-c635:47402] [25] /usr/lib/aarch64-linux-gnu/libc.so.6(+0x8595c)[0xf029d988595c]
[gx10-c635:47402] [26] /usr/lib/aarch64-linux-gnu/libc.so.6(+0xeb89c)[0xf029d98eb89c]
[gx10-c635:47402] *** End of error message ***
[02/05/2026-06:58:48] [TRT-LLM] [V] Reset Python GC thresholds to default value: (700, 10, 10)
[02/05/2026-06:58:48] [TRT-LLM] [V] Set Python GC threshold to customized value: 20000
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.INFO: 127.0.0.1:41676 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
response = await create_harmony_response(promise, postproc_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
await promise.aresult()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
await self._aresult_step()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
self._handle_response(response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
GenerationResultBase._handle_response(self, response)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
handler(response.error_msg)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for
cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.INFO: 127.0.0.1:41666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [V] Exception in _handle_response: CUDA error: an illegal instruction was encountered
Search for
`cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
    handler(response.error_msg)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
    raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [E] Error in harmony chat completion: %s CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[02/05/2026-06:58:48] [TRT-LLM] [V] Error details: %s Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 957, in chat_harmony
    response = await create_harmony_response(promise, postproc_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 866, in create_harmony_response
    await promise.aresult()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 794, in aresult
    await self._aresult_step()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 772, in _aresult_step
    self._handle_response(response)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 629, in _handle_response
    GenerationResultBase._handle_response(self, response)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 44, in wrapper
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/utils.py", line 40, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 476, in _handle_response
    handler(response.error_msg)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/executor.py", line 280, in _handle_background_error
    raise error
tensorrt_llm.executor.utils.RequestError: CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO:     127.0.0.1:41710 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[... the identical exception and traceback repeat for each remaining in-flight request; only the client port differs ...]
INFO:     127.0.0.1:41726 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41716 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41698 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41742 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO:     127.0.0.1:41692 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41698) is disconnected, abort 2
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41676) is disconnected, abort 3
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41692) is disconnected, abort 4
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41666) is disconnected, abort 5
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41716) is disconnected, abort 6
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41710) is disconnected, abort 7
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41726) is disconnected, abort 8
[02/05/2026-06:58:48] [TRT-LLM] [I] Address(host='127.0.0.1', port=41742) is disconnected, abort 9
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
Additional notes
The output above was produced with a custom torch build compiled with TORCH_USE_CUDA_DSA enabled, and with TLLM_LOG_LEVEL=TRACE and CUDA_LAUNCH_BLOCKING=1 set, but I still cannot see the source of the error.
Is the problem reproducible? What can I try next?
Many thanks for your help.
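For concreteness, the next debugging step I have in mind is to wrap the server in compute-sanitizer to get the name of the offending kernel. This is only a sketch: it assumes compute-sanitizer (part of the CUDA toolkit) is available inside the container, and reuses the trtllm-serve invocation from the reproduction above.

```shell
# Sketch: run the server under compute-sanitizer (memcheck tool) so the
# illegal instruction is attributed to a specific kernel launch.
# Assumption: compute-sanitizer is on PATH inside the TRT-LLM container.
export CUDA_LAUNCH_BLOCKING=1   # serialize launches so the failing kernel is the one reported
export TLLM_LOG_LEVEL=TRACE     # verbose TRT-LLM logging, as in the notes above

compute-sanitizer --tool memcheck \
  trtllm-serve "$MODEL_HANDLE" \
    --max_batch_size 4 --trust_remote_code --port 8355 \
    --extra_llm_api_options /tmp/extra-llm-api-config.yml \
    --max_num_tokens 61440 --max_seq_len 61440
```

This will slow the server down considerably, but if the sanitizer flags a specific kernel it would at least narrow down where the illegal instruction originates.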
Before submitting a new issue...