[framework] _continue_generate corrupts conversation when truncated response contains tool_calls #909

@lobstersyrup

Description

_continue_generate() in ms_agent/llm/openai_llm.py creates invalid conversation history when a truncated assistant response contains tool_calls. The method appends the partial message (with dangling tool_calls) to the message history without executing the tools first, then makes another API call. Providers that strictly validate the OpenAI spec reject the resulting conversation state.

Command

Run the deep_research/v2 pipeline with any LLM that generates long responses (tested with deepseek-v4-pro). The reporter sub-agent generates a response containing both text content and tool_calls; when that response ends with finish_reason: length, the continue-generation path corrupts the conversation.

What happened

The DeepSeek API returns:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': "An assistant message with 'tool_calls' must be followed by tool messages
                responding to each 'tool_call_id'. (insufficient tool messages following
                tool_calls message)",
    'type': 'invalid_request_error'
  }
}

The request is retried 3 times, failing identically each time, and then the sub-agent crashes with RuntimeError: Sub-agent reporter_tool failed.

What was expected

When an assistant message has tool_calls, those tools should be executed and their responses appended to the conversation history BEFORE any subsequent LLM calls. The continue-gen path should exit early and let the normal tool execution loop handle the tool_calls.
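To make the expected ordering concrete, here is a minimal sketch that uses plain dicts in the OpenAI chat message shape. The helper name append_tool_results and the run_tool callback are hypothetical and are not part of ms_agent; the point is only that every tool_call_id gets a matching tool message before the next LLM call.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def append_tool_results(
    messages: List[Message],
    assistant_message: Message,
    run_tool: Callable[[str, Dict[str, Any]], str],
) -> List[Message]:
    """Append the assistant message, then one tool message per tool call.

    Hypothetical helper for illustration only (not ms_agent code).
    """
    messages.append(assistant_message)
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        try:
            # Arguments arrive as a JSON string and may be incomplete
            # if the response was cut off.
            arguments = json.loads(function.get("arguments") or "{}")
            content = run_tool(function["name"], arguments)
        except json.JSONDecodeError as exc:
            # Still answer the call so that no tool_call_id is left dangling.
            content = f"Invalid tool arguments: {exc}"
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": content,
        })
    return messages
```

Answering a call whose arguments fail to parse with an error string is an assumption of this sketch, chosen so the history stays valid even when truncation cuts into the arguments.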

Root cause

In openai_llm.py, _continue_generate (lines 504-541) and _stream_continue_generate (lines 274-364) check finish_reason but never check whether new_message.tool_calls is non-empty:

# _continue_generate, lines 524-535:
new_message = self._format_output_message(completion)
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    completion = self._call_llm_for_continue_gen(
        messages, new_message, tools, **kwargs)

_call_llm_for_continue_gen (lines 487-502) appends new_message (with its tool_calls) to messages, then calls _call_llm. The API receives:

assistant: {"role": "assistant", "content": "I'll write the report...",
            "tool_calls": [{"id": "call_abc", "function": {"name": "write_file", ...}}]}  # APPENDED
# NO tool response with tool_call_id="call_abc"
# Next: user/system message, or another assistant message

DeepSeek validates the entire message list and rejects it.
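For debugging, the rule quoted in the error message can be approximated locally before a request is sent. The function below is a sketch under that assumption: find_dangling_tool_calls is a hypothetical name, and this is not the provider's actual validation logic.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def find_dangling_tool_calls(messages: List[Message]) -> List[str]:
    """Return the tool_call ids that have no tool message right after them.

    Approximates the rule from the 400 error: an assistant message with
    tool_calls must be followed by tool messages for each tool_call_id.
    """
    dangling: List[str] = []
    i = 0
    while i < len(messages):
        msg = messages[i]
        i += 1
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            continue
        pending = [call["id"] for call in msg["tool_calls"]]
        # Consume the run of tool messages that directly follows.
        while i < len(messages) and messages[i].get("role") == "tool":
            answered = messages[i].get("tool_call_id")
            if answered in pending:
                pending.remove(answered)
            i += 1
        dangling.extend(pending)
    return dangling
```

Applied to the history that the continue-gen path builds (the assistant message with call_abc followed directly by another message), this returns ['call_abc']; once a tool message for call_abc is present, it returns an empty list.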

Affected code

  • ms_agent/llm/openai_llm.py - _continue_generate() (lines 504-541)
  • ms_agent/llm/openai_llm.py - _stream_continue_generate() (lines 274-364)
  • ms_agent/llm/openai_llm.py - _call_llm_for_continue_gen() (lines 487-502)

Reproduction

The bug triggers reliably when ALL of these conditions hold:

  1. The model generates a response that includes both content text and tool_calls
  2. The response exceeds the model's max_tokens, causing finish_reason: length
  3. The continue-generation logic fires (max_continue_runs not yet exhausted)

This is most likely to occur with agents that produce long mixed text+tool responses (report writers, code generators with tool calls mid-response).

Suggested fix

Before entering the continue-gen path, check whether the truncated message has tool_calls. If it does, return the message as-is and let the normal step() loop handle tool execution. The tool calls will then be executed, their responses appended, and the next LLM call will see a valid conversation.

# In _continue_generate and _stream_continue_generate:
new_message = self._format_output_message(completion)
if new_message.tool_calls:
    # Let tool execution handle this - don't try to continue
    return new_message
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    # safe to continue - no dangling tool_calls
    ...
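The guard can be isolated as a small predicate and tested without calling an API. This is a sketch only: StubMessage and should_continue_generation are hypothetical names, and the extra conditions elided as "..." in the snippet above are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubMessage:
    """Stand-in for the formatted output message (hypothetical)."""
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def should_continue_generation(message: StubMessage,
                               finish_reason: Optional[str]) -> bool:
    """Decide whether the continue-gen path may run for this message."""
    if message.tool_calls:
        # Dangling tool_calls: hand the message back to the tool loop.
        return False
    # Same finish_reason check as the original snippet; other conditions
    # elided there are intentionally left out of this sketch.
    return finish_reason in ("length", "null")
```

A truncated text-only message still continues as before, while any message carrying tool_calls is returned to the caller regardless of finish_reason.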

Workaround

None. The bug is in the core continue-generation logic and cannot be worked around via config. Any agent that generates long tool-calling responses will eventually hit it.

Versions / Dependencies

  • MS-Agent: v0.11.0 (PyPI, installed via pip install 'ms-agent[research]')
  • Python: 3.11
  • OS: Linux (Docker python:3.11-slim)
  • OpenAI SDK: 2.33.0
  • LLM: deepseek-v4-pro (via openai_base_url: https://api.deepseek.com/v1)
