docker · dgageot · Jun 12, 2026 · Jun 12, 2026 · Jun 12, 2026
@@ -108,13 +108,13 @@ See the [Model Providers]({{ '/providers/overview/' | relative_url }}) section f
 
 Control how much the model "thinks" before responding:
 
-| Provider   | Format     | Values                                       | Default                          |
-| ---------- | ---------- | -------------------------------------------- | -------------------------------- |
-| OpenAI     | string     | `minimal`, `low`, `medium`, `high`, `xhigh`  | `medium` (always-reasoning models only) |
-| Anthropic  | int        | 1024–32768 tokens                            | off                              |
-| Gemini 2.5 | int        | 0 (off), -1 (dynamic), or token count        | -1 (dynamic)                     |
-| Gemini 3   | string     | `minimal`, `low`, `medium`, `high`           | varies                           |
-| All        | string/int | `none` or `0` to disable                     | —                                |
+| Provider   | Format     | Values                                                              | Default                          |
+| ---------- | ---------- | ------------------------------------------------------------------- | -------------------------------- |
+| OpenAI     | string     | `minimal`, `low`, `medium`, `high`, `xhigh`                         | `medium` (always-reasoning models only) |
+| Anthropic  | int or str | 1024–32768 tokens, or `adaptive`, `adaptive/<effort>`, effort level | off                              |
+| Gemini 2.5 | int        | 0 (off), -1 (dynamic), or token count                               | -1 (dynamic)                     |
+| Gemini 3   | string     | `minimal`, `low`, `medium`, `high`                                  | varies                           |
+| All        | string/int | `none` or `0` to disable                                            | —                                |
 
 ```yaml
 models:

@@ -138,19 +138,24 @@ models:
   gpt:
     provider: openai
     model: gpt-5
-    thinking_budget: low # minimal | low | medium | high | xhigh | max | adaptive/<level>
+    thinking_budget: low # minimal | low | medium | high | xhigh
 ```
 
 ### Anthropic
 
-Uses an integer token budget (1024–32768):
+Uses an integer token budget (1024–32768), or — on adaptive-capable models (Opus 4.6+) — `adaptive`, `adaptive/<effort>`, or a bare effort level:
 
 ```yaml
 models:
   claude:
     provider: anthropic
     model: claude-sonnet-4-5
     thinking_budget: 16384 # must be < max_tokens
+
+  opus:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: adaptive # or adaptive/<effort>, or low | medium | high | xhigh | max
 ```
 
 ### Google Gemini 2.5
@@ -161,7 +166,7 @@ Uses an integer token budget. `0` disables, `-1` lets the model decide:
 models:
   gemini:
     provider: google
-    model: gemini-3.5-flash
+    model: gemini-2.5-flash
     thinking_budget: -1 # dynamic (default)
 ```
 
@@ -174,7 +179,7 @@ models:
   gemini3:
     provider: google
     model: gemini-3-flash
-    thinking_budget: medium # minimal | low | medium | high | xhigh | max | adaptive/<level>
+    thinking_budget: medium # minimal | low | medium | high
 ```
 
 ### Disabling Thinking
@@ -185,6 +190,10 @@ Works for all providers:
 thinking_budget: none # or 0
 ```
 
+Models that always reason (OpenAI o-series, gpt-5, Gemini 3) fall back to the API default and still reason internally.
+
+See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for per-provider details, including AWS Bedrock and Docker Model Runner.
+
 ## Task Budget
 
 **Anthropic-only.**
@@ -369,7 +378,7 @@ providers:
     provider: anthropic
     token_key: MY_ANTHROPIC_KEY
     max_tokens: 16384
-    thinking_budget: high
+    thinking_budget: 8192
     temperature: 0.5
 
 models:
@@ -381,7 +390,7 @@ models:
   claude_fast:
     provider: my_anthropic
     model: claude-haiku-4-5
-    thinking_budget: low  # Overrides provider default
+    thinking_budget: 1024  # Overrides provider default
 ```
 
 See [Provider Definitions]({{ '/providers/custom/' | relative_url }}) for the full list of inheritable properties.
@@ -412,7 +412,7 @@ providers:
     provider: anthropic
     token_key: TEAM_ANTHROPIC_KEY
     max_tokens: 32768
-    thinking_budget: high
+    thinking_budget: 16384
 
 models:
   azure_gpt:

@@ -1,12 +1,12 @@
 ---
 title: "Thinking / Reasoning"
-description: "Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock."
+description: "Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, AWS Bedrock, and Docker Model Runner."
 permalink: /guides/thinking/
 ---
 
 # Thinking / Reasoning
 
-_Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock._
+_Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, AWS Bedrock, and Docker Model Runner._
 
 ## What Is Thinking?
 
@@ -22,14 +22,18 @@ docker-agent exposes this through a single `thinking_budget` field on any named
 
 ## Quick Reference
 
-| Provider       | Format     | Values                                                       | Default      |
-| -------------- | ---------- | ------------------------------------------------------------ | ------------ |
-| OpenAI         | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/<level>` (`max` only via `adaptive/max`) | `medium`     |
-| Anthropic      | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none`        | off          |
-| Gemini 2.5     | int        | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)|
-| Gemini 3       | string     | `minimal`, `low`, `medium`, `high`                           | model-dependent |
-| AWS Bedrock    | int or str | 1024–32768 tokens (`minimal`–`max` mapped to tokens)         | off          |
-| xAI / Mistral  | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`          | off          |
+| Provider            | Format     | Values                                                                                  | Default            |
+| ------------------- | ---------- | --------------------------------------------------------------------------------------- | ------------------ |
+| OpenAI              | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`                                     | `medium` (API default) |
+| Anthropic           | int or str | 1024–32768 tokens, or `minimal`–`max`, `adaptive`, `adaptive/<effort>`, `none`          | off                |
+| Gemini 2.5          | int        | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768)                           | `-1` (dynamic)     |
+| Gemini 3            | string     | `minimal`, `low`, `medium`, `high`                                                      | API default (model-dependent) |
+| AWS Bedrock         | int or str | 1024–32768 tokens (`minimal`–`max` mapped to tokens); `adaptive`, `adaptive/<effort>` for Opus 4.6+ (rejected by older Claude models) | off                |
+| Docker Model Runner | int or str | token count, `minimal`–`max` (mapped to tokens), `adaptive` (unlimited), `none`         | engine default     |
+
+String values are case-insensitive. The full set of accepted strings is `none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`, `adaptive`, and `adaptive/<effort>` — but each provider only honors the subset listed above. Unsupported values either fail at request time (OpenAI) or are mapped/ignored as described per provider below.
+
+> `thinking_budget` is only applied by the providers listed above. Other OpenAI-compatible providers (xAI, Mistral, Ollama, …) currently ignore it — see [xAI and Mistral](#xai-grok-and-mistral).
 
 ## OpenAI
 
@@ -47,33 +51,21 @@ models:
 
 | Level     | Description                                              |
 | --------- | -------------------------------------------------------- |
-| `none`    | Disable thinking (alias for `0`). Minimum output floor still applies. |
+| `none`    | Don't request extra reasoning (alias for `0`); the API's own default still applies. |
 | `minimal` | Fastest; lightest reasoning pass.                        |
 | `low`     | Quick reasoning for straightforward tasks.               |
 | `medium`  | Balanced default.                                        |
 | `high`    | More thorough; recommended for complex tasks.            |
 | `xhigh`   | Near-maximum effort; slower but most accurate.           |
 
+These five effort levels (`minimal`–`xhigh`) are the **only** values accepted for OpenAI. Token counts, `max`, `adaptive`, and `adaptive/<effort>` are rejected with a configuration error at request time. Older models (o1, o3-mini) only accept `low`/`medium`/`high` — sending an unsupported level returns an API error.
+
 <div class="callout callout-warning" markdown="1">
 <div class="callout-title">Tokens and max_tokens
 </div>
-  <p>OpenAI reasoning models always reason internally — even with <code>thinking_budget: none</code> there are hidden reasoning tokens that count against <code>max_tokens</code>. docker-agent automatically raises the output-token floor so hidden reasoning cannot starve visible text output.</p>
+  <p>OpenAI reasoning models always reason internally — even with <code>thinking_budget: none</code> there are hidden reasoning tokens that count against <code>max_tokens</code>. docker-agent automatically raises the output-token floor for its internal low-effort calls (e.g. title generation) so hidden reasoning cannot starve visible text output.</p>
 </div>
 
-### Adaptive thinking (OpenAI)
-
-Use `adaptive/<level>` to let the model decide when to apply extended reasoning and scale effort dynamically:
-
-```yaml
-models:
-  gpt-adaptive:
-    provider: openai
-    model: gpt-5-mini
-    thinking_budget: adaptive/medium  # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
-```
-
-> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI.
-
 ## Anthropic
 
 Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models).
@@ -94,30 +86,43 @@ docker-agent auto-adjusts `max_tokens` when you set a thinking budget but leave
 
 ### Adaptive thinking (Claude Opus 4.6+)
 
-Newer Claude models support adaptive thinking, where the model decides how much to think. Use `adaptive` or pair it with an effort level:
+Newer Claude models support adaptive thinking, where the model decides how much to think. **Claude Opus 4.6, 4.7 and 4.8 only support adaptive thinking** — they reject token-based budgets. Use `adaptive`, `adaptive/<effort>`, or a bare effort level — on Anthropic, a bare effort level like `high` is shorthand for adaptive thinking at that effort:
 
 ```yaml
 models:
   claude-adaptive:
     provider: anthropic
     model: claude-opus-4-6
-    thinking_budget: adaptive          # model decides effort
+    thinking_budget: adaptive          # model decides effort (defaults to high)
 
   claude-adaptive-low:
     provider: anthropic
     model: claude-opus-4-6
-    thinking_budget: low               # adaptive with low effort: low | medium | high | max
+    thinking_budget: low               # same as adaptive/low
+
+  claude-adaptive-max:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: adaptive/max      # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
 ```
 
 **Adaptive effort levels:**
 
 | Level     | Description                                       |
 | --------- | ------------------------------------------------- |
+| `minimal` | Treated as `low` (bare form only).                |
 | `low`     | Minimal thinking; fastest adaptive mode.          |
 | `medium`  | Moderate effort.                                  |
 | `high`    | Thorough reasoning; default for `adaptive`.       |
+| `xhigh`   | Very high effort (newer models, e.g. Opus 4.7+).  |
 | `max`     | Maximum effort.                                   |
 
+<div class="callout callout-warning" markdown="1">
+<div class="callout-title">Effort strings require adaptive-capable models
+</div>
+  <p>Every string effort value on Anthropic is sent as adaptive thinking (<code>output_config.effort</code>), which only newer Claude models (Opus 4.6+) accept. For older models like Sonnet 4.5, use an integer token budget instead. Conversely, models that <em>only</em> support adaptive thinking (Opus 4.6, 4.7, 4.8) automatically have token budgets coerced to <code>adaptive</code> (a warning is logged).</p>
+</div>
+
 ### Disabling thinking
 
 ```yaml
@@ -223,7 +228,7 @@ models:
 
 ## AWS Bedrock (Claude)
 
-Bedrock Claude uses a token budget like Anthropic, but only supports integer token values. String effort levels (`minimal`–`max`) are mapped automatically:
+Bedrock Claude uses a token budget like Anthropic. String effort levels (`minimal`–`max`) are mapped automatically:
 
 | Effort level | Token budget |
 | ------------ | ------------ |
@@ -251,30 +256,47 @@ models:
       # interleaved_thinking is auto-enabled when thinking_budget is set
 ```
 
+**Claude Opus 4.6+ on Bedrock requires adaptive thinking** — these models reject `thinking.type=enabled` (token budgets). Configure them with `adaptive` or `adaptive/<effort>`; docker-agent auto-coerces token budgets and effort levels on these models with a warning:
+
+```yaml
+models:
+  bedrock-opus-adaptive:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-opus-4-8
+    thinking_budget: adaptive/high
+    provider_opts:
+      region: us-east-1
+```
+
 <div class="callout callout-warning" markdown="1">
 <div class="callout-title">Bedrock thinking requirements
 </div>
-  <p>Bedrock Claude requires <code>thinking_budget</code> to be ≥ 1024 and less than <code>max_tokens</code>. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the <code>interleaved-thinking-2025-05-14</code> beta header, which docker-agent adds automatically; it is auto-enabled whenever a thinking budget is set on a Bedrock-hosted Claude model.</p>
+  <p>Bedrock Claude requires token-based <code>thinking_budget</code> values to be ≥ 1024 and less than <code>max_tokens</code>. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the <code>interleaved-thinking-2025-05-14</code> beta header, which docker-agent adds automatically; it is auto-enabled whenever a token thinking budget is set on a Bedrock-hosted Claude model (adaptive thinking interleaves on its own).</p>
 </div>
 
-## xAI (Grok) and Mistral
+## Docker Model Runner (local models)
 
-Both providers use the OpenAI-compatible API and accept the same effort strings:
+For local models, `thinking_budget` is forwarded to the inference engine. Both token counts and effort strings work; effort strings map to the same token scale as Bedrock (`minimal`=1024 … `xhigh`/`max`=32768), and `adaptive` means unlimited:
 
 ```yaml
 models:
-  grok-thinker:
-    provider: xai
-    model: grok-3
-    thinking_budget: high   # minimal | low | medium | high | xhigh | none
-
-  mistral-thinker:
-    provider: mistral
-    model: mistral-large-latest
-    thinking_budget: medium  # minimal | low | medium | high | xhigh | none
+  local:
+    provider: dmr
+    model: ai/qwen3
+    thinking_budget: medium   # llama.cpp: reasoning-budget=8192; vLLM: thinking_token_budget=8192
 ```
 
-Thinking is **off by default** for these providers. Set `thinking_budget` explicitly to enable it.
+- **llama.cpp**: sent as `reasoning-budget` at model-configure time.
+- **vLLM**: sent as `thinking_token_budget` on each request.
+- **MLX / SGLang**: no reasoning-budget knob; the value is silently ignored.
+
+See the [Docker Model Runner provider page]({{ '/providers/dmr/' | relative_url }}) for details.
+
+## xAI (Grok) and Mistral
+
+xAI and Mistral run through docker-agent's OpenAI-compatible client, but the `reasoning_effort` parameter is only sent for OpenAI reasoning model names (o-series, gpt-5). **Setting `thinking_budget` on Grok or Mistral models currently has no effect** — the value is accepted by config validation but never sent to the API.
+
+Grok and Mistral reasoning models (e.g. `grok-3-mini`, `magistral`) manage reasoning on their own; for non-reasoning models, consider the [think tool]({{ '/tools/think/' | relative_url }}) instead.
 
 ## Disabling Thinking
 
@@ -293,6 +315,8 @@ models:
     thinking_budget: 0
 ```
 
+`none` and `0` clear docker-agent's thinking configuration — no thinking parameter is sent. Models that always reason (OpenAI o-series, gpt-5, Gemini 3) then fall back to the API's default behavior and still reason internally; only models with optional thinking (Gemini 2.5, Claude, local models) are fully disabled.
+
 ## Choosing an Effort Level
 
 | Task complexity                  | Recommended level       |
@@ -321,18 +345,18 @@ Define a provider with a default `thinking_budget` and all models that reference
 providers:
   deep-anthropic:
     provider: anthropic
-    thinking_budget: high
+    thinking_budget: adaptive/high
     max_tokens: 32768
 
 models:
   claude-smart:
     provider: deep-anthropic
-    model: claude-sonnet-4-5   # inherits thinking_budget: high
+    model: claude-opus-4-6     # inherits thinking_budget: adaptive/high
 
   claude-faster:
     provider: deep-anthropic
-    model: claude-haiku-4-5
-    thinking_budget: low       # overrides to low
+    model: claude-opus-4-6
+    thinking_budget: low       # overrides to adaptive/low
 ```
 
 ## Full Example

@@ -104,7 +104,9 @@ models:
 
 ## Thinking Budget
 
-Anthropic uses integer token budgets (1024–32768). Thinking is off unless you set `thinking_budget`; when set, interleaved thinking is auto-enabled:
+Anthropic accepts either an integer token budget or a string effort value. Thinking is off unless you set `thinking_budget`; when set, interleaved thinking is auto-enabled.
+
+**Token budget** (1024–32768; works on all extended-thinking Claude models):
 
 ```yaml
 models:
@@ -114,6 +116,23 @@ models:
     thinking_budget: 16384 # must be < max_tokens
 ```
 
+**Adaptive / effort-based** (Claude Opus 4.6+ only — every string value is sent as adaptive thinking via `output_config.effort`):
+
+```yaml
+models:
+  opus-adaptive:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: adaptive # model decides effort (defaults to high)
+
+  opus-effort:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: high # low | medium | high | xhigh | max (same as adaptive/<effort>)
+```
+
+On models that reject token-based thinking (Opus 4.6, 4.7, 4.8), an integer budget is automatically coerced to `adaptive` with a logged warning. See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for the full cross-provider reference.
+
 ## Interleaved Thinking
 
 Auto-enabled whenever a thinking budget is configured on a Claude model. Allows tool calls during model reasoning for more integrated problem-solving: