Describe the feature or problem you'd like to solve
When a text-only model like DeepSeek is selected as the primary model, image inputs are either ignored or cause errors, because DeepSeek lacks vision capabilities. Users must manually switch models with /model every time they want to include an image, which breaks their workflow.
Proposed solution
Automatic model routing based on input modality. When a user's prompt includes images (pasted, drag-and-dropped, or @-referenced), Copilot CLI should:
- Detect that the primary model doesn''t support vision
- Auto-route the image(s) to a vision-capable model (e.g., GPT-4o, Claude Sonnet 4.5) configured as the "vision fallback"
- Receive a text description of the image(s) from the vision model
- Inject that text description into the prompt sent to the primary (text-only) model
- Show the user only the final response from their chosen primary model, with the image description transparently provided as context
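A minimal sketch of what this pre-processing step could look like, in TypeScript. The helper names (supportsVision, invokeModel) and the Attachment shape are assumptions for illustration, not actual Copilot CLI internals:

```ts
// Stubs standing in for real CLI internals; these names are assumptions,
// not the actual Copilot CLI API.
declare function supportsVision(model: string): boolean;
declare function invokeModel(
  model: string,
  input: { prompt: string; image?: Uint8Array },
): Promise<string>;

interface Attachment {
  kind: "image" | "file";
  name: string;
  data: Uint8Array;
}

// Pre-processing step: if the primary model is text-only and the prompt
// carries images, describe them via the fallback model and prepend the
// descriptions as plain-text context.
async function preprocessPrompt(
  prompt: string,
  attachments: Attachment[],
  primaryModel: string,
  visionFallbackModel: string,
): Promise<string> {
  const images = attachments.filter((a) => a.kind === "image");
  // Nothing to do: no images, or the primary model handles vision itself.
  if (images.length === 0 || supportsVision(primaryModel)) {
    return prompt;
  }

  // Ask the vision-capable fallback to describe each image in text.
  const descriptions = await Promise.all(
    images.map((img) =>
      invokeModel(visionFallbackModel, {
        prompt: "Describe this image in detail for a coding assistant.",
        image: img.data,
      }),
    ),
  );

  // Inject the descriptions ahead of the user's original prompt.
  const context = descriptions
    .map((desc, i) => `[Image ${i + 1}: ${images[i].name}]\n${desc}`)
    .join("\n\n");
  return `${context}\n\n${prompt}`;
}
```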
Configuration (example .copilot/config or copilot-instructions.md):
```yaml
vision_fallback_model: "gpt-4o"
vision_fallback_behavior: "describe_and_forward"
```
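One way the CLI might resolve these settings with sensible defaults, sketched in TypeScript (the field names mirror the keys above; the function itself is hypothetical):

```ts
// Hypothetical config shape; key names mirror the YAML example above.
interface VisionRoutingConfig {
  visionFallbackModel: string; // e.g. "gpt-4o" or "claude-sonnet-4.5"
  visionFallbackBehavior: "describe_and_forward" | "disabled";
}

// Apply defaults so the feature works out of the box but stays overridable.
function resolveVisionConfig(
  raw: Partial<VisionRoutingConfig>,
): VisionRoutingConfig {
  return {
    visionFallbackModel: raw.visionFallbackModel ?? "gpt-4o",
    visionFallbackBehavior:
      raw.visionFallbackBehavior ?? "describe_and_forward",
  };
}
```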
Example prompts or workflows
- User has DeepSeek selected. Pastes a screenshot of a UI bug and types "Fix this layout issue". CLI detects image + text-only model -> sends image to GPT-4o -> GPT-4o returns a text description -> that description is prepended to DeepSeek's prompt -> DeepSeek fixes the code.
- User @-references a diagram.png file: "Implement this architecture diagram". Same routing flow: the architecture is described in text, and DeepSeek implements it.
- /model deepseek is active. User pastes an error screenshot. Without needing to switch models, the user gets a code fix.
- Routing is skipped when it isn't needed: if the primary model IS vision-capable, no routing occurs and the image is sent directly.
- Configurable: users can disable auto-routing or choose which vision model to use as the fallback. (A concrete walkthrough of the first workflow follows this list.)
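To make the first workflow concrete, here is how it would run through the hypothetical preprocessPrompt sketch from the Proposed solution section (the screenshot bytes and the returned description are invented for illustration):

```ts
declare const screenshotBytes: Uint8Array; // the pasted screenshot, for illustration

const finalPrompt = await preprocessPrompt(
  "Fix this layout issue",
  [{ kind: "image", name: "screenshot.png", data: screenshotBytes }],
  "deepseek", // primary model, text-only
  "gpt-4o",   // vision fallback
);
// finalPrompt would begin with something like:
//   [Image 1: screenshot.png]
//   A web page where the sidebar overlaps the main content at narrow widths...
//
//   Fix this layout issue
// and is then sent to DeepSeek as ordinary text.
```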
Additional context
- This is inspired by the sub-agent and delegation architecture already present in Copilot CLI
- Reduces friction for users who prefer text-only models for cost/performance but occasionally need vision capabilities
- Could be implemented as a lightweight pre-processing step before the main model invocation
- Related: the /fleet and custom agents infrastructure could be leveraged for this