Guide developers to the right next step after every `azd ai agent` command and add `azd ai agent doctor`

After every successful `azd ai agent` command, the CLI should inspect the project's real state -- `azure.yaml` services, azd env vars, running endpoints -- and guide the developer exactly what to do next. We also need a new **`azd ai agent doctor`** command for on-demand health checks.

Right now, `azd ai agent init` tells you to deploy. That's it. No mention of local dev, no prereq checks, no adaptation to the sample you picked. A developer who just scaffolded a responses-protocol agent has no idea they can run it locally. A developer with unresolved toolbox deps gets told to `run` when they should `azd provision` first. We fix this by making every success path exit with state-aware, actionable guidance, and by giving developers `doctor` as a dedicated diagnostic.

Scope: `init`, `run`, `invoke` (local and remote), `show`, and `deploy` (via post-deploy hooks). Error-path guidance is out of scope -- [`exterrors`](https://github.com/Azure/azure-dev/tree/main/cli/azd/extensions/azure.ai.agents/internal/exterrors) already handles that well and doesn't need rework.

Assumes the [Unified azure.yaml proposal](unify-azure-yaml.md) has landed -- `azure.yaml` is the sole config file, `agent.yaml` and `agent.manifest.yaml` are gone.

## Current Problems

1. **Init pushes you straight to deploy.** After `azd ai agent init`, the CLI checks `AZURE_AI_PROJECT_ID` and suggests either `azd deploy <service>` or `azd up` ([init.go#L1718-L1733](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/init.go#L1718-L1733), [init_from_code.go#L136-L140](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/init_from_code.go#L136-L140)). Both are deploy paths. Local dev (`run` -> `invoke --local`) never gets mentioned. For prototypers, local iteration should be the default -- deployment is "when you're ready," not the only option.

2. **Run's invoke hint is static and often wrong.** After the agent starts, `run` prints `After startup, in another terminal, try: azd ai agent invoke --local "Hello!"` ([run.go#L157-L158](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/run.go#L157-L158)). Same string every time, regardless of protocol. An invocations-protocol agent needs `'{"message": "Hello!"}'`, not a bare string. The hint works for responses-protocol agents and is misleading for everything else.

3. **Success paths are silent.** Error paths have solid suggestions via the [`exterrors` package](https://github.com/Azure/azure-dev/tree/main/cli/azd/extensions/azure.ai.agents/internal/exterrors) (`Validation`, `Dependency`, `Auth`, `Configuration`, `Compatibility` -- each with a `Suggestion` field). But when a command succeeds? Nothing. After a successful `invoke --local`, `show`, or remote `invoke`, the developer gets output and a blank terminal. No "here's what to try next."

4. **Init conflates provision and deploy.** Some samples need `azd provision` before `run` -- to create model deployments, toolboxes, or connections via Bicep -- but they don't need a full `azd up` (which also deploys the agent). Today init suggests `azd up` or `azd deploy` without distinguishing these, so developers deploy prematurely when all they wanted was to set up deps for local dev.

5. **No way to recover context or diagnose issues in one shot.** Two concrete gaps: (a) You lost the CLI's suggestion -- cleared the terminal, closed it, switched contexts -- and now you don't remember what to do next. There's no command to ask "where was I?" (b) You hit an error from a command and don't know how to fix it. The error message might say what went wrong, but it doesn't check the broader context -- are your RBAC roles correct? Is the Foundry project reachable? Are your env vars stale? A `doctor` command could run comprehensive checks and give you the specific commands to fix what's broken.

## Solution Hypothesis

Every `azd ai agent` command's success path runs a **next-step resolver** that:

1. Assembles project state from multiple sources (see State Resolution below).
2. Walks a decision tree to pick the right suggestion for the command that just ran and the state it left behind.
3. After `run`, probes the agent's OpenAPI endpoint (`/invocations/docs/openapi.json`) to discover protocol-specific invoke payloads and build exact example commands. The extension [already has this code](https://github.com/Azure/azure-dev/blob/991b9743b4ab14ee8561677ffa627735c875a796/cli/azd/extensions/azure.ai.agents/internal/cmd/helpers.go#L296) -- the spec is optional and depends on the agent author including it.

### State Resolution

The resolver pulls project state from several sources, in priority order. Higher layers win, and the resolver works with whatever's available:

1. **Explicit flags** (`--project-endpoint`, `--agent`, etc.) -- always authoritative.
2. **Live runtime state** -- probe running agent endpoints, Foundry API status checks.
3. **`azure.yaml`** -- services, protocols, `uses` dependencies, `config.env` references.
4. **azd environment** (`.azure/<env>/.env`) -- which `${...}` variables are actually populated.

This layering matters because we're moving toward environment-optional -- data-plane commands work with just `--project-endpoint`, and brownfield developers may have set up Foundry manually. The resolver should give useful guidance with partial state, not go silent when the azd environment is missing or stale.

Output is human-readable text appended after the command's normal output, separated by a blank line.

### What We Can Ship Today

We can ship **better command choice** without any dependencies -- the resolver tells you which command to run next based on actual project state, not a hardcoded string. Works for every sample, every protocol, today.

**Exact invoke payload examples** (showing the right JSON body for your specific agent) work today for agents that include an OpenAPI spec. The invocation protocol already supports serving a spec at `GET /invocations/docs/openapi.json`, and the extension already has [`fetchOpenAPISpec`](https://github.com/Azure/azure-dev/blob/991b9743b4ab14ee8561677ffa627735c875a796/cli/azd/extensions/azure.ai.agents/internal/cmd/helpers.go#L296) to fetch and cache it. This is opt-in -- the agent author must define the spec (see [human-in-the-loop example](https://github.com/microsoft-foundry/foundry-samples/blob/fd4ecab832fa491b9ffb4abca862c073777b5e53/samples/python/hosted-agents/bring-your-own/invocations/human-in-the-loop/main.py#L119)). We include it by default in our samples so the getting-started happy path always has rich examples. When the spec isn't available, the resolver falls back to protocol-generic examples (`"Hello!"` for responses, `'{"message": "Hello!"}'` for invocations) plus a helpful hint to check `/invocations` directly to verify the right payload shape.

## Success Measures

- **No more dead-end exits.** A developer who runs `init` should always see a runnable next command -- not just "deploy."
- **Init-to-invoke is discoverable from CLI output alone.** You shouldn't need docs to figure out how to get from `init` to a working `invoke --local`.
- **Fewer misdirected commands.** If your deps aren't provisioned, you hear `azd provision`, not `azd ai agent run`. If you're ready for Azure, you hear `azd up`.
- **One-pass diagnosis.** `azd ai agent doctor` surfaces permission issues, missing env vars, and unreachable endpoints in a single run, instead of making you discover them through trial and error across five different commands.

## Decision Tree

The resolver fires after each command and evaluates the following. All `azure.yaml` field references use the post-unification schema from the [Unified azure.yaml proposal](unify-azure-yaml.md).

### State Inputs

| Input | Source | What it tells us |
|---|---|---|
| `HasProjectEndpoint` | azd env: `AZURE_AI_PROJECT_ENDPOINT` is set and non-empty | Foundry project is provisioned |
| `HasUnresolvedInfraVars` | Walk `azure.yaml` service configs; collect `${...}` references that map to known Bicep outputs (e.g., `AZURE_AI_MODEL_DEPLOYMENT_NAME`, connection IDs); check which are missing from azd env | Infra-provisioned dependencies haven't been deployed |
| `HasUnresolvedManualVars` | Remaining `${...}` references not attributable to Bicep outputs (e.g., API keys, custom config); check which are missing from `.env` or azd env | User-supplied values need manual setup |
| `HasToolboxes` | `azure.yaml` has services with `host: azure.ai.toolbox` | Project uses toolboxes that need server-side provisioning |
| `HasConnections` | `azure.yaml` agent `config.env` references connection env vars (e.g., `${GITHUB_MCP_CONN}`) that are unresolved | Connections not yet provisioned via Bicep |
| `Protocol` | `azure.yaml` agent `config.protocols[0].protocol` | Determines invoke payload shape |
| `IsDeployed` | azd env: agent deployment metadata or prior successful deploy | Agent has been pushed to Foundry |
| `AgentStatus` | Foundry API: agent show status | Runtime state of the deployed agent |
| `HasOpenAPI` | Probe running agent at `GET /invocations/docs/openapi.json` | Rich invoke examples available (opt-in, agent author defines spec) |

### After `init`

Today, init checks `AZURE_AI_PROJECT_ID` and branches between `azd deploy` and `azd up`. We expand this to cover local dev, distinguish provision from deploy, and inspect deeper state:

```
IF HasUnresolvedInfraVars:
    -> "azd provision" (create Foundry project and/or dependencies for local development)
ELSE IF HasUnresolvedManualVars:
    -> "azd env set <KEY> <value>" (user-supplied config needed in the azd environment)
ELSE:
    -> "azd ai agent run" (everything is set up, start locally)
```

**Example output (everything ready, project set up):**

```
Next:  azd ai agent run                       -- start the agent locally
       azd ai agent invoke --local "Hello!"    -- test it in another terminal

When ready to deploy to Azure, run azd deploy.
```

**Example output (dependencies not yet created):**

```
Next:  azd provision  -- set up your Foundry project, models, and connections

This creates your Foundry project and any model deployments, toolboxes,
or connections your agent needs for local development.
Once that finishes, run 'azd ai agent run' to start locally.

When ready to deploy to Azure, run azd deploy.
```

**Example output (project ready, but manual config values missing):**

```
Your project is ready, but some values need to be set:
  - MY_API_KEY (not set)

Set them in your azd environment:
  azd env set MY_API_KEY <your-value>

Then run 'azd ai agent run' to start locally.
```

### After `run` (on successful startup)

`run` is a long-lived foreground command. Today it prints a static hint: `After startup, in another terminal, try: azd ai agent invoke --local "Hello!"` ([run.go#L157-L158](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/run.go#L157-L158)). We make this protocol-aware:

```
IF Protocol == "invocations":
    -> invoke --local '{"message": "Hello!"}'
ELSE IF Protocol == "responses":
    -> invoke --local "Hello!"
ELSE:
    -> invoke --local "Hello!" (generic)

+ HasOpenAPI -> replace generic payload with OpenAPI-derived example
```

**Example output (responses protocol):**

```
Agent is running at http://localhost:8088

Next:  azd ai agent invoke --local "Hello!"  -- test it in another terminal

Press Ctrl+C to stop.
```

**Example output (invocations protocol, with OpenAPI):**

```
Agent is running at http://localhost:8088

Next:  azd ai agent invoke --local '{"message": "What hotels are available in Seattle?"}'

Press Ctrl+C to stop.
```

### After `invoke --local`

Today, invoke's failure path already has one suggestion: when the HTTP connection fails, it prints `could not connect to localhost:<port> -- is the agent running? Start it with: azd ai agent run` ([invoke.go#L303](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/invoke.go#L303), [invoke.go#L507](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/invoke.go#L507)). We add success-path guidance:

```
IF success:
    -> "azd deploy" (push the agent to Azure)
    + "azd ai agent monitor --follow" (view logs after deploying)
```

To get to a successful `invoke --local`, the developer must have already provisioned (dependencies exist) and run the agent locally. So the natural next step is `azd deploy` -- infrastructure is already there, just push the agent.

**Example output (single agent):**

```
Next:  azd deploy  -- deploy the agent to Azure
       azd ai agent monitor --follow  -- view logs after deploying
```

**Example output (multi-agent project):**

```
Next:  azd deploy  -- deploy all agents to Azure

After deploying:
  azd ai agent invoke my-agent "Hello!"       -- test my-agent remotely
  azd ai agent invoke other-agent "Hello!"    -- test other-agent remotely
```

### After `invoke` (remote)

```
IF success:
    -> "azd ai agent show <agent>" (check agent status and details)
    + "azd ai agent monitor --follow" (view live logs)
IF failure:
    -> "azd ai agent monitor --follow" (check logs for errors)
```

### After `show`

Today, `show` has error-path guidance when the agent name or version can't be resolved: `Run 'azd deploy' first to deploy the agent, or check your azd environment values` ([show.go#L70-L83](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/show.go#L70-L83)). We add success-path guidance:

```
IF status == "active" OR status == "idle":
    -> "azd ai agent invoke <agent> 'Hello!'" (send a test message)
IF status == "failed" OR status == "":
    -> "azd ai agent monitor --follow" (check logs)
ELSE:
    -> "azd ai agent show <agent>" (agent is in a transitional state, check again shortly)
```

### After `deploy` (via post-deploy hook)

Deployment goes through core `azd up` / `azd deploy`. The extension's deploy hooks ([service_target_agent.go](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/project/service_target_agent.go)) run during deployment but don't print next-step guidance when they finish. We add agent-specific suggestions to the post-deploy hook's exit path:

```
IF success:
    -> "azd ai agent show <agent>" (verify the agent is running)
    + "azd ai agent invoke <agent> 'Hello!'" (test the deployment)
IF failure:
    -> "azd ai agent monitor --follow" (check deployment logs)
```

**Example output (success, single agent):**

```
Next:  azd ai agent show echo-agent              -- verify it's running
       azd ai agent invoke echo-agent "Hello!"    -- test the deployment
```

**Example output (success, multi-agent project):**

```
Next:  azd ai agent show my-agent        -- verify my-agent is running
       azd ai agent show other-agent     -- verify other-agent is running
       azd ai agent invoke my-agent "Hello!"     -- test my-agent
       azd ai agent invoke other-agent "Hello!"  -- test other-agent
```

## `azd ai agent doctor` -- On-Demand Health Check

### Why This Matters

Two specific gaps drive this:

1. **You lost context.** You cleared the terminal, closed it, switched to another task. The CLI told you what to do next, but that output is gone. There's no command to ask "where was I?" -- `doctor` fills that role by re-evaluating project state and telling you the next step.

2. **You hit an error and don't know how to fix it.** A command failed, but the error only tells you what went wrong in that command's scope. It doesn't check the broader picture -- are your RBAC roles correct? Is the Foundry project reachable? Are env vars stale? Is the model deployment actually there? `doctor` runs comprehensive checks across the full project and gives you the specific commands to fix what's broken.

### What `doctor` Does

`doctor` runs a fixed set of checks in order, top to bottom. Each check either passes (with a summary) or fails (with a specific remediation command). The checks are designed to cover the full surface area that individual commands can't see on their own.

**Check 1: azure.yaml validity.**
Verify the file exists, parses correctly, and declares at least one `azure.ai.agent` service.
- Pass: `azure.yaml: valid (2 services: echo-agent, agent-toolbox)`
- Fail: `azure.yaml: not found` or `azure.yaml: parse error at line 14`
- Fix: `azd ai agent init` (if missing) or manual edit (if malformed)

**Check 2: Authentication.**
Run the equivalent of `az account show` to verify the developer is signed in and has an active token.
- Pass: `Authentication: signed in as user@contoso.com`
- Fail: `Authentication: not signed in`
- Fix: `az login`

**Check 3: Foundry project reachability.**
Check that `AZURE_AI_PROJECT_ENDPOINT` is set, then probe the endpoint to confirm the project exists and responds.
- Pass: `Foundry project: reachable (endpoint: https://...)`
- Fail (not set): `Foundry project: not provisioned`
- Fail (unreachable): `Foundry project: endpoint set but not reachable (HTTP 403 or timeout)`
- Fix: `azd provision` (if not set) or check network/firewall (if unreachable)

**Check 4: Model deployments.**
For each `${...}` reference in `config.env` that maps to a model deployment, verify the env var is set and (if the project is reachable) that the deployment actually exists in the Foundry project.
- Pass: `Model deployment: gpt-4o (deployed)`
- Fail: `Model deployment: AZURE_AI_MODEL_DEPLOYMENT_NAME not set`
- Fix: `azd provision` (if defined in Bicep) or manual creation in the Foundry portal

**Check 5: Toolboxes.**
For each `azure.ai.toolbox` service in `azure.yaml`, verify it has been provisioned server-side.
- Pass: `Toolbox 'agent-toolbox': provisioned (3 tools)`
- Fail: `Toolbox 'agent-toolbox': not provisioned`
- Fix: `azd provision`

**Check 6: Connections.**
For each connection env var referenced in toolbox or agent configs, verify it's set and (if reachable) that the connection exists in the Foundry project.
- Pass: `Connection GITHUB_MCP_CONN: set`
- Fail: `Connection GITHUB_MCP_CONN: not set`
- Fix: `azd provision` or check Bicep outputs

**Check 7: RBAC permissions.**
Query the Foundry project's role assignments to verify the current identity has the required roles. This is the check that individual commands almost never do, and it's one of the most common sources of confusing errors.
- Pass: `Permissions: sufficient (Contributor + Cognitive Services User)`
- Fail: `Permissions: missing 'Cognitive Services User' role on the Foundry project`
- Fix: `az role assignment create --assignee user@contoso.com --role "Cognitive Services User" --scope <resource-id>`
- Alt: `Or ask your admin to grant access.`

**Check 8: Agent status.**
If the agent has been deployed (deployment metadata exists in azd env), query Foundry for its current status.
- Pass: `Agent 'echo-agent': active (v3)`
- Fail: `Agent 'echo-agent': failed`
- Fix: `azd ai agent monitor --follow` (check logs)

**Check 9: Manual env vars.**
Any `${...}` references in `azure.yaml` that aren't attributable to Bicep outputs -- verify they're set in `.env` or azd env.
- Pass: `MY_API_KEY: set`
- Fail: `MY_API_KEY: not set`
- Fix: `Add to .env`

### Example Output (all healthy)

```
azd ai agent doctor

  azure.yaml:       valid (2 services: echo-agent, agent-toolbox)
  Authentication:   signed in as user@contoso.com
  Foundry project:  reachable
  Model deployment: gpt-4o (deployed)
  Toolbox:          agent-toolbox (ready, 3 tools)
  Permissions:      sufficient
  Agent:            echo-agent (active, v3)

All checks passed.

Next:  azd ai agent invoke echo-agent "Hello!"  -- test the deployed agent
```

### Example Output (issues found)

```
azd ai agent doctor

  azure.yaml:       valid (2 services: echo-agent, agent-toolbox)
  Authentication:   signed in as user@contoso.com
  Foundry project:  not created
  Model deployment: not set
  Toolbox:          not created
  Permissions:      unknown (no project yet)
  Agent:            not deployed

2 issues found:

  1. Foundry project hasn't been created yet.
     Run:  azd provision

  2. Model deployment AZURE_AI_MODEL_DEPLOYMENT_NAME is not set.
     This will be created by 'azd provision' if defined in your Bicep templates.

Next:  azd provision  -- set up your Foundry project, models, and connections
```

### Example Output (permission issue)

```
azd ai agent doctor

  azure.yaml:       valid (1 service: echo-agent)
  Authentication:   signed in as user@contoso.com
  Foundry project:  reachable
  Model deployment: gpt-4o (deployed)
  Permissions:      missing 'Cognitive Services User' role
  Agent:            echo-agent (failed)

1 issue found:

  1. Current identity is missing the 'Cognitive Services User' role
     on the Foundry project resource.
     Run:  az role assignment create --assignee user@contoso.com \
             --role "Cognitive Services User" \
             --scope /subscriptions/.../resourceGroups/.../providers/...
     Or ask your admin to grant access.

Next:  azd ai agent monitor --follow  -- check agent logs for details
```

### How `doctor` and Next-Step Hints Relate

They cover different situations:

- **Next-step hints** fire automatically after each command succeeds. They answer "what should I do now?" If you're in the flow and following the CLI's guidance, you may never need `doctor`.
- **`doctor`** is for when you've lost that thread -- cleared the terminal, hit an error you can't diagnose, or are picking up a project cold. It re-evaluates everything from scratch and gives you both the status and the fix.

Under the hood, both share the same `ProjectState` assembly logic. `doctor` runs every check unconditionally; the resolver runs only the subset relevant to whatever command just finished.

## Suggestion Output Format

Every next-step suggestion follows the same format:

```
Next:  <primary command>  -- <description>
       <secondary command>  -- <description>
```

Ground rules:
- **One primary command** -- the single most useful next action.
- **At most one secondary** -- an alternate or follow-up.
- **Each line** is `<command>  -- <description>` (two spaces before `--`).
- **Multi-agent projects** repeat per agent on separate lines. No selection menus.
- **`Next:` label** is always present so suggestions stand out visually from command output.
- **README/docs pointers** go on a separate line below the commands when relevant.

## OpenAPI Discovery for Invoke Payloads

The decision tree handles "which command next" well, but the invoke payload problem -- "what do I actually *send* to this agent?" -- needs richer input than `azure.yaml` can provide.

### The Gap

Every sample has a different payload shape:
- Responses protocol: `{"input": "Hello!"}`
- Invocations protocol: `{"message": "Hello!"}` (or custom shapes)
- Custom protocols: anything goes

The CLI can infer protocol from `azure.yaml`, but the actual payload schema -- especially for invocations-protocol agents with custom request bodies -- lives in the agent's code, not config.

### What Already Exists

The invocation protocol already defines a well-known OpenAPI endpoint, and the extension already has code to consume it:

1. **Protocol convention:** Agents can serve an OpenAPI 3.0 spec at `GET /invocations/docs/openapi.json`. This is part of the invocation protocol -- not a new proposal.

2. **Extension support:** [`fetchOpenAPISpec` in helpers.go](https://github.com/Azure/azure-dev/blob/991b9743b4ab14ee8561677ffa627735c875a796/cli/azd/extensions/azure.ai.agents/internal/cmd/helpers.go#L296) already probes the well-known path, fetches the spec, and caches it to disk (`openapi-{agent}-{local|remote}.json` in the azd env directory). Failures are non-fatal and silently ignored.

3. **SDK support:** The Python SDK (`azure-ai-agentserver-invocations`) supports this via the `openapi_spec` parameter on `InvocationAgentServerHost`. See the [human-in-the-loop sample](https://github.com/microsoft-foundry/foundry-samples/blob/fd4ecab832fa491b9ffb4abca862c073777b5e53/samples/python/hosted-agents/bring-your-own/invocations/human-in-the-loop/main.py#L119) for a working example -- the agent defines an `OPENAPI_SPEC` dict describing its endpoints and request schemas, then passes it to `InvocationAgentServerHost(openapi_spec=OPENAPI_SPEC)`.

### Opt-In by the Agent Author

This is **not automatic**. The agent developer must define and include the OpenAPI spec for it to be served. The SDK does not generate it from handler types today. This means:

- **Samples we own:** We include OpenAPI specs by default in our samples, so the getting-started happy path always produces rich invoke examples. No extra work for developers using our templates.
- **Bring-your-own agents:** Developers who write their own agents can opt in by defining an `openapi_spec` and passing it to the server host. If they don't, the CLI falls back gracefully.

### CLI Behavior

After `azd ai agent run` starts the agent, the CLI probes `GET /invocations/docs/openapi.json`. Two paths:

1. **Spec available:** Parse it, grab the invoke endpoint and an example payload (from `example` or `schema` fields), and build the exact invoke command shown to the developer.

2. **Spec unavailable (404, timeout, etc.):** Fall back to protocol-generic suggestions (`"Hello!"` for responses, `'{"message": "Hello!"}'` for invocations) **plus a helpful hint**:

```
Tip:  curl http://localhost:8088/invocations/docs/openapi.json
      to check if your agent exposes an API spec with the exact payload format.
```

This nudges developers toward verifying the right payload shape for their agent's `/invocations` endpoint without requiring the spec to exist.

No cross-team dependencies remain. The protocol convention exists, the extension code exists, and samples can include specs today.

### Example: OpenAPI-Enhanced Output After `run`

Without OpenAPI (generic):
```
Agent is running at http://localhost:8088

  azd ai agent invoke --local "Hello!"
```

With OpenAPI (sample-specific):
```
Agent is running at http://localhost:8088

  azd ai agent invoke --local '{"message": "What hotels are available in Seattle?"}'

  Available endpoints:
    POST /invocations  -- send a message to the agent
    GET  /health       -- check agent health
```

## Implementation Sketch

### Command Coverage Matrix

| Command | Trigger point | Primary suggestion | State inputs used |
|---|---|---|---|
| `init` | Command exit | `run` or `azd provision` | HasUnresolvedInfraVars, HasUnresolvedManualVars |
| `run` | Agent starts listening | `invoke --local` | Protocol, HasOpenAPI |
| `invoke --local` | Command exit | `azd deploy` | -- |
| `invoke` (remote) | Command exit | `show` or `monitor --follow` | -- |
| `show` | Command exit | `invoke` or `monitor --follow` | AgentStatus |
| `deploy` (post-deploy hook) | Hook exit | `show` or `invoke` | -- |
| `doctor` | Command exit | Varies based on check results | All state inputs |

### Next-Step Resolver (core)

We already have two patterns to generalize: the `AZURE_AI_PROJECT_ID` check in [init.go](https://github.com/Azure/azure-dev/blob/main/cli/azd/extensions/azure.ai.agents/internal/cmd/init.go#L1718-L1733) (state-aware branching) and [`exterrors`](https://github.com/Azure/azure-dev/tree/main/cli/azd/extensions/azure.ai.agents/internal/exterrors) factories (structured error + suggestion). The resolver unifies both into a single function called at command completion points (command exit for most, startup-success for `run`):

```
func resolveNextSteps(cmd string, exitCode int, project ProjectState) []Suggestion
```

Where `ProjectState` is assembled by reading:
- `azure.yaml` (parsed once, cached)
- azd environment variables
- (optionally) a running agent's OpenAPI endpoint

Each `Suggestion` has:
- `command`: the CLI command to run (e.g., `azd ai agent run`)
- `description`: one-line explanation (e.g., `start the agent locally`)
- `priority`: ordering when multiple suggestions apply

The resolver outputs a formatted block:

```
func printNextSteps(suggestions []Suggestion) {
    // numbered list for multi-step, single line for one suggestion
}
```

### Where It Hooks In

The resolver fires at command completion -- `PostRun` for short-lived commands, the "listening" callback for `run`. No new CLI flags needed; it's always-on. If output is piped or we're in CI, guidance goes to stderr so it doesn't mess with stdout parsing.

## Downstream Impact

- **`azure.ai.agents` extension:** New `resolveNextSteps` function for success-path guidance, plus `doctor`. Both read `azure.yaml` and azd env -- already available in the command context.
- **Agent Server SDKs:** No changes needed. The invocation protocol already supports `GET /invocations/docs/openapi.json` and the Python SDK already accepts `openapi_spec` on `InvocationAgentServerHost`.
- **Samples:** Add `openapi_spec` definitions to our invocations-protocol samples so the getting-started happy path always produces rich invoke examples. Responses-protocol samples need no changes.
- **Foundry Toolkit for VS Code:** No impact. This is CLI output only.

## Alternatives Considered

### AZD Hooks for Sample-Specific Guidance

Instead of the CLI reading state and inferring suggestions, each sample ships hook scripts (`postinit.sh`, `postrun.sh`) that echo guidance to the terminal.

What's good about this:
- Hooks are first-class in azd. No new mechanism.
- Sample authors control the messaging.

Why I don't think it works:
- Spawning a shell to echo 3 lines is heavy, and now you need platform-specific scripts (sh vs ps1).
- Hooks fire on every run with no opt-out. Re-running a command re-fires the guidance.
- Hooks can't inspect deep project state (unresolved env vars, sibling service status) without reimplementing the resolver in every sample's hook script. That's duplication, not reuse.
- The state-aware decision tree has to live in the CLI anyway (for non-sample-specific stuff). Hooks would create two parallel guidance systems.

### README Markers

Standardized HTML-comment markers in each sample's README (e.g., ``) that the CLI parses and displays.

What's good about this:
- Declarative, no script execution.
- Sample authors write guidance once; README readers and the CLI both benefit.

Why I don't think it works:
- Parsing HTML comments from markdown is fragile and introduces a convention that doesn't exist anywhere else in azd.
- Markers are static -- they can't adapt to actual project state (provisioned? deployed? env vars resolved?).
- OpenAPI at `/invocations/docs/openapi.json` already handles the invoke payload problem, which is the main gap markers would fill.
- For non-payload guidance (e.g., "run azd up first"), the built-in decision tree handles it without needing README content.

### Static Hints in `azure.yaml`

A `config.samples` block in `azure.yaml` where sample authors declare example payloads:

```yaml
config:
  samples:
    invoke: '{"message": "What hotels are in Seattle?"}'
```

What's good about this:
- Lives in a file the CLI already reads. No new parsing.
- Declarative and simple.

Why I don't think it works:
- Duplicates info that already exists in agent code (and in the OpenAPI spec at `/invocations/docs/openapi.json` when the agent includes one).
- Complex payloads make YAML quoting painful.
- Doesn't scale to agents with multiple endpoints or evolving request schemas.
- The config section is intended to match the API.
- OpenAPI at `/invocations/docs/openapi.json` is strictly more capable and already available.

## Scope Boundaries

**In the CLI (next-step resolver):** State-aware command suggestions on success paths, driven by azure.yaml structure, azd env state, and exit status. Deterministic logic that every project benefits from.

**In the CLI (`doctor`):** Comprehensive project health checks -- auth, permissions, reachability, env var resolution, agent status. One pass, all the answers.

**In the Agent Server SDKs (no changes needed):** The invocation protocol already supports serving OpenAPI at `/invocations/docs/openapi.json`. The Python SDK already supports `openapi_spec` on `InvocationAgentServerHost`. Agent authors opt in by defining and passing a spec -- we include it by default in our samples.

**Already covered:** Error-path guidance. [`exterrors`](https://github.com/Azure/azure-dev/tree/main/cli/azd/extensions/azure.ai.agents/internal/exterrors) handles this across the extension. We don't touch it here.

**Out of scope:** CI/CD pipeline guidance. Multi-agent orchestration. Error message formatting (that's [`exterrors`](https://github.com/Azure/azure-dev/tree/main/cli/azd/extensions/azure.ai.agents/internal/exterrors) territory and a future Thread 10 audit).

Input	Source	What it tells us
`HasProjectEndpoint`	azd env: `AZURE_AI_PROJECT_ENDPOINT` is set and non-empty	Foundry project is provisioned
`HasUnresolvedInfraVars`	Walk `azure.yaml` service configs; collect `${...}` references that map to known Bicep outputs (e.g., `AZURE_AI_MODEL_DEPLOYMENT_NAME`, connection IDs); check which are missing from azd env	Infra-provisioned dependencies haven't been deployed
`HasUnresolvedManualVars`	Remaining `${...}` references not attributable to Bicep outputs (e.g., API keys, custom config); check which are missing from `.env` or azd env	User-supplied values need manual setup
`HasToolboxes`	`azure.yaml` has services with `host: azure.ai.toolbox`	Project uses toolboxes that need server-side provisioning
`HasConnections`	`azure.yaml` agent `config.env` references connection env vars (e.g., `${GITHUB_MCP_CONN}`) that are unresolved	Connections not yet provisioned via Bicep
`Protocol`	`azure.yaml` agent `config.protocols[0].protocol`	Determines invoke payload shape
`IsDeployed`	azd env: agent deployment metadata or prior successful deploy	Agent has been pushed to Foundry
`AgentStatus`	Foundry API: agent show status	Runtime state of the deployed agent
`HasOpenAPI`	Probe running agent at `GET /invocations/docs/openapi.json`	Rich invoke examples available (opt-in, agent author defines spec)

Command	Trigger point	Primary suggestion	State inputs used
`init`	Command exit	`run` or `azd provision`	HasUnresolvedInfraVars, HasUnresolvedManualVars
`run`	Agent starts listening	`invoke --local`	Protocol, HasOpenAPI
`invoke --local`	Command exit	`azd deploy`	--
`invoke` (remote)	Command exit	`show` or `monitor --follow`	--
`show`	Command exit	`invoke` or `monitor --follow`	AgentStatus
`deploy` (post-deploy hook)	Hook exit	`show` or `invoke`	--
`doctor`	Command exit	Varies based on check results	All state inputs

Guide developers to the right next step after every azd ai agent command and add azd ai agent doctor #7975

Description

Current Problems

Solution Hypothesis

State Resolution

What We Can Ship Today

Success Measures

Decision Tree

State Inputs

After init

After run (on successful startup)

After invoke --local

After invoke (remote)

After show

After deploy (via post-deploy hook)

azd ai agent doctor -- On-Demand Health Check

Why This Matters

What doctor Does

Example Output (all healthy)

Example Output (issues found)

Example Output (permission issue)

How doctor and Next-Step Hints Relate

Suggestion Output Format

OpenAPI Discovery for Invoke Payloads

The Gap

What Already Exists

Opt-In by the Agent Author

CLI Behavior

Example: OpenAPI-Enhanced Output After run

Implementation Sketch

Command Coverage Matrix

Next-Step Resolver (core)

Where It Hooks In

Downstream Impact

Alternatives Considered

AZD Hooks for Sample-Specific Guidance

README Markers

Static Hints in azure.yaml

Scope Boundaries

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Guide developers to the right next step after every `azd ai agent` command and add `azd ai agent doctor` #7975

After `init`

After `run` (on successful startup)

After `invoke --local`

After `invoke` (remote)

After `show`

After `deploy` (via post-deploy hook)

`azd ai agent doctor` -- On-Demand Health Check

What `doctor` Does

How `doctor` and Next-Step Hints Relate

Example: OpenAPI-Enhanced Output After `run`

Static Hints in `azure.yaml`