After every successful azd ai agent command, the CLI should inspect the project's real state -- azure.yaml services, azd env vars, running endpoints -- and tell the developer exactly what to do next. We also need a new azd ai agent doctor command for on-demand health checks.
Right now, azd ai agent init tells you to deploy. That's it. No mention of local dev, no prereq checks, no adaptation to the sample you picked. A developer who just scaffolded a responses-protocol agent has no idea they can run it locally. A developer with unresolved toolbox deps gets told to use run when they should run azd provision first. We fix this by making every success path exit with state-aware, actionable guidance, and by giving developers doctor as a dedicated diagnostic.
Scope: init, run, invoke (local and remote), show, and deploy (via post-deploy hooks). Error-path guidance is out of scope -- exterrors already handles that well and doesn't need rework.
Assumes the Unified azure.yaml proposal has landed -- azure.yaml is the sole config file, agent.yaml and agent.manifest.yaml are gone.
Current Problems
- Init pushes you straight to deploy. After azd ai agent init, the CLI checks AZURE_AI_PROJECT_ID and suggests either azd deploy <service> or azd up (init.go#L1718-L1733, init_from_code.go#L136-L140). Both are deploy paths. Local dev (run -> invoke --local) never gets mentioned. For prototypers, local iteration should be the default -- deployment is "when you're ready," not the only option.
- Run's invoke hint is static and often wrong. After the agent starts, run prints After startup, in another terminal, try: azd ai agent invoke --local "Hello!" (run.go#L157-L158). Same string every time, regardless of protocol. An invocations-protocol agent needs '{"message": "Hello!"}', not a bare string. The hint works for responses-protocol agents and is misleading for everything else.
- Success paths are silent. Error paths have solid suggestions via the exterrors package (Validation, Dependency, Auth, Configuration, Compatibility -- each with a Suggestion field). But when a command succeeds? Nothing. After a successful invoke --local, show, or remote invoke, the developer gets output and a blank terminal. No "here's what to try next."
- Init conflates provision and deploy. Some samples need azd provision before run -- to create model deployments, toolboxes, or connections via Bicep -- but they don't need a full azd up (which also deploys the agent). Today init suggests azd up or azd deploy without distinguishing these, so developers deploy prematurely when all they wanted was to set up deps for local dev.
- No way to recover context or diagnose issues in one shot. Two concrete gaps: (a) You lost the CLI's suggestion -- cleared the terminal, closed it, switched contexts -- and now you don't remember what to do next. There's no command to ask "where was I?" (b) You hit an error from a command and don't know how to fix it. The error message might say what went wrong, but it doesn't check the broader context -- are your RBAC roles correct? Is the Foundry project reachable? Are your env vars stale? A doctor command could run comprehensive checks and give you the specific commands to fix what's broken.
Solution Hypothesis
Every azd ai agent command's success path runs a next-step resolver that:
- Assembles project state from multiple sources (see State Resolution below).
- Walks a decision tree to pick the right suggestion for the command that just ran and the state it left behind.
- After run, probes the agent's OpenAPI endpoint (/invocations/docs/openapi.json) to discover protocol-specific invoke payloads and build exact example commands. The extension already has this code -- the spec is optional and depends on the agent author including it.
State Resolution
The resolver pulls project state from several sources, in priority order. Higher layers win, and the resolver works with whatever's available:
- Explicit flags (--project-endpoint, --agent, etc.) -- always authoritative.
- Live runtime state -- probe running agent endpoints, Foundry API status checks.
- azure.yaml -- services, protocols, uses dependencies, config.env references.
- azd environment (.azure/<env>/.env) -- which ${...} variables are actually populated.
This layering matters because we're moving toward environment-optional -- data-plane commands work with just --project-endpoint, and brownfield developers may have set up Foundry manually. The resolver should give useful guidance with partial state, not go silent when the azd environment is missing or stale.
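The "higher layers win" rule can be sketched as a first-match lookup over an ordered list of sources. This is a minimal illustration, not the extension's code: the layer type, the resolve function, and the key names are all hypothetical, and real layers would be populated from flags, probes, azure.yaml, and the azd environment.

```go
package main

import "fmt"

// layer is one source of project state (hypothetical type for illustration).
type layer struct {
	name   string
	values map[string]string
}

// resolve walks layers from highest to lowest priority and returns the first
// non-empty value for key, plus the name of the layer that supplied it.
func resolve(key string, layers []layer) (value, source string, ok bool) {
	for _, l := range layers {
		if v := l.values[key]; v != "" {
			return v, l.name, true
		}
	}
	return "", "", false
}

func main() {
	// Order matches the list above: flags, runtime, azure.yaml, azd env.
	layers := []layer{
		{name: "flags", values: map[string]string{"project-endpoint": "https://example.invalid/from-flag"}},
		{name: "runtime", values: map[string]string{}},
		{name: "azure.yaml", values: map[string]string{"protocol": "responses"}},
		{name: "azd env", values: map[string]string{"project-endpoint": "https://example.invalid/from-env"}},
	}
	for _, key := range []string{"project-endpoint", "protocol", "agent"} {
		if v, src, ok := resolve(key, layers); ok {
			fmt.Printf("%s = %s (from %s)\n", key, v, src)
		} else {
			fmt.Printf("%s is unresolved; the resolver works with what it has\n", key)
		}
	}
}
```

A missing layer is just an empty map here, which is what lets the resolver keep producing guidance when the azd environment is absent or stale.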
Output is human-readable text appended after the command's normal output, separated by a blank line.
What We Can Ship Today
We can ship better command choice without any dependencies -- the resolver tells you which command to run next based on actual project state, not a hardcoded string. Works for every sample, every protocol, today.
Exact invoke payload examples (showing the right JSON body for your specific agent) work today for agents that include an OpenAPI spec. The invocation protocol already supports serving a spec at GET /invocations/docs/openapi.json, and the extension already has fetchOpenAPISpec to fetch and cache it. This is opt-in -- the agent author must define the spec (see human-in-the-loop example). We include it by default in our samples so the getting-started happy path always has rich examples. When the spec isn't available, the resolver falls back to protocol-generic examples ("Hello!" for responses, '{"message": "Hello!"}' for invocations) plus a helpful hint to check /invocations directly to verify the right payload shape.
Success Measures
- No more dead-end exits. A developer who runs init should always see a runnable next command -- not just "deploy."
- Init-to-invoke is discoverable from CLI output alone. You shouldn't need docs to figure out how to get from init to a working invoke --local.
- Fewer misdirected commands. If your deps aren't provisioned, you hear azd provision, not azd ai agent run. If you're ready for Azure, you hear azd up.
- One-pass diagnosis. azd ai agent doctor surfaces permission issues, missing env vars, and unreachable endpoints in a single run, instead of making you discover them through trial and error across five different commands.
Decision Tree
The resolver fires after each command and evaluates the following. All azure.yaml field references use the post-unification schema from the Unified azure.yaml proposal.
State Inputs
| Input | Source | What it tells us |
| --- | --- | --- |
| HasProjectEndpoint | azd env: AZURE_AI_PROJECT_ENDPOINT is set and non-empty | Foundry project is provisioned |
| HasUnresolvedInfraVars | Walk azure.yaml service configs; collect ${...} references that map to known Bicep outputs (e.g., AZURE_AI_MODEL_DEPLOYMENT_NAME, connection IDs); check which are missing from azd env | Infra-provisioned dependencies haven't been deployed |
| HasUnresolvedManualVars | Remaining ${...} references not attributable to Bicep outputs (e.g., API keys, custom config); check which are missing from .env or azd env | User-supplied values need manual setup |
| HasToolboxes | azure.yaml has services with host: azure.ai.toolbox | Project uses toolboxes that need server-side provisioning |
| HasConnections | azure.yaml agent config.env references connection env vars (e.g., ${GITHUB_MCP_CONN}) that are unresolved | Connections not yet provisioned via Bicep |
| Protocol | azure.yaml agent config.protocols[0].protocol | Determines invoke payload shape |
| IsDeployed | azd env: agent deployment metadata or prior successful deploy | Agent has been pushed to Foundry |
| AgentStatus | Foundry API: agent show status | Runtime state of the deployed agent |
| HasOpenAPI | Probe running agent at GET /invocations/docs/openapi.json | Rich invoke examples available (opt-in, agent author defines spec) |
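To make HasUnresolvedInfraVars and HasUnresolvedManualVars concrete, here is a rough sketch of the split. It assumes the config.env values have already been read out of azure.yaml as plain strings and that the set of known Bicep output names is available as a map; how that set is actually derived is not specified in this proposal, and the function name unresolved is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// refPattern matches ${NAME} references inside a config value.
var refPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// unresolved collects ${...} references that have no value in env and splits
// them into infra-provisioned names (listed in bicepOutputs) and manual ones.
func unresolved(configValues []string, bicepOutputs map[string]bool, env map[string]string) (infra, manual []string) {
	seen := map[string]bool{}
	for _, v := range configValues {
		for _, m := range refPattern.FindAllStringSubmatch(v, -1) {
			name := m[1]
			if seen[name] || env[name] != "" {
				continue // already reported, or already populated
			}
			seen[name] = true
			if bicepOutputs[name] {
				infra = append(infra, name)
			} else {
				manual = append(manual, name)
			}
		}
	}
	sort.Strings(infra)
	sort.Strings(manual)
	return infra, manual
}

func main() {
	values := []string{"${AZURE_AI_MODEL_DEPLOYMENT_NAME}", "${GITHUB_MCP_CONN}", "Bearer ${MY_API_KEY}"}
	bicep := map[string]bool{"AZURE_AI_MODEL_DEPLOYMENT_NAME": true, "GITHUB_MCP_CONN": true}
	env := map[string]string{"GITHUB_MCP_CONN": "conn-123"}

	infra, manual := unresolved(values, bicep, env)
	fmt.Println("HasUnresolvedInfraVars:", len(infra) > 0, infra)
	fmt.Println("HasUnresolvedManualVars:", len(manual) > 0, manual)
}
```

Returning the names rather than bare booleans lets the same pass feed both the decision tree and the "some values need to be set" listing.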
After init
Today, init checks AZURE_AI_PROJECT_ID and branches between azd deploy and azd up. We expand this to cover local dev, distinguish provision from deploy, and inspect deeper state:
IF HasUnresolvedInfraVars:
-> "azd provision" (create Foundry project and/or dependencies for local development)
ELSE IF HasUnresolvedManualVars:
-> "azd env set <KEY> <value>" (user-supplied config needed in the azd environment)
ELSE:
-> "azd ai agent run" (everything is set up, start locally)
Example output (everything ready, project set up):
Next: azd ai agent run -- start the agent locally
azd ai agent invoke --local "Hello!" -- test it in another terminal
When ready to deploy to Azure, run azd deploy.
Example output (dependencies not yet created):
Next: azd provision -- set up your Foundry project, models, and connections
This creates your Foundry project and any model deployments, toolboxes,
or connections your agent needs for local development.
Once that finishes, run 'azd ai agent run' to start locally.
When ready to deploy to Azure, run azd deploy.
Example output (project ready, but manual config values missing):
Your project is ready, but some values need to be set:
- MY_API_KEY (not set)
Set them in your azd environment:
azd env set MY_API_KEY <your-value>
Then run 'azd ai agent run' to start locally.
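The init branch is small enough to sketch directly. The types below are trimmed-down, hypothetical stand-ins for ProjectState and Suggestion that carry only what this branch reads; the order of the cases mirrors the tree, so unresolved infra vars win over unresolved manual vars.

```go
package main

import "fmt"

// ProjectState holds only the inputs the init branch needs (illustrative subset).
type ProjectState struct {
	UnresolvedInfraVars  []string
	UnresolvedManualVars []string
}

// Suggestion pairs a command with a one-line description.
type Suggestion struct {
	Command     string
	Description string
}

// nextAfterInit mirrors the IF / ELSE IF / ELSE tree for init.
func nextAfterInit(s ProjectState) Suggestion {
	switch {
	case len(s.UnresolvedInfraVars) > 0:
		return Suggestion{"azd provision", "set up your Foundry project, models, and connections"}
	case len(s.UnresolvedManualVars) > 0:
		// Only the first missing key is shown here; the real output lists all of them.
		return Suggestion{"azd env set " + s.UnresolvedManualVars[0] + " <your-value>", "user-supplied config needed in the azd environment"}
	default:
		return Suggestion{"azd ai agent run", "start the agent locally"}
	}
}

func main() {
	states := []ProjectState{
		{UnresolvedInfraVars: []string{"AZURE_AI_MODEL_DEPLOYMENT_NAME"}},
		{UnresolvedManualVars: []string{"MY_API_KEY"}},
		{},
	}
	for _, s := range states {
		n := nextAfterInit(s)
		fmt.Printf("Next: %s  -- %s\n", n.Command, n.Description)
	}
}
```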
After run (on successful startup)
run is a long-lived foreground command. Today it prints a static hint: After startup, in another terminal, try: azd ai agent invoke --local "Hello!" (run.go#L157-L158). We make this protocol-aware:
IF Protocol == "invocations":
-> invoke --local '{"message": "Hello!"}'
ELSE IF Protocol == "responses":
-> invoke --local "Hello!"
ELSE:
-> invoke --local "Hello!" (generic)
+ HasOpenAPI -> replace generic payload with OpenAPI-derived example
Example output (responses protocol):
Agent is running at http://localhost:8088
Next: azd ai agent invoke --local "Hello!" -- test it in another terminal
Press Ctrl+C to stop.
Example output (invocations protocol, with OpenAPI):
Agent is running at http://localhost:8088
Next: azd ai agent invoke --local '{"message": "What hotels are available in Seattle?"}'
Press Ctrl+C to stop.
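A minimal sketch of the protocol-aware hint, assuming the OpenAPI-derived example (if any) arrives as a JSON string. The function name invokeHint is hypothetical. The single-quote wrapping is deliberately naive: it would need proper shell escaping for payloads that contain a single quote.

```go
package main

import "fmt"

// invokeHint builds the suggested invoke command. openAPIExample is the payload
// taken from the agent's OpenAPI spec, or "" when no spec was served.
func invokeHint(protocol, openAPIExample string) string {
	payload := `"Hello!"` // responses and unknown protocols get a bare string
	if protocol == "invocations" {
		payload = `'{"message": "Hello!"}'`
	}
	if openAPIExample != "" {
		payload = "'" + openAPIExample + "'" // spec-derived example replaces the generic payload
	}
	return "azd ai agent invoke --local " + payload
}

func main() {
	fmt.Println(invokeHint("responses", ""))
	fmt.Println(invokeHint("invocations", ""))
	fmt.Println(invokeHint("invocations", `{"message": "What hotels are available in Seattle?"}`))
}
```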
After invoke --local
Today, invoke's failure path already has one suggestion: when the HTTP connection fails, it prints could not connect to localhost:<port> -- is the agent running? Start it with: azd ai agent run (invoke.go#L303, invoke.go#L507). We add success-path guidance:
IF success:
-> "azd deploy" (push the agent to Azure)
+ "azd ai agent monitor --follow" (view logs after deploying)
To get to a successful invoke --local, the developer must have already provisioned (dependencies exist) and run the agent locally. So the natural next step is azd deploy -- infrastructure is already there, just push the agent.
Example output (single agent):
Next: azd deploy -- deploy the agent to Azure
azd ai agent monitor --follow -- view logs after deploying
Example output (multi-agent project):
Next: azd deploy -- deploy all agents to Azure
After deploying:
azd ai agent invoke my-agent "Hello!" -- test my-agent remotely
azd ai agent invoke other-agent "Hello!" -- test other-agent remotely
After invoke (remote)
IF success:
-> "azd ai agent show <agent>" (check agent status and details)
+ "azd ai agent monitor --follow" (view live logs)
IF failure:
-> "azd ai agent monitor --follow" (check logs for errors)
After show
Today, show has error-path guidance when the agent name or version can't be resolved: Run 'azd deploy' first to deploy the agent, or check your azd environment values (show.go#L70-L83). We add success-path guidance:
IF status == "active" OR status == "idle":
-> "azd ai agent invoke <agent> 'Hello!'" (send a test message)
ELSE IF status == "failed" OR status == "":
-> "azd ai agent monitor --follow" (check logs)
ELSE:
-> "azd ai agent show <agent>" (agent is in a transitional state, check again shortly)
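In Go, the show branch reads naturally as a switch in which every status lands in exactly one arm. This is a sketch only: nextAfterShow is a hypothetical name, and "creating" in the example stands in for whatever transitional statuses Foundry actually reports.

```go
package main

import "fmt"

// nextAfterShow maps an agent status to the single next command to suggest.
func nextAfterShow(agent, status string) string {
	switch status {
	case "active", "idle":
		return "azd ai agent invoke " + agent + ` "Hello!"`
	case "failed", "":
		return "azd ai agent monitor --follow"
	default:
		// Transitional state: ask the developer to check again shortly.
		return "azd ai agent show " + agent
	}
}

func main() {
	for _, status := range []string{"active", "failed", "creating"} {
		fmt.Printf("%-8s -> %s\n", status, nextAfterShow("echo-agent", status))
	}
}
```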
After deploy (via post-deploy hook)
Deployment goes through core azd up / azd deploy. The extension's deploy hooks (service_target_agent.go) run during deployment but don't print next-step guidance when they finish. We add agent-specific suggestions to the post-deploy hook's exit path:
IF success:
-> "azd ai agent show <agent>" (verify the agent is running)
+ "azd ai agent invoke <agent> 'Hello!'" (test the deployment)
IF failure:
-> "azd ai agent monitor --follow" (check deployment logs)
Example output (success, single agent):
Next: azd ai agent show echo-agent -- verify it's running
azd ai agent invoke echo-agent "Hello!" -- test the deployment
Example output (success, multi-agent project):
Next: azd ai agent show my-agent -- verify my-agent is running
azd ai agent show other-agent -- verify other-agent is running
azd ai agent invoke my-agent "Hello!" -- test my-agent
azd ai agent invoke other-agent "Hello!" -- test other-agent
azd ai agent doctor -- On-Demand Health Check
Why This Matters
Two specific gaps drive this:
- You lost context. You cleared the terminal, closed it, switched to another task. The CLI told you what to do next, but that output is gone. There's no command to ask "where was I?" -- doctor fills that role by re-evaluating project state and telling you the next step.
- You hit an error and don't know how to fix it. A command failed, but the error only tells you what went wrong in that command's scope. It doesn't check the broader picture -- are your RBAC roles correct? Is the Foundry project reachable? Are env vars stale? Is the model deployment actually there? doctor runs comprehensive checks across the full project and gives you the specific commands to fix what's broken.
What doctor Does
doctor runs a fixed set of checks in order, top to bottom. Each check either passes (with a summary) or fails (with a specific remediation command). The checks are designed to cover the full surface area that individual commands can't see on their own.
Check 1: azure.yaml validity.
Verify the file exists, parses correctly, and declares at least one azure.ai.agent service.
- Pass: azure.yaml: valid (2 services: echo-agent, agent-toolbox)
- Fail: azure.yaml: not found or azure.yaml: parse error at line 14
- Fix: azd ai agent init (if missing) or manual edit (if malformed)
Check 2: Authentication.
Run the equivalent of az account show to verify the developer is signed in and has an active token.
- Pass: Authentication: signed in as user@contoso.com
- Fail: Authentication: not signed in
- Fix: az login
Check 3: Foundry project reachability.
Check that AZURE_AI_PROJECT_ENDPOINT is set, then probe the endpoint to confirm the project exists and responds.
- Pass: Foundry project: reachable (endpoint: https://...)
- Fail (not set): Foundry project: not provisioned
- Fail (unreachable): Foundry project: endpoint set but not reachable (HTTP 403 or timeout)
- Fix: azd provision (if not set) or check network/firewall (if unreachable)
Check 4: Model deployments.
For each ${...} reference in config.env that maps to a model deployment, verify the env var is set and (if the project is reachable) that the deployment actually exists in the Foundry project.
- Pass: Model deployment: gpt-4o (deployed)
- Fail: Model deployment: AZURE_AI_MODEL_DEPLOYMENT_NAME not set
- Fix: azd provision (if defined in Bicep) or manual creation in the Foundry portal
Check 5: Toolboxes.
For each azure.ai.toolbox service in azure.yaml, verify it has been provisioned server-side.
- Pass: Toolbox 'agent-toolbox': provisioned (3 tools)
- Fail: Toolbox 'agent-toolbox': not provisioned
- Fix: azd provision
Check 6: Connections.
For each connection env var referenced in toolbox or agent configs, verify it's set and (if reachable) that the connection exists in the Foundry project.
- Pass: Connection GITHUB_MCP_CONN: set
- Fail: Connection GITHUB_MCP_CONN: not set
- Fix: azd provision or check Bicep outputs
Check 7: RBAC permissions.
Query the Foundry project's role assignments to verify the current identity has the required roles. This is the check that individual commands almost never do, and it's one of the most common sources of confusing errors.
- Pass: Permissions: sufficient (Contributor + Cognitive Services User)
- Fail: Permissions: missing 'Cognitive Services User' role on the Foundry project
- Fix: az role assignment create --assignee user@contoso.com --role "Cognitive Services User" --scope <resource-id>
- Alt: ask your admin to grant access.
Check 8: Agent status.
If the agent has been deployed (deployment metadata exists in azd env), query Foundry for its current status.
- Pass: Agent 'echo-agent': active (v3)
- Fail: Agent 'echo-agent': failed
- Fix: azd ai agent monitor --follow (check logs)
Check 9: Manual env vars.
Any ${...} references in azure.yaml that aren't attributable to Bicep outputs -- verify they're set in .env or azd env.
- Pass: MY_API_KEY: set
- Fail: MY_API_KEY: not set
- Fix: azd env set MY_API_KEY <your-value> (or add it to .env)
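The "fixed list, top to bottom, pass or fail with a fix" shape of the checks above could look roughly like this. Everything here is a hypothetical sketch: Check, Result, and runChecks are illustrative names, and the two stubbed checks return canned results where real ones would call Azure.

```go
package main

import "fmt"

// Result is the outcome of one check. Fix is empty when the check passes.
type Result struct {
	Summary string
	OK      bool
	Fix     string
}

// Check is a named probe; Run does the actual work.
type Check struct {
	Name string
	Run  func() Result
}

// runChecks runs every check in order, without short-circuiting,
// and returns all results plus the number of failures.
func runChecks(checks []Check) ([]Result, int) {
	results := make([]Result, 0, len(checks))
	failed := 0
	for _, c := range checks {
		r := c.Run()
		if !r.OK {
			failed++
		}
		results = append(results, r)
	}
	return results, failed
}

func main() {
	checks := []Check{
		{Name: "azure.yaml", Run: func() Result {
			return Result{Summary: "azure.yaml: valid (1 service: echo-agent)", OK: true}
		}},
		{Name: "Foundry project", Run: func() Result {
			return Result{Summary: "Foundry project: not provisioned", Fix: "azd provision"}
		}},
	}

	results, failed := runChecks(checks)
	for _, r := range results {
		fmt.Println(r.Summary)
	}
	if failed == 0 {
		fmt.Println("All checks passed.")
		return
	}
	fmt.Printf("%d issue(s) found:\n", failed)
	for _, r := range results {
		if !r.OK {
			fmt.Println("  Run:", r.Fix)
		}
	}
}
```

Running every check unconditionally is the point: a later check (say, RBAC) can still report "unknown" rather than being skipped because an earlier one failed.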
Example Output (all healthy)
azd ai agent doctor
azure.yaml: valid (2 services: echo-agent, agent-toolbox)
Authentication: signed in as user@contoso.com
Foundry project: reachable
Model deployment: gpt-4o (deployed)
Toolbox: agent-toolbox (ready, 3 tools)
Permissions: sufficient
Agent: echo-agent (active, v3)
All checks passed.
Next: azd ai agent invoke echo-agent "Hello!" -- test the deployed agent
Example Output (issues found)
azd ai agent doctor
azure.yaml: valid (2 services: echo-agent, agent-toolbox)
Authentication: signed in as user@contoso.com
Foundry project: not created
Model deployment: not set
Toolbox: not created
Permissions: unknown (no project yet)
Agent: not deployed
2 issues found:
1. Foundry project hasn't been created yet.
Run: azd provision
2. Model deployment AZURE_AI_MODEL_DEPLOYMENT_NAME is not set.
This will be created by 'azd provision' if defined in your Bicep templates.
Next: azd provision -- set up your Foundry project, models, and connections
Example Output (permission issue)
azd ai agent doctor
azure.yaml: valid (1 service: echo-agent)
Authentication: signed in as user@contoso.com
Foundry project: reachable
Model deployment: gpt-4o (deployed)
Permissions: missing 'Cognitive Services User' role
Agent: echo-agent (failed)
1 issue found:
1. Current identity is missing the 'Cognitive Services User' role
on the Foundry project resource.
Run: az role assignment create --assignee user@contoso.com \
--role "Cognitive Services User" \
--scope /subscriptions/.../resourceGroups/.../providers/...
Or ask your admin to grant access.
Next: azd ai agent monitor --follow -- check agent logs for details
How doctor and Next-Step Hints Relate
They cover different situations:
- Next-step hints fire automatically after each command succeeds. They answer "what should I do now?" If you're in the flow and following the CLI's guidance, you may never need doctor.
- doctor is for when you've lost that thread -- cleared the terminal, hit an error you can't diagnose, or are picking up a project cold. It re-evaluates everything from scratch and gives you both the status and the fix.
Under the hood, both share the same ProjectState assembly logic. doctor runs every check unconditionally; the resolver runs only the subset relevant to whatever command just finished.
Suggestion Output Format
Every next-step suggestion follows the same format:
Next: <primary command>  -- <description>
      <secondary command>  -- <description>
Ground rules:
- One primary command -- the single most useful next action.
- At most one secondary -- an alternate or follow-up.
- Each line is <command>  -- <description> (two spaces before --).
- Multi-agent projects repeat per agent on separate lines. No selection menus.
- The Next: label is always present so suggestions stand out visually from command output.
- README/docs pointers go on a separate line below the commands when relevant.
OpenAPI Discovery for Invoke Payloads
The decision tree handles "which command next" well, but the invoke payload problem -- "what do I actually send to this agent?" -- needs richer input than azure.yaml can provide.
The Gap
Every sample has a different payload shape:
- Responses protocol: {"input": "Hello!"}
- Invocations protocol: {"message": "Hello!"} (or custom shapes)
- Custom protocols: anything goes
The CLI can infer protocol from azure.yaml, but the actual payload schema -- especially for invocations-protocol agents with custom request bodies -- lives in the agent's code, not config.
What Already Exists
The invocation protocol already defines a well-known OpenAPI endpoint, and the extension already has code to consume it:
- Protocol convention: Agents can serve an OpenAPI 3.0 spec at GET /invocations/docs/openapi.json. This is part of the invocation protocol -- not a new proposal.
- Extension support: fetchOpenAPISpec in helpers.go already probes the well-known path, fetches the spec, and caches it to disk (openapi-{agent}-{local|remote}.json in the azd env directory). Failures are non-fatal and silently ignored.
- SDK support: The Python SDK (azure-ai-agentserver-invocations) supports this via the openapi_spec parameter on InvocationAgentServerHost. See the human-in-the-loop sample for a working example -- the agent defines an OPENAPI_SPEC dict describing its endpoints and request schemas, then passes it to InvocationAgentServerHost(openapi_spec=OPENAPI_SPEC).
Opt-In by the Agent Author
This is not automatic. The agent developer must define and include the OpenAPI spec for it to be served. The SDK does not generate it from handler types today. This means:
- Samples we own: We include OpenAPI specs by default in our samples, so the getting-started happy path always produces rich invoke examples. No extra work for developers using our templates.
- Bring-your-own agents: Developers who write their own agents can opt in by defining an openapi_spec and passing it to the server host. If they don't, the CLI falls back gracefully.
CLI Behavior
After azd ai agent run starts the agent, the CLI probes GET /invocations/docs/openapi.json. Two paths:
- Spec available: Parse it, grab the invoke endpoint and an example payload (from example or schema fields), and build the exact invoke command shown to the developer.
- Spec unavailable (404, timeout, etc.): Fall back to protocol-generic suggestions ("Hello!" for responses, '{"message": "Hello!"}' for invocations) plus a helpful hint:
Tip: curl http://localhost:8088/invocations/docs/openapi.json
to check if your agent exposes an API spec with the exact payload format.
This nudges developers toward verifying the right payload shape for their agent's /invocations endpoint without requiring the spec to exist.
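For the "spec available" path, pulling an example payload out of the spec might look like the sketch below. It is not the extension's parser: exampleFromSpec is a hypothetical name, it only looks at the application/json request body of POST /invocations, and it checks the media type's example first and the schema's example second. The plural examples map and generating a payload from a bare schema are left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// operation captures just the request-body fields this sketch reads.
type operation struct {
	RequestBody struct {
		Content map[string]struct {
			Example json.RawMessage `json:"example"`
			Schema  struct {
				Example json.RawMessage `json:"example"`
			} `json:"schema"`
		} `json:"content"`
	} `json:"requestBody"`
}

// exampleFromSpec returns a compact JSON example for POST /invocations,
// or "" when the spec is unparsable or carries no example.
func exampleFromSpec(spec []byte) string {
	var doc struct {
		Paths map[string]map[string]json.RawMessage `json:"paths"`
	}
	if err := json.Unmarshal(spec, &doc); err != nil {
		return ""
	}
	var op operation
	if err := json.Unmarshal(doc.Paths["/invocations"]["post"], &op); err != nil {
		return "" // path or method missing, or not an object
	}
	media := op.RequestBody.Content["application/json"]
	raw := media.Example
	if len(raw) == 0 {
		raw = media.Schema.Example
	}
	if len(raw) == 0 {
		return ""
	}
	var buf bytes.Buffer
	if err := json.Compact(&buf, raw); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	spec := []byte(`{"openapi":"3.0.3","paths":{"/invocations":{"post":{"requestBody":{"content":{"application/json":{"example":{"message":"What hotels are available in Seattle?"}}}}}}}}`)
	if ex := exampleFromSpec(spec); ex != "" {
		fmt.Printf("azd ai agent invoke --local '%s'\n", ex)
	} else {
		fmt.Println(`azd ai agent invoke --local '{"message": "Hello!"}'`)
	}
}
```

Every failure mode returns the empty string, so the caller drops straight into the protocol-generic fallback without special-casing parse errors.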
No cross-team dependencies remain. The protocol convention exists, the extension code exists, and samples can include specs today.
Example: OpenAPI-Enhanced Output After run
Without OpenAPI (generic):
Agent is running at http://localhost:8088
azd ai agent invoke --local "Hello!"
With OpenAPI (sample-specific):
Agent is running at http://localhost:8088
azd ai agent invoke --local '{"message": "What hotels are available in Seattle?"}'
Available endpoints:
POST /invocations -- send a message to the agent
GET /health -- check agent health
Implementation Sketch
Command Coverage Matrix
| Command | Trigger point | Primary suggestion | State inputs used |
| --- | --- | --- | --- |
| init | Command exit | run or azd provision | HasUnresolvedInfraVars, HasUnresolvedManualVars |
| run | Agent starts listening | invoke --local | Protocol, HasOpenAPI |
| invoke --local | Command exit | azd deploy | -- |
| invoke (remote) | Command exit | show or monitor --follow | -- |
| show | Command exit | invoke or monitor --follow | AgentStatus |
| deploy (post-deploy hook) | Hook exit | show or invoke | -- |
| doctor | Command exit | Varies based on check results | All state inputs |
Next-Step Resolver (core)
We already have two patterns to generalize: the AZURE_AI_PROJECT_ID check in init.go (state-aware branching) and exterrors factories (structured error + suggestion). The resolver unifies both into a single function called at command completion points (command exit for most, startup-success for run):
func resolveNextSteps(cmd string, exitCode int, project ProjectState) []Suggestion
Where ProjectState is assembled by reading:
- azure.yaml (parsed once, cached)
- azd environment variables
- (optionally) a running agent's OpenAPI endpoint
Each Suggestion has:
- command: the CLI command to run (e.g., azd ai agent run)
- description: one-line explanation (e.g., start the agent locally)
- priority: ordering when multiple suggestions apply
The resolver outputs a formatted block:
func printNextSteps(suggestions []Suggestion) {
// "Next:" label on the first line; further suggestions aligned beneath it
}
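One way the stub could be filled in, following the Suggestion Output Format rules: sort by priority, put the Next: label on the first line, and keep two spaces before --. This is a sketch under assumptions. The Priority field being "lower sorts first", the extra io.Writer parameter, and the formatNextSteps helper are all choices made for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// Suggestion is one next-step line. Lower Priority sorts first (assumed).
type Suggestion struct {
	Command     string
	Description string
	Priority    int
}

// formatNextSteps renders a "Next:" block with descriptions aligned in a column.
func formatNextSteps(suggestions []Suggestion) string {
	if len(suggestions) == 0 {
		return ""
	}
	sorted := append([]Suggestion(nil), suggestions...) // don't reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })

	width := 0
	for _, s := range sorted {
		if len(s.Command) > width {
			width = len(s.Command)
		}
	}
	var b strings.Builder
	for i, s := range sorted {
		prefix := "      " // same width as "Next: "
		if i == 0 {
			prefix = "Next: "
		}
		fmt.Fprintf(&b, "%s%-*s  -- %s\n", prefix, width, s.Command, s.Description)
	}
	return b.String()
}

// printNextSteps writes the block after a separating blank line.
func printNextSteps(w io.Writer, suggestions []Suggestion) {
	if out := formatNextSteps(suggestions); out != "" {
		fmt.Fprint(w, "\n"+out)
	}
}

func main() {
	printNextSteps(os.Stdout, []Suggestion{
		{Command: "azd ai agent monitor --follow", Description: "view logs after deploying", Priority: 2},
		{Command: "azd deploy", Description: "deploy the agent to Azure", Priority: 1},
	})
}
```

Splitting formatting from printing keeps the layout testable without capturing terminal output.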
Where It Hooks In
The resolver fires at command completion -- PostRun for short-lived commands, the "listening" callback for run. No new CLI flags needed; it's always-on. If output is piped or we're in CI, guidance goes to stderr so it doesn't mess with stdout parsing.
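The stdout-versus-stderr choice can be approximated with the standard library alone. This sketch makes two assumptions that the proposal does not spell out: "piped" is detected by stdout not being a character device, and "in CI" means a non-empty CI environment variable, which is a common convention rather than anything azd-specific. The names useStderr and isTerminal are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// useStderr decides whether guidance should avoid stdout.
func useStderr(stdoutIsTerminal bool, ci string) bool {
	return !stdoutIsTerminal || ci != ""
}

// isTerminal reports whether f is a character device. This is a rough,
// stdlib-only stand-in for a real TTY check (for example, /dev/null also
// counts as a character device).
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

func main() {
	var w io.Writer = os.Stdout
	if useStderr(isTerminal(os.Stdout), os.Getenv("CI")) {
		w = os.Stderr
	}
	fmt.Fprintln(w, "\nNext: azd ai agent run  -- start the agent locally")
}
```

Piping the program's stdout into another command should leave the pipe empty while the hint still shows up on the terminal via stderr.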
Downstream Impact
- azure.ai.agents extension: New resolveNextSteps function for success-path guidance, plus doctor. Both read azure.yaml and azd env -- already available in the command context.
- Agent Server SDKs: No changes needed. The invocation protocol already supports GET /invocations/docs/openapi.json and the Python SDK already accepts openapi_spec on InvocationAgentServerHost.
- Samples: Add openapi_spec definitions to our invocations-protocol samples so the getting-started happy path always produces rich invoke examples. Responses-protocol samples need no changes.
- Foundry Toolkit for VS Code: No impact. This is CLI output only.
Alternatives Considered
AZD Hooks for Sample-Specific Guidance
Instead of the CLI reading state and inferring suggestions, each sample ships hook scripts (postinit.sh, postrun.sh) that echo guidance to the terminal.
What's good about this:
- Hooks are first-class in azd. No new mechanism.
- Sample authors control the messaging.
Why I don't think it works:
- Spawning a shell to echo 3 lines is heavy, and now you need platform-specific scripts (sh vs ps1).
- Hooks fire on every run with no opt-out. Re-running a command re-fires the guidance.
- Hooks can't inspect deep project state (unresolved env vars, sibling service status) without reimplementing the resolver in every sample's hook script. That's duplication, not reuse.
- The state-aware decision tree has to live in the CLI anyway (for non-sample-specific stuff). Hooks would create two parallel guidance systems.
README Markers
Standardized HTML-comment markers in each sample's README (e.g., <!-- section:azd:after-init -->) that the CLI parses and displays.
What's good about this:
- Declarative, no script execution.
- Sample authors write guidance once; README readers and the CLI both benefit.
Why I don't think it works:
- Parsing HTML comments from markdown is fragile and introduces a convention that doesn't exist anywhere else in azd.
- Markers are static -- they can't adapt to actual project state (provisioned? deployed? env vars resolved?).
- OpenAPI at /invocations/docs/openapi.json already handles the invoke payload problem, which is the main gap markers would fill.
- For non-payload guidance (e.g., "run azd up first"), the built-in decision tree handles it without needing README content.
Static Hints in azure.yaml
A config.samples block in azure.yaml where sample authors declare example payloads:
config:
samples:
invoke: '{"message": "What hotels are in Seattle?"}'
What's good about this:
- Lives in a file the CLI already reads. No new parsing.
- Declarative and simple.
Why I don't think it works:
- Duplicates info that already exists in agent code (and in the OpenAPI spec at /invocations/docs/openapi.json when the agent includes one).
- Complex payloads make YAML quoting painful.
- Doesn't scale to agents with multiple endpoints or evolving request schemas.
- The config section is intended to match the API.
- OpenAPI at /invocations/docs/openapi.json is strictly more capable and already available.
Scope Boundaries
In the CLI (next-step resolver): State-aware command suggestions on success paths, driven by azure.yaml structure, azd env state, and exit status. Deterministic logic that every project benefits from.
In the CLI (doctor): Comprehensive project health checks -- auth, permissions, reachability, env var resolution, agent status. One pass, all the answers.
In the Agent Server SDKs (no changes needed): The invocation protocol already supports serving OpenAPI at /invocations/docs/openapi.json. The Python SDK already supports openapi_spec on InvocationAgentServerHost. Agent authors opt in by defining and passing a spec -- we include it by default in our samples.
Already covered: Error-path guidance. exterrors handles this across the extension. We don't touch it here.
Out of scope: CI/CD pipeline guidance. Multi-agent orchestration. Error message formatting (that's exterrors territory and a future Thread 10 audit).
After every successful
azd ai agentcommand, the CLI should inspect the project's real state --azure.yamlservices, azd env vars, running endpoints -- and guide the developer exactly what to do next. We also need a newazd ai agent doctorcommand for on-demand health checks.Right now,
azd ai agent inittells you to deploy. That's it. No mention of local dev, no prereq checks, no adaptation to the sample you picked. A developer who just scaffolded a responses-protocol agent has no idea they can run it locally. A developer with unresolved toolbox deps gets told torunwhen they shouldazd provisionfirst. We fix this by making every success path exit with state-aware, actionable guidance, and by giving developersdoctoras a dedicated diagnostic.Scope:
init,run,invoke(local and remote),show, anddeploy(via post-deploy hooks). Error-path guidance is out of scope --exterrorsalready handles that well and doesn't need rework.Assumes the Unified azure.yaml proposal has landed --
azure.yamlis the sole config file,agent.yamlandagent.manifest.yamlare gone.Current Problems
Init pushes you straight to deploy. After
azd ai agent init, the CLI checksAZURE_AI_PROJECT_IDand suggests eitherazd deploy <service>orazd up(init.go#L1718-L1733, init_from_code.go#L136-L140). Both are deploy paths. Local dev (run->invoke --local) never gets mentioned. For prototypers, local iteration should be the default -- deployment is "when you're ready," not the only option.Run's invoke hint is static and often wrong. After the agent starts,
runprintsAfter startup, in another terminal, try: azd ai agent invoke --local "Hello!"(run.go#L157-L158). Same string every time, regardless of protocol. An invocations-protocol agent needs'{"message": "Hello!"}', not a bare string. The hint works for responses-protocol agents and is misleading for everything else.Success paths are silent. Error paths have solid suggestions via the
exterrorspackage (Validation,Dependency,Auth,Configuration,Compatibility-- each with aSuggestionfield). But when a command succeeds? Nothing. After a successfulinvoke --local,show, or remoteinvoke, the developer gets output and a blank terminal. No "here's what to try next."Init conflates provision and deploy. Some samples need
azd provisionbeforerun-- to create model deployments, toolboxes, or connections via Bicep -- but they don't need a fullazd up(which also deploys the agent). Today init suggestsazd uporazd deploywithout distinguishing these, so developers deploy prematurely when all they wanted was to set up deps for local dev.No way to recover context or diagnose issues in one shot. Two concrete gaps: (a) You lost the CLI's suggestion -- cleared the terminal, closed it, switched contexts -- and now you don't remember what to do next. There's no command to ask "where was I?" (b) You hit an error from a command and don't know how to fix it. The error message might say what went wrong, but it doesn't check the broader context -- are your RBAC roles correct? Is the Foundry project reachable? Are your env vars stale? A
doctorcommand could run comprehensive checks and give you the specific commands to fix what's broken.Solution Hypothesis
Every
azd ai agentcommand's success path runs a next-step resolver that:run, probes the agent's OpenAPI endpoint (/invocations/docs/openapi.json) to discover protocol-specific invoke payloads and build exact example commands. The extension already has this code -- the spec is optional and depends on the agent author including it.State Resolution
The resolver pulls project state from several sources, in priority order. Higher layers win, and the resolver works with whatever's available:
1. CLI flags (--project-endpoint, --agent, etc.) -- always authoritative.
2. azure.yaml -- services, protocols, uses dependencies, config.env references.
3. azd environment (.azure/<env>/.env) -- which ${...} variables are actually populated.
This layering matters because we're moving toward environment-optional -- data-plane commands work with just --project-endpoint, and brownfield developers may have set up Foundry manually. The resolver should give useful guidance with partial state, not go silent when the azd environment is missing or stale.
Output is human-readable text appended after the command's normal output, separated by a blank line.
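The layering can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: resolveEndpoint and unresolvedRefs are hypothetical names, the flag value is assumed to arrive as a plain string, and config.env and the azd environment are modeled as simple string maps.

```go
package main

import (
	"regexp"
	"sort"
)

// envRef matches ${NAME} references such as ${AZURE_AI_MODEL_DEPLOYMENT_NAME}.
var envRef = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// resolveEndpoint (hypothetical helper) applies the layering: an explicit
// --project-endpoint value wins, then the azd environment. ok is false when
// neither layer has a value, so callers can still give partial guidance.
func resolveEndpoint(flagValue string, azdEnv map[string]string) (endpoint string, ok bool) {
	if flagValue != "" {
		return flagValue, true
	}
	if v := azdEnv["AZURE_AI_PROJECT_ENDPOINT"]; v != "" {
		return v, true
	}
	return "", false
}

// unresolvedRefs (hypothetical helper) returns the ${...} names used in
// config.env values that are missing or empty in the azd environment.
// The result is sorted so the printed guidance is stable between runs.
func unresolvedRefs(configEnv, azdEnv map[string]string) []string {
	missing := map[string]bool{}
	for _, value := range configEnv {
		for _, m := range envRef.FindAllStringSubmatch(value, -1) {
			if azdEnv[m[1]] == "" {
				missing[m[1]] = true
			}
		}
	}
	names := make([]string, 0, len(missing))
	for name := range missing {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

Both helpers accept a nil azd environment, which matches the goal of giving guidance when the environment is missing or stale.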
What We Can Ship Today
We can ship better command choice without any dependencies -- the resolver tells you which command to run next based on actual project state, not a hardcoded string. Works for every sample, every protocol, today.
Exact invoke payload examples (showing the right JSON body for your specific agent) work today for agents that include an OpenAPI spec. The invocation protocol already supports serving a spec at GET /invocations/docs/openapi.json, and the extension already has fetchOpenAPISpec to fetch and cache it. This is opt-in -- the agent author must define the spec (see human-in-the-loop example). We include it by default in our samples so the getting-started happy path always has rich examples. When the spec isn't available, the resolver falls back to protocol-generic examples ("Hello!" for responses, '{"message": "Hello!"}' for invocations) plus a helpful hint to check /invocations directly to verify the right payload shape.
Success Measures
- Developers who just ran init should always see a runnable next command -- not just "deploy."
- [...] init to a working invoke --local.
- If dependencies are missing, you hear azd provision, not azd ai agent run. If you're ready for Azure, you hear azd up.
- azd ai agent doctor surfaces permission issues, missing env vars, and unreachable endpoints in a single run, instead of making you discover them through trial and error across five different commands.
Decision Tree
The resolver fires after each command and evaluates the following. All azure.yaml field references use the post-unification schema from the Unified azure.yaml proposal.
State Inputs
- HasProjectEndpoint: AZURE_AI_PROJECT_ENDPOINT is set and non-empty
- HasUnresolvedInfraVars: from azure.yaml service configs, collect ${...} references that map to known Bicep outputs (e.g., AZURE_AI_MODEL_DEPLOYMENT_NAME, connection IDs); check which are missing from azd env
- HasUnresolvedManualVars: ${...} references not attributable to Bicep outputs (e.g., API keys, custom config); check which are missing from .env or azd env
- HasToolboxes: azure.yaml has services with host: azure.ai.toolbox
- HasConnections: azure.yaml agent config.env references connection env vars (e.g., ${GITHUB_MCP_CONN}) that are unresolved
- Protocol: azure.yaml agent config.protocols[0].protocol
- IsDeployed: [...]
- AgentStatus: [...]
- HasOpenAPI: GET /invocations/docs/openapi.json
After init
Today, init checks AZURE_AI_PROJECT_ID and branches between azd deploy and azd up. We expand this to cover local dev, distinguish provision from deploy, and inspect deeper state:
Example output (everything ready, project set up):
Example output (dependencies not yet created):
Example output (project ready, but manual config values missing):
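The three init scenarios above can be expressed as a small Go function. This is a hedged sketch: ProjectState, Suggestion, and nextAfterInit are illustrative names, the branch order is an assumption based on the scenarios and success measures in this document, and the description strings are made up for the example. It uses azd env set for manually supplied values, which is one way to populate the azd environment.

```go
package main

import "fmt"

// ProjectState is a hypothetical subset of the state inputs listed earlier.
type ProjectState struct {
	HasProjectEndpoint   bool
	UnresolvedInfraVars  []string // ${...} refs that Bicep outputs would fill
	UnresolvedManualVars []string // ${...} refs the developer must supply
}

// Suggestion pairs a command with a one-line description.
type Suggestion struct {
	Command     string
	Description string
}

// nextAfterInit picks the next step after a successful init.
// Assumed order: missing infrastructure first, then manual values,
// then local run with deployment as the "when you're ready" option.
func nextAfterInit(s ProjectState) []Suggestion {
	switch {
	case len(s.UnresolvedInfraVars) > 0:
		return []Suggestion{{
			Command:     "azd provision",
			Description: "create the dependencies local dev needs",
		}}
	case len(s.UnresolvedManualVars) > 0:
		out := make([]Suggestion, 0, len(s.UnresolvedManualVars))
		for _, name := range s.UnresolvedManualVars {
			out = append(out, Suggestion{
				Command:     fmt.Sprintf("azd env set %s <value>", name),
				Description: "set a value that provisioning does not produce",
			})
		}
		return out
	}
	deploy := Suggestion{Command: "azd up", Description: "provision and deploy when you're ready"}
	if s.HasProjectEndpoint {
		// The project already exists, so only the agent needs pushing.
		deploy = Suggestion{Command: "azd deploy", Description: "deploy when you're ready"}
	}
	return []Suggestion{
		{Command: "azd ai agent run", Description: "start the agent locally"},
		deploy,
	}
}
```

Keeping the function pure (state in, suggestions out) makes each branch easy to unit test without touching Azure.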
After run (on successful startup)
run is a long-lived foreground command. Today it prints a static hint: After startup, in another terminal, try: azd ai agent invoke --local "Hello!" (run.go#L157-L158). We make this protocol-aware:
Example output (responses protocol):
Example output (invocations protocol, with OpenAPI):
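A minimal Go sketch of the protocol-aware hint, assuming the protocol arrives as the string "responses" or "invocations" and that an example payload discovered from the agent's OpenAPI spec, if any, is passed in as raw JSON. invokeHint is a hypothetical name; the generic payloads are the ones this document already uses.

```go
package main

import "fmt"

// invokeHint builds the local invoke example for the given protocol.
// A spec-derived example payload, when present, takes precedence over
// the protocol-generic default.
func invokeHint(protocol, specExample string) string {
	payload := `"Hello!"` // generic default for responses-protocol agents
	if protocol == "invocations" {
		payload = `'{"message": "Hello!"}'`
	}
	if specExample != "" {
		// Assumes the example contains no single quotes; a real
		// implementation would need proper shell quoting.
		payload = "'" + specExample + "'"
	}
	return fmt.Sprintf("azd ai agent invoke --local %s", payload)
}
```

With this shape, the static string printed today becomes the responses-protocol case of a more general function.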
After invoke --local
Today, invoke's failure path already has one suggestion: when the HTTP connection fails, it prints could not connect to localhost:<port> -- is the agent running? Start it with: azd ai agent run (invoke.go#L303, invoke.go#L507). We add success-path guidance:
To get to a successful invoke --local, the developer must have already provisioned (dependencies exist) and run the agent locally. So the natural next step is azd deploy -- infrastructure is already there, just push the agent.
Example output (single agent):
Example output (multi-agent project):
After invoke (remote)
After show
Today, show has error-path guidance when the agent name or version can't be resolved: Run 'azd deploy' first to deploy the agent, or check your azd environment values (show.go#L70-L83). We add success-path guidance:
After deploy (via post-deploy hook)
Deployment goes through core azd up / azd deploy. The extension's deploy hooks (service_target_agent.go) run during deployment but don't print next-step guidance when they finish. We add agent-specific suggestions to the post-deploy hook's exit path:
Example output (success, single agent):
Example output (success, multi-agent project):
azd ai agent doctor -- On-Demand Health Check
Why This Matters
Two specific gaps drive this:
- You lost context. You cleared the terminal, closed it, switched to another task. The CLI told you what to do next, but that output is gone. There's no command to ask "where was I?" -- doctor fills that role by re-evaluating project state and telling you the next step.
- You hit an error and don't know how to fix it. A command failed, but the error only tells you what went wrong in that command's scope. It doesn't check the broader picture -- are your RBAC roles correct? Is the Foundry project reachable? Are env vars stale? Is the model deployment actually there? doctor runs comprehensive checks across the full project and gives you the specific commands to fix what's broken.
What doctor Does
doctor runs a fixed set of checks in order, top to bottom. Each check either passes (with a summary) or fails (with a specific remediation command). The checks are designed to cover the full surface area that individual commands can't see on their own.
Check 1: azure.yaml validity.
Verify the file exists, parses correctly, and declares at least one azure.ai.agent service.
- Pass: azure.yaml: valid (2 services: echo-agent, agent-toolbox)
- Fail: azure.yaml: not found or azure.yaml: parse error at line 14
- Fix: azd ai agent init (if missing) or manual edit (if malformed)
Check 2: Authentication.
Run the equivalent of az account show to verify the developer is signed in and has an active token.
- Pass: Authentication: signed in as user@contoso.com
- Fail: Authentication: not signed in
- Fix: az login
Check 3: Foundry project reachability.
Check that AZURE_AI_PROJECT_ENDPOINT is set, then probe the endpoint to confirm the project exists and responds.
- Pass: Foundry project: reachable (endpoint: https://...)
- Fail: Foundry project: not provisioned
- Fail: Foundry project: endpoint set but not reachable (HTTP 403 or timeout)
- Fix: azd provision (if not set) or check network/firewall (if unreachable)
Check 4: Model deployments.
For each ${...} reference in config.env that maps to a model deployment, verify the env var is set and (if the project is reachable) that the deployment actually exists in the Foundry project.
- Pass: Model deployment: gpt-4o (deployed)
- Fail: Model deployment: AZURE_AI_MODEL_DEPLOYMENT_NAME not set
- Fix: azd provision (if defined in Bicep) or manual creation in the Foundry portal
Check 5: Toolboxes.
For each azure.ai.toolbox service in azure.yaml, verify it has been provisioned server-side.
- Pass: Toolbox 'agent-toolbox': provisioned (3 tools)
- Fail: Toolbox 'agent-toolbox': not provisioned
- Fix: azd provision
Check 6: Connections.
For each connection env var referenced in toolbox or agent configs, verify it's set and (if reachable) that the connection exists in the Foundry project.
- Pass: Connection GITHUB_MCP_CONN: set
- Fail: Connection GITHUB_MCP_CONN: not set
- Fix: azd provision or check Bicep outputs
Check 7: RBAC permissions.
Query the Foundry project's role assignments to verify the current identity has the required roles. This is the check that individual commands almost never do, and it's one of the most common sources of confusing errors.
- Pass: Permissions: sufficient (Contributor + Cognitive Services User)
- Fail: Permissions: missing 'Cognitive Services User' role on the Foundry project
- Fix: az role assignment create --assignee user@contoso.com --role "Cognitive Services User" --scope <resource-id>. Or ask your admin to grant access.
Check 8: Agent status.
If the agent has been deployed (deployment metadata exists in azd env), query Foundry for its current status.
- Pass: Agent 'echo-agent': active (v3)
- Fail: Agent 'echo-agent': failed
- Fix: azd ai agent monitor --follow (check logs)
Check 9: Manual env vars.
Any ${...} references in azure.yaml that aren't attributable to Bicep outputs -- verify they're set in .env or azd env.
- Pass: MY_API_KEY: set
- Fail: MY_API_KEY: not set
- Fix: Add to .env
Example Output (all healthy)
Example Output (issues found)
Example Output (permission issue)
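The "fixed set of checks, each passing with a summary or failing with a remediation" model could be structured along these lines in Go. This is a sketch under assumptions: CheckResult, Check, and runDoctor are hypothetical names, and the report layout ([ok]/[FAIL] markers with an indented fix line) is illustrative rather than the proposal's actual output format.

```go
package main

import (
	"fmt"
	"strings"
)

// CheckResult is the outcome of one doctor check.
type CheckResult struct {
	Passed  bool
	Summary string // e.g. "Authentication: signed in as user@contoso.com"
	Fix     string // remediation command or instruction; empty when Passed
}

// Check is one probe; real checks would read azure.yaml, azd env, or Foundry.
type Check func() CheckResult

// runDoctor runs every check in order, unconditionally, and returns the
// rendered report plus the number of failed checks.
func runDoctor(checks []Check) (string, int) {
	var report strings.Builder
	failures := 0
	for _, check := range checks {
		result := check()
		marker := "[ok]"
		if !result.Passed {
			marker = "[FAIL]"
			failures++
		}
		fmt.Fprintf(&report, "%s %s\n", marker, result.Summary)
		if !result.Passed && result.Fix != "" {
			fmt.Fprintf(&report, "  fix: %s\n", result.Fix)
		}
	}
	return report.String(), failures
}
```

Returning the failure count lets the command choose a non-zero exit code, which would be useful if doctor is ever run from scripts.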
How doctor and Next-Step Hints Relate
They cover different situations:
- Next-step hints: [...] doctor.
- doctor is for when you've lost that thread -- cleared the terminal, hit an error you can't diagnose, or are picking up a project cold. It re-evaluates everything from scratch and gives you both the status and the fix.
Under the hood, both share the same ProjectState assembly logic. doctor runs every check unconditionally; the resolver runs only the subset relevant to whatever command just finished.
Suggestion Output Format
Every next-step suggestion follows the same format:
Ground rules:
- Each suggestion line is <command>  -- <description> (two spaces before --).
- The Next: label is always present so suggestions stand out visually from command output.
OpenAPI Discovery for Invoke Payloads
The decision tree handles "which command next" well, but the invoke payload problem -- "what do I actually send to this agent?" -- needs richer input than azure.yaml can provide.
The Gap
Every sample has a different payload shape:
- Responses: {"input": "Hello!"}
- Invocations: {"message": "Hello!"} (or custom shapes)
The CLI can infer protocol from azure.yaml, but the actual payload schema -- especially for invocations-protocol agents with custom request bodies -- lives in the agent's code, not config.
What Already Exists
The invocation protocol already defines a well-known OpenAPI endpoint, and the extension already has code to consume it:
- Protocol convention: Agents can serve an OpenAPI 3.0 spec at GET /invocations/docs/openapi.json. This is part of the invocation protocol -- not a new proposal.
- Extension support: fetchOpenAPISpec in helpers.go already probes the well-known path, fetches the spec, and caches it to disk (openapi-{agent}-{local|remote}.json in the azd env directory). Failures are non-fatal and silently ignored.
- SDK support: The Python SDK (azure-ai-agentserver-invocations) supports this via the openapi_spec parameter on InvocationAgentServerHost. See the human-in-the-loop sample for a working example -- the agent defines an OPENAPI_SPEC dict describing its endpoints and request schemas, then passes it to InvocationAgentServerHost(openapi_spec=OPENAPI_SPEC).
Opt-In by the Agent Author
This is not automatic. The agent developer must define and include the OpenAPI spec for it to be served. The SDK does not generate it from handler types today. This means:
- Agent authors opt in by defining openapi_spec and passing it to the server host. If they don't, the CLI falls back gracefully.
CLI Behavior
After azd ai agent run starts the agent, the CLI probes GET /invocations/docs/openapi.json. Two paths:
- Spec available: Parse it, grab the invoke endpoint and an example payload (from example or schema fields), and build the exact invoke command shown to the developer.
- Spec unavailable (404, timeout, etc.): Fall back to protocol-generic suggestions ("Hello!" for responses, '{"message": "Hello!"}' for invocations) plus a helpful hint:
This nudges developers toward verifying the right payload shape for their agent's /invocations endpoint without requiring the spec to exist.
No cross-team dependencies remain. The protocol convention exists, the extension code exists, and samples can include specs today.
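The "spec available" path can be sketched in Go with the standard library alone. This is an illustration, not the extension's fetchOpenAPISpec code: examplePayload is a hypothetical helper, it assumes the invoke operation is a POST with an application/json request body, and it only looks at the media type's example field and the schema-level example field. When it reports false, the caller would use the protocol-generic fallback.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// mediaType mirrors the parts of an OpenAPI 3.0 Media Type Object used here.
type mediaType struct {
	Example json.RawMessage `json:"example"`
	Schema  struct {
		Example json.RawMessage `json:"example"`
	} `json:"schema"`
}

type pathItem struct {
	Post struct {
		RequestBody struct {
			Content map[string]mediaType `json:"content"`
		} `json:"requestBody"`
	} `json:"post"`
}

type openAPISpec struct {
	Paths map[string]pathItem `json:"paths"`
}

// examplePayload returns a compact JSON example for POST <path>, if the spec
// declares one. ok is false for unparsable specs, unknown paths, or specs
// without an example, so callers can fall back to a generic payload.
func examplePayload(spec []byte, path string) (payload string, ok bool) {
	var doc openAPISpec
	if err := json.Unmarshal(spec, &doc); err != nil {
		return "", false
	}
	item, found := doc.Paths[path]
	if !found {
		return "", false
	}
	media, found := item.Post.RequestBody.Content["application/json"]
	if !found {
		return "", false
	}
	for _, candidate := range []json.RawMessage{media.Example, media.Schema.Example} {
		if len(candidate) == 0 || string(candidate) == "null" {
			continue
		}
		var buf bytes.Buffer
		if err := json.Compact(&buf, candidate); err != nil {
			continue
		}
		return buf.String(), true
	}
	return "", false
}
```

Treating every failure mode as "no example" keeps spec discovery non-fatal, in line with how the existing fetch logic is described.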
Example: OpenAPI-Enhanced Output After run
Without OpenAPI (generic):
With OpenAPI (sample-specific):
Implementation Sketch
Command Coverage Matrix
- init -> run or azd provision
- run -> invoke --local
- invoke --local -> azd deploy
- invoke (remote) -> show or monitor --follow
- show -> invoke or monitor --follow
- deploy (post-deploy hook) -> show or invoke
- doctor -> [...]
Next-Step Resolver (core)
We already have two patterns to generalize: the AZURE_AI_PROJECT_ID check in init.go (state-aware branching) and exterrors factories (structured error + suggestion). The resolver unifies both into a single function called at command completion points (command exit for most, startup-success for run):
Where ProjectState is assembled by reading:
- azure.yaml (parsed once, cached)
Each Suggestion has:
- command: the CLI command to run (e.g., azd ai agent run)
- description: one-line explanation (e.g., start the agent locally)
- priority: ordering when multiple suggestions apply
The resolver outputs a formatted block:
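A Go sketch of how the three Suggestion fields and the format rules (a Next: label, then <command>  -- <description> with two spaces before --) might fit together. The struct and formatNextSteps are hypothetical, and two details are assumptions: lower priority values print first, and each suggestion line is indented by two spaces.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Suggestion mirrors the three fields described above.
type Suggestion struct {
	Command     string
	Description string
	Priority    int // assumption: lower values are shown first
}

// formatNextSteps renders the block appended after a command's normal
// output. It starts with a blank line so the block stands apart, and
// returns "" when there is nothing to suggest.
func formatNextSteps(suggestions []Suggestion) string {
	if len(suggestions) == 0 {
		return ""
	}
	// Sort a copy so the caller's slice keeps its order.
	ordered := append([]Suggestion(nil), suggestions...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})
	var block strings.Builder
	block.WriteString("\nNext:\n")
	for _, s := range ordered {
		fmt.Fprintf(&block, "  %s  -- %s\n", s.Command, s.Description)
	}
	return block.String()
}
```

A stable sort keeps suggestions with equal priority in the order the resolver produced them.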
Where It Hooks In
The resolver fires at command completion -- PostRun for short-lived commands, the "listening" callback for run. No new CLI flags needed; it's always-on. If output is piped or we're in CI, guidance goes to stderr so it doesn't mess with stdout parsing.
Downstream Impact
- azure.ai.agents extension: New resolveNextSteps function for success-path guidance, plus doctor. Both read azure.yaml and azd env -- already available in the command context.
- Agent Server SDKs: No changes needed. The invocation protocol already supports serving OpenAPI at GET /invocations/docs/openapi.json and the Python SDK already accepts openapi_spec on InvocationAgentServerHost.
- Samples: Add openapi_spec definitions to our invocations-protocol samples so the getting-started happy path always produces rich invoke examples. Responses-protocol samples need no changes.
Alternatives Considered
AZD Hooks for Sample-Specific Guidance
Instead of the CLI reading state and inferring suggestions, each sample ships hook scripts (postinit.sh, postrun.sh) that echo guidance to the terminal.
What's good about this:
Why I don't think it works:
README Markers
Standardized HTML-comment markers in each sample's README (e.g., <!-- section:azd:after-init -->) that the CLI parses and displays.
What's good about this:
Why I don't think it works:
- /invocations/docs/openapi.json already handles the invoke payload problem, which is the main gap markers would fill.
Static Hints in azure.yaml
A config.samples block in azure.yaml where sample authors declare example payloads:
What's good about this:
Why I don't think it works:
- [...] /invocations/docs/openapi.json when the agent includes one).
- /invocations/docs/openapi.json is strictly more capable and already available.
Scope Boundaries
In the CLI (next-step resolver): State-aware command suggestions on success paths, driven by azure.yaml structure, azd env state, and exit status. Deterministic logic that every project benefits from.
In the CLI (doctor): Comprehensive project health checks -- auth, permissions, reachability, env var resolution, agent status. One pass, all the answers.
In the Agent Server SDKs (no changes needed): The invocation protocol already supports serving OpenAPI at /invocations/docs/openapi.json. The Python SDK already supports openapi_spec on InvocationAgentServerHost. Agent authors opt in by defining and passing a spec -- we include it by default in our samples.
Already covered: Error-path guidance. exterrors handles this across the extension. We don't touch it here.
Out of scope: CI/CD pipeline guidance. Multi-agent orchestration. Error message formatting (that's exterrors territory and a future Thread 10 audit).