feat(routines): add azd ai routine commands#8241
Conversation
Add the full v1 routine command subtree to the azure.ai.routines extension as specified in the design spec (PR #8200). Commands implemented: - routine create, update, show, list, delete - routine enable, disable (dedicated idempotent action routes) - routine dispatch (calls dispatch_async, --async flag for client-side wait) - routine run list (auto-paging, --top, --filter) New packages: - internal/exterrors/ -- structured error codes and helpers - internal/pkg/routines/ -- data-plane HTTP client and models - internal/cmd/endpoint.go -- 5-level project endpoint resolver Wire format: trigger/action as Record with 'default' key. All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header. Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.
…dels - Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation - endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests - routine_create_test.go: buildTrigger and buildAction table tests - routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction - models_test.go: TriggerCLIToWire and ActionCLIToWire completeness - Add yaml struct tags to models.go for YAML manifest support
- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents) - isolateFromAzdDaemon now also clears AZD_SERVER env var - Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint, buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction, TriggerCLIToWire/ActionCLIToWire map checks)
- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list - endpoint.go: remove unused projectEndpointPathPrefix constant - routine_create.go: wrap long buildAction() call (line >125 chars) - routine_update.go: wrap long --file flag help text - routine_manifest_test.go: expand inline map literals to multi-line - client.go: wrap ListRoutineRuns signature to fit 125-char limit - Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)
golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
…pagination - Extract getPage helper so resp.Body.Close runs per iteration in ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs). - Preserve the filter query param across pages in ListRoutineRuns; previously page 2+ only carried pageToken and dropped the filter. - Correct dispatch command help text: the service always runs routines asynchronously, so the old 'waits and streams' wording was wrong.
b1c629f to
8e0d723
Compare
…ine' The extension namespace 'ai.routine' already mounts the extension under 'azd ai routine'. Adding a 'routine' subcommand group on top of that produced the redundant 'azd ai routine routine <cmd>' path. Move --project-endpoint persistent flag and all subcommands directly onto rootCmd so the correct usage is 'azd ai routine <cmd>'.
📋 Prioritization NoteThanks for the contribution! The linked issue isn't in the current milestone yet. |
There was a problem hiding this comment.
Pull request overview
Adds the initial azd ai routine command surface for the new azure.ai.routines extension, including a Foundry Routines data-plane client, models, endpoint resolution, and CLI commands for routine CRUD/dispatch/run history.
Changes:
- Introduces
internal/pkg/routines(models + HTTP client) andinternal/exterrorsstructured error helpers. - Implements routine commands (
create/update/show/list/delete/enable/disable/dispatch, plusrun list) with table/JSON output. - Adds a 5-level project endpoint resolver with unit tests, plus additional unit tests for manifest parsing and flag→wire mapping.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models.go | Routine/trigger/action/run models + CLI→wire mappings/constants. |
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models_test.go | Unit tests for CLI→wire mappings and default keys. |
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/client.go | Data-plane HTTP client for routines operations (CRUD, enable/disable, dispatch, run listing). |
| cli/azd/extensions/azure.ai.routines/internal/exterrors/errors.go | Structured LocalError/ServiceError helpers and wrappers. |
| cli/azd/extensions/azure.ai.routines/internal/exterrors/codes.go | Error/operation code constants used across commands. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/root.go | Extension root command wiring (registers routine subcommands + persistent endpoint flag). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint.go | Project endpoint validation + 5-level endpoint resolution cascade (flag/env/config/envvar/error). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint_test.go | Unit tests for endpoint validation and cascade precedence. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine.go | (Currently unused) routine subcommand group definition. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go | routine create implementation (flags/manifest support, trigger/action building). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create_test.go | Unit tests for create flag→wire trigger/action builders. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_update.go | routine update implementation (GET-then-PUT merge semantics, type-switch guard). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_show.go | routine show implementation (table/JSON). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_list.go | routine list implementation (auto-pagination, table/JSON). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_delete.go | routine delete implementation (confirmation prompt, no-prompt/force guard). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_enable.go | routine enable implementation. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_disable.go | routine disable implementation. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_dispatch.go | routine dispatch implementation (dispatch_async call + output handling). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_run.go | routine run list implementation (run history listing with top/filter). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_helpers.go | Shared helpers (client creation, JSON printing, tabwriter formatting). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest.go | Manifest parsing and create/update merge helpers. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest_test.go | Unit tests for manifest parsing/merge/update-flag application helpers. |
| cli/azd/extensions/azure.ai.routines/go.mod | Extension module dependencies updated. |
| cli/azd/extensions/azure.ai.routines/go.sum | Dependency checksum updates. |
| cli/azd/extensions/azure.ai.routines/cspell.yaml | Adds extension-local cspell words. |
The first cut of the Routines client used header/route/field shapes that did not match the TypeSpec being merged in azure-rest-api-specs#42779. Realign the extension with the spec so requests round-trip cleanly: * Preview header renamed from 'x-ms-foundry-features-opt-in' to 'Foundry-Features' (the value 'Routines=V1Preview' was already correct). * Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync'; the action segment is case-sensitive per spec. * Dropped the non-existent ':enable' / ':disable' action routes; enable/disable now read the routine and PUT it back with 'enabled' flipped (idempotent: no-op if already at the target value). * DispatchRoutineRequest wraps a discriminated 'payload' object whose 'type' must match the routine's action type; --conversation-id was removed from dispatch (the spec does not expose it). * Routine.Action is now a single discriminated object (not an 'actions' map keyed by name). * RoutineAction.AgentName -> AgentID; the CLI flag is renamed to --agent-id accordingly. * RoutineTrigger.Cron -> CronExpression to match the TypeSpec field. * PagedRoutine pagination follows the absolute 'nextLink' URL from Azure.Core.Page<Routine> instead of re-deriving a continuation query. * RoutineRun gains the additional fields documented in the spec (phase, trigger_type, attempt_source, action_type, triggered_at, dispatch_id, action_correlation_id, response_id, error_type, error_message); 'run list' now prints phase alongside status. * EventRoutineTrigger fields aligned to the spec: connection_id, owner, repository, actions[]; removed 'assignee'. * DispatchRoutineResponse drops the unused 'status' field. Tests, mock manifests, and the E2E driver were updated to the new contract (--agent-id, agent_id, cron_expression, single 'action'). Note: the live Foundry Routines preview endpoint still returns HTTP 500 on /routines?api-version=v1 even with the correct request shape; that is an upstream service bug tracked separately.
wbreza
left a comment
There was a problem hiding this comment.
Code Review (automated, supplements Copilot pre-review)
TL;DR
First v1 of the azd ai routine command surface in a new extension. Implements create/update/show/list/delete/enable/disable/dispatch plus run list, a 5-level endpoint resolver, structured exterrors, and a Foundry data-plane HTTP client. Code is well-organized and the design is sound. However: the HTTP client has zero tests, enable/disable are GET-then-PUT with no concurrency guard, and the PR description contradicts the actual implementation in two places (enable route, preview header name). Findings below are net-new — they do not duplicate Copilot's 6 pre-review comments (port stripping, --enabled masking manifest, github-issue trigger accepted, boolStr nil→"true", applyUpdateFlags using fmt.Errorf, unused newRoutineCommand), all of which look valid.
🔴 Critical
C1 — PR description contradicts the implementation in two places (internal/pkg/routines/client.go)
| PR description claims | Code actually does |
|---|---|
"routine enable/disable — Calls dedicated :enable route (idempotent)" |
setEnabled does GetRoutine → mutate Enabled → PutRoutine. The code comment itself states: "The Foundry Routines API does not expose a dedicated :enable route; the client mutates the routine resource directly." |
"Every request includes x-ms-foundry-features-opt-in: Routines=V1Preview" |
Header constant is routinesPreviewHeader = "Foundry-Features", so the actual header sent is Foundry-Features: Routines=V1Preview. |
Two consequences:
- Reviewers, docs writers, and future maintainers will form an incorrect mental model from the description.
- If the Foundry service actually expects the
x-ms-foundry-features-opt-inheader name, the preview opt-in is silently broken and the server is ignoring it. Please verify the server-side header name against the Foundry TypeSpec and fix either the code or the description.
🟠 High
H1 — setEnabled has a lost-update race and no If-Match guard (client.go, EnableRoutine/DisableRoutine/setEnabled)
Because enable/disable is GET-then-PUT against the full routine resource, any concurrent writer (another CLI invocation, another user, a workspace tool) that mutates the routine between the GET and the PUT will be silently clobbered. There is no If-Match/ETag conditional request to detect the race.
Fix options: (a) capture ETag from the GET response and send If-Match on the PUT, returning a structured "conflict" error on 412; (b) document this race in user-facing help text and emit a warning when the GET observes Enabled != desired; (c) work with the Foundry team to add a true :enable/:disable endpoint (preferred long-term, matches the PR description).
H2 — HTTP client (internal/pkg/routines/client.go) has zero tests
The data-plane integration boundary — every CRUD/enable/disable/dispatch/list call — has no unit coverage. This means:
- Pagination loop termination (
nextLinkfollow inListRoutines,pageTokeninListRoutineRuns) is unverified. - Same-origin checks on
nextLinkare unverified. - Error envelope parsing for 401 / 403 / 404 / 409 / 5xx is unverified.
- Request body marshaling for trigger/action records is unverified end-to-end.
- The client-side
Topcap in run-listing is not exercised.
The PR description acknowledges tests are "tracked as follow-up work." Given this is the wire boundary of a Preview API, at minimum httptest.Server-based happy paths for each method plus pagination termination should ship in v1. The current command-level tests only exercise pure helpers (buildTrigger, buildAction, applyUpdateFlags) and don't reach the client.
H3 — routine update may report "no changes" when only --file is used (routine_update.go, routine_manifest.go applyUpdateFlags)
applyUpdateFlags only increments its changed counter when individual CLI flags are set; the --file merge path does not contribute to that counter. The no-op early-return checks changed == 0, so routine update <name> --file manifest.yaml (no other flags) likely takes the no-op branch even though the manifest legitimately changed fields. Either snapshot/diff the routine before vs after the merge, or count merge-introduced changes.
🟡 Medium
M1 — Credential ignores tenant for guest / multi-tenant users (internal/cmd/routine_helpers.go)
newRoutineClient constructs azidentity.NewAzureDeveloperCLICredential(&AzureDeveloperCLICredentialOptions{}) with no TenantID. Per .github/instructions/extensions.instructions.md, extensions must use the user-access tenant (Subscription.UserTenantId, not the resource tenant). For multi-tenant or guest users the default tenant from azd auth may not match the Foundry project's tenant, causing AADSTS errors. Plumb a tenant hint (e.g., from azd context or by extracting from the endpoint/project metadata) into the credential options.
M2 — Pagination has no max-page safety cap (client.go, ListRoutines / ListRoutineRuns)
Both methods loop while NextLink/PageToken is non-empty. A misbehaving or compromised service that returns a self-referential or cycling token will cause an unbounded loop that consumes CPU and memory (each appended page grows the result slice). Add a defensive page-count cap (e.g., 1000) and return a clear "pagination exceeded N pages" error.
M3 — Unbounded io.ReadAll on response bodies (client.go, JSON decode helper)
The decode helper reads the entire response body into memory without io.LimitReader. A misbehaving server returning a multi-GB body will OOM the process. Wrap with a sane cap (e.g., io.LimitReader(body, 32<<20)) before decoding.
M4 — Token scope https://ai.azure.com/.default should be verified (client.go, NewClient)
The Bearer token scope is hardcoded. If the Foundry Routines data plane requires a project-specific or different audience (services.ai.azure.com, a regional audience, or a Foundry-specific resource ID), token acquisition will succeed but the server may reject with 401/403. Verify against the Foundry TypeSpec and add a code comment justifying the chosen scope.
M5 — assert.Error instead of require.Error in negative-path tests (routine_create_test.go)
Several "expect error" tests use assert.Error followed by further dereferences of the result. If the implementation regresses and returns no error, the test logs a failure but continues executing on a nil/zero value, potentially panicking or producing a confusing secondary failure that hides the real cause. Switch to require.Error.
🟢 Low
- L1 —
routine dispatch --asynchelp text ("Suppress descriptive output; useful for scripting") is ambiguous. Clarify that it prints only the dispatch ID and exits without polling. - L2 — Manifest file-format detection falls back to JSON for empty/unknown extensions, and the error message recommends only JSON, leaving YAML users guessing. Mention both formats in the error.
- L3 —
routine runsubcommand help doesn't mention thatrun show/run deleteare intentionally deferred. Add a one-line note inLong. - L4 —
printJSON/ table writers write directly toos.Stdout, making future output-assertion tests impossible without monkey-patching. Accept anio.Writerparameter.
Notes (not findings)
- This PR lives in an extension with its own
go.mod, so core-azdpatterns (IoC container,ActionDescriptor,ErrorWithSuggestion) do not apply — the extension uses cobra directly. new(expr)is valid in Go 1.26+ (seecli/azd/AGENTS.md).- Adding
DisallowUnknownFieldsto the JSON decoder is not recommended — it would break forward-compat with non-breaking Foundry API additions.
Generated review · supplements Copilot inline comments · posted as non-blocking COMMENT.
- endpoint: reject project endpoints with an explicit port so the normalized URL cannot silently strip a non-default port - routine create: only set Enabled from --enabled when the user explicitly passes the flag, so a manifest's enabled value is honored; default to enabled=true if neither source provides one - routine create: explicitly reject --trigger github-issue (deferred for v1) instead of producing an incomplete github_issue trigger - routine_helpers: boolStr now returns "unknown" for a nil pointer to avoid displaying "true" when the field is absent from the service response - routine_manifest: surface applyUpdateFlags user-input errors as exterrors.Validation (CodeInvalidParameter) for consistent CLI error shapes
wbreza
left a comment
There was a problem hiding this comment.
Re-review: commit 1ff54876 (fix(routines): address PR review feedback)
Scope: one new commit since my previous review — 6 files, +82/-9.
✅ Copilot's pre-review findings — all addressed correctly
| # | Issue | Fix |
|---|---|---|
| 1 | endpoint.go silently strips explicit ports |
Explicit port now rejected with structured error + 2 new tests |
| 2 | --enabled default masks manifest value |
Uses cmd.Flags().Changed("enabled"); default applied only after manifest merge |
| 3 | --trigger github-issue accepted but incomplete |
Explicit rejection in buildTrigger with deferred-v1 message |
| 4 | boolStr returns "true" for nil pointer |
Now returns "unknown" |
| 5 | applyUpdateFlags uses plain fmt.Errorf |
All 6 sites converted to exterrors.Validation(CodeInvalidParameter, …) |
(Copilot's 6th finding — dead newRoutineCommand — was already resolved in earlier commit b7d375b8.)
❌ Still open from my prior review
These weren't in the scope of this commit, just noting so they don't get lost:
- C1 — PR description still claims a dedicated
:enableroute andx-ms-foundry-features-opt-inheader. The actual code does GET-then-PUT and sendsFoundry-Features. Please reconcile description vs implementation (and verify the preview header name with the Foundry team — if the server expectsx-ms-foundry-features-opt-in, opt-in is silently broken). - H1 — Lost-update race in
setEnabled(noIf-Match/ETag). - H2 —
internal/pkg/routines/client.gostill has zero tests. - H3 —
routine update --filemay falsely report "no changes" because file-merge doesn't increment thechangedcounter. - M1 —
AzureDeveloperCLICredentialconstructed with noTenantID; will break for guest / multi-tenant users.
No new blocking issues introduced by this commit. Nice clean turnaround on the Copilot items.
…ng extensions Other AI extensions (projects, agents, toolboxes, inspector) ship a .golangci.yaml lint config and an AGENTS.md contributor guide. Add both to azure.ai.routines so it follows the same convention, and register the project-specific �xterrors word in cspell.yaml.
The deployed Foundry Routines data plane applies a camelCase property
naming policy on the wire (e.g. `cronExpression`, `timeZone`,
`agentId`), even though the upstream TypeSpec / OpenAPI document still
emits snake_case. With snake_case JSON tags, `routine create` and
`update` always failed with errors like:
triggers['default'].cronExpression must be provided for schedule routines
exactly one of action.agentId or action.agentEndpointId must be provided
and routines read back from `show` / `list` would have empty
trigger/action fields because the camelCase wire payload did not
deserialize into snake_case-tagged Go fields.
Switch the JSON tags on `Routine`, `RoutineTrigger`, `RoutineAction`,
`RoutineRun`, `PagedRoutineRun`, and `DispatchRoutineResponse` to
camelCase so requests/responses round-trip cleanly against the deployed
service. YAML tags stay snake_case so user-facing `--file` manifests
keep the documented convention.
Verified against a live project endpoint: create/list/show now reach
the service correctly (residual `InternalServerError` from the
backend is unrelated and reproduces from raw curl with the same body).
Use azure-rest-api-specs PR #43186 (Foundry Routines TypeSpec) as the single source of truth for the routines extension, applying every spec change that does not break the currently deployed service, and documenting each deliberate divergence inline and in AGENTS.md. ## Spec alignment * `RoutineRun` and `DispatchRoutineResponse` gain the new `TaskID` field (wire `taskId`); the service already emits it. `dispatch` now prints `Task ID` after `Action Correlation ID`, and JSON output exposes the new field too. * `RoutineTrigger` is restructured to match the spec's `GitHubIssueOpenedRoutineTrigger` shape: dropped `Owner` / `Actions[]`, added `Assignee`. The github trigger is still deferred at the CLI surface, so this is safe. * Inline comments and a new AGENTS.md table call out each divergence the client deliberately keeps to stay compatible with the live service: camelCase wire naming (spec is snake_case), `agentId` field (spec renamed to `agent_name`), `:dispatchAsync` action segment (spec uses `:dispatch_async`), GET+PUT enable/disable fallback (spec adds dedicated routes which still 404), `value`/`nextLink` / `value`/`nextPageToken` paged shapes (spec uses `AgentsPagedResult<T>`), and `github_issue` wire value (spec renamed to `github_issue_opened`). ## CLI bug fix: HTTP/2 stream-reset hang The pipeline now uses a custom `http.Client` with an explicit `ResponseHeaderTimeout` (60s) and `TryTimeout` (30s), and azcore retries are capped at 1. When the Foundry service returns an HTTP/2 RST_STREAM (for example, the schedule-create InternalServerError), the CLI now surfaces a `context deadline exceeded` error within ~40 seconds instead of the previous ~6 minute hang. ## Verified end-to-end against a live Foundry project * timer create / show / list / update / disable / enable / dispatch (with `taskId` round-tripping) / run list / delete all succeed. * schedule create still fails (service-side ISE) but now in under a minute instead of six.
The Foundry data plane currently returns `InternalServerError` for any
`PUT /routines/{name}` request whose trigger is `schedule` (the wire
value behind the CLI's `--trigger recurring`). The CLI side is fully
implemented and verified correct via raw curl, so this is a service-side
issue, but it leaves the `recurring` trigger non-functional end-to-end.
Take the recurring trigger off the public CLI surface so users do not
hit the service hang:
* Drop the `--cron` flag from `routine create` and `routine update`.
* `--trigger recurring` is now rejected with the same "deferred"
shape as `--trigger github-issue`: a clear error pointing the user
at `--trigger timer` and explaining that recurring is gated on the
Foundry service.
* `--trigger` help text and validation messages list only `timer`.
The underlying wire model still carries `cron_expression` /
`time_zone` and the `schedule` discriminator so re-enabling the
trigger when the service is ready is just a CLI flag-wiring change.
Unit tests around buildTrigger and applyUpdateFlags are updated
accordingly.
5e76b31 to
0046e61
Compare
…pt in delete, and action-type flag validation
0046e61 to
9659512
Compare
wbreza
left a comment
There was a problem hiding this comment.
Re-review of PR #8241 — feat(routines): implement azd ai routine commands
Reviewed since wbreza's May 20 review. Skipping items already flagged by prior reviews (endpoint port stripping, body.Enabled flag override, buildTrigger deferred github-issue alias, boolStr nil-pointer returning "true", applyUpdateFlags plain fmt.Errorf).
🔴 High — Security / correctness
1. internal/pkg/routines/client.go — Missing Logging.AllowedHeaders on policy.ClientOptions
Sibling azure.ai.agents restricts logged headers to MsCorrelationIdHeader only. The routines client omits the Logging field entirely. Azure SDK default logging will include the Authorization header (bearer token) in pipeline debug logs.
Fix: mirror agents' Logging: policy.LogOptions{AllowedHeaders: []string{azsdk.MsCorrelationIdHeader}}.
2. internal/cmd/routine_dispatch.go — Wrong op code on inner GetRoutine error
When fetching the routine to validate --input, the error is wrapped with exterrors.OpGetRoutine instead of exterrors.OpDispatchRoutine. Telemetry / user-facing error shows get_routine.* when the user ran dispatch. Note the adjacent IsNotFound branch already uses OpDispatchRoutine — the inconsistency is the smell.
3. internal/exterrors/errors.go — IsCancellation misses gRPC codes.Canceled
Routines checks only errors.Is(err, context.Canceled). Agents also checks status.FromError(err).Code() == codes.Canceled. Cancellations bubbling up from gRPC azdext calls are misclassified as internal errors instead of user-driven.
Fix: port the agents implementation verbatim.
🟡 Medium — Test coverage & design
4. internal/pkg/routines/client.go (+390 LOC) — no unit tests
The data-plane client (CRUD, pagination, enable/disable GET+PUT fallback, dispatchAsync, list runs, custom HTTP/2 transport, same-origin pagination guard) has zero tests. PR description defers "unit tests as follow-up", but the manifest parser (low-risk) gets tests while the mission-critical client does not. Suggest at minimum httptest.NewServer-based tests for: 2xx happy paths, 404/409/5xx, nextLink pagination, validateSameOrigin rejection, context cancellation.
5. Command-layer coverage gaps
No tests for routine_update.go (complex flag→file→default merge), routine_delete, routine_enable, routine_disable, routine_dispatch, routine_list, routine_show, routine_helpers. Agents extension test ratio is ~1:1; this PR is ~1:2.2.
6. internal/pkg/routines/client.go — runtime.NewResponseError(resp) leaks raw response body into errors
Used in ~5 spots. Service error bodies pass straight through. CLI commands re-wrap with exterrors.ServiceFromAzure, but the underlying message text is preserved and may surface in logs/output.
Fix: drain body, extract structured fields, return a controlled message — or document that callers must always wrap and never log the raw error.
7. internal/cmd/routine_manifest.go — os.ReadFile errors all classified as CodeFileNotFound
Permission-denied, is-a-directory, etc. surface to the user as "routine manifest file not found" with misleading remediation.
Fix: gate with os.IsNotExist(err); fall back to a generic file-read error otherwise.
8. internal/exterrors/codes.go — No semantic "not found" code
404s become get_routine.404 rather than e.g. routine_not_found. Hurts telemetry classification and stable error-handling contracts.
Fix: add CodeRoutineNotFound and use it in the IsNotFound branches.
🟢 Low / nit
9. internal/cmd/endpoint_test.go — port edge cases under-tested
The recently-added port rejection covers :443 only. Add: non-default ports (:8443), IPv6+port (https://[::1]:443/...), whitespace-only input. The original bug evaded tests because they didn't exercise non-default ports.
10. internal/cmd/routine_update.go — _ = cmd.Flags().MarkHidden(...) (×2)
Silent error discard. Safe in practice, but if err := ...; err != nil { ... } or a brief comment is cleaner.
11. AGENTS.md — "If this extension grows enough… introduce internal/exterrors"
Phrased as a future option, but the package exists in this PR. Rewrite in present tense so future maintainers don't treat it as optional.
12. go.mod — armappservice/v2 v2.3.0 // indirect added
Indirect, likely a cascade from an azd-core dep bump. Worth a quick confirmation that this isn't pulling unwanted weight on the extension.
✅ Verified clean (no action)
- URL construction (
url.PathEscape/url.QueryEscape) - Pagination
validateSameOrigin - HTTP/2 timeout fix (explicit
ResponseHeaderTimeout,TLSHandshakeTimeout) is sound, not masking - Wire format alignment (camelCase tags,
agentId,:dispatchAsync, GET+PUT enable/disable fallback) matches deployed Foundry service per AGENTS.md - Bearer token scope
https://ai.azure.com/.default - Endpoint validation (HTTPS-only, port rejection,
.services.ai.azure.comallowlist) exterrorsduplication withagents/toolboxesis acknowledged in AGENTS.md as the established cross-extension pattern — not flagged
Summary
Implements
azd ai routine <verb>in theazure.ai.routinesextension. Scoped to the subset of the Foundry Routines TypeSpec (azure-rest-api-specs PR #43186) that the deployed Foundry data plane supports end-to-end today.A routine pairs one trigger (when) with one action (what) on a Foundry project — e.g. "at 8 AM UTC on Jan 2, invoke the
daily-reportagent".Closes #8159
Included
routine create <name>--trigger timeror--filemanifest;--forceupsertroutine update <name>routine show/list/deleteroutine enable/disableroutine dispatch <name>--input,--async; surfacesdispatchId+actionCorrelationId+taskIdroutine run list <name>--top,--filterTrigger:
timer(--at,--time-zone).Action:
agent-response(default;--agent-idor--agent-endpoint-id) oragent-invoke(--agent-endpoint-id).Deferred (re-add when the Foundry service is ready)
--trigger recurring/--cronschedulecreatebuildTriggercase--trigger github-issuebuildTriggercasePOST :enable/:disablePOST :dispatch_async(snake):dispatchAsyncworksDispatchRoutineAsyncagent_namefield renameagentIdAgentIDJSON taggithub_issue_openedvalueTriggerCLIToWireAgentsPagedResult<T>envelopevalue/nextLinkEach divergence is also documented inline in code and in
cli/azd/extensions/azure.ai.routines/AGENTS.md.Tests
go build ./...andgo test ./...pass. Manual E2E against a live Foundry project covers all included commands (timer create, JSON/YAML manifest, duplicate guard,--forceupsert, update, enable/disable, dispatch sync + async +taskId, run list pagination +--top, delete + 404 verification, and all client-side validators).