Skip to content

feat(routines): add azd ai routine commands#8241

Open
huimiu wants to merge 19 commits into
mainfrom
hui/add-routine-command
Open

feat(routines): add azd ai routine commands#8241
huimiu wants to merge 19 commits into
mainfrom
hui/add-routine-command

Conversation

@huimiu
Copy link
Copy Markdown
Member

@huimiu huimiu commented May 19, 2026

Summary

Implements azd ai routine <verb> in the azure.ai.routines extension. Scoped to the subset of the Foundry Routines TypeSpec (azure-rest-api-specs PR #43186) that the deployed Foundry data plane supports end-to-end today.

A routine pairs one trigger (when) with one action (what) on a Foundry project — e.g. "at 8 AM UTC on Jan 2, invoke the daily-report agent".

Closes #8159

Included

Command Notes
routine create <name> --trigger timer or --file manifest; --force upsert
routine update <name> GET-then-PUT merge; immutable trigger/action type-switch guard
routine show / list / delete Standard CRUD; clean 404 mapping
routine enable / disable Idempotent
routine dispatch <name> --input, --async; surfaces dispatchId + actionCorrelationId + taskId
routine run list <name> Auto-paged; --top, --filter

Trigger: timer (--at, --time-zone).
Action: agent-response (default; --agent-id or --agent-endpoint-id) or agent-invoke (--agent-endpoint-id).

Deferred (re-add when the Foundry service is ready)

Deferred Why What to flip when ready
--trigger recurring / --cron Service 500s on every schedule create Re-add flags + buildTrigger case
--trigger github-issue Workspace connection model not yet wired Add flags + buildTrigger case
POST :enable / :disable Service 404s Switch client from GET+PUT
POST :dispatch_async (snake) Service 404s; only :dispatchAsync works Flip action segment in DispatchRoutineAsync
agent_name field rename Service still expects agentId Rename AgentID JSON tag
github_issue_opened value Service rejects (400) Update TriggerCLIToWire
AgentsPagedResult<T> envelope Service returns value / nextLink Update paged models + pagination loop

Each divergence is also documented inline in code and in cli/azd/extensions/azure.ai.routines/AGENTS.md.

Tests

go build ./... and go test ./... pass. Manual E2E against a live Foundry project covers all included commands (timer create, JSON/YAML manifest, duplicate guard, --force upsert, update, enable/disable, dispatch sync + async + taskId, run list pagination + --top, delete + 404 verification, and all client-side validators).

Add the full v1 routine command subtree to the azure.ai.routines
extension as specified in the design spec (PR #8200).

Commands implemented:
- routine create, update, show, list, delete
- routine enable, disable (dedicated idempotent action routes)
- routine dispatch (calls dispatch_async, --async flag for client-side wait)
- routine run list (auto-paging, --top, --filter)

New packages:
- internal/exterrors/ -- structured error codes and helpers
- internal/pkg/routines/ -- data-plane HTTP client and models
- internal/cmd/endpoint.go -- 5-level project endpoint resolver

Wire format: trigger/action as Record with 'default' key.
All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header.

Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.
huimiu added 6 commits May 19, 2026 15:18
…dels

- Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation
- endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests
- routine_create_test.go: buildTrigger and buildAction table tests
- routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction
- models_test.go: TriggerCLIToWire and ActionCLIToWire completeness
- Add yaml struct tags to models.go for YAML manifest support
- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents)
- isolateFromAzdDaemon now also clears AZD_SERVER env var
- Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint,
  buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction,
  TriggerCLIToWire/ActionCLIToWire map checks)
- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list
- endpoint.go: remove unused projectEndpointPathPrefix constant
- routine_create.go: wrap long buildAction() call (line >125 chars)
- routine_update.go: wrap long --file flag help text
- routine_manifest_test.go: expand inline map literals to multi-line
- client.go: wrap ListRoutineRuns signature to fit 125-char limit
- Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)
golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
…pagination

- Extract getPage helper so resp.Body.Close runs per iteration in
  ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs).
- Preserve the filter query param across pages in ListRoutineRuns;
  previously page 2+ only carried pageToken and dropped the filter.
- Correct dispatch command help text: the service always runs routines
  asynchronously, so the old 'waits and streams' wording was wrong.
@huimiu huimiu force-pushed the hui/add-routine-command branch from b1c629f to 8e0d723 Compare May 19, 2026 09:00
…ine'

The extension namespace 'ai.routine' already mounts the extension under
'azd ai routine'. Adding a 'routine' subcommand group on top of that
produced the redundant 'azd ai routine routine <cmd>' path.

Move --project-endpoint persistent flag and all subcommands directly onto
rootCmd so the correct usage is 'azd ai routine <cmd>'.
@huimiu huimiu marked this pull request as ready for review May 19, 2026 09:46
Copilot AI review requested due to automatic review settings May 19, 2026 09:46
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the initial azd ai routine command surface for the new azure.ai.routines extension, including a Foundry Routines data-plane client, models, endpoint resolution, and CLI commands for routine CRUD/dispatch/run history.

Changes:

  • Introduces internal/pkg/routines (models + HTTP client) and internal/exterrors structured error helpers.
  • Implements routine commands (create/update/show/list/delete/enable/disable/dispatch, plus run list) with table/JSON output.
  • Adds a 5-level project endpoint resolver with unit tests, plus additional unit tests for manifest parsing and flag→wire mapping.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models.go Routine/trigger/action/run models + CLI→wire mappings/constants.
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models_test.go Unit tests for CLI→wire mappings and default keys.
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/client.go Data-plane HTTP client for routines operations (CRUD, enable/disable, dispatch, run listing).
cli/azd/extensions/azure.ai.routines/internal/exterrors/errors.go Structured LocalError/ServiceError helpers and wrappers.
cli/azd/extensions/azure.ai.routines/internal/exterrors/codes.go Error/operation code constants used across commands.
cli/azd/extensions/azure.ai.routines/internal/cmd/root.go Extension root command wiring (registers routine subcommands + persistent endpoint flag).
cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint.go Project endpoint validation + 5-level endpoint resolution cascade (flag/env/config/envvar/error).
cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint_test.go Unit tests for endpoint validation and cascade precedence.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine.go (Currently unused) routine subcommand group definition.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go routine create implementation (flags/manifest support, trigger/action building).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create_test.go Unit tests for create flag→wire trigger/action builders.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_update.go routine update implementation (GET-then-PUT merge semantics, type-switch guard).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_show.go routine show implementation (table/JSON).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_list.go routine list implementation (auto-pagination, table/JSON).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_delete.go routine delete implementation (confirmation prompt, no-prompt/force guard).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_enable.go routine enable implementation.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_disable.go routine disable implementation.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_dispatch.go routine dispatch implementation (dispatch_async call + output handling).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_run.go routine run list implementation (run history listing with top/filter).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_helpers.go Shared helpers (client creation, JSON printing, tabwriter formatting).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest.go Manifest parsing and create/update merge helpers.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest_test.go Unit tests for manifest parsing/merge/update-flag application helpers.
cli/azd/extensions/azure.ai.routines/go.mod Extension module dependencies updated.
cli/azd/extensions/azure.ai.routines/go.sum Dependency checksum updates.
cli/azd/extensions/azure.ai.routines/cspell.yaml Adds extension-local cspell words.

Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint.go
Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go
Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go
Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/routine_helpers.go Outdated
Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest.go Outdated
Comment thread cli/azd/extensions/azure.ai.routines/internal/cmd/routine.go Outdated
The first cut of the Routines client used header/route/field shapes that
did not match the TypeSpec being merged in azure-rest-api-specs#42779.
Realign the extension with the spec so requests round-trip cleanly:

* Preview header renamed from 'x-ms-foundry-features-opt-in' to
  'Foundry-Features' (the value 'Routines=V1Preview' was already correct).
* Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync';
  the action segment is case-sensitive per spec.
* Dropped the non-existent ':enable' / ':disable' action routes;
  enable/disable now read the routine and PUT it back with 'enabled'
  flipped (idempotent: no-op if already at the target value).
* DispatchRoutineRequest wraps a discriminated 'payload' object whose
  'type' must match the routine's action type; --conversation-id was
  removed from dispatch (the spec does not expose it).
* Routine.Action is now a single discriminated object (not an 'actions'
  map keyed by name).
* RoutineAction.AgentName -> AgentID; the CLI flag is renamed to
  --agent-id accordingly.
* RoutineTrigger.Cron -> CronExpression to match the TypeSpec field.
* PagedRoutine pagination follows the absolute 'nextLink' URL from
  Azure.Core.Page<Routine> instead of re-deriving a continuation query.
* RoutineRun gains the additional fields documented in the spec
  (phase, trigger_type, attempt_source, action_type, triggered_at,
  dispatch_id, action_correlation_id, response_id, error_type,
  error_message); 'run list' now prints phase alongside status.
* EventRoutineTrigger fields aligned to the spec: connection_id, owner,
  repository, actions[]; removed 'assignee'.
* DispatchRoutineResponse drops the unused 'status' field.

Tests, mock manifests, and the E2E driver were updated to the new
contract (--agent-id, agent_id, cron_expression, single 'action').

Note: the live Foundry Routines preview endpoint still returns HTTP 500
on /routines?api-version=v1 even with the correct request shape; that is
an upstream service bug tracked separately.
@huimiu huimiu changed the title feat(routines): implement azd ai routine commands (v1) feat(routines): implement azd ai routine commands May 19, 2026
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (automated, supplements Copilot pre-review)

TL;DR

First v1 of the azd ai routine command surface in a new extension. Implements create/update/show/list/delete/enable/disable/dispatch plus run list, a 5-level endpoint resolver, structured exterrors, and a Foundry data-plane HTTP client. Code is well-organized and the design is sound. However: the HTTP client has zero tests, enable/disable are GET-then-PUT with no concurrency guard, and the PR description contradicts the actual implementation in two places (enable route, preview header name). Findings below are net-new — they do not duplicate Copilot's 6 pre-review comments (port stripping, --enabled masking manifest, github-issue trigger accepted, boolStr nil→"true", applyUpdateFlags using fmt.Errorf, unused newRoutineCommand), all of which look valid.


🔴 Critical

C1 — PR description contradicts the implementation in two places (internal/pkg/routines/client.go)

PR description claims Code actually does
"routine enable/disable — Calls dedicated :enable route (idempotent)" setEnabled does GetRoutine → mutate EnabledPutRoutine. The code comment itself states: "The Foundry Routines API does not expose a dedicated :enable route; the client mutates the routine resource directly."
"Every request includes x-ms-foundry-features-opt-in: Routines=V1Preview" Header constant is routinesPreviewHeader = "Foundry-Features", so the actual header sent is Foundry-Features: Routines=V1Preview.

Two consequences:

  1. Reviewers, docs writers, and future maintainers will form an incorrect mental model from the description.
  2. If the Foundry service actually expects the x-ms-foundry-features-opt-in header name, the preview opt-in is silently broken and the server is ignoring it. Please verify the server-side header name against the Foundry TypeSpec and fix either the code or the description.

🟠 High

H1 — setEnabled has a lost-update race and no If-Match guard (client.go, EnableRoutine/DisableRoutine/setEnabled)

Because enable/disable is GET-then-PUT against the full routine resource, any concurrent writer (another CLI invocation, another user, a workspace tool) that mutates the routine between the GET and the PUT will be silently clobbered. There is no If-Match/ETag conditional request to detect the race.

Fix options: (a) capture ETag from the GET response and send If-Match on the PUT, returning a structured "conflict" error on 412; (b) document this race in user-facing help text and emit a warning when the GET observes Enabled != desired; (c) work with the Foundry team to add a true :enable/:disable endpoint (preferred long-term, matches the PR description).

H2 — HTTP client (internal/pkg/routines/client.go) has zero tests

The data-plane integration boundary — every CRUD/enable/disable/dispatch/list call — has no unit coverage. This means:

  • Pagination loop termination (nextLink follow in ListRoutines, pageToken in ListRoutineRuns) is unverified.
  • Same-origin checks on nextLink are unverified.
  • Error envelope parsing for 401 / 403 / 404 / 409 / 5xx is unverified.
  • Request body marshaling for trigger/action records is unverified end-to-end.
  • The client-side Top cap in run-listing is not exercised.

The PR description acknowledges tests are "tracked as follow-up work." Given this is the wire boundary of a Preview API, at minimum httptest.Server-based happy paths for each method plus pagination termination should ship in v1. The current command-level tests only exercise pure helpers (buildTrigger, buildAction, applyUpdateFlags) and don't reach the client.

H3 — routine update may report "no changes" when only --file is used (routine_update.go, routine_manifest.go applyUpdateFlags)

applyUpdateFlags only increments its changed counter when individual CLI flags are set; the --file merge path does not contribute to that counter. The no-op early-return checks changed == 0, so routine update <name> --file manifest.yaml (no other flags) likely takes the no-op branch even though the manifest legitimately changed fields. Either snapshot/diff the routine before vs after the merge, or count merge-introduced changes.


🟡 Medium

M1 — Credential ignores tenant for guest / multi-tenant users (internal/cmd/routine_helpers.go)

newRoutineClient constructs azidentity.NewAzureDeveloperCLICredential(&AzureDeveloperCLICredentialOptions{}) with no TenantID. Per .github/instructions/extensions.instructions.md, extensions must use the user-access tenant (Subscription.UserTenantId, not the resource tenant). For multi-tenant or guest users the default tenant from azd auth may not match the Foundry project's tenant, causing AADSTS errors. Plumb a tenant hint (e.g., from azd context or by extracting from the endpoint/project metadata) into the credential options.

M2 — Pagination has no max-page safety cap (client.go, ListRoutines / ListRoutineRuns)

Both methods loop while NextLink/PageToken is non-empty. A misbehaving or compromised service that returns a self-referential or cycling token will cause an unbounded loop that consumes CPU and memory (each appended page grows the result slice). Add a defensive page-count cap (e.g., 1000) and return a clear "pagination exceeded N pages" error.

M3 — Unbounded io.ReadAll on response bodies (client.go, JSON decode helper)

The decode helper reads the entire response body into memory without io.LimitReader. A misbehaving server returning a multi-GB body will OOM the process. Wrap with a sane cap (e.g., io.LimitReader(body, 32<<20)) before decoding.

M4 — Token scope https://ai.azure.com/.default should be verified (client.go, NewClient)

The Bearer token scope is hardcoded. If the Foundry Routines data plane requires a project-specific or different audience (services.ai.azure.com, a regional audience, or a Foundry-specific resource ID), token acquisition will succeed but the server may reject with 401/403. Verify against the Foundry TypeSpec and add a code comment justifying the chosen scope.

M5 — assert.Error instead of require.Error in negative-path tests (routine_create_test.go)

Several "expect error" tests use assert.Error followed by further dereferences of the result. If the implementation regresses and returns no error, the test logs a failure but continues executing on a nil/zero value, potentially panicking or producing a confusing secondary failure that hides the real cause. Switch to require.Error.


🟢 Low

  • L1routine dispatch --async help text ("Suppress descriptive output; useful for scripting") is ambiguous. Clarify that it prints only the dispatch ID and exits without polling.
  • L2 — Manifest file-format detection falls back to JSON for empty/unknown extensions, and the error message recommends only JSON, leaving YAML users guessing. Mention both formats in the error.
  • L3routine run subcommand help doesn't mention that run show / run delete are intentionally deferred. Add a one-line note in Long.
  • L4printJSON / table writers write directly to os.Stdout, making future output-assertion tests impossible without monkey-patching. Accept an io.Writer parameter.

Notes (not findings)

  • This PR lives in an extension with its own go.mod, so core-azd patterns (IoC container, ActionDescriptor, ErrorWithSuggestion) do not apply — the extension uses cobra directly.
  • new(expr) is valid in Go 1.26+ (see cli/azd/AGENTS.md).
  • Adding DisallowUnknownFields to the JSON decoder is not recommended — it would break forward-compat with non-breaking Foundry API additions.

Generated review · supplements Copilot inline comments · posted as non-blocking COMMENT.

- endpoint: reject project endpoints with an explicit port so the
  normalized URL cannot silently strip a non-default port
- routine create: only set Enabled from --enabled when the user
  explicitly passes the flag, so a manifest's enabled value is
  honored; default to enabled=true if neither source provides one
- routine create: explicitly reject --trigger github-issue (deferred
  for v1) instead of producing an incomplete github_issue trigger
- routine_helpers: boolStr now returns "unknown" for a nil pointer
  to avoid displaying "true" when the field is absent from the
  service response
- routine_manifest: surface applyUpdateFlags user-input errors as
  exterrors.Validation (CodeInvalidParameter) for consistent CLI
  error shapes
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: commit 1ff54876 (fix(routines): address PR review feedback)

Scope: one new commit since my previous review — 6 files, +82/-9.

✅ Copilot's pre-review findings — all addressed correctly

# Issue Fix
1 endpoint.go silently strips explicit ports Explicit port now rejected with structured error + 2 new tests
2 --enabled default masks manifest value Uses cmd.Flags().Changed("enabled"); default applied only after manifest merge
3 --trigger github-issue accepted but incomplete Explicit rejection in buildTrigger with deferred-v1 message
4 boolStr returns "true" for nil pointer Now returns "unknown"
5 applyUpdateFlags uses plain fmt.Errorf All 6 sites converted to exterrors.Validation(CodeInvalidParameter, …)

(Copilot's 6th finding — dead newRoutineCommand — was already resolved in earlier commit b7d375b8.)

❌ Still open from my prior review

These weren't in the scope of this commit, just noting so they don't get lost:

  • C1 — PR description still claims a dedicated :enable route and x-ms-foundry-features-opt-in header. The actual code does GET-then-PUT and sends Foundry-Features. Please reconcile description vs implementation (and verify the preview header name with the Foundry team — if the server expects x-ms-foundry-features-opt-in, opt-in is silently broken).
  • H1 — Lost-update race in setEnabled (no If-Match/ETag).
  • H2internal/pkg/routines/client.go still has zero tests.
  • H3routine update --file may falsely report "no changes" because file-merge doesn't increment the changed counter.
  • M1AzureDeveloperCLICredential constructed with no TenantID; will break for guest / multi-tenant users.

No new blocking issues introduced by this commit. Nice clean turnaround on the Copilot items.

@JeffreyCA JeffreyCA added the ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions label May 20, 2026
@huimiu huimiu marked this pull request as draft May 21, 2026 08:03
…ng extensions

Other AI extensions (projects, agents, toolboxes, inspector) ship a
.golangci.yaml lint config and an AGENTS.md contributor guide. Add both
to azure.ai.routines so it follows the same convention, and register
the project-specific �xterrors word in cspell.yaml.
@huimiu huimiu changed the title feat(routines): implement azd ai routine commands feat(routines): add azd ai routine commands May 22, 2026
huimiu added 3 commits May 22, 2026 19:25
The deployed Foundry Routines data plane applies a camelCase property
naming policy on the wire (e.g. `cronExpression`, `timeZone`,
`agentId`), even though the upstream TypeSpec / OpenAPI document still
emits snake_case. With snake_case JSON tags, `routine create` and
`update` always failed with errors like:

    triggers['default'].cronExpression must be provided for schedule routines
    exactly one of action.agentId or action.agentEndpointId must be provided

and routines read back from `show` / `list` would have empty
trigger/action fields because the camelCase wire payload did not
deserialize into snake_case-tagged Go fields.

Switch the JSON tags on `Routine`, `RoutineTrigger`, `RoutineAction`,
`RoutineRun`, `PagedRoutineRun`, and `DispatchRoutineResponse` to
camelCase so requests/responses round-trip cleanly against the deployed
service. YAML tags stay snake_case so user-facing `--file` manifests
keep the documented convention.

Verified against a live project endpoint: create/list/show now reach
the service correctly (residual `InternalServerError` from the
backend is unrelated and reproduces from raw curl with the same body).
Use azure-rest-api-specs PR #43186 (Foundry Routines TypeSpec) as the
single source of truth for the routines extension, applying every
spec change that does not break the currently deployed service, and
documenting each deliberate divergence inline and in AGENTS.md.

## Spec alignment

* `RoutineRun` and `DispatchRoutineResponse` gain the new `TaskID`
  field (wire `taskId`); the service already emits it. `dispatch`
  now prints `Task ID` after `Action Correlation ID`, and JSON
  output exposes the new field too.
* `RoutineTrigger` is restructured to match the spec's
  `GitHubIssueOpenedRoutineTrigger` shape: dropped `Owner` /
  `Actions[]`, added `Assignee`. The github trigger is still
  deferred at the CLI surface, so this is safe.
* Inline comments and a new AGENTS.md table call out each divergence
  the client deliberately keeps to stay compatible with the live
  service: camelCase wire naming (spec is snake_case), `agentId`
  field (spec renamed to `agent_name`), `:dispatchAsync` action
  segment (spec uses `:dispatch_async`), GET+PUT enable/disable
  fallback (spec adds dedicated routes which still 404),
  `value`/`nextLink` / `value`/`nextPageToken` paged shapes
  (spec uses `AgentsPagedResult<T>`), and `github_issue` wire
  value (spec renamed to `github_issue_opened`).

## CLI bug fix: HTTP/2 stream-reset hang

The pipeline now uses a custom `http.Client` with an explicit
`ResponseHeaderTimeout` (60s) and `TryTimeout` (30s), and azcore
retries are capped at 1. When the Foundry service returns an
HTTP/2 RST_STREAM (for example, the schedule-create
InternalServerError), the CLI now surfaces a `context deadline
exceeded` error within ~40 seconds instead of the previous ~6 minute
hang.

## Verified end-to-end against a live Foundry project

* timer create / show / list / update / disable / enable / dispatch
  (with `taskId` round-tripping) / run list / delete all succeed.
* schedule create still fails (service-side ISE) but now in under a
  minute instead of six.
The Foundry data plane currently returns `InternalServerError` for any
`PUT /routines/{name}` request whose trigger is `schedule` (the wire
value behind the CLI's `--trigger recurring`). The CLI side is fully
implemented and verified correct via raw curl, so this is a service-side
issue, but it leaves the `recurring` trigger non-functional end-to-end.

Take the recurring trigger off the public CLI surface so users do not
hit the service hang:

* Drop the `--cron` flag from `routine create` and `routine update`.
* `--trigger recurring` is now rejected with the same "deferred"
  shape as `--trigger github-issue`: a clear error pointing the user
  at `--trigger timer` and explaining that recurring is gated on the
  Foundry service.
* `--trigger` help text and validation messages list only `timer`.

The underlying wire model still carries `cron_expression` /
`time_zone` and the `schedule` discriminator so re-enabling the
trigger when the service is ready is just a CLI flag-wiring change.
Unit tests around buildTrigger and applyUpdateFlags are updated
accordingly.
@huimiu huimiu force-pushed the hui/add-routine-command branch from 5e76b31 to 0046e61 Compare May 22, 2026 13:39
…pt in delete, and action-type flag validation
@huimiu huimiu force-pushed the hui/add-routine-command branch from 0046e61 to 9659512 Compare May 22, 2026 13:42
@huimiu huimiu marked this pull request as ready for review May 22, 2026 13:49
@huimiu huimiu requested a review from RickWinter as a code owner May 22, 2026 13:49
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of PR #8241 — feat(routines): implement azd ai routine commands

Reviewed since wbreza's May 20 review. Skipping items already flagged by prior reviews (endpoint port stripping, body.Enabled flag override, buildTrigger deferred github-issue alias, boolStr nil-pointer returning "true", applyUpdateFlags plain fmt.Errorf).

🔴 High — Security / correctness

1. internal/pkg/routines/client.go — Missing Logging.AllowedHeaders on policy.ClientOptions
Sibling azure.ai.agents restricts logged headers to MsCorrelationIdHeader only. The routines client omits the Logging field entirely. Azure SDK default logging will include the Authorization header (bearer token) in pipeline debug logs.
Fix: mirror agents' Logging: policy.LogOptions{AllowedHeaders: []string{azsdk.MsCorrelationIdHeader}}.

2. internal/cmd/routine_dispatch.go — Wrong op code on inner GetRoutine error
When fetching the routine to validate --input, the error is wrapped with exterrors.OpGetRoutine instead of exterrors.OpDispatchRoutine. Telemetry / user-facing error shows get_routine.* when the user ran dispatch. Note the adjacent IsNotFound branch already uses OpDispatchRoutine — the inconsistency is the smell.

3. internal/exterrors/errors.goIsCancellation misses gRPC codes.Canceled
Routines checks only errors.Is(err, context.Canceled). Agents also checks status.FromError(err).Code() == codes.Canceled. Cancellations bubbling up from gRPC azdext calls are misclassified as internal errors instead of user-driven.
Fix: port the agents implementation verbatim.

🟡 Medium — Test coverage & design

4. internal/pkg/routines/client.go (+390 LOC) — no unit tests
The data-plane client (CRUD, pagination, enable/disable GET+PUT fallback, dispatchAsync, list runs, custom HTTP/2 transport, same-origin pagination guard) has zero tests. PR description defers "unit tests as follow-up", but the manifest parser (low-risk) gets tests while the mission-critical client does not. Suggest at minimum httptest.NewServer-based tests for: 2xx happy paths, 404/409/5xx, nextLink pagination, validateSameOrigin rejection, context cancellation.

5. Command-layer coverage gaps
No tests for routine_update.go (complex flag→file→default merge), routine_delete, routine_enable, routine_disable, routine_dispatch, routine_list, routine_show, routine_helpers. Agents extension test ratio is ~1:1; this PR is ~1:2.2.

6. internal/pkg/routines/client.goruntime.NewResponseError(resp) leaks raw response body into errors
Used in ~5 spots. Service error bodies pass straight through. CLI commands re-wrap with exterrors.ServiceFromAzure, but the underlying message text is preserved and may surface in logs/output.
Fix: drain body, extract structured fields, return a controlled message — or document that callers must always wrap and never log the raw error.

7. internal/cmd/routine_manifest.goos.ReadFile errors all classified as CodeFileNotFound
Permission-denied, is-a-directory, etc. surface to the user as "routine manifest file not found" with misleading remediation.
Fix: gate with os.IsNotExist(err); fall back to a generic file-read error otherwise.

8. internal/exterrors/codes.go — No semantic "not found" code
404s become get_routine.404 rather than e.g. routine_not_found. Hurts telemetry classification and stable error-handling contracts.
Fix: add CodeRoutineNotFound and use it in the IsNotFound branches.

🟢 Low / nit

9. internal/cmd/endpoint_test.go — port edge cases under-tested
The recently-added port rejection covers :443 only. Add: non-default ports (:8443), IPv6+port (https://[::1]:443/...), whitespace-only input. The original bug evaded tests because they didn't exercise non-default ports.

10. internal/cmd/routine_update.go_ = cmd.Flags().MarkHidden(...) (×2)
Silent error discard. Safe in practice, but if err := ...; err != nil { ... } or a brief comment is cleaner.

11. AGENTS.md — "If this extension grows enough… introduce internal/exterrors"
Phrased as a future option, but the package exists in this PR. Rewrite in present tense so future maintainers don't treat it as optional.

12. go.modarmappservice/v2 v2.3.0 // indirect added
Indirect, likely a cascade from an azd-core dep bump. Worth a quick confirmation that this isn't pulling unwanted weight on the extension.

✅ Verified clean (no action)

  • URL construction (url.PathEscape / url.QueryEscape)
  • Pagination validateSameOrigin
  • HTTP/2 timeout fix (explicit ResponseHeaderTimeout, TLSHandshakeTimeout) is sound, not masking
  • Wire format alignment (camelCase tags, agentId, :dispatchAsync, GET+PUT enable/disable fallback) matches deployed Foundry service per AGENTS.md
  • Bearer token scope https://ai.azure.com/.default
  • Endpoint validation (HTTPS-only, port rejection, .services.ai.azure.com allowlist)
  • exterrors duplication with agents / toolboxes is acknowledged in AGENTS.md as the established cross-extension pattern — not flagged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions

Projects

None yet

4 participants