[P1] ci-cd: retry path misses ACA state adoption after first failed provision attempt #115

@Cataldir

Description

Problem

azd-deploy fails in a brand-new environment because retry attempt 2 tries to create Azure Container Apps (ACA) apps that the failed first attempt already created in Azure but that are not yet recorded in Terraform state.
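
The gap can be pictured as a set difference between what exists in Azure and what Terraform state tracks. The sketch below is illustrative only: `needs_import` is a hypothetical helper, the app names are the ones listed under Evidence, and the "after attempt 1" inputs assume the failed attempt created every app without recording any of them.

```python
from __future__ import annotations

# App names as listed in the issue; the scenario data below is illustrative.
BACKEND_APPS = ["avatar", "chat", "configuration", "essays",
                "evaluation", "lms-gateway", "questions", "upskilling"]


def needs_import(in_azure: set[str], in_state: set[str]) -> list[str]:
    """Return apps that exist in Azure but are not tracked in Terraform state."""
    return sorted(in_azure - in_state)


# Before attempt 1 in a fresh environment nothing exists yet, so a reconcile
# pass at this point has nothing to import.
before_attempt_1 = needs_import(in_azure=set(), in_state=set())

# After attempt 1 fails partway (assumed for this sketch): Azure holds the
# apps, but Terraform never recorded them, so attempt 2 would try to create
# them again unless they are imported first.
after_attempt_1 = needs_import(in_azure=set(BACKEND_APPS), in_state=set())

print(before_attempt_1)
print(after_attempt_1)
```

Under these assumptions, a reconcile pass that runs only before attempt 1 sees an empty difference, while the same check run before attempt 2 would list every backend app.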

Type

  • bug
  • regression

Evidence

  • Workflow run: https://github.com/Azure-Samples/tutor/actions/runs/23124471384
  • Job: provision (67164738583), step Provision infrastructure failed.
  • Same run shows:
    • attempt 1: transient ACA errors ("Operation expired"), then retry
    • attempt 2: errors "already exists - to be managed via Terraform this resource needs to be imported into the State" for multiple apps (avatar/chat/configuration/essays/evaluation/lms-gateway/questions/upskilling).
  • Earlier in the same job, before attempt 1, the reconcile step logged several "not found in Azure. Skipping state import." messages for those apps.

Reproduction

  1. Trigger .github/workflows/azd-deploy.yml with workflow_dispatch and azure_env_name_override=178dev (or any fresh dev env).
  2. Allow the first Provision infrastructure attempt to hit transient ACA revision failures.
  3. Observe the retry attempt, in which Terraform reports "already exists" for the ACA apps.

Acceptance Criteria

  • Before each provision retry, the workflow deterministically imports any ACA app that exists in Azure and is missing from state.
  • State-adoption check includes bounded polling to handle ARM eventual consistency after failed attempt 1.
  • Retry attempt 2 no longer fails with already exists for ACA backend apps in fresh env scenarios.
  • A step summary block reports imported/skipped apps per retry for diagnostics.
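
One possible shape for the pre-retry adoption step is sketched below, not as the workflow's actual implementation. All names (`adopt_existing_apps`, `summary_lines`, the callables, the poll defaults) are hypothetical. In a real workflow, `exists_in_azure` might wrap `az containerapp show` and `import_app` might wrap `terraform import`, with resource addresses and IDs that this issue does not specify. The returned summary lines could be appended to the file named by `GITHUB_STEP_SUMMARY`.

```python
from __future__ import annotations

import time
from typing import Callable


def adopt_existing_apps(
    apps: list[str],
    in_state: set[str],
    exists_in_azure: Callable[[str], bool],   # hypothetical Azure lookup
    import_app: Callable[[str], None],        # hypothetical state import
    max_polls: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Import apps that exist in Azure but not in state; return app -> outcome."""
    outcome: dict[str, str] = {}
    pending = []
    for app in apps:
        if app in in_state:
            outcome[app] = "skipped (already in state)"
        else:
            pending.append(app)

    # Bounded polling: an app created by the failed attempt may not be
    # visible immediately, so re-check a fixed number of times.
    for poll in range(1, max_polls + 1):
        still_pending = []
        for app in pending:
            if exists_in_azure(app):
                import_app(app)
                outcome[app] = f"imported (poll {poll})"
            else:
                still_pending.append(app)
        pending = still_pending
        if not pending:
            break
        if poll < max_polls:
            sleep(delay_seconds)

    for app in pending:
        outcome[app] = "skipped (not found in Azure)"
    return outcome


def summary_lines(attempt: int, outcome: dict[str, str]) -> list[str]:
    """Format per-app outcomes for a step summary block."""
    lines = [f"State adoption before retry attempt {attempt}"]
    lines += [f"- {app}: {outcome[app]}" for app in sorted(outcome)]
    return lines


# Demo with fakes: "chat" becomes visible on the second poll, "essays" never does.
polls_seen = {"chat": 0}


def fake_exists(app: str) -> bool:
    if app == "chat":
        polls_seen["chat"] += 1
        return polls_seen["chat"] >= 2
    return False


imported: list[str] = []
result = adopt_existing_apps(
    apps=["avatar", "chat", "essays"],
    in_state={"avatar"},
    exists_in_azure=fake_exists,
    import_app=imported.append,
    max_polls=3,
    sleep=lambda _seconds: None,  # no real waiting in the demo
)
print("\n".join(summary_lines(2, result)))
```

Injecting the lookup, import, and sleep functions keeps the polling logic testable without touching Azure or Terraform. The trade-off of polling every missing app is that apps attempt 1 never created cost the full `max_polls * delay_seconds` budget before being reported as skipped.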

Suggested Owner

platform-quality
