docs: document DNS Query Log Config replacement cascade upgrade path #269

@scottschreckengaust

Problem

Stacks deployed before PR #222 (tag-exclusion fix) fail with UPDATE_FAILED on the first deploy after upgrading to a version containing that fix.

Error: InvalidRequest: Cannot create association — one already exists for this VPC on AWS::Route53Resolver::ResolverQueryLoggingConfigAssociation.

Root cause

CfnResolverQueryLoggingConfig is a create-only CloudFormation resource — ANY property change (including Tags) triggers a full replacement. Pre-fix stacks have github:sha and other tags applied to this resource. Although the new code uses excludeResourceTypes to prevent new tag applications, CloudFormation still generates a diff to remove the now-excluded tags from the existing resource during the update. This triggers the replacement cascade:

  1. Config is replaced → new physical resource ID
  2. Association detects ResolverQueryLogConfigId changed → triggers its own replacement
  3. CloudFormation attempts Create-before-Delete on the Association → Route53 Resolver rejects (one-association-per-VPC constraint) → InvalidRequest

Key insight: CDK's Tags.of(scope).add(..., { excludeResourceTypes }) prevents new tag applications but does NOT suppress tag removal from resources that already have them. The tag removal is still a property change on a create-only resource.
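
The cascade can be illustrated with a toy model. This is a simplified sketch, not CloudFormation's actual diffing logic: the `needsReplacement` and `canCreateAssociation` helpers, the `ResourceModel` type, and the tag value `abc123` are hypothetical and exist only for illustration. It assumes a create-only resource is replaced whenever its desired properties differ from the deployed ones, and that a VPC accepts at most one association, as described above.

```typescript
// Toy model only: names and types here are illustrative, not a CloudFormation or CDK API.
type Props = Record<string, unknown>;

interface ResourceModel {
  createOnly: boolean; // true when any property change forces replacement
  props: Props;        // properties as currently deployed
}

// Serialize with sorted top-level keys so key order does not count as a change.
function stable(props: Props): string {
  const sorted: Props = {};
  for (const key of Object.keys(props).sort()) {
    sorted[key] = props[key];
  }
  return JSON.stringify(sorted);
}

// A create-only resource is replaced on any difference, including a removed Tags property.
function needsReplacement(resource: ResourceModel, desired: Props): boolean {
  return resource.createOnly && stable(resource.props) !== stable(desired);
}

// Models the one-association-per-VPC constraint.
function canCreateAssociation(existingAssociationsForVpc: number): boolean {
  return existingAssociationsForVpc < 1;
}

// Pre-fix stack: the config was deployed with tags (the tag value is made up).
const deployedConfig: ResourceModel = {
  createOnly: true,
  props: {
    Name: 'agent-dns-query-log',
    Tags: [{ Key: 'github:sha', Value: 'abc123' }],
  },
};

// Post-fix code no longer applies tags, so the desired state has no Tags property.
const desiredConfigProps: Props = { Name: 'agent-dns-query-log' };

// Step 1: removing the tags is still a property change, so the config is replaced.
const configReplaced = needsReplacement(deployedConfig, desiredConfigProps);

// Step 2: a new config ID changes ResolverQueryLogConfigId, so the association is replaced too.
const associationReplaced = configReplaced;

// Step 3: the new association is created while the old one still exists on the VPC.
const associationCreateSucceeds = !associationReplaced || canCreateAssociation(1);

console.log({ configReplaced, associationReplaced, associationCreateSucceeds });
```

In this model the update only succeeds when the old association is gone before the new one is created, which is what both resolution options below arrange.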

Who is affected

Stacks deployed before the tag-exclusion fix (PR #222). Stacks created after that fix are not affected.

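One way to check whether a given stack is in the affected group is to inspect its deployed template for tags on the query log config. The sketch below assumes the template has already been fetched and parsed as JSON (for example with `aws cloudformation get-template`); the `findTaggedQueryLogConfigs` helper, the `Template` types, the logical ID `DnsQueryLogConfig`, and the tag value are hypothetical and not part of the repository.

```typescript
// Minimal shape of a CloudFormation template, limited to what this check needs.
interface TemplateResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface Template {
  Resources?: Record<string, TemplateResource>;
}

const QUERY_LOG_CONFIG_TYPE = 'AWS::Route53Resolver::ResolverQueryLoggingConfig';

// Returns the logical IDs of query log configs that carry at least one tag.
// A non-empty result suggests the stack predates the tag-exclusion fix.
function findTaggedQueryLogConfigs(template: Template): string[] {
  return Object.entries(template.Resources ?? {})
    .filter(([, resource]) => resource.Type === QUERY_LOG_CONFIG_TYPE)
    .filter(([, resource]) => {
      const tags = resource.Properties?.Tags;
      return Array.isArray(tags) && tags.length > 0;
    })
    .map(([logicalId]) => logicalId);
}

// Hypothetical pre-fix template: the config still has tags applied.
const preFixTemplate: Template = {
  Resources: {
    DnsQueryLogConfig: {
      Type: QUERY_LOG_CONFIG_TYPE,
      Properties: {
        Name: 'agent-dns-query-log',
        Tags: [{ Key: 'github:sha', Value: 'abc123' }],
      },
    },
  },
};

// Hypothetical post-fix template: same resource, no Tags property.
const postFixTemplate: Template = {
  Resources: {
    DnsQueryLogConfig: {
      Type: QUERY_LOG_CONFIG_TYPE,
      Properties: { Name: 'agent-dns-query-log' },
    },
  },
};

console.log(findTaggedQueryLogConfigs(preFixTemplate));
console.log(findTaggedQueryLogConfigs(postFixTemplate));
```

A stack whose deployed template returns a non-empty list here would be expected to hit the cascade on its next update; an empty list suggests it was created after the fix.
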
Proposed fix

Add a "Known deployment issues" section to docs/guides/DEPLOYMENT_GUIDE.md documenting two resolution paths.

Where to add

In docs/guides/DEPLOYMENT_GUIDE.md, insert a new ## Known deployment issues section between the existing "For administrators" subsection (line ~162) and the "## Related docs" section (line ~164).

Content to add

## Known deployment issues

### DNS Query Log Config replacement cascade (upgrading from pre-v0.5)

**Affects:** Stacks deployed *before* the tag-exclusion fix ([#222](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/pull/222)). Stacks created after this fix are not affected.

**Symptom:** `UPDATE_FAILED` on `AWS::Route53Resolver::ResolverQueryLoggingConfigAssociation` with error `InvalidRequest: Cannot create association — one already exists for this VPC`.

**Root cause:** The `ResolverQueryLoggingConfig` resource is *create-only* in CloudFormation — any property change (including Tags) triggers a full replacement. Pre-fix stacks have `github:sha` and other tags on this resource. Although the new code excludes it from future tag applications, CloudFormation still attempts to *remove* the now-excluded tags from the existing resource during the update, triggering the replacement cascade:

1. Config is replaced → new physical resource ID
2. Association detects `ResolverQueryLogConfigId` changed → triggers its own replacement
3. CloudFormation attempts Create-before-Delete on the association → Route53 Resolver rejects (one association per VPC) → `InvalidRequest`

**Resolution — choose one:**

#### Option A: Manual disassociation via AWS Console (recommended)

1. Open the [Route 53 Resolver console](https://console.aws.amazon.com/route53resolver/home#/query-logging)
2. Select the query logging configuration named `agent-dns-query-log`
3. Under **Associated VPCs**, disassociate the VPC
4. Delete the query logging configuration
5. Run `mise //cdk:deploy` (or `cdk deploy`) — CloudFormation will recreate both resources without tags

#### Option B: Two-phase deploy (comment-out / re-add)

1. In `cdk/src/stacks/agent.ts`, comment out the `DnsFirewall` construct instantiation (~line 197):
   ```typescript
   // new DnsFirewall(this, 'DnsFirewall', {
   //   vpc: agentVpc.vpc,
   //   additionalAllowedDomains: additionalDomains,
   //   observationMode: true,
   // });
   ```
2. Deploy with `mise //cdk:deploy`; this deletes the query log config, association, firewall rules, and related resources
3. Uncomment the `DnsFirewall` block
4. Deploy again with `mise //cdk:deploy`; resources are recreated cleanly without tags

Option B is more disruptive (two deploys, brief DNS logging gap) but requires no console access.


## Acceptance criteria

- [ ] New section added to `docs/guides/DEPLOYMENT_GUIDE.md` between "For administrators" and "Related docs"
- [ ] Both resolution options (console disassociation and two-phase deploy) are documented
- [ ] Root cause explanation is clear enough for a customer to verify this is their issue
- [ ] `docs/scripts/sync-starlight.mjs` is run and Starlight mirrors committed alongside
- [ ] No code changes — documentation only

## References

- PR #222: https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/pull/222
- Issue #221: https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/221
- Affected file: `cdk/src/main.ts` (lines 51–76, `excludeResourceTypes` array)
- Affected construct: `cdk/src/constructs/dns-firewall.ts` (lines 223–231, QueryLogConfig + Association)
- Construct instantiation: `cdk/src/stacks/agent.ts` (lines 197–201)

## Labels

`documentation`, `good first issue`

Applied labels: `P1` (Priority 1, high priority), `approved`, `bug`
