Follow-up from PR #88 — surfaced when discussing the log group rename / migration story for future production adopters; ABCA today has no production users so the migration scenario doesn't apply, but the underlying log-retention policy is worth setting now while it's cheap.
Functional description
ABCA's CDK constructs create CloudWatch log groups with default retention (NEVER) and default removal policy (DESTROY). That combination has two failure modes that show up only when ABCA gets used past its current reference-application stage:
- cdk destroy deletes log groups along with the stack. For dev iteration that is fine: stacks are short-lived and logs are throwaway. For a future production adopter, it means the first cdk destroy (intentional or accidental) silently deletes every log of every agent run that ever happened. Forensic data, the security audit trail, and the answer to "what did the agent run last week?" are all gone.
- No retention cap. Log groups accumulate indefinitely, and CloudWatch charges per GB stored. Long-running ABCA stacks eventually pay for years of agent stdout that nobody will ever read.
Both fixes are 1-line changes per construct. Both should land before the first production adopter goes live, not after — because retrofitting changes how live log groups behave (potentially destroying data) and creates the exact "circular migration problem" we want to avoid.
ABCA is currently a reference application — there are no production adopters today, so no immediate user-visible problem. Filing this as a "ship before the first prod use" issue so it doesn't decay.
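To put rough numbers on the second failure mode, here is a minimal back-of-envelope sketch. The storage price and the ingestion volume are assumptions chosen for illustration, not ABCA measurements, and the model ignores compression and ingestion charges. The function names are hypothetical.

```typescript
// Assumed figure, not taken from ABCA: check current CloudWatch Logs pricing for your region.
const STORAGE_USD_PER_GB_MONTH = 0.03;

/** GB held in a log group after `days` of steady ingestion, optionally capped by retention. */
function storedGb(gbPerDay: number, days: number, retentionDays?: number): number {
  const effectiveDays = retentionDays === undefined ? days : Math.min(days, retentionDays);
  return gbPerDay * effectiveDays;
}

/** Monthly storage bill for a given stored volume, under the assumed price above. */
function monthlyStorageUsd(gb: number): number {
  return gb * STORAGE_USD_PER_GB_MONTH;
}

const gbPerDay = 2; // hypothetical agent stdout volume
const threeYears = 3 * 365;

const uncappedGb = storedGb(gbPerDay, threeYears);     // 2190 GB with no retention cap
const cappedGb = storedGb(gbPerDay, threeYears, 30);   // 60 GB with a 30-day cap

console.log(monthlyStorageUsd(uncappedGb).toFixed(2)); // prints "65.70"
console.log(monthlyStorageUsd(cappedGb).toFixed(2));   // prints "1.80"
```

The absolute figures are small at this volume; the point is that the uncapped bill grows linearly with stack age while the capped bill plateaus after the retention window.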
Why this is its own issue, not a code change:
- Ship-it-now is tempting (it's small), but the audit needs to be deliberate: every LogGroup construct, every implicit log group from aws-lambda constructs, and every AgentCore-managed log group needs to be considered separately. Some have policy reasons to NOT retain (e.g. ephemeral CI logs).
- Open as an issue so the decision conversation happens publicly and the choices are documented in the issue thread for future maintainers.
Technical context
Current state (audit needed for a complete list):
- cdk/src/constructs/task-orchestrator.ts: Lambda log group, default DESTROY + NEVER.
- cdk/src/constructs/fanout-consumer.ts: Lambda log group, default DESTROY + NEVER.
- cdk/src/constructs/approval-metrics-publisher-consumer.ts: Lambda log group, default DESTROY + NEVER.
- cdk/src/handlers/*: every Lambda construct has an implicit log group.
- cdk/src/stacks/agent.ts: AgentCore Runtime log group (managed by the AgentCore service, may not be directly settable via CDK).
- cdk/src/constructs/agent-vpc.ts: VPC Flow Logs may have their own log group.
The two changes per construct:
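import { RemovalPolicy } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';

// Before: nothing specified, so the function's log group keeps its defaults.
new lambda.Function(this, 'Fn', { /* existing props */ });

// After: set retention for the function's log group.
const fn = new lambda.Function(this, 'Fn', {
  /* existing props */
  logRetention: logs.RetentionDays.ONE_MONTH, // or longer for audit-relevant logs
});

// Or for explicit log groups: set both retention and removal policy.
const lg = new logs.LogGroup(this, 'Lg', {
  retention: logs.RetentionDays.ONE_MONTH,
  removalPolicy: RemovalPolicy.RETAIN,
});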
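For Lambda-bearing constructs, one way to apply both changes at once is to hand the function an explicit log group. The following is a sketch, assuming an aws-cdk-lib v2 release recent enough to support the logGroup prop on lambda.Function; recent releases also appear to deprecate logRetention in its favor, which is worth confirming against the pinned version. The stack name, construct IDs, runtime, and inline handler are placeholders, not ABCA code. Note that CDK spells the "never expire" setting as logs.RetentionDays.INFINITE.

```typescript
import { App, RemovalPolicy, Stack } from 'aws-cdk-lib';
import { Match, Template } from 'aws-cdk-lib/assertions';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';

const app = new App();
const stack = new Stack(app, 'RetentionSketchStack'); // hypothetical stack name

// Explicit log group: both policy decisions live in one place.
const logGroup = new logs.LogGroup(stack, 'FnLogs', {
  retention: logs.RetentionDays.ONE_MONTH,
  removalPolicy: RemovalPolicy.RETAIN,
});

// Placeholder function; only the logGroup wiring matters for this sketch.
new lambda.Function(stack, 'Fn', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromInline('exports.handler = async () => ({});'),
  logGroup,
});

// Regression guard: both assertions throw if the synthesized template
// stops carrying the retention or the removal policy.
const template = Template.fromStack(stack);
template.hasResourceProperties(
  'AWS::Logs::LogGroup',
  Match.objectLike({ RetentionInDays: 30 }),
);
template.hasResource('AWS::Logs::LogGroup', {
  DeletionPolicy: 'Retain',
  UpdateReplacePolicy: 'Retain',
});
```

In a real construct test the two assertions would sit inside the existing test framework's test cases rather than at module top level.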
Retention period choice:
- 30 days — minimum for security incident response (most pen-test reports require ≥30 days of access logs).
- 90 days — common compliance baseline.
- 365 days — if the team treats agent runs as audit-relevant.
Recommend 30 days as the ABCA reference default, and document in AGENTS.md / CLAUDE.md that adopters should override it per their compliance requirements.
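If adopters are going to override the default, it may help to validate the override early, since CloudWatch Logs only accepts a fixed set of retention values. Below is a minimal sketch; resolveRetentionDays and the constant names are hypothetical, and the allowed-values list reflects the PutRetentionPolicy documentation as I understand it, so verify it before relying on it.

```typescript
// Retention values (in days) accepted by CloudWatch Logs, per the
// PutRetentionPolicy API docs. Assumption: confirm against current docs.
const ALLOWED_RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
  1096, 1827, 2192, 2557, 2922, 3288, 3653,
] as const;

type RetentionDaysValue = (typeof ALLOWED_RETENTION_DAYS)[number];

// Proposed ABCA reference default from this issue.
const DEFAULT_RETENTION_DAYS: RetentionDaysValue = 30;

/** Returns the adopter's override if it is a supported value, else the default. */
function resolveRetentionDays(override?: number): RetentionDaysValue {
  if (override === undefined) {
    return DEFAULT_RETENTION_DAYS;
  }
  const match = ALLOWED_RETENTION_DAYS.find((days) => days === override);
  if (match === undefined) {
    throw new Error(
      `Unsupported retention of ${override} days; allowed values: ${ALLOWED_RETENTION_DAYS.join(', ')}`,
    );
  }
  return match;
}

console.log(resolveRetentionDays());    // prints 30
console.log(resolveRetentionDays(365)); // prints 365
```

Failing at synth time with a readable message is cheaper than discovering an unsupported value (for example 45) during deployment.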
Removal policy choice:
- RETAIN: log groups survive cdk destroy. The operator manually cleans up if they want zero residue.
- DESTROY: log groups vanish with the stack. Cleanup is automatic; data is lost.
Recommend RETAIN for ABCA reference. The cost of orphans is "operator runs aws logs delete-log-group once per stack" — bounded and recoverable. The cost of unintended DESTROY is unbounded data loss.
Why this issue and not just a PR:
- The retention period is a policy choice that should be discussed publicly.
- Some constructs may have reasons to opt out (CI-only log groups, smoke-test scaffolding).
- Future adopters reading the codebase should be able to find the rationale.
Proposed approach
- Audit phase (~30 min): identify every log group resource in cdk/src/. Output: a table in this issue thread of (construct, current state, proposed state, rationale).
- Decision (in-issue discussion): confirm the retention default (30/90/365 days) and the removal policy (RETAIN vs DESTROY per construct).
- Implementation (~30 min): make the changes per the audit table. Tests stay green (CDK construct tests don't pin retention by default, but it is worth adding Match.objectLike({ RetentionInDays: ... }) assertions for the major constructs to prevent silent regressions).
- Documentation (~15 min): add a "Log retention policy" section to AGENTS.md / CLAUDE.md explaining the chosen defaults and how to override them.
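The audit phase could be partly mechanized by scanning a synthesized CloudFormation template (for example the JSON written to cdk.out by cdk synth). This is a sketch under assumptions: auditLogGroups and the logical IDs are hypothetical, and it only sees log groups declared in the template. Implicit Lambda log groups created by the Lambda service at first invocation do not appear as AWS::Logs::LogGroup resources, so they still need a manual pass.

```typescript
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
  DeletionPolicy?: string;
}

interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

interface AuditRow {
  logicalId: string;
  retentionInDays: number | 'unset';
  deletionPolicy: string;
  compliant: boolean;
}

/** Lists every AWS::Logs::LogGroup in a template with its retention and deletion policy. */
function auditLogGroups(template: CfnTemplate): AuditRow[] {
  return Object.entries(template.Resources ?? {})
    .filter(([, resource]) => resource.Type === 'AWS::Logs::LogGroup')
    .map(([logicalId, resource]): AuditRow => {
      const raw = resource.Properties?.RetentionInDays;
      const retentionInDays = typeof raw === 'number' ? raw : 'unset';
      // CloudFormation deletes a resource with the stack when no DeletionPolicy is given.
      const deletionPolicy = resource.DeletionPolicy ?? 'Delete';
      return {
        logicalId,
        retentionInDays,
        deletionPolicy,
        compliant: retentionInDays !== 'unset' && deletionPolicy === 'Retain',
      };
    });
}

// Hypothetical template fragment for illustration.
const sampleTemplate: CfnTemplate = {
  Resources: {
    OrchestratorLogs: {
      Type: 'AWS::Logs::LogGroup',
      Properties: { RetentionInDays: 30 },
      DeletionPolicy: 'Retain',
    },
    FanoutLogs: { Type: 'AWS::Logs::LogGroup' },
    SomeBucket: { Type: 'AWS::S3::Bucket' },
  },
};

console.table(auditLogGroups(sampleTemplate));
```

The rows map directly onto the (construct, current state, proposed state, rationale) table; the rationale column still has to be filled in by a person.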
Acceptance criteria
- Every audited log group has retention set (no defaults to NEVER without rationale).
- Every audited log group has removalPolicy set (default RETAIN unless there is a rationale).
- Tests for the major constructs assert Match.objectLike({ RetentionInDays: <value> }).
Out of scope
- Migrating already-deployed stacks to the new policy (not applicable today; ABCA is a reference application).
- Cross-region log replication.
- Encryption-at-rest configuration on log groups (separate security hardening issue if there is appetite).
- Defining a CI sweep that auto-deletes orphaned RETAIN groups across test accounts (operational tooling; separate concern).
References
- cdk/src/constructs/ (every Lambda-bearing construct)
- logs.LogGroup reference: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_logs.LogGroup.html
- RemovalPolicy reference: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.RemovalPolicy.html