Feature: Integrate AKW multi-agent capabilities into ABCA
Summary
Extend ABCA from a single-purpose coding agent platform into a general-purpose autonomous agent platform by integrating the AKW (Autonomous Knowledge Work) subsystems. The result is a single AWS-hosted platform that runs both coding tasks (git, PRs, build/lint) and knowledge-work tasks (research, document generation, email triage, etc.) on shared infrastructure, with a shared memory system, trust model, and blueprint registry.
Motivation
ABCA today is hardwired for coding tasks: every task requires a GitHub repo, clones it, runs an agent against it, and opens a PR. This makes the platform unusable for tasks that have no repo (research, document drafting, data analysis). AKW solves exactly this problem with a blueprint-driven, task-mode-agnostic agent loop, but lacks ABCA's production-grade AWS infrastructure (durable orchestration, AgentCore compute isolation, Cognito auth, Cedar policy enforcement).
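Merging the two gives each what it lacks:
ABCA gets blueprint-driven extensibility, semantic long-term memory, risk-aware admission, in-execution human-in-the-loop (HITL), self-extending tool generation, and support for non-coding task domains.
AKW gets ABCA's production-grade AWS infrastructure: durable orchestration, AgentCore compute isolation, Cognito auth, and Cedar policy enforcement.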
Proposed changes
1. task_mode field — decouple coding from knowledge tasks
Add a task_mode: 'coding' | 'knowledge' field to blueprints and TaskRecord. Coding tasks follow the existing path (repo clone, GitHub context hydration, build/lint post-hooks). Knowledge tasks skip all git scaffolding and run the agent directly against instructions and memory context.
Touches: cdk/src/handlers/shared/types.ts, agent/src/config.py, agent/src/pipeline.py, cdk/src/constructs/blueprint.ts
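The split between the two modes can be pictured as a small dispatch on task_mode. The sketch below is illustrative only: the TaskRecord fields other than task_mode, the plan_steps helper, and the step names are assumptions, not the actual pipeline in agent/src/pipeline.py.

```python
from dataclasses import dataclass
from typing import Literal, Optional

TaskMode = Literal["coding", "knowledge"]


@dataclass
class TaskRecord:
    # Hypothetical subset of fields; only task_mode comes from the proposal.
    task_id: str
    task_mode: TaskMode
    instructions: str
    repo: Optional[str] = None  # only meaningful for coding tasks


def plan_steps(task: TaskRecord) -> list:
    """Return the ordered pipeline steps for a task (step names are illustrative)."""
    if task.task_mode == "coding":
        if not task.repo:
            raise ValueError("coding tasks require a repo")
        return ["clone_repo", "hydrate_github_context", "run_agent", "build_lint_post_hooks"]
    if task.task_mode == "knowledge":
        # No git scaffolding: the agent runs on instructions plus memory context.
        return ["load_memory_context", "run_agent"]
    raise ValueError(f"unknown task_mode: {task.task_mode!r}")
```

A knowledge task with no repo yields only the memory and agent steps, while a coding task without a repo is rejected early.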
2. Blueprint registry
Replace hardcoded system prompts with a FilesystemRegistryService backed by YAML blueprint files. Each blueprint declares its task type, system prompt, tool set, execution phases, HITL conditions, and parameters. The registry resolves which blueprint to load at runtime.
Add an initial set of blueprints covering coding (new_task, pr_iteration, pr_review) and knowledge (web_research, pubmed, document_draft, email_triage) task types.
New files: agent/blueprints/, agent/src/registry/
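A minimal sketch of how a registry might validate and resolve blueprints, assuming each YAML file has already been parsed into a dict (for example with a YAML loader). The Blueprint field names and the InMemoryRegistry class are hypothetical stand-ins for FilesystemRegistryService, whose real schema is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    # Field names are assumptions modelled on the declared blueprint contents.
    task_type: str
    task_mode: str
    system_prompt: str
    tools: tuple = ()
    phases: tuple = ()
    hitl_conditions: tuple = ()

    @classmethod
    def from_dict(cls, data: dict) -> "Blueprint":
        missing = {"task_type", "task_mode", "system_prompt"} - set(data)
        if missing:
            raise ValueError(f"blueprint missing fields: {sorted(missing)}")
        return cls(
            task_type=data["task_type"],
            task_mode=data["task_mode"],
            system_prompt=data["system_prompt"],
            tools=tuple(data.get("tools", ())),
            phases=tuple(data.get("phases", ())),
            hitl_conditions=tuple(data.get("hitl_conditions", ())),
        )


class InMemoryRegistry:
    """Resolves a blueprint by task type from already-parsed documents."""

    def __init__(self, documents):
        self._by_type = {}
        for doc in documents:
            blueprint = Blueprint.from_dict(doc)
            if blueprint.task_type in self._by_type:
                raise ValueError(f"duplicate blueprint: {blueprint.task_type}")
            self._by_type[blueprint.task_type] = blueprint

    def resolve(self, task_type: str) -> Blueprint:
        try:
            return self._by_type[task_type]
        except KeyError:
            raise LookupError(f"no blueprint for task type {task_type!r}") from None
```

Validating required fields at load time keeps a malformed blueprint from surfacing as a failure in the middle of a task.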
3. Mem0 long-term memory backend
Add a Mem0LTM backend that runs alongside the existing AgentCore Memory. Mem0 provides semantic search (via embeddings), contradiction detection before writes, and a memory lifecycle engine (decay + consolidation). AgentCore Memory remains authoritative for episodic task history; Mem0 stores tool knowledge, repo learnings, and cross-task semantic facts.
Deploy Mem0 + Qdrant as an ECS Fargate service (MemoBacked CDK construct) in the agent VPC, reachable at mem0.agent-services:8001 via Cloud Map.
New files: agent/src/backends/ltm/mem0.py, cdk/src/constructs/mem0-backend.ts, agent/mem0/
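The division of responsibility between the two memory backends could be expressed as a simple router. This is a sketch under assumptions: MemoryKind, MemoryRouter, and the in-memory lists standing in for AgentCore Memory and Mem0 are all hypothetical, and no real client API is used.

```python
from enum import Enum


class MemoryKind(Enum):
    EPISODIC_TASK_HISTORY = "episodic_task_history"
    TOOL_KNOWLEDGE = "tool_knowledge"
    REPO_LEARNING = "repo_learning"
    SEMANTIC_FACT = "semantic_fact"


class MemoryRouter:
    """Routes each write to the backend that is authoritative for its kind."""

    def __init__(self):
        # Plain lists stand in for the real AgentCore Memory and Mem0 clients.
        self.stores = {"agentcore": [], "mem0": []}

    def backend_for(self, kind: MemoryKind) -> str:
        # Episodic task history stays in AgentCore Memory; everything else goes to Mem0.
        if kind is MemoryKind.EPISODIC_TASK_HISTORY:
            return "agentcore"
        return "mem0"

    def write(self, kind: MemoryKind, text: str) -> str:
        backend = self.backend_for(kind)
        self.stores[backend].append((kind, text))
        return backend
```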
4. Blueprint phase tracking and PatternEvaluator (HITL)
Add BlueprintTracker to track which execution phase the agent is in and PatternEvaluator to evaluate HITL conditions between turns. When a pattern fires (e.g. conflicting evidence detected, low corpus quality, scope violation), the agent pauses at AWAITING_APPROVAL, writes a PendingApproval record, and waits for human input before resuming.
New files: agent/src/blueprint_tracker.py, agent/src/pattern_evaluator.py
New status: AWAITING_APPROVAL added to TaskStatusType with transitions RUNNING → AWAITING_APPROVAL → RUNNING
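The pause-and-resume behaviour can be sketched as a guarded status transition driven by pattern checks between turns. The Pattern and Task classes, the shape of the pending-approval record, and the turn_state keys are assumptions for illustration; only the status names and the RUNNING → AWAITING_APPROVAL → RUNNING transitions come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable

ALLOWED_TRANSITIONS = {
    ("RUNNING", "AWAITING_APPROVAL"),
    ("AWAITING_APPROVAL", "RUNNING"),
}


@dataclass
class Pattern:
    name: str
    fires: Callable[[dict], bool]  # predicate over the state observed after a turn


@dataclass
class Task:
    status: str = "RUNNING"
    pending_approvals: list = field(default_factory=list)

    def transition(self, new_status: str) -> None:
        if (self.status, new_status) not in ALLOWED_TRANSITIONS:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status


def evaluate_between_turns(task: Task, patterns: list, turn_state: dict) -> list:
    """Pause the task if any HITL pattern fires; return the names that fired."""
    fired = [p.name for p in patterns if p.fires(turn_state)]
    if fired and task.status == "RUNNING":
        task.pending_approvals.append({"patterns": fired})
        task.transition("AWAITING_APPROVAL")
    return fired


def approve(task: Task) -> None:
    """Resume a paused task after human approval."""
    task.transition("RUNNING")
```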
5. Risk-aware pre-flight pipeline
Add a 4-stage pre-flight Lambda (readiness check → context hydration → risk assessment → admission policy) invoked as a durable step in the orchestrator. Pre-flight writes pre_flight_decision (ADMIT | ADMIT_WITH_HITL | DEFER | REJECT) and risk_tier (LOW | MEDIUM | HIGH | CRITICAL) to TaskRecord before the agent session starts.
New files: agent/src/preflight/, cdk/src/constructs/preflight-lambda.ts
Orchestrator change: new invoke-preflight durable step between admission-control and hydrate-context
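One way to read the four stages is as a short function that ends in a (decision, risk tier) pair. The enum values below come from the proposal, but everything else is an assumption: the risk_signals input, the thresholds in assess_risk, the tier-to-decision policy table, and treating an unknown task type as DEFER.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    ADMIT = "ADMIT"
    ADMIT_WITH_HITL = "ADMIT_WITH_HITL"
    DEFER = "DEFER"
    REJECT = "REJECT"


class RiskTier(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


# Assumed admission policy: the real mapping is not specified in the proposal.
ADMISSION_POLICY = {
    RiskTier.LOW: Decision.ADMIT,
    RiskTier.MEDIUM: Decision.ADMIT_WITH_HITL,
    RiskTier.HIGH: Decision.ADMIT_WITH_HITL,
    RiskTier.CRITICAL: Decision.REJECT,
}


def assess_risk(risk_signals: int) -> RiskTier:
    """Map a count of hypothetical risk signals to a tier (thresholds are assumptions)."""
    if risk_signals >= 3:
        return RiskTier.CRITICAL
    if risk_signals == 2:
        return RiskTier.HIGH
    if risk_signals == 1:
        return RiskTier.MEDIUM
    return RiskTier.LOW


def run_preflight(task: dict, known_task_types: set) -> Tuple[Decision, Optional[RiskTier]]:
    # Stage 1: readiness check. Unknown task types are deferred in this sketch.
    if task["task_type"] not in known_task_types:
        return Decision.DEFER, None
    # Stage 2: context hydration, stubbed here as a copy of the task payload.
    hydrated = dict(task)
    # Stage 3: risk assessment.
    tier = assess_risk(hydrated.get("risk_signals", 0))
    # Stage 4: admission policy.
    return ADMISSION_POLICY[tier], tier
```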
6. SandboxManager + ECS sidecar for tool execution
Add SandboxManager to execute dynamically generated tool code in isolated ECS task containers (network-none, read-only filesystem, tmpfs /tmp). SecretManager scopes secrets per tool ID under /abca/tools/{tool_id}/. The ECS sidecar CDK construct provisions the execution environment in the agent VPC.
New files: agent/src/sandbox/, cdk/src/constructs/sandbox-sidecar.ts
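Per-tool secret scoping under /abca/tools/{tool_id}/ can be sketched as a prefix check. The helper names secret_prefix and can_read, and the allowed character set for tool IDs, are assumptions; the real SecretManager interface is not described in this issue.

```python
import re

# Assumed tool ID format: letters, digits, underscore, hyphen.
_TOOL_ID = re.compile(r"[A-Za-z0-9_-]+")


def secret_prefix(tool_id: str) -> str:
    """Return the secret namespace for a tool, rejecting IDs that could escape it."""
    if not _TOOL_ID.fullmatch(tool_id):
        raise ValueError(f"invalid tool id: {tool_id!r}")
    return f"/abca/tools/{tool_id}/"


def can_read(tool_id: str, secret_path: str) -> bool:
    """True only if the secret lives under the tool's own prefix."""
    if ".." in secret_path.split("/"):
        return False
    return secret_path.startswith(secret_prefix(tool_id))
```

The trailing slash in the prefix matters: without it, a tool named pubmed could read secrets belonging to pubmed_fetch.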
7. ToolBuilderAgent and BlueprintBuilderAgent (meta-agents)
Add two new task types:
generate_tool — ToolBuilderAgent searches CapabilityIndex for existing tools, generates new tool code if none found, tests it in the sandbox, and promotes it to the registry
generate_blueprint — BlueprintBuilderAgent generates a new YAML blueprint for an unknown task type and promotes it through the DRAFT → VALIDATED → PRODUCTION pipeline
These agents enable the platform to extend itself without a code deploy.
New files: agent/src/agents/tool_builder/, agent/src/agents/blueprint_builder/
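The DRAFT → VALIDATED → PRODUCTION pipeline can be modelled as a one-step-at-a-time promotion gated on checks. The promote function and the single checks_passed flag are simplifying assumptions; the actual validation criteria for each stage are not specified here.

```python
STAGES = ("DRAFT", "VALIDATED", "PRODUCTION")


def promote(stage: str, checks_passed: bool) -> str:
    """Advance a blueprint by at most one stage, and only if its checks passed."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if not checks_passed or stage == STAGES[-1]:
        return stage  # stay put on failure, or when already in PRODUCTION
    return STAGES[STAGES.index(stage) + 1]
```

Promoting one stage per call means a generated blueprint cannot jump from DRAFT to PRODUCTION without passing through validation.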
8. CapabilityIndex — semantic tool search
Add CapabilityIndex backed by Mem0LTM. When ToolBuilderAgent is asked to find or generate a tool, it first searches the index semantically before generating new code. Registered tools are stored with embeddings so future searches find them by intent, not just name.
New files: agent/src/registry/capability_index.py
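Searching by intent rather than by name amounts to nearest-neighbour lookup over embeddings. The toy below swaps in a bag-of-words vector and cosine similarity purely for illustration; a real deployment would use Mem0's embedding-backed search, and ToyCapabilityIndex, embed, and the min_score threshold are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyCapabilityIndex:
    def __init__(self):
        self._entries = []  # list of (tool name, description vector)

    def register(self, name: str, description: str) -> None:
        self._entries.append((name, embed(description)))

    def search(self, query: str, min_score: float = 0.3) -> list:
        """Return tool names whose descriptions are similar enough, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), name) for name, vec in self._entries]
        hits = [(score, name) for score, name in scored if score >= min_score]
        return [name for score, name in sorted(hits, reverse=True)]
```

An empty result is the signal for the tool builder to generate a new tool instead of reusing an existing one.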
9. Trust & Graduation
Add TrustEventsTable (DynamoDB) and a TrustEmitter that records typed signals (TOOL_SUCCESS, TOOL_FAILURE, SCOPE_VIOLATION, HITL_TRIGGERED, TASK_COMPLETE, TASK_FAILED, etc.) on every significant agent event. AutonomyGraduationEngine accumulates net points per agent and promotes the autonomy level (restricted → supervised → autonomous) when thresholds are met, reducing HITL gate frequency for agents with strong track records.
New files: agent/src/trust/, cdk/src/constructs/trust-events-table.ts
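Accumulating net points and promoting at thresholds could look like the following sketch. The signal names and autonomy levels come from the proposal; the point values, the thresholds, and the choice never to demote an agent are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed point values per signal.
POINTS = {
    "TOOL_SUCCESS": 1,
    "TASK_COMPLETE": 5,
    "HITL_TRIGGERED": 0,
    "TOOL_FAILURE": -1,
    "TASK_FAILED": -5,
    "SCOPE_VIOLATION": -10,
}

LEVELS = ["restricted", "supervised", "autonomous"]
# Assumed net-point thresholds for reaching each level.
THRESHOLDS = {"supervised": 20, "autonomous": 100}


class GraduationSketch:
    def __init__(self):
        self.points = defaultdict(int)
        self.level = defaultdict(lambda: "restricted")

    def record(self, agent_id: str, signal: str) -> str:
        """Apply one trust signal and return the agent's autonomy level afterwards."""
        self.points[agent_id] += POINTS[signal]
        current = LEVELS.index(self.level[agent_id])
        for candidate in LEVELS[current + 1:]:
            if self.points[agent_id] >= THRESHOLDS[candidate]:
                self.level[agent_id] = candidate  # promotion only; no demotion here
        return self.level[agent_id]
```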
10. HITL approve/reject API
Add POST /v1/tasks/{id}/approve and POST /v1/tasks/{id}/reject Lambda handlers backed by ApprovalsTable (DynamoDB). Add bgagent approve <task-id> [comment] and bgagent reject <task-id> [reason] CLI commands. The orchestrator poll loop does not count AWAITING_APPROVAL cycles against MAX_POLL_ATTEMPTS.
New files: cdk/src/handlers/approve-task.ts, cdk/src/handlers/reject-task.ts
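The poll-loop rule, where cycles spent in AWAITING_APPROVAL do not consume the attempt budget, can be sketched as below. The poll_until_done function, the get_status callback, the COMPLETED and TIMED_OUT names, and the omission of any sleep between polls are assumptions; only AWAITING_APPROVAL, FAILED, and MAX_POLL_ATTEMPTS appear in the proposal.

```python
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}  # COMPLETED is an assumed status name


def poll_until_done(get_status, max_poll_attempts: int) -> str:
    """Poll a task until it finishes, not charging cycles spent awaiting approval."""
    attempts = 0
    while attempts < max_poll_attempts:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "AWAITING_APPROVAL":
            attempts += 1  # only active (non-paused) cycles count against the budget
    return "TIMED_OUT"
```

Under this rule a task that waits a long time for a human reviewer still has its full budget of active polls when it resumes.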
Out of scope for this issue
DEFER sub-workflow — unknown task type currently returns FAILED; auto-spawning a BlueprintBuilderAgent child and resuming the original task requires a new durable sub-workflow
Per-blueprint HITL timeout (hitl_max_wait_sec) — orchestrator currently uses the global 8.5h poll limit
Acceptance criteria
bgagent submit --task-mode knowledge --task "summarise recent papers on RAG" completes without a repo argument
[…] BlueprintBuilderAgent (DEFER path), not a hard failure
[…] AWAITING_APPROVAL; bgagent approve <id> resumes it to completion
[…] memory_context from Mem0 in the trace
generate_tool task produces a tool registered in the registry and discoverable via search_capability_index
TrustEventsTable records a TASK_COMPLETE event after every successful task
new_task, pr_iteration, pr_review task types continue to pass end-to-end
References
~/merge-streams/compare.md
~/merge-streams/plan.md
merge/akw-integration
Use case
Today, ABCA can autonomously write code, open pull requests, and iterate on them — but only if the work involves a GitHub repository. Any task that doesn't fit that mold (researching a topic, drafting a document, triaging emails, analyzing data) simply can't run on the platform. This feature removes that constraint: by integrating AKW's blueprint-driven agent loop and knowledge-work task types, the same cloud infrastructure that runs your coding agents can now run research agents, document agents, or any other autonomous workflow you can describe in a YAML blueprint — with the same security model, the same memory system, the same human-in-the-loop controls, and the same auditability. A team could submit a coding task and a literature review in the same CLI session, watch both run in parallel in isolated compute environments, approve or steer them mid-execution, and have the results accumulate in a shared long-term memory that makes every subsequent task smarter.