Feature: Integrate AKW multi-agent capabilities into ABCA #99

@harmjeff

Component

None

Describe the feature

Feature: Integrate AKW multi-agent capabilities into ABCA

Summary

Extend ABCA from a single-purpose coding agent platform into a general-purpose autonomous agent platform by integrating the AKW (Autonomous Knowledge Work) subsystems. The result is a single AWS-hosted platform that runs both coding tasks (git, PRs, build/lint) and knowledge-work tasks (research, document generation, email triage, etc.) on shared infrastructure, with a shared memory system, trust model, and blueprint registry.

Motivation

ABCA today is hardwired for coding tasks: every task requires a GitHub repo, clones it, runs an agent against it, and opens a PR. This makes the platform unusable for tasks that have no repo (research, document drafting, data analysis). AKW solves exactly this problem with a blueprint-driven, task-mode-agnostic agent loop, but lacks ABCA's production-grade AWS infrastructure (durable orchestration, AgentCore compute isolation, Cognito auth, Cedar policy enforcement).

Merging the two gives each what it lacks:

  • AKW gets cloud-scale durable execution, AgentCore MicroVM isolation, and Bedrock Guardrail screening
  • ABCA gets blueprint-driven extensibility, semantic long-term memory, risk-aware admission, in-execution human-in-the-loop (HITL), self-extending tool generation, and support for non-coding task domains

Proposed changes

1. task_mode field — decouple coding from knowledge tasks

Add a task_mode: 'coding' | 'knowledge' field to blueprints and TaskRecord. Coding tasks follow the existing path (repo clone, GitHub context hydration, build/lint post-hooks). Knowledge tasks skip all git scaffolding and run the agent directly against instructions + memory context.

Touches: cdk/src/handlers/shared/types.ts, agent/src/config.py, agent/src/pipeline.py, cdk/src/constructs/blueprint.ts
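
A minimal sketch of how the mode switch might gate the pipeline. The `TaskRecord` shape is simplified and the step names (`clone_repo`, `load_memory_context`, and so on) are hypothetical labels, not the actual functions in agent/src/pipeline.py:

```python
from dataclasses import dataclass
from typing import Literal, Optional

TaskMode = Literal["coding", "knowledge"]


@dataclass
class TaskRecord:
    # Simplified, assumed shape; the real record carries more fields.
    task_id: str
    instructions: str
    task_mode: TaskMode = "coding"
    repo: Optional[str] = None


def plan_steps(task: TaskRecord) -> list[str]:
    """Return the ordered pipeline steps for a task (step names are illustrative)."""
    if task.task_mode == "coding":
        if not task.repo:
            raise ValueError("coding tasks require a repo")
        return ["clone_repo", "hydrate_github_context", "run_agent", "build_lint_post_hooks"]
    if task.task_mode == "knowledge":
        # Knowledge tasks skip all git scaffolding.
        return ["load_memory_context", "run_agent"]
    raise ValueError(f"unknown task_mode: {task.task_mode!r}")
```

Defaulting `task_mode` to `'coding'` in this sketch keeps existing submissions on the current path without requiring callers to change.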


2. Blueprint registry

Replace hardcoded system prompts with a FilesystemRegistryService backed by YAML blueprint files. Each blueprint declares its task type, system prompt, tool set, execution phases, HITL conditions, and parameters. The registry resolves which blueprint to load at runtime.

Add an initial set of blueprints covering coding (new_task, pr_iteration, pr_review) and knowledge (web_research, pubmed, document_draft, email_triage) task types.

New files: agent/blueprints/, agent/src/registry/
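
To illustrate runtime resolution, here is a sketch with an in-memory registry. `InMemoryRegistry` is a hypothetical stand-in for `FilesystemRegistryService`, the `Blueprint` fields are an assumed subset of what the YAML files would declare, and the prompts and tool names are placeholders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    # Assumed subset of the declared fields; HITL conditions and parameters omitted.
    task_type: str
    task_mode: str
    system_prompt: str
    tools: tuple = ()
    phases: tuple = ()


class InMemoryRegistry:
    """Hypothetical stand-in for FilesystemRegistryService (no YAML loading here)."""

    def __init__(self, blueprints):
        self._by_type = {bp.task_type: bp for bp in blueprints}

    def resolve(self, task_type: str) -> Blueprint:
        try:
            return self._by_type[task_type]
        except KeyError:
            raise LookupError(f"no blueprint registered for task type {task_type!r}") from None


registry = InMemoryRegistry([
    Blueprint("new_task", "coding", "You are a coding agent.", tools=("git",)),
    Blueprint("web_research", "knowledge", "You are a research agent.",
              tools=("web_search",), phases=("gather", "synthesise")),
])
```

A filesystem-backed version would build the same mapping by parsing each YAML file under agent/blueprints/ at startup.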


3. Mem0 long-term memory backend

Add a Mem0LTM backend that runs alongside the existing AgentCore Memory. Mem0 provides semantic search (via embeddings), contradiction detection before writes, and a memory lifecycle engine (decay + consolidation). AgentCore Memory remains authoritative for episodic task history; Mem0 stores tool knowledge, repo learnings, and cross-task semantic facts.

Deploy Mem0 + Qdrant as an ECS Fargate service (Mem0Backend CDK construct) in the agent VPC, reachable at mem0.agent-services:8001 via Cloud Map.

New files: agent/src/backends/ltm/mem0.py, cdk/src/constructs/mem0-backend.ts, agent/mem0/
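
The division of responsibility between the two memory backends could be expressed as a small routing function. This is only a sketch: the `MemoryKind` names and the backend labels are assumptions made for illustration, and no Mem0 or AgentCore client calls are shown.

```python
from enum import Enum


class MemoryKind(Enum):
    # Hypothetical categories mirroring the split described above.
    EPISODIC = "episodic"
    TOOL_KNOWLEDGE = "tool_knowledge"
    REPO_LEARNING = "repo_learning"
    SEMANTIC_FACT = "semantic_fact"


def backend_for(kind: MemoryKind) -> str:
    """Pick the backend that is authoritative for a given kind of memory."""
    # AgentCore Memory stays authoritative for episodic task history;
    # everything else goes to Mem0.
    return "agentcore" if kind is MemoryKind.EPISODIC else "mem0"
```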


4. Blueprint phase tracking and PatternEvaluator (HITL)

Add BlueprintTracker to track which execution phase the agent is in and PatternEvaluator to evaluate HITL conditions between turns. When a pattern fires (e.g. conflicting evidence detected, low corpus quality, scope violation), the agent pauses at AWAITING_APPROVAL, writes a PendingApproval record, and waits for human input before resuming.

New files: agent/src/blueprint_tracker.py, agent/src/pattern_evaluator.py
New status: AWAITING_APPROVAL added to TaskStatusType with transitions RUNNING → AWAITING_APPROVAL → RUNNING
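
A sketch of the pause transition, under stated assumptions: only the two transitions named above are modelled, and the condition names, signal keys, and thresholds are invented for illustration rather than taken from the real `PatternEvaluator`.

```python
# Only the transitions introduced by this change; other statuses are omitted.
ALLOWED = {
    "RUNNING": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"RUNNING"},
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def evaluate_patterns(signals: dict, conditions: dict) -> list:
    """Return the names of HITL conditions whose predicate fires on the signals."""
    return [name for name, predicate in conditions.items() if predicate(signals)]


# Hypothetical conditions; a blueprint would declare its own.
conditions = {
    "conflicting_evidence": lambda s: s.get("conflicts", 0) > 0,
    "low_corpus_quality": lambda s: s.get("corpus_quality", 1.0) < 0.5,
}

status = "RUNNING"
fired = evaluate_patterns({"conflicts": 2, "corpus_quality": 0.9}, conditions)
if fired:
    # The real agent would also write a PendingApproval record at this point.
    status = transition(status, "AWAITING_APPROVAL")
```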


5. Risk-aware pre-flight pipeline

Add a 4-stage pre-flight Lambda (readiness check → context hydration → risk assessment → admission policy) invoked as a durable step in the orchestrator. Pre-flight writes pre_flight_decision (ADMIT | ADMIT_WITH_HITL | DEFER | REJECT) and risk_tier (LOW | MEDIUM | HIGH | CRITICAL) to TaskRecord before the agent session starts.

New files: agent/src/preflight/, cdk/src/constructs/preflight-lambda.ts
Orchestrator change: new invoke-preflight durable step between admission-control and hydrate-context
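
The final admission stage might look like the sketch below. The enum values come from this proposal, but the mapping from risk tier to decision is an assumption made for illustration; the actual policy is not specified here.

```python
from enum import Enum


class RiskTier(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


class Decision(Enum):
    ADMIT = "ADMIT"
    ADMIT_WITH_HITL = "ADMIT_WITH_HITL"
    DEFER = "DEFER"
    REJECT = "REJECT"


def admission_policy(blueprint_known: bool, ready: bool, tier: RiskTier) -> Decision:
    """Map the outputs of the earlier stages to a pre_flight_decision (assumed policy)."""
    if not blueprint_known:
        return Decision.DEFER            # no blueprint yet for this task type
    if not ready or tier is RiskTier.CRITICAL:
        return Decision.REJECT
    if tier is RiskTier.HIGH:
        return Decision.ADMIT_WITH_HITL  # run, but gate on human approval
    return Decision.ADMIT
```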


6. SandboxManager + ECS sidecar for tool execution

Add SandboxManager to execute dynamically generated tool code in isolated ECS task containers (network-none, read-only filesystem, tmpfs /tmp). SecretManager scopes secrets per tool ID under /abca/tools/{tool_id}/. The ECS sidecar CDK construct provisions the execution environment in the agent VPC.

New files: agent/src/sandbox/, cdk/src/constructs/sandbox-sidecar.ts
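
Per-tool secret scoping could be enforced with a prefix check along these lines. The helper names and the allowed tool ID character set are assumptions; only the `/abca/tools/{tool_id}/` prefix comes from the proposal.

```python
import posixpath
import re

# Assumed tool ID format; restricting characters blocks path tricks in the ID itself.
TOOL_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def secret_prefix(tool_id: str) -> str:
    """Return the secret namespace for a tool, rejecting malformed IDs."""
    if not TOOL_ID_RE.match(tool_id):
        raise ValueError(f"invalid tool id: {tool_id!r}")
    return f"/abca/tools/{tool_id}/"


def can_read(tool_id: str, secret_path: str) -> bool:
    """True only if the normalised path stays inside the tool's own namespace."""
    normalized = posixpath.normpath(secret_path)  # collapses '..' segments
    return normalized.startswith(secret_prefix(tool_id))
```

Normalising before the prefix check matters: a request for `/abca/tools/a/../b/key` would otherwise pass a naive `startswith` test for tool `a`.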


7. ToolBuilderAgent and BlueprintBuilderAgent (meta-agents)

Add two new task types:

  • generate_tool: ToolBuilderAgent searches CapabilityIndex for existing tools, generates new tool code if none is found, tests it in the sandbox, and promotes it to the registry
  • generate_blueprint: BlueprintBuilderAgent generates a new YAML blueprint for an unknown task type and promotes it through the DRAFT → VALIDATED → PRODUCTION pipeline

These agents enable the platform to extend itself without a code deploy.

New files: agent/src/agents/tool_builder/, agent/src/agents/blueprint_builder/
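
The promotion pipeline for generated blueprints can be sketched as a one-step-at-a-time state advance. The stage names are from this proposal; the `promote` helper and its `checks_passed` flag are hypothetical.

```python
PIPELINE = ("DRAFT", "VALIDATED", "PRODUCTION")


def promote(stage: str, checks_passed: bool) -> str:
    """Advance one stage when checks pass; never skip a stage or go past PRODUCTION."""
    if stage not in PIPELINE:
        raise ValueError(f"unknown stage: {stage!r}")
    if not checks_passed:
        return stage  # stay put until validation succeeds
    next_index = min(PIPELINE.index(stage) + 1, len(PIPELINE) - 1)
    return PIPELINE[next_index]
```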


8. CapabilityIndex — semantic tool search

Add CapabilityIndex backed by Mem0LTM. When ToolBuilderAgent is asked to find or generate a tool, it first searches the index semantically before generating new code. Registered tools are stored with embeddings so future searches find them by intent, not just name.

New files: agent/src/registry/capability_index.py
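
The search-before-generate flow is sketched below with a toy index. Bag-of-words cosine similarity stands in for real embeddings, and `ToyCapabilityIndex`, the tool names, and the `min_score` threshold are all assumptions for illustration; the real index would delegate to Mem0LTM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyCapabilityIndex:
    def __init__(self):
        self._tools = {}

    def register(self, name: str, description: str) -> None:
        self._tools[name] = embed(description)

    def search(self, query: str, min_score: float = 0.3) -> list:
        """Return tool names ranked by similarity to the query, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), name) for name, vec in self._tools.items()]
        return [name for score, name in sorted(scored, reverse=True) if score >= min_score]


index = ToyCapabilityIndex()
index.register("pubmed_search", "search pubmed for biomedical paper abstracts")
index.register("send_email", "send an email message to a recipient")
```

An empty result from `search` is the signal to fall through to code generation; a non-empty result lets the agent reuse an existing tool instead.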


9. Trust & Graduation

Add TrustEventsTable (DynamoDB) and a TrustEmitter that records typed signals (TOOL_SUCCESS, TOOL_FAILURE, SCOPE_VIOLATION, HITL_TRIGGERED, TASK_COMPLETE, TASK_FAILED, etc.) on every significant agent event. AutonomyGraduationEngine accumulates net points per agent and promotes the autonomy level (restricted → supervised → autonomous) when thresholds are met, reducing HITL gate frequency for agents with strong track records.

New files: agent/src/trust/, cdk/src/constructs/trust-events-table.ts
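
A sketch of point accumulation and level selection. The signal names and autonomy levels are from this proposal, but the point values and thresholds are invented for illustration. This toy version recomputes the level from net points, so negative signals can also demote an agent, which is a simplification not stated above.

```python
from collections import defaultdict

# Assumed point values per signal.
POINTS = {
    "TOOL_SUCCESS": 1,
    "TASK_COMPLETE": 5,
    "TOOL_FAILURE": -1,
    "HITL_TRIGGERED": -1,
    "TASK_FAILED": -5,
    "SCOPE_VIOLATION": -10,
}

# Assumed thresholds, checked from the highest level down.
THRESHOLDS = (("autonomous", 50), ("supervised", 10))


class ToyGraduationEngine:
    def __init__(self):
        self._net = defaultdict(int)

    def record(self, agent_id: str, signal: str) -> None:
        self._net[agent_id] += POINTS[signal]

    def level(self, agent_id: str) -> str:
        net = self._net[agent_id]
        for name, minimum in THRESHOLDS:
            if net >= minimum:
                return name
        return "restricted"
```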


10. HITL approve/reject API

Add POST /v1/tasks/{id}/approve and POST /v1/tasks/{id}/reject Lambda handlers backed by ApprovalsTable (DynamoDB). Add bgagent approve <task-id> [comment] and bgagent reject <task-id> [reason] CLI commands. The orchestrator poll loop does not count AWAITING_APPROVAL cycles against MAX_POLL_ATTEMPTS.

New files: cdk/src/handlers/approve-task.ts, cdk/src/handlers/reject-task.ts
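
The poll-budget rule can be sketched as follows. The function replays a pre-recorded list of observed statuses rather than calling a real API, and the terminal status names `COMPLETED` and `TIMED_OUT` are assumptions.

```python
def poll(statuses, max_poll_attempts):
    """Replay observed statuses; return (final_status, attempts_counted)."""
    attempts = 0
    for status in statuses:
        if status == "AWAITING_APPROVAL":
            # Waiting on a human does not consume the poll budget.
            continue
        if status in ("COMPLETED", "FAILED"):
            return status, attempts
        attempts += 1
        if attempts >= max_poll_attempts:
            return "TIMED_OUT", attempts
    return "RUNNING", attempts
```

With this rule, a task that spends most of its lifetime paused for approval is not timed out merely because a reviewer was slow to respond.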


Out of scope for this issue

  • DEFER sub-workflow — unknown task type currently returns FAILED; auto-spawning a BlueprintBuilderAgent child and resuming the original task requires a new durable sub-workflow
  • Per-blueprint HITL timeout (hitl_max_wait_sec) — orchestrator currently uses the global 8.5h poll limit
  • TUI wiring for PR #54 ([RFC] feat(cli): Interactive TUI prototype): the interactive CLI TUI exists with mock data; real endpoint wiring is a separate task
  • LangGraph / LangSmith integration — deferred to a later phase

Acceptance criteria

  • bgagent submit --task-mode knowledge --task "summarise recent papers on RAG" completes without a repo argument
  • Submitting a task with an unregistered task type triggers BlueprintBuilderAgent (DEFER path), not a hard failure
  • A task that hits a HITL condition transitions to AWAITING_APPROVAL; bgagent approve <id> resumes it to completion
  • A second run of the same knowledge task shows non-empty memory_context from Mem0 in the trace
  • generate_tool task produces a tool registered in the registry and discoverable via search_capability_index
  • TrustEventsTable records a TASK_COMPLETE event after every successful task
  • All existing new_task, pr_iteration, pr_review task types continue to pass end-to-end

References

  • Detailed source-level comparison: ~/merge-streams/compare.md
  • Phase-by-phase plan and deployed environment details: ~/merge-streams/plan.md
  • Implementation branch: merge/akw-integration

Use case

Today, ABCA can autonomously write code, open pull requests, and iterate on them — but only if the work involves a GitHub repository. Any task that doesn't fit that mold (researching a topic, drafting a document, triaging emails, analyzing data) simply can't run on the platform. This feature removes that constraint: by integrating AKW's blueprint-driven agent loop and knowledge-work task types, the same cloud infrastructure that runs your coding agents can now run research agents, document agents, or any other autonomous workflow you can describe in a YAML blueprint — with the same security model, the same memory system, the same human-in-the-loop controls, and the same auditability. A team could submit a coding task and a literature review in the same CLI session, watch both run in parallel in isolated compute environments, approve or steer them mid-execution, and have the results accumulate in a shared long-term memory that makes every subsequent task smarter.

Proposed solution

No response

Other information

No response

Acknowledgements

  • I may be able to implement this feature
  • This might be a breaking change
