DonkeyWork CodeSandbox

A unified monorepo containing both the Manager API (Kata container orchestration) and the Executor API (sandboxed code execution with Python, Node.js, and Bash support).

Components

  • Manager API (src/DonkeyWork.CodeSandbox.Manager): REST API service for managing Kata containers in a Kubernetes cluster
  • Executor API (src/DonkeyWork.CodeSandbox.Server): HTTP+SSE server for executing commands inside sandboxed containers
  • Shared Contracts (src/DonkeyWork.CodeSandbox.Contracts): Common models and contracts
  • Client Library (src/DonkeyWork.CodeSandbox.Client): .NET client for consuming the Executor API

Features

  • Create Kata Containers: Dynamically create VM-isolated containers with custom configuration
  • Warm Pool Management: Maintains a pool of pre-warmed containers for instant allocation
  • High Availability: Stateless managers with Kubernetes-native leader election
  • Automatic Cleanup: Idle and expired containers are automatically cleaned up
  • Container Limits: Configurable maximum container limit prevents resource exhaustion
  • List Containers: Retrieve all Kata containers with their status and metadata
  • Get Container Details: Fetch detailed information about a specific container
  • Delete Containers: Remove containers and terminate their associated VMs
  • Health Checks: Built-in health check endpoint for monitoring
  • OpenAPI Documentation: Interactive API documentation via Scalar

High Availability Architecture

The CodeSandbox Manager is designed for high availability with no external dependencies beyond Kubernetes itself:

  • Stateless Managers: Multiple manager instances can run simultaneously with no shared state
  • Kubernetes as Source of Truth: All container state (timestamps, allocation status) is stored as Kubernetes annotations and labels
  • Leader Election: Uses Kubernetes Lease objects for leader election; only the leader performs pool backfill operations
  • Optimistic Locking: Container allocation uses the Kubernetes resourceVersion for conflict detection and automatic retry (see the sketch after this list)
  • Distributed Monitoring: All instances monitor and clean up containers independently
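
A minimal sketch of the optimistic allocation pattern referenced above, using the official Kubernetes C# client. The annotation key "donkeywork/allocated" and the method shape are illustrative assumptions, not the service's actual scheme:

using System.Net;
using k8s;
using k8s.Autorest;

var client = new Kubernetes(KubernetesClientConfiguration.InClusterConfig());
var claimed = await TryAllocateAsync(client, "kata-sandbox-a1b2c3d4", "sandbox-containers");
Console.WriteLine(claimed ? "allocated" : "lost the race; try another warm pod");

// Claim a warm pod by writing an allocation annotation. The key
// "donkeywork/allocated" is hypothetical; the real service defines its own scheme.
async Task<bool> TryAllocateAsync(IKubernetes api, string podName, string ns)
{
    var pod = await api.CoreV1.ReadNamespacedPodAsync(podName, ns);
    pod.Metadata.Annotations ??= new Dictionary<string, string>();
    if (pod.Metadata.Annotations.ContainsKey("donkeywork/allocated"))
        return false; // already claimed by another manager

    pod.Metadata.Annotations["donkeywork/allocated"] = DateTimeOffset.UtcNow.ToString("O");
    try
    {
        // The replace carries the resourceVersion we read; the API server
        // rejects it with 409 Conflict if another manager wrote in between.
        await api.CoreV1.ReplaceNamespacedPodAsync(pod, podName, ns);
        return true;
    }
    catch (HttpOperationException e) when (e.Response.StatusCode == HttpStatusCode.Conflict)
    {
        return false; // conflict detected; caller retries
    }
}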

Leader Election

flowchart TB
    subgraph Managers["Manager Instances"]
        M1["Manager 1<br/>(Leader)"]
        M2["Manager 2<br/>(Follower)"]
        M3["Manager 3<br/>(Follower)"]
    end

    subgraph K8s["Kubernetes"]
        Lease["Lease Object<br/>pool-backfill-leader"]
        Pods["Pod Resources<br/>(annotations/labels)"]
    end

    M1 -->|"Renews lease"| Lease
    M1 -->|"Backfills pool"| Pods
    M2 -.->|"Monitors lease"| Lease
    M3 -.->|"Monitors lease"| Lease
    M2 -->|"Allocates/Cleanup"| Pods
    M3 -->|"Allocates/Cleanup"| Pods

Key behaviors:

  • Only the leader creates new warm pool containers (prevents duplicate creation)
  • All managers can allocate containers and handle cleanup (distributed workload)
  • If the leader fails, another manager automatically takes over within the 15-second lease window (see the sketch below)
  • State survives manager restarts because Kubernetes annotations are the source of truth
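
The official C# client ships Lease-based election helpers in k8s.LeaderElection. A minimal sketch of how the pool-backfill lease could be wired, assuming the lease name from the diagram and the 15-second lease duration; this is an illustration, not the repository's actual wiring:

using k8s;
using k8s.LeaderElection;
using k8s.LeaderElection.ResourceLock;

var client = new Kubernetes(KubernetesClientConfiguration.InClusterConfig());

// One Lease object ("pool-backfill-leader") arbitrates which instance backfills;
// the last argument identifies this manager instance.
var leaseLock = new LeaseLock(client, "sandbox-containers", "pool-backfill-leader",
    Environment.MachineName);

var config = new LeaderElectionConfig(leaseLock)
{
    LeaseDuration = TimeSpan.FromSeconds(15), // matches LeaderLeaseDurationSeconds
    RenewDeadline = TimeSpan.FromSeconds(10),
    RetryPeriod = TimeSpan.FromSeconds(2),
};

var elector = new LeaderElector(config);
elector.OnStartedLeading += () => Console.WriteLine("acquired lease: backfilling warm pool");
elector.OnStoppedLeading += () => Console.WriteLine("lost lease: backfill paused");

await elector.RunAsync(); // campaign; followers block here until the lease frees up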

Architecture

  • Framework: ASP.NET Core 10.0 (Minimal APIs)
  • Kubernetes Client: Official Kubernetes C# client (v18.0.13)
  • Logging: Serilog with structured logging
  • Configuration: IOptions with data validation
  • Container Runtime: Kata Containers (kata-qemu)

System Overview

flowchart TB
    subgraph Client["Client Application"]
        CA[API Consumer]
    end

    subgraph Manager["Manager API :8668"]
        ME["/api/kata endpoints"]
        KCS[KataContainerService]
    end

    subgraph K8s["Kubernetes Cluster"]
        API[Kubernetes API Server]
        subgraph NS["sandbox-containers namespace"]
            subgraph KP1["Kata Pod 1"]
                VM1["Kata VM"]
                EX1["Executor API :8666"]
            end
            subgraph KP2["Kata Pod 2"]
                VM2["Kata VM"]
                EX2["Executor API :8666"]
            end
        end
    end

    CA -->|"REST/SSE"| ME
    ME --> KCS
    KCS -->|"Create/List/Delete Pods"| API
    API --> KP1
    API --> KP2
    KCS -->|"Execute Commands"| EX1
    KCS -->|"Execute Commands"| EX2

Container Creation Flow

sequenceDiagram
    participant Client
    participant Manager as Manager API
    participant K8s as Kubernetes API
    participant Kata as Kata Pod
    participant Executor as Executor API

    Client->>Manager: POST /api/kata
    Manager->>Manager: Generate unique pod name
    Manager->>K8s: Create Pod (kata-qemu runtime)
    K8s-->>Manager: Pod created (Pending)
    Manager-->>Client: SSE: created event

    loop Wait for Pod Ready
        Manager->>K8s: Get Pod status
        K8s-->>Manager: Pod status
        Manager-->>Client: SSE: waiting event
    end

    K8s->>Kata: Start Kata VM
    Kata->>Kata: Boot VM + Start Executor API

    loop Health Check Executor API
        Manager->>Executor: GET /healthz
        alt Healthy
            Executor-->>Manager: 200 OK
            Manager-->>Client: SSE: healthcheck (healthy)
        else Not Ready
            Executor-->>Manager: Connection refused / timeout
            Manager-->>Client: SSE: healthcheck (unhealthy)
            Manager-->>Client: SSE: waiting event
        end
    end

    Manager-->>Client: SSE: ready event
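
A condensed sketch of the readiness-polling loop above, assuming the executor port (8666) and /healthz path shown in the diagrams; the real service additionally emits the SSE waiting/healthcheck events:

using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };

var ready = await WaitForExecutorAsync("10.42.1.15", TimeSpan.FromSeconds(90)); // PodReadyTimeoutSeconds
Console.WriteLine(ready ? "ready" : "timed out");

async Task<bool> WaitForExecutorAsync(string podIp, TimeSpan timeout)
{
    var deadline = DateTimeOffset.UtcNow + timeout;
    while (DateTimeOffset.UtcNow < deadline)
    {
        try
        {
            // The Executor API listens on :8666 inside the Kata pod.
            var res = await http.GetAsync($"http://{podIp}:8666/healthz");
            if (res.IsSuccessStatusCode) return true; // emit the "ready" SSE event
        }
        catch (HttpRequestException) { /* connection refused: VM still booting */ }
        catch (TaskCanceledException) { /* request timed out */ }

        await Task.Delay(TimeSpan.FromSeconds(1)); // emit a "waiting" SSE event
    }
    return false; // PodReadyTimeoutSeconds exceeded
}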

Command Execution Flow

sequenceDiagram
    participant Client
    participant Manager as Manager API
    participant Executor as Executor API (in Kata Pod)
    participant Process as Bash Process

    Client->>Manager: POST /api/kata/{sandboxId}/execute
    Manager->>Manager: Lookup Pod IP
    Manager->>Executor: POST /api/execute (SSE)

    Executor->>Process: Spawn /bin/bash -c "command"

    loop Stream Output
        Process-->>Executor: stdout/stderr
        Executor-->>Manager: SSE: OutputEvent
        Manager-->>Client: SSE: OutputEvent
    end

    Process-->>Executor: Exit code
    Executor-->>Manager: SSE: CompletedEvent
    Manager-->>Client: SSE: CompletedEvent
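
A condensed sketch of the executor-side streaming pattern, not the actual ManagedProcess implementation: spawn /bin/bash -c and forward stdout/stderr lines as they arrive. The command string is illustrative; the real executor receives it from the ExecuteCommand request and wraps the output in SSE frames:

using System.Diagnostics;

var psi = new ProcessStartInfo("/bin/bash", "-c \"echo hello; sleep 1; echo done\"")
{
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false,
};

using var process = Process.Start(psi)!;
process.OutputDataReceived += (_, e) => { if (e.Data is not null) Console.WriteLine($"stdout: {e.Data}"); }; // -> OutputEvent
process.ErrorDataReceived += (_, e) => { if (e.Data is not null) Console.WriteLine($"stderr: {e.Data}"); };  // -> OutputEvent
process.BeginOutputReadLine();
process.BeginErrorReadLine();

await process.WaitForExitAsync();
Console.WriteLine($"exit code: {process.ExitCode}"); // -> CompletedEvent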

Component Interaction

flowchart LR
    subgraph Manager["Manager API"]
        direction TB
        EP[Endpoints]
        SVC[KataContainerService]
        CFG[Configuration]
    end

    subgraph Executor["Executor API"]
        direction TB
        CTRL[ExecutionController]
        MP[ManagedProcess]
    end

    subgraph Shared["Contracts"]
        EE[ExecutionEvent]
        OE[OutputEvent]
        CE[CompletedEvent]
    end

    EP --> SVC
    SVC --> CFG
    SVC -.->|HTTP/SSE| CTRL
    CTRL --> MP
    MP --> OE
    MP --> CE
    OE --> EE
    CE --> EE

Prerequisites

  1. Kubernetes Cluster: k3s v1.33.5+ with Kata Containers enabled
  2. Namespace: sandbox-containers namespace must exist
  3. RBAC: ServiceAccount with appropriate permissions (see k8s/ folder)
  4. Runtime Class: kata-qemu RuntimeClass configured in the cluster

Configuration

The service is configured via appsettings.json:

{
  "KataContainerManager": {
    "TargetNamespace": "sandbox-containers",
    "RuntimeClassName": "kata-qemu",
    "DefaultResourceRequests": {
      "MemoryMi": 128,
      "CpuMillicores": 250
    },
    "DefaultResourceLimits": {
      "MemoryMi": 512,
      "CpuMillicores": 1000
    },
    "PodNamePrefix": "kata-sandbox",
    "CleanupCompletedPods": true,
    "PodReadyTimeoutSeconds": 90,
    "IdleTimeoutMinutes": 5,
    "MaxContainerLifetimeMinutes": 15,
    "MaxTotalContainers": 50,
    "WarmPoolSize": 10,
    "PoolBackfillCheckIntervalSeconds": 30,
    "LeaderLeaseDurationSeconds": 15
  }
}
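
A sketch of how such a section is typically bound and validated at startup with the IOptions pattern; the options type below is abridged and its property set illustrative:

using System.ComponentModel.DataAnnotations;

var builder = WebApplication.CreateBuilder(args);

// Bind the "KataContainerManager" section, validate the data annotations,
// and fail fast at startup if any value is missing or out of range.
builder.Services.AddOptions<KataContainerManagerOptions>()
    .BindConfiguration(KataContainerManagerOptions.SectionName)
    .ValidateDataAnnotations()
    .ValidateOnStart();

var app = builder.Build();
app.Run();

// Abridged, illustrative options type; the real configuration class carries
// the full property set listed in the table below.
public class KataContainerManagerOptions
{
    public const string SectionName = "KataContainerManager";

    [Required]
    public string TargetNamespace { get; set; } = "sandbox-containers";

    [Range(30, 300)]
    public int PodReadyTimeoutSeconds { get; set; } = 90;

    [Range(0, 100)]
    public int WarmPoolSize { get; set; } = 10;
}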

Configuration Options

| Option | Default | Range | Description |
| --- | --- | --- | --- |
| TargetNamespace | sandbox-containers | - | Kubernetes namespace for containers |
| RuntimeClassName | kata-qemu | - | Runtime class for Kata isolation |
| PodNamePrefix | kata-sandbox | - | Prefix for generated pod names |
| PodReadyTimeoutSeconds | 90 | 30-300 | Timeout waiting for pods to become ready |
| IdleTimeoutMinutes | 5 | 1-1440 | Delete allocated containers after this idle time |
| MaxContainerLifetimeMinutes | 15 | 1-1440 | Maximum lifetime for any allocated container |
| MaxTotalContainers | 50 | 1-500 | Maximum total containers (warm + allocated + manual) |
| WarmPoolSize | 10 | 0-100 | Target number of pre-warmed containers |
| PoolBackfillCheckIntervalSeconds | 30 | 10-300 | How often to check and backfill the pool |
| LeaderLeaseDurationSeconds | 15 | 5-60 | Leader election lease duration |
| CleanupCheckIntervalMinutes | 1 | 1-60 | How often to check for idle/expired containers |

API Endpoints

All endpoints are prefixed with /api/kata to support future multi-runtime capabilities (Kata, gVisor, etc.).

POST /api/kata

Create a new Kata container. The container image is fixed to the configured default executor image for security.

Request Body:

{
  "labels": {
    "environment": "sandbox",
    "project": "test"
  },
  "environmentVariables": {
    "KEY": "value"
  },
  "resources": {
    "requests": {
      "memoryMi": 256,
      "cpuMillicores": 500
    },
    "limits": {
      "memoryMi": 1024,
      "cpuMillicores": 2000
    }
  },
  "waitForReady": true
}

Response: Server-Sent Events (SSE) stream with creation progress:

data: {"eventType":"created","podName":"kata-sandbox-a1b2c3d4","phase":"Pending"}

data: {"eventType":"waiting","podName":"kata-sandbox-a1b2c3d4","attemptNumber":1,"phase":"Pending","message":"Waiting for pod to be ready"}

data: {"eventType":"ready","podName":"kata-sandbox-a1b2c3d4","containerInfo":{...},"elapsedSeconds":15.2}
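
A minimal client-side sketch of consuming this stream with plain HttpClient; the DonkeyWork.CodeSandbox.Client library wraps this, and the base address and request payload here are illustrative:

using System.Text;
using System.Text.Json;

using var http = new HttpClient { BaseAddress = new Uri("http://localhost:8668") };

var request = new HttpRequestMessage(HttpMethod.Post, "/api/kata")
{
    Content = new StringContent(
        JsonSerializer.Serialize(new { waitForReady = true }),
        Encoding.UTF8,
        "application/json"),
};

// ResponseHeadersRead streams SSE events as they arrive instead of
// buffering the whole response body.
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
while (await reader.ReadLineAsync() is { } line)
{
    if (!line.StartsWith("data: ")) continue; // skip blank separator lines

    using var doc = JsonDocument.Parse(line["data: ".Length..]);
    var eventType = doc.RootElement.GetProperty("eventType").GetString();
    Console.WriteLine($"event: {eventType}");
    if (eventType == "ready") break; // the container is now usable
}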

GET /api/kata

List all Kata containers.

Response: 200 OK

[
  {
    "name": "kata-sandbox-a1b2c3d4",
    "phase": "Running",
    "isReady": true,
    "createdAt": "2026-01-13T10:30:00Z",
    "nodeName": "office1",
    "podIP": "10.42.1.15",
    "labels": {
      "app": "kata-manager",
      "runtime": "kata"
    },
    "image": "nginx:alpine"
  }
]

GET /api/kata/{podName}

Get details of a specific container.

Response: 200 OK (same structure as list response)

DELETE /api/kata/{podName}

Delete a Kata container.

Response: 200 OK

{
  "success": true,
  "message": "Container kata-sandbox-a1b2c3d4 deleted successfully",
  "podName": "kata-sandbox-a1b2c3d4"
}

GET /healthz

Health check endpoint.

Response: 200 OK (Healthy) or 503 Service Unavailable (Unhealthy)

Development

Project Structure

DonkeyWork-CodeSandbox-Manager/
├── src/
│   ├── DonkeyWork.CodeSandbox.Manager/      # Manager API (container orchestration)
│   │   ├── Configuration/
│   │   │   └── KataContainerManager.cs      # Configuration models with validation
│   │   ├── Endpoints/
│   │   │   └── KataContainerEndpoints.cs    # Minimal API endpoints (/api/kata)
│   │   ├── Models/
│   │   │   ├── CreateContainerRequest.cs    # Request DTOs
│   │   │   ├── KataContainerInfo.cs         # Response DTOs
│   │   │   └── DeleteContainerResponse.cs
│   │   ├── Services/
│   │   │   ├── IKataContainerService.cs     # Service interface
│   │   │   └── KataContainerService.cs      # Kubernetes operations
│   │   ├── Program.cs                       # Application entry point
│   │   └── appsettings.json                 # Configuration
│   ├── DonkeyWork.CodeSandbox.Server/       # Executor API (code execution)
│   │   ├── Controllers/
│   │   │   └── ExecutionController.cs       # /api/execute endpoint
│   │   ├── Services/
│   │   │   └── ManagedProcess.cs            # Process management with streaming
│   │   └── Program.cs
│   ├── DonkeyWork.CodeSandbox.Contracts/    # Shared models
│   │   ├── Events/
│   │   │   └── ExecutionEvent.cs            # OutputEvent, CompletedEvent
│   │   └── Requests/
│   │       └── ExecuteCommand.cs
│   └── DonkeyWork.CodeSandbox.Client/       # .NET client library
├── test/
│   ├── DonkeyWork.CodeSandbox.Manager.Tests/
│   └── DonkeyWork.CodeSandbox.Server.IntegrationTests/
├── Dockerfile                               # Manager API container
├── docker-compose.yml                       # Local development setup
└── .github/workflows/
    ├── pr-build-test.yml                    # PR validation workflow
    └── release.yml                          # Release automation workflow

Key Design Decisions

  1. Minimal APIs: Uses ASP.NET Core minimal APIs for simpler, more performant endpoints (see the sketch after this list)
  2. IOptions Pattern: Configuration is validated at startup using data annotations
  3. In-Cluster Auth: Automatically uses ServiceAccount tokens when running in Kubernetes
  4. Scoped Services: KataContainerService is scoped to match request lifetime
  5. Structured Logging: Serilog provides structured logging with context
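
A compressed sketch of how decisions 1, 4, and 5 compose in Program.cs; the IKataContainerService members shown are illustrative stand-ins, not the repository's actual interface:

using Serilog;

var builder = WebApplication.CreateBuilder(args);

// Decision 5: structured logging via Serilog, with sink setup read from
// configuration (Serilog.Settings.Configuration).
builder.Host.UseSerilog((ctx, cfg) => cfg.ReadFrom.Configuration(ctx.Configuration));

// Decision 4: the Kubernetes-facing service is scoped to the request.
builder.Services.AddScoped<IKataContainerService, KataContainerService>();

var app = builder.Build();

// Decision 1: a minimal API endpoint delegating straight to the service.
app.MapGet("/api/kata", (IKataContainerService svc, CancellationToken ct) => svc.ListAsync(ct));

app.Run();

// Illustrative stand-ins for the repository's actual types.
public interface IKataContainerService
{
    Task<IReadOnlyList<string>> ListAsync(CancellationToken ct);
}

public class KataContainerService : IKataContainerService
{
    public Task<IReadOnlyList<string>> ListAsync(CancellationToken ct)
        => Task.FromResult<IReadOnlyList<string>>(new[] { "kata-sandbox-a1b2c3d4" });
}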

Troubleshooting

Container fails to create

  • Verify the image exists and is accessible
  • Check that the sandbox-containers namespace exists
  • Ensure the RuntimeClass kata-qemu is configured
  • Check RBAC permissions for the ServiceAccount

Permission denied errors

  • Verify Role and RoleBinding are correctly applied
  • Ensure ServiceAccount is attached to the pod
  • Check that the service can reach the Kubernetes API server

Pods stuck in Pending

  • Check cluster node capacity
  • Verify Kata is installed on nodes
  • Check pod events: kubectl describe pod <pod-name> -n sandbox-containers

Health check failures

  • Verify the application is running: kubectl logs <pod-name>
  • Check if the service can connect to Kubernetes API
  • Review configuration validation errors in logs

Security Considerations

  1. Fixed Container Image: Only the configured default executor image can be used (no arbitrary images)
  2. Least Privilege: Role only grants permissions in sandbox-containers namespace
  3. Non-Root Container: Dockerfile creates and runs as non-root user
  4. Resource Limits: All containers have resource limits to prevent exhaustion
  5. VM Isolation: Kata containers provide hardware-level isolation

Performance

  • Pod Creation: 12-25 seconds (includes VM boot time)
  • Overhead: +160Mi RAM, +250m CPU per Kata container
  • Recommended Rate: 5-10 Kata pods per minute
  • Cluster Capacity: 4 nodes, each supporting multiple Kata VMs
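
At the default MaxTotalContainers of 50, for example, Kata overhead alone comes to roughly 50 × 160Mi ≈ 8Gi of RAM and 50 × 250m = 12.5 CPU cores across the cluster, before workload requests are counted.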

License

MIT
