A unified monorepo containing both the Manager API (Kata container orchestration) and Executor API (sandboxed code execution with Python, Node.js, and bash support).
- Manager API (`src/DonkeyWork.CodeSandbox.Manager`): REST API service for managing Kata containers in a Kubernetes cluster
- Executor API (`src/DonkeyWork.CodeSandbox.Server`): HTTP+SSE server for executing commands inside sandboxed containers
- Shared Contracts (`src/DonkeyWork.CodeSandbox.Contracts`): Common models and contracts
- Client Library (`src/DonkeyWork.CodeSandbox.Client`): .NET client for consuming the Executor API
- Create Kata Containers: Dynamically create VM-isolated containers with custom configuration
- Warm Pool Management: Maintains a pool of pre-warmed containers for instant allocation
- High Availability: Stateless managers with Kubernetes-native leader election
- Automatic Cleanup: Idle and expired containers are automatically cleaned up
- Container Limits: Configurable maximum container limit prevents resource exhaustion
- List Containers: Retrieve all Kata containers with their status and metadata
- Get Container Details: Fetch detailed information about a specific container
- Delete Containers: Remove containers and terminate their associated VMs
- Health Checks: Built-in health check endpoint for monitoring
- OpenAPI Documentation: Interactive API documentation via Scalar
The CodeSandbox Manager is designed for high availability with no external dependencies:
- Stateless Managers: Multiple manager instances can run simultaneously with no shared state
- Kubernetes as Source of Truth: All container state (timestamps, allocation status) is stored as Kubernetes annotations and labels
- Leader Election: Uses Kubernetes Lease objects for leader election - only the leader performs pool backfill operations
- Optimistic Locking: Container allocation uses Kubernetes resourceVersion for conflict detection and automatic retry
- Distributed Monitoring: All instances monitor and clean up containers independently
```mermaid
flowchart TB
subgraph Managers["Manager Instances"]
M1["Manager 1<br/>(Leader)"]
M2["Manager 2<br/>(Follower)"]
M3["Manager 3<br/>(Follower)"]
end
subgraph K8s["Kubernetes"]
Lease["Lease Object<br/>pool-backfill-leader"]
Pods["Pod Resources<br/>(annotations/labels)"]
end
M1 -->|"Renews lease"| Lease
M1 -->|"Backfills pool"| Pods
M2 -.->|"Monitors lease"| Lease
M3 -.->|"Monitors lease"| Lease
M2 -->|"Allocates/Cleanup"| Pods
M3 -->|"Allocates/Cleanup"| Pods
Key behaviors:
- Only the leader creates new warm pool containers (prevents duplicate creation)
- All managers can allocate containers and handle cleanup (distributed workload)
- If the leader fails, another manager automatically takes over within the lease duration (15 seconds by default)
- State survives manager restarts because Kubernetes annotations are the source of truth
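The optimistic-locking allocation described above can be sketched with the official Kubernetes C# client: read the pod, stamp an allocation annotation, and replace it using the pod's current `resourceVersion`, retrying on conflict. This is a minimal illustration rather than the repository's actual service code; the annotation key and retry count are assumptions.

```csharp
using System.Net;
using k8s;
using k8s.Autorest;
using k8s.Models;

public static class WarmPoolAllocator
{
    // Attempts to claim a warm pod by writing an allocation annotation.
    // The replace fails with 409 Conflict if another manager modified the pod
    // after our read (stale resourceVersion), in which case we re-read and retry.
    public static async Task<V1Pod?> TryAllocateAsync(IKubernetes client, string podName, string ns)
    {
        for (var attempt = 0; attempt < 3; attempt++)
        {
            var pod = await client.CoreV1.ReadNamespacedPodAsync(podName, ns);

            pod.Metadata.Annotations ??= new Dictionary<string, string>();
            if (pod.Metadata.Annotations.ContainsKey("sandbox/allocated-at"))
                return null; // Another manager already claimed this pod.

            pod.Metadata.Annotations["sandbox/allocated-at"] = DateTimeOffset.UtcNow.ToString("O");

            try
            {
                return await client.CoreV1.ReplaceNamespacedPodAsync(pod, podName, ns);
            }
            catch (HttpOperationException ex) when (ex.Response.StatusCode == HttpStatusCode.Conflict)
            {
                // Lost the race; loop to re-read the latest state and retry.
            }
        }

        return null;
    }
}
```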
- Framework: ASP.NET Core 10.0 (Minimal APIs)
- Kubernetes Client: Official Kubernetes C# client (v18.0.13)
- Logging: Serilog with structured logging
- Configuration: IOptions with data validation
- Container Runtime: Kata Containers (kata-qemu)
```mermaid
flowchart TB
subgraph Client["Client Application"]
CA[API Consumer]
end
subgraph Manager["Manager API :8668"]
ME["/api/kata endpoints"]
KCS[KataContainerService]
end
subgraph K8s["Kubernetes Cluster"]
API[Kubernetes API Server]
subgraph NS["sandbox-containers namespace"]
subgraph KP1["Kata Pod 1"]
VM1["Kata VM"]
EX1["Executor API :8666"]
end
subgraph KP2["Kata Pod 2"]
VM2["Kata VM"]
EX2["Executor API :8666"]
end
end
end
CA -->|"REST/SSE"| ME
ME --> KCS
KCS -->|"Create/List/Delete Pods"| API
API --> KP1
API --> KP2
KCS -->|"Execute Commands"| EX1
KCS -->|"Execute Commands"| EX2
```mermaid
sequenceDiagram
participant Client
participant Manager as Manager API
participant K8s as Kubernetes API
participant Kata as Kata Pod
participant Executor as Executor API
Client->>Manager: POST /api/kata
Manager->>Manager: Generate unique pod name
Manager->>K8s: Create Pod (kata-qemu runtime)
K8s-->>Manager: Pod created (Pending)
Manager-->>Client: SSE: created event
loop Wait for Pod Ready
Manager->>K8s: Get Pod status
K8s-->>Manager: Pod status
Manager-->>Client: SSE: waiting event
end
K8s->>Kata: Start Kata VM
Kata->>Kata: Boot VM + Start Executor API
loop Health Check Executor API
Manager->>Executor: GET /healthz
alt Healthy
Executor-->>Manager: 200 OK
Manager-->>Client: SSE: healthcheck (healthy)
else Not Ready
Executor-->>Manager: Connection refused / timeout
Manager-->>Client: SSE: healthcheck (unhealthy)
Manager-->>Client: SSE: waiting event
end
end
Manager-->>Client: SSE: ready event
```
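The readiness loop in the diagram reduces to polling the executor's `/healthz` endpoint until it returns 200 OK or the `PodReadyTimeoutSeconds` budget runs out. A minimal sketch of that loop, with the poll interval and error handling simplified:

```csharp
// Minimal sketch of the readiness poll against the in-pod Executor API.
static async Task<bool> WaitForExecutorAsync(HttpClient http, string podIp, TimeSpan timeout)
{
    var deadline = DateTimeOffset.UtcNow + timeout;
    while (DateTimeOffset.UtcNow < deadline)
    {
        try
        {
            var response = await http.GetAsync($"http://{podIp}:8666/healthz");
            if (response.IsSuccessStatusCode)
                return true; // Executor is up; the manager can emit the "ready" event.
        }
        catch (HttpRequestException)
        {
            // Connection refused while the Kata VM is still booting; keep polling.
        }

        await Task.Delay(TimeSpan.FromSeconds(1));
    }

    return false; // PodReadyTimeoutSeconds budget exhausted.
}
```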
```mermaid
sequenceDiagram
participant Client
participant Manager as Manager API
participant Executor as Executor API (in Kata Pod)
participant Process as Bash Process
Client->>Manager: POST /api/kata/{sandboxId}/execute
Manager->>Manager: Lookup Pod IP
Manager->>Executor: POST /api/execute (SSE)
Executor->>Process: Spawn /bin/bash -c "command"
loop Stream Output
Process-->>Executor: stdout/stderr
Executor-->>Manager: SSE: OutputEvent
Manager-->>Client: SSE: OutputEvent
end
Process-->>Executor: Exit code
Executor-->>Manager: SSE: CompletedEvent
Manager-->>Client: SSE: CompletedEvent
```
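On the executor side, this flow comes down to spawning `/bin/bash -c` with the requested command and forwarding stdout/stderr as they arrive. The sketch below illustrates the pattern; it is not the repository's `ManagedProcess` implementation, and the tuple items stand in for the real contract types.

```csharp
using System.Diagnostics;
using System.Threading.Channels;

// Simplified sketch of spawning /bin/bash and streaming its output as it arrives.
static async IAsyncEnumerable<(string Kind, string Data)> ExecuteAsync(string command)
{
    var channel = Channel.CreateUnbounded<(string, string)>();

    var process = new Process
    {
        StartInfo = new ProcessStartInfo("/bin/bash")
        {
            ArgumentList = { "-c", command },
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        },
        EnableRaisingEvents = true,
    };

    // Forward each stdout/stderr line into the channel as it is produced.
    process.OutputDataReceived += (_, e) => { if (e.Data is not null) channel.Writer.TryWrite(("stdout", e.Data)); };
    process.ErrorDataReceived += (_, e) => { if (e.Data is not null) channel.Writer.TryWrite(("stderr", e.Data)); };

    process.Start();
    process.BeginOutputReadLine();
    process.BeginErrorReadLine();

    // Emit a final "completed" item carrying the exit code, then close the stream.
    _ = Task.Run(async () =>
    {
        await process.WaitForExitAsync();
        channel.Writer.TryWrite(("completed", process.ExitCode.ToString()));
        channel.Writer.Complete();
    });

    await foreach (var item in channel.Reader.ReadAllAsync())
        yield return item;
}
```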
```mermaid
flowchart LR
subgraph Manager["Manager API"]
direction TB
EP[Endpoints]
SVC[KataContainerService]
CFG[Configuration]
end
subgraph Executor["Executor API"]
direction TB
CTRL[ExecutionController]
MP[ManagedProcess]
end
subgraph Shared["Contracts"]
EE[ExecutionEvent]
OE[OutputEvent]
CE[CompletedEvent]
end
EP --> SVC
SVC --> CFG
SVC -.->|HTTP/SSE| CTRL
CTRL --> MP
MP --> OE
MP --> CE
OE --> EE
CE --> EE
```
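The shared contracts are small event DTOs. A plausible shape for them, assuming a polymorphic base event as in the diagram (the exact properties in the repository may differ):

```csharp
// Illustrative shapes for the shared execution events; the property names are
// assumptions based on the diagrams and SSE payloads shown in this README.
public abstract record ExecutionEvent(string EventType);

public sealed record OutputEvent(string Stream, string Data)
    : ExecutionEvent("output");

public sealed record CompletedEvent(int ExitCode, double ElapsedSeconds)
    : ExecutionEvent("completed");
```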
- Kubernetes Cluster: k3s v1.33.5+ with Kata Containers enabled
- Namespace: The `sandbox-containers` namespace must exist
- RBAC: A ServiceAccount with appropriate permissions (see the `k8s/` folder)
- Runtime Class: The `kata-qemu` RuntimeClass configured in the cluster
The service is configured via `appsettings.json`:

```json
{
"KataContainerManager": {
"TargetNamespace": "sandbox-containers",
"RuntimeClassName": "kata-qemu",
"DefaultResourceRequests": {
"MemoryMi": 128,
"CpuMillicores": 250
},
"DefaultResourceLimits": {
"MemoryMi": 512,
"CpuMillicores": 1000
},
"PodNamePrefix": "kata-sandbox",
"CleanupCompletedPods": true,
"PodReadyTimeoutSeconds": 90,
"IdleTimeoutMinutes": 5,
"MaxContainerLifetimeMinutes": 15,
"MaxTotalContainers": 50,
"WarmPoolSize": 10,
"PoolBackfillCheckIntervalSeconds": 30,
"LeaderLeaseDurationSeconds": 15
}
}
```

| Option | Default | Range | Description |
|---|---|---|---|
| `TargetNamespace` | `sandbox-containers` | - | Kubernetes namespace for containers |
| `RuntimeClassName` | `kata-qemu` | - | Runtime class for Kata isolation |
| `PodNamePrefix` | `kata-sandbox` | - | Prefix for generated pod names |
| `PodReadyTimeoutSeconds` | 90 | 30-300 | Timeout waiting for pods to become ready |
| `IdleTimeoutMinutes` | 5 | 1-1440 | Delete allocated containers after this idle time |
| `MaxContainerLifetimeMinutes` | 15 | 1-1440 | Maximum lifetime for any allocated container |
| `MaxTotalContainers` | 50 | 1-500 | Maximum total containers (warm + allocated + manual) |
| `WarmPoolSize` | 10 | 0-100 | Target number of pre-warmed containers |
| `PoolBackfillCheckIntervalSeconds` | 30 | 10-300 | How often to check and backfill the pool |
| `LeaderLeaseDurationSeconds` | 15 | 5-60 | Leader election lease duration |
| `CleanupCheckIntervalMinutes` | 1 | 1-60 | How often to check for idle/expired containers |
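These settings map naturally onto an options class validated with data annotations at startup (see the design decisions below). A trimmed sketch, assuming the property names match the JSON keys; the real type lives in `Configuration/KataContainerManager.cs` and may differ:

```csharp
using System.ComponentModel.DataAnnotations;

// Trimmed sketch of the options class behind the table above; the [Range] bounds
// mirror the documented limits.
public class KataContainerManagerOptions
{
    public const string SectionName = "KataContainerManager";

    [Required] public string TargetNamespace { get; set; } = "sandbox-containers";
    [Required] public string RuntimeClassName { get; set; } = "kata-qemu";
    [Required] public string PodNamePrefix { get; set; } = "kata-sandbox";

    [Range(30, 300)] public int PodReadyTimeoutSeconds { get; set; } = 90;
    [Range(1, 1440)] public int IdleTimeoutMinutes { get; set; } = 5;
    [Range(1, 1440)] public int MaxContainerLifetimeMinutes { get; set; } = 15;
    [Range(1, 500)] public int MaxTotalContainers { get; set; } = 50;
    [Range(0, 100)] public int WarmPoolSize { get; set; } = 10;
    [Range(10, 300)] public int PoolBackfillCheckIntervalSeconds { get; set; } = 30;
    [Range(5, 60)] public int LeaderLeaseDurationSeconds { get; set; } = 15;
    [Range(1, 60)] public int CleanupCheckIntervalMinutes { get; set; } = 1;
}
```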
All endpoints are prefixed with `/api/kata` to support future multi-runtime capabilities (Kata, gVisor, etc.).
Create a new Kata container. The container image is fixed to the configured default executor image for security.
Request Body:
```json
{
"labels": {
"environment": "sandbox",
"project": "test"
},
"environmentVariables": {
"KEY": "value"
},
"resources": {
"requests": {
"memoryMi": 256,
"cpuMillicores": 500
},
"limits": {
"memoryMi": 1024,
"cpuMillicores": 2000
}
},
"waitForReady": true
}
```

Response: Server-Sent Events (SSE) stream with creation progress:
data: {"eventType":"created","podName":"kata-sandbox-a1b2c3d4","phase":"Pending"}
data: {"eventType":"waiting","podName":"kata-sandbox-a1b2c3d4","attemptNumber":1,"phase":"Pending","message":"Waiting for pod to be ready"}
data: {"eventType":"ready","podName":"kata-sandbox-a1b2c3d4","containerInfo":{...},"elapsedSeconds":15.2}
List all Kata containers.
Response: 200 OK
```json
[
{
"name": "kata-sandbox-a1b2c3d4",
"phase": "Running",
"isReady": true,
"createdAt": "2026-01-13T10:30:00Z",
"nodeName": "office1",
"podIP": "10.42.1.15",
"labels": {
"app": "kata-manager",
"runtime": "kata"
},
"image": "nginx:alpine"
}
]
```

Get details of a specific container.
Response: 200 OK (same structure as list response)
Delete a Kata container.
Response: 200 OK
```json
{
"success": true,
"message": "Container kata-sandbox-a1b2c3d4 deleted successfully",
"podName": "kata-sandbox-a1b2c3d4"
}
```

Health check endpoint.
Response: 200 OK (Healthy) or 503 Service Unavailable (Unhealthy)
```
DonkeyWork-CodeSandbox-Manager/
├── src/
│   ├── DonkeyWork.CodeSandbox.Manager/        # Manager API (container orchestration)
│   │   ├── Configuration/
│   │   │   └── KataContainerManager.cs        # Configuration models with validation
│   │   ├── Endpoints/
│   │   │   └── KataContainerEndpoints.cs      # Minimal API endpoints (/api/kata)
│   │   ├── Models/
│   │   │   ├── CreateContainerRequest.cs      # Request DTOs
│   │   │   ├── KataContainerInfo.cs           # Response DTOs
│   │   │   └── DeleteContainerResponse.cs
│   │   ├── Services/
│   │   │   ├── IKataContainerService.cs       # Service interface
│   │   │   └── KataContainerService.cs        # Kubernetes operations
│   │   ├── Program.cs                         # Application entry point
│   │   └── appsettings.json                   # Configuration
│   ├── DonkeyWork.CodeSandbox.Server/         # Executor API (code execution)
│   │   ├── Controllers/
│   │   │   └── ExecutionController.cs         # /api/execute endpoint
│   │   ├── Services/
│   │   │   └── ManagedProcess.cs              # Process management with streaming
│   │   └── Program.cs
│   ├── DonkeyWork.CodeSandbox.Contracts/      # Shared models
│   │   ├── Events/
│   │   │   └── ExecutionEvent.cs              # OutputEvent, CompletedEvent
│   │   └── Requests/
│   │       └── ExecuteCommand.cs
│   └── DonkeyWork.CodeSandbox.Client/         # .NET client library
├── test/
│   ├── DonkeyWork.CodeSandbox.Manager.Tests/
│   └── DonkeyWork.CodeSandbox.Server.IntegrationTests/
├── Dockerfile                                 # Manager API container
├── docker-compose.yml                         # Local development setup
└── .github/workflows/
    ├── pr-build-test.yml                      # PR validation workflow
    └── release.yml                            # Release automation workflow
```
- Minimal APIs: Uses ASP.NET Core minimal APIs for simpler, more performant endpoints
- IOptions Pattern: Configuration is validated at startup using data annotations
- In-Cluster Auth: Automatically uses ServiceAccount tokens when running in Kubernetes
- Scoped Services: KataContainerService is scoped to match request lifetime
- Structured Logging: Serilog provides structured logging with context
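Taken together, these decisions keep `Program.cs` small. A hedged sketch of the wiring (type names follow the project structure, but the endpoint-mapping extension and options class name are assumptions):

```csharp
// Sketch of the startup wiring implied by the design decisions above;
// exact registrations in the repository may differ.
using k8s;

var builder = WebApplication.CreateBuilder(args);

// IOptions pattern: bind the KataContainerManager section and validate
// the data annotations when the host starts.
builder.Services.AddOptions<KataContainerManagerOptions>()
    .BindConfiguration("KataContainerManager")
    .ValidateDataAnnotations()
    .ValidateOnStart();

// In-cluster ServiceAccount auth inside Kubernetes, kubeconfig for local development.
builder.Services.AddSingleton<IKubernetes>(_ =>
{
    var config = KubernetesClientConfiguration.IsInCluster()
        ? KubernetesClientConfiguration.InClusterConfig()
        : KubernetesClientConfiguration.BuildConfigFromConfigFile();
    return new Kubernetes(config);
});

// Scoped service that performs the Kubernetes pod operations.
builder.Services.AddScoped<IKataContainerService, KataContainerService>();

var app = builder.Build();
app.MapKataContainerEndpoints(); // hypothetical extension that maps the /api/kata endpoints
app.Run();
```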
- Verify the image exists and is accessible
- Check that the `sandbox-containers` namespace exists
- Ensure the `kata-qemu` RuntimeClass is configured
- Check RBAC permissions for the ServiceAccount
- Verify Role and RoleBinding are correctly applied
- Ensure ServiceAccount is attached to the pod
- Check that the service can reach the Kubernetes API server
- Check cluster node capacity
- Verify Kata is installed on nodes
- Check pod events: `kubectl describe pod <pod-name> -n sandbox-containers`
- Verify the application is running: `kubectl logs <pod-name>`
- Check if the service can connect to the Kubernetes API
- Review configuration validation errors in logs
- Fixed Container Image: Only the configured default executor image can be used (no arbitrary images)
- Least Privilege: The Role only grants permissions in the `sandbox-containers` namespace
- Non-Root Container: The Dockerfile creates and runs as a non-root user
- Resource Limits: All containers have resource limits to prevent exhaustion
- VM Isolation: Kata containers provide hardware-level isolation
- Pod Creation: 12-25 seconds (includes VM boot time)
- Overhead: +160Mi RAM, +250m CPU per Kata container
- Recommended Rate: 5-10 Kata pods per minute
- Cluster Capacity: 4 nodes, each supporting multiple Kata VMs
- Kata Containers - Official Kata documentation
- Kubernetes C# Client - Client library
- k3s Documentation - k3s cluster documentation
MIT