docs: add CLI reference documentation

future-xy · future-xy · commit 14d9c082fbd1 · 2025-12-23T11:22:00.000Z
Add comprehensive CLI reference covering all 8 commands:
- start (head/worker)
- submit
- get-instance
- get-result
- list-workers
- get-endpoint
- cancel
- logs

Includes options, arguments, and usage examples.
diff --git a/docs/cli_reference.md b/docs/cli_reference.md
@@ -0,0 +1,300 @@
+# PyLet CLI Reference
+
+PyLet provides a command-line interface for managing a distributed instance execution cluster.
+
+## Installation
+
+```bash
+pip install pylet
+```
+
+After installation, the `pylet` command is available.
+
+---
+
+## Commands
+
+### `pylet start`
+
+Start the head node (server) or a worker node.
+
+```bash
+# Start head node (server) on port 8000
+pylet start
+
+# Start worker node connected to head
+pylet start --head 192.168.1.10:8000
+
+# Start worker with custom resources
+pylet start --head 192.168.1.10:8000 --gpu-units 4 --cpu-cores 8 --memory-mb 16384
+```
+
+**Options:**
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--head <ip:port>` | None | Head node address. If omitted, starts as head node. |
+| `--cpu-cores <int>` | 4 | CPU cores to offer (worker only) |
+| `--gpu-units <int>` | 0 | GPU units to offer (worker only) |
+| `--memory-mb <int>` | 4096 | Memory in MB to offer (worker only) |
+
+---
+
+### `pylet submit`
+
+Submit a new instance to the cluster.
+
+```bash
+# Simple command
+pylet submit echo hello
+
+# With resource requirements
+pylet submit python train.py --cpu-cores 4 --gpu-units 1 --memory-mb 8192
+
+# Named instance (for service discovery)
+pylet submit "vllm serve model --port \$PORT" --name my-vllm --gpu-units 1
+
+# Multi-word commands with quotes
+pylet submit "python -c 'print(\"hello\")'"
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `COMMAND` | Yes | Shell command to execute (can be multiple words) |
+
+**Options:**
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--cpu-cores <int>` | 1 | CPU cores required |
+| `--gpu-units <int>` | 0 | GPU units required |
+| `--memory-mb <int>` | 512 | Memory in MB required |
+| `--name <string>` | None | Instance name for service discovery |
+
+**Output:**
+
+```
+Instance submitted with ID: abc-123-def
+```
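+
+For scripting, the ID can be pulled out of that line. The following is a minimal sketch, assuming the output format shown above stays stable; `extract_instance_id` is a hypothetical helper, not part of PyLet.
+
+```bash
+# Hypothetical helper: print the ID from an "Instance submitted with ID: <id>" line.
+# Assumes the ID is everything after the last ": " on that line.
+extract_instance_id() {
+    local line="$1"
+    printf '%s\n' "${line##*: }"
+}
+
+# Usage (assumes a reachable head node):
+# INSTANCE_ID=$(extract_instance_id "$(pylet submit echo hello)")
+```
+
+Parameter expansion avoids spawning `grep` or `cut`, but it relies on the ID itself not containing `": "`.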
+
+---
+
+### `pylet get-instance`
+
+Get instance details by ID or name.
+
+```bash
+# By ID
+pylet get-instance --instance-id abc-123-def
+
+# By name
+pylet get-instance --name my-vllm
+```
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--instance-id <string>` | Instance UUID |
+| `--name <string>` | Instance name |
+
+One of `--instance-id` or `--name` is required.
+
+---
+
+### `pylet get-result`
+
+Get the result of a completed instance.
+
+```bash
+pylet get-result abc-123-def
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INSTANCE_ID` | Yes | Instance UUID |
+
+---
+
+### `pylet list-workers`
+
+List all registered workers in the cluster.
+
+```bash
+pylet list-workers
+```
+
+**Output:**
+
+```
+Worker abc-123 (192.168.1.5) - ONLINE - GPUs: 4
+Worker def-456 (192.168.1.6) - ONLINE - GPUs: 2
+Worker ghi-789 (192.168.1.7) - SUSPECT - GPUs: 1
+```
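+
+A script may want to check how many workers are healthy before submitting work. This is a rough sketch that assumes the line layout shown above (a ` - ONLINE - ` status column); `count_online_workers` is a hypothetical helper, not a PyLet command.
+
+```bash
+# Hypothetical helper: count workers whose status reads ONLINE.
+# Reads `pylet list-workers` output on stdin.
+count_online_workers() {
+    # grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
+    grep -c -- ' - ONLINE - ' || true
+}
+
+# Usage (assumes a reachable head node):
+# pylet list-workers | count_online_workers
+```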
+
+---
+
+### `pylet get-endpoint`
+
+Get the endpoint (host:port) of a running instance. Useful for service discovery.
+
+```bash
+# By ID
+pylet get-endpoint --instance-id abc-123-def
+
+# By name
+pylet get-endpoint --name my-vllm
+```
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--instance-id <string>` | Instance UUID |
+| `--name <string>` | Instance name |
+
+**Output:**
+
+```
+192.168.1.5:15600
+```
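+
+Some clients need the host and port separately. The sketch below splits the `host:port` string with shell parameter expansion; `split_endpoint`, `ENDPOINT_HOST`, and `ENDPOINT_PORT` are hypothetical names chosen for illustration.
+
+```bash
+# Hypothetical helper: split "host:port" into ENDPOINT_HOST and ENDPOINT_PORT.
+split_endpoint() {
+    local endpoint="$1"
+    ENDPOINT_HOST="${endpoint%:*}"   # everything before the last colon
+    ENDPOINT_PORT="${endpoint##*:}"  # everything after the last colon
+}
+
+# Usage (assumes a running instance named my-vllm):
+# split_endpoint "$(pylet get-endpoint --name my-vllm)"
+```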
+
+---
+
+### `pylet cancel`
+
+Cancel a running instance. Sends SIGTERM, waits for a grace period, then sends SIGKILL.
+
+```bash
+pylet cancel abc-123-def
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INSTANCE_ID` | Yes | Instance UUID |
+
+**Output:**
+
+```
+Cancellation requested for instance abc-123-def
+```
+
+---
+
+### `pylet logs`
+
+Get logs from an instance.
+
+```bash
+# Get all logs
+pylet logs abc-123-def
+
+# Get last 1000 bytes
+pylet logs abc-123-def --tail 1000
+
+# Follow logs (like tail -f)
+pylet logs abc-123-def --follow
+pylet logs abc-123-def -f
+```
+
+**Arguments:**
+
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `INSTANCE_ID` | Yes | Instance UUID |
+
+**Options:**
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--tail <int>` | None | Get only last N bytes |
+| `--follow`, `-f` | False | Follow log output (poll for new content) |
+
+---
+
+## Environment
+
+The CLI always connects to the head node at `http://localhost:8000`. This address is currently hardcoded in the client, so client commands must run where the head node is reachable at that address.
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success |
+| 1 | Error (connection failed, instance not found, etc.) |
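+
+Because failures surface as a non-zero exit code, a script can poll for readiness instead of sleeping for a fixed time. The sketch below assumes that `pylet get-endpoint` exits non-zero until the endpoint is available, which this reference does not confirm; `wait_for_endpoint` and its arguments are hypothetical.
+
+```bash
+# Hypothetical helper: poll `pylet get-endpoint --name <name>` until it succeeds.
+# Arguments: instance name, max attempts (default 30), delay in seconds (default 2).
+# Assumes a non-zero exit code means "not ready yet".
+wait_for_endpoint() {
+    local name="$1"
+    local attempts="${2:-30}"
+    local delay="${3:-2}"
+    local endpoint
+    local i=0
+    while [ "$i" -lt "$attempts" ]; do
+        if endpoint=$(pylet get-endpoint --name "$name" 2>/dev/null); then
+            printf '%s\n' "$endpoint"
+            return 0
+        fi
+        sleep "$delay"
+        i=$((i + 1))
+    done
+    return 1  # gave up after the requested number of attempts
+}
+
+# Usage:
+# ENDPOINT=$(wait_for_endpoint vllm-server) || echo "endpoint not available" >&2
+```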
+
+---
+
+## Examples
+
+### Start a Local Cluster
+
+```bash
+# Terminal 1: Start head node
+pylet start
+
+# Terminal 2: Start worker with 2 GPUs
+pylet start --head localhost:8000 --gpu-units 2 --cpu-cores 8
+
+# Terminal 3: Start another worker
+pylet start --head localhost:8000 --gpu-units 1 --cpu-cores 4
+```
+
+### Submit and Monitor an Instance
+
+```bash
+# Submit a long-running job
+pylet submit "python train.py --epochs 100" --name training --gpu-units 1
+
+# Check status
+pylet get-instance --name training
+
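+# Follow logs (the ID extraction assumes get-instance prints e.g. instance_id='<id>'; adjust to the actual format)
+pylet logs "$(pylet get-instance --name training | grep -o 'instance_id.*' | cut -d"'" -f2)" -f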
+
+# Cancel if needed (replace <instance-id> with the ID printed by `pylet submit`)
+pylet cancel <instance-id>
+```
+
+### Run a vLLM Service
+
+```bash
+# Submit vLLM server
+pylet submit "vllm serve Qwen/Qwen2.5-1.5B-Instruct --port \$PORT" \
+    --name vllm-server \
+    --gpu-units 1 \
+    --memory-mb 8192
+
+# Wait and get endpoint
+sleep 30
+ENDPOINT=$(pylet get-endpoint --name vllm-server)
+
+# Use the service
+curl "http://$ENDPOINT/v1/completions" \
+    -H "Content-Type: application/json" \
+    -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "prompt": "Hello", "max_tokens": 10}'
+
+# Cleanup (replace <instance-id> with the ID printed by `pylet submit`)
+pylet cancel <instance-id>
+```
+
+---
+
+## Command Summary
+
+| Command | Purpose |
+|---------|---------|
+| `pylet start` | Start head or worker node |
+| `pylet submit <cmd>` | Submit instance |
+| `pylet get-instance` | Get instance details |
+| `pylet get-result <id>` | Get instance result |
+| `pylet list-workers` | List workers |
+| `pylet get-endpoint` | Get instance endpoint |
+| `pylet cancel <id>` | Cancel instance |
+| `pylet logs <id>` | Get instance logs |