diff --git a/examples/README.md b/examples/README.md
index cb92870e..2f80dc5f 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -18,6 +18,7 @@ service keys.
 | [Build a multimodal wine recommender with OCR](./wine-recommender) | Combining preference-based retrieval with OCR-driven label detection in one UI | `encode`, `score`, `extract` | Docker Compose app plus local SIE endpoint; API key optional for unauthenticated SIE | Runnable demo |
 | [Build a multi-modal product classifier with embeddings](./taxonomy-classification) | Evaluating text, image, NLI, and reranking approaches for hierarchical product taxonomy classification | `extract`, `encode`, `score` | SIE endpoint, Shopify dataset prep via `uv run` scripts, standalone `uv` project | Runnable evaluation example |
 | [Swap an OCR model with one identifier change](./document-ocr) | Driving recognition (VLM-OCR), structured extraction (Donut), and zero-shot NER (GLiNER) through the same `extract` call by swapping the model ID | `extract` | Docker Compose plus Node UI, no API key required, hosted version on [Hugging Face Spaces](https://huggingface.co/spaces/superlinked/document-ocr) | Runnable demo |
+| [Vision-first document RAG](./vision-doc-rag) | Retrieving and answering questions over a multi-tenant page corpus by looking at page images, with OCR kept out of the score path | `encode`, `extract`, `score` (optional) | SIE endpoint with a GPU recommended for ColQwen2.5 + Florence-2-DocVQA | Runnable demo |
 
 For docs publishing, lead with the quickest runnable demos, then use the
 benchmark and evaluation examples for deeper technical users.
diff --git a/examples/vision-doc-rag/.gitignore b/examples/vision-doc-rag/.gitignore
new file mode 100644
index 00000000..a787e920
--- /dev/null
+++ b/examples/vision-doc-rag/.gitignore
@@ -0,0 +1,6 @@
+.venv/
+__pycache__/
+data/pages.json
+data/pages/
+data/multivectors.npz
+data/metadata.json
diff --git a/examples/vision-doc-rag/README.md b/examples/vision-doc-rag/README.md
new file mode 100644
index 00000000..f179051c
--- /dev/null
+++ b/examples/vision-doc-rag/README.md
@@ -0,0 +1,209 @@
+# Vision-first document RAG
+
+Retrieve by image, answer by image. ColQwen2.5 reads each page as a picture
+and ranks them via late interaction; Florence-2-DocVQA reads the winning
+page and produces the textual answer. OCR never enters the score path, so
+charts, screenshots, tables, and any other layout cue that would die in a
+text round-trip still drives ranking. Everything runs on one SIE endpoint.
+
+Each page also carries a `client` tag, so the same corpus serves multiple
+tenants from one index: a query scoped to `acme-corp` cannot retrieve a
+`globex` page, and no separate index per tenant is required.
+
+## SIE features used
+
+- `encode` — `vidore/colqwen2.5-v0.2` on page images at ingest and on the
+  query text at search time. Output is a `[tokens, 128]` multivector. Late
+  interaction (`sie_sdk.scoring.maxsim`) is the only ranking signal.
+- `extract` — `mynkchaudhry/Florence-2-FT-DocVQA`. Called twice, with two
+  jobs: with `instruction=<your question>` to get a textual answer for the
+  top page, and without `instruction` to OCR the same page for a display
+  snippet. The OCR snippet is UX-only — it never enters the score path.
+- `score` *(optional)* — `Qwen/Qwen3-VL-Reranker-2B` second-stage rerank
+  over `(query text, page image)`. Off by default while we wait for an
+  upstream adapter fix; flip `search.visual_rerank: true` in `config.yaml`
+  to enable it on a cluster that's ready.
+
+## Why vision end-to-end
+
+OCR-then-text-rerank throws away the exact signal we pick ColQwen for —
+charts, screenshots, tables, callouts, and the spatial layout that tells
+a wiki page apart from a checklist. The rerank stays visual or doesn't
+happen. The OCR step shows on-screen text next to the page image so the
+user can copy/paste from the result, nothing more.
+
+## Multi-tenant by construction
+
+Every page carries a `client` field in `data/pages.json`. The metadata list
+loaded by `python/search.py` is filtered by `client_name` before MaxSim
+runs, so a query scoped to `acme-corp` cannot retrieve a `globex` page.
+Real deployments would push `client` down into the multivector store's
+filter expression; the demo keeps everything in memory because the corpus
+is tiny.
+
+## Run it
+
+You need Python 3.12 and a reachable SIE cluster (or local `docker run`).
+
+```bash
+# 1. SIE locally (or point SIE_CLUSTER_URL / SIE_API_KEY at a managed cluster).
+docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default
+
+# 2. Generate the synthetic corpus and render each page to a PNG.
+cd examples/vision-doc-rag
+pip install -r python/requirements.txt
+python data/fetch_dataset.py
+python data/render_pages.py
+
+# 3. Encode every page with ColQwen2.5 and save the multivectors.
+python python/ingest.py
+
+# 4a. CLI demo — runs four scoped queries and prints results.
+python python/search.py
+
+# 4b. Or start the UI.
+uvicorn --app-dir python server:app --port 8888
+open http://localhost:8888  # macOS; on Linux use xdg-open
+```
+
+First run on a cold cluster pays a one-time model load: ColQwen2.5 and
+Florence-2 are both several GB, so expect roughly a minute on CPU and a few
+seconds on GPU before the warm path kicks in.
+
+### Pointing at a managed cluster
+
+```bash
+export SIE_CLUSTER_URL="https://your-cluster-host:8080"
+export SIE_API_KEY="SL-..."
+```
+
+The defaults in `config.yaml` point at `http://localhost:8080` so the env
+vars only matter when you're hitting something remote. Set `cluster.gpu`
+to a profile name like `l4-spot` if the cluster needs an explicit GPU
+class.
+
+## Try these queries
+
+| Tenant | Query | Why it's interesting |
+|---|---|---|
+| `acme-corp` | how do I sign in to the VPN? | Visual layout match: the page is titled "VPN setup for new engineers" with a short multi-paragraph body, and ColQwen2.5 picks it even though "sign in" appears only in the body text, not the title. DocVQA reads the page and answers with the sign-in credentials and the 2FA method. |
+| `globex` | what is the parental leave policy? | Disambiguates from "time off" — the right page mentions parental leave only halfway down the body. The textual answer cites the week count. |
+| `initech` | audit prep evidence and walkthroughs | All four Initech pages are compliance- or procurement-flavored; the visual model breaks the tie by reading the checklist layout. |
+| `globex` | how do I sign in to the VPN? | Tenant filter — even though the same query hit acme-corp earlier, scoping to globex returns the closest globex page (Wi-Fi guide) and never leaks acme content. |
+
+## API
+
+### `GET /api/search`
+
+| Parameter | Required | Description |
+|---|---|---|
+| `q` | yes | Search query |
+| `client` | no | Tenant filter (e.g. `acme-corp`). Omitted ⇒ search runs across all tenants. |
+
+```bash
+curl "http://localhost:8888/api/search?q=how+do+I+sign+in+to+the+VPN&client=acme-corp"
+```
+
+```json
+{
+  "query": "how do I sign in to the VPN",
+  "client": "acme-corp",
+  "answer": "Okta credentials with Duo Push for 2FA",
+  "timings": {
+    "encode_query_s": 0.12,
+    "maxsim_s": 0.003,
+    "docvqa_s": 0.91,
+    "ocr_snippet_s": 0.84
+  },
+  "results": [
+    {
+      "page_id": "ACME-101",
+      "client": "acme-corp",
+      "title": "VPN setup for new engineers",
+      "space": "Engineering",
+      "author": "alice@acme",
+      "web_url": "https://acme.atlassian.net/wiki/spaces/ENG/pages/101",
+      "page_image": "/pages/ACME-101.png",
+      "ocr_snippet": "VPN Setup for New Engineers · ...",
+      "scores": { "maxsim": 14.44, "rerank": null }
+    }
+  ]
+}
+```
+
+### `GET /api/clients`, `GET /api/stats`
+
+Tenant list and runtime config (active models, rerank on/off, page count).
+
+## How it works
+
+```
+        ┌──────────────────────────────────────────────────────────────┐
+        │  ingest.py  (once per corpus)                                │
+        │  pages.json ─▶ render_pages.py ─▶ data/pages/*.png           │
+        │      ─▶ SIE.encode(ColQwen2.5, images, multivector)          │
+        │      ─▶ data/multivectors.npz + data/metadata.json           │
+        └──────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+        ┌──────────────────────────────────────────────────────────────┐
+        │  search.py / server.py  (per query)                          │
+        │  q ─▶ SIE.encode(ColQwen2.5, text, is_query=True)            │
+        │    ─▶ filter metadata by tenant                              │
+        │    ─▶ sie_sdk.scoring.maxsim → top_k_candidates              │
+        │    ─▶ [optional] SIE.score(Qwen3-VL-Reranker, q, images)     │
+        │    ─▶ SIE.extract(Florence-2-DocVQA, instruction=q,          │
+        │                   images=[top_page])  ⇒  textual answer      │
+        │    ─▶ SIE.extract(Florence-2-DocVQA, images=[top_page])      │
+        │                                       ⇒  OCR snippet (UI)   │
+        └──────────────────────────────────────────────────────────────┘
+```
+
+OCR is never on the score path. The visual reranker (when enabled) ranks
+over the same modality as retrieval, so layout cues survive both stages.
+
+The corpus is small enough that MaxSim runs in Python. For corpora beyond
+a few thousand pages, hand the multivectors to LanceDB or Vespa; the SIE
+calls stay the same.
+
+## Customize
+
+`config.yaml` is the single tuning surface:
+
+```yaml
+models:
+  retriever: "vidore/colqwen2.5-v0.2"      # smaller: vidore/colpali-v1.3-hf
+  docvqa: "mynkchaudhry/Florence-2-FT-DocVQA"
+  reranker: "Qwen/Qwen3-VL-Reranker-2B"    # used only when search.visual_rerank: true
+search:
+  top_k_candidates: 5
+  top_k_results: 3
+  visual_rerank: false
+  answer: true
+  ocr_snippet: true
+```
+
+Swap any model for another of the same kind from the
+[SIE model catalog](https://superlinked.com/models) and the rest of the
+pipeline stays unchanged.
+
+## Project layout
+
+```text
+examples/vision-doc-rag/
+├── config.yaml
+├── data/
+│   ├── fetch_dataset.py        # synthetic 3-tenant page corpus
+│   ├── render_pages.py         # pages.json → PNG screenshots
+│   ├── pages.json              # generated
+│   ├── pages/                  # generated PNGs
+│   ├── metadata.json           # generated by ingest
+│   └── multivectors.npz        # generated by ingest
+├── python/
+│   ├── ingest.py
+│   ├── search.py
+│   ├── server.py
+│   └── requirements.txt
+└── static/
+    └── index.html
+```
diff --git a/examples/vision-doc-rag/config.yaml b/examples/vision-doc-rag/config.yaml
new file mode 100644
index 00000000..8b35ffda
--- /dev/null
+++ b/examples/vision-doc-rag/config.yaml
@@ -0,0 +1,43 @@
+# SIE server (defaults to local Docker: docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default).
+# Override with SIE_CLUSTER_URL / SIE_API_KEY env vars when targeting a managed cluster.
+cluster:
+  url: "http://localhost:8080"
+  api_key: ""
+  gpu: ""                       # only set for managed multi-GPU clusters (e.g. "l4-spot"); ignored locally
+  provision_timeout_s: 600
+
+# Models. The retrieval signal is vision end-to-end: ColQwen2.5 reads each page
+# as an image and we late-interact (MaxSim) against the same model's text-side
+# embedding of the query. No OCR is involved in ranking, so charts, screenshots,
+# tables, and any other layout cue that wouldn't survive an OCR round-trip
+# still contributes to the score.
+#
+# DocVQA produces a textual answer for the top page. The model takes the page
+# image + the user's question (passed via `instruction`) and returns the answer
+# as an entity in the response — no separate LLM call needed.
+models:
+  retriever: "vidore/colqwen2.5-v0.2"
+  docvqa: "mynkchaudhry/Florence-2-FT-DocVQA"
+  # Optional second-stage cross-encoder rerank. Visual model so we don't have to
+  # collapse the page through OCR before reranking. Disabled by default while
+  # we wait for the cluster-side adapter fix to land:
+  #   https://github.com/superlinked/sie-internal/issues/1026
+  # Re-enable with search.visual_rerank: true once that ships.
+  reranker: "Qwen/Qwen3-VL-Reranker-2B"
+
+# Page rendering (used by data/render_pages.py to turn the synthetic page
+# corpus into PNGs; replace with pdf2image, screenshots, or your own files
+# for a real deployment).
+render:
+  width: 1024
+  height: 1280
+  body_font_size: 20
+  title_font_size: 30
+
+# Retrieval
+search:
+  top_k_candidates: 5           # how many pages survive MaxSim
+  top_k_results: 3              # how many pages return after optional rerank
+  visual_rerank: false          # see models.reranker note above
+  answer: true                  # run DocVQA on the top page for a textual answer
+  ocr_snippet: true             # OCR the top page for a display-only snippet in the UI
diff --git a/examples/vision-doc-rag/data/fetch_dataset.py b/examples/vision-doc-rag/data/fetch_dataset.py
new file mode 100644
index 00000000..eb901a6c
--- /dev/null
+++ b/examples/vision-doc-rag/data/fetch_dataset.py
@@ -0,0 +1,211 @@
+"""Synthetic multi-tenant page corpus.
+
+Three fictional clients, each with a handful of pages — engineering runbooks,
+HR policies, finance procedures. Small enough to encode in a minute on a warm
+GPU cluster, varied enough to make multi-tenant filtering and visual retrieval
+meaningful. Replace `PAGES` with your own pages (wiki export, Notion dump,
+PDF batch, etc.) to point the demo at real content.
+"""
+
+import json
+from pathlib import Path
+
+PAGES = [
+    # ── acme-corp: engineering ────────────────────────────────────────────
+    {
+        "client": "acme-corp",
+        "page_id": "ACME-101",
+        "title": "VPN setup for new engineers",
+        "space": "Engineering",
+        "author": "alice@acme",
+        "web_url": "https://acme.atlassian.net/wiki/spaces/ENG/pages/101",
+        "body": [
+            "All engineers need to connect through the corporate VPN to reach internal services.",
+            "We use Cisco AnyConnect on macOS and Windows, and the OpenConnect CLI on Linux.",
+            "Download the client from it.acme.com/vpn, then sign in with your Okta credentials.",
+            "Two-factor confirmation goes through Duo Push.",
+            "If you hit a TLS error on first connection, check that the device certificate from Jamf is installed.",
+            "For on-call rotations, request the always-on VPN profile from IT — it auto-reconnects after suspend.",
+        ],
+    },
+    {
+        "client": "acme-corp",
+        "page_id": "ACME-102",
+        "title": "On-call rotation and paging",
+        "space": "Engineering",
+        "author": "bob@acme",
+        "web_url": "https://acme.atlassian.net/wiki/spaces/ENG/pages/102",
+        "body": [
+            "Engineering on-call runs Monday to Monday handovers at 10:00 PT.",
+            "Primary takes the pager, secondary takes the laptop, both are paid the on-call stipend.",
+            "Pages route through PagerDuty; the escalation policy is primary -> secondary (15 min) -> manager.",
+            "During an incident open a Zoom bridge and a Slack channel named #inc-YYYYMMDD-summary.",
+            "Postmortems are due within five working days and live in the Incidents space.",
+        ],
+    },
+    {
+        "client": "acme-corp",
+        "page_id": "ACME-103",
+        "title": "Deploying to production with our CI/CD pipeline",
+        "space": "Engineering",
+        "author": "carol@acme",
+        "web_url": "https://acme.atlassian.net/wiki/spaces/ENG/pages/103",
+        "body": [
+            "We use GitHub Actions for CI and ArgoCD for delivery to Kubernetes.",
+            "Merging to main triggers a build, runs the test suite, pushes an image to ECR, and updates the staging manifest.",
+            "Production rollouts are gated by a manual approval in ArgoCD and require two reviewers from the service team.",
+            "Use the rolling strategy with maxSurge=25% by default.",
+            "Hotfix tags follow the pattern v1.2.3-hotfix.N and skip staging only with on-call approval recorded in the PR.",
+        ],
+    },
+    {
+        "client": "acme-corp",
+        "page_id": "ACME-104",
+        "title": "Local development setup",
+        "space": "Engineering",
+        "author": "dan@acme",
+        "web_url": "https://acme.atlassian.net/wiki/spaces/ENG/pages/104",
+        "body": [
+            "Install mise to manage runtimes — it pins Node, Python, and Go versions per repo.",
+            "Run `mise install` in the repo root, then `make dev` to spin up Postgres, Redis, and the API gateway in Docker.",
+            "The seed data covers the last 30 days of staging traffic, sanitized of PII.",
+            "If port 5432 is already taken, override DEV_PG_PORT in your shell profile.",
+        ],
+    },
+    # ── globex: HR and admin ──────────────────────────────────────────────
+    {
+        "client": "globex",
+        "page_id": "GLOBEX-201",
+        "title": "Time off and vacation policy",
+        "space": "HR",
+        "author": "hr@globex",
+        "web_url": "https://globex.atlassian.net/wiki/spaces/HR/pages/201",
+        "body": [
+            "Globex offers 25 working days of paid vacation per year, accruing monthly from the start date.",
+            "Requests go through Workday at least two weeks in advance for absences longer than three days.",
+            "Sick leave is separate and uncapped, but anything over three consecutive days requires a doctor's note.",
+            "Parental leave is 18 weeks at full pay for the primary caregiver and 6 weeks for the secondary, regardless of gender.",
+            "Unused vacation rolls over up to 10 days into the next calendar year; the rest is paid out.",
+        ],
+    },
+    {
+        "client": "globex",
+        "page_id": "GLOBEX-202",
+        "title": "Expense reports and reimbursement",
+        "space": "HR",
+        "author": "finance@globex",
+        "web_url": "https://globex.atlassian.net/wiki/spaces/HR/pages/202",
+        "body": [
+            "Submit expenses in Expensify within 30 days of the transaction.",
+            "Receipts are mandatory for any item over $25; below that, a description and category are enough.",
+            "Travel bookings should go through Navan when possible — direct bookings need pre-approval from your manager.",
+            "Reimbursements process every Friday and land in your payroll account the following Tuesday.",
+            "Per diem for international travel is $80 USD equivalent for meals.",
+        ],
+    },
+    {
+        "client": "globex",
+        "page_id": "GLOBEX-203",
+        "title": "Office perks and meals",
+        "space": "HR",
+        "author": "office@globex",
+        "web_url": "https://globex.atlassian.net/wiki/spaces/HR/pages/203",
+        "body": [
+            "Lunch is catered Monday through Thursday in the main cafe from 12:00 to 14:00.",
+            "There are always vegetarian, vegan, and gluten-free options labeled at the buffet.",
+            "Friday is a free-lunch credit you can spend at any partner restaurant in the office app.",
+            "Snacks and drinks in the micro-kitchens are unlimited; please refill empty trays.",
+            "The wellness stipend is $100 per month, claimable in Expensify under category Wellness.",
+        ],
+    },
+    {
+        "client": "globex",
+        "page_id": "GLOBEX-204",
+        "title": "Office Wi-Fi and guest network",
+        "space": "IT",
+        "author": "it@globex",
+        "web_url": "https://globex.atlassian.net/wiki/spaces/IT/pages/204",
+        "body": [
+            "Connect to Globex-Corp for the employee network; sign in with your @globex.com SSO.",
+            "Globex-Guest is for visitors — the rotating daily password is on the lobby screen.",
+            "Printing requires the Globex-Print network and a one-time pairing with your laptop using the Mobility Print app.",
+            "If your laptop will not join, forget the network and rejoin; the cert is renewed weekly and old caches get stuck.",
+        ],
+    },
+    # ── initech: finance and compliance ───────────────────────────────────
+    {
+        "client": "initech",
+        "page_id": "INIT-301",
+        "title": "SOX controls and quarterly attestation",
+        "space": "Compliance",
+        "author": "compliance@initech",
+        "web_url": "https://initech.atlassian.net/wiki/spaces/COMP/pages/301",
+        "body": [
+            "Initech is subject to SOX 404 reporting for financial controls over revenue, expense, and access management.",
+            "Every quarter, control owners attest in AuditBoard that their controls operated as designed.",
+            "Evidence is automatically collected from Workday, NetSuite, and Okta where possible; manual evidence goes in the AuditBoard Drive folder.",
+            "External auditors test a sample of controls in Q3; expect requests for screenshots and approver lists.",
+            "Exceptions must be logged within five business days of detection.",
+        ],
+    },
+    {
+        "client": "initech",
+        "page_id": "INIT-302",
+        "title": "Vendor onboarding and due diligence",
+        "space": "Procurement",
+        "author": "procurement@initech",
+        "web_url": "https://initech.atlassian.net/wiki/spaces/PROC/pages/302",
+        "body": [
+            "New vendors above $50,000 annual spend require a security review and a SOC 2 Type II report on file.",
+            "Submit the vendor questionnaire through Vanta; legal will review the MSA within five business days.",
+            "Payment terms default to Net 60; faster terms require CFO approval and reduce the risk score in NetSuite.",
+            "Sanctioned-country checks run automatically via the OFAC integration; any hit halts the workflow until cleared.",
+            "Annual recertification of high-risk vendors happens every January.",
+        ],
+    },
+    {
+        "client": "initech",
+        "page_id": "INIT-303",
+        "title": "Audit prep checklist",
+        "space": "Compliance",
+        "author": "audit@initech",
+        "web_url": "https://initech.atlassian.net/wiki/spaces/COMP/pages/303",
+        "body": [
+            "Two weeks before the auditors arrive, freeze the control population in AuditBoard and export the evidence index.",
+            "Confirm with control owners that they will be available for walkthrough interviews — block 60 minutes in their calendars.",
+            "Pull the user access review reports for the prior two quarters from Okta and confirm sign-off in writing.",
+            "Have the change management JIRA queries ready: filter by label sox-relevant and status Done.",
+            "If a control failed mid-period, document the compensating control and the date the gap was closed.",
+        ],
+    },
+    {
+        "client": "initech",
+        "page_id": "INIT-304",
+        "title": "Procurement card limits and exceptions",
+        "space": "Procurement",
+        "author": "procurement@initech",
+        "web_url": "https://initech.atlassian.net/wiki/spaces/PROC/pages/304",
+        "body": [
+            "Procurement cards (P-cards) have a default monthly limit of $5,000 and a single-transaction limit of $1,500.",
+            "Use them for low-dollar, low-risk purchases — software subscriptions and conference tickets are the common cases.",
+            "Limit-increase requests need manager and CFO approval and a documented business need.",
+            "Personal use, cash advances, and split transactions to bypass the single-transaction limit are policy violations.",
+            "All P-card transactions reconcile in Coupa within 14 days of statement close.",
+        ],
+    },
+]
+
+
+def main():
+    out = Path(__file__).resolve().parent / "pages.json"
+    out.write_text(json.dumps(PAGES, indent=2))
+    by_client = {}
+    for p in PAGES:
+        by_client[p["client"]] = by_client.get(p["client"], 0) + 1
+    print(f"Wrote {len(PAGES)} pages to {out}")
+    for client, n in sorted(by_client.items()):
+        print(f"  {client}: {n} pages")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/data/render_pages.py b/examples/vision-doc-rag/data/render_pages.py
new file mode 100644
index 00000000..4043d71b
--- /dev/null
+++ b/examples/vision-doc-rag/data/render_pages.py
@@ -0,0 +1,106 @@
+"""Render the synthetic pages to PNG screenshots.
+
+Each entry in pages.json becomes one image in data/pages/<page_id>.png. The
+layout is intentionally plain — a title, a metadata line, and a body block —
+so ColQwen2.5 sees the same kind of visual structure it would in real wikis,
+docs, or PDFs. Replace this script with `pdf2image` (or screenshots) when
+pointing at real content.
+"""
+
+import json
+import sys
+from pathlib import Path
+
+import yaml
+from PIL import Image, ImageDraw, ImageFont
+
+
+def _font(size: int):
+    """Try the platform Helvetica, fall back to PIL's default bitmap font."""
+    for path in [
+        "/System/Library/Fonts/Helvetica.ttc",
+        "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
+        "/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf",
+    ]:
+        if Path(path).exists():
+            return ImageFont.truetype(path, size)
+    return ImageFont.load_default()
+
+
+def _wrap(text: str, font: ImageFont.ImageFont, max_width: int) -> list[str]:
+    """Greedy word wrap so body paragraphs fit the page width."""
+    lines: list[str] = []
+    for paragraph in text.split("\n"):
+        words = paragraph.split()
+        current = ""
+        for word in words:
+            candidate = f"{current} {word}".strip()
+            if font.getlength(candidate) <= max_width:
+                current = candidate
+            else:
+                if current:
+                    lines.append(current)
+                current = word
+        if current:
+            lines.append(current)
+    return lines
+
+
+def render_page(page: dict, width: int, height: int, body_size: int, title_size: int) -> Image.Image:
+    img = Image.new("RGB", (width, height), "white")
+    draw = ImageDraw.Draw(img)
+    title_font = _font(title_size)
+    meta_font = _font(int(body_size * 0.9))
+    body_font = _font(body_size)
+
+    margin = 48
+    cursor_y = margin
+    draw.text((margin, cursor_y), page["title"], fill="black", font=title_font)
+    cursor_y += int(title_size * 1.6)
+    meta = f"{page['space']}  ·  {page['author']}  ·  {page['page_id']}"
+    draw.text((margin, cursor_y), meta, fill=(96, 96, 96), font=meta_font)
+    cursor_y += int(title_size * 1.2)
+    draw.line([(margin, cursor_y), (width - margin, cursor_y)], fill=(200, 200, 200), width=2)
+    cursor_y += int(body_size * 1.2)
+
+    max_text_width = width - 2 * margin
+    line_gap = int(body_size * 1.5)
+    for bullet in page["body"]:
+        # Render each body line as a wrapped paragraph block.
+        lines = _wrap(bullet, body_font, max_text_width)
+        for line in lines:
+            draw.text((margin, cursor_y), line, fill="black", font=body_font)
+            cursor_y += line_gap
+        cursor_y += int(line_gap * 0.4)  # paragraph spacing
+
+    return img
+
+
+def main():
+    here = Path(__file__).resolve().parent
+    pages_path = here / "pages.json"
+    if not pages_path.exists():
+        print("pages.json not found; run fetch_dataset.py first", file=sys.stderr)
+        sys.exit(1)
+    config = yaml.safe_load((here.parent / "config.yaml").read_text())
+    render = config["render"]
+    out_dir = here / "pages"
+    out_dir.mkdir(exist_ok=True)
+
+    pages = json.loads(pages_path.read_text())
+    for p in pages:
+        img = render_page(
+            p,
+            width=render["width"],
+            height=render["height"],
+            body_size=render["body_font_size"],
+            title_size=render["title_font_size"],
+        )
+        out = out_dir / f"{p['page_id']}.png"
+        img.save(out)
+        print(f"  {p['client']:10s}  {p['page_id']:10s}  ->  {out.relative_to(here.parent)}")
+    print(f"Rendered {len(pages)} pages to {out_dir}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/ingest.py b/examples/vision-doc-rag/python/ingest.py
new file mode 100644
index 00000000..15607f30
--- /dev/null
+++ b/examples/vision-doc-rag/python/ingest.py
@@ -0,0 +1,119 @@
+"""Build the per-tenant visual index.
+
+For every page PNG we ask SIE to encode the image with vidore/colqwen2.5-v0.2,
+which returns a [tokens, 128] multivector. Each page's multivector goes into a
+single .npz on disk, alongside a metadata.json that keeps the client name,
+page ID, title, and source URL for routing and filtering at query time.
+
+There is no vector database here. MaxSim at the scale of one team's wiki
+(hundreds to thousands of pages) is cheap and avoids the indexing step.
+For larger corpora swap the .npz for a multivector store (LanceDB, Vespa,
+Turbopuffer); the encode call is the same.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import time
+from pathlib import Path
+
+import numpy as np
+import yaml
+
+from sie_sdk import SIEClient
+from sie_sdk.types import Item
+
+
+def load_config():
+    return yaml.safe_load((Path(__file__).resolve().parent.parent / "config.yaml").read_text())
+
+
+def load_pages():
+    pages_path = Path(__file__).resolve().parent.parent / "data" / "pages.json"
+    if not pages_path.exists():
+        raise FileNotFoundError(
+            "data/pages.json not found. Run `python data/fetch_dataset.py` "
+            "and `python data/render_pages.py` first."
+        )
+    return json.loads(pages_path.read_text())
+
+
+def encode_pages(client: SIEClient, model: str, pages: list[dict], gpu: str, timeout: float):
+    pages_dir = Path(__file__).resolve().parent.parent / "data" / "pages"
+    multivectors: list[np.ndarray] = []
+    metadata: list[dict] = []
+
+    for i, page in enumerate(pages, 1):
+        image_path = pages_dir / f"{page['page_id']}.png"
+        if not image_path.exists():
+            raise FileNotFoundError(f"Missing page image: {image_path}. Run data/render_pages.py.")
+
+        start = time.time()
+        result = client.encode(
+            model,
+            Item(id=page["page_id"], images=[str(image_path)]),
+            output_types=["multivector"],
+            gpu=gpu,
+            wait_for_capacity=True,
+            provision_timeout_s=timeout,
+        )
+        elapsed = time.time() - start
+        mv = result["multivector"].astype(np.float32)
+        multivectors.append(mv)
+        metadata.append(
+            {
+                "page_id": page["page_id"],
+                "client": page["client"],
+                "title": page["title"],
+                "space": page["space"],
+                "author": page["author"],
+                "web_url": page["web_url"],
+                "image_path": str(image_path.relative_to(image_path.parent.parent.parent)),
+                "num_tokens": int(mv.shape[0]),
+            }
+        )
+        print(f"  [{i}/{len(pages)}] {page['page_id']:10s} {page['client']:10s} {mv.shape} in {elapsed:.1f}s")
+
+    return multivectors, metadata
+
+
+def main():
+    config = load_config()
+    pages = load_pages()
+    print(f"Loaded {len(pages)} pages")
+
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+    gpu = config["cluster"]["gpu"]
+    timeout = config["cluster"]["provision_timeout_s"]
+    model = config["models"]["retriever"]
+
+    print(f"\n--- Encoding pages with {model} ---")
+    with SIEClient(cluster_url, api_key=api_key) as client:
+        multivectors, metadata = encode_pages(client, model, pages, gpu, timeout)
+
+    data_dir = Path(__file__).resolve().parent.parent / "data"
+    # np.savez stores variable-length multivectors as one entry per array; we
+    # key them by page_id so the search side can reload without an extra index.
+    np.savez(
+        data_dir / "multivectors.npz",
+        **{m["page_id"]: mv for m, mv in zip(metadata, multivectors)},
+    )
+    (data_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
+
+    total_tokens = sum(m["num_tokens"] for m in metadata)
+    by_client: dict[str, int] = {}
+    for m in metadata:
+        by_client[m["client"]] = by_client.get(m["client"], 0) + 1
+
+    print(f"\n  Saved {len(metadata)} multivectors to data/multivectors.npz")
+    print("  Saved metadata to data/metadata.json")
+    print(f"  Total visual tokens: {total_tokens}")
+    print("  Pages per tenant:")
+    for client_name in sorted(by_client):
+        print(f"    {client_name}: {by_client[client_name]}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/requirements.txt b/examples/vision-doc-rag/python/requirements.txt
new file mode 100644
index 00000000..bd32dcbc
--- /dev/null
+++ b/examples/vision-doc-rag/python/requirements.txt
@@ -0,0 +1,6 @@
+sie-sdk==0.1.10
+fastapi>=0.115.0
+uvicorn>=0.30.0
+numpy>=1.26.0
+pyyaml>=6.0
+Pillow>=10.3.0
diff --git a/examples/vision-doc-rag/python/search.py b/examples/vision-doc-rag/python/search.py
new file mode 100644
index 00000000..52dd2211
--- /dev/null
+++ b/examples/vision-doc-rag/python/search.py
@@ -0,0 +1,243 @@
+"""Visual document search + question answering, vision end-to-end.
+
+Pipeline per query:
+  1. encode(ColQwen2.5, text)          — query multivector
+  2. sie_sdk.scoring.maxsim             — late interaction against page images
+  3. score(Qwen3-VL-Reranker, query, images)   — optional, off by default
+  4. extract(Florence-2-FT-DocVQA, instruction=query, images=[top page])
+                                        — textual answer + citation
+  5. extract(Florence-2-FT-DocVQA, images=[top page])
+                                        — OCR snippet for the UI (display only,
+                                          NOT in the ranking path)
+
+The ranking is decided by a vision model looking at the page image, so charts,
+screenshots, tables, and any other visual signal that OCR would erase still
+contributes. OCR runs only on the chosen page, only to provide on-screen text
+the user can read or copy.
+
+Multi-tenant isolation is a Python filter on metadata before MaxSim, so a
+query scoped to one client never sees another client's pages.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import time
+from pathlib import Path
+
+import numpy as np
+import yaml
+
+from sie_sdk import SIEClient
+from sie_sdk.scoring import maxsim
+from sie_sdk.types import Item
+
+
+def load_config():
+    return yaml.safe_load((Path(__file__).resolve().parent.parent / "config.yaml").read_text())
+
+
+def load_index():
+    data_dir = Path(__file__).resolve().parent.parent / "data"
+    if not (data_dir / "multivectors.npz").exists():
+        raise FileNotFoundError("data/multivectors.npz missing. Run `python python/ingest.py` first.")
+    npz = np.load(data_dir / "multivectors.npz")
+    metadata = json.loads((data_dir / "metadata.json").read_text())
+    multivectors = {m["page_id"]: npz[m["page_id"]] for m in metadata}
+    return multivectors, metadata
+
+
+def _ocr_snippet(entities: list[dict], max_chars: int = 400) -> str:
+    """Concatenate OCR text regions into a single readable snippet."""
+    pieces = []
+    for e in entities or []:
+        text = (e.get("text") or "").replace("</s>", "").strip()
+        if text:
+            pieces.append(text)
+    joined = " · ".join(pieces)
+    if len(joined) > max_chars:
+        return joined[: max_chars - 1] + "…"
+    return joined
+
+
+def _docvqa_answer(entities: list[dict]) -> str:
+    """Pick the answer string out of a Florence-2 DocVQA response.
+
+    Florence-2 returns the answer as an entity (often the single one when the
+    `<DocVQA>` task token is dispatched). We take the first non-empty text.
+    """
+    for e in entities or []:
+        text = (e.get("text") or "").replace("</s>", "").strip()
+        if text:
+            return text
+    return ""
+
+
+def search(
+    client: SIEClient,
+    config: dict,
+    multivectors: dict[str, np.ndarray],
+    metadata: list[dict],
+    query: str,
+    client_filter: str | None = None,
+) -> dict:
+    gpu = config["cluster"]["gpu"]
+    timeout = config["cluster"]["provision_timeout_s"]
+    top_k_candidates = config["search"]["top_k_candidates"]
+    top_k_results = config["search"]["top_k_results"]
+    do_visual_rerank = config["search"].get("visual_rerank", False)
+    do_answer = config["search"].get("answer", True)
+    do_ocr_snippet = config["search"].get("ocr_snippet", True)
+
+    corpus = [m for m in metadata if not client_filter or m["client"] == client_filter]
+    if not corpus:
+        return {"results": [], "answer": None, "timings": {}}
+
+    timings: dict[str, float] = {}
+    pages_root = Path(__file__).resolve().parent.parent  # metadata image_path is relative to the example root ("data/pages/<id>.png")
+
+    # 1. Encode query (text side of ColQwen2.5).
+    t0 = time.time()
+    q_result = client.encode(
+        config["models"]["retriever"],
+        Item(text=query),
+        output_types=["multivector"],
+        is_query=True,
+        gpu=gpu,
+        wait_for_capacity=True,
+        provision_timeout_s=timeout,
+    )
+    timings["encode_query_s"] = round(time.time() - t0, 3)
+    query_mv = q_result["multivector"].astype(np.float32)
+
+    # 2. MaxSim against in-memory multivectors.
+    doc_mvs = [multivectors[m["page_id"]] for m in corpus]
+    t0 = time.time()
+    maxsim_scores = maxsim(query_mv, doc_mvs)
+    timings["maxsim_s"] = round(time.time() - t0, 3)
+
+    order = np.argsort(maxsim_scores)[::-1][:top_k_candidates]
+    candidates: list[dict] = []
+    for idx in order:
+        c = dict(corpus[idx])
+        c["_maxsim_score"] = float(maxsim_scores[idx])
+        c["_rerank_score"] = None
+        candidates.append(c)
+
+    # 3. Optional visual rerank. Image-in cross-encoder so OCR never enters the
+    #    ranking path. Disabled by default — see config.yaml for the cluster
+    #    bug we're waiting on.
+    if do_visual_rerank and candidates:
+        try:
+            t0 = time.time()
+            rerank_items = [
+                Item(id=c["page_id"], images=[str(pages_root / c["image_path"])])
+                for c in candidates
+            ]
+            rerank = client.score(
+                config["models"]["reranker"],
+                Item(text=query),
+                rerank_items,
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["visual_rerank_s"] = round(time.time() - t0, 3)
+            rerank_by_id = {s["item_id"]: s for s in rerank["scores"]}
+            for c in candidates:
+                s = rerank_by_id.get(c["page_id"])
+                c["_rerank_score"] = float(s["score"]) if s else 0.0
+            candidates.sort(key=lambda c: c["_rerank_score"] or 0.0, reverse=True)
+        except Exception as exc:
+            # Cluster adapter bug fallback: keep MaxSim ordering, surface the
+            # failure to the caller. See sie-internal#1026.
+            timings["visual_rerank_error"] = type(exc).__name__
+
+    results = candidates[:top_k_results]
+
+    # 4. DocVQA answer from the top page image. instruction= goes in as the
+    #    plain question; the adapter prepends Florence-2's `<DocVQA>` task
+    #    token. See superlinked.com/docs/extract/vision.
+    answer = None
+    if do_answer and results:
+        top = results[0]
+        try:
+            t0 = time.time()
+            qa = client.extract(
+                config["models"]["docvqa"],
+                Item(images=[str(pages_root / top["image_path"])]),
+                instruction=query,
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["docvqa_s"] = round(time.time() - t0, 3)
+            answer = _docvqa_answer(qa[0].get("entities", []) if qa else [])
+        except Exception as exc:
+            timings["docvqa_error"] = type(exc).__name__
+
+    # 5. OCR snippet for display — only on the top result so users see the
+    #    text on the page they're being shown. Never used as a ranking signal.
+    if do_ocr_snippet and results:
+        top = results[0]
+        try:
+            t0 = time.time()
+            ocr = client.extract(
+                config["models"]["docvqa"],   # same model, no `instruction` ⇒ OCR mode
+                Item(images=[str(pages_root / top["image_path"])]),
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["ocr_snippet_s"] = round(time.time() - t0, 3)
+            top["ocr_snippet"] = _ocr_snippet(ocr[0].get("entities", []) if ocr else [])
+        except Exception as exc:
+            timings["ocr_snippet_error"] = type(exc).__name__
+
+    return {"results": results, "answer": answer, "timings": timings}
+
+
+def print_run(out: dict, query: str, client_filter: str | None):
+    scope = client_filter or "all clients"
+    print(f'\n  Query: "{query}"  ({scope})')
+    print(f"  Timings: {out['timings']}")
+    if out["answer"]:
+        print(f"\n  Answer: {out['answer']}")
+    if not out["results"]:
+        print("  No results.")
+        return
+    for i, r in enumerate(out["results"], 1):
+        rerank = r.get("_rerank_score")
+        rerank_str = f"rerank={rerank:.4f}" if rerank is not None else "rerank=—"
+        print(f"\n  {i}. [{r['client']}] {r['title']}")
+        print(f"     {r['page_id']}  ·  {r['space']}  ·  {r['author']}")
+        print(f"     maxsim={r['_maxsim_score']:.3f}  {rerank_str}")
+        if r.get("ocr_snippet"):
+            print(f"     OCR snippet: {r['ocr_snippet'][:200]}")
+        print(f"     url: {r['web_url']}")
+
+
+def main():
+    config = load_config()
+    multivectors, metadata = load_index()
+    print(f"Loaded index: {len(metadata)} pages")
+
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+
+    demo = [
+        ("how do I sign in to the VPN?", "acme-corp"),
+        ("what is the parental leave policy?", "globex"),
+        ("audit prep evidence and walkthroughs", "initech"),
+        # No tenant filter: shows the query routes across tenants.
+        ("expense reports and per diem", None),
+    ]
+    with SIEClient(cluster_url, api_key=api_key) as client:
+        for query, tenant in demo:
+            out = search(client, config, multivectors, metadata, query, tenant)
+            print_run(out, query, tenant)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/server.py b/examples/vision-doc-rag/python/server.py
new file mode 100644
index 00000000..d61e5962
--- /dev/null
+++ b/examples/vision-doc-rag/python/server.py
@@ -0,0 +1,96 @@
+"""FastAPI backend for the multi-tenant visual-document search + QA demo."""
+
+from __future__ import annotations
+
+import os
+from contextlib import asynccontextmanager
+from pathlib import Path
+
+import yaml
+from fastapi import FastAPI, Query
+from fastapi.responses import FileResponse
+from fastapi.staticfiles import StaticFiles
+
+from sie_sdk import SIEClient
+
+from search import load_index, search
+
+config = None
+multivectors = None
+metadata = None
+client = None
+clients_index: list[str] = []
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    global config, multivectors, metadata, client, clients_index
+    root = Path(__file__).resolve().parent.parent
+    config = yaml.safe_load((root / "config.yaml").read_text())
+    multivectors, metadata = load_index()
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+    client = SIEClient(cluster_url, api_key=api_key)
+    clients_index = sorted({m["client"] for m in metadata})
+    yield
+    client.close()
+
+
+app = FastAPI(title="SIE Vision-First Document RAG", lifespan=lifespan)
+
+root = Path(__file__).resolve().parent.parent
+static_dir = root / "static"
+app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
+app.mount("/pages", StaticFiles(directory=str(root / "data" / "pages")), name="pages")
+
+
+@app.get("/")
+def index():
+    return FileResponse(str(static_dir / "index.html"))
+
+
+@app.get("/api/clients")
+def api_clients():
+    return clients_index
+
+
+@app.get("/api/stats")
+def api_stats():
+    return {
+        "total_pages": len(metadata),
+        "clients": clients_index,
+        "models": config["models"],
+        "visual_rerank": config["search"].get("visual_rerank", False),
+        "answer": config["search"].get("answer", True),
+    }
+
+
+@app.get("/api/search")
+def api_search(
+    q: str = Query(..., min_length=1),
+    client_name: str | None = Query(None, alias="client"),
+):
+    out = search(client, config, multivectors, metadata, q, client_name)
+    return {
+        "query": q,
+        "client": client_name,
+        "answer": out["answer"],
+        "timings": out["timings"],
+        "results": [
+            {
+                "page_id": r["page_id"],
+                "client": r["client"],
+                "title": r["title"],
+                "space": r["space"],
+                "author": r["author"],
+                "web_url": r["web_url"],
+                "page_image": f"/pages/{r['page_id']}.png",
+                "ocr_snippet": r.get("ocr_snippet", ""),
+                "scores": {
+                    "maxsim": round(r["_maxsim_score"], 4),
+                    "rerank": round(r["_rerank_score"], 4) if r.get("_rerank_score") is not None else None,
+                },
+            }
+            for r in out["results"]
+        ],
+    }
diff --git a/examples/vision-doc-rag/static/index.html b/examples/vision-doc-rag/static/index.html
new file mode 100644
index 00000000..392c8791
--- /dev/null
+++ b/examples/vision-doc-rag/static/index.html
@@ -0,0 +1,190 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <title>Vision-First Document RAG · SIE</title>
+    <style>
+      :root {
+        color-scheme: light;
+        --fg: #0f172a;
+        --muted: #475569;
+        --bg: #f8fafc;
+        --card: #ffffff;
+        --border: #e2e8f0;
+        --accent: #0ea5e9;
+      }
+      * { box-sizing: border-box; }
+      body {
+        font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Inter, system-ui, sans-serif;
+        margin: 0;
+        background: var(--bg);
+        color: var(--fg);
+      }
+      header {
+        padding: 24px 32px;
+        border-bottom: 1px solid var(--border);
+        background: var(--card);
+      }
+      h1 { margin: 0 0 4px 0; font-size: 20px; }
+      header p { margin: 0; color: var(--muted); font-size: 14px; }
+      main { padding: 24px 32px; max-width: 1200px; margin: 0 auto; }
+      form {
+        display: flex;
+        gap: 8px;
+        margin-bottom: 24px;
+        flex-wrap: wrap;
+      }
+      select, input[type=text], button {
+        font: inherit;
+        padding: 10px 14px;
+        border: 1px solid var(--border);
+        border-radius: 8px;
+        background: var(--card);
+      }
+      input[type=text] { flex: 1; min-width: 280px; }
+      button {
+        background: var(--accent);
+        color: white;
+        border-color: var(--accent);
+        cursor: pointer;
+      }
+      button:hover { background: #0284c7; }
+      .stats { color: var(--muted); font-size: 13px; margin-bottom: 16px; }
+      .answer-card {
+        padding: 16px 20px;
+        margin-bottom: 20px;
+        background: #f0fdf4;
+        border: 1px solid #bbf7d0;
+        border-radius: 12px;
+      }
+      .answer-card .label {
+        font-size: 11px;
+        text-transform: uppercase;
+        letter-spacing: 0.08em;
+        color: #15803d;
+        font-weight: 600;
+      }
+      .answer-card .text { font-size: 16px; line-height: 1.5; margin-top: 4px; }
+      .result {
+        display: grid;
+        grid-template-columns: 220px 1fr;
+        gap: 20px;
+        padding: 20px;
+        background: var(--card);
+        border: 1px solid var(--border);
+        border-radius: 12px;
+        margin-bottom: 16px;
+      }
+      .result img {
+        width: 100%;
+        border: 1px solid var(--border);
+        border-radius: 8px;
+        cursor: zoom-in;
+      }
+      .title { font-size: 16px; font-weight: 600; margin: 0 0 4px 0; }
+      .meta { font-size: 13px; color: var(--muted); margin-bottom: 8px; }
+      .scores {
+        font-family: ui-monospace, SFMono-Regular, monospace;
+        font-size: 12px;
+        color: var(--muted);
+        margin-bottom: 10px;
+      }
+      .snippet {
+        font-size: 14px;
+        line-height: 1.5;
+        color: var(--fg);
+        background: var(--bg);
+        padding: 10px 12px;
+        border-radius: 8px;
+        border: 1px solid var(--border);
+      }
+      .empty, .loading { color: var(--muted); padding: 12px 0; }
+      .tag {
+        display: inline-block;
+        padding: 2px 8px;
+        background: #e0f2fe;
+        color: #075985;
+        border-radius: 999px;
+        font-size: 12px;
+        font-weight: 500;
+        margin-right: 6px;
+      }
+    </style>
+  </head>
+  <body>
+    <header>
+      <h1>Multi-Tenant Visual Doc Search + QA</h1>
+      <p>ColQwen2.5 ranks pages by looking at the images. Florence-2-DocVQA reads the top page and answers the question. All on one SIE endpoint.</p>
+    </header>
+    <main>
+      <form id="searchForm">
+        <select id="clientSel"><option value="">All clients</option></select>
+        <input id="q" type="text" placeholder="e.g. how do I sign in to the VPN?" autofocus />
+        <button type="submit">Search</button>
+      </form>
+      <div id="stats" class="stats"></div>
+      <div id="answer"></div>
+      <div id="results"></div>
+    </main>
+    <script>
+      const clientSel = document.getElementById("clientSel");
+      const form = document.getElementById("searchForm");
+      const q = document.getElementById("q");
+      const resultsEl = document.getElementById("results");
+      const answerEl = document.getElementById("answer");
+      const statsEl = document.getElementById("stats");
+
+      async function loadStats() {
+        const r = await fetch("/api/stats").then(r => r.json());
+        for (const c of r.clients) {
+          const opt = document.createElement("option");
+          opt.value = c;
+          opt.textContent = c;
+          clientSel.appendChild(opt);
+        }
+        const rerank = r.visual_rerank ? "on" : "off";
+        statsEl.textContent =
+          `${r.total_pages} pages · ${r.clients.length} clients · ` +
+          `retriever=${r.models.retriever} · docvqa=${r.models.docvqa} · visual rerank=${rerank}`;
+      }
+
+      form.addEventListener("submit", async (e) => {
+        e.preventDefault();
+        const query = q.value.trim();
+        if (!query) return;
+        answerEl.innerHTML = "";
+        resultsEl.innerHTML = `<div class="loading">Searching…</div>`;
+        const params = new URLSearchParams({ q: query });
+        if (clientSel.value) params.set("client", clientSel.value);
+        const res = await fetch(`/api/search?${params}`).then(r => r.json());
+        if (res.answer) {
+          answerEl.innerHTML = `
+            <div class="answer-card">
+              <div class="label">Answer (Florence-2-DocVQA)</div>
+              <div class="text">${res.answer.replace(/</g, "&lt;")}</div>
+            </div>`;
+        }
+        if (!res.results.length) {
+          resultsEl.innerHTML = `<div class="empty">No results.</div>`;
+          return;
+        }
+        resultsEl.innerHTML = res.results.map(r => {
+          const rerank = r.scores.rerank == null ? "—" : r.scores.rerank;
+          return `
+          <div class="result">
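+            <a href="${r.page_image}" target="_blank"><img src="${r.page_image}" alt="${String(r.title).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;")}"/></a>
+            <div>
+              <div class="title">${String(r.title).replace(/&/g, "&amp;").replace(/</g, "&lt;")}</div>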
+              <div class="meta"><span class="tag">${r.client}</span> ${r.space} · ${r.author} · ${r.page_id}</div>
+              <div class="scores">maxsim=${r.scores.maxsim}   rerank=${rerank}</div>
+              ${r.ocr_snippet ? `<div class="snippet">${r.ocr_snippet.replace(/</g, "&lt;")}</div>` : ""}
+            </div>
+          </div>`;
+        }).join("");
+      });
+
+      loadStats();
+    </script>
+  </body>
+</html>

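Reviewer note on step 2 of the `search.py` pipeline (tenant filter, then late-interaction MaxSim): the sketch below is a plain-NumPy illustration of that idea. It is not the implementation behind `sie_sdk.scoring.maxsim`, whose internals are not shown in this diff. `maxsim_score`, `rank_for_tenant`, and the toy vectors are hypothetical names and data for illustration only; the sketch assumes the score is the sum, over query tokens, of the best dot product against any page token.

```python
import numpy as np


def maxsim_score(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """Late-interaction score: best-matching doc token per query token, summed."""
    sim = query_mv @ doc_mv.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())


def rank_for_tenant(query_mv, multivectors, metadata, tenant=None, top_k=3):
    """Filter pages by tenant first, then rank only the surviving pages."""
    corpus = [m for m in metadata if tenant is None or m["client"] == tenant]
    scored = [
        (maxsim_score(query_mv, multivectors[m["page_id"]]), m["page_id"])
        for m in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data (assumption): 2-dim token vectors, pages with different token counts.
query_mv = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
multivectors = {
    "a1": np.array([[1.0, 0.0], [0.0, 0.5]], dtype=np.float32),
    "g1": np.array([[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]], dtype=np.float32),
}
metadata = [
    {"page_id": "a1", "client": "acme-corp"},
    {"page_id": "g1", "client": "globex"},
]

scoped = rank_for_tenant(query_mv, multivectors, metadata, tenant="acme-corp")
unscoped = rank_for_tenant(query_mv, multivectors, metadata)
print(scoped)
print(unscoped)
```

Because the filter runs before any scoring, a scoped query never computes a similarity against another tenant's vectors, which is the isolation property the module docstring describes.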