diff --git a/README/WHATS_NEW_zh-CN.md b/README/WHATS_NEW_zh-CN.md
index 51f42407..04613662 100644
--- a/README/WHATS_NEW_zh-CN.md
+++ b/README/WHATS_NEW_zh-CN.md
@@ -1,5 +1,23 @@
 # 本次更新 — AutoControl
 
+## 本次更新 (2026-06-24) — 逐步评审特征 + 规则式步骤评分
+
+把为代理步骤评分所需的证据打包,并内建规则式评分器。完整参考:[`docs/source/Zh/doc/new_features/v177_features_doc.rst`](../docs/source/Zh/doc/new_features/v177_features_doc.rst)。
+
+- **`build_critic_record` / `score_step_rule_based` / `to_judge_prompt`**(`AC_build_critic_record`、`AC_score_step`):`trajectory_eval` 对整条轨迹评分而无逐步证据;`agent_trace` 发出 span 而非质量;`agent_replay` 保存步骤却不评分。本功能把 `action_effect` + `observation_delta` + `postcondition` 组合成单一逐步记录,接着 `score_step_rule_based` 给出确定性的 `{outcome, process_score, reasons}`(不需模型),`to_judge_prompt` 把它渲染给可选的 LLM-as-judge。纯标准库聚合器;不导入 `PySide6`。
+
+## 本次更新 (2026-06-24) — 标题与正文分类 + 文档大纲
+
+以高度区分标题与正文,并建立文档大纲。完整参考:[`docs/source/Zh/doc/new_features/v176_features_doc.rst`](../docs/source/Zh/doc/new_features/v176_features_doc.rst)。
+
+- **`classify_lines` / `outline`**(`AC_classify_lines`、`AC_outline`):框架中没有功能把行高对应到标题层级或建立章节大纲——`ocr/structure` / `element_parse` 纯属位置性,`text_blocks` 不排序。本功能套用标准启发法:行高超过 `heading_ratio` × 中位行高者为标题,不同标题高度成为层级(最高 = 1)。`classify_lines` 为每行标记 `{box, text, role, level}`;`outline` 依序返回标题作为目录。纯标准库,作用于行字典;不导入 `PySide6`。
+
+## 本次更新 (2026-06-24) — 变化量序列的稳定检测
+
+判断 UI 何时安定下来——以纯粹、可测试的函数作用于变化序列。完整参考:[`docs/source/Zh/doc/new_features/v175_features_doc.rst`](../docs/source/Zh/doc/new_features/v175_features_doc.rst)。
+
+- **`settle_point` / `is_settled` / `SettleTracker`**(`AC_settle_point`):`smart_waits.wait_until_screen_stable` 把稳定逻辑包在 `time.sleep` 循环内、作用于实时帧——你无法喂记录好的序列,也无法单元测试该决策。本功能把它抽离:给定一串*变化量*(像素差 / 元素数差 / 0-1 digest 是否变),在变化量连续 `quiet_samples` 次维持 ≤ `max_churn` 时报告稳定(尖峰重置 run)。`settle_point` 返回稳定索引,`SettleTracker` 为供实时循环的增量形式。纯标准库,不需时钟、不需捕获;不导入 `PySide6`。
+
 ## 本次更新 (2026-06-24) — OCR 行的段落与列表分组
 
 把 OCR 行分组成段落,并检测项目符号 / 编号列表。完整参考:[`docs/source/Zh/doc/new_features/v174_features_doc.rst`](../docs/source/Zh/doc/new_features/v174_features_doc.rst)。
diff --git a/README/WHATS_NEW_zh-TW.md b/README/WHATS_NEW_zh-TW.md
index 65ca68c9..a2cf33fd 100644
--- a/README/WHATS_NEW_zh-TW.md
+++ b/README/WHATS_NEW_zh-TW.md
@@ -1,5 +1,23 @@
 # 本次更新 — AutoControl
 
+## 本次更新 (2026-06-24) — 逐步評審特徵 + 規則式步驟評分
+
+把為代理步驟評分所需的證據打包,並內建規則式評分器。完整參考:[`docs/source/Zh/doc/new_features/v177_features_doc.rst`](../docs/source/Zh/doc/new_features/v177_features_doc.rst)。
+
+- **`build_critic_record` / `score_step_rule_based` / `to_judge_prompt`**(`AC_build_critic_record`、`AC_score_step`):`trajectory_eval` 對整條軌跡評分而無逐步證據;`agent_trace` 發出 span 而非品質;`agent_replay` 保存步驟卻不評分。本功能把 `action_effect` + `observation_delta` + `postcondition` 組合成單一逐步記錄,接著 `score_step_rule_based` 給出確定性的 `{outcome, process_score, reasons}`(不需模型),`to_judge_prompt` 把它渲染給可選的 LLM-as-judge。純標準函式庫聚合器;不匯入 `PySide6`。
+
+## 本次更新 (2026-06-24) — 標題與內文分類 + 文件大綱
+
+以高度區分標題與內文,並建立文件大綱。完整參考:[`docs/source/Zh/doc/new_features/v176_features_doc.rst`](../docs/source/Zh/doc/new_features/v176_features_doc.rst)。
+
+- **`classify_lines` / `outline`**(`AC_classify_lines`、`AC_outline`):框架中沒有功能把行高對應到標題層級或建立章節大綱——`ocr/structure` / `element_parse` 純屬位置性,`text_blocks` 不排序。本功能套用標準啟發法:行高超過 `heading_ratio` × 中位行高者為標題,不同標題高度成為層級(最高 = 1)。`classify_lines` 為每行標記 `{box, text, role, level}`;`outline` 依序回傳標題作為目錄。純標準函式庫,作用於行字典;不匯入 `PySide6`。
+
+## 本次更新 (2026-06-24) — 變化量序列的穩定偵測
+
+判斷 UI 何時安定下來——以純粹、可測試的函式作用於變化序列。完整參考:[`docs/source/Zh/doc/new_features/v175_features_doc.rst`](../docs/source/Zh/doc/new_features/v175_features_doc.rst)。
+
+- **`settle_point` / `is_settled` / `SettleTracker`**(`AC_settle_point`):`smart_waits.wait_until_screen_stable` 把穩定邏輯包在 `time.sleep` 迴圈內、作用於即時幀——你無法餵記錄好的序列,也無法單元測試該決策。本功能把它抽離:給定一串*變化量*(像素差 / 元素數差 / 0-1 digest 是否變),在變化量連續 `quiet_samples` 次維持 ≤ `max_churn` 時回報穩定(尖峰重置 run)。`settle_point` 回傳穩定索引,`SettleTracker` 為供即時迴圈的增量形式。純標準函式庫,不需時鐘、不需擷取;不匯入 `PySide6`。
+
 ## 本次更新 (2026-06-24) — OCR 行的段落與清單分組
 
 把 OCR 行分組成段落,並偵測項目符號 / 編號清單。完整參考:[`docs/source/Zh/doc/new_features/v174_features_doc.rst`](../docs/source/Zh/doc/new_features/v174_features_doc.rst)。
diff --git a/WHATS_NEW.md b/WHATS_NEW.md
index 144438db..ed2d61ce 100644
--- a/WHATS_NEW.md
+++ b/WHATS_NEW.md
@@ -1,5 +1,23 @@
 # What's New — AutoControl
 
+## What's new (2026-06-24) — Per-Step Critic Features + Rule-Based Step Scorer
+
+Bundle the evidence to score an agent step, with a built-in rule-based scorer. Full reference: [`docs/source/Eng/doc/new_features/v177_features_doc.rst`](docs/source/Eng/doc/new_features/v177_features_doc.rst).
+
+- **`build_critic_record` / `score_step_rule_based` / `to_judge_prompt`** (`AC_build_critic_record`, `AC_score_step`): `trajectory_eval` scores a whole trajectory with no per-step evidence; `agent_trace` emits spans, not quality signals; `agent_replay` stores steps but doesn't score them. This composes `action_effect` + `observation_delta` + `postcondition` into one per-step record; `score_step_rule_based` then gives a deterministic `{outcome, process_score, reasons}` (no model needed), and `to_judge_prompt` renders the record for an optional LLM-as-judge. Pure-stdlib aggregator; no `PySide6`.
+
+## What's new (2026-06-24) — Heading vs Body Classification + Document Outline
+
+Tell headings from body text by height and build a document outline. Full reference: [`docs/source/Eng/doc/new_features/v176_features_doc.rst`](docs/source/Eng/doc/new_features/v176_features_doc.rst).
+
+- **`classify_lines` / `outline`** (`AC_classify_lines`, `AC_outline`): until now, nothing in the framework mapped line height to heading levels or built a section outline; `ocr/structure` / `element_parse` are purely positional and `text_blocks` doesn't rank. This applies the standard heuristic: a line taller than `heading_ratio` × the median line height is a heading, and distinct heading heights become levels (tallest = 1). `classify_lines` tags each line `{box, text, role, level}`; `outline` returns the headings in order as a table of contents. Pure-stdlib over line dicts; no `PySide6`.
+
+## What's new (2026-06-24) — Settle Detection Over a Churn Series
+
+Decide when the UI has gone quiet — as a pure, testable function over a change series. Full reference: [`docs/source/Eng/doc/new_features/v175_features_doc.rst`](docs/source/Eng/doc/new_features/v175_features_doc.rst).
+
+- **`settle_point` / `is_settled` / `SettleTracker`** (`AC_settle_point`): `smart_waits.wait_until_screen_stable` bakes the settle logic inside a `time.sleep` loop over live frames, so you can't feed it a recorded series or unit-test the decision. This extracts it: given a stream of *churn* values (pixel delta / element-count delta / 0-1 digest-changed), it reports when churn has stayed ≤ `max_churn` for `quiet_samples` samples in a row (a spike resets the run). `settle_point` returns the settle index; `SettleTracker` is the incremental form for a live loop. Pure-stdlib, no clock, no capture; no `PySide6`.
+
 ## What's new (2026-06-24) — Paragraph & List Grouping of OCR Lines
 
 Group OCR lines into paragraphs and detect bulleted / numbered lists. Full reference: [`docs/source/Eng/doc/new_features/v174_features_doc.rst`](docs/source/Eng/doc/new_features/v174_features_doc.rst).
diff --git a/docs/source/Eng/doc/new_features/v175_features_doc.rst b/docs/source/Eng/doc/new_features/v175_features_doc.rst
new file mode 100644
index 00000000..ba3faf3a
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v175_features_doc.rst
@@ -0,0 +1,43 @@
+Settle Detection Over a Churn Series
+====================================
+
+``smart_waits.wait_until_screen_stable`` and ``actionability``'s stability check bake the
+settle logic *inside* a ``time.sleep`` polling loop over live pixel frames — you cannot feed
+them a recorded series of a11y-element counts or screen-diff metrics, and you cannot unit-test
+the *decision* independently of capture. ``settle_detector`` extracts that decision: it takes a
+stream of *churn* values (how much changed each sample — a pixel delta, an element-count delta,
+a digest-changed 0/1, anything) and reports when the churn has stayed at or below ``max_churn``
+for ``quiet_samples`` in a row. A spike resets the quiet run, so "settled then changed again"
+is handled.
+
+Pure-stdlib; deterministic and unit-testable on an injected series with no capture and no
+clock. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import settle_point, is_settled, SettleTracker
+
+    churns = [5, 4, 0.5, 0.3, 0.2]          # per-frame change metric
+    settle_point(churns, quiet_samples=3, max_churn=1.0)   # -> 4
+    is_settled(churns, quiet_samples=3, max_churn=1.0)     # -> True
+
+    # incremental, for a live loop (you supply the churn each tick)
+    tracker = SettleTracker(quiet_samples=3, max_churn=1.0)
+    state = tracker.update(current_churn)
+    if state.settled:
+        observe_now()
+
+``settle_point`` returns the index at which the series first settles (or ``None`` if it never
+does); ``is_settled`` returns the same decision as a boolean. ``SettleTracker`` is the
+incremental form: ``update(churn)`` returns a ``SettleState`` (``settled`` / ``quiet_run`` /
+``churn``), and ``reset`` clears the run (e.g. right after acting again).
+
+Executor command
+----------------
+
+``AC_settle_point`` (``churns`` / ``quiet_samples`` / ``max_churn`` → ``{settled, index}``) is
+exposed as the MCP tool ``ac_settle_point`` (read-only) and as the Script Builder command
+**Settle Point (churn series)** under **Flow**.
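The decision rule documented above can be sketched in a few lines. This is a minimal illustration under the stated semantics (a quiet run of ``quiet_samples`` values at or below ``max_churn``, reset by any spike), not the code in ``settle_detector``; the name ``settle_point_sketch`` is hypothetical.

```python
from typing import Optional, Sequence


def settle_point_sketch(churns: Sequence[float], quiet_samples: int = 3,
                        max_churn: float = 1.0) -> Optional[int]:
    """Hypothetical stand-in: index where the quiet run first reaches quiet_samples."""
    quiet_run = 0
    for index, churn in enumerate(churns):
        if churn <= max_churn:
            quiet_run += 1
            if quiet_run >= quiet_samples:
                return index
        else:
            quiet_run = 0  # a spike resets the quiet run
    return None


print(settle_point_sketch([5, 4, 0.5, 0.3, 0.2]))         # 4, as in the documented example
print(settle_point_sketch([0.2, 0.1, 9, 0.3, 0.2, 0.1]))  # 5, the spike at index 2 resets the run
print(settle_point_sketch([5, 4, 3]))                     # None, the series never goes quiet
```

Because the function takes only a list of numbers, a recorded series can be replayed in a unit test with no capture and no clock, which is the point of separating the decision from the polling loop.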
diff --git a/docs/source/Eng/doc/new_features/v176_features_doc.rst b/docs/source/Eng/doc/new_features/v176_features_doc.rst
new file mode 100644
index 00000000..bf58d1ce
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v176_features_doc.rst
@@ -0,0 +1,37 @@
+Heading vs Body Classification + Document Outline
+=================================================
+
+Nothing in the framework maps line height to heading levels or builds a section outline —
+``ocr/structure`` and ``element_parse`` are purely positional, and ``text_blocks`` groups
+paragraphs / lists but does not rank them. ``heading_segment`` adds the standard heuristic:
+a line whose height exceeds ``heading_ratio`` times the median line height is a heading, and
+distinct heading heights become heading *levels* (the tallest is level 1). From that it emits
+a flat document outline.
+
+Pure-stdlib over plain line dicts (text + bbox); fully unit-testable with no image and no OCR
+engine. Reuses ``table_grid_fill``'s box-bounds reader. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import classify_lines, outline
+
+    for item in classify_lines(ocr_lines, heading_ratio=1.2):
+        print(item["role"], item["level"], item["text"])
+
+    for heading in outline(ocr_lines):
+        print("  " * (heading["level"] - 1) + heading["text"])
+
+``classify_lines`` tags each line ``{box, text, role, level}`` — ``role`` is ``"heading"`` or
+``"body"``, ``level`` is the heading level (1 = tallest, 0 for body). ``outline`` returns just
+the headings in top-to-bottom order as ``{level, text, top}`` — a document table of contents.
+
+Executor commands
+-----------------
+
+``AC_classify_lines`` takes ``lines`` / ``heading_ratio`` and returns ``{count, lines}``;
+``AC_outline`` takes the same arguments and returns ``{count, headings}``. Both are exposed as
+the MCP tools ``ac_classify_lines`` / ``ac_outline`` (read-only) and as the Script Builder
+commands **Classify Headings vs Body** / **Document Outline** under **OCR**.
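The height heuristic described above can be sketched as follows. This is an illustration of the documented rule only, not the ``heading_segment`` implementation: ``classify_lines_sketch`` is a hypothetical name, the line dicts are assumed to carry ``height`` and ``text`` keys (as in the Script Builder placeholder), levels are assigned by exact height equality, and the ``box`` field is omitted.

```python
from statistics import median
from typing import Any, Dict, List, Sequence


def classify_lines_sketch(lines: Sequence[Dict[str, Any]],
                          heading_ratio: float = 1.2) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: tag each line as heading/body by relative height."""
    if not lines:
        return []
    threshold = heading_ratio * median([line["height"] for line in lines])
    # Distinct heading heights, tallest first, become levels 1, 2, ...
    heading_heights = sorted({line["height"] for line in lines
                              if line["height"] > threshold}, reverse=True)
    level_of = {height: rank for rank, height in enumerate(heading_heights, start=1)}
    tagged = []
    for line in lines:
        level = level_of.get(line["height"], 0)  # 0 marks body text
        tagged.append({"text": line["text"],
                       "role": "heading" if level else "body",
                       "level": level})
    return tagged


sample = [{"height": 40, "text": "Title"}, {"height": 28, "text": "Setup"},
          {"height": 20, "text": "Install the package."},
          {"height": 20, "text": "Run the tests."},
          {"height": 20, "text": "Read the docs."}]
for item in classify_lines_sketch(sample):
    print(item["role"], item["level"], item["text"])
```

With a median height of 20 and ``heading_ratio=1.2`` the threshold is 24, so the 40-high line becomes level 1, the 28-high line level 2, and the rest stay body text.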
diff --git a/docs/source/Eng/doc/new_features/v177_features_doc.rst b/docs/source/Eng/doc/new_features/v177_features_doc.rst
new file mode 100644
index 00000000..f1c074c0
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v177_features_doc.rst
@@ -0,0 +1,46 @@
+Per-Step Critic Features + Rule-Based Step Scorer
+=================================================
+
+Scoring an agent's step needs the evidence in one place — what the action was, what changed,
+whether it landed on target, whether the declared postcondition held. ``trajectory_eval``
+scores a *finished whole trajectory* against a static rubric and has no per-step evidence;
+``agent_trace`` emits OTel spans (tokens / latency), not decision quality; ``agent_replay``
+persists ``{obs, action, result}`` but does no scoring. ``critic_features`` is the missing
+per-step layer: it composes ``action_effect`` (did it do anything, where),
+``observation_delta`` (how much changed) and ``postcondition`` (did the expected outcome hold)
+into one compact record, and ships a deterministic rule-based scorer so the feature works fully
+headless — leaving the optional LLM-as-judge to the integrator.
+
+Pure-stdlib; composes existing pure modules; deterministic and unit-testable with no device.
+Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (build_critic_record, score_step_rule_based,
+                                 to_judge_prompt)
+
+    record = build_critic_record({"type": "click", "x": 480, "y": 260},
+                                 before_elements, after_elements,
+                                 postcondition={"appears": {"role": "dialog"}})
+    score = score_step_rule_based(record)
+    # {"outcome": True, "process_score": 1.0, "reasons": [...]}
+
+    prompt = to_judge_prompt(record)   # compact text for an LLM-as-judge
+
+``build_critic_record`` returns ``{action, effect, delta_counts}`` plus a ``postcondition``
+report when a spec is given. ``score_step_rule_based`` returns ``{outcome, process_score,
+reasons}`` — ``outcome`` is a binary success (the action did something *and* any postcondition
+held), ``process_score`` is a 0..1 quality from the effect class (halved if the postcondition
+failed). ``to_judge_prompt`` renders the record for an external judge.
+
+Executor commands
+-----------------
+
+``AC_build_critic_record`` takes ``action`` / ``before`` / ``after`` / ``postcondition`` /
+``radius`` and returns the record; ``AC_score_step`` takes ``record`` and returns
+``{outcome, process_score, reasons}``. Both are exposed as the MCP tools
+``ac_build_critic_record`` / ``ac_score_step`` (read-only) and as the Script Builder commands
+**Build Critic Record** / **Score Step (rule-based)** under **Native UI**.
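The scoring arithmetic can be worked through on hand-built records. The sketch below restates the rule from ``score_step_rule_based`` as it appears in this diff (effect-class score, halved when the postcondition failed); ``score_sketch`` is a hypothetical name, the records are illustrative rather than real ``build_critic_record`` output, and the ``reasons`` list is left out for brevity.

```python
from typing import Any, Dict

# Effect-class scores as listed in critic_features.py in this diff.
EFFECT_SCORE = {"changed_near_target": 1.0, "changed": 0.6,
                "changed_elsewhere": 0.3, "no_op": 0.0}


def score_sketch(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical restatement of the rule-based step score."""
    effect = record["effect"]["effect"]
    report = record.get("postcondition")
    postcondition_ok = bool(report["ok"]) if report is not None else True
    process = EFFECT_SCORE.get(effect, 0.0) * (1.0 if postcondition_ok else 0.5)
    return {"outcome": effect != "no_op" and postcondition_ok,
            "process_score": round(process, 4)}


# On target, but the declared postcondition did not hold: the score is halved.
print(score_sketch({"effect": {"effect": "changed_near_target"},
                    "postcondition": {"ok": False}}))
# Something changed away from the target and no postcondition was declared.
print(score_sketch({"effect": {"effect": "changed_elsewhere"}}))
# Nothing happened at all.
print(score_sketch({"effect": {"effect": "no_op"}}))
```

The first record scores ``outcome=False`` with ``process_score=0.5``, the second ``outcome=True`` with ``0.3``, and the third ``outcome=False`` with ``0.0``, which shows that ``outcome`` and ``process_score`` can disagree: a step can count as a success while still scoring low on process.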
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
index 7d1ed13f..a1cf10d5 100644
--- a/docs/source/Eng/eng_index.rst
+++ b/docs/source/Eng/eng_index.rst
@@ -197,6 +197,9 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v172_features_doc
    doc/new_features/v173_features_doc
    doc/new_features/v174_features_doc
+   doc/new_features/v175_features_doc
+   doc/new_features/v176_features_doc
+   doc/new_features/v177_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc
diff --git a/docs/source/Zh/doc/new_features/v175_features_doc.rst b/docs/source/Zh/doc/new_features/v175_features_doc.rst
new file mode 100644
index 00000000..346ef90b
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v175_features_doc.rst
@@ -0,0 +1,39 @@
+變化量序列的穩定偵測
+====================
+
+``smart_waits.wait_until_screen_stable`` 與 ``actionability`` 的穩定檢查把穩定邏輯包在
+``time.sleep`` 輪詢迴圈內、作用於即時像素幀——你無法餵給它一段記錄好的 a11y 元素數或畫面
+差異指標序列,也無法獨立於擷取去單元測試那個*決策*。``settle_detector`` 把該決策抽離:它接收
+一串*變化量*(churn,每個樣本變了多少——像素差、元素數差、digest 是否變的 0/1,皆可),並在
+變化量連續 ``quiet_samples`` 次維持在 ``max_churn`` 以下時回報穩定。尖峰會重置 quiet run,因此
+「穩定後又變動」也能處理。
+
+純標準函式庫;確定性、可在注入序列上單元測試,不需擷取、不需時鐘。不匯入 ``PySide6``。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import settle_point, is_settled, SettleTracker
+
+    churns = [5, 4, 0.5, 0.3, 0.2]          # 每幀變化量指標
+    settle_point(churns, quiet_samples=3, max_churn=1.0)   # -> 4
+    is_settled(churns, quiet_samples=3, max_churn=1.0)     # -> True
+
+    # 增量版,供即時迴圈(你每 tick 提供 churn)
+    tracker = SettleTracker(quiet_samples=3, max_churn=1.0)
+    state = tracker.update(current_churn)
+    if state.settled:
+        observe_now()
+
+``settle_point`` 回傳序列首次穩定的索引(或 ``None``);``is_settled`` 為布林。``SettleTracker``
+為增量形式:``update(churn)`` 回傳 ``SettleState``(``settled`` / ``quiet_run`` / ``churn``);
+``reset`` 清除 run(例如在再次動作後)。
+
+執行器指令
+----------
+
+``AC_settle_point``(``churns`` / ``quiet_samples`` / ``max_churn`` → ``{settled, index}``)
+以 MCP 工具 ``ac_settle_point``(唯讀)及 Script Builder 指令 **Settle Point (churn series)**
+(位於 **Flow** 分類下)形式提供。
diff --git a/docs/source/Zh/doc/new_features/v176_features_doc.rst b/docs/source/Zh/doc/new_features/v176_features_doc.rst
new file mode 100644
index 00000000..c37aecd5
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v176_features_doc.rst
@@ -0,0 +1,35 @@
+標題與內文分類 + 文件大綱
+==========================
+
+框架中沒有任何功能把行高對應到標題層級或建立章節大綱——``ocr/structure`` 與 ``element_parse``
+純屬位置性,``text_blocks`` 把段落 / 清單分組但不對其排序。``heading_segment`` 補上標準啟發法:
+行高超過 ``heading_ratio`` 乘以中位行高者為標題,且不同的標題高度成為標題*層級*(最高為第 1 級)。
+由此輸出扁平的文件大綱。
+
+純標準函式庫,作用於純行字典(text + bbox);可在無影像、無 OCR 引擎下完整單元測試。重用
+``table_grid_fill`` 的框邊界讀取器。不匯入 ``PySide6``。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import classify_lines, outline
+
+    for item in classify_lines(ocr_lines, heading_ratio=1.2):
+        print(item["role"], item["level"], item["text"])
+
+    for heading in outline(ocr_lines):
+        print("  " * (heading["level"] - 1) + heading["text"])
+
+``classify_lines`` 為每行標記 ``{box, text, role, level}``——``role`` 為 ``"heading"`` 或
+``"body"``,``level`` 為標題層級(1 = 最高,內文為 0)。``outline`` 只回傳依上到下順序的標題,
+為 ``{level, text, top}``——即文件目錄。
+
+執行器指令
+----------
+
+``AC_classify_lines``(``lines`` / ``heading_ratio`` → ``{count, lines}``)與 ``AC_outline``
+(``lines`` / ``heading_ratio`` → ``{count, headings}``)。兩者以 MCP 工具 ``ac_classify_lines`` /
+``ac_outline``(唯讀)及 Script Builder 指令 **Classify Headings vs Body** / **Document Outline**
+(位於 **OCR** 分類下)形式提供。
diff --git a/docs/source/Zh/doc/new_features/v177_features_doc.rst b/docs/source/Zh/doc/new_features/v177_features_doc.rst
new file mode 100644
index 00000000..7839708f
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v177_features_doc.rst
@@ -0,0 +1,41 @@
+逐步評審特徵 + 規則式步驟評分
+==============================
+
+為代理的步驟評分需要把證據集中一處——動作是什麼、變了什麼、是否落在目標、宣告的後置條件
+是否成立。``trajectory_eval`` 對*已完成的整條軌跡*依靜態準則評分,沒有逐步證據;
+``agent_trace`` 發出 OTel span(權杖 / 延遲),而非決策品質;``agent_replay`` 保存
+``{obs, action, result}`` 卻不評分。``critic_features`` 正是缺少的逐步層:它把 ``action_effect``
+(有無效果、落在何處)、``observation_delta``(變了多少)與 ``postcondition``(預期結果是否成立)
+組合成單一精簡記錄,並附上確定性的規則式評分器,使此功能可完整無頭運作——把可選的
+LLM-as-judge 留給整合者。
+
+純標準函式庫;組合既有純模組;確定性、可在無裝置下單元測試。不匯入 ``PySide6``。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import (build_critic_record, score_step_rule_based,
+                                 to_judge_prompt)
+
+    record = build_critic_record({"type": "click", "x": 480, "y": 260},
+                                 before_elements, after_elements,
+                                 postcondition={"appears": {"role": "dialog"}})
+    score = score_step_rule_based(record)
+    # {"outcome": True, "process_score": 1.0, "reasons": [...]}
+
+    prompt = to_judge_prompt(record)   # 給 LLM-as-judge 的精簡文字
+
+``build_critic_record`` 回傳 ``{action, effect, delta_counts}``,並在給定規格時附上
+``postcondition`` 報告。``score_step_rule_based`` 回傳 ``{outcome, process_score, reasons}``
+——``outcome`` 為二元成功(動作有效果*且*任何後置條件成立),``process_score`` 為依效果類別的
+0..1 品質(後置條件失敗時減半)。``to_judge_prompt`` 把記錄渲染給外部評審。
+
+執行器指令
+----------
+
+``AC_build_critic_record``(``action`` / ``before`` / ``after`` / ``postcondition`` /
+``radius`` → 該記錄)與 ``AC_score_step``(``record`` → ``{outcome, process_score, reasons}``)。
+兩者以 MCP 工具 ``ac_build_critic_record`` / ``ac_score_step``(唯讀)及 Script Builder 指令
+**Build Critic Record** / **Score Step (rule-based)**(位於 **Native UI** 分類下)形式提供。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
index 198b3e6a..65497aa5 100644
--- a/docs/source/Zh/zh_index.rst
+++ b/docs/source/Zh/zh_index.rst
@@ -197,6 +197,9 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v172_features_doc
    doc/new_features/v173_features_doc
    doc/new_features/v174_features_doc
+   doc/new_features/v175_features_doc
+   doc/new_features/v176_features_doc
+   doc/new_features/v177_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
index 0be599b3..7f645d83 100644
--- a/je_auto_control/__init__.py
+++ b/je_auto_control/__init__.py
@@ -323,6 +323,10 @@
 from je_auto_control.utils.text_blocks import (
     detect_lists, group_paragraphs,
 )
+# Classify OCR lines as headings vs body and build a document outline
+from je_auto_control.utils.heading_segment import (
+    classify_lines, outline,
+)
 # Associate form labels with values (multi-direction) + checkbox state
 from je_auto_control.utils.form_fields import (
     associate_fields, checkbox_state, match_labels_to_widgets,
@@ -343,6 +347,14 @@
 from je_auto_control.utils.grounding_consensus import (
     ConsensusResult, consensus_element, consensus_point, is_confident,
 )
+# Decide when a UI has settled, as a pure function over a churn series
+from je_auto_control.utils.settle_detector import (
+    SettleState, SettleTracker, is_settled, settle_point,
+)
+# Per-step critic feature bundle + rule-based step scorer
+from je_auto_control.utils.critic_features import (
+    build_critic_record, score_step_rule_based, to_judge_prompt,
+)
 # Locate on-screen regions by colour (mask + connected components)
 from je_auto_control.utils.color_region import (
     find_color_region, find_color_regions,
@@ -1285,6 +1297,8 @@ def start_autocontrol_gui(*args, **kwargs):
     "to_blocks",
     "group_paragraphs",
     "detect_lists",
+    "classify_lines",
+    "outline",
     "associate_fields",
     "match_labels_to_widgets",
     "checkbox_state",
@@ -1304,6 +1318,13 @@ def start_autocontrol_gui(*args, **kwargs):
     "consensus_point",
     "consensus_element",
     "is_confident",
+    "SettleState",
+    "SettleTracker",
+    "settle_point",
+    "is_settled",
+    "build_critic_record",
+    "score_step_rule_based",
+    "to_judge_prompt",
     "find_color_region",
     "find_color_regions",
     "ssim_compare",
diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
index a022c6d5..9f0ab51b 100644
--- a/je_auto_control/gui/script_builder/command_schema.py
+++ b/je_auto_control/gui/script_builder/command_schema.py
@@ -827,6 +827,26 @@ def _add_ocr_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Detect bulleted / numbered list items among OCR lines.",
     ))
+    specs.append(CommandSpec(
+        "AC_classify_lines", "OCR", "Classify Headings vs Body",
+        fields=(
+            FieldSpec("lines", FieldType.STRING,
+                      placeholder='[{"x":0,"y":0,"width":200,"height":40,'
+                                  '"text":"Title"}]'),
+            FieldSpec("heading_ratio", FieldType.FLOAT, optional=True, default=1.2),
+        ),
+        description="Tag OCR lines as heading/body by height; assign heading levels.",
+    ))
+    specs.append(CommandSpec(
+        "AC_outline", "OCR", "Document Outline",
+        fields=(
+            FieldSpec("lines", FieldType.STRING,
+                      placeholder='[{"x":0,"y":0,"width":200,"height":40,'
+                                  '"text":"Title"}]'),
+            FieldSpec("heading_ratio", FieldType.FLOAT, optional=True, default=1.2),
+        ),
+        description="Headings in order with levels (document outline) from OCR lines.",
+    ))
     specs.append(CommandSpec(
         "AC_scroll_to_find", "OCR", "Scroll Until Visible",
         fields=(
@@ -3273,6 +3293,39 @@ def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Agreed target point from clustered grounding proposals.",
     ))
+    specs.append(CommandSpec(
+        "AC_settle_point", "Flow", "Settle Point (churn series)",
+        fields=(
+            FieldSpec("churns", FieldType.STRING,
+                      placeholder="[5, 4, 0.5, 0.3, 0.2]"),
+            FieldSpec("quiet_samples", FieldType.INT, optional=True, default=3),
+            FieldSpec("max_churn", FieldType.FLOAT, optional=True, default=1.0),
+        ),
+        description="Index where a churn series first settles (offline settle check).",
+    ))
+    specs.append(CommandSpec(
+        "AC_build_critic_record", "Native UI", "Build Critic Record",
+        fields=(
+            FieldSpec("action", FieldType.STRING,
+                      placeholder='{"type":"click","x":50,"y":50}'),
+            FieldSpec("before", FieldType.STRING,
+                      placeholder='[{"role":"button","x":0,"y":0}]'),
+            FieldSpec("after", FieldType.STRING,
+                      placeholder='[{"role":"dialog","x":40,"y":40}]'),
+            FieldSpec("postcondition", FieldType.STRING, optional=True,
+                      placeholder='{"appears":{"role":"dialog"}}'),
+            FieldSpec("radius", FieldType.INT, optional=True, default=64),
+        ),
+        description="Per-step critic evidence (effect + delta + postcondition).",
+    ))
+    specs.append(CommandSpec(
+        "AC_score_step", "Native UI", "Score Step (rule-based)",
+        fields=(
+            FieldSpec("record", FieldType.STRING,
+                      placeholder='{"effect":{"effect":"changed_near_target"}}'),
+        ),
+        description="Rule-based outcome + process score of a critic record.",
+    ))
     specs.append(CommandSpec(
         "AC_consensus_element", "Native UI", "Grounding Consensus Element",
         fields=(
diff --git a/je_auto_control/utils/critic_features/__init__.py b/je_auto_control/utils/critic_features/__init__.py
new file mode 100644
index 00000000..a3651049
--- /dev/null
+++ b/je_auto_control/utils/critic_features/__init__.py
@@ -0,0 +1,6 @@
+"""Per-step critic feature bundle + a rule-based step scorer."""
+from je_auto_control.utils.critic_features.critic_features import (
+    build_critic_record, score_step_rule_based, to_judge_prompt,
+)
+
+__all__ = ["build_critic_record", "score_step_rule_based", "to_judge_prompt"]
diff --git a/je_auto_control/utils/critic_features/critic_features.py b/je_auto_control/utils/critic_features/critic_features.py
new file mode 100644
index 00000000..4c299480
--- /dev/null
+++ b/je_auto_control/utils/critic_features/critic_features.py
@@ -0,0 +1,79 @@
+"""Per-step critic feature bundle + a rule-based step scorer.
+
+Scoring an agent's step needs the evidence in one place — what the action was, what changed,
+whether it landed on target, whether the declared postcondition held. ``trajectory_eval``
+scores a *finished whole trajectory* against a static rubric and has no per-step evidence;
+``agent_trace`` emits OTel spans (tokens / latency), not decision quality; ``agent_replay``
+persists ``{obs, action, result}`` but does no scoring. ``critic_features`` is the missing
+per-step layer: it composes ``action_effect`` (did it do anything, where), ``observation_delta``
+(how much changed) and ``postcondition`` (did the expected outcome hold) into one compact
+record, and ships a deterministic rule-based scorer so the feature works fully headless —
+leaving the optional LLM-as-judge to the integrator (``to_judge_prompt``).
+
+Pure-stdlib; composes existing pure modules; deterministic and unit-testable with no device.
+Imports no ``PySide6``.
+"""
+from typing import Any, Dict, Optional, Sequence
+
+Element = Dict[str, Any]
+
+_EFFECT_SCORE = {"changed_near_target": 1.0, "changed": 0.6,
+                 "changed_elsewhere": 0.3, "no_op": 0.0}
+
+
+def build_critic_record(action: Any, before: Sequence[Element],
+                        after: Sequence[Element], *,
+                        postcondition: Optional[Dict[str, Any]] = None,
+                        radius: int = 64) -> Dict[str, Any]:
+    """Compose a per-step critic record from the before/after observation + action.
+
+    Returns ``{action, effect, delta_counts}`` and, when a ``postcondition`` spec is
+    given, the ``postcondition`` report — the evidence bundle a step critic scores.
+    """
+    from je_auto_control.utils.action_effect import classify_effect
+    from je_auto_control.utils.observation_delta import delta_index
+    verdict = classify_effect(before, after, action, radius=int(radius)).to_dict()
+    delta = delta_index(before, after)
+    record: Dict[str, Any] = {
+        "action": action, "effect": verdict,
+        "delta_counts": {"added": len(delta["added"]),
+                         "removed": len(delta["removed"]),
+                         "changed": len(delta["changed"]),
+                         "stable": len(delta["stable"])}}
+    if postcondition is not None:
+        from je_auto_control.utils.postcondition import check_postcondition
+        record["postcondition"] = check_postcondition(
+            after, postcondition, before=before).to_dict()
+    return record
+
+
+def score_step_rule_based(record: Dict[str, Any]) -> Dict[str, Any]:
+    """Score a critic record deterministically → ``{outcome, process_score, reasons}``.
+
+    ``outcome`` is a binary success (the action did something *and* any postcondition held);
+    ``process_score`` is a 0..1 quality from the effect class, halved if the postcondition
+    failed.
+    """
+    effect = record["effect"]["effect"]
+    process = _EFFECT_SCORE.get(effect, 0.0)
+    report = record.get("postcondition")
+    postcondition_ok = report["ok"] if report else True
+    reasons = [f"effect={effect}"]
+    if report is not None:
+        reasons.append(f"postcondition={'ok' if postcondition_ok else 'failed'}")
+    return {"outcome": effect != "no_op" and postcondition_ok,
+            "process_score": round(process * (1.0 if postcondition_ok else 0.5), 4),
+            "reasons": reasons}
+
+
+def to_judge_prompt(record: Dict[str, Any]) -> str:
+    """Render a critic record as a compact text block for an LLM-as-judge."""
+    counts = record["delta_counts"]
+    lines = [f"Action: {record['action']}",
+             f"Effect: {record['effect']['effect']} ({record['effect']['reason']})",
+             f"Changed: +{counts['added']} -{counts['removed']} ~{counts['changed']}"]
+    report = record.get("postcondition")
+    if report is not None:
+        lines.append(f"Postcondition ok: {report['ok']} "
+                     f"(failed: {report['failed']})")
+    return "\n".join(lines)
diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
index cc4bd1b7..8b66803b 100644
--- a/je_auto_control/utils/executor/action_executor.py
+++ b/je_auto_control/utils/executor/action_executor.py
@@ -3537,6 +3537,26 @@ def _detect_lists(lines: Any) -> Dict[str, Any]:
     return {"count": len(items), "items": items}
 
 
+def _classify_lines(lines: Any, heading_ratio: Any = 1.2) -> Dict[str, Any]:
+    """Adapter: classify OCR lines as headings vs body with levels."""
+    import json
+    from je_auto_control.utils.heading_segment import classify_lines
+    if isinstance(lines, str):
+        lines = json.loads(lines)
+    classified = classify_lines(lines, heading_ratio=float(heading_ratio))
+    return {"count": len(classified), "lines": classified}
+
+
+def _outline(lines: Any, heading_ratio: Any = 1.2) -> Dict[str, Any]:
+    """Adapter: the document outline (headings in order) from OCR lines."""
+    import json
+    from je_auto_control.utils.heading_segment import outline
+    if isinstance(lines, str):
+        lines = json.loads(lines)
+    headings = outline(lines, heading_ratio=float(heading_ratio))
+    return {"count": len(headings), "headings": headings}
+
+
 def _find_color_region(rgb: Any, tolerance: Any = 20, min_area: Any = 50,
                        region: Any = None) -> Dict[str, Any]:
     """Adapter: locate coloured regions on the screen, largest first."""
@@ -4232,6 +4252,45 @@ def _consensus_element(candidates: Any, elements: Any) -> Dict[str, Any]:
             "agreement": winner[1] if winner else 0.0}
 
 
+def _settle_point(churns: Any, quiet_samples: Any = 3,
+                  max_churn: Any = 1.0) -> Dict[str, Any]:
+    """Adapter: index at which a churn series first settles (index=None if it never does)."""
+    import json
+    from je_auto_control.utils.settle_detector import settle_point
+    if isinstance(churns, str):
+        churns = json.loads(churns)
+    index = settle_point([float(c) for c in churns],
+                         quiet_samples=int(quiet_samples),
+                         max_churn=float(max_churn))
+    return {"settled": index is not None, "index": index}
+
+
+def _build_critic_record(action: Any, before: Any, after: Any,
+                         postcondition: Any = None, radius: Any = 64) -> Dict[str, Any]:
+    """Adapter: per-step critic feature bundle (effect + delta + postcondition)."""
+    import json
+    from je_auto_control.utils.critic_features import build_critic_record
+    if isinstance(action, str):
+        action = json.loads(action)
+    if isinstance(before, str):
+        before = json.loads(before)
+    if isinstance(after, str):
+        after = json.loads(after)
+    if isinstance(postcondition, str):
+        postcondition = json.loads(postcondition) if postcondition.strip() else None
+    return build_critic_record(action, before, after, postcondition=postcondition,
+                               radius=int(radius))
+
+
+def _score_step(record: Any) -> Dict[str, Any]:
+    """Adapter: rule-based score of a critic record."""
+    import json
+    from je_auto_control.utils.critic_features import score_step_rule_based
+    if isinstance(record, str):
+        record = json.loads(record)
+    return score_step_rule_based(record)
+
+
 def _validate_action(action: Any, screen: Any = None,
                      targets: Any = None) -> Dict[str, Any]:
     """Adapter: validate a coordinate action (bounds + optional snap-to-target)."""
@@ -6076,6 +6135,8 @@ def __init__(self):
             "AC_xy_cut": _xy_cut,
             "AC_group_paragraphs": _group_paragraphs,
             "AC_detect_lists": _detect_lists,
+            "AC_classify_lines": _classify_lines,
+            "AC_outline": _outline,
             "AC_ssim_compare": _ssim_compare,
             "AC_ssim_changed_regions": _ssim_changed_regions,
             "AC_feature_match": _feature_match,
@@ -6121,6 +6182,9 @@ def __init__(self):
             "AC_plan_repair": _plan_repair,
             "AC_consensus_point": _consensus_point,
             "AC_consensus_element": _consensus_element,
+            "AC_settle_point": _settle_point,
+            "AC_build_critic_record": _build_critic_record,
+            "AC_score_step": _score_step,
             "AC_validate_action": _validate_action,
             "AC_replay_trace": _replay_trace,
             "AC_match_elements": _match_elements,
diff --git a/je_auto_control/utils/heading_segment/__init__.py b/je_auto_control/utils/heading_segment/__init__.py
new file mode 100644
index 00000000..638ff8e5
--- /dev/null
+++ b/je_auto_control/utils/heading_segment/__init__.py
@@ -0,0 +1,6 @@
+"""Classify OCR lines as headings vs body and build a document outline."""
+from je_auto_control.utils.heading_segment.heading_segment import (
+    classify_lines, outline,
+)
+
+__all__ = ["classify_lines", "outline"]
diff --git a/je_auto_control/utils/heading_segment/heading_segment.py b/je_auto_control/utils/heading_segment/heading_segment.py
new file mode 100644
index 00000000..52068c8d
--- /dev/null
+++ b/je_auto_control/utils/heading_segment/heading_segment.py
@@ -0,0 +1,63 @@
+"""Classify OCR lines as headings vs body and build a document outline.
+
+Nothing in the framework maps line height to heading levels or builds a section outline —
+``ocr/structure`` and ``element_parse`` are purely positional, and ``text_blocks`` groups
+paragraphs / lists but does not rank them. ``heading_segment`` adds the standard heuristic:
+a line whose height exceeds ``heading_ratio`` times the median line height is a heading, and
+distinct heading heights become heading *levels* (the tallest is level 1). From that it emits
+a flat document outline.
+
+Pure-stdlib over plain line dicts (text + bbox); fully unit-testable with no image and no OCR
+engine. Reuses ``table_grid_fill``'s box-bounds reader. Imports no ``PySide6``.
+"""
+from typing import Any, Dict, List, Sequence
+
+from je_auto_control.utils.table_grid_fill.table_grid_fill import _box_bounds
+
+Line = Dict[str, Any]
+
+
+def _height(line: Line) -> int:
+    _, top, _, bottom = _box_bounds(line)
+    return bottom - top
+
+
+def _box(line: Line) -> Dict[str, int]:
+    left, top, right, bottom = _box_bounds(line)
+    return {"left": left, "top": top, "right": right, "bottom": bottom}
+
+
+def classify_lines(lines: Sequence[Line], *,
+                   heading_ratio: float = 1.2) -> List[Dict[str, Any]]:
+    """Tag each line as a heading or body line with a heading ``level``.
+
+    A line taller than ``heading_ratio`` x the median line height (upper median for an even
+    count) is a heading; distinct heading heights map to levels (tallest = 1). Body lines
+    get ``level`` 0. Returns ``{box, text, role, level}`` per line, in input order.
+    """
+    if not lines:
+        return []
+    heights = sorted(_height(line) for line in lines)
+    threshold = heights[len(heights) // 2] * float(heading_ratio)
+    heading_heights = sorted({_height(line) for line in lines
+                              if _height(line) > threshold}, reverse=True)
+    level_of = {height: index + 1 for index, height in enumerate(heading_heights)}
+    classified: List[Dict[str, Any]] = []
+    for line in lines:
+        height = _height(line)
+        is_heading = height > threshold
+        classified.append({"box": _box(line),
+                           "text": str(line.get("text", "")),
+                           "role": "heading" if is_heading else "body",
+                           "level": level_of.get(height, 0) if is_heading else 0})
+    return classified
+
+
+def outline(lines: Sequence[Line], *,
+            heading_ratio: float = 1.2) -> List[Dict[str, Any]]:
+    """Return the document outline: the headings in top-to-bottom order with levels."""
+    headings = [item for item in classify_lines(lines, heading_ratio=heading_ratio)
+                if item["role"] == "heading"]
+    headings.sort(key=lambda item: item["box"]["top"])
+    return [{"level": item["level"], "text": item["text"],
+             "top": item["box"]["top"]} for item in headings]
diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
index 2793876e..a2c6829c 100644
--- a/je_auto_control/utils/mcp_server/tools/_factories.py
+++ b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3439,6 +3439,48 @@ def observation_tools() -> List[MCPTool]:
             handler=h.consensus_element,
             annotations=READ_ONLY,
         ),
+        MCPTool(
+            name="ac_settle_point",
+            description=("Decide when a UI settled from a 'churns' series (how much "
+                         "changed each sample). Returns {settled, index} — the index "
+                         "where churn first stayed <= 'max_churn' for 'quiet_samples' "
+                         "in a row (a spike resets the run). Feed pixel deltas / "
+                         "element-count deltas / 0-1 digest-changed flags."),
+            input_schema=schema({
+                "churns": {"type": "array", "items": {"type": "number"}},
+                "quiet_samples": {"type": "integer"},
+                "max_churn": {"type": "number"}},
+                required=["churns"]),
+            handler=h.settle_point,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_build_critic_record",
+            description=("Build a per-step critic record from 'action' + 'before' / "
+                         "'after' element lists (+ optional 'postcondition' spec): "
+                         "composes effect / delta-counts / postcondition into "
+                         "{action, effect, delta_counts, postcondition?} — the "
+                         "evidence a step critic scores."),
+            input_schema=schema({
+                "action": {"type": "object"},
+                "before": {"type": "array", "items": {"type": "object"}},
+                "after": {"type": "array", "items": {"type": "object"}},
+                "postcondition": {"type": "object"},
+                "radius": {"type": "integer"}},
+                required=["action", "before", "after"]),
+            handler=h.build_critic_record,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_score_step",
+            description=("Rule-based score of a critic 'record' (from "
+                         "ac_build_critic_record): {outcome (binary success), "
+                         "process_score (0..1), reasons}. Deterministic, no model."),
+            input_schema=schema({"record": {"type": "object"}},
+                                required=["record"]),
+            handler=h.score_step,
+            annotations=READ_ONLY,
+        ),
     ]
 
 
@@ -3975,6 +4017,31 @@ def screen_grid_tools() -> List[MCPTool]:
             handler=h.detect_lists,
             annotations=READ_ONLY,
         ),
+        MCPTool(
+            name="ac_classify_lines",
+            description=("Classify OCR 'lines' as headings vs body by height: a line "
+                         "taller than 'heading_ratio' x the median line height is a "
+                         "heading, and distinct heading heights become levels (tallest "
+                         "= 1). Returns {count, lines:[{box,text,role,level}]}."),
+            input_schema=schema({
+                "lines": {"type": "array", "items": {"type": "object"}},
+                "heading_ratio": {"type": "number"}},
+                required=["lines"]),
+            handler=h.classify_lines,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_outline",
+            description=("Return the document outline from OCR 'lines' — the headings "
+                         "in top-to-bottom order with levels. Returns {count, "
+                         "headings:[{level,text,top}]}."),
+            input_schema=schema({
+                "lines": {"type": "array", "items": {"type": "object"}},
+                "heading_ratio": {"type": "number"}},
+                required=["lines"]),
+            handler=h.outline,
+            annotations=READ_ONLY,
+        ),
     ]
 
 
diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
index 8a6fc2ef..38ec81cd 100644
--- a/je_auto_control/utils/mcp_server/tools/_handlers.py
+++ b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -2205,6 +2205,16 @@ def detect_lists(lines):
     return _detect_lists(lines)
 
 
+def classify_lines(lines, heading_ratio=1.2):
+    from je_auto_control.utils.executor.action_executor import _classify_lines
+    return _classify_lines(lines, heading_ratio)
+
+
+def outline(lines, heading_ratio=1.2):
+    from je_auto_control.utils.executor.action_executor import _outline
+    return _outline(lines, heading_ratio)
+
+
 def find_color_region(rgb, tolerance=20, min_area=50, region=None):
     from je_auto_control.utils.executor.action_executor import (
         _find_color_region)
@@ -2478,6 +2488,21 @@ def consensus_element(candidates, elements):
     return _consensus_element(candidates, elements)
 
 
+def settle_point(churns, quiet_samples=3, max_churn=1.0):
+    from je_auto_control.utils.executor.action_executor import _settle_point
+    return _settle_point(churns, quiet_samples, max_churn)
+
+
+def build_critic_record(action, before, after, postcondition=None, radius=64):
+    from je_auto_control.utils.executor.action_executor import _build_critic_record
+    return _build_critic_record(action, before, after, postcondition, radius)
+
+
+def score_step(record):
+    from je_auto_control.utils.executor.action_executor import _score_step
+    return _score_step(record)
+
+
 def validate_action(action, screen=None, targets=None):
     from je_auto_control.utils.executor.action_executor import _validate_action
     return _validate_action(action, screen, targets)
diff --git a/je_auto_control/utils/settle_detector/__init__.py b/je_auto_control/utils/settle_detector/__init__.py
new file mode 100644
index 00000000..d41630a0
--- /dev/null
+++ b/je_auto_control/utils/settle_detector/__init__.py
@@ -0,0 +1,6 @@
+"""Decide when a UI has settled, as a pure seam over a churn series."""
+from je_auto_control.utils.settle_detector.settle_detector import (
+    SettleState, SettleTracker, is_settled, settle_point,
+)
+
+__all__ = ["SettleState", "SettleTracker", "settle_point", "is_settled"]
diff --git a/je_auto_control/utils/settle_detector/settle_detector.py b/je_auto_control/utils/settle_detector/settle_detector.py
new file mode 100644
index 00000000..d729480b
--- /dev/null
+++ b/je_auto_control/utils/settle_detector/settle_detector.py
@@ -0,0 +1,70 @@
+"""Decide when a UI has settled, as a pure seam over a churn series.
+
+``smart_waits.wait_until_screen_stable`` and ``actionability``'s stability check bake the
+settle logic *inside* a ``time.sleep`` polling loop over live pixel frames — you cannot feed
+them a recorded series of a11y-element counts or screen-diff metrics, and you cannot unit-test
+the *decision* independently of capture. ``settle_detector`` extracts that decision: it takes a
+stream of *churn* values (how much changed each sample — pixel delta, element-count delta, a
+digest-changed 0/1, anything) and reports when the churn has stayed at or below ``max_churn``
+for ``quiet_samples`` in a row. A spike resets the quiet run, so "settled then changed again"
+is handled.
+
+Pure-stdlib; deterministic and unit-testable on an injected series with no capture, no clock.
+Imports no ``PySide6``.
+"""
+from dataclasses import asdict, dataclass
+from typing import Any, Dict, Optional, Sequence
+
+
+@dataclass(frozen=True)
+class SettleState:
+    """One settle observation: whether settled, the quiet run length, latest churn."""
+
+    settled: bool
+    quiet_run: int
+    churn: float
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Return the state as a plain dict."""
+        return asdict(self)
+
+
+class SettleTracker:
+    """Incremental settle detector: feed churn values, ask if it has gone quiet."""
+
+    def __init__(self, quiet_samples: int = 3, max_churn: float = 1.0) -> None:
+        """Settle after ``quiet_samples`` consecutive churns <= ``max_churn``."""
+        self.quiet_samples = int(quiet_samples)
+        self.max_churn = float(max_churn)
+        self.quiet_run = 0
+
+    def update(self, churn: float) -> SettleState:
+        """Feed the next churn value and return the current settle state."""
+        churn = float(churn)
+        if churn <= self.max_churn:
+            self.quiet_run += 1
+        else:
+            self.quiet_run = 0
+        return SettleState(self.quiet_run >= self.quiet_samples, self.quiet_run,
+                           churn)
+
+    def reset(self) -> None:
+        """Clear the quiet run (e.g. after acting again)."""
+        self.quiet_run = 0
+
+
+def settle_point(churns: Sequence[float], *, quiet_samples: int = 3,
+                 max_churn: float = 1.0) -> Optional[int]:
+    """Return the index of the sample that completes the first quiet run, or ``None``."""
+    tracker = SettleTracker(quiet_samples, max_churn)
+    for index, churn in enumerate(churns):
+        if tracker.update(churn).settled:
+            return index
+    return None
+
+
+def is_settled(churns: Sequence[float], *, quiet_samples: int = 3,
+               max_churn: float = 1.0) -> bool:
+    """Return whether the churn series settles at any point."""
+    return settle_point(churns, quiet_samples=quiet_samples,
+                        max_churn=max_churn) is not None
diff --git a/test/unit_test/headless/test_critic_features_batch.py b/test/unit_test/headless/test_critic_features_batch.py
new file mode 100644
index 00000000..7e0fbd12
--- /dev/null
+++ b/test/unit_test/headless/test_critic_features_batch.py
@@ -0,0 +1,70 @@
+"""Headless tests for per-step critic features + rule-based scorer (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.critic_features import (
+    build_critic_record, score_step_rule_based, to_judge_prompt,
+)
+
+
+def _el(x, y, name="", role="button"):
+    return dict(x=x, y=y, width=40, height=20, role=role, name=name)
+
+
+def test_record_captures_effect_and_delta():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "Popup", role="dialog")]
+    record = build_critic_record({"x": 50, "y": 50}, before, after)
+    assert record["effect"]["effect"] == "changed_near_target"
+    assert record["delta_counts"]["added"] == 1
+
+
+def test_score_good_step():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "Popup", role="dialog")]
+    score = score_step_rule_based(build_critic_record({"x": 50, "y": 50},
+                                                      before, after))
+    assert score["outcome"] is True
+    assert abs(score["process_score"] - 1.0) < 1e-9
+
+
+def test_score_no_op_fails():
+    frame = [_el(0, 0, "A")]
+    score = score_step_rule_based(build_critic_record({"x": 9, "y": 9},
+                                                      frame, list(frame)))
+    assert score["outcome"] is False
+    assert abs(score["process_score"]) < 1e-9
+
+
+def test_postcondition_failure_lowers_outcome():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "Popup", role="dialog")]
+    spec = {"appears": {"role": "menu"}}          # a menu that never appears
+    record = build_critic_record({"x": 50, "y": 50}, before, after,
+                                 postcondition=spec)
+    score = score_step_rule_based(record)
+    assert score["outcome"] is False              # effect ok but postcondition failed
+    assert record["postcondition"]["ok"] is False
+
+
+def test_to_judge_prompt_mentions_effect():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "P", role="dialog")]
+    text = to_judge_prompt(build_critic_record({"x": 50, "y": 50}, before, after))
+    assert "Effect:" in text and "changed_near_target" in text
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_build_critic_record", "AC_score_step"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_build_critic_record", "ac_score_step"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_build_critic_record", "AC_score_step"} <= specs
+
+
+def test_facade_exports():
+    for name in ("build_critic_record", "score_step_rule_based", "to_judge_prompt"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_heading_segment_batch.py b/test/unit_test/headless/test_heading_segment_batch.py
new file mode 100644
index 00000000..67add083
--- /dev/null
+++ b/test/unit_test/headless/test_heading_segment_batch.py
@@ -0,0 +1,58 @@
+"""Headless tests for heading vs body classification + outline (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.heading_segment import classify_lines, outline
+
+
+def _line(y, text, h=20, x=0, w=200):
+    return {"x": x, "y": y, "width": w, "height": h, "text": text}
+
+
+def _doc():
+    # one big title (h=40), some body (h=20), a smaller heading (h=30)
+    return [_line(0, "Title", h=40), _line(50, "body one"),
+            _line(75, "body two"), _line(110, "Subsection", h=30),
+            _line(145, "more body")]
+
+
+def test_classify_marks_headings_and_levels():
+    by_text = {c["text"]: c for c in classify_lines(_doc(), heading_ratio=1.2)}
+    assert by_text["Title"]["role"] == "heading"
+    assert by_text["Subsection"]["role"] == "heading"
+    assert by_text["body one"]["role"] == "body"
+    # tallest heading is level 1, the next distinct height is level 2
+    assert by_text["Title"]["level"] == 1
+    assert by_text["Subsection"]["level"] == 2
+
+
+def test_body_only_has_no_headings():
+    lines = [_line(0, "a"), _line(25, "b"), _line(50, "c")]
+    assert all(c["role"] == "body" for c in classify_lines(lines))
+
+
+def test_outline_lists_headings_in_order():
+    result = outline(_doc(), heading_ratio=1.2)
+    assert [h["text"] for h in result] == ["Title", "Subsection"]
+    assert [h["level"] for h in result] == [1, 2]
+
+
+def test_empty():
+    assert classify_lines([]) == []
+    assert outline([]) == []
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_classify_lines", "AC_outline"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_classify_lines", "ac_outline"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_classify_lines", "AC_outline"} <= specs
+
+
+def test_facade_exports():
+    for name in ("classify_lines", "outline"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_settle_detector_batch.py b/test/unit_test/headless/test_settle_detector_batch.py
new file mode 100644
index 00000000..dbb89ff4
--- /dev/null
+++ b/test/unit_test/headless/test_settle_detector_batch.py
@@ -0,0 +1,52 @@
+"""Headless tests for the settle decision over a churn series (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.settle_detector import (
+    SettleTracker, is_settled, settle_point,
+)
+
+
+def test_settle_point_after_quiet_run():
+    # 5, 4 are noisy; then three values <= 1.0 → settled at index 4
+    assert settle_point([5, 4, 0.5, 0.3, 0.2], quiet_samples=3,
+                        max_churn=1.0) == 4
+
+
+def test_spike_resets_quiet_run():
+    # quiet, quiet, SPIKE, quiet x3 → settles only at the final index
+    assert settle_point([0.2, 0.2, 5, 0.1, 0.1, 0.1], quiet_samples=3,
+                        max_churn=1.0) == 5
+
+
+def test_never_settles_is_none():
+    assert settle_point([5, 4, 3], quiet_samples=2, max_churn=1.0) is None
+
+
+def test_is_settled_bool():
+    assert is_settled([0.1, 0.1], quiet_samples=2, max_churn=1.0) is True
+    assert is_settled([9, 8], quiet_samples=2, max_churn=1.0) is False
+
+
+def test_tracker_incremental_and_reset():
+    tracker = SettleTracker(quiet_samples=2, max_churn=1.0)
+    assert tracker.update(0.5).settled is False
+    state = tracker.update(0.4)
+    assert state.settled is True and state.quiet_run == 2
+    tracker.reset()
+    assert tracker.update(0.3).settled is False   # run cleared
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_settle_point" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_settle_point" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_settle_point" in specs
+
+
+def test_facade_exports():
+    for name in ("settle_point", "is_settled", "SettleTracker", "SettleState"):
+        assert hasattr(ac, name) and name in ac.__all__