Integration-Automation · JE-Chen · Jun 26, 2026
diff --git a/WHATS_NEW.md b/WHATS_NEW.md
@@ -2,6 +2,24 @@
 
 ## What's new (2026-06-26)
 
+### Template-Free Element Proposal (Pixels to Elements)
+
+Get a clean numbered element list straight from the screen when there's no accessibility tree. Full reference: [`docs/source/Eng/doc/new_features/v220_features_doc.rst`](docs/source/Eng/doc/new_features/v220_features_doc.rst).
+
+- **`propose_elements` / `tag_kinds`** (`AC_propose_elements`, `AC_tag_kinds`): Set-of-Marks, `observation` and the grounding helpers all assume you already have element boxes — but a game, a custom-drawn app or a remote desktop has no accessibility tree. `propose_elements` builds that top-of-funnel list from pixels: detect widget boxes (closed-edge blobs via Canny + morphology + `connected_boxes`) and text boxes (`text_regions.find_text_regions`), fuse them — the `element_parse` `ocr > icon` priority *is* the "drop widget-that-is-really-text" cross-check — and return them in reading order, each tagged `text` or `widget`. `tag_kinds` is the pure labeller. cv2 imported lazily; the labeller is fully testable. Seventh and final feature of the ROUND-15 perception lane. No `PySide6`.
+
+### Classify a Widget from Its Pixel Shape
+
+Tell a checkbox from a radio button from a text field — from pixels, no model. Full reference: [`docs/source/Eng/doc/new_features/v219_features_doc.rst`](docs/source/Eng/doc/new_features/v219_features_doc.rst).
+
+- **`classify_widget` / `box_features` / `classify_icon`** (`AC_classify_widget`, `AC_classify_icon`): Set-of-Marks and element proposers return *boxes* but not *what each box is*; `form_fields.checkbox_state` reads a box already known to be a checkbox — the gap is the typing step before it. `box_features` extracts `{aspect, fill, edge_density, circularity}` for a box; `classify_widget` is the pure heuristic classifier (round→radio, wide-rounded→toggle, square-sparse→checkbox, wide-hollow→text_field, wide-filled→button, else icon); `classify_icon` composes them. The classifier is pure and fully testable; cv2/numpy imported lazily so the module stays importable. Sixth feature of the ROUND-15 perception lane. No `PySide6`.
+
+### Localize a Change to the Elements That Changed
+
+Turn a raw screen diff into "element 3 changed" by scoring a list of element boxes. Full reference: [`docs/source/Eng/doc/new_features/v218_features_doc.rst`](docs/source/Eng/doc/new_features/v218_features_doc.rst).
+
+- **`localize_changes` / `rank_changes`** (`AC_localize_changes`, `AC_rank_changes`): existing diffs answer *where* pixels changed (`motion_regions`, `perceptual_diff`, `ssim_changed_regions` → raw pixel regions) or which *accessibility* elements differ (`element_diff`, needs metadata) — but not "given a frame diff **and a list of element boxes**, which of *those* changed?". `localize_changes` diffs a reference against the current screen and scores each supplied element box by its mean per-pixel change; `rank_changes` is the pure ranker that flags `changed` (score ≥ `threshold`) and sorts most-changed first. Pairs with `set_of_marks`/accessibility boxes to give a per-element "what changed" feedback signal after a click. cv2/numpy imported lazily; ranking is pure and fully testable. Fifth feature of the ROUND-15 perception lane. No `PySide6`.
+
 ### Theme-Invariant Matching (Light Template, Dark Mode)
 
 Find a button captured in light mode even after the app switches to dark mode. Full reference: [`docs/source/Eng/doc/new_features/v217_features_doc.rst`](docs/source/Eng/doc/new_features/v217_features_doc.rst).

diff --git a/docs/source/Eng/doc/new_features/v218_features_doc.rst b/docs/source/Eng/doc/new_features/v218_features_doc.rst
@@ -0,0 +1,50 @@
+Localize a Change to the Elements That Changed
+==============================================
+
+The existing diffs answer "*where* did pixels change" (``motion_regions``,
+``perceptual_diff``, ``ssim_changed_regions`` return raw pixel regions) or "which
+*accessibility* elements differ" (``element_diff``, needs a11y metadata). The
+missing middle is: given a frame diff **and a list of element boxes**, which of
+*those* elements changed? ``change_localize`` scores each supplied box by how
+much it changed and ranks them.
+
+* :func:`rank_changes` — pure: take ``[{box, score}]`` and mark each box
+  ``changed`` (score at or above ``threshold``), sorted most-changed first.
+* :func:`localize_changes` — diff a reference against the current screen, score
+  each element box by its mean pixel change, and rank them.
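The ranking rule in the bullets above is simple enough to sketch in plain Python. Below is a minimal, hypothetical stand-in (`rank_changes_sketch` is an illustrative name, not the library function). It assumes each input entry is a dict with `box` and `score` keys, and that boxes with equal scores keep their input order:

```python
def rank_changes_sketch(scored_boxes, threshold):
    """Flag boxes whose score meets threshold and sort most-changed first.

    Hypothetical stand-in for rank_changes; input entries are dicts with
    "box" ([x, y, w, h]) and "score" (0..1) keys.
    """
    ranked = [
        {
            "box": list(entry["box"]),
            "score": float(entry["score"]),
            # "At or above threshold" counts as changed, as the doc states.
            "changed": float(entry["score"]) >= threshold,
        }
        for entry in scored_boxes
    ]
    # sorted() stays stable with reverse=True, so equal scores keep input order.
    return sorted(ranked, key=lambda entry: entry["score"], reverse=True)


ranked = rank_changes_sketch(
    [{"box": [0, 0, 40, 20], "score": 0.02},
     {"box": [50, 0, 40, 20], "score": 0.6}],
    threshold=0.1,
)
for entry in ranked:
    print(entry["box"], entry["score"], entry["changed"])
```

The function only annotates and reorders dicts, so it can be unit-tested without any image. That is the benefit of keeping the ranker pure.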
+
+``cv2`` / ``numpy`` are imported lazily (the module stays importable without
+them) and the loaders reuse :mod:`visual_match`. The ranking is pure and fully
+testable. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import localize_changes, rank_changes, mark_elements
+
+    # `elements` is an element list you already have (an accessibility
+    # query, propose_elements, ...).
+    boxes = [mark["bbox"] for mark in mark_elements(elements)]
+
+    # After an action, which of those elements actually changed?
+    changes = localize_changes("before.png", boxes, current="after.png")
+    for entry in changes:
+        if entry["changed"]:
+            print("element changed:", entry["box"], entry["score"])
+
+    # Or rank pre-computed scores yourself:
+    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)
+
+``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed
+first, where ``score`` is the box's mean per-pixel change (0..1). It pairs with
+``set_of_marks`` / accessibility element boxes to turn a raw screen diff into a
+per-element "what changed" signal — an agent feedback channel after a click.
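The doc defines `score` as the box's mean per-pixel change on a 0..1 scale. This sketch assumes one plausible reading: the mean absolute grayscale difference inside the box, divided by 255. It uses plain nested lists so it runs without `cv2` or `numpy`. The real `localize_changes` loads frames through `visual_match` and may normalize differently. `mean_box_change` and `score_boxes` are hypothetical helpers:

```python
def mean_box_change(before, after, box):
    """Mean absolute per-pixel change inside box, normalized to 0..1.

    before/after: equal-sized grayscale frames as lists of rows of 0-255 ints.
    box: [x, y, w, h] in pixels, clipped to the frame bounds.
    """
    height, width = len(before), len(before[0])
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x1 <= x0 or y1 <= y0:
        return 0.0  # box lies entirely outside the frame
    total = sum(
        abs(before[row][col] - after[row][col])
        for row in range(y0, y1)
        for col in range(x0, x1)
    )
    return total / ((x1 - x0) * (y1 - y0) * 255)


def score_boxes(before, after, boxes):
    """Pair every box with its change score, ready for a ranker."""
    return [{"box": list(box), "score": mean_box_change(before, after, box)}
            for box in boxes]


before = [[0] * 8 for _ in range(4)]
after = [row[:] for row in before]
for row in range(2):
    for col in range(4, 8):
        after[row][col] = 255  # the top-right 4x2 area turns white
scores = score_boxes(before, after, [[0, 0, 4, 2], [4, 0, 4, 2]])
print(scores)
```

The output of `score_boxes` has the `[{box, score}]` shape that `rank_changes` accepts. Scoring and ranking can therefore be tested separately.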
+
+Executor commands
+-----------------
+
+``AC_localize_changes`` (``reference`` + ``boxes`` JSON list + ``current`` /
+``threshold`` / ``region`` → ``{changes}``) and ``AC_rank_changes``
+(``scored_boxes`` JSON list + ``threshold`` → ``{changes}``, pure). Both are
+also exposed as matching read-only ``ac_*`` MCP tools and as Script Builder
+commands under **Image**.
diff --git a/docs/source/Eng/doc/new_features/v219_features_doc.rst b/docs/source/Eng/doc/new_features/v219_features_doc.rst
@@ -0,0 +1,46 @@
+Classify a Widget from Its Pixel Shape
+======================================
+
+Set-of-Marks and element proposers hand back *boxes*, but not *what each box is*.
+``form_fields.checkbox_state`` already reads a box known to be a checkbox; the
+gap is the typing step before it — is this box a checkbox, a radio button, a push
+button, a text field or a toggle? ``icon_classify`` answers that from cheap
+geometric features (no model).
+
+* :func:`box_features` — extract ``{aspect, fill, edge_density, circularity}``
+  for a box region (the objective measurements).
+* :func:`classify_widget` — pure: map a feature dict to a widget type by
+  documented heuristics.
+* :func:`classify_icon` — compose the two: a box to ``{type, features}``.
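The following hypothetical sketch makes the "objective measurements" concrete. It computes two of the four features on a grayscale crop stored as nested lists. It assumes `fill` means the fraction of dark ("ink") pixels below a cut-off, and that `aspect` is width over height. `edge_density` and `circularity` are omitted because they need edge detection and contour tracing, which the real `box_features` performs with `cv2`:

```python
def box_features_sketch(gray, box, ink_below=128):
    """Aspect and fill for a box in a grayscale image (nested lists, 0-255).

    Hypothetical: "fill" is taken as the fraction of pixels darker than
    ink_below; the box is assumed to lie inside the image.
    """
    x, y, w, h = box
    crop = [row[x:x + w] for row in gray[y:y + h]]
    pixels = [value for row in crop for value in row]
    if not pixels:
        return {"aspect": 0.0, "fill": 0.0}  # empty box: nothing to measure
    ink = sum(1 for value in pixels if value < ink_below)
    return {"aspect": w / h, "fill": ink / len(pixels)}


# A 10x10 white image with a 1-pixel black outline (an empty checkbox).
gray = [[255] * 10 for _ in range(10)]
for i in range(10):
    gray[0][i] = gray[9][i] = 0  # top and bottom edges
    gray[i][0] = gray[i][9] = 0  # left and right edges
features = box_features_sketch(gray, [0, 0, 10, 10])
print(features)
```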
+
+``classify_widget`` is pure and fully testable; ``box_features`` imports cv2 /
+numpy lazily (the module stays importable without them) and reuses
+:func:`visual_match._to_gray`. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import classify_icon, classify_widget
+
+    # From a screenshot + a box:
+    classify_icon("dialog.png", [120, 80, 16, 16])
+    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}
+
+    # From features you already have:
+    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'
+
+The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; a wide
+rounded box is a ``toggle``; a near-square sparse box is a ``checkbox``; a wide
+hollow box is a ``text_field``; a wide filled box is a ``button``; anything else
+is an ``icon``. Tune by reading ``features`` and applying your own rules where
+the defaults misfire — the measurements are the durable part.
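The rule order above can be sketched as a chain of checks. Every numeric cut-off in this hypothetical `classify_widget_sketch` is an illustrative assumption, not the library's threshold. The values were picked so that feature dicts like those in the snippet above classify as `checkbox` and `radio`. `circularity` is assumed to be the usual 4πA/P² measure, which is 1.0 for a circle. Under that measure, a rectangle with aspect 2 scores about 0.70 and a pill shape with the same aspect scores about 0.85:

```python
# Illustrative cut-offs only; the real classifier's thresholds may differ.
SQUARE_ASPECT = (0.8, 1.25)  # width / height range counted as "near-square"
WIDE_ASPECT = 1.5            # width / height at or above this counts as "wide"
ROUND = 0.8                  # circularity of a round radio button
ROUNDED_WIDE = 0.78          # circularity of a pill-shaped toggle
SPARSE_FILL = 0.3            # a checkbox interior is mostly empty
HOLLOW_FILL = 0.2            # a text field interior is mostly empty
FILLED = 0.5                 # a button face is mostly filled


def classify_widget_sketch(features):
    """Map a feature dict to a widget type, checking rules in doc order."""
    aspect = features.get("aspect", 0.0)
    fill = features.get("fill", 0.0)
    circularity = features.get("circularity", 0.0)
    near_square = SQUARE_ASPECT[0] <= aspect <= SQUARE_ASPECT[1]
    wide = aspect >= WIDE_ASPECT
    if near_square and circularity >= ROUND:
        return "radio"
    if wide and circularity >= ROUNDED_WIDE:
        return "toggle"
    if near_square and fill <= SPARSE_FILL:
        return "checkbox"
    if wide and fill <= HOLLOW_FILL:
        return "text_field"
    if wide and fill >= FILLED:
        return "button"
    return "icon"


print(classify_widget_sketch({"aspect": 1.0, "circularity": 0.9, "fill": 0.4}))
print(classify_widget_sketch({"aspect": 1.0, "fill": 0.12}))
```

These rules would label a pill-shaped push button as `toggle`. That is exactly the kind of misfire the text suggests handling by reading `features` and applying your own rule.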
+
+Executor commands
+-----------------
+
+``AC_classify_widget`` (``features`` JSON object → ``{type}``, pure) and
+``AC_classify_icon`` (``source`` image + ``box`` ``[x, y, w, h]`` →
+``{type, features}``). Both are also exposed as matching read-only ``ac_*``
+MCP tools and as Script Builder commands under **Image**.
diff --git a/docs/source/Eng/doc/new_features/v220_features_doc.rst b/docs/source/Eng/doc/new_features/v220_features_doc.rst
@@ -0,0 +1,51 @@
+Template-Free Element Proposal (Pixels to Elements)
+===================================================
+
+Set-of-Marks, ``observation`` and the grounding helpers all assume you already
+have a list of element boxes — but on a screen the framework doesn't model
+(a game, a custom-drawn app, a remote desktop) there is no accessibility tree to
+provide one. ``element_proposal`` builds that top-of-funnel list from pixels:
+detect candidate *widget* boxes (closed-edge blobs) and *text* boxes
+(:func:`text_regions.find_text_regions`), fuse them — dropping widget boxes that
+are really just text — and return them in reading order, each tagged ``text`` or
+``widget``.
+
+* :func:`propose_elements` — the full pixel-to-elements pipeline.
+* :func:`tag_kinds` — pure: label fused boxes ``text`` / ``widget`` by source and
+  keep their reading-order ``index``.
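A pure labeller such as `tag_kinds` can be sketched as a lookup from source to kind. This hypothetical version assumes each fused entry carries a `source` field set to `"ocr"` or `"icon"`. The field name and values are assumptions based on the `ocr` > `icon` priority this page describes. If an entry has no `index`, the sketch falls back to its list position:

```python
# Assumed source labels; the real fused entries may name them differently.
SOURCE_TO_KIND = {"ocr": "text", "icon": "widget"}


def tag_kinds_sketch(fused):
    """Label fused boxes text/widget and keep a reading-order index."""
    tagged = []
    for position, element in enumerate(fused):
        tagged.append({
            "box": list(element["box"]),
            # Unknown sources default to "widget" (an assumption).
            "kind": SOURCE_TO_KIND.get(element.get("source"), "widget"),
            # Keep an existing index; otherwise the list position is the order.
            "index": element.get("index", position),
        })
    return tagged


fused = [
    {"box": [10, 10, 60, 20], "source": "ocr"},
    {"box": [100, 12, 40, 20], "source": "icon"},
]
print(tag_kinds_sketch(fused))
```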
+
+The fusion / cross-check / ordering reuse :mod:`element_parse` — the ``ocr`` >
+``icon`` source priority *is* the "drop widget-that-is-really-text" check — and
+the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily so the
+module stays importable; :func:`tag_kinds` is pure and fully testable. Imports no
+``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import propose_elements, mark_elements
+
+    # No accessibility tree? Propose elements straight from the screen:
+    elements = propose_elements(min_area=120)
+    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]
+
+    # Feed them to Set-of-Marks like any other element list:
+    marks = mark_elements(elements)
+
+``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where
+``kind`` is ``text`` or ``widget``. It is the missing top-of-funnel for the
+agent stack on un-modelled UIs: pixels in, a clean numbered element list out,
+ready for marking, observation or grounding. Tune ``min_area`` for the smallest
+control you care about and ``iou_threshold`` for how aggressively overlapping
+text and widget boxes are merged.
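The fuse-and-order step can be sketched with plain intersection-over-union (IoU). This hypothetical `fuse_and_order_sketch` makes three assumptions. First, text boxes always win over overlapping widget boxes (the `ocr` > `icon` priority). Second, `min_area` applies to widget candidates only. Third, reading order is approximated by bucketing top edges into rows `row_tolerance` pixels tall. The real pipeline delegates all of this to `element_parse`, so any of these details may differ:

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def fuse_and_order_sketch(text_boxes, widget_boxes, min_area=120,
                          iou_threshold=0.5, row_tolerance=10):
    """Drop widget boxes that are really text, then number in reading order.

    min_area=120 mirrors the doc example; the iou_threshold and
    row_tolerance defaults are assumptions for illustration.
    """
    widgets = [b for b in widget_boxes if b[2] * b[3] >= min_area]
    # Text outranks widgets: a widget box overlapping text is treated as text.
    kept = [w for w in widgets
            if all(iou(w, t) < iou_threshold for t in text_boxes)]
    elements = ([{"box": list(t), "kind": "text"} for t in text_boxes]
                + [{"box": list(w), "kind": "widget"} for w in kept])
    # Reading order: bucket by top edge into coarse rows, then left to right.
    elements.sort(key=lambda e: (e["box"][1] // row_tolerance, e["box"][0]))
    for index, element in enumerate(elements):
        element["index"] = index
    return elements


text_boxes = [[10, 10, 60, 20]]
widget_boxes = [[8, 8, 64, 24], [100, 12, 40, 20], [5, 60, 5, 5]]
elements = fuse_and_order_sketch(text_boxes, widget_boxes)
print(elements)
```

The first widget box overlaps the text box with an IoU of about 0.78, so it is dropped as "really text". The tiny 5x5 box falls below `min_area`. Bucketing by `y // row_tolerance` is deliberately crude: if two boxes' top edges fall on opposite sides of a bucket boundary, they land in different rows. A production ordering would need to handle that.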
+
+Executor commands
+-----------------
+
+``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
+``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and
+``AC_tag_kinds`` (``elements`` JSON list → ``{elements}``, pure) labels a
+pre-fused list. Both are also exposed as matching read-only ``ac_*`` MCP tools
+and as Script Builder commands under **Image**.
diff --git a/docs/source/Zh/doc/new_features/v218_features_doc.rst b/docs/source/Zh/doc/new_features/v218_features_doc.rst
@@ -0,0 +1,44 @@
+Localize a Change to the Elements That Changed
+==============================================
+
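+The existing diffs answer "*where* did pixels change" (``motion_regions``, ``perceptual_diff``,
+``ssim_changed_regions`` return raw pixel regions) or "which *accessibility* elements differ" (``element_diff``, needs a11y metadata).
+The missing middle is: given a frame diff **and a list of element boxes**, which of *those* elements changed? ``change_localize``
+scores each supplied box by how much it changed and ranks them.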
+
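+* :func:`rank_changes`: pure. Takes ``[{box, score}]`` and marks each box ``changed``
+  (score at or above ``threshold``), sorted most-changed first.
+* :func:`localize_changes`: diffs a reference against the current screen, scores each element box by its mean pixel change, and ranks them.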
+
+``cv2`` / ``numpy`` are imported lazily (the module stays importable without them), and the loaders reuse :mod:`visual_match`.
+The ranking is pure and fully testable. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import localize_changes, rank_changes, mark_elements
+
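+    # `elements` is an element list you already have (an accessibility
+    # query, propose_elements, ...).
+    boxes = [mark["bbox"] for mark in mark_elements(elements)]
+
+    # After an action, which of those elements actually changed?
+    changes = localize_changes("before.png", boxes, current="after.png")
+    for entry in changes:
+        if entry["changed"]:
+            print("element changed:", entry["box"], entry["score"])
+
+    # Or rank pre-computed scores yourself: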
+    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)
+
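+``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed first, where ``score`` is the box's mean
+per-pixel change (0..1). It pairs with ``set_of_marks`` / accessibility element boxes to turn a raw screen diff into a per-element
+"what changed" signal, an agent feedback channel after a click.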
+
+Executor commands
+-----------------
+
+``AC_localize_changes`` (``reference`` plus a ``boxes`` JSON list plus ``current`` /
+``threshold`` / ``region`` → ``{changes}``) and ``AC_rank_changes`` (``scored_boxes`` JSON list plus
+``threshold`` → ``{changes}``, pure). Both are also exposed as matching read-only ``ac_*`` MCP tools and as Script Builder commands
+(under the **Image** category).
diff --git a/docs/source/Zh/doc/new_features/v219_features_doc.rst b/docs/source/Zh/doc/new_features/v219_features_doc.rst
@@ -0,0 +1,38 @@
+Classify a Widget from Its Pixel Shape
+======================================
+
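+Set-of-Marks and element proposers hand back *boxes*, but not *what each box is*. ``form_fields.checkbox_state``
+already reads a box known to be a checkbox; the missing piece is the classification step before it: is this box a checkbox, a radio button, a push button,
+a text field or a toggle? ``icon_classify`` answers that from cheap geometric features (no model needed).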
+
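+* :func:`box_features`: extracts ``{aspect, fill, edge_density, circularity}`` for a box region (the objective measurements).
+* :func:`classify_widget`: pure. Maps a feature dict to a widget type using documented heuristics.
+* :func:`classify_icon`: composes the two, turning a box into ``{type, features}``.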
+
+``classify_widget`` is pure and fully testable; ``box_features`` imports cv2 / numpy lazily (the module stays importable without them)
+and reuses :func:`visual_match._to_gray`. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import classify_icon, classify_widget
+
+    # From a screenshot + a box:
+    classify_icon("dialog.png", [120, 80, 16, 16])
+    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}
+
+    # From features you already have:
+    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'
+
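+The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; a wide, rounded box is a ``toggle``;
+a near-square, sparse box is a ``checkbox``; a wide, hollow box is a ``text_field``; a wide, filled box is a ``button``; anything else is an ``icon``.
+Where the defaults misfire, read ``features`` and apply your own rules; the measurements are the durable part.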
+
+Executor commands
+-----------------
+
+``AC_classify_widget`` (``features`` JSON object → ``{type}``, pure) and
+``AC_classify_icon`` (``source`` image + ``box`` ``[x, y, w, h]`` → ``{type, features}``).
+Both are also exposed as matching read-only ``ac_*`` MCP tools and as Script Builder commands (under the **Image** category).
diff --git a/docs/source/Zh/doc/new_features/v220_features_doc.rst b/docs/source/Zh/doc/new_features/v220_features_doc.rst
@@ -0,0 +1,42 @@
+Template-Free Element Proposal (Pixels to Elements)
+===================================================
+
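+Set-of-Marks, ``observation`` and the grounding helpers all assume you already have a list of element boxes, but on a screen the framework
+cannot model (a game, a custom-drawn app, a remote desktop) there is no accessibility tree to provide one. ``element_proposal`` builds
+that top-of-funnel list from pixels: it detects candidate *widget* boxes (closed-edge blobs) and *text* boxes
+(:func:`text_regions.find_text_regions`), fuses them (dropping widget boxes that are really just text),
+and returns them in reading order, each tagged ``text`` or ``widget``.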
+
+* :func:`propose_elements`: the full pixel-to-elements pipeline.
+* :func:`tag_kinds`: pure. Labels fused boxes ``text`` / ``widget`` by source and keeps their reading-order ``index``.
+
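+The fusion / cross-check / ordering reuse :mod:`element_parse` (the ``ocr`` > ``icon`` source priority *is* the
+"drop widget-that-is-really-text" check), and the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily, so the module stays importable;
+:func:`tag_kinds` is pure and fully testable. Imports no ``PySide6``.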
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import propose_elements, mark_elements
+
+    # No accessibility tree? Propose elements straight from the screen:
+    elements = propose_elements(min_area=120)
+    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]
+
+    # Feed them to Set-of-Marks like any other element list:
+    marks = mark_elements(elements)
+
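+``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where ``kind`` is ``text`` or ``widget``.
+It is the missing top of the funnel for the agent stack on un-modelled UIs: pixels in, a clean numbered element list out, ready for marking, observation
+or grounding. Tune ``min_area`` for the smallest control you care about and ``iou_threshold`` for how aggressively overlapping text and widget
+boxes are merged.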
+
+Executor commands
+-----------------
+
+``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
+``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and ``AC_tag_kinds``
+(``elements`` JSON list → ``{elements}``, pure) labels a pre-fused list. Both are also exposed as matching read-only
+``ac_*`` MCP tools and as Script Builder commands (under the **Image** category).
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
@@ -143,6 +143,14 @@
 )
 # Theme-invariant matching so a light template matches dark mode
 from je_auto_control.utils.theme_normalize import match_theme, normalize_theme
+# Attribute a screen change to the specific element boxes that changed
+from je_auto_control.utils.change_localize import localize_changes, rank_changes
+# Classify what kind of widget a box is from its pixel shape
+from je_auto_control.utils.icon_classify import (
+    box_features, classify_icon, classify_widget,
+)
+# Propose a clean element list from raw pixels (template-free)
+from je_auto_control.utils.element_proposal import propose_elements, tag_kinds
 # Rich clipboard formats — RTF + CSV/TSV codecs and Windows get / set
 from je_auto_control.utils.clipboard_rich_formats import (
     build_rtf, csv_to_rows, get_clipboard_csv, get_clipboard_rtf, rows_to_csv,
@@ -1771,6 +1779,9 @@ def start_autocontrol_gui(*args, **kwargs):
     "place_labels", "label_color",
     "grade_contrast", "dominant_pair", "region_contrast",
     "normalize_theme", "match_theme",
+    "localize_changes", "rank_changes",
+    "classify_widget", "box_features", "classify_icon",
+    "propose_elements", "tag_kinds",
     "build_rtf", "rtf_to_text", "rows_to_csv", "csv_to_rows",
     "set_clipboard_rtf", "get_clipboard_rtf",
     "set_clipboard_csv", "get_clipboard_csv",