Merged
18 changes: 18 additions & 0 deletions WHATS_NEW.md
@@ -2,6 +2,24 @@

## What's new (2026-06-26)

### Template-Free Element Proposal (Pixels to Elements)

Get a clean numbered element list straight from the screen when there's no accessibility tree. Full reference: [`docs/source/Eng/doc/new_features/v220_features_doc.rst`](docs/source/Eng/doc/new_features/v220_features_doc.rst).

- **`propose_elements` / `tag_kinds`** (`AC_propose_elements`, `AC_tag_kinds`): Set-of-Marks, `observation` and the grounding helpers all assume you already have element boxes — but a game, a custom-drawn app or a remote desktop has no accessibility tree. `propose_elements` builds that top-of-funnel list from pixels: detect widget boxes (closed-edge blobs via Canny + morphology + `connected_boxes`) and text boxes (`text_regions.find_text_regions`), fuse them — the `element_parse` `ocr > icon` priority *is* the "drop widget-that-is-really-text" cross-check — and return them in reading order, each tagged `text` or `widget`. `tag_kinds` is the pure labeller. cv2 imported lazily; the labeller is fully testable. Seventh and final feature of the ROUND-15 perception lane. No `PySide6`.

### Classify a Widget from Its Pixel Shape

Tell a checkbox from a radio button from a text field — from pixels, no model. Full reference: [`docs/source/Eng/doc/new_features/v219_features_doc.rst`](docs/source/Eng/doc/new_features/v219_features_doc.rst).

- **`classify_widget` / `box_features` / `classify_icon`** (`AC_classify_widget`, `AC_classify_icon`): Set-of-Marks and element proposers return *boxes* but not *what each box is*; `form_fields.checkbox_state` reads a box already known to be a checkbox — the gap is the typing step before it. `box_features` extracts `{aspect, fill, edge_density, circularity}` for a box; `classify_widget` is the pure heuristic classifier (round→radio, wide-rounded→toggle, square-sparse→checkbox, wide-hollow→text_field, wide-filled→button, else icon); `classify_icon` composes them. The classifier is pure and fully testable; cv2/numpy imported lazily so the module stays importable. Sixth feature of the ROUND-15 perception lane. No `PySide6`.

### Localize a Change to the Elements That Changed

Turn a raw screen diff into "element 3 changed" by scoring a list of element boxes. Full reference: [`docs/source/Eng/doc/new_features/v218_features_doc.rst`](docs/source/Eng/doc/new_features/v218_features_doc.rst).

- **`localize_changes` / `rank_changes`** (`AC_localize_changes`, `AC_rank_changes`): existing diffs answer *where* pixels changed (`motion_regions`, `perceptual_diff`, `ssim_changed_regions` → raw pixel regions) or which *accessibility* elements differ (`element_diff`, needs metadata) — but not "given a frame diff **and a list of element boxes**, which of *those* changed?". `localize_changes` diffs a reference against the current screen and scores each supplied element box by its mean per-pixel change; `rank_changes` is the pure ranker that flags `changed` (score ≥ `threshold`) and sorts most-changed first. Pairs with `set_of_marks`/accessibility boxes to give a per-element "what changed" feedback signal after a click. cv2/numpy imported lazily; ranking is pure and fully testable. Fifth feature of the ROUND-15 perception lane. No `PySide6`.

### Theme-Invariant Matching (Light Template, Dark Mode)

Find a button captured in light mode even after the app switches to dark mode. Full reference: [`docs/source/Eng/doc/new_features/v217_features_doc.rst`](docs/source/Eng/doc/new_features/v217_features_doc.rst).
50 changes: 50 additions & 0 deletions docs/source/Eng/doc/new_features/v218_features_doc.rst
@@ -0,0 +1,50 @@
Localize a Change to the Elements That Changed
==============================================

The existing diffs answer "*where* did pixels change" (``motion_regions``,
``perceptual_diff``, ``ssim_changed_regions`` return raw pixel regions) or "which
*accessibility* elements differ" (``element_diff``, needs a11y metadata). The
missing middle is: given a frame diff **and a list of element boxes**, which of
*those* elements changed? ``change_localize`` scores each supplied box by how
much it changed and ranks them.

* :func:`rank_changes` (pure): take ``[{box, score}]`` and mark each box
  ``changed`` (score at or above ``threshold``), sorted most-changed first.
* :func:`localize_changes`: diff a reference against the current screen, score
  each element box by its mean pixel change, and rank them.

``cv2`` / ``numpy`` are imported lazily (the module stays importable without
them) and the loaders reuse :mod:`visual_match`. The ranking is pure and fully
testable. Imports no ``PySide6``.
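
The pure ranking step described above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: the name ``rank_changes_sketch`` and the default ``threshold`` of 0.1 are assumptions made for the example.

```python
from typing import Any, Dict, List


def rank_changes_sketch(scored_boxes: List[Dict[str, Any]],
                        threshold: float = 0.1) -> List[Dict[str, Any]]:
    """Flag each box as changed and sort most-changed first (hypothetical helper)."""
    ranked = [
        {
            "box": list(item["box"]),
            "score": float(item["score"]),
            # A box counts as changed when its score is at or above the threshold.
            "changed": float(item["score"]) >= threshold,
        }
        for item in scored_boxes
    ]
    # Highest score first, so the most-changed element leads the list.
    ranked.sort(key=lambda entry: entry["score"], reverse=True)
    return ranked


example = rank_changes_sketch(
    [
        {"box": [0, 0, 40, 20], "score": 0.02},
        {"box": [50, 0, 40, 20], "score": 0.6},
    ],
    threshold=0.1,
)
```

Because the function touches no image data, it can be unit-tested with plain dictionaries, which is the property the section calls "pure and fully testable".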

Headless API
------------

.. code-block:: python

    from je_auto_control import localize_changes, rank_changes, mark_elements

    # ``elements`` is an element list you already have (for example from an
    # accessibility tree or an element proposer).
    boxes = [mark["bbox"] for mark in mark_elements(elements)]

    # After an action, which of those elements actually changed?
    changed = localize_changes("before.png", boxes, current="after.png")
    for entry in changed:
        if entry["changed"]:
            print("element changed:", entry["box"], entry["score"])

    # Or rank pre-computed scores yourself:
    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)

``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed
first, where ``score`` is the box's mean per-pixel change (0..1). It pairs with
``set_of_marks`` / accessibility element boxes to turn a raw screen diff into a
per-element "what changed" signal — an agent feedback channel after a click.
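
To make the ``score`` concrete, here is a minimal sketch of a mean per-pixel change in the 0..1 range. It assumes 8-bit grayscale frames held as nested lists and ``[x, y, w, h]`` boxes; the helper name ``mean_change`` is hypothetical, and the real module works on images loaded through cv2 / numpy rather than Python lists.

```python
def mean_change(before, after, box):
    """Mean absolute grayscale difference inside ``box``, normalised to 0..1."""
    x, y, w, h = box
    total = 0
    count = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            total += abs(before[row][col] - after[row][col])
            count += 1
    # Divide by 255 so a fully flipped 8-bit region scores 1.0.
    return total / (255.0 * count) if count else 0.0


# A 4x4 black frame in which two pixels of the top-left 2x2 box turn white.
before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = 255
after[0][1] = 255

score_changed = mean_change(before, after, [0, 0, 2, 2])
score_static = mean_change(before, after, [2, 2, 2, 2])
```

Averaging over the box area keeps scores comparable between small and large elements, which is what lets a single ``threshold`` apply to every box.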

Executor commands
-----------------

Two executor commands are added. ``AC_localize_changes`` takes ``reference``, a
``boxes`` JSON list and ``current`` / ``threshold`` / ``region``, and returns
``{changes}``. ``AC_rank_changes`` takes a ``scored_boxes`` JSON list and
``threshold``, returns ``{changes}`` and is pure. Both are also exposed as
matching read-only ``ac_*`` MCP tools and as Script Builder commands under
**Image**.
46 changes: 46 additions & 0 deletions docs/source/Eng/doc/new_features/v219_features_doc.rst
@@ -0,0 +1,46 @@
Classify a Widget from Its Pixel Shape
======================================

Set-of-Marks and element proposers hand back *boxes*, but not *what each box is*.
``form_fields.checkbox_state`` already reads a box known to be a checkbox; the
gap is the typing step before it — is this box a checkbox, a radio button, a push
button, a text field or a toggle? ``icon_classify`` answers that from cheap
geometric features (no model).

* :func:`box_features`: extract ``{aspect, fill, edge_density, circularity}``
  for a box region (the objective measurements).
* :func:`classify_widget` (pure): map a feature dict to a widget type by
  documented heuristics.
* :func:`classify_icon`: compose the two, turning a box into ``{type, features}``.

``classify_widget`` is pure and fully testable; ``box_features`` imports cv2 /
numpy lazily (the module stays importable without them) and reuses
:func:`visual_match._to_gray`. Imports no ``PySide6``.
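
The lazy-import arrangement mentioned above is a general Python pattern. The sketch below shows one common way to write it; ``lazy_import`` and ``require_cv2`` are hypothetical names for illustration and are not taken from the module.

```python
import importlib


def lazy_import(module_name):
    """Import a module on first use and fail with a clear message if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise RuntimeError(
            f"{module_name} is required for this feature but is not installed"
        ) from error


def require_cv2():
    """Called only by functions that touch pixels, never at module import time."""
    return lazy_import("cv2")


# The standard-library ``math`` module stands in for an optional dependency here.
square_root = lazy_import("math").sqrt(9)
```

With this layout, the pure classifier can be imported and tested on machines without OpenCV, and only the pixel-reading path raises when the dependency is absent.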

Headless API
------------

.. code-block:: python

    from je_auto_control import classify_icon, classify_widget

    # From a screenshot + a box:
    classify_icon("dialog.png", [120, 80, 16, 16])
    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}

    # From features you already have:
    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'

The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; a wide
rounded box is a ``toggle``; a near-square sparse box is a ``checkbox``; a wide
hollow box is a ``text_field``; a wide filled box is a ``button``; anything else
is an ``icon``. Tune by reading ``features`` and applying your own rules where
the defaults misfire — the measurements are the durable part.
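
As a rough illustration of applying your own rules to ``features``, the rule order listed above could be written as follows. Every numeric cut-off here is an assumption chosen for the example; the source does not state the library's actual thresholds, and ``classify_widget_sketch`` is a hypothetical name.

```python
def classify_widget_sketch(features):
    """Map a feature dict to a widget type with illustrative thresholds."""
    aspect = features.get("aspect", 1.0)            # width / height
    fill = features.get("fill", 0.0)                # fraction of filled pixels
    circularity = features.get("circularity", 0.0)  # 1.0 is a perfect circle

    near_square = 0.8 <= aspect <= 1.25  # assumed tolerance
    wide = aspect >= 1.5                 # assumed cut-off

    if near_square and circularity >= 0.8:
        return "radio"
    if wide and circularity >= 0.6:
        return "toggle"
    if near_square and fill <= 0.3:
        return "checkbox"
    if wide and fill <= 0.3:
        return "text_field"
    if wide and fill >= 0.5:
        return "button"
    return "icon"


radio = classify_widget_sketch({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})
checkbox = classify_widget_sketch({"aspect": 1.0, "circularity": 0.3, "fill": 0.12})
```

The order of the checks matters: roundness is tested before sparseness, so a round, sparse box is reported as a radio button rather than a checkbox.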

Executor commands
-----------------

Two executor commands are added. ``AC_classify_widget`` takes a ``features``
JSON object, returns ``{type}`` and is pure. ``AC_classify_icon`` takes a
``source`` image and a ``box`` ``[x, y, w, h]`` and returns
``{type, features}``. Both are also exposed as matching read-only ``ac_*`` MCP
tools and as Script Builder commands under **Image**.
51 changes: 51 additions & 0 deletions docs/source/Eng/doc/new_features/v220_features_doc.rst
@@ -0,0 +1,51 @@
Template-Free Element Proposal (Pixels to Elements)
===================================================

Set-of-Marks, ``observation`` and the grounding helpers all assume you already
have a list of element boxes — but on a screen the framework doesn't model
(a game, a custom-drawn app, a remote desktop) there is no accessibility tree to
provide one. ``element_proposal`` builds that top-of-funnel list from pixels:
detect candidate *widget* boxes (closed-edge blobs) and *text* boxes
(:func:`text_regions.find_text_regions`), fuse them — dropping widget boxes that
are really just text — and return them in reading order, each tagged ``text`` or
``widget``.

* :func:`propose_elements`: the full pixel-to-elements pipeline.
* :func:`tag_kinds` (pure): label fused boxes ``text`` / ``widget`` by source and
  keep their reading-order ``index``.

The fusion / cross-check / ordering reuse :mod:`element_parse` — the ``ocr`` >
``icon`` source priority *is* the "drop widget-that-is-really-text" check — and
the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily so the
module stays importable; :func:`tag_kinds` is pure and fully testable. Imports no
``PySide6``.
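
The labelling step can be pictured with a small sketch. It assumes each fused element carries a ``source`` field set to ``ocr`` or ``icon`` and that the input list is already in reading order; both the field name and the helper name ``tag_kinds_sketch`` are assumptions for illustration, not the library's confirmed schema.

```python
def tag_kinds_sketch(fused_elements):
    """Label fused boxes as text or widget and record their position as index."""
    tagged = []
    for index, element in enumerate(fused_elements):
        # Boxes that came from text detection are text; everything else is a widget.
        kind = "text" if element.get("source") == "ocr" else "widget"
        tagged.append({"box": list(element["box"]), "kind": kind, "index": index})
    return tagged


tagged = tag_kinds_sketch(
    [
        {"box": [10, 10, 80, 16], "source": "ocr"},
        {"box": [100, 8, 24, 24], "source": "icon"},
    ]
)
```

Since the function only reads dictionaries, it needs neither cv2 nor a screen, which matches the "pure and fully testable" claim above.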

Headless API
------------

.. code-block:: python

    from je_auto_control import propose_elements, mark_elements

    # No accessibility tree? Propose elements straight from the screen:
    elements = propose_elements(min_area=120)
    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]

    # Feed them to Set-of-Marks like any other element list:
    marks = mark_elements(elements)

``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where
``kind`` is ``text`` or ``widget``. It is the missing top-of-funnel for the
agent stack on un-modelled UIs: pixels in, a clean numbered element list out,
ready for marking, observation or grounding. Tune ``min_area`` for the smallest
control you care about and ``iou_threshold`` for how aggressively overlapping
text and widget boxes are merged.
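
To show what ``iou_threshold`` controls, here is a minimal sketch of intersection-over-union on ``[x, y, w, h]`` boxes, a text-over-widget cross-check, and a simple reading-order sort. The helper names, the default threshold of 0.5 and the ``row_tolerance`` bucketing are all assumptions for the example; the actual fusion lives in ``element_parse`` and may differ.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def drop_widgets_covered_by_text(text_boxes, widget_boxes, iou_threshold=0.5):
    """Keep only widget boxes that do not overlap a text box too strongly."""
    return [
        widget for widget in widget_boxes
        if all(iou(widget, text) < iou_threshold for text in text_boxes)
    ]


def reading_order(boxes, row_tolerance=10):
    """Sort top-to-bottom, then left-to-right, bucketing rows by row_tolerance pixels."""
    return sorted(boxes, key=lambda box: (box[1] // row_tolerance, box[0]))


kept = drop_widgets_covered_by_text(
    text_boxes=[[0, 0, 10, 10]],
    widget_boxes=[[0, 0, 10, 10], [100, 100, 20, 20]],
)
ordered = reading_order([[50, 12, 5, 5], [0, 11, 5, 5], [0, 0, 5, 5]])
```

A lower ``iou_threshold`` discards more widget candidates that overlap text, while a higher value keeps them unless the two boxes nearly coincide.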

Executor commands
-----------------

``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and
``AC_tag_kinds`` (``elements`` JSON list → ``{elements}``, pure) labels a
pre-fused list. Both are also exposed as matching read-only ``ac_*`` MCP tools
and as Script Builder commands under **Image**.
44 changes: 44 additions & 0 deletions docs/source/Zh/doc/new_features/v218_features_doc.rst
@@ -0,0 +1,44 @@
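Attribute a Change to the Elements That Actually Changed
========================================================

The existing diffs answer "*where* did pixels change" (``motion_regions``,
``perceptual_diff`` and ``ssim_changed_regions`` return raw pixel regions) or
"which *accessibility* elements differ" (``element_diff``, which needs a11y
metadata). The missing middle is: given a frame diff **and a list of element
boxes**, which of *those* elements changed? ``change_localize`` scores each
supplied box by how much it changed and ranks them.

* :func:`rank_changes` (pure): takes ``[{box, score}]`` and marks each box
  ``changed`` (score at or above ``threshold``), sorted most-changed first.
* :func:`localize_changes`: diffs a reference image against the current screen,
  scores each element box by its mean pixel change, then ranks them.

``cv2`` / ``numpy`` are imported lazily (the module can be imported without
them), and the loaders reuse :mod:`visual_match`. The ranking is pure and fully
testable. Imports no ``PySide6``.

Headless API
------------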

.. code-block:: python

    from je_auto_control import localize_changes, rank_changes, mark_elements

    # ``elements`` is an element list you already have.
    boxes = [mark["bbox"] for mark in mark_elements(elements)]

    # After an action, which of those elements actually changed?
    changed = localize_changes("before.png", boxes, current="after.png")
    for entry in changed:
        if entry["changed"]:
            print("element changed:", entry["box"], entry["score"])

    # Or rank pre-computed scores yourself:
    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)

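``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed
first, where ``score`` is the box's mean per-pixel change (0..1). Paired with
``set_of_marks`` / accessibility element boxes, it turns a raw screen diff into
a per-element "what changed" signal: an agent feedback channel after a click.

Executor commands
-----------------

``AC_localize_changes`` takes ``reference``, a ``boxes`` JSON list and
``current`` / ``threshold`` / ``region``, and returns ``{changes}``.
``AC_rank_changes`` takes a ``scored_boxes`` JSON list and ``threshold``,
returns ``{changes}`` and is pure. Both are provided as matching read-only
``ac_*`` MCP tools and as Script Builder commands under the **Image** category.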
38 changes: 38 additions & 0 deletions docs/source/Zh/doc/new_features/v219_features_doc.rst
@@ -0,0 +1,38 @@
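Classify a Widget from Its Pixel Shape
======================================

Set-of-Marks and element proposers return *boxes* but do not tell you *what each
box is*. ``form_fields.checkbox_state`` can already read a box known to be a
checkbox; what is missing is the classification step before it: is this box a
checkbox, a radio button, a push button, a text field or a toggle?
``icon_classify`` answers that from cheap geometric features (no model needed).

* :func:`box_features`: extracts ``{aspect, fill, edge_density, circularity}``
  for a box region (the objective measurements).
* :func:`classify_widget` (pure): maps a feature dict to a widget type by
  documented heuristics.
* :func:`classify_icon`: composes the two, turning a box into
  ``{type, features}``.

``classify_widget`` is pure and fully testable; ``box_features`` imports cv2 /
numpy lazily (the module can be imported without them) and reuses
:func:`visual_match._to_gray`. Imports no ``PySide6``.

Headless API
------------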

.. code-block:: python

    from je_auto_control import classify_icon, classify_widget

    # From a screenshot + a box:
    classify_icon("dialog.png", [120, 80, 16, 16])
    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}

    # From features you already have:
    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'

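The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; a
wide rounded box is a ``toggle``; a near-square sparse box is a ``checkbox``; a
wide hollow box is a ``text_field``; a wide filled box is a ``button``; anything
else is an ``icon``. Where the defaults misfire, read ``features`` and apply
your own rules to fine-tune; the measurements are the durable part.

Executor commands
-----------------

``AC_classify_widget`` takes a ``features`` JSON object, returns ``{type}`` and
is pure. ``AC_classify_icon`` takes a ``source`` image and a ``box``
``[x, y, w, h]`` and returns ``{type, features}``. Both are provided as matching
read-only ``ac_*`` MCP tools and as Script Builder commands under the **Image**
category.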
42 changes: 42 additions & 0 deletions docs/source/Zh/doc/new_features/v220_features_doc.rst
@@ -0,0 +1,42 @@
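Template-Free Element Proposal (Pixels to Elements)
===================================================

Set-of-Marks, ``observation`` and the grounding helpers all assume you already
have a list of element boxes, but on a screen the framework cannot model (a
game, a custom-drawn app, a remote desktop) there is no accessibility tree to
provide one. ``element_proposal`` builds that top-of-funnel list from pixels:
it detects candidate *widget* boxes (closed-edge blobs) and *text* boxes
(:func:`text_regions.find_text_regions`), fuses the two, drops widget boxes that
are really just text, and returns them in reading order, each tagged ``text`` or
``widget``.

* :func:`propose_elements`: the full pixel-to-elements pipeline.
* :func:`tag_kinds` (pure): labels fused boxes ``text`` / ``widget`` by source
  and keeps their reading-order ``index``.

The fusion / cross-check / ordering reuse :mod:`element_parse` (the ``ocr`` >
``icon`` source priority *is* the "drop widget-that-is-really-text" check), and
the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily so the
module stays importable; :func:`tag_kinds` is pure and fully testable. Imports
no ``PySide6``.

Headless API
------------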

.. code-block:: python

    from je_auto_control import propose_elements, mark_elements

    # No accessibility tree? Propose elements straight from the screen:
    elements = propose_elements(min_area=120)
    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]

    # Feed them to Set-of-Marks like any other element list:
    marks = mark_elements(elements)

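``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where
``kind`` is ``text`` or ``widget``. It is the top-of-funnel that the agent stack
lacks on un-modelled UIs: pixels in, a clean numbered element list out, ready
for marking, observation or grounding. Use ``min_area`` to set the smallest
control you care about and ``iou_threshold`` to set how aggressively overlapping
text and widget boxes are merged.

Executor commands
-----------------

``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and
``AC_tag_kinds`` (``elements`` JSON list → ``{elements}``, pure) labels a
pre-fused list. Both are provided as matching read-only ``ac_*`` MCP tools and
as Script Builder commands under the **Image** category.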
11 changes: 11 additions & 0 deletions je_auto_control/__init__.py
@@ -143,6 +143,14 @@
)
# Theme-invariant matching so a light template matches dark mode
from je_auto_control.utils.theme_normalize import match_theme, normalize_theme
# Attribute a screen change to the specific element boxes that changed
from je_auto_control.utils.change_localize import localize_changes, rank_changes
# Classify what kind of widget a box is from its pixel shape
from je_auto_control.utils.icon_classify import (
    box_features, classify_icon, classify_widget,
)
# Propose a clean element list from raw pixels (template-free)
from je_auto_control.utils.element_proposal import propose_elements, tag_kinds
# Rich clipboard formats — RTF + CSV/TSV codecs and Windows get / set
from je_auto_control.utils.clipboard_rich_formats import (
    build_rtf, csv_to_rows, get_clipboard_csv, get_clipboard_rtf, rows_to_csv,
@@ -1771,6 +1779,9 @@ def start_autocontrol_gui(*args, **kwargs):
"place_labels", "label_color",
"grade_contrast", "dominant_pair", "region_contrast",
"normalize_theme", "match_theme",
"localize_changes", "rank_changes",
"classify_widget", "box_features", "classify_icon",
"propose_elements", "tag_kinds",
"build_rtf", "rtf_to_text", "rows_to_csv", "csv_to_rows",
"set_clipboard_rtf", "get_clipboard_rtf",
"set_clipboard_csv", "get_clipboard_csv",