Merged
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-CN.md
@@ -1,5 +1,11 @@
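# What's New — AutoControl

## What's new (2026-06-23) — Geometry-Aware Element Diff & Stable IDs

Track elements across frames by overlap and give them stable IDs. Full reference: [`docs/source/Zh/doc/new_features/v155_features_doc.rst`](../docs/source/Zh/doc/new_features/v155_features_doc.rst).

- **`match_elements` / `assign_stable_ids`** (`AC_match_elements`, `AC_assign_stable_ids`): `diff_snapshots` uses `(role, name)` as identity, so it cannot match a control that was renamed but not moved, or one that moved, and it cannot give persistent IDs across frames. This feature matches element boxes by IoU (reusing `element_parse.iou`): `match_elements` returns `{matched, added, removed}`; `assign_stable_ids` carries each element's `id` over from the `prior` frame (a moved button keeps its id, a new one gets a fresh id), so an agent can reliably refer to "element 7" from turn to turn. Pure standard library and headless-testable.

## What's new (2026-06-23) — Portable Agent Trajectory Trace (Record & Replay)

Record an agent's observation→action steps and replay them. Full reference: [`docs/source/Zh/doc/new_features/v154_features_doc.rst`](../docs/source/Zh/doc/new_features/v154_features_doc.rst).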
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-TW.md
@@ -1,5 +1,11 @@
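# What's New — AutoControl

## What's new (2026-06-23) — Geometry-Aware Element Diff & Stable IDs

Track elements across frames by overlap and give them stable IDs. Full reference: [`docs/source/Zh/doc/new_features/v155_features_doc.rst`](../docs/source/Zh/doc/new_features/v155_features_doc.rst).

- **`match_elements` / `assign_stable_ids`** (`AC_match_elements`, `AC_assign_stable_ids`): `diff_snapshots` uses `(role, name)` as identity, so it cannot match a control that was renamed but not moved, or one that moved, and it cannot give persistent IDs across frames. This feature matches element boxes by IoU (reusing `element_parse.iou`): `match_elements` returns `{matched, added, removed}`; `assign_stable_ids` carries each element's `id` over from the `prior` frame (a moved button keeps its id, a new one gets a fresh id), so an agent can reliably refer to "element 7" from turn to turn. Pure standard library and headless-testable.

## What's new (2026-06-23) — Portable Agent Trajectory Trace (Record & Replay)

Record an agent's observation→action steps and replay them. Full reference: [`docs/source/Zh/doc/new_features/v154_features_doc.rst`](../docs/source/Zh/doc/new_features/v154_features_doc.rst).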
6 changes: 6 additions & 0 deletions WHATS_NEW.md
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-23) — Geometry-Aware Element Diff & Stable IDs

Track elements across frames by overlap, with stable IDs. Full reference: [`docs/source/Eng/doc/new_features/v155_features_doc.rst`](docs/source/Eng/doc/new_features/v155_features_doc.rst).

- **`match_elements` / `assign_stable_ids`** (`AC_match_elements`, `AC_assign_stable_ids`): `diff_snapshots` keys identity on `(role, name)`, so it can't match a renamed-but-stationary control or a moved one, nor give persistent IDs across frames. These functions match element boxes by IoU instead (reusing `element_parse.iou`): `match_elements` returns `{matched, added, removed}`; `assign_stable_ids` carries each element's `id` over from a `prior` frame (a moved button keeps its id, a new one gets a fresh id), so an agent can reliably refer to "element 7" from turn to turn. Pure stdlib and headless-testable.
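
For readers new to IoU matching, here is a minimal, self-contained sketch of the idea. It does not import AutoControl: `box_iou` and `greedy_match` are hypothetical stand-ins that assume `element_parse.iou` computes standard intersection-over-union on `x` / `y` / `width` / `height` dicts, and they only mirror the documented `{matched, added, removed}` shape.

```python
from typing import Any, Dict, List

Box = Dict[str, Any]


def box_iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two x/y/width/height boxes (assumed semantics)."""
    inter_w = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    inter_h = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = inter_w * inter_h
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def greedy_match(before: List[Box], after: List[Box],
                 iou_threshold: float = 0.5) -> Dict[str, List[Any]]:
    """Walk `before` in order; each element claims its best unclaimed `after` box."""
    taken = set()
    matched, removed = [], []
    for element in before:
        best_index, best_score = -1, iou_threshold
        for index, candidate in enumerate(after):
            if index in taken:
                continue
            score = box_iou(element, candidate)
            # Require some real overlap so a 0.0 threshold never pairs disjoint boxes.
            if score > 0 and score >= best_score:
                best_index, best_score = index, score
        if best_index >= 0:
            taken.add(best_index)
            matched.append({"before": element, "after": after[best_index],
                            "iou": round(best_score, 4)})
        else:
            removed.append(element)
    added = [c for i, c in enumerate(after) if i not in taken]
    return {"matched": matched, "added": added, "removed": removed}


before = [{"name": "OK", "x": 100, "y": 100, "width": 80, "height": 30},
          {"name": "Cancel", "x": 200, "y": 100, "width": 80, "height": 30}]
after = [{"name": "Confirm", "x": 103, "y": 100, "width": 80, "height": 30},  # renamed, moved 3px
         {"name": "Help", "x": 400, "y": 300, "width": 60, "height": 30}]     # new

result = greedy_match(before, after)
print(result["matched"][0]["iou"])  # 0.9277
```

Here "OK" is matched to "Confirm" despite the rename, "Cancel" is reported as removed, and "Help" as added. The `score > 0` guard is this sketch's own choice; the shipped `match_elements` compares with `>=` against the threshold alone.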

## What's new (2026-06-23) — Portable Agent-Trajectory Trace (Record & Replay)

Log an agent's observation→action steps and replay them. Full reference: [`docs/source/Eng/doc/new_features/v154_features_doc.rst`](docs/source/Eng/doc/new_features/v154_features_doc.rst).
43 changes: 43 additions & 0 deletions docs/source/Eng/doc/new_features/v155_features_doc.rst
@@ -0,0 +1,43 @@
Geometry-Aware Element Diff & Stable IDs
========================================

``screen_state.diff_snapshots`` keys element identity strictly on ``(role, name)``, so
it cannot match or track a control that was renamed in place, and it cannot produce
persistent IDs across frames. Geometry-aware matching (intersection-over-union,
reusing :doc:`v138_features_doc`'s ``iou``) is the basis for stable element IDs an
agent can refer to from turn to turn: a button that moved a few pixels keeps its id,
a renamed-but-stationary control matches by overlap, and a genuinely new element gets
a fresh id.

Pure-stdlib over plain element dicts (``x`` / ``y`` / ``width`` / ``height``), so it is
fully unit-testable. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

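    from je_auto_control import match_elements, assign_stable_ids

    # Element boxes are plain dicts with x / y / width / height.
    before_boxes = [{"x": 100, "y": 100, "width": 80, "height": 30}]
    after_boxes = [{"x": 103, "y": 100, "width": 80, "height": 30},   # moved 3px
                   {"x": 400, "y": 300, "width": 60, "height": 30}]   # new

    diff = match_elements(before_boxes, after_boxes, iou_threshold=0.5)
    for pair in diff["matched"]:
        print("moved/kept:", pair["before"], "->", pair["after"], pair["iou"])
    print("appeared:", diff["added"], "disappeared:", diff["removed"])

    # Carry stable IDs across frames so the agent can say "click element 7" reliably.
    frame1 = assign_stable_ids(before_boxes)
    frame2 = assign_stable_ids(after_boxes, prior=frame1)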

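``match_elements`` walks ``before`` in order and greedily pairs each element with the
unclaimed ``after`` element it overlaps most, provided the IoU is at least
``iou_threshold``. It returns ``{matched: [{before, after, iou}], added, removed}``.
``assign_stable_ids`` tags each element with an ``id``; with a ``prior`` frame, each
element inherits the id of the prior box it most overlaps (IoU at least
``iou_threshold``), and unmatched elements get fresh ids beyond the highest prior id.
Without a ``prior`` frame, ids are assigned sequentially from 0.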

Executor commands
-----------------

``AC_match_elements`` (``before`` / ``after`` / ``iou_threshold`` → ``{matched, added,
removed}``) and ``AC_assign_stable_ids`` (``elements`` / ``prior`` / ``iou_threshold``
→ ``{count, elements}``). They are exposed as the MCP tools ``ac_match_elements`` /
``ac_assign_stable_ids`` and as Script Builder commands under **Native UI**.
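
`match_elements` visits `before` elements in input order, so a pairing can depend on that order. A common alternative is best-first (global) greedy matching: score every pair, then accept pairs from the highest IoU down. The sketch below contrasts the two on a small case; `box_iou`, `order_greedy`, `best_first_greedy`, and `row` are hypothetical helpers, not AutoControl APIs, and `box_iou` assumes standard IoU semantics.

```python
from itertools import product
from typing import Dict, List, Tuple

Box = Dict[str, float]


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two x/y/width/height boxes (assumed semantics)."""
    inter_w = max(0.0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    inter_h = max(0.0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = inter_w * inter_h
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def order_greedy(before: List[Box], after: List[Box],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Each `before` box, in input order, claims its best unclaimed `after` box."""
    taken, pairs = set(), []
    for i, a in enumerate(before):
        best_j, best = -1, threshold
        for j, b in enumerate(after):
            if j not in taken and box_iou(a, b) >= best:
                best_j, best = j, box_iou(a, b)
        if best_j >= 0:
            taken.add(best_j)
            pairs.append((i, best_j))
    return pairs


def best_first_greedy(before: List[Box], after: List[Box],
                      threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Accept (before, after) pairs from the highest IoU down, each box used once."""
    scored = sorted(((box_iou(a, b), i, j)
                     for (i, a), (j, b) in product(enumerate(before), enumerate(after))),
                    reverse=True)
    used_i, used_j, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # sorted descending, so nothing later can pass
        if i in used_i or j in used_j:
            continue
        used_i.add(i)
        used_j.add(j)
        pairs.append((i, j))
    return sorted(pairs)


def row(x: float, width: float) -> Box:
    """Boxes on one row with equal height, so IoU reduces to 1-D overlap."""
    return {"x": x, "y": 0, "width": width, "height": 10}


before = [row(10, 90), row(0, 98)]   # A, B
after = [row(0, 100), row(20, 90)]   # Q, P
print(order_greedy(before, after))       # [(0, 0), (1, 1)], i.e. A->Q, B->P
print(best_first_greedy(before, after))  # [(0, 1), (1, 0)], i.e. A->P, B->Q
```

In-order greedy lets A take Q (IoU 0.90), leaving B with P (about 0.71); best-first gives Q to B (0.98) and P to A (0.80). Neither greedy variant is guaranteed to maximize total IoU; an exact assignment (for example the Hungarian algorithm) does, at higher cost.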
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -177,6 +177,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v152_features_doc
doc/new_features/v153_features_doc
doc/new_features/v154_features_doc
doc/new_features/v155_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
36 changes: 36 additions & 0 deletions docs/source/Zh/doc/new_features/v155_features_doc.rst
@@ -0,0 +1,36 @@
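Geometry-Aware Element Diff & Stable IDs
========================================

``screen_state.diff_snapshots`` strictly uses ``(role, name)`` as element identity, so it cannot match an element whose label
changed while its position stayed the same, cannot track a renamed control, and cannot produce persistent IDs across frames.
Geometry-aware matching (intersection over union, reusing the ``iou`` from :doc:`v138_features_doc`) is the basis for stable
element IDs that an agent can reference from turn to turn: a button that moved a few pixels keeps its id, a control that was
renamed but not moved is matched by overlap, and a genuinely new element gets a fresh id.

Pure standard library, operating on plain element dicts (``x`` / ``y`` / ``width`` / ``height``), so it is fully unit-testable. It does not import ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import match_elements, assign_stable_ids

    diff = match_elements(before_boxes, after_boxes, iou_threshold=0.5)
    for pair in diff["matched"]:
        print("moved/kept:", pair["before"], "->", pair["after"], pair["iou"])
    print("appeared:", diff["added"], "disappeared:", diff["removed"])

    # Carry stable IDs across frames so the agent can reliably say "click element 7".
    frame1 = assign_stable_ids(boxes1)
    frame2 = assign_stable_ids(boxes2, prior=frame1)

``match_elements`` greedily pairs ``before`` ↔ ``after`` by overlap and returns ``{matched: [{before, after, iou}], added, removed}``.
``assign_stable_ids`` tags each element with an ``id``; given a ``prior`` frame, each element inherits the id of the prior box it
most overlaps (IoU at least ``iou_threshold``), and unmatched elements get new ids beyond the highest prior id.

Executor commands
-----------------

``AC_match_elements`` (``before`` / ``after`` / ``iou_threshold`` → ``{matched, added, removed}``) and
``AC_assign_stable_ids`` (``elements`` / ``prior`` / ``iou_threshold`` → ``{count, elements}``). They are provided as the MCP tools
``ac_match_elements`` / ``ac_assign_stable_ids`` and as commands under the **Native UI** category in Script Builder.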
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -177,6 +177,7 @@ Complete usage guides for all AutoControl features.
doc/new_features/v152_features_doc
doc/new_features/v153_features_doc
doc/new_features/v154_features_doc
doc/new_features/v155_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
6 changes: 6 additions & 0 deletions je_auto_control/__init__.py
@@ -377,6 +377,10 @@
from je_auto_control.utils.agent_replay import (
    from_jsonl, record_step, replay_trace, to_jsonl,
)
# Geometry-aware element matching across frames (stable IDs, move tracking)
from je_auto_control.utils.element_diff import (
    assign_stable_ids, match_elements,
)
# CI workflow annotations (GitHub Actions)
from je_auto_control.utils.ci_annotations import (
    emit_annotations, format_annotation,
@@ -1265,6 +1269,8 @@ def start_autocontrol_gui(*args, **kwargs):
    "to_jsonl",
    "from_jsonl",
    "replay_trace",
    "match_elements",
    "assign_stable_ids",
    "emit_annotations", "format_annotation",
    "ClipboardHistory", "default_clipboard_history",
    "analyze_heal_log", "heal_stats", "scan_secrets",
24 changes: 24 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -2963,6 +2963,30 @@ def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None:
        description="Reject out-of-bounds clicks; snap a near-miss to the nearest "
                    "element.",
    ))
    specs.append(CommandSpec(
        "AC_match_elements", "Native UI", "Match Elements (frames)",
        fields=(
            FieldSpec("before", FieldType.STRING,
                      placeholder='[{"x":..,"y":..,"width":..,"height":..}]'),
            FieldSpec("after", FieldType.STRING,
                      placeholder='[{"x":..,"y":..,"width":..,"height":..}]'),
            FieldSpec("iou_threshold", FieldType.FLOAT, optional=True, default=0.5,
                      min_value=0.0, max_value=1.0),
        ),
        description="Match element boxes across two frames by overlap (move/rename).",
    ))
    specs.append(CommandSpec(
        "AC_assign_stable_ids", "Native UI", "Assign Stable Element IDs",
        fields=(
            FieldSpec("elements", FieldType.STRING,
                      placeholder='[{"x":..,"y":..,"width":..,"height":..}]'),
            FieldSpec("prior", FieldType.STRING, optional=True,
                      placeholder="prior frame's elements (with ids)"),
            FieldSpec("iou_threshold", FieldType.FLOAT, optional=True, default=0.5,
                      min_value=0.0, max_value=1.0),
        ),
        description="Tag elements with IDs carried across frames by overlap.",
    ))
    specs.append(CommandSpec(
        "AC_mark_screen", "Native UI", "Set-of-Marks: Number Elements",
        fields=(
6 changes: 6 additions & 0 deletions je_auto_control/utils/element_diff/__init__.py
@@ -0,0 +1,6 @@
"""Geometry-aware element matching across frames (stable IDs, move tracking)."""
from je_auto_control.utils.element_diff.element_diff import (
    assign_stable_ids, match_elements,
)

__all__ = ["assign_stable_ids", "match_elements"]
82 changes: 82 additions & 0 deletions je_auto_control/utils/element_diff/element_diff.py
@@ -0,0 +1,82 @@
"""Geometry-aware element matching across frames — stable IDs, move tracking.

``screen_state.diff_snapshots`` keys element identity strictly on ``(role, name)``, so
it cannot match or track a control that was renamed in place, and it cannot produce
persistent IDs across frames. Geometry-aware matching (intersection-over-union,
reusing ``element_parse.iou``) is the basis for stable element IDs an agent can refer
to from turn to turn: a button that moved 3px keeps its id, a renamed-but-stationary
control matches by overlap, and a genuinely new element gets a fresh id.

Pure-stdlib over plain element dicts (``x`` / ``y`` / ``width`` / ``height``), so it is
fully unit-testable. Imports no ``PySide6``.
"""
from typing import Any, Dict, List, Optional, Sequence

from je_auto_control.utils.element_parse import iou

Element = Dict[str, Any]


def match_elements(before: Sequence[Element], after: Sequence[Element], *,
                   iou_threshold: float = 0.5) -> Dict[str, Any]:
    """Greedily match ``before`` elements to ``after`` by overlap.

    Returns ``{matched: [{before, after, iou}], added: [...], removed: [...]}`` — a
    ``before`` element with no overlap above ``iou_threshold`` is *removed*, an
    unmatched ``after`` element is *added*.
    """
    after = list(after)
    taken: set = set()
    matched: List[Dict[str, Any]] = []
    removed: List[Element] = []
    for element in before:
        best_index, best_score = -1, float(iou_threshold)
        for index, candidate in enumerate(after):
            if index in taken:
                continue
            score = iou(element, candidate)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index >= 0:
            taken.add(best_index)
            matched.append({"before": element, "after": after[best_index],
                            "iou": round(best_score, 4)})
        else:
            removed.append(element)
    added = [candidate for index, candidate in enumerate(after)
             if index not in taken]
    return {"matched": matched, "added": added, "removed": removed}


def _best_prior(element: Element, prior: Sequence[Element],
                iou_threshold: float) -> Optional[Element]:
    best, best_score = None, float(iou_threshold)
    for candidate in prior:
        score = iou(element, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


def assign_stable_ids(elements: Sequence[Element],
                      prior: Optional[Sequence[Element]] = None, *,
                      iou_threshold: float = 0.5) -> List[Element]:
    """Return ``elements`` each tagged with a stable ``id``, carried from ``prior``.

    With no ``prior`` every element gets a fresh sequential id; otherwise each element
    inherits the id of the ``prior`` element it most overlaps (above ``iou_threshold``),
    and unmatched elements get new ids beyond the highest prior id.
    """
    if not prior:
        return [dict(element, id=index) for index, element in enumerate(elements)]
    next_id = max((int(p.get("id", -1)) for p in prior), default=-1) + 1
    result: List[Element] = []
    for element in elements:
        match = _best_prior(element, prior, float(iou_threshold))
        if match is not None and "id" in match:
            result.append(dict(element, id=int(match["id"])))
        else:
            result.append(dict(element, id=next_id))
            next_id += 1
    return result
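
As written, `_best_prior` does not skip prior boxes that an earlier element already claimed, so two current elements that both overlap the same prior box (say, a control that split in two) would both inherit its id. If ids must stay unique within a frame, one option is to let each prior id be claimed at most once. A minimal, self-contained sketch of that variant follows; `box_iou` and `carry_ids` are hypothetical names, and `box_iou` assumes standard IoU semantics.

```python
from typing import Any, Dict, List, Optional, Sequence

Element = Dict[str, Any]


def box_iou(a: Element, b: Element) -> float:
    """Standard IoU of two x/y/width/height boxes (assumed semantics)."""
    inter_w = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    inter_h = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = inter_w * inter_h
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def carry_ids(elements: Sequence[Element],
              prior: Optional[Sequence[Element]] = None,
              iou_threshold: float = 0.5) -> List[Element]:
    """Tag elements with ids carried from `prior`; each prior id is claimed at most once."""
    if not prior:
        return [dict(element, id=index) for index, element in enumerate(elements)]
    next_id = max((int(p["id"]) for p in prior if "id" in p), default=-1) + 1
    claimed = set()
    result: List[Element] = []
    for element in elements:
        best_id, best_score = None, iou_threshold
        for candidate in prior:
            if "id" not in candidate or int(candidate["id"]) in claimed:
                continue
            score = box_iou(element, candidate)
            if score >= best_score:
                best_id, best_score = int(candidate["id"]), score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1  # genuinely new element
        else:
            claimed.add(best_id)
        result.append(dict(element, id=best_id))
    return result


frame1 = carry_ids([{"name": "OK", "x": 100, "y": 100, "width": 80, "height": 30},
                    {"name": "Cancel", "x": 200, "y": 100, "width": 80, "height": 30}])
# Next turn: "OK" is gone, "Cancel" moved 3px, "Help" is new.
frame2 = carry_ids([{"name": "Cancel", "x": 203, "y": 100, "width": 80, "height": 30},
                    {"name": "Help", "x": 400, "y": 300, "width": 60, "height": 30}],
                   prior=frame1)
print([(e["name"], e["id"]) for e in frame2])  # [('Cancel', 1), ('Help', 2)]

# Two boxes overlapping the same prior box no longer share its id.
split = carry_ids([{"x": 0, "y": 0, "width": 90, "height": 10},
                   {"x": 5, "y": 0, "width": 95, "height": 10}],
                  prior=[{"x": 0, "y": 0, "width": 100, "height": 10, "id": 0}])
print([e["id"] for e in split])  # [0, 1]
```

The trade-off: with exclusive claiming, the element processed first wins a contested id, so results depend on input order, the same caveat that applies to in-order greedy matching. Note also that the vanished "OK" id (0) is not reused.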
32 changes: 32 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -3894,6 +3894,36 @@ def runner(action):
    return {"count": len(results), "results": results}


def _match_elements(before: Any, after: Any,
                    iou_threshold: Any = 0.5) -> Dict[str, Any]:
    """Adapter: geometry-aware match of two element-box lists."""
    import json
    from je_auto_control.utils.element_diff import match_elements
    if isinstance(before, str):
        before = json.loads(before)
    if isinstance(after, str):
        after = json.loads(after)
    result = match_elements(list(before), list(after),
                            iou_threshold=float(iou_threshold))
    return {"matched": result["matched"], "added": result["added"],
            "removed": result["removed"]}


def _assign_stable_ids(elements: Any, prior: Any = None,
                       iou_threshold: Any = 0.5) -> Dict[str, Any]:
    """Adapter: tag element boxes with stable IDs carried from a prior frame."""
    import json
    from je_auto_control.utils.element_diff import assign_stable_ids
    if isinstance(elements, str):
        elements = json.loads(elements)
    if isinstance(prior, str):
        prior = json.loads(prior) if prior.strip() else None
    tagged = assign_stable_ids(list(elements),
                               prior=list(prior) if prior else None,
                               iou_threshold=float(iou_threshold))
    return {"count": len(tagged), "elements": tagged}


def _with_modifiers(modifiers: Any, actions: Any) -> Dict[str, Any]:
    """Adapter: run nested actions while modifier keys are held down."""
    import json
@@ -5653,6 +5683,8 @@ def __init__(self):
"AC_observation_index": _observation_index,
"AC_validate_action": _validate_action,
"AC_replay_trace": _replay_trace,
"AC_match_elements": _match_elements,
"AC_assign_stable_ids": _assign_stable_ids,
"AC_tile_rect": _tile_rect,
"AC_grid_rects": _grid_rects,
"AC_cascade_rects": _cascade_rects,
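
The executor adapters accept `before` / `after` / `elements` / `prior` either as lists or as JSON strings (the Script Builder fields are `FieldType.STRING`). Only `_assign_stable_ids` treats a blank string as "no prior"; `_match_elements` passes any string straight to `json.loads`, which raises on an empty string. A hypothetical shared helper, `coerce_boxes`, could normalize both paths and reject malformed boxes early. This is a sketch under that assumption, not code from this PR:

```python
import json
from typing import Any, Dict, List, Optional

REQUIRED_KEYS = ("x", "y", "width", "height")


def coerce_boxes(value: Any) -> Optional[List[Dict[str, Any]]]:
    """Normalize a JSON string or sequence of box dicts; None or a blank string -> None."""
    if value is None:
        return None
    if isinstance(value, str):
        if not value.strip():
            return None
        value = json.loads(value)
    if not isinstance(value, (list, tuple)):
        raise TypeError(f"expected a list of boxes, got {type(value).__name__}")
    for index, box in enumerate(value):
        if not isinstance(box, dict):
            raise TypeError(f"box {index} is {type(box).__name__}, not a dict")
        missing = [key for key in REQUIRED_KEYS if key not in box]
        if missing:
            raise ValueError(f"box {index} is missing {missing}")
    return list(value)


print(coerce_boxes('[{"x": 1, "y": 2, "width": 3, "height": 4}]'))
print(coerce_boxes("   "))  # None
```

With such a helper, both adapters could call `coerce_boxes` on every list argument and treat `None` uniformly, instead of each parsing strings slightly differently.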
34 changes: 33 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3345,6 +3345,38 @@ def agent_replay_tools() -> List[MCPTool]:
    ]


def element_diff_tools() -> List[MCPTool]:
    return [
        MCPTool(
            name="ac_match_elements",
            description=("Geometry-aware match of two element-box lists ('before' / "
                         "'after') by IoU. Returns {matched:[{before,after,iou}], "
                         "added, removed} — tracks moves/renames where (role,name) "
                         "diffing can't. Optional 'iou_threshold' (default 0.5)."),
            input_schema=schema({
                "before": {"type": "array", "items": {"type": "object"}},
                "after": {"type": "array", "items": {"type": "object"}},
                "iou_threshold": {"type": "number"}},
                required=["before", "after"]),
            handler=h.match_elements,
            annotations=READ_ONLY,
        ),
        MCPTool(
            name="ac_assign_stable_ids",
            description=("Tag 'elements' with a stable 'id' each, carried from a "
                         "'prior' frame by IoU (a moved element keeps its id, a new "
                         "one gets a fresh id). Returns {count, elements}."),
            input_schema=schema({
                "elements": {"type": "array", "items": {"type": "object"}},
                "prior": {"type": "array", "items": {"type": "object"}},
                "iou_threshold": {"type": "number"}},
                required=["elements"]),
            handler=h.assign_stable_ids,
            annotations=READ_ONLY,
        ),
    ]


def ssim_tools() -> List[MCPTool]:
    return [
        MCPTool(
@@ -6854,7 +6886,7 @@ def media_assert_tools() -> List[MCPTool]:
motion_regions_tools, window_zorder_tools, soft_assert_tools,
perceptual_diff_tools, window_geometry_tools, cua_action_tools,
observation_tools, action_grounding_tools, agent_replay_tools,
plugin_sdk_tools, governance_tools,
element_diff_tools, plugin_sdk_tools, governance_tools,
credential_lease_tools, egress_tools, approval_testing_tools,
trajectory_eval_tools, compliance_tools, agent_trace_tools,
video_report_tools, fuzzy_tools, artifact_store_tools, image_dedup_tools,
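
Over MCP the same tools take real JSON arrays (the `input_schema` declares `before` / `after` / `elements` / `prior` as arrays of objects), not JSON strings. A minimal sketch of a client building a JSON-RPC 2.0 `tools/call` request for `ac_match_elements`, following the MCP `tools/call` shape (`params.name` plus `params.arguments`); `build_tool_call` and the `REQUIRED_ARGS` table are hypothetical, with the required keys copied from the schemas above.

```python
import json
from itertools import count
from typing import Any, Dict

# Required arguments as declared by the input_schema entries in this PR.
REQUIRED_ARGS = {
    "ac_match_elements": ("before", "after"),
    "ac_assign_stable_ids": ("elements",),
}
_request_ids = count(1)


def build_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` JSON-RPC request after a local required-args check."""
    missing = [key for key in REQUIRED_ARGS.get(name, ()) if key not in arguments]
    if missing:
        raise ValueError(f"{name} is missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


box = {"x": 100, "y": 100, "width": 80, "height": 30}
request = build_tool_call("ac_match_elements",
                          {"before": [box], "after": [dict(box, x=103)],
                           "iou_threshold": 0.5})
print(request)
```

The local check only catches missing keys early; the server still validates arguments against its own `input_schema`.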
10 changes: 10 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -2330,6 +2330,16 @@ def replay_trace(trace):
    return _replay_trace(trace)


def match_elements(before, after, iou_threshold=0.5):
    from je_auto_control.utils.executor.action_executor import _match_elements
    return _match_elements(before, after, iou_threshold)


def assign_stable_ids(elements, prior=None, iou_threshold=0.5):
    from je_auto_control.utils.executor.action_executor import _assign_stable_ids
    return _assign_stable_ids(elements, prior, iou_threshold)


def detect_drift(reference, current, threshold=0.25, bins=10):
    from je_auto_control.utils.executor.action_executor import _detect_drift
    return _detect_drift(reference, current, threshold, bins)