Merged
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-CN.md
@@ -1,5 +1,11 @@
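# What's New: AutoControl

## What's new (2026-06-24): Action-Effect Classification (Did My Click Do Anything?)

Tell an agent whether a click had any effect, and whether the effect landed where it aimed. Full reference: [`docs/source/Zh/doc/new_features/v167_features_doc.rst`](../docs/source/Zh/doc/new_features/v167_features_doc.rst).

- **`classify_effect` / `effect_near_point` / `is_no_op`** (`AC_classify_effect`, `AC_effect_near_point`): `screen_state`/`element_diff` report what changed but do not attribute it to the action; `loop_guard` needs N repeats before it flags a no-op. This feature compares the before/after observations and, based on the action's target point, classifies the result on the *first* step as `no_op` / `changed_near_target` / `changed_elsewhere` (an unexpected dialog) / `changed`, returning an `EffectVerdict` with the changed centres and a reason. Reuses `element_diff.match_elements` + `observation_delta`'s field-change check. Pure standard library; does not import `PySide6`.

## What's new (2026-06-24): Form Field Association (Multi-Direction) + Checkbox State

Pairs labels with values even when the value is below or right-aligned, and reads checkbox state. Full reference: [`docs/source/Zh/doc/new_features/v166_features_doc.rst`](../docs/source/Zh/doc/new_features/v166_features_doc.rst).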
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-TW.md
@@ -1,5 +1,11 @@
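# What's New: AutoControl

## What's new (2026-06-24): Action-Effect Classification (Did My Click Do Anything?)

Tell an agent whether a click had any effect, and whether the effect landed where it aimed. Full reference: [`docs/source/Zh/doc/new_features/v167_features_doc.rst`](../docs/source/Zh/doc/new_features/v167_features_doc.rst).

- **`classify_effect` / `effect_near_point` / `is_no_op`** (`AC_classify_effect`, `AC_effect_near_point`): `screen_state`/`element_diff` report what changed but do not attribute it to the action; `loop_guard` needs N repeats before it flags a no-op. This feature compares the before/after observations and, based on the action's target point, classifies the result on the *first* step as `no_op` / `changed_near_target` / `changed_elsewhere` (an unexpected dialog) / `changed`, returning an `EffectVerdict` with the changed centres and a reason. Reuses `element_diff.match_elements` + `observation_delta`'s field-change check. Pure standard library; does not import `PySide6`.

## What's new (2026-06-24): Form Field Association (Multi-Direction) + Checkbox State

Pairs labels with values even when the value is below or right-aligned, and reads checkbox state. Full reference: [`docs/source/Zh/doc/new_features/v166_features_doc.rst`](../docs/source/Zh/doc/new_features/v166_features_doc.rst).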
6 changes: 6 additions & 0 deletions WHATS_NEW.md
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-24) — Action-Effect Classification (Did My Click Do Anything?)

Tell an agent whether a click did anything — and whether it happened where it aimed. Full reference: [`docs/source/Eng/doc/new_features/v167_features_doc.rst`](docs/source/Eng/doc/new_features/v167_features_doc.rst).

- **`classify_effect` / `effect_near_point` / `is_no_op`** (`AC_classify_effect`, `AC_effect_near_point`): `screen_state`/`element_diff` report what changed but never tie it to the action; `loop_guard` only flags a no-op after N repeats. This diffs the before/after observation and, given the action's target point, classifies the result on the *first* step as `no_op` / `changed_near_target` / `changed_elsewhere` (a surprise dialog) / `changed`, returning an `EffectVerdict` with the changed centres and a reason. Reuses `element_diff.match_elements` + `observation_delta`'s field-change check. Pure-stdlib; no `PySide6`.
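
The order in which the four labels are decided can be sketched in a few lines. This is a simplified stand-in, not the library code: `sketch_effect_label` is a hypothetical name, and it takes the centres of the changed elements directly instead of diffing two observations.

```python
from typing import Optional, Sequence


def sketch_effect_label(changed_centers: Sequence[Sequence[int]],
                        point: Optional[Sequence[int]],
                        radius: int = 64) -> str:
    """Hypothetical helper mirroring the order of checks described above."""
    if not changed_centers:
        return "no_op"        # nothing was added, removed or changed
    if point is None:
        return "changed"      # the action carried no coordinate to attribute to
    # Per-axis window around the target point (assumed to match the module).
    near = any(abs(cx - point[0]) <= radius and abs(cy - point[1]) <= radius
               for cx, cy in changed_centers)
    return "changed_near_target" if near else "changed_elsewhere"


print(sketch_effect_label([], [480, 260]))
print(sketch_effect_label([[900, 40]], None))
print(sketch_effect_label([[500, 270]], [480, 260]))
print(sketch_effect_label([[900, 40]], [480, 260]))
```

Checking for "no change" first means a no-op is reported even when the action has no target point.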

## What's new (2026-06-24) — Form Field Association (Multi-Direction) + Checkbox State

Pair form labels with values even when the value is below or right-aligned, and read checkbox state. Full reference: [`docs/source/Eng/doc/new_features/v166_features_doc.rst`](docs/source/Eng/doc/new_features/v166_features_doc.rst).
49 changes: 49 additions & 0 deletions docs/source/Eng/doc/new_features/v167_features_doc.rst
@@ -0,0 +1,49 @@
Action-Effect Classification (Did My Click Do Anything?)
========================================================

After an agent clicks, the crucial question is "did that do anything, and was it the *right*
thing?" — but nothing answered it on the *first* step. ``screen_state.diff_snapshots`` and
``element_diff`` report what changed but never tie the change back to the action;
``loop_guard`` only flags a no-op after the same digest repeats N times (so the agent loops
2–8 times first); ``actionability`` is purely a *pre*-action gate. ``action_effect`` closes the
loop: it diffs the before/after observation and, given the action's target point, classifies
the result so an agent can react immediately.

The verdict is one of ``no_op`` (nothing changed), ``changed_near_target`` (the change happened
where we acted, e.g. a button depressed), ``changed_elsewhere`` (a surprise dialog popped up
somewhere else), or ``changed`` (something changed but the action carried no target point to
attribute the change to).
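
Which action records count as "carrying a point" can be illustrated with a small sketch. It is modelled on the module's private point-extraction helper shown in this change; ``sketch_action_point`` is a hypothetical name used only here, and the sample action records are made up.

```python
from typing import Any, List, Optional


def sketch_action_point(action: Any) -> Optional[List[int]]:
    """Hypothetical copy of the point extraction, for illustration only."""
    if not isinstance(action, dict):
        return None                          # non-dict actions have no point
    if "x" in action and "y" in action:
        return [int(action["x"]), int(action["y"])]
    point = action.get("point") or action.get("center")
    return [int(point[0]), int(point[1])] if point else None


print(sketch_action_point({"type": "click", "x": 480, "y": 260}))
print(sketch_action_point({"type": "click", "point": [10, 20]}))
print(sketch_action_point({"type": "click", "center": (7, 9)}))
print(sketch_action_point({"type": "type_text", "text": "hi"}))
```

Under this sketch, a keyboard-style record without coordinates yields ``None``, which is the case that leads to the plain ``changed`` verdict.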

Pure-stdlib over element dicts + the action record; reuses ``element_diff.match_elements`` for
the overlap join and ``observation_delta``'s field-change check. Fully deterministic and
unit-testable with no device. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import classify_effect, effect_near_point, is_no_op

    verdict = classify_effect(before_elements, after_elements,
                              {"type": "click", "x": 480, "y": 260})
    if verdict.effect == "no_op":
        retry_or_repair()
    elif verdict.effect == "changed_elsewhere":
        handle_unexpected_dialog()

    if is_no_op(before_elements, after_elements):
        ...

``classify_effect`` returns an ``EffectVerdict`` (``effect`` / ``changed_near_target`` /
``changed_count`` / ``changed_centers`` / ``reason``). ``effect_near_point`` answers whether any
change landed within ``radius`` of an arbitrary point; ``is_no_op`` is the boolean shortcut.
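
What "within ``radius``" means can be made concrete with a short sketch. Judging by the module source in this change, the test is a per-axis (square) window around the point rather than a Euclidean circle, and an element's centre is its ``x``/``y`` plus half of ``width``/``height``. The helper names and the sample element below are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def sketch_center(element: Dict[str, Any]) -> List[int]:
    """Centre of an element's box; missing fields default to 0."""
    return [int(element.get("x", 0)) + int(element.get("width", 0)) // 2,
            int(element.get("y", 0)) + int(element.get("height", 0)) // 2]


def sketch_in_window(center: Sequence[int], point: Sequence[int],
                     radius: int) -> bool:
    """Per-axis check: both |dx| and |dy| must be at most radius."""
    return (abs(center[0] - point[0]) <= radius
            and abs(center[1] - point[1]) <= radius)


button = {"x": 500, "y": 280, "width": 80, "height": 40}
center = sketch_center(button)
point = [480, 240]
euclidean = math.hypot(center[0] - point[0], center[1] - point[1])

print(center)                                 # centre of the changed element
print(sketch_in_window(center, point, 64))    # per-axis window test
print(euclidean > 64)                         # yet farther than 64 px in a straight line
```

A change diagonally offset by 60 px on each axis therefore still counts as near with the default radius of 64, even though its straight-line distance is larger.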

Executor commands
-----------------

``AC_classify_effect`` (``before`` / ``after`` / ``action`` / ``radius`` →
``{effect, changed_near_target, changed_count, changed_centers, reason}``) and
``AC_effect_near_point`` (``before`` / ``after`` / ``point`` / ``radius`` → ``{near}``). They
are exposed as the MCP tools ``ac_classify_effect`` / ``ac_effect_near_point`` (read-only) and
as the Script Builder commands **Classify Action Effect** / **Effect Near Point?** under
**Native UI**.
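
Because Script Builder fields are text, the executor adapters in this change accept ``before`` / ``after`` / ``action`` either as Python objects or as JSON strings, and coerce ``radius`` with ``int``. The sketch below only illustrates that decoding step under those assumptions; ``decode_if_json_text`` is a hypothetical helper and the sample elements are made up.

```python
import json
from typing import Any


def decode_if_json_text(value: Any) -> Any:
    """Hypothetical helper: parse JSON text, pass other values through."""
    return json.loads(value) if isinstance(value, str) else value


before = [{"role": "button", "name": "OK", "x": 0, "y": 0}]
after = [{"role": "button", "name": "OK", "x": 0, "y": 0},
         {"role": "dialog", "name": "Error", "x": 600, "y": 300}]

# Arguments as a text-based front end might carry them.
as_text = {
    "before": json.dumps(before),
    "after": json.dumps(after),
    "action": json.dumps({"type": "click", "x": 50, "y": 50}),
    "radius": "64",
}

decoded = {key: decode_if_json_text(value)
           for key, value in as_text.items() if key != "radius"}
decoded["radius"] = int(as_text["radius"])   # radius is coerced, not JSON-decoded

print(decoded["action"], decoded["radius"])
```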
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -189,6 +189,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v164_features_doc
doc/new_features/v165_features_doc
doc/new_features/v166_features_doc
doc/new_features/v167_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
45 changes: 45 additions & 0 deletions docs/source/Zh/doc/new_features/v167_features_doc.rst
@@ -0,0 +1,45 @@
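Action-Effect Classification (Did My Click Do Anything?)
========================================================

After an agent clicks, the key question is "did that have any effect, and was it the *right*
effect?", yet no feature answered it on the *first* step. ``screen_state.diff_snapshots`` and
``element_diff`` report what changed but never attribute the change back to the action;
``loop_guard`` only flags a no-op after the same digest repeats N times (so the agent spins
2–8 times first); ``actionability`` is purely a *pre*-action gate. ``action_effect`` closes
this loop: it diffs the before/after observations and classifies the result by the action's
target point, so the agent can react immediately.

The verdict is one of ``no_op`` (no change), ``changed_near_target`` (the change happened where
we acted, e.g. a button was pressed), ``changed_elsewhere`` (an unexpected dialog popped up
elsewhere), or ``changed`` (something changed but the action has no coordinate point to
attribute it to).

Pure standard library, operating on element dicts plus the action record; reuses
``element_diff.match_elements`` for the overlap matching and ``observation_delta``'s
field-change check. Fully deterministic and unit-testable without a device. Does not import
``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import classify_effect, effect_near_point, is_no_op

    verdict = classify_effect(before_elements, after_elements,
                              {"type": "click", "x": 480, "y": 260})
    if verdict.effect == "no_op":
        retry_or_repair()
    elif verdict.effect == "changed_elsewhere":
        handle_unexpected_dialog()

    if is_no_op(before_elements, after_elements):
        ...

``classify_effect`` returns an ``EffectVerdict`` (``effect`` / ``changed_near_target`` /
``changed_count`` / ``changed_centers`` / ``reason``). ``effect_near_point`` answers whether any
change landed within ``radius`` of an arbitrary point; ``is_no_op`` is the boolean shortcut.

Executor commands
-----------------

``AC_classify_effect`` (``before`` / ``after`` / ``action`` / ``radius`` →
``{effect, changed_near_target, changed_count, changed_centers, reason}``) and
``AC_effect_near_point`` (``before`` / ``after`` / ``point`` / ``radius`` → ``{near}``).
Both are exposed as the MCP tools ``ac_classify_effect`` / ``ac_effect_near_point`` (read-only)
and as the Script Builder commands **Classify Action Effect** / **Effect Near Point?** (under
the **Native UI** category).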
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -189,6 +189,7 @@ Complete usage guides for all AutoControl features.
doc/new_features/v164_features_doc
doc/new_features/v165_features_doc
doc/new_features/v166_features_doc
doc/new_features/v167_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
8 changes: 8 additions & 0 deletions je_auto_control/__init__.py
@@ -311,6 +311,10 @@
from je_auto_control.utils.form_fields import (
    associate_fields, checkbox_state, match_labels_to_widgets,
)
# Classify whether an action did anything (target-local attribution)
from je_auto_control.utils.action_effect import (
    EffectVerdict, classify_effect, effect_near_point, is_no_op,
)
# Locate on-screen regions by colour (mask + connected components)
from je_auto_control.utils.color_region import (
    find_color_region, find_color_regions,
@@ -1245,6 +1249,10 @@ def start_autocontrol_gui(*args, **kwargs):
    "associate_fields",
    "match_labels_to_widgets",
    "checkbox_state",
    "EffectVerdict",
    "classify_effect",
    "effect_near_point",
    "is_no_op",
    "find_color_region",
    "find_color_regions",
    "ssim_compare",
25 changes: 25 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -3142,6 +3142,31 @@ def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None:
        ),
        description="Token-budgeted '+/~/-' summary of what changed between frames.",
    ))
    specs.append(CommandSpec(
        "AC_classify_effect", "Native UI", "Classify Action Effect",
        fields=(
            FieldSpec("before", FieldType.STRING,
                      placeholder='[{"role":"button","name":"OK","x":0,"y":0}]'),
            FieldSpec("after", FieldType.STRING,
                      placeholder='[{"role":"button","name":"OK","x":0,"y":0}]'),
            FieldSpec("action", FieldType.STRING,
                      placeholder='{"type":"click","x":50,"y":50}'),
            FieldSpec("radius", FieldType.INT, optional=True, default=64),
        ),
        description="Did the action change the screen near its target? (no_op/…).",
    ))
    specs.append(CommandSpec(
        "AC_effect_near_point", "Native UI", "Effect Near Point?",
        fields=(
            FieldSpec("before", FieldType.STRING,
                      placeholder='[{"role":"button","x":0,"y":0}]'),
            FieldSpec("after", FieldType.STRING,
                      placeholder='[{"role":"button","x":0,"y":0}]'),
            FieldSpec("point", FieldType.STRING, placeholder="[50, 50]"),
            FieldSpec("radius", FieldType.INT, optional=True, default=64),
        ),
        description="Did any before/after change land within radius of a point?",
    ))
    specs.append(CommandSpec(
        "AC_validate_action", "Native UI", "Validate / Snap Action",
        fields=(
6 changes: 6 additions & 0 deletions je_auto_control/utils/action_effect/__init__.py
@@ -0,0 +1,6 @@
"""Classify whether an action did anything, with target-local attribution."""
from je_auto_control.utils.action_effect.action_effect import (
    EffectVerdict, classify_effect, effect_near_point, is_no_op,
)

__all__ = ["EffectVerdict", "classify_effect", "effect_near_point", "is_no_op"]
104 changes: 104 additions & 0 deletions je_auto_control/utils/action_effect/action_effect.py
@@ -0,0 +1,104 @@
"""Classify whether an action actually did anything, with target-local attribution.

After an agent clicks, the crucial question is "did that do anything, and was it the *right*
thing?" — but nothing answers it on the *first* step. ``screen_state.diff_snapshots`` and
``element_diff`` report what changed but never tie the change back to the action; ``loop_guard``
only flags a no-op after the same digest repeats N times (so the agent loops 2-8 times first);
``actionability`` is purely a *pre*-action gate. ``action_effect`` closes the loop: it diffs the
before/after observation, and given the action's target point classifies the result as
``no_op`` (nothing changed), ``changed_near_target`` (the change happened where we acted — a
button depressed), ``changed_elsewhere`` (a surprise dialog popped somewhere else), or
``changed`` (something changed but the action had no point to attribute to).

Pure-stdlib over element dicts + the action record; reuses ``element_diff.match_elements`` for
the overlap join and ``observation_delta``'s field-change check. Fully deterministic and
unit-testable with no device. Imports no ``PySide6``.
"""
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Sequence

Element = Dict[str, Any]


@dataclass(frozen=True)
class EffectVerdict:
    """The classified effect of an action plus its attribution evidence."""

    effect: str
    changed_near_target: bool
    changed_count: int
    changed_centers: List[List[int]]
    reason: str

    def to_dict(self) -> Dict[str, Any]:
        """Return the verdict as a plain dict."""
        return asdict(self)


def _center(element: Element) -> List[int]:
    # Centre of the element's box; missing fields default to 0.
    return [int(element.get("x", 0)) + int(element.get("width", 0)) // 2,
            int(element.get("y", 0)) + int(element.get("height", 0)) // 2]


def _action_point(action: Any) -> Optional[List[int]]:
    """Extract the (x, y) the action targets, or ``None`` if it has no coordinate."""
    if not isinstance(action, dict):
        return None
    if "x" in action and "y" in action:
        return [int(action["x"]), int(action["y"])]
    point = action.get("point") or action.get("center")
    return [int(point[0]), int(point[1])] if point else None


def _changed_elements(before: Sequence[Element], after: Sequence[Element],
                      iou_threshold: float, move_threshold: int):
    """Return every element that was added / removed / changed between two frames."""
    from je_auto_control.utils.element_diff import match_elements
    from je_auto_control.utils.observation_delta.observation_delta import (
        _changed_fields)
    diff = match_elements(list(before), list(after), iou_threshold=iou_threshold)
    changed = [pair["after"] for pair in diff["matched"]
               if _changed_fields(pair["before"], pair["after"], move_threshold)]
    return diff["added"] + diff["removed"] + changed


def _near(centers: Sequence[List[int]], point: Sequence[int], radius: int) -> bool:
    # Per-axis (square) window around the point, not a Euclidean circle.
    return any(abs(cx - point[0]) <= radius and abs(cy - point[1]) <= radius
               for cx, cy in centers)


def classify_effect(before: Sequence[Element], after: Sequence[Element], action: Any,
                    *, radius: int = 64, iou_threshold: float = 0.5,
                    move_threshold: int = 5) -> EffectVerdict:
    """Classify the effect of ``action`` from the before/after observation pair."""
    changed = _changed_elements(before, after, float(iou_threshold),
                                int(move_threshold))
    centers = [_center(element) for element in changed]
    if not changed:
        return EffectVerdict("no_op", False, 0, [],
                             "no element was added, removed or changed")
    point = _action_point(action)
    if point is None:
        return EffectVerdict("changed", False, len(changed), centers,
                             "the screen changed but the action had no target point")
    if _near(centers, point, int(radius)):
        return EffectVerdict("changed_near_target", True, len(changed), centers,
                             "a change occurred within radius of the target point")
    return EffectVerdict("changed_elsewhere", False, len(changed), centers,
                         "the screen changed, but away from the target point")


def effect_near_point(before: Sequence[Element], after: Sequence[Element],
                      point: Sequence[int], *, radius: int = 64,
                      iou_threshold: float = 0.5, move_threshold: int = 5) -> bool:
    """Return whether any change between the frames lies within ``radius`` of ``point``."""
    changed = _changed_elements(before, after, float(iou_threshold),
                                int(move_threshold))
    return _near([_center(element) for element in changed], point, int(radius))


def is_no_op(before: Sequence[Element], after: Sequence[Element], *,
             iou_threshold: float = 0.5, move_threshold: int = 5) -> bool:
    """Return whether the action produced no observable change at all."""
    return not _changed_elements(before, after, float(iou_threshold),
                                 int(move_threshold))
30 changes: 30 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -4078,6 +4078,34 @@ def _delta_observation(prev: Any, curr: Any, viewport: Any = None,
"removed": len(delta["removed"]), "changed": len(delta["changed"])}


def _classify_effect(before: Any, after: Any, action: Any,
                     radius: Any = 64) -> Dict[str, Any]:
    """Adapter: classify whether an action changed the screen (target-local)."""
    import json
    from je_auto_control.utils.action_effect import classify_effect
    if isinstance(before, str):
        before = json.loads(before)
    if isinstance(after, str):
        after = json.loads(after)
    if isinstance(action, str):
        action = json.loads(action)
    return classify_effect(before, after, action, radius=int(radius)).to_dict()


def _effect_near_point(before: Any, after: Any, point: Any,
                       radius: Any = 64) -> Dict[str, Any]:
    """Adapter: did any before/after change land within radius of a point."""
    import json
    from je_auto_control.utils.action_effect import effect_near_point
    if isinstance(before, str):
        before = json.loads(before)
    if isinstance(after, str):
        after = json.loads(after)
    if isinstance(point, str):
        point = json.loads(point)
    return {"near": effect_near_point(before, after, point, radius=int(radius))}


def _validate_action(action: Any, screen: Any = None,
                     targets: Any = None) -> Dict[str, Any]:
    """Adapter: validate a coordinate action (bounds + optional snap-to-target)."""
@@ -5954,6 +5982,8 @@ def __init__(self):
            "AC_serialize_observation": _serialize_observation,
            "AC_observation_index": _observation_index,
            "AC_delta_observation": _delta_observation,
            "AC_classify_effect": _classify_effect,
            "AC_effect_near_point": _effect_near_point,
            "AC_validate_action": _validate_action,
            "AC_replay_trace": _replay_trace,
            "AC_match_elements": _match_elements,