Merged
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-CN.md
@@ -1,5 +1,11 @@
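# What's New — AutoControl

## What's new (2026-06-23) — Window Client-Area Geometry

Click *inside* a window regardless of its title bar / borders. Full reference: [`docs/source/Zh/doc/new_features/v150_features_doc.rst`](../docs/source/Zh/doc/new_features/v150_features_doc.rst).

- **`get_client_rect` / `client_point` / `frame_insets` / `client_to_screen`** (`AC_get_client_rect`, `AC_client_point`): `get_window_geometry` returns only the *outer* bounding box; there was no client-area rect, frame-inset math, or client→screen mapping. `client_point("App", x, y)` maps a content-relative point to the screen so a click lands inside the window regardless of its frame; `frame_insets` reports border / title-bar thickness. `frame_insets`/`client_to_screen` are pure geometry (headless-testable); `get_client_rect` uses an injectable Win32 reader (`GetClientRect`+`ClientToScreen`).

## What's new (2026-06-23) — Perceptual (YIQ) Image Diff with Anti-Alias Suppression

Visual-regression diffing that ignores anti-aliased edges. Full reference: [`docs/source/Zh/doc/new_features/v149_features_doc.rst`](../docs/source/Zh/doc/new_features/v149_features_doc.rst).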
6 changes: 6 additions & 0 deletions README/WHATS_NEW_zh-TW.md
@@ -1,5 +1,11 @@
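# What's New — AutoControl

## What's new (2026-06-23) — Window Client-Area Geometry

Click *inside* a window regardless of its title bar / borders. Full reference: [`docs/source/Zh/doc/new_features/v150_features_doc.rst`](../docs/source/Zh/doc/new_features/v150_features_doc.rst).

- **`get_client_rect` / `client_point` / `frame_insets` / `client_to_screen`** (`AC_get_client_rect`, `AC_client_point`): `get_window_geometry` returns only the *outer* bounding box; there was no client-area rect, frame-inset math, or client→screen mapping. `client_point("App", x, y)` maps a content-relative point to the screen so a click lands inside the window regardless of its frame; `frame_insets` reports border / title-bar thickness. `frame_insets`/`client_to_screen` are pure geometry (headless-testable); `get_client_rect` uses an injectable Win32 reader (`GetClientRect`+`ClientToScreen`).

## What's new (2026-06-23) — Perceptual (YIQ) Image Diff with Anti-Alias Suppression

Visual-regression diffing that ignores anti-aliased edges. Full reference: [`docs/source/Zh/doc/new_features/v149_features_doc.rst`](../docs/source/Zh/doc/new_features/v149_features_doc.rst).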
6 changes: 6 additions & 0 deletions WHATS_NEW.md
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-23) — Window Client-Area Geometry

Click *inside* a window regardless of its title bar / borders. Full reference: [`docs/source/Eng/doc/new_features/v150_features_doc.rst`](docs/source/Eng/doc/new_features/v150_features_doc.rst).

- **`get_client_rect` / `client_point` / `frame_insets` / `client_to_screen`** (`AC_get_client_rect`, `AC_client_point`): `get_window_geometry` returns only the *outer* bbox — there was no client-area rect, frame-inset math, or client→screen mapping. `client_point("App", x, y)` maps a content-relative point to the screen so a click lands inside the window regardless of chrome; `frame_insets` reports border/title-bar thickness. `frame_insets`/`client_to_screen` are pure geometry (headless-testable); `get_client_rect` uses an injectable Win32 reader (`GetClientRect`+`ClientToScreen`).

## What's new (2026-06-23) — Perceptual (YIQ) Image Diff with Anti-Alias Suppression

Visual-regression diffing that ignores anti-aliased edges. Full reference: [`docs/source/Eng/doc/new_features/v149_features_doc.rst`](docs/source/Eng/doc/new_features/v149_features_doc.rst).
43 changes: 43 additions & 0 deletions docs/source/Eng/doc/new_features/v150_features_doc.rst
@@ -0,0 +1,43 @@
Window Client-Area Geometry
===========================

``window_capture.get_window_geometry`` returns a window's *outer* bounding box (for
screenshotting), but there is no *client*-area rect, no frame-inset math, and no
client→screen point mapping. RPA needs "click at ``(x, y)`` *inside* this window's
client area regardless of title-bar height / borders" — the building block for
window-relative clicking. This adds the client rect, the pure frame-inset and
client-to-screen helpers, and a one-call ``client_point``.

``frame_insets`` / ``client_to_screen`` are pure geometry (headless-testable); only
``get_client_rect``'s default reader touches Win32 (``GetClientRect`` +
``ClientToScreen``), and it is injectable. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import (get_client_rect, client_point, frame_insets,
                                 client_to_screen)

    # Click 20px in, 30px down from the window's content origin (not its title bar).
    point = client_point("Calculator", 20, 30)
    if point:
        click(*point)

    rect = get_client_rect("Calculator")  # (x, y, width, height)
    insets = frame_insets(get_window_geometry("Calculator"), rect)  # border sizes

``get_client_rect`` returns the client area as ``(x, y, width, height)`` with a
screen-coordinate origin (or ``None``); ``client_point`` maps a client-local point to
the screen so a click lands inside the content regardless of chrome. ``frame_insets``
returns the ``{left, top, right, bottom}`` border/title-bar thickness from the outer
and client rects, and ``client_to_screen`` is the underlying pure offset.
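
To make the arithmetic concrete, here is a small worked sketch. It re-declares the two pure helpers locally, following the logic in this PR's ``window_geometry.py``, so it runs without the package or Win32. The two rects are made-up numbers for illustration, not measurements from a real window.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in screen coordinates


def frame_insets(window_rect: Rect, client_rect: Rect) -> Dict[str, int]:
    """Frame thickness on each side: outer edge minus client edge."""
    wx, wy, ww, wh = window_rect
    cx, cy, cw, ch = client_rect
    return {"left": cx - wx, "top": cy - wy,
            "right": (wx + ww) - (cx + cw),
            "bottom": (wy + wh) - (cy + ch)}


def client_to_screen(client_rect: Rect, x: int, y: int) -> Tuple[int, int]:
    """Offset a client-local point by the client rect's screen origin."""
    return (client_rect[0] + x, client_rect[1] + y)


outer = (100, 100, 816, 639)   # hypothetical outer window rect
client = (108, 131, 800, 600)  # hypothetical client rect inside it

insets = frame_insets(outer, client)
point = client_to_screen(client, 20, 30)
print(insets, point)
```

With these numbers the title bar accounts for the larger ``top`` inset, and the mapped point is simply the client origin plus the local offset.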

Executor commands
-----------------

``AC_get_client_rect`` (``title`` → ``{found, rect}``) and ``AC_client_point``
(``title`` / ``x`` / ``y`` → ``{found, point}``). They are exposed as the MCP tools
``ac_get_client_rect`` / ``ac_client_point`` and as Script Builder commands under
**Window**.
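
The ``{found, rect}`` result shape can be sketched as below. This is a local stand-in that mirrors the adapter logic shown in this PR, not the executor itself: the ``reader`` parameter and ``fake_reader`` are hypothetical additions so the example runs headlessly, whereas the real adapter takes only ``title``.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]


def get_client_rect_result(title: str,
                           reader: Callable[[str], Optional[Rect]]) -> Dict[str, Any]:
    """Stand-in for the AC_get_client_rect adapter's return shape."""
    rect = reader(title)
    # The tuple becomes a list so the result serialises cleanly to JSON.
    return {"found": rect is not None,
            "rect": list(rect) if rect is not None else None}


def fake_reader(title: str) -> Optional[Rect]:
    # Hypothetical reader: knows exactly one window.
    return (108, 131, 800, 600) if title == "Calculator" else None


hit = get_client_rect_result("Calculator", fake_reader)
miss = get_client_rect_result("No Such Window", fake_reader)
print(hit, miss)
```

A caller should branch on ``found`` rather than on the truthiness of ``rect``, since a missing window yields ``rect: None``.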
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -172,6 +172,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v147_features_doc
doc/new_features/v148_features_doc
doc/new_features/v149_features_doc
doc/new_features/v150_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
36 changes: 36 additions & 0 deletions docs/source/Zh/doc/new_features/v150_features_doc.rst
@@ -0,0 +1,36 @@
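Window Client-Area Geometry
===========================

``window_capture.get_window_geometry`` returns a window's *outer* bounding box (for
screenshots), but there is no *client*-area rect, no frame-inset math, and no
client→screen point mapping. RPA needs to "click at ``(x, y)`` in this window's client
area regardless of title-bar height / borders"; this is the basis for window-relative
clicking. This feature adds the client rect, the pure frame-inset and client-to-screen
helper functions, and a one-call ``client_point``.

``frame_insets`` / ``client_to_screen`` are pure geometry (headless-testable); only
``get_client_rect``'s default reader touches Win32 (``GetClientRect`` +
``ClientToScreen``), and it is injectable. Imports no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import (get_client_rect, client_point, frame_insets,
                                 client_to_screen)

    # Click 20px in and 30px down from the window's content origin (not the title bar).
    point = client_point("Calculator", 20, 30)
    if point:
        click(*point)

    rect = get_client_rect("Calculator")  # (x, y, width, height)
    insets = frame_insets(get_window_geometry("Calculator"), rect)  # border sizes

``get_client_rect`` returns the client area as ``(x, y, width, height)`` with a
screen-coordinate origin (or ``None``); ``client_point`` maps a point inside the client
area to the screen so a click lands on the content regardless of the window frame.
``frame_insets`` returns the ``{left, top, right, bottom}`` border / title-bar thickness
from the outer and client rects, and ``client_to_screen`` is the underlying pure offset.

Executor commands
-----------------

``AC_get_client_rect`` (``title`` → ``{found, rect}``) and ``AC_client_point``
(``title`` / ``x`` / ``y`` → ``{found, point}``). They are exposed as the MCP tools
``ac_get_client_rect`` / ``ac_client_point`` and as Script Builder commands under the
**Window** category.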
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -172,6 +172,7 @@ Comprehensive usage guides for all AutoControl features.
doc/new_features/v147_features_doc
doc/new_features/v148_features_doc
doc/new_features/v149_features_doc
doc/new_features/v150_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
8 changes: 8 additions & 0 deletions je_auto_control/__init__.py
@@ -357,6 +357,10 @@
from je_auto_control.utils.perceptual_diff import (
    PerceptualDiffResult, assert_perceptual, perceptual_diff,
)
# Window client-area geometry (frame insets, client-to-screen mapping)
from je_auto_control.utils.window_geometry import (
    client_point, client_to_screen, frame_insets, get_client_rect,
)
# CI workflow annotations (GitHub Actions)
from je_auto_control.utils.ci_annotations import (
    emit_annotations, format_annotation,
@@ -1227,6 +1231,10 @@ def start_autocontrol_gui(*args, **kwargs):
    "perceptual_diff",
    "assert_perceptual",
    "PerceptualDiffResult",
    "frame_insets",
    "client_to_screen",
    "get_client_rect",
    "client_point",
    "emit_annotations", "format_annotation",
    "ClipboardHistory", "default_clipboard_history",
    "analyze_heal_log", "heal_stats", "scan_secrets",
14 changes: 14 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -707,6 +707,20 @@ def _add_window_specs(specs: List[CommandSpec]) -> None:
        fields=(FieldSpec("title", FieldType.STRING),),
        description="Send a window to the bottom of the z-order.",
    ))
    specs.append(CommandSpec(
        "AC_get_client_rect", "Window", "Get Client Rect",
        fields=(FieldSpec("title", FieldType.STRING),),
        description="A window's client-area rect (excludes title bar / borders).",
    ))
    specs.append(CommandSpec(
        "AC_client_point", "Window", "Client-Relative Point",
        fields=(
            FieldSpec("title", FieldType.STRING),
            FieldSpec("x", FieldType.INT),
            FieldSpec("y", FieldType.INT),
        ),
        description="Screen point for an (x, y) inside a window's client area.",
    ))
    specs.append(CommandSpec(
        "AC_wait_window_closed", "Window", "Wait for Window to Close",
        fields=(
18 changes: 18 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -3798,6 +3798,22 @@ def _perceptual_diff(actual: str, expected: str, threshold: Any = 0.1,
"diff_ratio": result.diff_ratio, "regions": result.regions}


def _get_client_rect(title: str) -> Dict[str, Any]:
    """Adapter: a window's client-area rect in screen coordinates."""
    from je_auto_control.utils.window_geometry import get_client_rect
    rect = get_client_rect(title)
    return {"found": rect is not None,
            "rect": list(rect) if rect is not None else None}


def _client_point(title: str, x: Any, y: Any) -> Dict[str, Any]:
    """Adapter: screen point for a client-area-local (x, y) inside a window."""
    from je_auto_control.utils.window_geometry import client_point
    point = client_point(title, int(x), int(y))
    return {"found": point is not None,
            "point": list(point) if point is not None else None}


def _with_modifiers(modifiers: Any, actions: Any) -> Dict[str, Any]:
    """Adapter: run nested actions while modifier keys are held down."""
    import json
@@ -5550,6 +5566,8 @@ def __init__(self):
            "AC_send_to_back": _send_to_back,
            "AC_soft_assert": _soft_assert,
            "AC_perceptual_diff": _perceptual_diff,
            "AC_get_client_rect": _get_client_rect,
            "AC_client_point": _client_point,
            "AC_tile_rect": _tile_rect,
            "AC_grid_rects": _grid_rects,
            "AC_cascade_rects": _cascade_rects,
30 changes: 29 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3231,6 +3231,33 @@ def perceptual_diff_tools() -> List[MCPTool]:
    ]


def window_geometry_tools() -> List[MCPTool]:
    return [
        MCPTool(
            name="ac_get_client_rect",
            description=("The client-area rect [x,y,width,height] (screen coords, "
                         "excluding title bar / borders) of the window matching "
                         "'title'. Returns {found, rect}. Windows only."),
            input_schema=schema({"title": {"type": "string"}}, required=["title"]),
            handler=h.get_client_rect,
            annotations=READ_ONLY,
        ),
        MCPTool(
            name="ac_client_point",
            description=("Screen point for a client-area-local (x, y) inside the "
                         "window 'title' — click inside it regardless of title-bar "
                         "/ border thickness. Returns {found, point}."),
            input_schema=schema({
                "title": {"type": "string"},
                "x": {"type": "integer"},
                "y": {"type": "integer"}},
                required=["title", "x", "y"]),
            handler=h.client_point,
            annotations=READ_ONLY,
        ),
    ]


def ssim_tools() -> List[MCPTool]:
    return [
        MCPTool(
@@ -6738,7 +6765,8 @@ def media_assert_tools() -> List[MCPTool]:
hsv_segment_tools, text_regions_tools, edge_lines_tools, expect_poll_tools,
locator_chain_tools, rich_clipboard_tools, img_histogram_tools,
motion_regions_tools, window_zorder_tools, soft_assert_tools,
perceptual_diff_tools, plugin_sdk_tools, governance_tools,
perceptual_diff_tools, window_geometry_tools, plugin_sdk_tools,
governance_tools,
credential_lease_tools, egress_tools, approval_testing_tools,
trajectory_eval_tools, compliance_tools, agent_trace_tools,
video_report_tools, fuzzy_tools, artifact_store_tools, image_dedup_tools,
10 changes: 10 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -2294,6 +2294,16 @@ def perceptual_diff(actual, expected, threshold=0.1, include_aa=False,
    return _perceptual_diff(actual, expected, threshold, include_aa, max_diff_ratio)


def get_client_rect(title):
    from je_auto_control.utils.executor.action_executor import _get_client_rect
    return _get_client_rect(title)


def client_point(title, x, y):
    from je_auto_control.utils.executor.action_executor import _client_point
    return _client_point(title, x, y)


def detect_drift(reference, current, threshold=0.25, bins=10):
    from je_auto_control.utils.executor.action_executor import _detect_drift
    return _detect_drift(reference, current, threshold, bins)
6 changes: 6 additions & 0 deletions je_auto_control/utils/window_geometry/__init__.py
@@ -0,0 +1,6 @@
"""Window client-area geometry (frame insets, client-to-screen mapping)."""
from je_auto_control.utils.window_geometry.window_geometry import (
    client_point, client_to_screen, frame_insets, get_client_rect,
)

__all__ = ["client_point", "client_to_screen", "frame_insets", "get_client_rect"]
75 changes: 75 additions & 0 deletions je_auto_control/utils/window_geometry/window_geometry.py
@@ -0,0 +1,75 @@
"""Window client-area geometry — frame insets and client-relative point mapping.

``window_capture.get_window_geometry`` returns a window's outer bounding box (for
screenshotting), but there is no *client*-area rect, no frame-inset math, and no
client→screen point mapping. RPA needs "click at ``(x, y)`` *inside* this window's
client area regardless of title-bar height / borders" — the building block for
window-relative clicking. This adds the client rect, the pure frame-inset and
client-to-screen helpers, and a one-call ``client_point``.

``frame_insets`` / ``client_to_screen`` are pure geometry (headless-testable); only
``get_client_rect``'s default reader touches Win32 (``GetClientRect`` +
``ClientToScreen``), and it is injectable. Imports no ``PySide6``.
"""
import sys
from typing import Callable, Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]
RectReader = Callable[[str], Optional[Rect]]


def frame_insets(window_rect: Rect, client_rect: Rect) -> Dict[str, int]:
    """Return the border / title-bar thickness as ``{left, top, right, bottom}``.

    Both rects are ``(x, y, width, height)`` in screen coordinates; the client rect is
    inset within the window rect by the frame.
    """
    wx, wy, ww, wh = (int(v) for v in window_rect[:4])
    cx, cy, cw, ch = (int(v) for v in client_rect[:4])
    return {"left": cx - wx, "top": cy - wy,
            "right": (wx + ww) - (cx + cw), "bottom": (wy + wh) - (cy + ch)}


def client_to_screen(client_rect: Rect, x: int, y: int) -> Tuple[int, int]:
    """Map a client-area-local point to absolute screen coordinates."""
    return (int(client_rect[0]) + int(x), int(client_rect[1]) + int(y))


def _default_client_reader(title: str) -> Optional[Rect]:
    """Read a window's client rect in screen coordinates (Win32 only)."""
    if not sys.platform.startswith("win"):
        return None
    from je_auto_control.wrapper.auto_control_window import find_window
    hit = find_window(title)
    if hit is None:
        return None
    import ctypes
    from ctypes import wintypes
    user32 = ctypes.windll.user32
    hwnd = int(hit[0])
    # GetClientRect yields client-local coordinates, so left/top are always 0.
    rect = wintypes.RECT()
    if not user32.GetClientRect(hwnd, ctypes.byref(rect)):
        return None
    # Translate the client origin (0, 0) into screen coordinates.
    origin = wintypes.POINT(0, 0)
    if not user32.ClientToScreen(hwnd, ctypes.byref(origin)):
        return None
    return (origin.x, origin.y, rect.right - rect.left, rect.bottom - rect.top)


def get_client_rect(title: str, *,
                    reader: Optional[RectReader] = None) -> Optional[Rect]:
    """Return ``(x, y, width, height)`` of a window's client area (or ``None``).

    The origin is in screen coordinates. ``reader`` is injectable for tests; the
    default uses Win32 and returns ``None`` on other platforms.
    """
    return (reader or _default_client_reader)(title)


def client_point(title: str, x: int, y: int, *,
                 reader: Optional[RectReader] = None) -> Optional[Tuple[int, int]]:
    """Return the screen point for a client-area-local ``(x, y)`` (or ``None``).

    Lets you click at a position *inside* the window regardless of its title-bar /
    border thickness.
    """
    rect = get_client_rect(title, reader=reader)
    return client_to_screen(rect, x, y) if rect is not None else None
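
The injectable ``reader`` is what makes this module testable off Windows. The sketch below shows one way a headless test might use it. It is an illustration only: ``client_point`` is re-declared locally with a mandatory ``reader`` so the block is self-contained, and ``recording_reader`` is a hypothetical test double, not part of the PR.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]
RectReader = Callable[[str], Optional[Rect]]


def client_point(title: str, x: int, y: int, *,
                 reader: RectReader) -> Optional[Tuple[int, int]]:
    """Local copy of the mapping logic, with the reader made mandatory."""
    rect = reader(title)
    if rect is None:
        return None
    return (rect[0] + x, rect[1] + y)


calls: List[str] = []


def recording_reader(title: str) -> Optional[Rect]:
    # Hypothetical test double: records each lookup and knows one window.
    calls.append(title)
    return (108, 131, 800, 600) if title == "Calculator" else None


found = client_point("Calculator", 20, 30, reader=recording_reader)
missing = client_point("Notepad", 20, 30, reader=recording_reader)
print(found, missing, calls)
```

Because the double never touches Win32, the same assertions can run on any platform, including CI runners without a display.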