diff --git a/README/WHATS_NEW_zh-CN.md b/README/WHATS_NEW_zh-CN.md index 6dc730f3..d5fabbb6 100644 --- a/README/WHATS_NEW_zh-CN.md +++ b/README/WHATS_NEW_zh-CN.md @@ -1,5 +1,65 @@ # 本次更新 — AutoControl +## 本次更新 (2026-06-24) — 失败 / 无效果动作的修复策略引擎 + +当动作没效果时选择下一个修复战术——并驱动重试循环。完整参考:[`docs/source/Zh/doc/new_features/v170_features_doc.rst`](../docs/source/Zh/doc/new_features/v170_features_doc.rst)。 + +- **`plan_repair` / `next_tactic` / `run_with_repair`**(`AC_plan_repair`):`self_healing`/`locator_repair` 只修复*无法解析*的定位器;`loop_guard` 只*检测*卡住循环而无战术选择。本功能消费效果判定(例如来自 `action_effect`)并返回有序战术——`wait_retry` / `relocate` / `nudge` / `scroll_into_view` / `escalate`——接着 `run_with_repair` 以注入的 `act` / `verify` / `apply_tactic` / `verdict_for` / `sleep` 接缝驱动有界重试循环,返回 `RepairOutcome`。纯标准库状态机;不导入 `PySide6`。与 `action_effect` + `postcondition` 完成自我修正三件套。 + +## 本次更新 (2026-06-24) — 声明式动作后置条件 + +以 JSON 规格断言动作的预期结果,并对照 before 帧做差异。完整参考:[`docs/source/Zh/doc/new_features/v169_features_doc.rst`](../docs/source/Zh/doc/new_features/v169_features_doc.rst)。 + +- **`check_postcondition` / `compile_postcondition`**(`AC_check_postcondition`):`expect_poll`/`assert_eventually` 轮询单一条件,没有与动作绑定的规格、也没有 before 基准(因此无法表达「一个*新*对话框出现了」);`trajectory_eval` 是整条轨迹层级。本功能对 after 观测评估一个小型 JSON 子句规格——`appears`/`disappears`(对照 `before`)、`enabled`/`disabled`、`text_present`/`text_absent`、`count`——返回逐子句的 `{ok, clauses, failed}` 报告。`compile_postcondition` 把规格转成 `after -> bool` 判定函数以供 `expect_poll` 使用。纯标准库;不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 边缘形状(Chamfer)模板匹配 + +以轮廓定位扁平图标,对填充 / 主题 / 抗锯齿稳健。完整参考:[`docs/source/Zh/doc/new_features/v168_features_doc.rst`](../docs/source/Zh/doc/new_features/v168_features_doc.rst)。 + +- **`edge_match` / `edge_match_all` / `chamfer_distance`**(`AC_edge_match`、`AC_edge_match_all`):强度 NCC(`visual_match`)在控件换填充 / 换主题时分数下降,ORB(`feature_match`)需要扁平图标缺乏的角点纹理。本功能以*边缘形状*匹配:对两图跑 Canny,对场景边缘做距离变换,把模板边缘滑过它并以平均边缘间距(Chamfer)评分。完美轮廓不论填充皆以约 0 成本对齐。重用 `visual_match` 的加载器 / resize / NMS / `Match` 与 `edge_lines` 的 
Canny 默认。`haystack` 可注入;不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 动作效果分类(我的点击有没有效果?) + +告诉代理点击有没有效果——以及是否发生在它瞄准之处。完整参考:[`docs/source/Zh/doc/new_features/v167_features_doc.rst`](../docs/source/Zh/doc/new_features/v167_features_doc.rst)。 + +- **`classify_effect` / `effect_near_point` / `is_no_op`**(`AC_classify_effect`、`AC_effect_near_point`):`screen_state`/`element_diff` 报告变了什么却不归因到动作;`loop_guard` 要重复 N 次才标记 no-op。本功能比对前后观测,并依动作目标点在*第一步*就分类结果为 `no_op` / `changed_near_target` / `changed_elsewhere`(意外对话框)/ `changed`,返回含变化中心与原因的 `EffectVerdict`。重用 `element_diff.match_elements` + `observation_delta` 的字段变更检查。纯标准库;不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 表单字段关联(多方向)+ 复选框状态 + +即使值在下方或右对齐也能把标签与值配对,并读取复选框状态。完整参考:[`docs/source/Zh/doc/new_features/v166_features_doc.rst`](../docs/source/Zh/doc/new_features/v166_features_doc.rst)。 + +- **`associate_fields` / `match_labels_to_widgets` / `checkbox_state`**(`AC_associate_fields`、`AC_match_labels_to_widgets`):`ocr/structure` 只把 `label:` 与*紧接的下一格*配对——无法处理标签在上、双列 key/value、右对齐值或非文字 widget,且无复选框概念。本功能把每个标签与多*方向*(右 / 下)中 `max_gap` 内最近的对齐值配对,把独立 widget(复选框 / 单选钮 / 输入框)配到最近标签,并由框内暗像素填充比例读取复选框状态。关联部分纯标准库;只有 `checkbox_state` 触及像素(隔离在 `visual_match` 灰阶加载器之后)。不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 留白投影列推断(无框线表格) + +靠留白间隙推断列来读取无框线表格。完整参考:[`docs/source/Zh/doc/new_features/v165_features_doc.rst`](../docs/source/Zh/doc/new_features/v165_features_doc.rst)。 + +- **`detect_borderless_table` / `column_gutters` / `assign_columns` / `vertical_projection`**(`AC_detect_borderless_table`、`AC_column_gutters`):`ocr/structure` 只有在每一行单元格左缘 x 都相符时才检测得到表格——对 ragged / 无框线 / 右对齐列都失败;`edge_lines.find_grid` 需要框线,而留白表格没有。本功能靠*间隙*找列:把 OCR 框投影到 x 轴,读出持续为空的垂直带作为 gutter,指派列索引,依间距分组成行,输出 `{n_rows, n_cols, rows, columns}`。纯标准库差分数组投影(不需 numpy);重用 `table_grid_fill` 的框读取器。不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 自动门槛模板匹配(对分数图做 Otsu) + +不再手调 
`min_score`——由分数图推导匹配门槛。完整参考:[`docs/source/Zh/doc/new_features/v164_features_doc.rst`](../docs/source/Zh/doc/new_features/v164_features_doc.rst)。 + +- **`match_auto` / `auto_threshold`**(`AC_match_auto`、`AC_auto_threshold`):每次 `match_template_all` 都迫使你猜 `min_score`(太低充满 NMS 噪声、太高漏掉换肤目标,且因素材而异)。本功能对*相关性分数直方图*套用 Otsu,找出背景相关与真正匹配之间的谷,返回该门槛加上 *separability* 分离度(接近 0 = 单峰、无明确匹配 → 不要信任)。`match_auto` 每个过门槛区域只返回单一峰(通过 `connected_boxes`,避免宽峰上的重复命中),并以 `floor` 夹住。重用新增的 `visual_match._score_map`;`haystack` 可注入;不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 令牌预算化的观测增量(变更了什么) + +告诉代理*变更了什么*,而非再次整个屏幕。完整参考:[`docs/source/Zh/doc/new_features/v163_features_doc.rst`](../docs/source/Zh/doc/new_features/v163_features_doc.rst)。 + +- **`delta_observation` / `delta_index` / `summarize_delta`**(`AC_delta_observation`):`serialize_observation` 渲染单一整帧(每回合都撑爆令牌预算);`element_diff` 提供稳定 ID 对应但止于 matched/added/removed 的元素配对。本功能正是缺少的序列化器——比对两帧,将配对元素分类为 changed(role/name/enabled/value/移动)或 stable,只渲染变动部分:`+ [i] role "name"` / `~ [i] … (fields)` / `- …`(added 与 changed 优先、stable 略去、上限 `max_lines`)。重用 `element_diff.match_elements` + `observation.observation_index`。纯标准库;不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 以 OCR 文字填入框线网格(可定址表格) + +把有框线表格的线条 + OCR 文字转成可定址的 `R x C` 表格。完整参考:[`docs/source/Zh/doc/new_features/v162_features_doc.rst`](../docs/source/Zh/doc/new_features/v162_features_doc.rst)。 + +- **`populate_table` / `assign_text_to_grid` / `table_to_records` / `table_to_csv`**(`AC_populate_table`):`edge_lines.find_grid` 能还原表格的框线几何但返回的单元格是*空的*;OCR 提供文字却无结构——两者从未串接。本功能把 OCR 框放入网格(依单元格中心指派,以重叠比例把关,使横跨细框线的框不被重复计入),将每个单元格的文字依阅读顺序串接,标记合并单元格的 span,并可直接转成 records / CSV。纯标准库,作用于纯字典——不需图像、OCR 引擎或设备。不导入 `PySide6`。 + +## 本次更新 (2026-06-24) — 信任评分模板匹配(歧义 / PSR) + +在点击前就知道某次模板匹配虽强但*有歧义*。完整参考:[`docs/source/Zh/doc/new_features/v161_features_doc.rst`](../docs/source/Zh/doc/new_features/v161_features_doc.rst)。 + +- **`match_with_trust` / `score_peaks`**(`AC_match_with_trust`):`match_template` 
只返回最高分并点击——但工具栏中重复的按钮或近乎相同的同类控件可能在两处都相关到 ~0.95,因此高分并非*无歧义*的匹配。本功能为像素模板加入 Lowe 式比值测试(ORB 通过 `feature_match` 已有,`match_template` 从未有):检视整个相关性曲面,比较全局峰值与排除窗口外的次高峰,计算峰值对旁瓣比(PSR),返回带有 `second_score` / `peak_ratio` / `psr` / `is_ambiguous` 的 `TrustedMatch`。重用新增的 `visual_match._score_map`(公开匹配器丢弃的完整 `matchTemplate` 曲面)——不重复任何匹配代码。`haystack` 可注入;不导入 `PySide6`。 + ## 本次更新 (2026-06-23) — 剪贴板文件拖放列表(CF_HDROP) 把一份文件列表放上剪贴板,可直接粘贴进 Explorer。完整参考:[`docs/source/Zh/doc/new_features/v160_features_doc.rst`](../docs/source/Zh/doc/new_features/v160_features_doc.rst)。 diff --git a/README/WHATS_NEW_zh-TW.md b/README/WHATS_NEW_zh-TW.md index 19e5bf8e..4fe1e509 100644 --- a/README/WHATS_NEW_zh-TW.md +++ b/README/WHATS_NEW_zh-TW.md @@ -1,5 +1,65 @@ # 本次更新 — AutoControl +## 本次更新 (2026-06-24) — 失敗 / 無效果動作的修復策略引擎 + +當動作沒效果時選擇下一個修復戰術——並驅動重試迴圈。完整參考:[`docs/source/Zh/doc/new_features/v170_features_doc.rst`](../docs/source/Zh/doc/new_features/v170_features_doc.rst)。 + +- **`plan_repair` / `next_tactic` / `run_with_repair`**(`AC_plan_repair`):`self_healing`/`locator_repair` 只修復*無法解析*的定位器;`loop_guard` 只*偵測*卡住迴圈而無戰術選擇。本功能消費效果判定(例如來自 `action_effect`)並回傳有序戰術——`wait_retry` / `relocate` / `nudge` / `scroll_into_view` / `escalate`——接著 `run_with_repair` 以注入的 `act` / `verify` / `apply_tactic` / `verdict_for` / `sleep` 接縫驅動有界重試迴圈,回傳 `RepairOutcome`。純標準函式庫狀態機;不匯入 `PySide6`。與 `action_effect` + `postcondition` 完成自我修正三件套。 + +## 本次更新 (2026-06-24) — 宣告式動作後置條件 + +以 JSON 規格斷言動作的預期結果,並對照 before 幀做差異。完整參考:[`docs/source/Zh/doc/new_features/v169_features_doc.rst`](../docs/source/Zh/doc/new_features/v169_features_doc.rst)。 + +- **`check_postcondition` / `compile_postcondition`**(`AC_check_postcondition`):`expect_poll`/`assert_eventually` 輪詢單一條件,沒有與動作綁定的規格、也沒有 before 基準(因此無法表達「一個*新*對話框出現了」);`trajectory_eval` 是整條軌跡層級。本功能對 after 觀測評估一個小型 JSON 子句規格——`appears`/`disappears`(對照 `before`)、`enabled`/`disabled`、`text_present`/`text_absent`、`count`——回傳逐子句的 `{ok, clauses, failed}` 報告。`compile_postcondition` 把規格轉成 `after -> bool` 判定函式以供 
`expect_poll` 使用。純標準函式庫;不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 邊緣形狀(Chamfer)樣板比對 + +以輪廓定位扁平圖示,對填充 / 主題 / 抗鋸齒穩健。完整參考:[`docs/source/Zh/doc/new_features/v168_features_doc.rst`](../docs/source/Zh/doc/new_features/v168_features_doc.rst)。 + +- **`edge_match` / `edge_match_all` / `chamfer_distance`**(`AC_edge_match`、`AC_edge_match_all`):強度 NCC(`visual_match`)在控制項換填充 / 換主題時分數下降,ORB(`feature_match`)需要扁平圖示缺乏的角點紋理。本功能以*邊緣形狀*比對:對兩圖跑 Canny,對場景邊緣做距離轉換,把樣板邊緣滑過它並以平均邊緣間距(Chamfer)評分。完美輪廓不論填充皆以約 0 成本對齊。重用 `visual_match` 的載入器 / resize / NMS / `Match` 與 `edge_lines` 的 Canny 預設。`haystack` 可注入;不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 動作效果分類(我的點擊有沒有效果?) + +告訴代理點擊有沒有效果——以及是否發生在它瞄準之處。完整參考:[`docs/source/Zh/doc/new_features/v167_features_doc.rst`](../docs/source/Zh/doc/new_features/v167_features_doc.rst)。 + +- **`classify_effect` / `effect_near_point` / `is_no_op`**(`AC_classify_effect`、`AC_effect_near_point`):`screen_state`/`element_diff` 回報變了什麼卻不歸因到動作;`loop_guard` 要重複 N 次才標記 no-op。本功能比對前後觀測,並依動作目標點在*第一步*就分類結果為 `no_op` / `changed_near_target` / `changed_elsewhere`(意外對話框)/ `changed`,回傳含變化中心與原因的 `EffectVerdict`。重用 `element_diff.match_elements` + `observation_delta` 的欄位變更檢查。純標準函式庫;不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 表單欄位關聯(多方向)+ 核取方塊狀態 + +即使值在下方或右對齊也能把標籤與值配對,並讀取核取方塊狀態。完整參考:[`docs/source/Zh/doc/new_features/v166_features_doc.rst`](../docs/source/Zh/doc/new_features/v166_features_doc.rst)。 + +- **`associate_fields` / `match_labels_to_widgets` / `checkbox_state`**(`AC_associate_fields`、`AC_match_labels_to_widgets`):`ocr/structure` 只把 `label:` 與*緊接的下一格*配對——無法處理標籤在上、雙欄 key/value、右對齊值或非文字 widget,且無核取方塊概念。本功能把每個標籤與多*方向*(右 / 下)中 `max_gap` 內最近的對齊值配對,把獨立 widget(核取方塊 / 單選鈕 / 輸入框)配到最近標籤,並由框內暗像素填充比例讀取核取方塊狀態。關聯部分純標準函式庫;只有 `checkbox_state` 觸及像素(隔離在 `visual_match` 灰階載入器之後)。不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 留白投影欄位偵測(無框線表格) + +靠留白間隙推導欄位來讀取無框線表格。完整參考:[`docs/source/Zh/doc/new_features/v165_features_doc.rst`](../docs/source/Zh/doc/new_features/v165_features_doc.rst)。 + +- **`detect_borderless_table` / 
`column_gutters` / `assign_columns` / `vertical_projection`**(`AC_detect_borderless_table`、`AC_column_gutters`):`ocr/structure` 只有在每一列儲存格左緣 x 都相符時才偵測得到表格——對 ragged / 無框線 / 右對齊欄都失敗;`edge_lines.find_grid` 需要框線,而留白表格沒有。本功能靠*間隙*找欄位:把 OCR 框投影到 x 軸,讀出持續為空的垂直帶作為 gutter,指派欄索引,依間距分群成列,輸出 `{n_rows, n_cols, rows, columns}`。純標準函式庫差分陣列投影(不需 numpy);重用 `table_grid_fill` 的框讀取器。不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 自動門檻樣板比對(對分數圖做 Otsu) + +不再手調 `min_score`——由分數圖推導比對門檻。完整參考:[`docs/source/Zh/doc/new_features/v164_features_doc.rst`](../docs/source/Zh/doc/new_features/v164_features_doc.rst)。 + +- **`match_auto` / `auto_threshold`**(`AC_match_auto`、`AC_auto_threshold`):每次 `match_template_all` 都迫使你猜 `min_score`(太低充滿 NMS 雜訊、太高漏掉換膚目標,且因素材而異)。本功能對*相關性分數直方圖*套用 Otsu,找出背景相關與真正匹配之間的谷,回傳該門檻加上 *separability* 分離度(接近 0 = 單峰、無明確匹配 → 不要信任)。`match_auto` 每個過門檻區域只回傳單一峰(透過 `connected_boxes`,避免寬峰上的重複命中),並以 `floor` 夾住。重用新增的 `visual_match._score_map`;`haystack` 可注入;不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 權杖預算化的觀測差異(變更了什麼) + +告訴代理*變更了什麼*,而非再次整個畫面。完整參考:[`docs/source/Zh/doc/new_features/v163_features_doc.rst`](../docs/source/Zh/doc/new_features/v163_features_doc.rst)。 + +- **`delta_observation` / `delta_index` / `summarize_delta`**(`AC_delta_observation`):`serialize_observation` 渲染單一整幀(每回合都撐爆權杖預算);`element_diff` 提供穩定 ID 對應但止於 matched/added/removed 的元素配對。本功能正是缺少的序列化器——比對兩幀,將配對元素分類為 changed(role/name/enabled/value/移動)或 stable,只渲染變動部分:`+ [i] role "name"` / `~ [i] … (fields)` / `- …`(added 與 changed 優先、stable 略去、上限 `max_lines`)。重用 `element_diff.match_elements` + `observation.observation_index`。純標準函式庫;不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 以 OCR 文字填入框線網格(可定址表格) + +把有框線表格的線條 + OCR 文字轉成可定址的 `R x C` 表格。完整參考:[`docs/source/Zh/doc/new_features/v162_features_doc.rst`](../docs/source/Zh/doc/new_features/v162_features_doc.rst)。 + +- **`populate_table` / `assign_text_to_grid` / `table_to_records` / `table_to_csv`**(`AC_populate_table`):`edge_lines.find_grid` 能還原表格的框線幾何但回傳的儲存格是*空的*;OCR 提供文字卻無結構——兩者從未串接。本功能把 OCR 
框放入網格(依儲存格中心指派,以重疊比例把關,使橫跨細框線的框不被重複計入),將每個儲存格的文字依閱讀順序串接,標記合併儲存格的 span,並可直接轉成 records / CSV。純標準函式庫,作用於純字典——不需影像、OCR 引擎或裝置。不匯入 `PySide6`。 + +## 本次更新 (2026-06-24) — 信任評分樣板比對(歧義 / PSR) + +在點擊前就知道某次樣板比對雖強但*有歧義*。完整參考:[`docs/source/Zh/doc/new_features/v161_features_doc.rst`](../docs/source/Zh/doc/new_features/v161_features_doc.rst)。 + +- **`match_with_trust` / `score_peaks`**(`AC_match_with_trust`):`match_template` 只回傳最高分並點擊——但工具列中重複的按鈕或近乎相同的同類元件可能在兩處都相關到 ~0.95,因此高分並非*無歧義*的比對。本功能為像素樣板加入 Lowe 式比值測試(ORB 透過 `feature_match` 已有,`match_template` 從未有):檢視整個相關性曲面,比較全域峰值與排除視窗外的次高峰,計算峰值對旁瓣比(PSR),回傳帶有 `second_score` / `peak_ratio` / `psr` / `is_ambiguous` 的 `TrustedMatch`。重用新增的 `visual_match._score_map`(公開比對器丟棄的完整 `matchTemplate` 曲面)——不重複任何比對程式。`haystack` 可注入;不匯入 `PySide6`。 + ## 本次更新 (2026-06-23) — 剪貼簿檔案拖放清單(CF_HDROP) 把一份檔案清單放上剪貼簿,可直接貼進 Explorer。完整參考:[`docs/source/Zh/doc/new_features/v160_features_doc.rst`](../docs/source/Zh/doc/new_features/v160_features_doc.rst)。 diff --git a/WHATS_NEW.md b/WHATS_NEW.md index a90d4ac7..d1b4ebd6 100644 --- a/WHATS_NEW.md +++ b/WHATS_NEW.md @@ -1,5 +1,65 @@ # What's New — AutoControl +## What's new (2026-06-24) — Repair-Tactic Policy for Failed / No-Effect Actions + +Pick the next repair tactic when an action does nothing — and drive the retry loop. Full reference: [`docs/source/Eng/doc/new_features/v170_features_doc.rst`](docs/source/Eng/doc/new_features/v170_features_doc.rst). + +- **`plan_repair` / `next_tactic` / `run_with_repair`** (`AC_plan_repair`): `self_healing`/`locator_repair` only fix a locator that *didn't resolve*; `loop_guard` only *detects* a stuck loop with no tactic selection. This consumes an effect verdict (e.g. from `action_effect`) and returns the ordered tactics to try — `wait_retry` / `relocate` / `nudge` / `scroll_into_view` / `escalate` — then `run_with_repair` drives a bounded retry loop with injected `act` / `verify` / `apply_tactic` / `verdict_for` / `sleep` seams, returning a `RepairOutcome`. 
Pure-stdlib state machine; no `PySide6`. Completes the self-correction trio with `action_effect` + `postcondition`. + +## What's new (2026-06-24) — Declarative Action Postconditions + +Assert an action's expected outcome as a JSON spec, diffed against the before-frame. Full reference: [`docs/source/Eng/doc/new_features/v169_features_doc.rst`](docs/source/Eng/doc/new_features/v169_features_doc.rst). + +- **`check_postcondition` / `compile_postcondition`** (`AC_check_postcondition`): `expect_poll`/`assert_eventually` poll a single condition with no action-bound spec and no before-baseline (so they can't express "a *new* dialog appeared"); `trajectory_eval` is whole-trajectory. This evaluates a small JSON spec of clauses — `appears`/`disappears` (diffed vs `before`), `enabled`/`disabled`, `text_present`/`text_absent`, `count` — against the after-observation, returning a per-clause `{ok, clauses, failed}` report. `compile_postcondition` turns a spec into an `after -> bool` predicate for `expect_poll`. Pure-stdlib; no `PySide6`. + +## What's new (2026-06-24) — Edge-Shape (Chamfer) Template Matching + +Locate flat icons by outline, robust to fill / theme / anti-aliasing. Full reference: [`docs/source/Eng/doc/new_features/v168_features_doc.rst`](docs/source/Eng/doc/new_features/v168_features_doc.rst). + +- **`edge_match` / `edge_match_all` / `chamfer_distance`** (`AC_edge_match`, `AC_edge_match_all`): intensity NCC (`visual_match`) drops when a control is re-filled / re-themed, and ORB (`feature_match`) needs corner texture flat-design glyphs lack. This matches by *edge shape*: Canny both images, distance-transform the scene edges, slide the template's edges over it and score by mean edge-to-edge distance (Chamfer). A perfect outline aligns at ~0 cost regardless of fill. Reuses `visual_match`'s loaders / resize / NMS / `Match` and `edge_lines`'s Canny default. Injectable `haystack`; no `PySide6`. 
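To make the Chamfer score in the edge-shape entry concrete, here is a minimal pure-Python sketch. It assumes the edge maps are already binary 2-D lists (the shipped `edge_match` runs Canny first), and it uses a brute-force distance map in place of a real distance transform such as OpenCV's `cv2.distanceTransform`. The names `distance_map`, `chamfer_cost` and `best_offset` are hypothetical illustrations, not the library's API.

```python
import math


def distance_map(edges):
    """Euclidean distance from every pixel to the nearest edge pixel (brute force)."""
    points = [(y, x) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    if not points:
        raise ValueError("scene has no edge pixels")
    return [[min(math.hypot(y - py, x - px) for py, px in points)
             for x in range(len(edges[0]))]
            for y in range(len(edges))]


def chamfer_cost(dist, template_edges, ox, oy):
    """Mean distance from the template's edge pixels, placed at (ox, oy), to the scene edges."""
    costs = [dist[oy + y][ox + x]
             for y, row in enumerate(template_edges)
             for x, v in enumerate(row) if v]
    return sum(costs) / len(costs)


def best_offset(scene_edges, template_edges):
    """Slide the template over every offset; return (cost, x, y) with the lowest cost."""
    dist = distance_map(scene_edges)
    th, tw = len(template_edges), len(template_edges[0])
    sh, sw = len(scene_edges), len(scene_edges[0])
    return min((chamfer_cost(dist, template_edges, x, y), x, y)
               for y in range(sh - th + 1)
               for x in range(sw - tw + 1))


# Scene edge map: a 3x3 square outline whose top-left corner is at x=2, y=1.
scene = [[0] * 6 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 5):
        if y in (1, 3) or x in (2, 4):
            scene[y][x] = 1

template = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]  # edge map of the glyph: outline only
result = best_offset(scene, template)          # (0.0, 2, 1)
```

Only edge pixels enter the cost, so a filled and a hollow version of the same glyph, which share the same outline, score identically. That is the fill/theme robustness the entry describes.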
+ +## What's new (2026-06-24) — Action-Effect Classification (Did My Click Do Anything?) + +Tell an agent whether a click did anything — and whether it happened where it aimed. Full reference: [`docs/source/Eng/doc/new_features/v167_features_doc.rst`](docs/source/Eng/doc/new_features/v167_features_doc.rst). + +- **`classify_effect` / `effect_near_point` / `is_no_op`** (`AC_classify_effect`, `AC_effect_near_point`): `screen_state`/`element_diff` report what changed but never tie it to the action; `loop_guard` only flags a no-op after N repeats. This diffs the before/after observation and, given the action's target point, classifies the result on the *first* step as `no_op` / `changed_near_target` / `changed_elsewhere` (a surprise dialog) / `changed`, returning an `EffectVerdict` with the changed centres and a reason. Reuses `element_diff.match_elements` + `observation_delta`'s field-change check. Pure-stdlib; no `PySide6`. + +## What's new (2026-06-24) — Form Field Association (Multi-Direction) + Checkbox State + +Pair form labels with values even when the value is below or right-aligned, and read checkbox state. Full reference: [`docs/source/Eng/doc/new_features/v166_features_doc.rst`](docs/source/Eng/doc/new_features/v166_features_doc.rst). + +- **`associate_fields` / `match_labels_to_widgets` / `checkbox_state`** (`AC_associate_fields`, `AC_match_labels_to_widgets`): `ocr/structure` only pairs a `label:` with the *immediately next* cell — it can't handle label-above-value, two-column key/value, right-aligned values, or non-text widgets, and has no checkbox notion. This pairs each label with the nearest aligned value across *directions* (right / below) within `max_gap`, matches free-standing widgets (checkbox/radio/input) to their nearest label, and reads checkbox state from the box's dark-pixel fill ratio. Association is pure-stdlib; only `checkbox_state` touches pixels (behind the `visual_match` gray loader). No `PySide6`. 
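The multi-direction pairing in the form-field entry can be sketched with plain box dicts. This sketch rests on several assumptions that are not the shipped `associate_fields` logic. Labels are boxes whose text ends with a colon. "Right" means the same row band. "Below" means horizontal overlap. The nearest gap within `max_gap` wins. The checkbox helper takes the box *interior* as a 2-D list of gray values. `pair_labels` and `checkbox_checked` are hypothetical names.

```python
def _right_gap(label, cand):
    """Gap to a candidate on the same row band to the right of the label, else None."""
    label_mid = label["y"] + label["height"] / 2
    cand_mid = cand["y"] + cand["height"] / 2
    gap = cand["x"] - (label["x"] + label["width"])
    return gap if abs(label_mid - cand_mid) <= label["height"] / 2 and gap >= 0 else None


def _below_gap(label, cand):
    """Gap to a candidate below the label that overlaps it horizontally, else None."""
    overlaps = (cand["x"] < label["x"] + label["width"]
                and label["x"] < cand["x"] + cand["width"])
    gap = cand["y"] - (label["y"] + label["height"])
    return gap if overlaps and gap >= 0 else None


def pair_labels(boxes, max_gap=40):
    """Pair each 'Label:' box with the nearest value to its right or below."""
    labels = [b for b in boxes if b["text"].endswith(":")]
    values = [b for b in boxes if not b["text"].endswith(":")]
    pairs = {}
    for label in labels:
        best = None
        for cand in values:
            for gap in (_right_gap(label, cand), _below_gap(label, cand)):
                if gap is not None and gap <= max_gap and (best is None or gap < best[0]):
                    best = (gap, cand["text"])
        if best is not None:
            pairs[label["text"].rstrip(":")] = best[1]
    return pairs


def checkbox_checked(interior, dark_below=128, min_fill=0.2):
    """Checked if enough of the box interior (2-D list of gray values) is dark."""
    dark = sum(1 for row in interior for v in row if v < dark_below)
    return dark / (len(interior) * len(interior[0])) >= min_fill


boxes = [
    {"x": 0, "y": 0, "width": 50, "height": 20, "text": "Name:"},
    {"x": 60, "y": 0, "width": 40, "height": 20, "text": "Ann"},    # value to the right
    {"x": 0, "y": 40, "width": 50, "height": 20, "text": "Total:"},
    {"x": 10, "y": 65, "width": 40, "height": 20, "text": "42"},    # value below
]
fields = pair_labels(boxes)  # {"Name": "Ann", "Total": "42"}
```

"42" sits 45 px below "Name:", which exceeds `max_gap`, so it pairs only with "Total:" (5 px away). The gap limit is what keeps a stacked form from cross-pairing.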
+ +## What's new (2026-06-24) — Whitespace-Projection Columns (Borderless Tables) + +Read borderless tables by inferring columns from the whitespace gaps. Full reference: [`docs/source/Eng/doc/new_features/v165_features_doc.rst`](docs/source/Eng/doc/new_features/v165_features_doc.rst). + +- **`detect_borderless_table` / `column_gutters` / `assign_columns` / `vertical_projection`** (`AC_detect_borderless_table`, `AC_column_gutters`): `ocr/structure` only detects a table when every row's cell-left-x matches — it fails on ragged / borderless / right-aligned columns; `edge_lines.find_grid` needs ruling lines a whitespace table doesn't have. This finds columns by the *gaps*: project OCR boxes onto the x-axis, read the persistent empty vertical bands as gutters, assign column indices, bucket rows by spacing, and emit `{n_rows, n_cols, rows, columns}`. Pure-stdlib difference-array projection (no numpy); reuses `table_grid_fill`'s box reader. No `PySide6`. + +## What's new (2026-06-24) — Auto-Thresholded Template Matching (Otsu on the Score Map) + +No more hand-tuned `min_score` — derive the match threshold from the score map. Full reference: [`docs/source/Eng/doc/new_features/v164_features_doc.rst`](docs/source/Eng/doc/new_features/v164_features_doc.rst). + +- **`match_auto` / `auto_threshold`** (`AC_match_auto`, `AC_auto_threshold`): every `match_template_all` call forces you to guess `min_score` (too low floods NMS, too high drops re-themed targets, and it differs per asset). This runs Otsu on the *correlation score histogram* to find the valley between background correlation and real matches, returns that cut-off plus a *separability* score (near 0 = unimodal, no clear match → don't trust it). `match_auto` returns one peak per above-threshold region (via `connected_boxes`, avoiding duplicate hits on a wide peak), clamped by a `floor`. Reuses the new `visual_match._score_map`; injectable `haystack`; no `PySide6`. 
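Here is a stdlib sketch of "Otsu on the correlation score histogram". It bins scores into [0, 1], picks the cut that maximises between-class variance, and reports separability as between-class over total variance, which is Otsu's own effectiveness measure. The source does not say whether the shipped `auto_threshold` uses this exact separability formula or bin count. Treat the number as illustrative. `otsu_on_scores` is a hypothetical name.

```python
def otsu_on_scores(scores, bins=64):
    """Otsu threshold over correlation scores; returns (threshold, separability)."""
    if not scores:
        return 0.0, 0.0
    hist = [0] * bins
    for s in scores:
        hist[min(max(int(s * bins), 0), bins - 1)] += 1  # clamp into the [0, 1) bins
    total = len(scores)
    centers = [(i + 0.5) / bins for i in range(bins)]
    mean_all = sum(h * c for h, c in zip(hist, centers)) / total
    var_all = sum(h * (c - mean_all) ** 2 for h, c in zip(hist, centers)) / total
    best_between, best_cut = 0.0, 0
    w0 = sum0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (mean_all * total - sum0) / w1
        between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2  # between-class variance
        if between > best_between:
            best_between, best_cut = between, i + 1
    separability = best_between / var_all if var_all > 0 else 0.0
    return best_cut / bins, separability


scores = [0.1] * 90 + [0.2] * 5 + [0.9] * 5  # mostly background, a few real hits
threshold, separability = otsu_on_scores(scores)
```

The cut lands in the valley between the background hump and the 0.9 hits. A perfectly flat score surface has zero total variance and yields `(0.0, 0.0)`, which is why a `floor` such as the one `match_auto` applies is still needed.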
+ +## What's new (2026-06-24) — Token-Budgeted Observation Delta (What Changed) + +Tell an agent *what changed* since the last step, not the whole screen again. Full reference: [`docs/source/Eng/doc/new_features/v163_features_doc.rst`](docs/source/Eng/doc/new_features/v163_features_doc.rst). + +- **`delta_observation` / `delta_index` / `summarize_delta`** (`AC_delta_observation`): `serialize_observation` renders one full frame (blows the token budget every turn); `element_diff` gives the stable-ID correspondence but stops at matched/added/removed element pairs. This is the missing serializer — it diffs two frames, classifies matched elements as changed (role/name/enabled/value/moved) or stable, and renders only the churn as `+ [i] role "name"` / `~ [i] … (fields)` / `- …` lines (added & changed first, stable dropped, capped at `max_lines`). Reuses `element_diff.match_elements` + `observation.observation_index`. Pure-stdlib; no `PySide6`. + +## What's new (2026-06-24) — Fill a Ruling-Line Grid With OCR Text (Addressable Tables) + +Turn a bordered table's lines + OCR words into an addressable `R x C` table. Full reference: [`docs/source/Eng/doc/new_features/v162_features_doc.rst`](docs/source/Eng/doc/new_features/v162_features_doc.rst). + +- **`populate_table` / `assign_text_to_grid` / `table_to_records` / `table_to_csv`** (`AC_populate_table`): `edge_lines.find_grid` recovers a table's ruling-line geometry but the cells come back *empty*; OCR gives the text but no structure — nothing joined them. This drops OCR boxes into the grid (assigned by cell-centre, gated by an overlap fraction so a box straddling a thin rule isn't double-counted), concatenates each cell's text in reading order, flags merged-cell spans, and converts straight to records / CSV. Pure-stdlib over plain dicts — no image, no OCR engine, no device. No `PySide6`. 
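Centre-based cell assignment for the grid-fill entry can be sketched as follows. This assumes the grid is `{rows: [y…], cols: [x…]}` ruling-line coordinates, as `find_grid` returns, and that boxes use `{x, y, width, height, text}`. The overlap gate here is an assumption: it is the fraction of the box's own area inside the chosen cell. The helpers `fill_grid` and `to_records` are hypothetical, not the shipped `assign_text_to_grid` / `table_to_records`.

```python
from bisect import bisect_right


def _band(lines, value):
    """Index of the band between consecutive ruling lines containing value, or None."""
    i = bisect_right(lines, value) - 1
    return i if 0 <= i < len(lines) - 1 else None


def fill_grid(grid, boxes, overlap=0.5):
    """Drop OCR boxes into a ruling-line grid by box centre; return a 2-D text table."""
    rows, cols = grid["rows"], grid["cols"]
    cells = [[[] for _ in range(len(cols) - 1)] for _ in range(len(rows) - 1)]
    for b in boxes:
        r = _band(rows, b["y"] + b["height"] / 2)
        c = _band(cols, b["x"] + b["width"] / 2)
        if r is None or c is None:
            continue  # centre falls outside the grid
        # Gate: require this fraction of the box's area to sit inside the chosen cell.
        ix = min(b["x"] + b["width"], cols[c + 1]) - max(b["x"], cols[c])
        iy = min(b["y"] + b["height"], rows[r + 1]) - max(b["y"], rows[r])
        if max(ix, 0) * max(iy, 0) < overlap * b["width"] * b["height"]:
            continue
        cells[r][c].append((b["y"], b["x"], b["text"]))
    # Join each cell's words sorted by top, then left (a simple reading order).
    return [[" ".join(text for _, _, text in sorted(cell)) for cell in row] for row in cells]


def to_records(table):
    """First row becomes the dict keys for every following row."""
    header, *body = table
    return [dict(zip(header, row)) for row in body]


grid = {"rows": [0, 30, 60], "cols": [0, 100, 200]}
boxes = [
    {"x": 10, "y": 5, "width": 40, "height": 20, "text": "Name"},
    {"x": 110, "y": 5, "width": 30, "height": 20, "text": "Age"},
    {"x": 10, "y": 35, "width": 30, "height": 20, "text": "Ann"},
    {"x": 45, "y": 35, "width": 30, "height": 20, "text": "Lee"},
    {"x": 110, "y": 35, "width": 20, "height": 20, "text": "30"},
]
table = fill_grid(grid, boxes)    # [["Name", "Age"], ["Ann Lee", "30"]]
records = to_records(table)       # [{"Name": "Ann Lee", "Age": "30"}]
```

Because each box goes to exactly one cell (the one holding its centre), a word can never be counted twice. Merged-cell span detection is left out of this sketch.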
+ +## What's new (2026-06-24) — Trust-Scored Template Matching (Ambiguity / PSR) + +Know when a template match is strong but *ambiguous* before clicking it. Full reference: [`docs/source/Eng/doc/new_features/v161_features_doc.rst`](docs/source/Eng/doc/new_features/v161_features_doc.rst). + +- **`match_with_trust` / `score_peaks`** (`AC_match_with_trust`): `match_template` returns only the top score and clicks it — but a button repeated in a toolbar or a near-identical sibling correlates ~0.95 in two places, so a high score is not an *unambiguous* match. This adds a Lowe-style ratio test *for pixel templates* (ORB got one via `feature_match`; `match_template` never did): it inspects the whole correlation surface, compares the global peak to the next-best peak outside an exclusion window, computes the peak-to-sidelobe ratio (PSR), and returns a `TrustedMatch` with `second_score` / `peak_ratio` / `psr` / `is_ambiguous`. Reuses a new `visual_match._score_map` (the full `matchTemplate` surface the public matchers discard) — no matching code duplicated. Injectable `haystack`; no `PySide6`. + ## What's new (2026-06-23) — Clipboard File-Drop List (CF_HDROP) Put a list of files on the clipboard, ready to paste into Explorer. Full reference: [`docs/source/Eng/doc/new_features/v160_features_doc.rst`](docs/source/Eng/doc/new_features/v160_features_doc.rst). 
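The peak ratio and PSR from the trust-scored template matching entry can be illustrated over a plain 2-D score surface. This sketch assumes a square exclusion window of `exclude` cells around the global peak. It uses the common PSR definition: peak minus sidelobe mean, divided by sidelobe standard deviation. The window size and the helper name `peak_metrics` are hypothetical. The result keys simply mirror the ones listed for `score_peaks`.

```python
import statistics


def peak_metrics(surface, exclude=2, ambiguous_ratio=0.9):
    """Best / second peak, peak ratio and PSR for a 2-D list of correlation scores."""
    cells = [(v, y, x) for y, row in enumerate(surface) for x, v in enumerate(row)]
    best, by, bx = max(cells)
    # Sidelobe: every score outside a square exclusion window around the global peak.
    sidelobe = [v for v, y, x in cells if abs(y - by) > exclude or abs(x - bx) > exclude]
    second = max(sidelobe) if sidelobe else None
    ratio = second / best if second is not None and best > 0 else 0.0
    std = statistics.pstdev(sidelobe) if len(sidelobe) > 1 else 0.0
    psr = (best - statistics.fmean(sidelobe)) / std if std > 0 else None
    return {"best": best, "second": second, "peak_ratio": ratio, "psr": psr,
            "ambiguous": ratio >= ambiguous_ratio, "location": (bx, by)}


surface = [[0.1] * 8 for _ in range(8)]
surface[1][1] = 0.95          # the intended button
surface[6][6] = 0.94          # a near-identical sibling elsewhere
twin = peak_metrics(surface)  # ambiguous: 0.94 / 0.95 >= 0.9

surface[6][6] = 0.1           # remove the sibling: the sidelobe is now perfectly flat
single = peak_metrics(surface)
```

With the sibling removed, the sidelobe has zero spread, so `psr` is `None`. This matches the documented behaviour for a perfectly flat sidelobe.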
diff --git a/docs/source/Eng/doc/new_features/v161_features_doc.rst b/docs/source/Eng/doc/new_features/v161_features_doc.rst new file mode 100644 index 00000000..7237337f --- /dev/null +++ b/docs/source/Eng/doc/new_features/v161_features_doc.rst @@ -0,0 +1,46 @@ +Trust-Scored Template Matching (Ambiguity / PSR) +================================================ + +``match_template`` returns only the single best score, and scripts tend to click it; but a control +repeated in a toolbar, or a near-identical sibling, correlates ~0.95 in *two* places, so a +high score does **not** mean an *unambiguous* match, and a script can confidently click +the wrong one. ``match_with_trust`` adds a Lowe-style ratio test *for pixel templates* +(``feature_match`` already does this for ORB keypoints, but nothing did it for +``match_template``): it inspects the whole correlation surface, compares the global peak to +the next-best peak outside an exclusion window, and computes the peak-to-sidelobe ratio +(PSR), flagging matches that are strong-but-ambiguous. + +It reuses ``visual_match._score_map`` (the full ``matchTemplate`` surface the public matchers +discard), so no matching code is duplicated. The ``haystack`` is injectable (ndarray / path / +PIL); the analysis is unit-testable on synthetic arrays. Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import click_mouse, match_with_trust, score_peaks + + hit = match_with_trust("save_button.png", min_score=0.8) + if hit and not hit.is_ambiguous: + click_mouse("mouse_left", *hit.center) + elif hit: + print("ambiguous!", hit.peak_ratio, "second:", hit.second_score) + + # just the metrics, no match object + print(score_peaks("icon.png")) # {best, second, peak_ratio, psr, ambiguous, location} + +``match_with_trust`` returns a ``TrustedMatch`` (``x`` / ``y`` / ``width`` / ``height`` / +``score`` / ``scale`` / ``second_score`` / ``peak_ratio`` / ``psr`` / ``is_ambiguous`` + +``center``) or ``None``. 
``is_ambiguous`` is set when the next-best peak scores at least +``ambiguous_ratio`` (default 0.9) times the best. ``psr`` is the peak-to-sidelobe ratio +(``None`` when the sidelobe is perfectly flat). ``score_peaks`` returns just the metric dict +at scale 1.0. + +Executor command +---------------- + +``AC_match_with_trust`` (``template`` / ``min_score`` / ``scales`` / ``ambiguous_ratio`` / +``region`` / ``method`` → ``{found, match}``) is exposed as the MCP tool +``ac_match_with_trust`` (read-only) and as the Script Builder command **Match Template +(trust-scored)** under **Image**. diff --git a/docs/source/Eng/doc/new_features/v162_features_doc.rst b/docs/source/Eng/doc/new_features/v162_features_doc.rst new file mode 100644 index 00000000..1ed11fc6 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v162_features_doc.rst @@ -0,0 +1,46 @@ +Fill a Ruling-Line Grid With OCR Text (Addressable Tables) +========================================================== + +``edge_lines.find_grid`` recovers a bordered table's geometry — ``{rows: [y…], cols: [x…], +cells: […]}`` — but the cells come back *empty* (just rectangles between the ruling lines). +OCR gives the text but no table structure. Nothing joined the two, so reading a bordered +table meant hand-rolling the box→cell assignment. ``table_grid_fill`` drops OCR text boxes +into the grid and returns an addressable ``R x C`` table. + +Each box is assigned to the cell its centre falls in (gated by an overlap fraction so a box +straddling a thin rule is not double counted); text within a cell is concatenated in reading +order; boxes that straddle multiple cells are reported as merged-cell candidates. The result +converts straight to records or CSV. + +Pure-stdlib geometry over plain dicts (the grid + the boxes) — no image, no OCR engine, no +device. Imports no ``PySide6``. + +Headless API +------------ + +.. 
code-block:: python + + from je_auto_control import (find_grid, find_text_lines, # producers + populate_table, assign_text_to_grid, + table_to_records, table_to_csv) + + grid = find_grid(region=[0, 0, 800, 400]) # ruling-line geometry + boxes = [{"x": 10, "y": 5, "width": 60, "height": 20, "text": "Name"}, ...] + + table = assign_text_to_grid(grid, boxes) # [["Name","Age"], ["Ann","30"]] + records = table_to_records(table) # [{"Name": "Ann", "Age": "30"}] + csv_text = table_to_csv(table) + + full = populate_table(grid, boxes) # {n_rows, n_cols, cells, spans} + +``assign_text_to_grid`` returns the 2-D text table; ``populate_table`` returns the richer +``{n_rows, n_cols, cells:[{row, col, text}], spans:[{row, col, row_span, col_span, text}]}``. +``table_to_records`` uses the first row as headers; ``table_to_csv`` renders CSV. Boxes accept +either ``{x, y, width, height}`` or ``{left, top, right, bottom}`` plus a ``text`` field. + +Executor command +---------------- + +``AC_populate_table`` (``grid`` / ``text_boxes`` / ``overlap`` → ``{n_rows, n_cols, cells, +spans}``) is exposed as the MCP tool ``ac_populate_table`` (read-only) and as the Script +Builder command **Fill Table From Grid + OCR** under **OCR**. diff --git a/docs/source/Eng/doc/new_features/v163_features_doc.rst b/docs/source/Eng/doc/new_features/v163_features_doc.rst new file mode 100644 index 00000000..88a035b3 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v163_features_doc.rst @@ -0,0 +1,45 @@ +Token-Budgeted Observation Delta (What Changed) +=============================================== + +``observation.serialize_observation`` renders *one full frame* of the UI — feeding it to a +model every turn blows the very token budget that module was built to respect, and forces the +model to re-read the whole screen just to spot the one new dialog. 
``element_diff`` gives the +stable-ID correspondence between two frames but stops at matched / added / removed *element +pairs* — it does not render a compact, indexed, budget-capped delta the model can act on. + +``observation_delta`` is the missing serializer: it diffs the previous and current observation, +classifies each matched element as *changed* (role / name / enabled / value / moved) or +*stable*, and renders only the churn — ``+ [i] role "name"`` (appeared) / ``- role "name"`` +(vanished) / ``~ [i] role "name" (fields)`` (changed) — added and changed first, stable dropped, +capped at ``max_lines``. The model sees *what changed* instead of the whole screen again. + +Pure-stdlib over element dicts; reuses ``element_diff.match_elements`` for the overlap join and +``observation.observation_index`` for reading-order indexing. Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import delta_observation, delta_index, summarize_delta + + summary = delta_observation(prev_elements, curr_elements, max_lines=40) + # + [12] dialog "Saved" + # ~ [4] button "Submit" (enabled) + # - button "Spinner" + + delta = delta_index(prev_elements, curr_elements) # {added, removed, changed, stable} + text = summarize_delta(delta, max_lines=20) + +``delta_index`` returns ``{added, removed, changed, stable}`` (``changed`` items are +``{"after", "fields"}``); ``summarize_delta`` renders a ``delta_index`` result as budget-capped +``+`` / ``~`` / ``-`` lines; ``delta_observation`` indexes both frames (reading order, viewport +clip, interactive-only) then diffs and renders in one call. 
+ +Executor command +---------------- + +``AC_delta_observation`` (``prev`` / ``curr`` / ``viewport`` / ``max_elements`` / ``max_lines`` +/ ``interactive_only`` → ``{summary, added, removed, changed}``) is exposed as the MCP tool +``ac_delta_observation`` (read-only) and as the Script Builder command **Observation: Delta +(what changed)** under **Native UI**. diff --git a/docs/source/Eng/doc/new_features/v164_features_doc.rst b/docs/source/Eng/doc/new_features/v164_features_doc.rst new file mode 100644 index 00000000..9138c109 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v164_features_doc.rst @@ -0,0 +1,47 @@ +Auto-Thresholded Template Matching (Otsu on the Score Map) +========================================================== + +Every call to ``match_template_all`` forces the caller to guess ``min_score``: too low +floods NMS with background noise, too high drops scaled / re-themed targets, and the right +value differs per asset and per screen. ``match_autothresh`` removes the magic number — it +runs Otsu's method on the *correlation score histogram* (not pixel intensities, the way +``preprocess.binarize`` does) to find the valley between the "background correlation" mass +and the "real match" mass, and returns that cut-off plus a *separability* number so the +caller knows when the histogram is unimodal (no clear match → don't trust the threshold). + +It reuses ``visual_match._score_map`` (the full ``matchTemplate`` surface the public matchers +discard) and ``cv2_utils.blobs.connected_boxes`` — each above-threshold region yields a single +peak, avoiding the duplicate hits a raw pixel scan + NMS leaves on a wide correlation peak. The +``haystack`` is injectable; the analysis is unit-testable on synthetic arrays. Imports no +``PySide6``. + +Headless API +------------ + +.. 
code-block:: python + + from je_auto_control import match_auto, auto_threshold + + # no min_score to tune — the threshold is derived from the score map + for hit in match_auto("save_button.png", floor=0.5): + print(hit.center, hit.score) + + info = auto_threshold("save_button.png") + # {"threshold": 0.83, "separability": 0.61, "n_above": 2} + if info["separability"] < 0.3: + print("no clear match — threshold not trustworthy") + +``match_auto`` returns one ``Match`` per above-threshold region, ordered by score and capped at +``max_results``; the cut-off is ``max(floor, otsu_threshold)`` so a unimodal / noisy surface +cannot drag it below a sane floor. ``auto_threshold`` returns ``{threshold, separability, +n_above}`` — ``separability`` near 0 means the score histogram is unimodal and the threshold +should be treated as unreliable. + +Executor commands +----------------- + +``AC_match_auto`` (``template`` / ``floor`` / ``max_results`` / ``region`` / ``method`` → +``{count, matches}``) and ``AC_auto_threshold`` (``template`` / ``region`` / ``method`` → +``{found, info}``). They are exposed as the MCP tools ``ac_match_auto`` / ``ac_auto_threshold`` +(read-only) and as the Script Builder commands **Match Template (auto-threshold)** / **Auto +Threshold (Otsu on scores)** under **Image**. diff --git a/docs/source/Eng/doc/new_features/v165_features_doc.rst b/docs/source/Eng/doc/new_features/v165_features_doc.rst new file mode 100644 index 00000000..ac16a688 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v165_features_doc.rst @@ -0,0 +1,46 @@ +Whitespace-Projection Columns (Borderless Tables) +================================================= + +``ocr/structure`` detects tables only when *every* row's cell-left-x matches within a +tolerance — it collapses on ragged or borderless tables, right-aligned numeric columns, or +any row with a missing cell. ``edge_lines.find_grid`` needs ruling lines, so a table drawn +purely with whitespace has no grid at all. 
``column_layout`` finds columns the robust way the +layout-analysis literature uses: by the *gaps*. It projects the OCR boxes onto the x-axis (an +ink-density profile), reads off the persistent empty vertical bands as column gutters, assigns +each box a column index, and buckets rows by vertical spacing to emit a borderless table. + +Pure-stdlib over plain box dicts (a difference-array projection — no numpy), so it is fully +unit-testable with no image and no OCR engine. Reuses ``table_grid_fill``'s box-bounds reader. +Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import (detect_borderless_table, column_gutters, + assign_columns, vertical_projection) + + table = detect_borderless_table(ocr_boxes) + # {"n_rows": 3, "n_cols": 2, "rows": [["Name","Age"],["Ann","30"],["Bob","25"]], + # "columns": [{"start": 70, "end": 120, "width": 50}]} + + gutters = column_gutters(ocr_boxes, min_gap=8) # empty vertical bands + tagged = assign_columns(ocr_boxes) # each box + "column" index + profile = vertical_projection(ocr_boxes) # ink density per x + +``vertical_projection`` returns the per-x ink-density profile; ``column_gutters`` returns the +interior empty bands ``[{start, end, width}]`` at least ``min_gap`` wide; ``assign_columns`` +tags every box with a 0-based ``column``; ``detect_borderless_table`` combines columns (from +gutters) with rows (from vertical spacing) into ``{n_rows, n_cols, rows, columns}``, or +``None`` when fewer than ``min_cols`` columns / ``min_rows`` rows are found. Boxes accept +``{x, y, width, height}`` or ``{left, top, right, bottom}`` plus an optional ``text``. + +Executor commands +----------------- + +``AC_detect_borderless_table`` (``boxes`` / ``page_width`` / ``min_gap`` / ``min_cols`` / +``min_rows`` → ``{found, table}``) and ``AC_column_gutters`` (``boxes`` / ``page_width`` / +``min_gap`` → ``{count, gutters}``). 
They are exposed as the MCP tools +``ac_detect_borderless_table`` / ``ac_column_gutters`` (read-only) and as the Script Builder +commands **Detect Borderless Table** / **Column Gutters (whitespace)** under **OCR**. diff --git a/docs/source/Eng/doc/new_features/v166_features_doc.rst b/docs/source/Eng/doc/new_features/v166_features_doc.rst new file mode 100644 index 00000000..c35bce63 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v166_features_doc.rst @@ -0,0 +1,47 @@ +Form Field Association (Multi-Direction) + Checkbox State +========================================================= + +``ocr/structure`` recognises a label only if its text ends in ``:`` and pairs it with the +*immediately next* cell — it cannot handle a label sitting *above* its value, a two-column +key/value layout, right-aligned values, or any widget that isn't a text cell, and it has no +notion of checkbox / radio state at all. ``form_fields`` generalises this: it pairs each +label with the nearest aligned value in any of several *directions* (right, below), matches +free-standing widgets (checkboxes, radios, inputs) to their nearest label, and reads a +checkbox's checked state from its fill ratio. + +The association is pure-stdlib over plain box dicts (fully unit-testable, no image); only +``checkbox_state`` touches pixels, isolated behind the shared ``visual_match`` gray loader so +tests can pass a synthetic array. Reuses ``table_grid_fill``'s box-bounds reader. Imports no +``PySide6``. + +Headless API +------------ + +.. 
code-block:: python + + from je_auto_control import (associate_fields, match_labels_to_widgets, + checkbox_state) + + fields = associate_fields(ocr_boxes, directions=("right", "below")) + # [{"label": "Name", "value": "Ann", "direction": "right", "gap": 20, ...}] + + pairs = match_labels_to_widgets(label_boxes, checkbox_boxes) + # [{"widget": {...}, "label": "Accept terms", "distance": 22}] + + state = checkbox_state(screenshot, checkbox_box) # "checked" | "unchecked" + +``associate_fields`` treats boxes whose text ends in ``:`` as labels and pairs each with the +nearest value box in the requested directions (within ``max_gap``), returning +``{label, value, direction, gap, label_box, value_box}``. ``match_labels_to_widgets`` assigns +each widget to its nearest label by centre distance. ``checkbox_state`` returns +``"checked"`` / ``"unchecked"`` from the dark-pixel fill ratio of the box (image is injectable +— path / ndarray / PIL). + +Executor commands +----------------- + +``AC_associate_fields`` (``text_boxes`` / ``directions`` / ``max_gap`` → ``{count, fields}``) +and ``AC_match_labels_to_widgets`` (``labels`` / ``widgets`` → ``{count, pairs}``). They are +exposed as the MCP tools ``ac_associate_fields`` / ``ac_match_labels_to_widgets`` (read-only) +and as the Script Builder commands **Associate Form Fields** / **Match Labels To Widgets** +under **OCR**. diff --git a/docs/source/Eng/doc/new_features/v167_features_doc.rst b/docs/source/Eng/doc/new_features/v167_features_doc.rst new file mode 100644 index 00000000..8939ab53 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v167_features_doc.rst @@ -0,0 +1,49 @@ +Action-Effect Classification (Did My Click Do Anything?) +======================================================== + +After an agent clicks, the crucial question is "did that do anything, and was it the *right* +thing?" — but nothing answered it on the *first* step. 
``screen_state.diff_snapshots`` and +``element_diff`` report what changed but never tie the change back to the action; +``loop_guard`` only flags a no-op after the same digest repeats N times (so the agent loops +2–8 times first); ``actionability`` is purely a *pre*-action gate. ``action_effect`` closes the +loop: it diffs the before/after observation and, given the action's target point, classifies +the result so an agent can react immediately. + +The verdict is one of ``no_op`` (nothing changed), ``changed_near_target`` (the change happened +where we acted — a button depressed), ``changed_elsewhere`` (a surprise dialog popped somewhere +else), or ``changed`` (something changed but the action carried no point to attribute to). + +Pure-stdlib over element dicts + the action record; reuses ``element_diff.match_elements`` for +the overlap join and ``observation_delta``'s field-change check. Fully deterministic and +unit-testable with no device. Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import classify_effect, effect_near_point, is_no_op + + verdict = classify_effect(before_elements, after_elements, + {"type": "click", "x": 480, "y": 260}) + if verdict.effect == "no_op": + retry_or_repair() + elif verdict.effect == "changed_elsewhere": + handle_unexpected_dialog() + + if is_no_op(before_elements, after_elements): + ... + +``classify_effect`` returns an ``EffectVerdict`` (``effect`` / ``changed_near_target`` / +``changed_count`` / ``changed_centers`` / ``reason``). ``effect_near_point`` answers whether any +change landed within ``radius`` of an arbitrary point; ``is_no_op`` is the boolean shortcut. + +Executor commands +----------------- + +``AC_classify_effect`` (``before`` / ``after`` / ``action`` / ``radius`` → +``{effect, changed_near_target, changed_count, changed_centers, reason}``) and +``AC_effect_near_point`` (``before`` / ``after`` / ``point`` / ``radius`` → ``{near}``). 
They +are exposed as the MCP tools ``ac_classify_effect`` / ``ac_effect_near_point`` (read-only) and +as the Script Builder commands **Classify Action Effect** / **Effect Near Point?** under +**Native UI**. diff --git a/docs/source/Eng/doc/new_features/v168_features_doc.rst b/docs/source/Eng/doc/new_features/v168_features_doc.rst new file mode 100644 index 00000000..25be2ca8 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v168_features_doc.rst @@ -0,0 +1,47 @@ +Edge-Shape (Chamfer) Template Matching +====================================== + +Intensity correlation (``visual_match``) is dragged down when the same control is rendered +with a different fill, gradient, theme or anti-aliasing, and ORB feature matching +(``feature_match``) needs corner texture that flat-design glyphs — a hamburger menu, a plain +chevron — simply do not have. ``edge_match`` locates a template by its *edge shape* instead: +it runs Canny on both images, builds a distance transform of the scene edges, and slides the +template's edges over it, scoring each position by the mean distance from a template edge to +the nearest scene edge (Chamfer matching). A perfect alignment costs ~0 regardless of how the +shape is filled or shaded. + +It reuses ``visual_match``'s gray loaders / resize / NMS / ``Match`` and ``edge_lines``'s Canny +default, so no matching or geometry code is duplicated. The ``haystack`` is injectable +(ndarray / path / PIL); the search is unit-testable on synthetic arrays. Imports no +``PySide6``. + +Headless API +------------ + +.. 
code-block:: python + + from je_auto_control import edge_match, edge_match_all, chamfer_distance + + # find a flat icon regardless of its fill colour / theme + hit = edge_match("chevron.png", min_score=0.7) + if hit: + click(*hit.center) + + for m in edge_match_all("divider_handle.png", min_score=0.8): + print(m.center, m.score) + + print(chamfer_distance("logo_outline.png")) # 0 = edges coincide + +``edge_match`` returns the best ``Match`` (score = ``1 / (1 + mean edge distance)``, so 1.0 is +a perfect outline match) over the requested ``scales``, or ``None``. ``edge_match_all`` returns +every match at or above ``min_score`` with overlaps removed by NMS. ``chamfer_distance`` returns +the mean edge-to-edge distance at the best alignment (0 = the outlines coincide). + +Executor commands +----------------- + +``AC_edge_match`` (``template`` / ``min_score`` / ``scales`` / ``region`` → +``{found, match}``) and ``AC_edge_match_all`` (adds ``max_results`` / ``nms_iou`` → +``{count, matches}``). They are exposed as the MCP tools ``ac_edge_match`` / +``ac_edge_match_all`` (read-only) and as the Script Builder commands **Match Template (edge +shape)** / **Match Template All (edge shape)** under **Image**. diff --git a/docs/source/Eng/doc/new_features/v169_features_doc.rst b/docs/source/Eng/doc/new_features/v169_features_doc.rst new file mode 100644 index 00000000..86c98f0c --- /dev/null +++ b/docs/source/Eng/doc/new_features/v169_features_doc.rst @@ -0,0 +1,45 @@ +Declarative Action Postconditions +================================= + +After an action an agent (or a replay harness) usually has a concrete expectation: "a dialog +saying 'Saved' should appear AND the Submit button should disable". ``expect_poll`` / +``assert_eventually`` poll a *single condition* but have no notion of an action-bound +*postcondition spec*, and they don't diff against a *before* baseline (so they cannot express +"a NEW dialog appeared" — only "a dialog exists"). 
``trajectory_eval`` rubrics are +whole-trajectory, not per-step screen state. ``postcondition`` fills the gap: a small JSON spec +of clauses evaluated against the after-observation (optionally diffed against the +before-observation), returning a per-clause pass/fail report. + +Clauses: ``appears`` / ``disappears`` (diffed against ``before``), ``enabled`` / ``disabled``, +``text_present`` / ``text_absent``, and ``count`` (``equals`` / ``min``). Pure-stdlib over +element dicts; the spec is plain JSON so it rides into action files / MCP / the scheduler. +Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import check_postcondition, compile_postcondition + + spec = {"appears": {"role": "dialog", "name": "Saved"}, + "disabled": {"name": "Submit"}} + report = check_postcondition(after_elements, spec, before=before_elements) + if not report.ok: + print("failed clauses:", report.failed) + + # turn a spec into a predicate to drive expect_poll + predicate = compile_postcondition({"text_present": "Saved"}) + +``check_postcondition`` returns a ``PostconditionReport`` (``ok`` / ``clauses`` — +``[{type, ok, detail}]`` — / ``failed``). ``appears`` succeeds only when the element is in +``after`` and *not* in ``before`` (a genuinely new element); ``disappears`` requires a +``before`` frame. ``compile_postcondition`` returns an ``after -> bool`` predicate for pairing +with ``expect_poll`` / ``assert_eventually``. + +Executor command +---------------- + +``AC_check_postcondition`` (``after`` / ``spec`` / ``before`` → +``{ok, clauses, failed}``) is exposed as the MCP tool ``ac_check_postcondition`` (read-only) +and as the Script Builder command **Check Postcondition** under **Native UI**. 
diff --git a/docs/source/Eng/doc/new_features/v170_features_doc.rst b/docs/source/Eng/doc/new_features/v170_features_doc.rst new file mode 100644 index 00000000..90bd65ff --- /dev/null +++ b/docs/source/Eng/doc/new_features/v170_features_doc.rst @@ -0,0 +1,50 @@ +Repair-Tactic Policy for Failed / No-Effect Actions +=================================================== + +When an action does nothing or lands in the wrong place, the agent needs a *policy* for what to try next — +re-locate and retry, nudge the coordinate, scroll the target into view, wait and retry, or give +up and escalate. ``self_healing`` / ``locator_repair`` only repair a locator that *did not +resolve* (element not found); they do nothing when the element was found and clicked but had no +effect. ``loop_guard`` only *detects* a stuck loop — it has no tactic selection or backoff. +``step_repair`` is that missing controller: it consumes an effect verdict (e.g. from +``action_effect``) and drives a bounded retry loop, choosing the next untried tactic each round. + +Pure-stdlib state machine; every side effect — performing the action, verifying it, applying a +tactic, sleeping — is an injected callable, so the loop is fully deterministic and +unit-testable with no device. Imports no ``PySide6``. + +Headless API +------------ + +.. code-block:: python + + from je_auto_control import (plan_repair, run_with_repair, RepairPolicy, + classify_effect, is_no_op) + + # just the plan + plan_repair("no_op") # ['wait_retry', 'relocate', 'nudge'] + plan_repair("changed_elsewhere") # ['escalate'] + + # drive the loop with injected seams (click / target / before / after / apply / action are yours) + outcome = run_with_repair( + act=lambda: click(*target), + verify=lambda: not is_no_op(before(), after()), + apply_tactic=apply, # e.g.
relocate / nudge the target + verdict_for=lambda: classify_effect(before(), after(), action).effect, + policy=RepairPolicy(max_attempts=3)) + print(outcome.ok, outcome.tactics_used) + +``plan_repair`` returns the ordered tactics for a verdict (a string like ``no_op`` / +``changed_elsewhere`` or an ``EffectVerdict`` dict), capped at ``max_attempts``; +``next_tactic`` returns the next untried one. ``run_with_repair`` runs ``act`` then ``verify``; +on failure it applies tactics until success or exhaustion, returning a ``RepairOutcome`` +(``ok`` / ``attempts`` / ``tactics_used`` / ``detail``). ``RepairPolicy`` caps attempts and +lists the allowed tactics. + +Executor command +---------------- + +``AC_plan_repair`` (``verdict`` / ``max_attempts`` → ``{count, tactics}``) is exposed as the +MCP tool ``ac_plan_repair`` (read-only) and as the Script Builder command **Plan Repair +Tactics** under **Native UI**. (The live ``run_with_repair`` loop is driven from Python, since +it takes injected callables.) diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst index 57929869..e46aa880 100644 --- a/docs/source/Eng/eng_index.rst +++ b/docs/source/Eng/eng_index.rst @@ -183,6 +183,16 @@ Comprehensive guides for all AutoControl features. 
doc/new_features/v158_features_doc doc/new_features/v159_features_doc doc/new_features/v160_features_doc + doc/new_features/v161_features_doc + doc/new_features/v162_features_doc + doc/new_features/v163_features_doc + doc/new_features/v164_features_doc + doc/new_features/v165_features_doc + doc/new_features/v166_features_doc + doc/new_features/v167_features_doc + doc/new_features/v168_features_doc + doc/new_features/v169_features_doc + doc/new_features/v170_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/docs/source/Zh/doc/new_features/v161_features_doc.rst b/docs/source/Zh/doc/new_features/v161_features_doc.rst new file mode 100644 index 00000000..f24596f2 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v161_features_doc.rst @@ -0,0 +1,41 @@ +信任評分樣板比對(歧義 / PSR) +============================== + +``match_template`` 只回傳最佳分數並直接點擊——但工具列中重複出現的控制項,或近乎相同的 +同類元件,可能在*兩處*都相關到 ~0.95,因此高分**不代表**比對*無歧義*,比對器可能自信地 +點錯目標。``match_with_trust`` 為像素樣板加入 Lowe 式比值測試(``feature_match`` 已對 ORB +關鍵點這麼做,但 ``match_template`` 從未如此):它檢視整個相關性曲面,比較全域峰值與排除 +視窗外的次高峰,並計算峰值對旁瓣比(PSR),標記出強但有歧義的比對。 + +本功能重用 ``visual_match._score_map``——即公開比對器丟棄的完整 ``matchTemplate`` 曲面 +——因此不重複任何比對程式。``haystack`` 可注入(ndarray / 路徑 / PIL);分析可在合成陣列上 +單元測試。不匯入 ``PySide6``。 + +無頭 API +-------- + +.. 
code-block:: python + + from je_auto_control import match_with_trust, score_peaks + + hit = match_with_trust("save_button.png", min_score=0.8) + if hit and not hit.is_ambiguous: + click(*hit.center) + elif hit: + print("有歧義!", hit.peak_ratio, "次高:", hit.second_score) + + # 只要指標,不要 match 物件 + print(score_peaks("icon.png")) # {best, second, peak_ratio, psr, ambiguous, location} + +``match_with_trust`` 回傳 ``TrustedMatch``(``x`` / ``y`` / ``width`` / ``height`` / +``score`` / ``scale`` / ``second_score`` / ``peak_ratio`` / ``psr`` / ``is_ambiguous`` + +``center``)或 ``None``。當次高峰至少達到最佳值的 ``ambiguous_ratio`` 倍(預設 0.9)時, +``is_ambiguous`` 為真。``psr`` 為峰值對旁瓣比(旁瓣完全平坦時為 ``None``)。``score_peaks`` +僅回傳縮放 1.0 時的指標字典。 + +執行器指令 +---------- + +``AC_match_with_trust``(``template`` / ``min_score`` / ``scales`` / ``ambiguous_ratio`` / +``region`` / ``method`` → ``{found, match}``)以 MCP 工具 ``ac_match_with_trust``(唯讀)及 +Script Builder 指令 **Match Template (trust-scored)**(位於 **Image** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v162_features_doc.rst b/docs/source/Zh/doc/new_features/v162_features_doc.rst new file mode 100644 index 00000000..4feb92a7 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v162_features_doc.rst @@ -0,0 +1,44 @@ +以 OCR 文字填入框線網格(可定址表格) +====================================== + +``edge_lines.find_grid`` 能還原有框線表格的幾何——``{rows: [y…], cols: [x…], cells: […]}`` +——但回傳的儲存格是*空的*(僅是框線之間的矩形)。OCR 提供文字卻無表格結構。兩者從未串接, +因此讀取有框線表格只能自行撰寫 box→cell 的指派。``table_grid_fill`` 把 OCR 文字框放入網格, +回傳可定址的 ``R x C`` 表格。 + +每個框依其中心落在哪個儲存格而被指派(以重疊比例把關,使橫跨細框線的框不被重複計入); +同一儲存格內的文字依閱讀順序串接;橫跨多個儲存格的框則回報為合併儲存格候選。結果可直接 +轉成 records 或 CSV。 + +純標準函式庫幾何,作用於純字典(網格 + 框)——不需影像、不需 OCR 引擎、不需裝置。不匯入 +``PySide6``。 + +無頭 API +-------- + +.. 
code-block:: python + + from je_auto_control import (find_grid, find_text_lines, # 產生來源 + populate_table, assign_text_to_grid, + table_to_records, table_to_csv) + + grid = find_grid(region=[0, 0, 800, 400]) # 框線幾何 + boxes = [{"x": 10, "y": 5, "width": 60, "height": 20, "text": "Name"}, ...] + + table = assign_text_to_grid(grid, boxes) # [["Name","Age"], ["Ann","30"]] + records = table_to_records(table) # [{"Name": "Ann", "Age": "30"}] + csv_text = table_to_csv(table) + + full = populate_table(grid, boxes) # {n_rows, n_cols, cells, spans} + +``assign_text_to_grid`` 回傳二維文字表格;``populate_table`` 回傳更豐富的 +``{n_rows, n_cols, cells:[{row, col, text}], spans:[{row, col, row_span, col_span, text}]}``。 +``table_to_records`` 以第一列為標頭;``table_to_csv`` 輸出 CSV。框接受 ``{x, y, width, height}`` +或 ``{left, top, right, bottom}`` 加上 ``text`` 欄位。 + +執行器指令 +---------- + +``AC_populate_table``(``grid`` / ``text_boxes`` / ``overlap`` → ``{n_rows, n_cols, cells, +spans}``)以 MCP 工具 ``ac_populate_table``(唯讀)及 Script Builder 指令 +**Fill Table From Grid + OCR**(位於 **OCR** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v163_features_doc.rst b/docs/source/Zh/doc/new_features/v163_features_doc.rst new file mode 100644 index 00000000..58eb411b --- /dev/null +++ b/docs/source/Zh/doc/new_features/v163_features_doc.rst @@ -0,0 +1,44 @@ +權杖預算化的觀測差異(變更了什麼) +==================================== + +``observation.serialize_observation`` 渲染*單一整幀*的 UI——每回合都餵給模型會撐爆該模組 +原本就要節省的權杖預算,並迫使模型為了發現那一個新對話框而重讀整個畫面。``element_diff`` +提供兩幀之間的穩定 ID 對應,但止於 matched / added / removed 的*元素配對*——它不會渲染出 +模型可據以行動的精簡、帶索引、受預算限制的差異。 + +``observation_delta`` 正是缺少的序列化器:它比對前一幀與當前觀測,將每個配對元素分類為 +*changed*(role / name / enabled / value / 移動)或 *stable*,並只渲染變動部分—— +``+ [i] role "name"``(出現)/ ``- role "name"``(消失)/ ``~ [i] role "name" (fields)`` +(變更)——added 與 changed 優先、stable 略去、上限為 ``max_lines``。模型看到的是*變更了什麼*, +而非再次整個畫面。 + +純標準函式庫,作用於元素字典;重用 ``element_diff.match_elements`` 做重疊配對、 +``observation.observation_index`` 做閱讀順序索引。不匯入 
``PySide6``。 + +無頭 API +-------- + +.. code-block:: python + + from je_auto_control import delta_observation, delta_index, summarize_delta + + summary = delta_observation(prev_elements, curr_elements, max_lines=40) + # + [12] dialog "Saved" + # ~ [4] button "Submit" (enabled) + # - button "Spinner" + + delta = delta_index(prev_elements, curr_elements) # {added, removed, changed, stable} + text = summarize_delta(delta, max_lines=20) + +``delta_index`` 回傳 ``{added, removed, changed, stable}``(``changed`` 項目為 +``{"after", "fields"}``);``summarize_delta`` 把 ``delta_index`` 結果渲染為受預算限制的 +``+`` / ``~`` / ``-`` 行;``delta_observation`` 將兩幀索引化(閱讀順序、視口裁切、僅互動元素) +後再比對並渲染,一次完成。 + +執行器指令 +---------- + +``AC_delta_observation``(``prev`` / ``curr`` / ``viewport`` / ``max_elements`` / ``max_lines`` +/ ``interactive_only`` → ``{summary, added, removed, changed}``)以 MCP 工具 +``ac_delta_observation``(唯讀)及 Script Builder 指令 **Observation: Delta (what changed)** +(位於 **Native UI** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v164_features_doc.rst b/docs/source/Zh/doc/new_features/v164_features_doc.rst new file mode 100644 index 00000000..ca033afe --- /dev/null +++ b/docs/source/Zh/doc/new_features/v164_features_doc.rst @@ -0,0 +1,43 @@ +自動門檻樣板比對(對分數圖做 Otsu) +==================================== + +每次呼叫 ``match_template_all`` 都迫使呼叫者猜 ``min_score``:太低會讓 NMS 充滿背景雜訊, +太高會漏掉縮放 / 換膚的目標,而正確值因素材與畫面而異。``match_autothresh`` 移除這個魔術 +數字——它對*相關性分數直方圖*(而非像素強度,後者是 ``preprocess.binarize`` 的做法)套用 +Otsu 法,找出「背景相關」團與「真正匹配」團之間的谷,並回傳該門檻加上一個 *separability* +(分離度)值,讓呼叫端知道何時直方圖為單峰(無明確匹配 → 不要信任該門檻)。 + +本功能重用 ``visual_match._score_map``(公開比對器丟棄的完整 ``matchTemplate`` 曲面)與 +``cv2_utils.blobs.connected_boxes``——每個過門檻區域只回傳單一峰,避免原始像素掃描 + NMS +在寬相關峰上留下的重複命中。``haystack`` 可注入;分析可在合成陣列上單元測試。不匯入 +``PySide6``。 + +無頭 API +-------- + +.. 
code-block:: python + + from je_auto_control import match_auto, auto_threshold + + # 無需手調 min_score——門檻由分數圖推導 + for hit in match_auto("save_button.png", floor=0.5): + print(hit.center, hit.score) + + info = auto_threshold("save_button.png") + # {"threshold": 0.83, "separability": 0.61, "n_above": 2} + if info["separability"] < 0.3: + print("無明確匹配——門檻不可信") + +``match_auto`` 每個過門檻區域回傳一個 ``Match``,依分數排序並以 ``max_results`` 為上限; +門檻為 ``max(floor, otsu_threshold)``,使單峰 / 雜訊曲面無法把它拉到合理下限之下。 +``auto_threshold`` 回傳 ``{threshold, separability, n_above}``——``separability`` 接近 0 +表示分數直方圖為單峰,該門檻應視為不可靠。 + +執行器指令 +---------- + +``AC_match_auto``(``template`` / ``floor`` / ``max_results`` / ``region`` / ``method`` → +``{count, matches}``)與 ``AC_auto_threshold``(``template`` / ``region`` / ``method`` → +``{found, info}``)。兩者以 MCP 工具 ``ac_match_auto`` / ``ac_auto_threshold``(唯讀)及 +Script Builder 指令 **Match Template (auto-threshold)** / **Auto Threshold (Otsu on scores)** +(位於 **Image** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v165_features_doc.rst b/docs/source/Zh/doc/new_features/v165_features_doc.rst new file mode 100644 index 00000000..b5101971 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v165_features_doc.rst @@ -0,0 +1,42 @@ +留白投影欄位偵測(無框線表格) +============================== + +``ocr/structure`` 只有在*每一列*的儲存格左緣 x 都在容差內相符時才偵測得到表格——對 ragged +或無框線表格、右對齊數字欄、或任何缺格的列都會失敗。``edge_lines.find_grid`` 需要框線, +因此純以留白繪製的表格根本沒有網格。``column_layout`` 以版面分析文獻常用的穩健方法找欄位: +靠*間隙*。它把 OCR 框投影到 x 軸(墨水密度剖面),讀出持續為空的垂直帶作為欄間隙(gutter), +為每個框指派欄索引,並依垂直間距分群成列,輸出無框線表格。 + +純標準函式庫,作用於純框字典(差分陣列投影——不需 numpy),因此可在無影像、無 OCR 引擎下 +完整單元測試。重用 ``table_grid_fill`` 的框邊界讀取器。不匯入 ``PySide6``。 + +無頭 API +-------- + +.. 
code-block:: python + + from je_auto_control import (detect_borderless_table, column_gutters, + assign_columns, vertical_projection) + + table = detect_borderless_table(ocr_boxes) + # {"n_rows": 3, "n_cols": 2, "rows": [["Name","Age"],["Ann","30"],["Bob","25"]], + # "columns": [{"start": 70, "end": 120, "width": 50}]} + + gutters = column_gutters(ocr_boxes, min_gap=8) # 空白垂直帶 + tagged = assign_columns(ocr_boxes) # 每個框 + "column" 索引 + profile = vertical_projection(ocr_boxes) # 每個 x 的墨水密度 + +``vertical_projection`` 回傳每個 x 的墨水密度剖面;``column_gutters`` 回傳至少 ``min_gap`` 寬 +的內部空白帶 ``[{start, end, width}]``;``assign_columns`` 為每個框標上 0 起算的 ``column``; +``detect_borderless_table`` 將欄(來自 gutter)與列(來自垂直間距)組合成 +``{n_rows, n_cols, rows, columns}``,或在欄數少於 ``min_cols`` / 列數少於 ``min_rows`` 時回傳 +``None``。框接受 ``{x, y, width, height}`` 或 ``{left, top, right, bottom}`` 加上可選 ``text``。 + +執行器指令 +---------- + +``AC_detect_borderless_table``(``boxes`` / ``page_width`` / ``min_gap`` / ``min_cols`` / +``min_rows`` → ``{found, table}``)與 ``AC_column_gutters``(``boxes`` / ``page_width`` / +``min_gap`` → ``{count, gutters}``)。兩者以 MCP 工具 ``ac_detect_borderless_table`` / +``ac_column_gutters``(唯讀)及 Script Builder 指令 **Detect Borderless Table** / +**Column Gutters (whitespace)**(位於 **OCR** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v166_features_doc.rst b/docs/source/Zh/doc/new_features/v166_features_doc.rst new file mode 100644 index 00000000..49b28fe1 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v166_features_doc.rst @@ -0,0 +1,41 @@ +表單欄位關聯(多方向)+ 核取方塊狀態 +====================================== + +``ocr/structure`` 只有在框文字以 ``:`` 結尾時才視為標籤,並只與*緊接的下一格*配對——無法 +處理標籤在值*上方*、雙欄 key/value 版面、右對齊值,或任何非文字格的 widget,且完全沒有 +核取方塊 / 單選鈕狀態的概念。``form_fields`` 將其一般化:把每個標籤與多個*方向*(右、下)中 +最近的對齊值配對,把獨立的 widget(核取方塊、單選鈕、輸入框)配到最近的標籤,並由填充比例 +讀取核取方塊的勾選狀態。 + +關聯部分為純標準函式庫,作用於純框字典(可完整單元測試、不需影像);只有 ``checkbox_state`` +觸及像素,且隔離在共用的 ``visual_match`` 灰階載入器之後,讓測試可傳入合成陣列。重用 +``table_grid_fill`` 的框邊界讀取器。不匯入 
``PySide6``。 + +無頭 API +-------- + +.. code-block:: python + + from je_auto_control import (associate_fields, match_labels_to_widgets, + checkbox_state) + + fields = associate_fields(ocr_boxes, directions=("right", "below")) + # [{"label": "Name", "value": "Ann", "direction": "right", "gap": 20, ...}] + + pairs = match_labels_to_widgets(label_boxes, checkbox_boxes) + # [{"widget": {...}, "label": "Accept terms", "distance": 22}] + + state = checkbox_state(screenshot, checkbox_box) # "checked" | "unchecked" + +``associate_fields`` 把文字以 ``:`` 結尾的框視為標籤,並在指定方向中(``max_gap`` 內)配到 +最近的值框,回傳 ``{label, value, direction, gap, label_box, value_box}``。 +``match_labels_to_widgets`` 依中心距離把每個 widget 配到最近標籤。``checkbox_state`` 由框內 +暗像素填充比例回傳 ``"checked"`` / ``"unchecked"``(image 可注入——路徑 / ndarray / PIL)。 + +執行器指令 +---------- + +``AC_associate_fields``(``text_boxes`` / ``directions`` / ``max_gap`` → ``{count, fields}``) +與 ``AC_match_labels_to_widgets``(``labels`` / ``widgets`` → ``{count, pairs}``)。兩者以 +MCP 工具 ``ac_associate_fields`` / ``ac_match_labels_to_widgets``(唯讀)及 Script Builder 指令 +**Associate Form Fields** / **Match Labels To Widgets**(位於 **OCR** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v167_features_doc.rst b/docs/source/Zh/doc/new_features/v167_features_doc.rst new file mode 100644 index 00000000..b4172b2d --- /dev/null +++ b/docs/source/Zh/doc/new_features/v167_features_doc.rst @@ -0,0 +1,45 @@ +動作效果分類(我的點擊有沒有效果?) 
+==================================== + +代理點擊後最關鍵的問題是「這有沒有效果,而且是*正確的*效果嗎?」——但在*第一步*就回答這個 +問題的功能並不存在。``screen_state.diff_snapshots`` 與 ``element_diff`` 回報變了什麼,卻從不 +把變化歸因回該動作;``loop_guard`` 只在相同摘要重複 N 次後才標記 no-op(因此代理會先空轉 +2–8 次);``actionability`` 純粹是*動作前*的閘門。``action_effect`` 補上這個迴圈:比對前後 +觀測,並依動作的目標點分類結果,讓代理能立即反應。 + +判定為下列之一:``no_op``(無變化)、``changed_near_target``(變化發生在我們動作之處——按鈕被 +按下)、``changed_elsewhere``(別處彈出意外對話框)、或 ``changed``(有變化但動作沒有可歸因的 +座標點)。 + +純標準函式庫,作用於元素字典 + 動作記錄;重用 ``element_diff.match_elements`` 做重疊配對與 +``observation_delta`` 的欄位變更檢查。完全確定性、可在無裝置下單元測試。不匯入 ``PySide6``。 + +無頭 API +-------- + +.. code-block:: python + + from je_auto_control import classify_effect, effect_near_point, is_no_op + + verdict = classify_effect(before_elements, after_elements, + {"type": "click", "x": 480, "y": 260}) + if verdict.effect == "no_op": + retry_or_repair() + elif verdict.effect == "changed_elsewhere": + handle_unexpected_dialog() + + if is_no_op(before_elements, after_elements): + ... + +``classify_effect`` 回傳 ``EffectVerdict``(``effect`` / ``changed_near_target`` / +``changed_count`` / ``changed_centers`` / ``reason``)。``effect_near_point`` 回答任一變化是否 +落在任意點的 ``radius`` 內;``is_no_op`` 是布林捷徑。 + +執行器指令 +---------- + +``AC_classify_effect``(``before`` / ``after`` / ``action`` / ``radius`` → +``{effect, changed_near_target, changed_count, changed_centers, reason}``)與 +``AC_effect_near_point``(``before`` / ``after`` / ``point`` / ``radius`` → ``{near}``)。 +兩者以 MCP 工具 ``ac_classify_effect`` / ``ac_effect_near_point``(唯讀)及 Script Builder 指令 +**Classify Action Effect** / **Effect Near Point?**(位於 **Native UI** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v168_features_doc.rst b/docs/source/Zh/doc/new_features/v168_features_doc.rst new file mode 100644 index 00000000..405a7df7 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v168_features_doc.rst @@ -0,0 +1,42 @@ +邊緣形狀(Chamfer)樣板比對 +============================ + 
+當同一個控制項以不同填充、漸層、主題或抗鋸齒繪製時,強度相關(``visual_match``)會被拉低; +而 ORB 特徵比對(``feature_match``)需要角點紋理,扁平設計的圖示——漢堡選單、單純的箭號 +——根本沒有。``edge_match`` 改以*邊緣形狀*定位樣板:對兩張影像跑 Canny,對場景邊緣建立 +距離轉換,再把樣板邊緣滑過它,以「每個樣板邊緣到最近場景邊緣的平均距離」為每個位置評分 +(Chamfer 比對)。完美對齊的成本約為 0,與形狀如何填充或著色無關。 + +本功能重用 ``visual_match`` 的灰階載入器 / resize / NMS / ``Match`` 與 ``edge_lines`` 的 +Canny 預設,因此不重複任何比對或幾何程式。``haystack`` 可注入(ndarray / 路徑 / PIL); +搜尋可在合成陣列上單元測試。不匯入 ``PySide6``。 + +無頭 API +-------- + +.. code-block:: python + + from je_auto_control import edge_match, edge_match_all, chamfer_distance + + # 不論填色 / 主題,找出扁平圖示 + hit = edge_match("chevron.png", min_score=0.7) + if hit: + click(*hit.center) + + for m in edge_match_all("divider_handle.png", min_score=0.8): + print(m.center, m.score) + + print(chamfer_distance("logo_outline.png")) # 0 = 邊緣完全重合 + +``edge_match`` 在指定 ``scales`` 中回傳最佳 ``Match``(score = ``1 / (1 + 平均邊緣距離)``, +故 1.0 為完美輪廓匹配)或 ``None``。``edge_match_all`` 回傳所有達到 ``min_score`` 的匹配, +重疊以 NMS 移除。``chamfer_distance`` 回傳最佳對齊處的平均邊緣間距(0 = 輪廓重合)。 + +執行器指令 +---------- + +``AC_edge_match``(``template`` / ``min_score`` / ``scales`` / ``region`` → +``{found, match}``)與 ``AC_edge_match_all``(另加 ``max_results`` / ``nms_iou`` → +``{count, matches}``)。兩者以 MCP 工具 ``ac_edge_match`` / ``ac_edge_match_all``(唯讀)及 +Script Builder 指令 **Match Template (edge shape)** / **Match Template All (edge shape)** +(位於 **Image** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v169_features_doc.rst b/docs/source/Zh/doc/new_features/v169_features_doc.rst new file mode 100644 index 00000000..81723684 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v169_features_doc.rst @@ -0,0 +1,41 @@ +宣告式動作後置條件 +================== + +動作之後,代理(或重播框架)通常有具體的預期:「應出現寫著『Saved』的對話框,且 Submit +按鈕應停用」。``expect_poll`` / ``assert_eventually`` 輪詢*單一條件*,卻沒有與動作綁定的 +*後置條件規格*概念,也不對照 *before* 基準做差異(因此無法表達「一個*新*對話框出現了」 +——只能表達「存在對話框」)。``trajectory_eval`` 的評分準則是整條軌跡層級,而非每步畫面 +狀態。``postcondition`` 補上這個缺口:用一個小型 JSON 子句規格,對照 after 觀測(可選擇與 +before 觀測做差異)評估,回傳逐子句的通過 / 
失敗報告。 + +子句:``appears`` / ``disappears``(對照 ``before``)、``enabled`` / ``disabled``、 +``text_present`` / ``text_absent``,以及 ``count``(``equals`` / ``min``)。純標準函式庫, +作用於元素字典;規格為純 JSON,可帶入 action 檔 / MCP / 排程器。不匯入 ``PySide6``。 + +無頭 API +-------- + +.. code-block:: python + + from je_auto_control import check_postcondition, compile_postcondition + + spec = {"appears": {"role": "dialog", "name": "Saved"}, + "disabled": {"name": "Submit"}} + report = check_postcondition(after_elements, spec, before=before_elements) + if not report.ok: + print("失敗子句:", report.failed) + + # 把規格轉成判定函式以驅動 expect_poll + predicate = compile_postcondition({"text_present": "Saved"}) + +``check_postcondition`` 回傳 ``PostconditionReport``(``ok`` / ``clauses``(即 +``[{type, ok, detail}]``)/ ``failed``)。``appears`` 只有在元素位於 ``after`` 且*不*在 +``before``(確為新元素)時才成功;``disappears`` 需要 ``before`` 幀。``compile_postcondition`` +回傳 ``after -> bool`` 判定函式,可與 ``expect_poll`` / ``assert_eventually`` 搭配。 + +執行器指令 +---------- + +``AC_check_postcondition``(``after`` / ``spec`` / ``before`` → ``{ok, clauses, failed}``) +以 MCP 工具 ``ac_check_postcondition``(唯讀)及 Script Builder 指令 **Check Postcondition** +(位於 **Native UI** 分類下)形式提供。 diff --git a/docs/source/Zh/doc/new_features/v170_features_doc.rst b/docs/source/Zh/doc/new_features/v170_features_doc.rst new file mode 100644 index 00000000..7438d776 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v170_features_doc.rst @@ -0,0 +1,45 @@ +失敗 / 無效果動作的修復策略引擎 +================================ + +當動作沒有效果或落點錯誤時,代理需要一個*策略*決定下一步該試什麼——重新定位重試、微調座標、 +把目標捲入視野、等待重試,或放棄並升級。``self_healing`` / ``locator_repair`` 只修復*無法 +解析*的定位器(找不到元素);當元素找到並點擊了卻無效果時,它們無能為力。``loop_guard`` 只 +*偵測*卡住的迴圈——沒有戰術選擇或退避。``step_repair`` 正是缺少的控制器:它消費一個效果判定 +(例如來自 ``action_effect``),並驅動有界的重試迴圈,每輪選擇下一個尚未嘗試的戰術。 + +純標準函式庫狀態機;每個副作用——執行動作、驗證、套用戰術、睡眠——都是注入的可呼叫物件, +因此迴圈完全確定性、可在無裝置下單元測試。不匯入 ``PySide6``。 + +無頭 API +-------- + +..
code-block:: python + + from je_auto_control import (plan_repair, run_with_repair, RepairPolicy, + classify_effect, is_no_op) + + # 只要規劃 + plan_repair("no_op") # ['wait_retry', 'relocate', 'nudge'] + plan_repair("changed_elsewhere") # ['escalate'] + + # 以注入接縫驅動迴圈 + outcome = run_with_repair( + act=lambda: click(*target), + verify=lambda: not is_no_op(before(), after()), + apply_tactic=apply, # 例如 relocate / nudge 目標 + verdict_for=lambda: classify_effect(before(), after(), action).effect, + policy=RepairPolicy(max_attempts=3)) + print(outcome.ok, outcome.tactics_used) + +``plan_repair`` 回傳某判定(字串如 ``no_op`` / ``changed_elsewhere`` 或 ``EffectVerdict`` +字典)的有序戰術,截到 ``max_attempts``;``next_tactic`` 回傳下一個尚未試過的。 +``run_with_repair`` 執行 ``act`` 然後 ``verify``;失敗時套用戰術直到成功或耗盡,回傳 +``RepairOutcome``(``ok`` / ``attempts`` / ``tactics_used`` / ``detail``)。``RepairPolicy`` +限制嘗試次數並列出允許的戰術。 + +執行器指令 +---------- + +``AC_plan_repair``(``verdict`` / ``max_attempts`` → ``{count, tactics}``)以 MCP 工具 +``ac_plan_repair``(唯讀)及 Script Builder 指令 **Plan Repair Tactics**(位於 **Native UI** +分類下)形式提供。(實際的 ``run_with_repair`` 迴圈因接受注入可呼叫物件,由 Python 驅動。) diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst index 7193a73f..a8364cd9 100644 --- a/docs/source/Zh/zh_index.rst +++ b/docs/source/Zh/zh_index.rst @@ -183,6 +183,16 @@ AutoControl 所有功能的完整使用指南。 doc/new_features/v158_features_doc doc/new_features/v159_features_doc doc/new_features/v160_features_doc + doc/new_features/v161_features_doc + doc/new_features/v162_features_doc + doc/new_features/v163_features_doc + doc/new_features/v164_features_doc + doc/new_features/v165_features_doc + doc/new_features/v166_features_doc + doc/new_features/v167_features_doc + doc/new_features/v168_features_doc + doc/new_features/v169_features_doc + doc/new_features/v170_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/je_auto_control/__init__.py
b/je_auto_control/__init__.py index e92b7b31..c01e1065 100644 --- a/je_auto_control/__init__.py +++ b/je_auto_control/__init__.py @@ -283,10 +283,50 @@ from je_auto_control.utils.rotated_match import ( RotatedMatch, match_rotated, match_rotated_all, scale_space, ) +# Template-match trustworthiness (second-peak ratio + peak-to-sidelobe) +from je_auto_control.utils.match_trust import ( + TrustedMatch, match_with_trust, score_peaks, +) +# Edge-shape (Chamfer / distance-transform) template matching +from je_auto_control.utils.edge_match import ( + chamfer_distance, edge_match, edge_match_all, +) +# Otsu auto-thresholding for template matching (no hand-tuned min_score) +from je_auto_control.utils.match_autothresh import ( + auto_threshold, match_auto, +) # Coarse labelled cell grid for VLM grounding (point <-> cell mapping) from je_auto_control.utils.screen_grid import ( GridCell, cell_for_point, grid_cells, point_for_cell, ) +# Fill a ruling-line grid with OCR text → addressable table +from je_auto_control.utils.table_grid_fill import ( + assign_text_to_grid, populate_table, table_to_csv, table_to_records, +) +# Token-budgeted observation delta (what changed between two UI frames) +from je_auto_control.utils.observation_delta import ( + delta_index, delta_observation, summarize_delta, +) +# Infer columns from vertical whitespace (borderless tables) +from je_auto_control.utils.column_layout import ( + assign_columns, column_gutters, detect_borderless_table, vertical_projection, +) +# Associate form labels with values (multi-direction) + checkbox state +from je_auto_control.utils.form_fields import ( + associate_fields, checkbox_state, match_labels_to_widgets, +) +# Classify whether an action did anything (target-local attribution) +from je_auto_control.utils.action_effect import ( + EffectVerdict, classify_effect, effect_near_point, is_no_op, +) +# Declarative action postconditions (expected outcome vs before/after) +from je_auto_control.utils.postcondition import ( + 
PostconditionReport, check_postcondition, compile_postcondition, +) +# Repair-tactic policy for failed / no-effect actions (self-correction loop) +from je_auto_control.utils.step_repair import ( + RepairOutcome, RepairPolicy, next_tactic, plan_repair, run_with_repair, +) # Locate on-screen regions by colour (mask + connected components) from je_auto_control.utils.color_region import ( find_color_region, find_color_regions, @@ -1198,10 +1238,44 @@ def start_autocontrol_gui(*args, **kwargs): "match_rotated", "match_rotated_all", "scale_space", + "TrustedMatch", + "match_with_trust", + "score_peaks", + "auto_threshold", + "match_auto", + "edge_match", + "edge_match_all", + "chamfer_distance", "GridCell", "grid_cells", "cell_for_point", "point_for_cell", + "assign_text_to_grid", + "populate_table", + "table_to_records", + "table_to_csv", + "delta_index", + "delta_observation", + "summarize_delta", + "vertical_projection", + "column_gutters", + "assign_columns", + "detect_borderless_table", + "associate_fields", + "match_labels_to_widgets", + "checkbox_state", + "EffectVerdict", + "classify_effect", + "effect_near_point", + "is_no_op", + "PostconditionReport", + "check_postcondition", + "compile_postcondition", + "RepairPolicy", + "RepairOutcome", + "plan_repair", + "next_tactic", + "run_with_repair", "find_color_region", "find_color_regions", "ssim_compare", diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py index 8498a99f..ba6bb37a 100644 --- a/je_auto_control/gui/script_builder/command_schema.py +++ b/je_auto_control/gui/script_builder/command_schema.py @@ -335,6 +335,67 @@ def _add_image_specs(specs: List[CommandSpec]) -> None: ), description="Find every rotation/scale-tolerant match (NMS-deduped).", )) + specs.append(CommandSpec( + "AC_match_with_trust", "Image", "Match Template (trust-scored)", + fields=( + FieldSpec("template", FieldType.FILE_PATH), + FieldSpec("min_score", FieldType.FLOAT, 
optional=True, default=0.0, + min_value=0.0, max_value=1.0), + FieldSpec("ambiguous_ratio", FieldType.FLOAT, optional=True, + default=0.9, min_value=0.0, max_value=1.0), + FieldSpec("scales", FieldType.STRING, optional=True, + placeholder="[0.9, 1.0, 1.1]"), + FieldSpec("region", FieldType.STRING, optional=True, + placeholder=_REGION_PLACEHOLDER), + ), + description="Match a template and flag if it is ambiguous (duplicate peak).", + )) + specs.append(CommandSpec( + "AC_match_auto", "Image", "Match Template (auto-threshold)", + fields=( + FieldSpec("template", FieldType.FILE_PATH), + FieldSpec("floor", FieldType.FLOAT, optional=True, default=0.5, + min_value=0.0, max_value=1.0), + FieldSpec("max_results", FieldType.INT, optional=True, default=20), + FieldSpec("region", FieldType.STRING, optional=True, + placeholder=_REGION_PLACEHOLDER), + ), + description="Find a template with an Otsu auto-threshold (no min_score).", + )) + specs.append(CommandSpec( + "AC_auto_threshold", "Image", "Auto Threshold (Otsu on scores)", + fields=( + FieldSpec("template", FieldType.FILE_PATH), + FieldSpec("region", FieldType.STRING, optional=True, + placeholder=_REGION_PLACEHOLDER), + ), + description="Derive a match threshold + separability from the score map.", + )) + specs.append(CommandSpec( + "AC_edge_match", "Image", "Match Template (edge shape)", + fields=( + FieldSpec("template", FieldType.FILE_PATH), + FieldSpec("min_score", FieldType.FLOAT, optional=True, default=0.7, + min_value=0.0, max_value=1.0), + FieldSpec("scales", FieldType.STRING, optional=True, + placeholder="[0.9, 1.0, 1.1]"), + FieldSpec("region", FieldType.STRING, optional=True, + placeholder=_REGION_PLACEHOLDER), + ), + description="Locate by edge shape (Chamfer) — robust to fill / theme / AA.", + )) + specs.append(CommandSpec( + "AC_edge_match_all", "Image", "Match Template All (edge shape)", + fields=( + FieldSpec("template", FieldType.FILE_PATH), + FieldSpec("min_score", FieldType.FLOAT, optional=True, 
default=0.7, + min_value=0.0, max_value=1.0), + FieldSpec("max_results", FieldType.INT, optional=True, default=20), + FieldSpec("nms_iou", FieldType.FLOAT, optional=True, default=0.3, + min_value=0.0, max_value=1.0), + ), + description="Find every edge-shape (Chamfer) match (NMS-deduped).", + )) specs.append(CommandSpec( "AC_grid_cells", "Image", "Grid Cells (coarse grounding)", fields=( @@ -659,6 +720,63 @@ def _add_ocr_specs(specs: List[CommandSpec]) -> None: ), description="Decode 1-D barcodes (EAN / UPC) in an image / screen region.", )) + specs.append(CommandSpec( + "AC_populate_table", "OCR", "Fill Table From Grid + OCR", + fields=( + FieldSpec("grid", FieldType.STRING, + placeholder='{"rows": [0, 30, 60], "cols": [0, 100, 200]}'), + FieldSpec("text_boxes", FieldType.STRING, + placeholder='[{"x": 10, "y": 5, "width": 60, "height": 20, ' + '"text": "Name"}]'), + FieldSpec("overlap", FieldType.FLOAT, optional=True, default=0.4, + min_value=0.0, max_value=1.0), + ), + description="Drop OCR text boxes into a ruling-line grid → addressable table.", + )) + specs.append(CommandSpec( + "AC_detect_borderless_table", "OCR", "Detect Borderless Table", + fields=( + FieldSpec("boxes", FieldType.STRING, + placeholder='[{"x":10,"y":0,"width":60,"height":18,' + '"text":"Name"}]'), + FieldSpec("min_gap", FieldType.INT, optional=True, default=8), + FieldSpec("page_width", FieldType.INT, optional=True), + ), + description="Infer a borderless table from OCR boxes via whitespace columns.", + )) + specs.append(CommandSpec( + "AC_column_gutters", "OCR", "Column Gutters (whitespace)", + fields=( + FieldSpec("boxes", FieldType.STRING, + placeholder='[{"x":10,"y":0,"width":60,"height":18}]'), + FieldSpec("min_gap", FieldType.INT, optional=True, default=8), + FieldSpec("page_width", FieldType.INT, optional=True), + ), + description="Find borderless-table column separators by whitespace projection.", + )) + specs.append(CommandSpec( + "AC_associate_fields", "OCR", "Associate Form 
Fields", + fields=( + FieldSpec("text_boxes", FieldType.STRING, + placeholder='[{"x":0,"y":0,"width":60,"height":20,' + '"text":"Name:"}]'), + FieldSpec("directions", FieldType.STRING, optional=True, + placeholder='["right", "below"]'), + FieldSpec("max_gap", FieldType.INT, optional=True, default=150), + ), + description="Pair 'label:' boxes with the nearest aligned value (right/below).", + )) + specs.append(CommandSpec( + "AC_match_labels_to_widgets", "OCR", "Match Labels To Widgets", + fields=( + FieldSpec("labels", FieldType.STRING, + placeholder='[{"x":0,"y":0,"width":60,"height":20,' + '"text":"Accept"}]'), + FieldSpec("widgets", FieldType.STRING, + placeholder='[{"x":120,"y":0,"width":16,"height":16}]'), + ), + description="Match each checkbox/radio/input to its nearest label by centre.", + )) specs.append(CommandSpec( "AC_scroll_to_find", "OCR", "Scroll Until Visible", fields=( @@ -3035,6 +3153,67 @@ def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None: ), description="Reading-ordered, viewport-clipped, indexed element list.", )) + specs.append(CommandSpec( + "AC_delta_observation", "Native UI", "Observation: Delta (what changed)", + fields=( + FieldSpec("prev", FieldType.STRING, + placeholder='[{"role":"button","name":"OK","x":..,"y":..}]'), + FieldSpec("curr", FieldType.STRING, + placeholder='[{"role":"button","name":"OK","x":..,"y":..}]'), + FieldSpec("viewport", FieldType.STRING, optional=True, + placeholder="[x, y, w, h]"), + FieldSpec("max_elements", FieldType.INT, optional=True, default=80), + FieldSpec("max_lines", FieldType.INT, optional=True, default=40), + ), + description="Token-budgeted '+/~/-' summary of what changed between frames.", + )) + specs.append(CommandSpec( + "AC_classify_effect", "Native UI", "Classify Action Effect", + fields=( + FieldSpec("before", FieldType.STRING, + placeholder='[{"role":"button","name":"OK","x":0,"y":0}]'), + FieldSpec("after", FieldType.STRING, + 
placeholder='[{"role":"button","name":"OK","x":0,"y":0}]'), + FieldSpec("action", FieldType.STRING, + placeholder='{"type":"click","x":50,"y":50}'), + FieldSpec("radius", FieldType.INT, optional=True, default=64), + ), + description="Did the action change the screen near its target? (no_op/…).", + )) + specs.append(CommandSpec( + "AC_effect_near_point", "Native UI", "Effect Near Point?", + fields=( + FieldSpec("before", FieldType.STRING, + placeholder='[{"role":"button","x":0,"y":0}]'), + FieldSpec("after", FieldType.STRING, + placeholder='[{"role":"button","x":0,"y":0}]'), + FieldSpec("point", FieldType.STRING, placeholder="[50, 50]"), + FieldSpec("radius", FieldType.INT, optional=True, default=64), + ), + description="Did any before/after change land within radius of a point?", + )) + specs.append(CommandSpec( + "AC_check_postcondition", "Native UI", "Check Postcondition", + fields=( + FieldSpec("after", FieldType.STRING, + placeholder='[{"role":"dialog","name":"Saved"}]'), + FieldSpec("spec", FieldType.STRING, + placeholder='{"appears":{"role":"dialog"},' + '"disabled":{"name":"Submit"}}'), + FieldSpec("before", FieldType.STRING, optional=True, + placeholder='[{"role":"button","name":"Submit"}]'), + ), + description="Check expected outcome clauses against after/before frames.", + )) + specs.append(CommandSpec( + "AC_plan_repair", "Native UI", "Plan Repair Tactics", + fields=( + FieldSpec("verdict", FieldType.STRING, + placeholder="no_op / changed_elsewhere / changed"), + FieldSpec("max_attempts", FieldType.INT, optional=True, default=3), + ), + description="Ordered repair tactics for a failed/no-effect action verdict.", + )) specs.append(CommandSpec( "AC_validate_action", "Native UI", "Validate / Snap Action", fields=( diff --git a/je_auto_control/utils/action_effect/__init__.py b/je_auto_control/utils/action_effect/__init__.py new file mode 100644 index 00000000..fd6e3cbb --- /dev/null +++ b/je_auto_control/utils/action_effect/__init__.py @@ -0,0 +1,6 @@ 
+"""Classify whether an action did anything, with target-local attribution.""" +from je_auto_control.utils.action_effect.action_effect import ( + EffectVerdict, classify_effect, effect_near_point, is_no_op, +) + +__all__ = ["EffectVerdict", "classify_effect", "effect_near_point", "is_no_op"] diff --git a/je_auto_control/utils/action_effect/action_effect.py b/je_auto_control/utils/action_effect/action_effect.py new file mode 100644 index 00000000..61c6121b --- /dev/null +++ b/je_auto_control/utils/action_effect/action_effect.py @@ -0,0 +1,104 @@ +"""Classify whether an action actually did anything, with target-local attribution. + +After an agent clicks, the crucial question is "did that do anything, and was it the *right* +thing?" — but nothing answers it on the *first* step. ``screen_state.diff_snapshots`` and +``element_diff`` report what changed but never tie the change back to the action; ``loop_guard`` +only flags a no-op after the same digest repeats N times (so the agent loops 2-8 times first); +``actionability`` is purely a *pre*-action gate. ``action_effect`` closes the loop: it diffs the +before/after observation, and given the action's target point classifies the result as +``no_op`` (nothing changed), ``changed_near_target`` (the change happened where we acted — a +button depressed), ``changed_elsewhere`` (a surprise dialog popped somewhere else), or +``changed`` (something changed but the action had no point to attribute to). + +Pure-stdlib over element dicts + the action record; reuses ``element_diff.match_elements`` for +the overlap join and ``observation_delta``'s field-change check. Fully deterministic and +unit-testable with no device. Imports no ``PySide6``. 
+""" +from dataclasses import asdict, dataclass +from typing import Any, Dict, List, Optional, Sequence + +Element = Dict[str, Any] + + +@dataclass(frozen=True) +class EffectVerdict: + """The classified effect of an action plus its attribution evidence.""" + + effect: str + changed_near_target: bool + changed_count: int + changed_centers: List[List[int]] + reason: str + + def to_dict(self) -> Dict[str, Any]: + """Return the verdict as a plain dict.""" + return asdict(self) + + +def _center(element: Element) -> List[int]: + return [int(element.get("x", 0)) + int(element.get("width", 0)) // 2, + int(element.get("y", 0)) + int(element.get("height", 0)) // 2] + + +def _action_point(action: Any) -> Optional[List[int]]: + """Extract the (x, y) the action targets, or ``None`` if it has no coordinate.""" + if not isinstance(action, dict): + return None + if "x" in action and "y" in action: + return [int(action["x"]), int(action["y"])] + point = action.get("point") or action.get("center") + return [int(point[0]), int(point[1])] if point else None + + +def _changed_elements(before: Sequence[Element], after: Sequence[Element], + iou_threshold: float, move_threshold: int): + """Return every element that was added / removed / changed between two frames.""" + from je_auto_control.utils.element_diff import match_elements + from je_auto_control.utils.observation_delta.observation_delta import ( + _changed_fields) + diff = match_elements(list(before), list(after), iou_threshold=iou_threshold) + changed = [pair["after"] for pair in diff["matched"] + if _changed_fields(pair["before"], pair["after"], move_threshold)] + return diff["added"] + diff["removed"] + changed + + +def _near(centers: Sequence[List[int]], point: Sequence[int], radius: int) -> bool: + return any(abs(cx - point[0]) <= radius and abs(cy - point[1]) <= radius + for cx, cy in centers) + + +def classify_effect(before: Sequence[Element], after: Sequence[Element], action: Any, + *, radius: int = 64, iou_threshold: float 
= 0.5, + move_threshold: int = 5) -> EffectVerdict: + """Classify the effect of ``action`` from the before/after observation pair.""" + changed = _changed_elements(before, after, float(iou_threshold), + int(move_threshold)) + centers = [_center(element) for element in changed] + if not changed: + return EffectVerdict("no_op", False, 0, [], + "no element was added, removed or changed") + point = _action_point(action) + if point is None: + return EffectVerdict("changed", False, len(changed), centers, + "the screen changed but the action had no target point") + if _near(centers, point, int(radius)): + return EffectVerdict("changed_near_target", True, len(changed), centers, + "a change occurred within radius of the target point") + return EffectVerdict("changed_elsewhere", False, len(changed), centers, + "the screen changed, but away from the target point") + + +def effect_near_point(before: Sequence[Element], after: Sequence[Element], + point: Sequence[int], *, radius: int = 64, + iou_threshold: float = 0.5, move_threshold: int = 5) -> bool: + """Return whether any change between the frames lies within ``radius`` of ``point``.""" + changed = _changed_elements(before, after, float(iou_threshold), + int(move_threshold)) + return _near([_center(element) for element in changed], point, int(radius)) + + +def is_no_op(before: Sequence[Element], after: Sequence[Element], *, + iou_threshold: float = 0.5, move_threshold: int = 5) -> bool: + """Return whether the action produced no observable change at all.""" + return not _changed_elements(before, after, float(iou_threshold), + int(move_threshold)) diff --git a/je_auto_control/utils/column_layout/__init__.py b/je_auto_control/utils/column_layout/__init__.py new file mode 100644 index 00000000..962c9567 --- /dev/null +++ b/je_auto_control/utils/column_layout/__init__.py @@ -0,0 +1,9 @@ +"""Infer columns from vertical whitespace, for borderless tables.""" +from je_auto_control.utils.column_layout.column_layout import ( + 
assign_columns, column_gutters, detect_borderless_table, vertical_projection, +) + +__all__ = [ + "vertical_projection", "column_gutters", + "assign_columns", "detect_borderless_table", +] diff --git a/je_auto_control/utils/column_layout/column_layout.py b/je_auto_control/utils/column_layout/column_layout.py new file mode 100644 index 00000000..0de5f4c2 --- /dev/null +++ b/je_auto_control/utils/column_layout/column_layout.py @@ -0,0 +1,141 @@ +"""Infer columns from vertical whitespace, for borderless tables. + +``ocr/structure`` detects tables only when *every* row's cell-left-x matches within a +tolerance — it collapses on ragged or borderless tables, right-aligned numeric columns, or +any row with a missing cell. ``edge_lines.find_grid`` needs ruling lines, so a table drawn +purely with whitespace (no borders) has no grid at all. ``column_layout`` finds columns the +robust way the layout-analysis literature uses: by the *gaps*. It projects the OCR boxes onto +the x-axis (an ink-density profile), reads off the persistent empty vertical bands as column +gutters, and assigns each box a column index — then buckets rows by vertical spacing to emit a +borderless table. + +Pure-stdlib over plain box dicts (no numpy needed — a difference-array projection), so it is +fully unit-testable with no image and no OCR engine. Reuses ``table_grid_fill``'s box-bounds +reader. Imports no ``PySide6``. 
+""" +from typing import Any, Dict, List, Optional, Sequence + +from je_auto_control.utils.table_grid_fill.table_grid_fill import _box_bounds + +Box = Dict[str, Any] + + +def _center_x(box: Box) -> int: + left, _, right, _ = _box_bounds(box) + return (left + right) // 2 + + +def _center_y(box: Box) -> int: + _, top, _, bottom = _box_bounds(box) + return (top + bottom) // 2 + + +def vertical_projection(boxes: Sequence[Box], *, + page_width: Optional[int] = None) -> List[int]: + """Return the per-column ink-density profile (how many boxes cover each x).""" + bounds = [_box_bounds(box) for box in boxes] + width = int(page_width) if page_width else max((r for _, _, r, _ in bounds), + default=0) + diff = [0] * (width + 1) + for left, _, right, _ in bounds: + left, right = max(0, min(width, left)), max(0, min(width, right)) + if right > left: + diff[left] += 1 + diff[right] -= 1 + profile, running = [], 0 + for x in range(width): + running += diff[x] + profile.append(running) + return profile + + +def column_gutters(boxes: Sequence[Box], *, page_width: Optional[int] = None, + min_gap: int = 8) -> List[Dict[str, int]]: + """Return the interior empty vertical bands (column separators) >= ``min_gap`` wide.""" + profile = vertical_projection(boxes, page_width=page_width) + gutters: List[Dict[str, int]] = [] + start: Optional[int] = None + for x, value in enumerate(profile): + if value == 0: + start = x if start is None else start + continue + if start is not None and start > 0 and x - start >= int(min_gap): + gutters.append({"start": start, "end": x, "width": x - start}) + start = None + return gutters + + +def _column_cuts(gutters: Sequence[Dict[str, int]]) -> List[int]: + """Return the x mid-points of the gutters — the column boundaries.""" + return [(gutter["start"] + gutter["end"]) // 2 for gutter in gutters] + + +def _column_of(center_x: int, cuts: Sequence[int]) -> int: + """Return the 0-based column index of a centre-x given the boundary cuts.""" + index = 0 + for cut 
in cuts: + if center_x < cut: + break + index += 1 + return index + + +def assign_columns(boxes: Sequence[Box], *, page_width: Optional[int] = None, + min_gap: int = 8) -> List[Box]: + """Return each box tagged with a ``column`` index derived from the gutters.""" + cuts = _column_cuts(column_gutters(boxes, page_width=page_width, min_gap=min_gap)) + return [dict(box, column=_column_of(_center_x(box), cuts)) for box in boxes] + + +def _row_gap(boxes: Sequence[Box], row_gap: Optional[int]) -> int: + """Pick the row-split gap: caller value, or half the median box height.""" + if row_gap is not None: + return int(row_gap) + heights = sorted((_box_bounds(b)[3] - _box_bounds(b)[1]) for b in boxes) + return max(1, heights[len(heights) // 2] // 2) + + +def _bucket_rows(boxes: Sequence[Box], gap: int) -> List[List[Box]]: + """Group boxes into rows by vertical spacing; sort each row by column.""" + ordered = sorted(boxes, key=_center_y) + rows: List[List[Box]] = [[ordered[0]]] + last = _center_y(ordered[0]) + for box in ordered[1:]: + center = _center_y(box) + if center - last > gap: + rows.append([box]) + else: + rows[-1].append(box) + last = center + for row in rows: + row.sort(key=lambda item: item.get("column", 0)) + return rows + + +def detect_borderless_table(boxes: Sequence[Box], *, + page_width: Optional[int] = None, min_gap: int = 8, + row_gap: Optional[int] = None, + min_cols: int = 2, + min_rows: int = 2) -> Optional[Dict[str, Any]]: + """Infer a borderless table from OCR boxes, or ``None`` if it is not tabular. + + Columns come from whitespace gutters, rows from vertical spacing. Returns + ``{n_rows, n_cols, rows:[[text]], columns:[gutter]}`` when at least ``min_cols`` + columns and ``min_rows`` rows are found. 
+ """ + if not boxes: + return None + tagged = assign_columns(boxes, page_width=page_width, min_gap=min_gap) + n_cols = max(box["column"] for box in tagged) + 1 + if n_cols < int(min_cols): + return None + rows = _bucket_rows(tagged, _row_gap(tagged, row_gap)) + if len(rows) < int(min_rows): + return None + table = [["" for _ in range(n_cols)] for _ in rows] + for r, row in enumerate(rows): + for box in row: + col = box["column"] + table[r][col] = f'{table[r][col]} {box.get("text", "")}'.strip() + return {"n_rows": len(rows), "n_cols": n_cols, "rows": table, + "columns": column_gutters(boxes, page_width=page_width, min_gap=min_gap)} diff --git a/je_auto_control/utils/edge_match/__init__.py b/je_auto_control/utils/edge_match/__init__.py new file mode 100644 index 00000000..a15de81c --- /dev/null +++ b/je_auto_control/utils/edge_match/__init__.py @@ -0,0 +1,6 @@ +"""Edge-shape (Chamfer / distance-transform) template matching.""" +from je_auto_control.utils.edge_match.edge_match import ( + chamfer_distance, edge_match, edge_match_all, +) + +__all__ = ["edge_match", "edge_match_all", "chamfer_distance"] diff --git a/je_auto_control/utils/edge_match/edge_match.py b/je_auto_control/utils/edge_match/edge_match.py new file mode 100644 index 00000000..2cfacded --- /dev/null +++ b/je_auto_control/utils/edge_match/edge_match.py @@ -0,0 +1,106 @@ +"""Edge-shape (Chamfer / distance-transform) template matching. + +Intensity correlation (``visual_match``) is dragged down when the same control is rendered +with a different fill, gradient, theme or anti-aliasing, and ORB feature matching +(``feature_match``) needs corner texture that flat-design glyphs — a hamburger menu, a plain +chevron — simply do not have. 
``edge_match`` locates a template by its *edge shape* instead: +it runs Canny on both images, builds a distance transform of the scene edges, and slides the +template's edges over it, scoring each position by the mean distance from a template edge to +the nearest scene edge (Chamfer matching). A perfect alignment costs ~0 regardless of how the +shape is filled or shaded. + +It reuses ``visual_match``'s gray loaders / resize / NMS / ``Match`` and ``edge_lines``'s Canny +default, so no matching or geometry code is duplicated. The ``haystack`` is injectable +(ndarray / path / PIL); the search is unit-testable on synthetic arrays. OpenCV + NumPy are +imported lazily. Imports no ``PySide6``. +""" +from typing import Any, List, Optional, Sequence, Tuple + +from je_auto_control.utils.visual_match.visual_match import ( + Match, _haystack_gray, _nms, _resize, _to_gray, +) + +ImageSource = Any +_DEFAULT_CANNY = (50, 150) + + +def _edges(gray, canny: Sequence[int]): + import cv2 + return cv2.Canny(gray, int(canny[0]), int(canny[1])) + + +def _chamfer_score_map(template_gray, scene_gray, canny: Sequence[int]): + """Return ``(score_map, (w, h))`` where higher = better edge alignment, or ``(None, _)``.""" + import cv2 + import numpy as np + scene_edges = _edges(scene_gray, canny) + template_edges = _edges(template_gray, canny) + edge_count = int(np.count_nonzero(template_edges)) + if edge_count == 0 or template_edges.shape[0] > scene_edges.shape[0] \ + or template_edges.shape[1] > scene_edges.shape[1]: + return None, (0, 0) + distance = cv2.distanceTransform(cv2.bitwise_not(scene_edges), + cv2.DIST_L2, 3).astype(np.float32) + mask = (template_edges > 0).astype(np.float32) + total = cv2.matchTemplate(distance, mask, cv2.TM_CCORR) + score_map = 1.0 / (1.0 + total / edge_count) + return score_map, (template_edges.shape[1], template_edges.shape[0]) + + +def _best(score_map, size: Tuple[int, int], scale: float, min_score: float, + current: Optional[Match]) -> Optional[Match]: + 
import cv2 + _, max_val, _, max_loc = cv2.minMaxLoc(score_map) + if max_val >= min_score and (current is None or max_val > current.score): + return Match(int(max_loc[0]), int(max_loc[1]), size[0], size[1], + round(float(max_val), 4), float(scale)) + return current + + +def edge_match(template: ImageSource, *, haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + scales: Sequence[float] = (1.0,), + canny: Sequence[int] = _DEFAULT_CANNY, + min_score: float = 0.7) -> Optional[Match]: + """Return the best edge-shape (Chamfer) match at or above ``min_score``, or ``None``.""" + template_gray = _to_gray(template) + scene_gray = _haystack_gray(haystack, region) + best: Optional[Match] = None + for scale in scales: + score_map, size = _chamfer_score_map(_resize(template_gray, float(scale)), + scene_gray, canny) + if score_map is not None: + best = _best(score_map, size, float(scale), float(min_score), best) + return best + + +def edge_match_all(template: ImageSource, *, + haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + canny: Sequence[int] = _DEFAULT_CANNY, min_score: float = 0.7, + max_results: int = 20, nms_iou: float = 0.3) -> List[Match]: + """Return every edge-shape match >= ``min_score`` (scale 1.0), overlaps removed (NMS).""" + import numpy as np + score_map, size = _chamfer_score_map(_to_gray(template), + _haystack_gray(haystack, region), canny) + if score_map is None: + return [] + ys, xs = np.nonzero(score_map >= float(min_score)) + candidates = [Match(int(x), int(y), size[0], size[1], + round(float(score_map[y, x]), 4), 1.0) + for y, x in zip(ys, xs)] + return _nms(candidates, float(nms_iou))[:int(max_results)] + + +def chamfer_distance(template: ImageSource, *, + haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + canny: Sequence[int] = _DEFAULT_CANNY) -> float: + """Return the mean edge-to-edge distance at the best alignment (0 = perfect).""" + import cv2 + 
score_map, _ = _chamfer_score_map(_to_gray(template), + _haystack_gray(haystack, region), canny) + if score_map is None: + return float("inf") + _, max_val, _, _ = cv2.minMaxLoc(score_map) + return round(1.0 / max_val - 1.0, 4) if max_val > 0 else float("inf") diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py index e67035c9..b49081c6 100644 --- a/je_auto_control/utils/executor/action_executor.py +++ b/je_auto_control/utils/executor/action_executor.py @@ -3323,6 +3323,70 @@ def _match_rotated_all(template: str, min_score: Any = 0.8, scales: Any = None, return {"count": len(matches), "matches": [m.to_dict() for m in matches]} +def _match_with_trust(template: str, min_score: Any = 0.0, scales: Any = None, + ambiguous_ratio: Any = 0.9, region: Any = None, + method: str = "ccoeff_normed") -> Dict[str, Any]: + """Adapter: best template match with trust metrics (ambiguity / PSR).""" + import json + from je_auto_control.utils.match_trust import match_with_trust + if isinstance(region, str): + region = json.loads(region) if region.strip() else None + match = match_with_trust(template, region=region, + scales=_seq_arg(scales, (1.0,)), + method=method, min_score=float(min_score), + ambiguous_ratio=float(ambiguous_ratio)) + return {"found": match is not None, + "match": match.to_dict() if match else None} + + +def _auto_threshold(template: str, region: Any = None, + method: str = "ccoeff_normed") -> Dict[str, Any]: + """Adapter: Otsu-derived accept threshold for a template (+ separability).""" + import json + from je_auto_control.utils.match_autothresh import auto_threshold + if isinstance(region, str): + region = json.loads(region) if region.strip() else None + info = auto_threshold(template, region=region, method=method) + return {"found": info is not None, "info": info} + + +def _match_auto(template: str, floor: Any = 0.5, max_results: Any = 20, + region: Any = None, method: str = "ccoeff_normed") -> Dict[str, 
Any]: + """Adapter: matches above the auto-derived (Otsu) threshold, one per region.""" + import json + from je_auto_control.utils.match_autothresh import match_auto + if isinstance(region, str): + region = json.loads(region) if region.strip() else None + matches = match_auto(template, region=region, floor=float(floor), + max_results=int(max_results), method=method) + return {"count": len(matches), "matches": [m.to_dict() for m in matches]} + + +def _edge_match(template: str, min_score: Any = 0.7, scales: Any = None, + region: Any = None) -> Dict[str, Any]: + """Adapter: best edge-shape (Chamfer) template match on the screen.""" + import json + from je_auto_control.utils.edge_match import edge_match + if isinstance(region, str): + region = json.loads(region) if region.strip() else None + match = edge_match(template, region=region, + scales=_seq_arg(scales, (1.0,)), min_score=float(min_score)) + return {"found": match is not None, + "match": match.to_dict() if match else None} + + +def _edge_match_all(template: str, min_score: Any = 0.7, max_results: Any = 20, + nms_iou: Any = 0.3, region: Any = None) -> Dict[str, Any]: + """Adapter: every edge-shape (Chamfer) match on the screen (NMS).""" + import json + from je_auto_control.utils.edge_match import edge_match_all + if isinstance(region, str): + region = json.loads(region) if region.strip() else None + matches = edge_match_all(template, region=region, min_score=float(min_score), + max_results=int(max_results), nms_iou=float(nms_iou)) + return {"count": len(matches), "matches": [m.to_dict() for m in matches]} + + def _region_arg(value: Any) -> Optional[List[int]]: """Coerce a JSON-string / list region arg into a list of ints, or None.""" import json @@ -3357,6 +3421,70 @@ def _point_for_cell(label: str, rows: Any, cols: Any, return {"point": point} +def _populate_table(grid: Any, text_boxes: Any, overlap: Any = 0.4) -> Dict[str, Any]: + """Adapter: fill a ruling-line grid with OCR text boxes → addressable table.""" + 
import json + from je_auto_control.utils.table_grid_fill import populate_table + if isinstance(grid, str): + grid = json.loads(grid) + if isinstance(text_boxes, str): + text_boxes = json.loads(text_boxes) + return populate_table(grid, text_boxes, overlap=float(overlap)) + + +def _column_gutters(boxes: Any, page_width: Any = None, + min_gap: Any = 8) -> Dict[str, Any]: + """Adapter: interior whitespace column gutters from OCR boxes.""" + import json + from je_auto_control.utils.column_layout import column_gutters + if isinstance(boxes, str): + boxes = json.loads(boxes) + gutters = column_gutters(boxes, page_width=int(page_width) if page_width + else None, min_gap=int(min_gap)) + return {"count": len(gutters), "gutters": gutters} + + +def _detect_borderless_table(boxes: Any, page_width: Any = None, min_gap: Any = 8, + min_cols: Any = 2, min_rows: Any = 2) -> Dict[str, Any]: + """Adapter: infer a borderless table from OCR boxes via whitespace columns.""" + import json + from je_auto_control.utils.column_layout import detect_borderless_table + if isinstance(boxes, str): + boxes = json.loads(boxes) + table = detect_borderless_table(boxes, + page_width=int(page_width) if page_width else None, + min_gap=int(min_gap), min_cols=int(min_cols), + min_rows=int(min_rows)) + return {"found": table is not None, "table": table} + + +def _associate_fields(text_boxes: Any, directions: Any = None, + max_gap: Any = 150) -> Dict[str, Any]: + """Adapter: pair form labels with their nearest aligned value boxes.""" + import json + from je_auto_control.utils.form_fields import associate_fields + if isinstance(text_boxes, str): + text_boxes = json.loads(text_boxes) + if isinstance(directions, str): + directions = json.loads(directions) if directions.strip() else None + fields = associate_fields(text_boxes, + directions=tuple(directions) if directions + else ("right", "below"), max_gap=int(max_gap)) + return {"count": len(fields), "fields": fields} + + +def _match_labels_to_widgets(labels: 
Any, widgets: Any) -> Dict[str, Any]: + """Adapter: match each widget (checkbox / radio / input) to its nearest label.""" + import json + from je_auto_control.utils.form_fields import match_labels_to_widgets + if isinstance(labels, str): + labels = json.loads(labels) + if isinstance(widgets, str): + widgets = json.loads(widgets) + pairs = match_labels_to_widgets(labels, widgets) + return {"count": len(pairs), "pairs": pairs} + + def _find_color_region(rgb: Any, tolerance: Any = 20, min_area: Any = 50, region: Any = None) -> Dict[str, Any]: """Adapter: locate coloured regions on the screen, largest first.""" @@ -3953,6 +4081,80 @@ def _observation_index(elements: Any, viewport: Any = None, return {"count": len(indexed), "elements": indexed} +def _delta_observation(prev: Any, curr: Any, viewport: Any = None, + max_elements: Any = 80, max_lines: Any = 40, + interactive_only: Any = True) -> Dict[str, Any]: + """Adapter: token-budgeted "what changed" delta between two element frames.""" + import json + from je_auto_control.utils.observation_delta import (delta_index, + delta_observation) + if isinstance(prev, str): + prev = json.loads(prev) + if isinstance(curr, str): + curr = json.loads(curr) + if isinstance(viewport, str): + viewport = json.loads(viewport) if viewport.strip() else None + text = delta_observation(list(prev), list(curr), viewport=viewport, + max_elements=int(max_elements), + interactive_only=bool(interactive_only), + max_lines=int(max_lines)) + delta = delta_index(list(prev), list(curr)) + return {"summary": text, "added": len(delta["added"]), + "removed": len(delta["removed"]), "changed": len(delta["changed"])} + + +def _classify_effect(before: Any, after: Any, action: Any, + radius: Any = 64) -> Dict[str, Any]: + """Adapter: classify whether an action changed the screen (target-local).""" + import json + from je_auto_control.utils.action_effect import classify_effect + if isinstance(before, str): + before = json.loads(before) + if isinstance(after, 
str): + after = json.loads(after) + if isinstance(action, str): + action = json.loads(action) + return classify_effect(before, after, action, radius=int(radius)).to_dict() + + +def _effect_near_point(before: Any, after: Any, point: Any, + radius: Any = 64) -> Dict[str, Any]: + """Adapter: did any before/after change land within radius of a point.""" + import json + from je_auto_control.utils.action_effect import effect_near_point + if isinstance(before, str): + before = json.loads(before) + if isinstance(after, str): + after = json.loads(after) + if isinstance(point, str): + point = json.loads(point) + return {"near": effect_near_point(before, after, point, radius=int(radius))} + + +def _check_postcondition(after: Any, spec: Any, before: Any = None) -> Dict[str, Any]: + """Adapter: evaluate a declarative postcondition spec against after/before frames.""" + import json + from je_auto_control.utils.postcondition import check_postcondition + if isinstance(after, str): + after = json.loads(after) + if isinstance(spec, str): + spec = json.loads(spec) + if isinstance(before, str): + before = json.loads(before) if before.strip() else None + return check_postcondition(after, spec, before=before).to_dict() + + +def _plan_repair(verdict: Any, max_attempts: Any = 3) -> Dict[str, Any]: + """Adapter: ordered repair tactics for an effect verdict (no_op / changed_…).""" + import json + from je_auto_control.utils.step_repair import RepairPolicy, plan_repair + if isinstance(verdict, str) and verdict.strip().startswith("{"): + verdict = json.loads(verdict) + tactics = plan_repair(verdict, + policy=RepairPolicy(max_attempts=int(max_attempts))) + return {"count": len(tactics), "tactics": tactics} + + def _validate_action(action: Any, screen: Any = None, targets: Any = None) -> Dict[str, Any]: """Adapter: validate a coordinate action (bounds + optional snap-to-target).""" @@ -5779,9 +5981,19 @@ def __init__(self): "AC_match_masked_all": _match_masked_all, "AC_match_rotated": 
_match_rotated, "AC_match_rotated_all": _match_rotated_all, + "AC_match_with_trust": _match_with_trust, + "AC_auto_threshold": _auto_threshold, + "AC_match_auto": _match_auto, + "AC_edge_match": _edge_match, + "AC_edge_match_all": _edge_match_all, "AC_grid_cells": _grid_cells, "AC_cell_for_point": _cell_for_point, "AC_point_for_cell": _point_for_cell, + "AC_populate_table": _populate_table, + "AC_column_gutters": _column_gutters, + "AC_detect_borderless_table": _detect_borderless_table, + "AC_associate_fields": _associate_fields, + "AC_match_labels_to_widgets": _match_labels_to_widgets, "AC_ssim_compare": _ssim_compare, "AC_ssim_changed_regions": _ssim_changed_regions, "AC_feature_match": _feature_match, @@ -5820,6 +6032,11 @@ def __init__(self): "AC_cua_command": _cua_command, "AC_serialize_observation": _serialize_observation, "AC_observation_index": _observation_index, + "AC_delta_observation": _delta_observation, + "AC_classify_effect": _classify_effect, + "AC_effect_near_point": _effect_near_point, + "AC_check_postcondition": _check_postcondition, + "AC_plan_repair": _plan_repair, "AC_validate_action": _validate_action, "AC_replay_trace": _replay_trace, "AC_match_elements": _match_elements, diff --git a/je_auto_control/utils/form_fields/__init__.py b/je_auto_control/utils/form_fields/__init__.py new file mode 100644 index 00000000..6a3b3274 --- /dev/null +++ b/je_auto_control/utils/form_fields/__init__.py @@ -0,0 +1,6 @@ +"""Associate form labels with values (multi-direction) and read checkbox state.""" +from je_auto_control.utils.form_fields.form_fields import ( + associate_fields, checkbox_state, match_labels_to_widgets, +) + +__all__ = ["associate_fields", "match_labels_to_widgets", "checkbox_state"] diff --git a/je_auto_control/utils/form_fields/form_fields.py b/je_auto_control/utils/form_fields/form_fields.py new file mode 100644 index 00000000..5fb646df --- /dev/null +++ b/je_auto_control/utils/form_fields/form_fields.py @@ -0,0 +1,122 @@ +"""Associate 
form labels with their values and read checkbox state. + +``ocr/structure`` recognises a label only if its text ends in ``:`` and pairs it with the +*immediately next* cell — it cannot handle a label sitting *above* its value, a two-column +key/value layout, right-aligned values, or any widget that isn't a text cell, and it has no +notion of checkbox / radio state at all. ``form_fields`` generalises this: it pairs each +label with the nearest aligned value in any of several *directions* (right, below), matches +free-standing widgets (checkboxes, radios, inputs) to their nearest label, and reads a +checkbox's checked state from its fill ratio. + +The association is pure-stdlib over plain box dicts (fully unit-testable, no image); only +``checkbox_state`` touches pixels, isolated behind the shared ``visual_match`` gray loader so +tests can pass a synthetic array. Reuses ``table_grid_fill``'s box-bounds reader. Imports no +``PySide6``. +""" +from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple + +from je_auto_control.utils.table_grid_fill.table_grid_fill import _box_bounds + +Box = Dict[str, Any] + + +def _overlap_1d(a0: int, a1: int, b0: int, b1: int) -> int: + return max(0, min(a1, b1) - max(a0, b0)) + + +def _right_value(label: Box, values: Sequence[Box], max_gap: int): + """Nearest value to the right of ``label`` that shares a row, or ``None``.""" + left, top, right, bottom = _box_bounds(label) + best: Optional[Tuple[Box, int]] = None + for value in values: + vl, vt, _, vb = _box_bounds(value) + gap = vl - right + if vl >= right and _overlap_1d(top, bottom, vt, vb) > 0 and 0 <= gap <= max_gap \ + and (best is None or gap < best[1]): + best = (value, gap) + return best + + +def _below_value(label: Box, values: Sequence[Box], max_gap: int): + """Nearest value below ``label`` that shares a column, or ``None``.""" + left, top, right, bottom = _box_bounds(label) + best: Optional[Tuple[Box, int]] = None + for value in values: + vl, vt, vr, _ = 
_box_bounds(value) + gap = vt - bottom + if vt >= bottom and _overlap_1d(left, right, vl, vr) > 0 and 0 <= gap <= max_gap \ + and (best is None or gap < best[1]): + best = (value, gap) + return best + + +_DIRECTIONS: Dict[str, Callable[[Box, Sequence[Box], int], Any]] = { + "right": _right_value, "below": _below_value} + + +def _clean_label(box: Box) -> str: + return str(box.get("text", "")).strip().rstrip(":").strip() + + +def _best_value(label: Box, values: Sequence[Box], directions: Sequence[str], + max_gap: int): + """Pick the nearest value across the requested directions, or ``None``.""" + best = None + for direction in directions: + candidate = _DIRECTIONS[direction](label, values, max_gap) + if candidate is not None and (best is None or candidate[1] < best[1]): + best = (candidate[0], candidate[1], direction) + return best + + +def associate_fields(text_boxes: Sequence[Box], *, + directions: Sequence[str] = ("right", "below"), + max_gap: int = 150) -> List[Dict[str, Any]]: + """Pair each ``label:`` box with its nearest aligned value box. + + Labels are boxes whose text ends in ``:``; every other box is a candidate value. + Returns ``{label, value, direction, gap, label_box, value_box}`` per matched label. 
+ """ + labels = [b for b in text_boxes if str(b.get("text", "")).strip().endswith(":")] + label_ids = {id(b) for b in labels} + values = [b for b in text_boxes if id(b) not in label_ids] + fields: List[Dict[str, Any]] = [] + for label in labels: + best = _best_value(label, values, directions, int(max_gap)) + if best is not None: + fields.append({"label": _clean_label(label), + "value": str(best[0].get("text", "")), + "direction": best[2], "gap": best[1], + "label_box": label, "value_box": best[0]}) + return fields + + +def match_labels_to_widgets(labels: Sequence[Box], + widgets: Sequence[Box]) -> List[Dict[str, Any]]: + """Match each widget (checkbox / radio / input) to its nearest label by centre.""" + pairs: List[Dict[str, Any]] = [] + for widget in widgets: + wl, wt, wr, wb = _box_bounds(widget) + wcx, wcy = (wl + wr) // 2, (wt + wb) // 2 + best: Optional[Tuple[Box, int]] = None + for label in labels: + ll, lt, lr, lb = _box_bounds(label) + dist = abs(wcx - (ll + lr) // 2) + abs(wcy - (lt + lb) // 2) + if best is None or dist < best[1]: + best = (label, dist) + if best is not None: + pairs.append({"widget": widget, "label": str(best[0].get("text", "")), + "distance": best[1]}) + return pairs + + +def checkbox_state(image: Any, box: Box, *, fill_threshold: float = 0.15) -> str: + """Return ``"checked"`` / ``"unchecked"`` from the dark-pixel fill ratio of ``box``.""" + import numpy as np + from je_auto_control.utils.visual_match.visual_match import _to_gray + left, top, right, bottom = _box_bounds(box) + patch = _to_gray(image)[top:bottom, left:right] + if patch.size == 0: + return "unchecked" + filled = float(np.count_nonzero(patch < 128)) / patch.size + return "checked" if filled >= float(fill_threshold) else "unchecked" diff --git a/je_auto_control/utils/match_autothresh/__init__.py b/je_auto_control/utils/match_autothresh/__init__.py new file mode 100644 index 00000000..312f21d1 --- /dev/null +++ b/je_auto_control/utils/match_autothresh/__init__.py @@ -0,0 
+1,6 @@ +"""Otsu auto-thresholding for template matching (no hand-tuned min_score).""" +from je_auto_control.utils.match_autothresh.match_autothresh import ( + auto_threshold, match_auto, +) + +__all__ = ["auto_threshold", "match_auto"] diff --git a/je_auto_control/utils/match_autothresh/match_autothresh.py b/je_auto_control/utils/match_autothresh/match_autothresh.py new file mode 100644 index 00000000..60b2a952 --- /dev/null +++ b/je_auto_control/utils/match_autothresh/match_autothresh.py @@ -0,0 +1,99 @@ +"""Auto-derive a template-match threshold from the score map (Otsu). + +Every call to ``match_template_all`` forces the caller to guess ``min_score``: too low +floods NMS with background noise, too high drops scaled / re-themed targets, and the right +value differs per asset and per screen. ``match_autothresh`` removes the magic number — it +runs Otsu's method on the *correlation score histogram* (not pixel intensities, the way +``preprocess.binarize`` does) to find the valley between the "background correlation" mass +and the "real match" mass, and returns that cut-off plus a *separability* number so the +caller knows when the histogram is unimodal (no clear match → don't trust the threshold). + +It reuses ``visual_match._score_map`` — the full ``matchTemplate`` surface the public +matchers discard — plus the shared ``Match`` / ``_nms``. The ``haystack`` is injectable +(ndarray / path / PIL); the analysis is unit-testable on synthetic arrays. OpenCV + NumPy +are imported lazily. Imports no ``PySide6``. 
+""" +from typing import Any, Dict, List, Optional, Sequence + +from je_auto_control.utils.visual_match.visual_match import Match, _score_map + +ImageSource = Any + + +def _separability(scaled, threshold: float) -> float: + """Otsu separability eta = between-class variance / total variance (0..1).""" + total_var = float(scaled.var()) + if total_var < 1e-9: + return 0.0 + below, above = scaled[scaled <= threshold], scaled[scaled > threshold] + if below.size == 0 or above.size == 0: + return 0.0 + weight0, weight1 = below.size / scaled.size, above.size / scaled.size + between = weight0 * weight1 * (float(below.mean()) - float(above.mean())) ** 2 + return max(0.0, min(1.0, between / total_var)) + + +def _otsu_on_scores(score_map): + """Return ``(threshold_in_score_units, separability)`` for a correlation surface.""" + import cv2 + import numpy as np + low, high = float(score_map.min()), float(score_map.max()) + if high - low < 1e-9: + return low, 0.0 + scaled = ((score_map - low) / (high - low) * 255.0).astype(np.uint8) + level, _ = cv2.threshold(scaled, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) + threshold = low + (float(level) / 255.0) * (high - low) + return threshold, _separability(scaled, float(level)) + + +def auto_threshold(template: ImageSource, *, haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + method: str = "ccoeff_normed") -> Optional[Dict[str, Any]]: + """Return an Otsu-derived accept cut-off for ``template``, or ``None``. + + ``{threshold, separability, n_above}`` — ``separability`` near 0 means the score + histogram is unimodal (no clear match) and the threshold should not be trusted. 
+ """ + import numpy as np + score_map, _ = _score_map(template, haystack, region=region, method=method) + if score_map is None: + return None + threshold, separability = _otsu_on_scores(score_map) + return {"threshold": round(threshold, 4), + "separability": round(separability, 4), + "n_above": int(np.count_nonzero(score_map >= threshold))} + + +def _peak_in_box(score_map, box: Dict[str, Any]): + """Return ``(x, y, score)`` of the highest score inside one connected blob box.""" + import numpy as np + x, y, width, height = box["x"], box["y"], box["width"], box["height"] + sub = score_map[y:y + height, x:x + width] + iy, ix = np.unravel_index(int(np.argmax(sub)), sub.shape) + return x + int(ix), y + int(iy), float(sub[iy, ix]) + + +def match_auto(template: ImageSource, *, haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, floor: float = 0.5, + method: str = "ccoeff_normed", max_results: int = 20) -> List[Match]: + """Return one match per above-threshold region (auto cut-off clamped by ``floor``). + + The cut-off is ``max(floor, otsu_threshold)`` so a unimodal / noisy surface cannot + drag the threshold below a sane floor. Each connected above-threshold region yields + a single peak (its highest score), avoiding the duplicate hits a raw pixel scan + + NMS leaves on a wide correlation peak. Ordered by score, capped at ``max_results``. 
+ """ + import numpy as np + from je_auto_control.utils.cv2_utils.blobs import connected_boxes + score_map, tmpl = _score_map(template, haystack, region=region, method=method) + if score_map is None: + return [] + threshold, _ = _otsu_on_scores(score_map) + cutoff = max(float(floor), threshold) + mask = (score_map >= cutoff).astype(np.uint8) + height, width = tmpl.shape[:2] + matches = [Match(px, py, width, height, round(score, 4), 1.0) + for px, py, score in + (_peak_in_box(score_map, box) for box in connected_boxes(mask))] + matches.sort(key=lambda m: m.score, reverse=True) + return matches[:int(max_results)] diff --git a/je_auto_control/utils/match_trust/__init__.py b/je_auto_control/utils/match_trust/__init__.py new file mode 100644 index 00000000..413a5802 --- /dev/null +++ b/je_auto_control/utils/match_trust/__init__.py @@ -0,0 +1,6 @@ +"""Template-match trustworthiness scoring (second-peak ratio + peak-to-sidelobe).""" +from je_auto_control.utils.match_trust.match_trust import ( + TrustedMatch, match_with_trust, score_peaks, +) + +__all__ = ["TrustedMatch", "match_with_trust", "score_peaks"] diff --git a/je_auto_control/utils/match_trust/match_trust.py b/je_auto_control/utils/match_trust/match_trust.py new file mode 100644 index 00000000..8991288e --- /dev/null +++ b/je_auto_control/utils/match_trust/match_trust.py @@ -0,0 +1,130 @@ +"""Trustworthiness scoring for template matches (second-peak ratio + PSR). + +``visual_match.match_template`` returns the single best score and happily clicks it — +but a control repeated in a toolbar, or a near-identical sibling, correlates ~0.95 in +*two* places, so a high score does not mean an *unambiguous* match. 
This adds a Lowe-style +ratio test *for pixel templates* (``feature_match`` already does it for ORB keypoints, but +nothing did it for ``match_template``): it inspects the whole correlation surface, compares +the global peak against the next-best peak outside an exclusion window, and computes the +peak-to-sidelobe ratio (PSR), flagging matches that are strong-but-ambiguous. + +It reuses ``visual_match._score_map`` (the full ``matchTemplate`` surface the public matchers +discard) plus the shared gray loaders, so no matching code is duplicated. The ``haystack`` is +injectable (ndarray / path / PIL); the search is unit-testable on synthetic arrays. OpenCV + +NumPy are imported lazily. Imports no ``PySide6``. +""" +from dataclasses import asdict, dataclass +from typing import Any, Dict, List, Optional, Sequence + +from je_auto_control.utils.visual_match.visual_match import _score_map + +ImageSource = Any + + +@dataclass(frozen=True) +class TrustedMatch: + """A match plus its trust metrics: second-best score, peak ratio, PSR, ambiguity.""" + + x: int + y: int + width: int + height: int + score: float + scale: float + second_score: float + peak_ratio: float + psr: Optional[float] + is_ambiguous: bool + + @property + def center(self) -> List[int]: + """The match's centre point ``[x, y]`` (ready to click).""" + return [self.x + self.width // 2, self.y + self.height // 2] + + def to_dict(self) -> Dict[str, Any]: + """Return the match as a plain dict including the centre point.""" + data = asdict(self) + data["center"] = self.center + return data + + +def _safe_psr(value: float) -> Optional[float]: + """Round the PSR, or ``None`` when the sidelobe has no variance (a perfect peak).""" + import math + return round(value, 4) if math.isfinite(value) else None + + +def _peak_stats(score_map, exclude_radius: int): + """Return ``(loc, best, second, peak_ratio, psr)`` for one correlation surface.""" + import numpy as np + height, width = score_map.shape + best_y, best_x = 
divmod(int(np.argmax(score_map)), width) + best = float(score_map[best_y, best_x]) + mask = np.ones(score_map.shape, dtype=bool) + y0, y1 = max(0, best_y - exclude_radius), min(height, best_y + exclude_radius + 1) + x0, x1 = max(0, best_x - exclude_radius), min(width, best_x + exclude_radius + 1) + mask[y0:y1, x0:x1] = False + sidelobe = score_map[mask] + if sidelobe.size: + second, mean, std = (float(sidelobe.max()), float(sidelobe.mean()), + float(sidelobe.std())) + else: + second, mean, std = 0.0, 0.0, 0.0 + peak_ratio = second / best if abs(best) > 1e-9 else 1.0 + psr = (best - mean) / std if std > 1e-9 else float("inf") + return (best_x, best_y), best, second, peak_ratio, psr + + +def _default_radius(template_shape, exclude_radius: Optional[int]) -> int: + """Peak-exclusion radius: the caller's value, else 1/4 of the smaller side (min 3).""" + if exclude_radius: + return int(exclude_radius) + return max(3, min(template_shape[:2]) // 4) + + +def score_peaks(template: ImageSource, *, haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + exclude_radius: Optional[int] = None, method: str = "ccoeff_normed", + ambiguous_ratio: float = 0.9) -> Optional[Dict[str, Any]]: + """Return peak/sidelobe trust metrics for ``template`` at scale 1.0, or ``None``. + + ``{best, second, peak_ratio, psr, ambiguous, location}`` — ``peak_ratio`` near 1 + means a second place scored almost as high (ambiguous); ``psr`` is the + peak-to-sidelobe ratio (``None`` when the sidelobe is flat).
+ """ + score_map, tmpl = _score_map(template, haystack, region=region, method=method) + if score_map is None: + return None + radius = _default_radius(tmpl.shape, exclude_radius) + (peak_x, peak_y), best, second, ratio, psr = _peak_stats(score_map, radius) + return {"best": round(best, 4), "second": round(second, 4), + "peak_ratio": round(ratio, 4), "psr": _safe_psr(psr), + "ambiguous": ratio >= ambiguous_ratio, "location": [peak_x, peak_y]} + + +def match_with_trust(template: ImageSource, *, + haystack: Optional[ImageSource] = None, + region: Optional[Sequence[int]] = None, + scales: Sequence[float] = (1.0,), method: str = "ccoeff_normed", + min_score: float = 0.0, ambiguous_ratio: float = 0.9, + exclude_radius: Optional[int] = None) -> Optional[TrustedMatch]: + """Return the best match (over ``scales``) with trust metrics attached, or ``None``. + + ``is_ambiguous`` is set when the next-best peak scores at least ``ambiguous_ratio`` + times the best — a strong but untrustworthy match the caller should not blindly click. 
+ """ + best_match: Optional[TrustedMatch] = None + for scale in scales: + score_map, tmpl = _score_map(template, haystack, region=region, + method=method, scale=float(scale)) + if score_map is None: + continue + radius = _default_radius(tmpl.shape, exclude_radius) + (peak_x, peak_y), best, second, ratio, psr = _peak_stats(score_map, radius) + if best < min_score or (best_match is not None and best <= best_match.score): + continue + best_match = TrustedMatch(int(peak_x), int(peak_y), tmpl.shape[1], + tmpl.shape[0], round(best, 4), float(scale), + round(second, 4), round(ratio, 4), _safe_psr(psr), + ratio >= ambiguous_ratio) + return best_match diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py index 315ae3e9..e76c5451 100644 --- a/je_auto_control/utils/mcp_server/tools/_factories.py +++ b/je_auto_control/utils/mcp_server/tools/_factories.py @@ -3331,6 +3331,84 @@ def observation_tools() -> List[MCPTool]: handler=h.observation_index, annotations=READ_ONLY, ), + MCPTool( + name="ac_delta_observation", + description=("Token-budgeted 'what changed since last step': diff 'prev' " + "vs 'curr' element lists and render ONLY the churn — " + "'+ [i] role \"name\"' (appeared) / '~ [i] …' (changed) / " + "'- …' (vanished), stable omitted, capped at 'max_lines'. " + "Returns {summary, added, removed, changed}. Feed this each " + "turn instead of the whole screen."), + input_schema=schema({ + "prev": {"type": "array", "items": {"type": "object"}}, + "curr": {"type": "array", "items": {"type": "object"}}, + "viewport": {"type": "array", "items": {"type": "integer"}}, + "max_elements": {"type": "integer"}, + "max_lines": {"type": "integer"}, + "interactive_only": {"type": "boolean"}}, + required=["prev", "curr"]), + handler=h.delta_observation, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_classify_effect", + description=("Did my action do anything? 
Diff 'before' vs 'after' element " + "lists and classify the result given the 'action' (with x,y): " + "no_op / changed_near_target / changed_elsewhere / changed. " + "Returns {effect, changed_near_target, changed_count, " + "changed_centers, reason}. 'radius' for target attribution."), + input_schema=schema({ + "before": {"type": "array", "items": {"type": "object"}}, + "after": {"type": "array", "items": {"type": "object"}}, + "action": {"type": "object"}, + "radius": {"type": "integer"}}, + required=["before", "after", "action"]), + handler=h.classify_effect, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_effect_near_point", + description=("Did any change between 'before' and 'after' land within " + "'radius' of 'point' [x,y]? Returns {near}."), + input_schema=schema({ + "before": {"type": "array", "items": {"type": "object"}}, + "after": {"type": "array", "items": {"type": "object"}}, + "point": {"type": "array", "items": {"type": "integer"}}, + "radius": {"type": "integer"}}, + required=["before", "after", "point"]), + handler=h.effect_near_point, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_check_postcondition", + description=("Check a declarative postcondition 'spec' against the " + "'after' element list (optionally diffed vs 'before'). " + "Clauses: appears / disappears / enabled / disabled / " + "text_present / text_absent / count. Returns {ok, clauses:" + "[{type,ok,detail}], failed}. e.g. 
spec {\"appears\": " + "{\"role\":\"dialog\"}, \"disabled\": {\"name\":\"Submit\"}}."), + input_schema=schema({ + "after": {"type": "array", "items": {"type": "object"}}, + "spec": {"type": "object"}, + "before": {"type": "array", "items": {"type": "object"}}}, + required=["after", "spec"]), + handler=h.check_postcondition, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_plan_repair", + description=("Given an effect 'verdict' (string like 'no_op' / " + "'changed_elsewhere', or an EffectVerdict dict), return the " + "ordered repair tactics to try — wait_retry / relocate / " + "nudge / scroll_into_view / escalate — capped at " + "'max_attempts'. Returns {count, tactics}."), + input_schema=schema({ + "verdict": {"type": "string"}, + "max_attempts": {"type": "integer"}}, + required=["verdict"]), + handler=h.plan_repair, + annotations=READ_ONLY, + ), ] @@ -3600,6 +3678,87 @@ def rotated_match_tools() -> List[MCPTool]: handler=h.match_rotated_all, annotations=READ_ONLY, ), + MCPTool( + name="ac_match_with_trust", + description=("Find 'template' AND judge whether the match is trustworthy " + "vs ambiguous: returns {found, match:{...,score,second_score," + "peak_ratio,psr,is_ambiguous,center}}. is_ambiguous=true means " + "a second place scored ~as high (e.g. a duplicate toolbar " + "button) - do NOT blindly click. 'ambiguous_ratio' (default " + "0.9), 'min_score', 'scales', 'region', 'method'."), + input_schema=schema({ + "template": {"type": "string"}, + "min_score": {"type": "number"}, + "scales": {"type": "array", "items": {"type": "number"}}, + "ambiguous_ratio": {"type": "number"}, + "region": {"type": "array", "items": {"type": "integer"}}, + "method": {"type": "string"}}, + required=["template"]), + handler=h.match_with_trust, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_auto_threshold", + description=("Derive an accept threshold for 'template' by Otsu on the " + "match score map (no hand-tuned min_score). 
Returns " + "{found, info:{threshold, separability, n_above}}. " + "separability near 0 = unimodal (no clear match) - do NOT " + "trust the threshold. 'region', 'method'."), + input_schema=schema({ + "template": {"type": "string"}, + "region": {"type": "array", "items": {"type": "integer"}}, + "method": {"type": "string"}}, + required=["template"]), + handler=h.auto_threshold, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_match_auto", + description=("Find every occurrence of 'template' above an AUTO-derived " + "(Otsu) threshold - one peak per region, no min_score to " + "tune. 'floor' (default 0.5) clamps the threshold so a noisy " + "surface can't match junk. Returns {count, matches}."), + input_schema=schema({ + "template": {"type": "string"}, + "floor": {"type": "number"}, + "max_results": {"type": "integer"}, + "region": {"type": "array", "items": {"type": "integer"}}, + "method": {"type": "string"}}, + required=["template"]), + handler=h.match_auto, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_edge_match", + description=("Find 'template' by its EDGE SHAPE (Chamfer / distance " + "transform), robust to fill / gradient / theme / " + "anti-aliasing where intensity NCC fails and to flat icons " + "ORB can't key on. Returns {found, match}. 'min_score', " + "'scales', 'region'."), + input_schema=schema({ + "template": {"type": "string"}, + "min_score": {"type": "number"}, + "scales": {"type": "array", "items": {"type": "number"}}, + "region": {"type": "array", "items": {"type": "integer"}}}, + required=["template"]), + handler=h.edge_match, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_edge_match_all", + description=("Find EVERY edge-shape (Chamfer) match of 'template' >= " + "'min_score', overlaps removed by NMS. 
Returns " + "{count, matches}."), + input_schema=schema({ + "template": {"type": "string"}, + "min_score": {"type": "number"}, + "max_results": {"type": "integer"}, + "nms_iou": {"type": "number"}, + "region": {"type": "array", "items": {"type": "integer"}}}, + required=["template"]), + handler=h.edge_match_all, + annotations=READ_ONLY, + ), ] @@ -3646,6 +3805,79 @@ def screen_grid_tools() -> List[MCPTool]: handler=h.point_for_cell, annotations=READ_ONLY, ), + MCPTool( + name="ac_populate_table", + description=("Fill a ruling-line 'grid' (from find_grid: {rows:[y..]," + "cols:[x..]}) with OCR 'text_boxes' ([{x,y,width,height," + "text}]) into an addressable table. Returns {n_rows, n_cols, " + "cells:[{row,col,text}], spans:[merged-cell candidates]}. " + "'overlap' (default 0.4) gates a box straddling a rule."), + input_schema=schema({ + "grid": {"type": "object"}, + "text_boxes": {"type": "array", "items": {"type": "object"}}, + "overlap": {"type": "number"}}, + required=["grid", "text_boxes"]), + handler=h.populate_table, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_column_gutters", + description=("Find borderless-table COLUMNS by whitespace: project OCR " + "'boxes' ([{x,y,width,height,text}]) onto the x-axis and " + "return the interior empty vertical bands >= 'min_gap' wide. " + "Returns {count, gutters:[{start,end,width}]}."), + input_schema=schema({ + "boxes": {"type": "array", "items": {"type": "object"}}, + "page_width": {"type": "integer"}, + "min_gap": {"type": "integer"}}, + required=["boxes"]), + handler=h.column_gutters, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_detect_borderless_table", + description=("Infer a BORDERLESS table from OCR 'boxes' (no ruling lines): " + "columns from whitespace gutters, rows from vertical spacing. " + "Returns {found, table:{n_rows,n_cols,rows:[[text]],columns}}. " + "Use when find_grid finds no lines. 
'min_gap', 'min_cols', " + "'min_rows'."), + input_schema=schema({ + "boxes": {"type": "array", "items": {"type": "object"}}, + "page_width": {"type": "integer"}, + "min_gap": {"type": "integer"}, + "min_cols": {"type": "integer"}, + "min_rows": {"type": "integer"}}, + required=["boxes"]), + handler=h.detect_borderless_table, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_associate_fields", + description=("Pair form labels with their values: for each 'text_boxes' " + "entry whose text ends in ':', find the nearest aligned value " + "box in 'directions' (right / below). Returns {count, fields:" + "[{label,value,direction,gap,label_box,value_box}]}. " + "'max_gap' caps the distance."), + input_schema=schema({ + "text_boxes": {"type": "array", "items": {"type": "object"}}, + "directions": {"type": "array", "items": {"type": "string"}}, + "max_gap": {"type": "integer"}}, + required=["text_boxes"]), + handler=h.associate_fields, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_match_labels_to_widgets", + description=("Match each widget (checkbox / radio / input box) to its " + "nearest 'labels' entry by centre distance. 
Returns {count, " + "pairs:[{widget,label,distance}]}."), + input_schema=schema({ + "labels": {"type": "array", "items": {"type": "object"}}, + "widgets": {"type": "array", "items": {"type": "object"}}}, + required=["labels", "widgets"]), + handler=h.match_labels_to_widgets, + annotations=READ_ONLY, + ), ] diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py index 6242ff03..bec7c7fd 100644 --- a/je_auto_control/utils/mcp_server/tools/_handlers.py +++ b/je_auto_control/utils/mcp_server/tools/_handlers.py @@ -2108,6 +2108,35 @@ def match_rotated_all(template, min_score=0.8, scales=None, angles=None, nms_iou, region) +def match_with_trust(template, min_score=0.0, scales=None, ambiguous_ratio=0.9, + region=None, method="ccoeff_normed"): + from je_auto_control.utils.executor.action_executor import _match_with_trust + return _match_with_trust(template, min_score, scales, ambiguous_ratio, + region, method) + + +def auto_threshold(template, region=None, method="ccoeff_normed"): + from je_auto_control.utils.executor.action_executor import _auto_threshold + return _auto_threshold(template, region, method) + + +def match_auto(template, floor=0.5, max_results=20, region=None, + method="ccoeff_normed"): + from je_auto_control.utils.executor.action_executor import _match_auto + return _match_auto(template, floor, max_results, region, method) + + +def edge_match(template, min_score=0.7, scales=None, region=None): + from je_auto_control.utils.executor.action_executor import _edge_match + return _edge_match(template, min_score, scales, region) + + +def edge_match_all(template, min_score=0.7, max_results=20, nms_iou=0.3, + region=None): + from je_auto_control.utils.executor.action_executor import _edge_match_all + return _edge_match_all(template, min_score, max_results, nms_iou, region) + + def grid_cells(rows, cols, region=None): from je_auto_control.utils.executor.action_executor import _grid_cells return 
_grid_cells(rows, cols, region) @@ -2123,6 +2152,34 @@ def point_for_cell(label, rows, cols, region=None): return _point_for_cell(label, rows, cols, region) +def populate_table(grid, text_boxes, overlap=0.4): + from je_auto_control.utils.executor.action_executor import _populate_table + return _populate_table(grid, text_boxes, overlap) + + +def column_gutters(boxes, page_width=None, min_gap=8): + from je_auto_control.utils.executor.action_executor import _column_gutters + return _column_gutters(boxes, page_width, min_gap) + + +def detect_borderless_table(boxes, page_width=None, min_gap=8, min_cols=2, + min_rows=2): + from je_auto_control.utils.executor.action_executor import ( + _detect_borderless_table) + return _detect_borderless_table(boxes, page_width, min_gap, min_cols, min_rows) + + +def associate_fields(text_boxes, directions=None, max_gap=150): + from je_auto_control.utils.executor.action_executor import _associate_fields + return _associate_fields(text_boxes, directions, max_gap) + + +def match_labels_to_widgets(labels, widgets): + from je_auto_control.utils.executor.action_executor import ( + _match_labels_to_widgets) + return _match_labels_to_widgets(labels, widgets) + + def find_color_region(rgb, tolerance=20, min_area=50, region=None): from je_auto_control.utils.executor.action_executor import ( _find_color_region) @@ -2359,6 +2416,33 @@ def observation_index(elements, viewport=None, max_elements=80): return _observation_index(elements, viewport, max_elements) +def delta_observation(prev, curr, viewport=None, max_elements=80, max_lines=40, + interactive_only=True): + from je_auto_control.utils.executor.action_executor import _delta_observation + return _delta_observation(prev, curr, viewport, max_elements, max_lines, + interactive_only) + + +def classify_effect(before, after, action, radius=64): + from je_auto_control.utils.executor.action_executor import _classify_effect + return _classify_effect(before, after, action, radius) + + +def 
effect_near_point(before, after, point, radius=64): + from je_auto_control.utils.executor.action_executor import _effect_near_point + return _effect_near_point(before, after, point, radius) + + +def check_postcondition(after, spec, before=None): + from je_auto_control.utils.executor.action_executor import _check_postcondition + return _check_postcondition(after, spec, before) + + +def plan_repair(verdict, max_attempts=3): + from je_auto_control.utils.executor.action_executor import _plan_repair + return _plan_repair(verdict, max_attempts) + + def validate_action(action, screen=None, targets=None): from je_auto_control.utils.executor.action_executor import _validate_action return _validate_action(action, screen, targets) diff --git a/je_auto_control/utils/observation_delta/__init__.py b/je_auto_control/utils/observation_delta/__init__.py new file mode 100644 index 00000000..c729d898 --- /dev/null +++ b/je_auto_control/utils/observation_delta/__init__.py @@ -0,0 +1,6 @@ +"""Token-budgeted observation delta: what changed between two UI frames.""" +from je_auto_control.utils.observation_delta.observation_delta import ( + delta_index, delta_observation, summarize_delta, +) + +__all__ = ["delta_index", "delta_observation", "summarize_delta"] diff --git a/je_auto_control/utils/observation_delta/observation_delta.py b/je_auto_control/utils/observation_delta/observation_delta.py new file mode 100644 index 00000000..4b23d112 --- /dev/null +++ b/je_auto_control/utils/observation_delta/observation_delta.py @@ -0,0 +1,97 @@ +"""Token-budgeted "what changed since last step" observation delta. + +``observation.serialize_observation`` renders *one full frame* of the UI — feeding it to a +model every turn blows the very token budget that module was built to respect, and forces the +model to re-read the whole screen to spot the one new dialog. 
``element_diff`` gives the +stable-ID correspondence between two frames but stops at matched/added/removed *element pairs* +— it does not render a compact, indexed, budget-capped delta the model can act on. + +This is the missing serializer: it diffs the previous and current observation, classifies each +matched element as *changed* (role / name / enabled / moved) or *stable*, and renders only the +churn as ``+ [i] role "name"`` (appeared) / ``- role "name"`` (vanished) / ``~ [i] role "name"`` +(changed) lines — added and changed first, stable dropped — capped at ``max_lines``. The model +sees "what changed" instead of the whole screen again. + +Pure-stdlib over element dicts; reuses ``element_diff.match_elements`` for the overlap join and +``observation.observation_index`` for reading-order indexing. Imports no ``PySide6``. +""" +from typing import Any, Dict, List, Optional, Sequence + +Element = Dict[str, Any] +_CHANGE_KEYS = ("role", "name", "enabled", "value") + + +def _center(element: Element) -> List[int]: + return [int(element.get("x", 0)) + int(element.get("width", 0)) // 2, + int(element.get("y", 0)) + int(element.get("height", 0)) // 2] + + +def _changed_fields(before: Element, after: Element, move_threshold: int) -> List[str]: + """Return the attribute names that differ between two matched elements.""" + fields = [key for key in _CHANGE_KEYS if before.get(key) != after.get(key)] + bx, by = _center(before) + ax, ay = _center(after) + if abs(ax - bx) > move_threshold or abs(ay - by) > move_threshold: + fields.append("moved") + return fields + + +def delta_index(prev: Sequence[Element], curr: Sequence[Element], *, + iou_threshold: float = 0.5, + move_threshold: int = 5) -> Dict[str, List[Any]]: + """Diff two element lists into ``{added, removed, changed, stable}``. + + ``changed`` items are ``{"after", "fields"}`` (the differing attribute names); + ``added`` / ``removed`` / ``stable`` are element lists. Matching is by IoU overlap. 
+ """ + from je_auto_control.utils.element_diff import match_elements + result = match_elements(list(prev), list(curr), iou_threshold=float(iou_threshold)) + changed, stable = [], [] + for pair in result["matched"]: + fields = _changed_fields(pair["before"], pair["after"], int(move_threshold)) + if fields: + changed.append({"after": pair["after"], "fields": fields}) + else: + stable.append(pair["after"]) + return {"added": result["added"], "removed": result["removed"], + "changed": changed, "stable": stable} + + +def _line(prefix: str, element: Element, suffix: str = "") -> str: + cx, cy = _center(element) + index = element.get("index") + marker = f"[{index}] " if index is not None else "" + return (f'{prefix} {marker}{element.get("role", "element")} ' + f'"{element.get("name", "")}" @({cx},{cy}){suffix}') + + +def summarize_delta(delta: Dict[str, List[Any]], *, max_lines: int = 40) -> str: + """Render a ``delta_index`` result as budget-capped ``+`` / ``~`` / ``-`` lines. + + Added and changed elements come first (most actionable), then removed; stable + elements are omitted. Overflow past ``max_lines`` is summarised as ``… (+N more)``. 
+ """ + lines = [_line("+", element) for element in delta["added"]] + lines += [_line("~", item["after"], f' ({"/".join(item["fields"])})') + for item in delta["changed"]] + lines += [_line("-", element) for element in delta["removed"]] + if len(lines) > max_lines: + hidden = len(lines) - max_lines + lines = lines[:max_lines] + [f"… (+{hidden} more)"] + return "\n".join(lines) + + +def delta_observation(prev: Sequence[Element], curr: Sequence[Element], *, + viewport: Optional[Sequence[int]] = None, + max_elements: int = 80, interactive_only: bool = True, + iou_threshold: float = 0.5, move_threshold: int = 5, + max_lines: int = 40) -> str: + """Index both frames, diff them and render the budget-capped change summary.""" + from je_auto_control.utils.observation import observation_index + prev_idx = observation_index(prev, viewport=viewport, max_elements=max_elements, + interactive_only=interactive_only) + curr_idx = observation_index(curr, viewport=viewport, max_elements=max_elements, + interactive_only=interactive_only) + delta = delta_index(prev_idx, curr_idx, iou_threshold=iou_threshold, + move_threshold=move_threshold) + return summarize_delta(delta, max_lines=max_lines) diff --git a/je_auto_control/utils/postcondition/__init__.py b/je_auto_control/utils/postcondition/__init__.py new file mode 100644 index 00000000..9f298068 --- /dev/null +++ b/je_auto_control/utils/postcondition/__init__.py @@ -0,0 +1,6 @@ +"""Declarative expected-outcome specs for an action, checked against the screen.""" +from je_auto_control.utils.postcondition.postcondition import ( + PostconditionReport, check_postcondition, compile_postcondition, +) + +__all__ = ["PostconditionReport", "check_postcondition", "compile_postcondition"] diff --git a/je_auto_control/utils/postcondition/postcondition.py b/je_auto_control/utils/postcondition/postcondition.py new file mode 100644 index 00000000..0a1e469b --- /dev/null +++ b/je_auto_control/utils/postcondition/postcondition.py @@ -0,0 +1,132 @@ 
+"""Declarative expected-outcome specs for an action, checked against the screen. + +After an action an agent (or a replay harness) usually has a concrete expectation: "a dialog +saying 'Saved' should appear AND the Submit button should disable". ``expect_poll`` / +``assert_eventually`` poll a *single condition* but have no notion of an action-bound +*postcondition spec*, and they don't diff against a *before* baseline (so they cannot express +"a NEW dialog appeared" — only "a dialog exists"). ``trajectory_eval`` rubrics are +whole-trajectory, not per-step screen state. ``postcondition`` fills the gap: a small JSON spec +of clauses — ``appears`` / ``disappears`` / ``enabled`` / ``disabled`` / ``text_present`` / +``text_absent`` / ``count`` — evaluated against the after-observation (optionally diffed against +the before-observation), returning a per-clause pass/fail report. + +Pure-stdlib over element dicts; deterministic and unit-testable with no device. The spec is +plain JSON so it rides into action files / MCP / the scheduler. Imports no ``PySide6``. 
+""" +from dataclasses import asdict, dataclass +from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple + +Element = Dict[str, Any] +Report = Tuple[bool, str] + + +@dataclass(frozen=True) +class PostconditionReport: + """The result of evaluating a postcondition spec: overall ok + per-clause detail.""" + + ok: bool + clauses: List[Dict[str, Any]] + failed: List[str] + + def to_dict(self) -> Dict[str, Any]: + """Return the report as a plain dict.""" + return asdict(self) + + +def _matches(element: Element, criteria: Dict[str, Any]) -> bool: + """Whether an element matches a ``{role?, name?, name_contains?}`` criteria dict.""" + if "role" in criteria and element.get("role") != criteria["role"]: + return False + if "name" in criteria and element.get("name") != criteria["name"]: + return False + contains = criteria.get("name_contains") + return not contains or contains in str(element.get("name", "")) + + +def _appears(after: Sequence[Element], before: Optional[Sequence[Element]], + param: Dict[str, Any]) -> Report: + in_after = any(_matches(e, param) for e in after) + in_before = before is not None and any(_matches(e, param) for e in before) + ok = in_after and not in_before + return ok, "appeared" if ok else "not a new appearance" + + +def _disappears(after: Sequence[Element], before: Optional[Sequence[Element]], + param: Dict[str, Any]) -> Report: + if before is None: + return False, "needs a before frame" + ok = any(_matches(e, param) for e in before) and \ + not any(_matches(e, param) for e in after) + return ok, "disappeared" if ok else "still present or never there" + + +def _enabled_state(after: Sequence[Element], param: Dict[str, Any], + want: bool) -> Report: + for element in after: + if _matches(element, param): + return bool(element.get("enabled", True)) == want, \ + f"enabled={element.get('enabled', True)}" + return False, "element not found" + + +def _enabled(after, before, param): + return _enabled_state(after, param, True) + + +def 
_disabled(after, before, param): + return _enabled_state(after, param, False) + + +def _text(after: Sequence[Element], text: Any, want_present: bool) -> Report: + found = any(str(text) in str(e.get("name", "")) for e in after) + return found == want_present, "present" if found else "absent" + + +def _text_present(after, before, param): + return _text(after, param, True) + + +def _text_absent(after, before, param): + return _text(after, param, False) + + +def _count(after, before, param: Dict[str, Any]) -> Report: + number = sum(1 for e in after if _matches(e, param.get("match", {}))) + if "equals" in param: + return number == int(param["equals"]), f"count={number}" + if "min" in param: + return number >= int(param["min"]), f"count={number}" + return False, "count clause needs 'equals' or 'min'" + + +_CLAUSES: Dict[str, Callable[[Sequence[Element], Optional[Sequence[Element]], + Dict[str, Any]], Report]] = { + "appears": _appears, "disappears": _disappears, + "enabled": _enabled, "disabled": _disabled, + "text_present": _text_present, "text_absent": _text_absent, "count": _count, +} + + +def check_postcondition(after: Sequence[Element], spec: Dict[str, Any], *, + before: Optional[Sequence[Element]] = None + ) -> PostconditionReport: + """Evaluate a postcondition ``spec`` against the after-frame (optional before-frame).""" + after_list = list(after) + before_list = list(before) if before is not None else None + clauses: List[Dict[str, Any]] = [] + for key, param in spec.items(): + checker = _CLAUSES.get(key) + if checker is None: + clauses.append({"type": key, "ok": False, "detail": "unknown clause"}) + continue + ok, detail = checker(after_list, before_list, param) + clauses.append({"type": key, "ok": ok, "detail": detail}) + failed = [clause["type"] for clause in clauses if not clause["ok"]] + return PostconditionReport(ok=not failed, clauses=clauses, failed=failed) + + +def compile_postcondition(spec: Dict[str, Any]) -> Callable[[Sequence[Element]], bool]: +
"""Return a predicate ``after -> bool`` for ``expect_poll``; with no before frame, ``appears`` means "present" and ``disappears`` never passes.""" + def predicate(after: Sequence[Element]) -> bool: + return check_postcondition(after, spec).ok + return predicate diff --git a/je_auto_control/utils/step_repair/__init__.py b/je_auto_control/utils/step_repair/__init__.py new file mode 100644 index 00000000..09875ea7 --- /dev/null +++ b/je_auto_control/utils/step_repair/__init__.py @@ -0,0 +1,9 @@ +"""Repair-tactic policy for failed / no-effect actions (self-correction loop).""" +from je_auto_control.utils.step_repair.step_repair import ( + RepairOutcome, RepairPolicy, next_tactic, plan_repair, run_with_repair, +) + +__all__ = [ + "RepairPolicy", "RepairOutcome", + "plan_repair", "next_tactic", "run_with_repair", +] diff --git a/je_auto_control/utils/step_repair/step_repair.py b/je_auto_control/utils/step_repair/step_repair.py new file mode 100644 index 00000000..66d06b49 --- /dev/null +++ b/je_auto_control/utils/step_repair/step_repair.py @@ -0,0 +1,105 @@ +"""Repair-tactic policy for failed / no-effect actions (self-correction loop). + +When an action does nothing or lands wrong, the agent needs a *policy* for what to try next — +re-locate and retry, nudge the coordinate, scroll the target into view, wait and retry, or give +up and escalate. ``self_healing`` / ``locator_repair`` only repair a locator that *did not +resolve* (element not found); they do nothing when the element was found and clicked but had no +effect. ``loop_guard`` only *detects* a stuck loop — it has no tactic selection or backoff. +``step_repair`` is that missing controller: it consumes an effect verdict (e.g. from +``action_effect``) and drives a bounded retry loop, choosing the next untried tactic each round. + +Pure-stdlib state machine; every side effect — performing the action, verifying it, applying a +tactic, sleeping — is an injected callable, so the loop is fully deterministic and +unit-testable with no device. Imports no ``PySide6``.
+""" +from dataclasses import asdict, dataclass, field +from typing import Any, Callable, Dict, List, Optional, Tuple + +_DEFAULT_TACTICS = ("wait_retry", "relocate", "nudge", "scroll_into_view", "escalate") + +# which tactics make sense for a given effect verdict, best-first +_VERDICT_TACTICS: Dict[str, Tuple[str, ...]] = { + "no_op": ("wait_retry", "relocate", "nudge", "scroll_into_view"), + "changed_elsewhere": ("escalate",), + "changed": ("wait_retry", "relocate"), +} + + +@dataclass(frozen=True) +class RepairPolicy: + """How hard to try: a cap on attempts and the allowed tactics, in priority order.""" + + max_attempts: int = 3 + tactics: Tuple[str, ...] = _DEFAULT_TACTICS + + +@dataclass(frozen=True) +class RepairOutcome: + """The result of a repair loop: success, attempt count, tactics used, detail.""" + + ok: bool + attempts: int + tactics_used: List[str] = field(default_factory=list) + detail: str = "" + + def to_dict(self) -> Dict[str, Any]: + """Return the outcome as a plain dict.""" + return asdict(self) + + +def _effect_of(verdict: Any) -> str: + """Extract the effect string from a dict / EffectVerdict / bare string.""" + if isinstance(verdict, dict): + return str(verdict.get("effect", "no_op")) + return str(getattr(verdict, "effect", verdict)) + + +def plan_repair(verdict: Any, *, policy: Optional[RepairPolicy] = None) -> List[str]: + """Return the ordered repair tactics to try for ``verdict``, capped at ``max_attempts``.""" + policy = policy or RepairPolicy() + preferred = _VERDICT_TACTICS.get(_effect_of(verdict), policy.tactics) + ordered = [tactic for tactic in preferred if tactic in policy.tactics] + return (ordered or list(policy.tactics))[:int(policy.max_attempts)] + + +def next_tactic(verdict: Any, used: List[str], *, + policy: Optional[RepairPolicy] = None) -> Optional[str]: + """Return the next untried tactic for ``verdict``, or ``None`` when exhausted.""" + for tactic in plan_repair(verdict, policy=policy): + if tactic not in used: + return 
tactic + return None + + +def run_with_repair(act: Callable[[], Any], verify: Callable[[], bool], *, + policy: Optional[RepairPolicy] = None, + apply_tactic: Optional[Callable[[str], Any]] = None, + verdict_for: Optional[Callable[[], Any]] = None, + sleep: Optional[Callable[[float], Any]] = None) -> RepairOutcome: + """Run ``act`` then ``verify``; on failure apply repair tactics until ok or exhausted. + + Every effect is injected: ``act`` performs the action, ``verify`` returns success, + ``apply_tactic`` mutates state for a named tactic, ``verdict_for`` supplies the current + effect verdict, ``sleep`` is called with ``0`` seconds after each failed repair round. Returns a :class:`RepairOutcome`. + """ + policy = policy or RepairPolicy() + sleeper = sleep or (lambda _seconds: None) + act() + if verify(): + return RepairOutcome(True, 1, [], "ok on first try") + used: List[str] = [] + while len(used) < int(policy.max_attempts): + tactic = next_tactic(verdict_for() if verdict_for else "no_op", used, + policy=policy) + if tactic is None: + break + used.append(tactic) + if apply_tactic is not None: + apply_tactic(tactic) + act() + if verify(): + return RepairOutcome(True, len(used) + 1, list(used), + f"recovered via {tactic}") + sleeper(0) + return RepairOutcome(False, len(used) + 1, list(used), + "exhausted repair tactics") diff --git a/je_auto_control/utils/table_grid_fill/__init__.py b/je_auto_control/utils/table_grid_fill/__init__.py new file mode
100644 index 00000000..d5ca4534 --- /dev/null +++ b/je_auto_control/utils/table_grid_fill/table_grid_fill.py @@ -0,0 +1,132 @@ +"""Fill a ruling-line grid with OCR text to get an addressable table. + +``edge_lines.find_grid`` recovers a bordered table's geometry — ``{rows: [y…], +cols: [x…], cells: […]}`` — but the cells come back *empty* (pure rectangles from the +ruling lines). ``ocr`` / OCR word boxes give the text but no table structure. Nothing +joined the two, so reading a bordered table meant hand-rolling the box→cell assignment. + +This drops OCR text boxes into the grid: each box is assigned to the cell its centre +falls in (gated by an overlap fraction so a box straddling a thin rule is not double +counted), text within a cell is concatenated in reading order, and boxes that span +multiple cells are reported separately. The result is an ``R x C`` text table that +converts straight to records / CSV. + +Pure-stdlib geometry over plain dicts (the grid + the boxes); fully unit-testable with +no image, no OCR engine, no device. Imports no ``PySide6``. 
+""" +import csv +import io +from typing import Any, Dict, List, Optional, Sequence, Tuple + +Box = Dict[str, Any] +Bounds = Tuple[int, int, int, int] + + +def _box_bounds(box: Box) -> Bounds: + """Return ``(left, top, right, bottom)`` from an ``x/y/w/h`` or ``l/t/r/b`` box.""" + if "width" in box and "height" in box: + left, top = int(box["x"]), int(box["y"]) + return left, top, left + int(box["width"]), top + int(box["height"]) + if {"left", "top", "right", "bottom"} <= box.keys(): + return int(box["left"]), int(box["top"]), int(box["right"]), int(box["bottom"]) + raise ValueError("box needs x/y/width/height or left/top/right/bottom") + + +def _intervals(edges: Sequence[int]) -> List[Tuple[int, int]]: + """Turn sorted edge coordinates into consecutive ``(start, end)`` spans.""" + ordered = sorted(int(e) for e in edges) + return [(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)] + + +def _index_of(value: float, spans: Sequence[Tuple[int, int]]) -> Optional[int]: + """Return the index of the span containing ``value``, else ``None``.""" + for i, (start, end) in enumerate(spans): + if start <= value < end: + return i + return None + + +def _overlap_fraction(bounds: Bounds, cell: Tuple[Tuple[int, int], Tuple[int, int]]) -> float: + """Intersection area of a box and a cell, divided by the box area.""" + (left, top, right, bottom) = bounds + (cx0, cx1), (cy0, cy1) = cell + inter = max(0, min(right, cx1) - max(left, cx0)) * max(0, min(bottom, cy1) - max(top, cy0)) + area = max(1, (right - left) * (bottom - top)) + return inter / area + + +def _grid_spans(grid: Dict[str, Any]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]: + """Return ``(column_spans, row_spans)`` from a grid's ``cols`` / ``rows`` edges.""" + return _intervals(grid.get("cols", [])), _intervals(grid.get("rows", [])) + + +def _placed(box: Box, col_spans, row_spans, overlap: float): + """Return ``(row, col)`` for a box, or ``None`` if it misses every cell.""" + left, top, right, bottom 
= _box_bounds(box) + col = _index_of((left + right) / 2, col_spans) + row = _index_of((top + bottom) / 2, row_spans) + if row is None or col is None: + return None + cell = (col_spans[col], row_spans[row]) + if _overlap_fraction((left, top, right, bottom), cell) < overlap: + return None + return row, col + + +def assign_text_to_grid(grid: Dict[str, Any], text_boxes: Sequence[Box], *, + overlap: float = 0.4) -> List[List[str]]: + """Return an ``R x C`` table of cell text from a grid + OCR boxes (reading order).""" + col_spans, row_spans = _grid_spans(grid) + buckets: Dict[Tuple[int, int], List[Box]] = {} + for box in text_boxes: + placed = _placed(box, col_spans, row_spans, float(overlap)) + if placed is not None: + buckets.setdefault(placed, []).append(box) + table: List[List[str]] = [] + for row in range(len(row_spans)): + cells = [] + for col in range(len(col_spans)): + ordered = sorted(buckets.get((row, col), []), key=_box_bounds) + cells.append(" ".join(str(b.get("text", "")) for b in ordered).strip()) + table.append(cells) + return table + + +def _spans(grid: Dict[str, Any], text_boxes: Sequence[Box]) -> List[Dict[str, Any]]: + """Return boxes that straddle more than one cell (merged-cell candidates).""" + col_spans, row_spans = _grid_spans(grid) + found: List[Dict[str, Any]] = [] + for box in text_boxes: + left, top, right, bottom = _box_bounds(box) + c0, c1 = _index_of(left, col_spans), _index_of(right - 1, col_spans) + r0, r1 = _index_of(top, row_spans), _index_of(bottom - 1, row_spans) + if None in (c0, c1, r0, r1) or (c0 == c1 and r0 == r1): + continue + found.append({"row": r0, "col": c0, "row_span": r1 - r0 + 1, + "col_span": c1 - c0 + 1, "text": str(box.get("text", ""))}) + return found + + +def populate_table(grid: Dict[str, Any], text_boxes: Sequence[Box], *, + overlap: float = 0.4) -> Dict[str, Any]: + """Fill ``grid`` with ``text_boxes`` → ``{n_rows, n_cols, cells, spans}``.""" + table = assign_text_to_grid(grid, text_boxes, overlap=overlap) + 
cells = [{"row": r, "col": c, "text": table[r][c]} + for r in range(len(table)) for c in range(len(table[r]))] + return {"n_rows": len(table), "n_cols": len(table[0]) if table else 0, + "cells": cells, "spans": _spans(grid, text_boxes)} + + +def table_to_records(rows: Sequence[Sequence[str]]) -> List[Dict[str, str]]: + """Use the first row as headers; return the remaining rows as dicts.""" + if not rows: + return [] + header = list(rows[0]) + return [dict(zip(header, row)) for row in rows[1:]] + + +def table_to_csv(rows: Sequence[Sequence[str]]) -> str: + """Render a 2-D text table as a CSV string.""" + buffer = io.StringIO() + csv.writer(buffer).writerows(rows) + return buffer.getvalue() diff --git a/je_auto_control/utils/visual_match/visual_match.py b/je_auto_control/utils/visual_match/visual_match.py index a4919c75..c58bfada 100644 --- a/je_auto_control/utils/visual_match/visual_match.py +++ b/je_auto_control/utils/visual_match/visual_match.py @@ -92,6 +92,27 @@ def _resize(template, scale: float): return cv2.resize(template, new_size) +def _score_map(template: ImageSource, haystack: Optional[ImageSource] = None, *, + region: Optional[Sequence[int]] = None, + method: str = "ccoeff_normed", scale: float = 1.0): + """Return ``(full correlation score map, scaled gray template)``. + + The map is oriented so higher = better for every method (``sqdiff_normed`` + is inverted). Returns ``(None, template)`` when the template is larger than + the haystack at this scale. This exposes the whole ``matchTemplate`` surface + that the public matchers discard, for trust / threshold / sub-pixel analysis. 
+    """
+    import cv2
+    tmpl = _resize(_to_gray(template), float(scale))
+    hay = _haystack_gray(haystack, region)
+    if tmpl.shape[0] > hay.shape[0] or tmpl.shape[1] > hay.shape[1]:
+        return None, tmpl
+    result = cv2.matchTemplate(hay, tmpl, _method(method))
+    if method == "sqdiff_normed":
+        result = 1.0 - result
+    return result, tmpl
+
+
 def match_template(template: ImageSource, *, haystack: Optional[ImageSource] = None,
                    region: Optional[Sequence[int]] = None,
                    scales: Sequence[float] = (1.0,), min_score: float = 0.8,
diff --git a/test/unit_test/headless/test_action_effect_batch.py b/test/unit_test/headless/test_action_effect_batch.py
new file mode 100644
index 00000000..cf691b73
--- /dev/null
+++ b/test/unit_test/headless/test_action_effect_batch.py
@@ -0,0 +1,72 @@
+"""Headless tests for action-effect classification (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.action_effect import (
+    classify_effect, effect_near_point, is_no_op,
+)
+
+
+def _el(x, y, name="", role="button"):
+    return dict(x=x, y=y, width=40, height=20, role=role, name=name)
+
+
+def test_no_op_when_nothing_changes():
+    frame = [_el(0, 0, "A"), _el(0, 40, "B")]
+    verdict = classify_effect(frame, list(frame), {"x": 10, "y": 10})
+    assert verdict.effect == "no_op"
+    assert is_no_op(frame, list(frame)) is True
+
+
+def test_changed_near_target():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "Popup")]  # new element near (50,50)
+    verdict = classify_effect(before, after, {"x": 50, "y": 50}, radius=64)
+    assert verdict.effect == "changed_near_target"
+    assert verdict.changed_near_target is True
+    assert verdict.changed_count == 1
+
+
+def test_changed_elsewhere():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(500, 500, "FarDialog")]
+    verdict = classify_effect(before, after, {"x": 10, "y": 10}, radius=64)
+    assert verdict.effect == "changed_elsewhere"
+    assert verdict.changed_near_target is False
+
+
+def test_changed_without_target_point():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(80, 80, "New")]
+    verdict = classify_effect(before, after, {"type": "key", "keys": "enter"})
+    assert verdict.effect == "changed"
+
+
+def test_effect_near_point_helper():
+    before = [_el(0, 0, "A")]
+    after = [_el(0, 0, "A"), _el(40, 40, "X")]
+    assert effect_near_point(before, after, [50, 50], radius=64) is True
+    assert effect_near_point(before, after, [500, 500], radius=64) is False
+
+
+def test_rename_counts_as_change():
+    before = [_el(0, 0, "Submit")]
+    after = [_el(0, 0, "Submitting")]
+    assert is_no_op(before, after) is False
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_classify_effect", "AC_effect_near_point"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_classify_effect", "ac_effect_near_point"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_classify_effect", "AC_effect_near_point"} <= specs
+
+
+def test_facade_exports():
+    for name in ("classify_effect", "effect_near_point", "is_no_op",
+                 "EffectVerdict"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_column_layout_batch.py b/test/unit_test/headless/test_column_layout_batch.py
new file mode 100644
index 00000000..c704fcfd
--- /dev/null
+++ b/test/unit_test/headless/test_column_layout_batch.py
@@ -0,0 +1,69 @@
+"""Headless tests for whitespace-projection column inference (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.column_layout import (
+    assign_columns, column_gutters, detect_borderless_table, vertical_projection,
+)
+
+
+def _box(x, y, text, w=60, h=18):
+    return {"x": x, "y": y, "width": w, "height": h, "text": text}
+
+
+def _two_column_three_rows():
+    # column 1 spans x[10,70], column 2 spans x[120,180]; gutter x[70,120]
+    return [_box(10, 0, "Name"), _box(120, 0, "Age"),
+            _box(10, 30, "Ann"), _box(120, 30, "30"),
+            _box(10, 60, "Bob"), _box(120, 60, "25")]
+
+
+def test_vertical_projection_has_a_zero_gutter():
+    profile = vertical_projection(_two_column_three_rows())
+    assert profile[40] > 0 and profile[150] > 0  # inside the two columns
+    assert profile[95] == 0  # the gutter band is empty
+
+
+def test_column_gutters_finds_the_interior_band():
+    gutters = column_gutters(_two_column_three_rows())
+    assert len(gutters) == 1
+    assert gutters[0]["start"] == 70 and gutters[0]["end"] == 120
+
+
+def test_assign_columns_tags_each_box():
+    tagged = {b["text"]: b["column"] for b in assign_columns(_two_column_three_rows())}
+    assert tagged["Name"] == 0 and tagged["Ann"] == 0
+    assert tagged["Age"] == 1 and tagged["30"] == 1
+
+
+def test_detect_borderless_table():
+    table = detect_borderless_table(_two_column_three_rows())
+    assert table is not None
+    assert table["n_rows"] == 3 and table["n_cols"] == 2
+    assert table["rows"] == [["Name", "Age"], ["Ann", "30"], ["Bob", "25"]]
+
+
+def test_single_column_is_not_a_table():
+    boxes = [_box(10, 0, "A"), _box(10, 30, "B"), _box(10, 60, "C")]
+    assert detect_borderless_table(boxes) is None
+
+
+def test_empty_returns_none():
+    assert detect_borderless_table([]) is None
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_detect_borderless_table", "AC_column_gutters"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_detect_borderless_table", "ac_column_gutters"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_detect_borderless_table", "AC_column_gutters"} <= specs
+
+
+def test_facade_exports():
+    for name in ("vertical_projection", "column_gutters", "assign_columns",
+                 "detect_borderless_table"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_edge_match_batch.py b/test/unit_test/headless/test_edge_match_batch.py
new file mode 100644
index 00000000..0e7d3f23
--- /dev/null
+++ b/test/unit_test/headless/test_edge_match_batch.py
@@ -0,0 +1,66 @@
+"""Headless tests for edge-shape (Chamfer) template matching."""
+import pytest
+
+import je_auto_control as ac
+
+np = pytest.importorskip("numpy")
+cv2 = pytest.importorskip("cv2")
+
+from je_auto_control.utils.edge_match import (  # noqa: E402
+    chamfer_distance, edge_match, edge_match_all,
+)
+
+
+def _scene():
+    hay = np.zeros((200, 200), dtype=np.uint8)
+    cv2.rectangle(hay, (50, 50), (110, 90), 255, 3)
+    return hay
+
+
+def _template_different_fill():
+    # same outline as the scene rectangle, but drawn at a different grey level
+    tmpl = np.zeros((50, 70), dtype=np.uint8)
+    cv2.rectangle(tmpl, (5, 5), (65, 45), 150, 3)
+    return tmpl
+
+
+def test_finds_shape_despite_different_fill():
+    match = edge_match(_template_different_fill(), haystack=_scene(), min_score=0.5)
+    assert match is not None
+    assert match.score >= 0.9  # edges align even at a different grey
+    assert abs(match.x - 45) <= 2 and abs(match.y - 45) <= 2
+
+
+def test_chamfer_distance_near_zero_on_alignment():
+    dist = chamfer_distance(_template_different_fill(), haystack=_scene())
+    assert dist < 1.0  # ~0 = edges coincide
+
+
+def test_absent_returns_none():
+    blank = np.zeros((200, 200), dtype=np.uint8)
+    assert edge_match(_template_different_fill(), haystack=blank,
+                      min_score=0.5) is None
+
+
+def test_edge_match_all_dedupes():
+    hits = edge_match_all(_template_different_fill(), haystack=_scene(),
+                          min_score=0.85)
+    assert len(hits) == 1
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_edge_match", "AC_edge_match_all"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_edge_match", "ac_edge_match_all"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_edge_match", "AC_edge_match_all"} <= specs
+
+
+def test_facade_exports():
+    for name in ("edge_match", "edge_match_all", "chamfer_distance"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_form_fields_batch.py b/test/unit_test/headless/test_form_fields_batch.py
new file mode 100644
index 00000000..b8726498
--- /dev/null
+++ b/test/unit_test/headless/test_form_fields_batch.py
@@ -0,0 +1,71 @@
+"""Headless tests for form label/value association + checkbox state."""
+import pytest
+
+import je_auto_control as ac
+from je_auto_control.utils.form_fields import (
+    associate_fields, checkbox_state, match_labels_to_widgets,
+)
+
+
+def _box(x, y, text, w=60, h=20):
+    return {"x": x, "y": y, "width": w, "height": h, "text": text}
+
+
+def test_label_to_value_on_the_right():
+    boxes = [_box(0, 0, "Name:"), _box(80, 0, "Ann")]
+    fields = associate_fields(boxes)
+    assert len(fields) == 1
+    assert fields[0]["label"] == "Name"
+    assert fields[0]["value"] == "Ann" and fields[0]["direction"] == "right"
+
+
+def test_label_above_value_below():
+    # value directly below the label, nothing to the right
+    boxes = [_box(0, 0, "Email:"), _box(0, 40, "a@b.com")]
+    fields = associate_fields(boxes, directions=("right", "below"))
+    assert fields[0]["value"] == "a@b.com" and fields[0]["direction"] == "below"
+
+
+def test_nearest_value_wins_and_gap_within_max():
+    boxes = [_box(0, 0, "City:"), _box(70, 0, "Taipei"), _box(400, 0, "Far")]
+    fields = associate_fields(boxes, max_gap=150)
+    assert fields[0]["value"] == "Taipei"
+
+
+def test_value_beyond_max_gap_unmatched():
+    boxes = [_box(0, 0, "Lonely:"), _box(900, 0, "TooFar")]
+    assert associate_fields(boxes, max_gap=150) == []
+
+
+def test_match_labels_to_widgets_nearest():
+    labels = [_box(0, 0, "Accept"), _box(0, 100, "Decline")]
+    widgets = [_box(120, 0, "", w=16, h=16)]
+    pairs = match_labels_to_widgets(labels, widgets)
+    assert pairs[0]["label"] == "Accept"
+
+
+def test_checkbox_state_from_fill():
+    np = pytest.importorskip("numpy")
+    image = np.full((40, 40), 255, dtype=np.uint8)  # white
+    image[5:35, 5:35] = 0  # dark fill inside the box
+    box = {"x": 0, "y": 0, "width": 40, "height": 40}
+    assert checkbox_state(image, box) == "checked"
+    assert checkbox_state(np.full((40, 40), 255, np.uint8), box) == "unchecked"
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_associate_fields", "AC_match_labels_to_widgets"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_associate_fields", "ac_match_labels_to_widgets"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_associate_fields", "AC_match_labels_to_widgets"} <= specs
+
+
+def test_facade_exports():
+    for name in ("associate_fields", "match_labels_to_widgets", "checkbox_state"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_match_autothresh_batch.py b/test/unit_test/headless/test_match_autothresh_batch.py
new file mode 100644
index 00000000..ffd852fd
--- /dev/null
+++ b/test/unit_test/headless/test_match_autothresh_batch.py
@@ -0,0 +1,76 @@
+"""Headless tests for Otsu auto-thresholded template matching."""
+import pytest
+
+import je_auto_control as ac
+
+np = pytest.importorskip("numpy")
+pytest.importorskip("cv2")
+
+from je_auto_control.utils.match_autothresh import (  # noqa: E402
+    auto_threshold, match_auto,
+)
+
+
+def _template():
+    tmpl = np.zeros((24, 24), dtype=np.uint8)
+    tmpl[:, :12] = 200
+    return tmpl
+
+
+def _haystack(*tops_lefts):
+    hay = np.zeros((160, 220), dtype=np.uint8)
+    tmpl = _template()
+    for top, left in tops_lefts:
+        hay[top:top + 24, left:left + 24] = tmpl
+    return hay
+
+
+def test_auto_threshold_reports_metrics():
+    info = auto_threshold(_template(), haystack=_haystack((20, 30), (20, 150)))
+    assert set(info) == {"threshold", "separability", "n_above"}
+    # ccoeff_normed spans [-1, 1]; the cut-off just has to sit below a perfect
+    # match (its exact value depends on the OpenCV build's score distribution).
+    assert info["threshold"] < 1.0
+    assert info["n_above"] >= 2
+    # a clearly bimodal surface (matches vs background) is more separable than a
+    # flat one — a relative check that is stable across OpenCV builds
+    blank = auto_threshold(_template(), haystack=np.zeros((160, 220), np.uint8))
+    assert info["separability"] > blank["separability"]
+
+
+def test_match_auto_finds_both_occurrences():
+    matches = match_auto(_template(), haystack=_haystack((20, 30), (20, 150)),
+                         floor=0.5)
+    assert len(matches) == 2
+    xs = sorted(m.x for m in matches)
+    assert abs(xs[0] - 30) <= 1 and abs(xs[1] - 150) <= 1
+
+
+def test_floor_prevents_noise_on_blank():
+    # a blank haystack has no bimodal split; floor keeps it from matching noise
+    matches = match_auto(_template(), haystack=np.zeros((160, 220), np.uint8),
+                         floor=0.6)
+    assert matches == []
+
+
+def test_blank_separability_is_low():
+    info = auto_threshold(_template(), haystack=np.zeros((160, 220), np.uint8))
+    assert info["separability"] < 0.3  # unimodal → do not trust the threshold
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    known = set(ac.executor.known_commands())
+    assert {"AC_match_auto", "AC_auto_threshold"} <= known
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_match_auto", "ac_auto_threshold"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert {"AC_match_auto", "AC_auto_threshold"} <= specs
+
+
+def test_facade_exports():
+    for name in ("match_auto", "auto_threshold"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_match_trust_batch.py b/test/unit_test/headless/test_match_trust_batch.py
new file mode 100644
index 00000000..9e10a767
--- /dev/null
+++ b/test/unit_test/headless/test_match_trust_batch.py
@@ -0,0 +1,81 @@
+"""Headless tests for template-match trust scoring on synthetic arrays."""
+import pytest
+
+import je_auto_control as ac
+
+np = pytest.importorskip("numpy")
+pytest.importorskip("cv2")
+
+from je_auto_control.utils.match_trust import (  # noqa: E402
+    match_with_trust, score_peaks,
+)
+
+
+def _template():
+    tmpl = np.zeros((24, 24), dtype=np.uint8)
+    tmpl[:, :12] = 200  # asymmetric so correlation has variance
+    return tmpl
+
+
+def _haystack(*tops_lefts):
+    hay = np.zeros((160, 200), dtype=np.uint8)
+    tmpl = _template()
+    for top, left in tops_lefts:
+        hay[top:top + 24, left:left + 24] = tmpl
+    return hay
+
+
+def test_single_occurrence_is_unambiguous():
+    result = match_with_trust(_template(), haystack=_haystack((20, 30)),
+                              min_score=0.8)
+    assert result is not None
+    assert result.is_ambiguous is False
+    assert result.peak_ratio < 0.9
+    assert abs(result.x - 30) <= 1 and abs(result.y - 20) <= 1
+
+
+def test_duplicate_occurrence_is_ambiguous():
+    # the same template twice → a near-equal second peak → flagged ambiguous
+    result = match_with_trust(_template(), haystack=_haystack((20, 30), (20, 140)),
+                              min_score=0.8)
+    assert result is not None
+    assert result.is_ambiguous is True
+    assert result.peak_ratio >= 0.9
+    assert result.second_score >= 0.9
+
+
+def test_score_peaks_reports_metrics():
+    peaks = score_peaks(_template(), haystack=_haystack((20, 30)))
+    assert peaks is not None
+    assert set(peaks) == {"best", "second", "peak_ratio", "psr", "ambiguous",
+                          "location"}
+    assert peaks["best"] >= 0.99
+    assert peaks["ambiguous"] is False
+
+
+def test_min_score_filters_out():
+    assert match_with_trust(_template(), haystack=np.zeros((160, 200), np.uint8),
+                            min_score=0.95) is None
+
+
+def test_psr_present_for_real_peak():
+    result = match_with_trust(_template(), haystack=_haystack((20, 30)),
+                              min_score=0.8)
+    assert result.psr is None or result.psr > 0
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_match_with_trust" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_match_with_trust" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_match_with_trust" in specs
+
+
+def test_facade_exports():
+    for name in ("match_with_trust", "score_peaks", "TrustedMatch"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_observation_delta_batch.py b/test/unit_test/headless/test_observation_delta_batch.py
new file mode 100644
index 00000000..b01b2df3
--- /dev/null
+++ b/test/unit_test/headless/test_observation_delta_batch.py
@@ -0,0 +1,68 @@
+"""Headless tests for the token-budgeted observation delta (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.observation_delta import (
+    delta_index, delta_observation, summarize_delta,
+)
+
+
+def _el(x, y, role="button", name="", **extra):
+    return dict(x=x, y=y, width=40, height=20, role=role, name=name, **extra)
+
+
+def test_added_removed_changed_stable_classification():
+    prev = [_el(0, 0, name="A"), _el(0, 40, name="B"), _el(0, 80, name="Spin")]
+    # A unchanged; B renamed; Spin gone; C new
+    curr = [_el(0, 0, name="A"), _el(0, 40, name="B2"), _el(0, 120, name="C")]
+    delta = delta_index(prev, curr)
+    assert [e["name"] for e in delta["added"]] == ["C"]
+    assert [e["name"] for e in delta["removed"]] == ["Spin"]
+    assert delta["changed"][0]["after"]["name"] == "B2"
+    assert "name" in delta["changed"][0]["fields"]
+    assert [e["name"] for e in delta["stable"]] == ["A"]
+
+
+def test_moved_is_a_change():
+    prev = [_el(0, 0, name="X")]
+    curr = [_el(50, 0, name="X")]  # same identity (overlap), moved > threshold
+    delta = delta_index(prev, curr, iou_threshold=0.0)
+    assert delta["changed"] and "moved" in delta["changed"][0]["fields"]
+
+
+def test_summarize_renders_markers_and_drops_stable():
+    prev = [_el(0, 0, name="A"), _el(0, 40, name="B")]
+    curr = [_el(0, 0, name="A"), _el(0, 40, name="B2"), _el(0, 80, name="C")]
+    text = summarize_delta(delta_index(prev, curr))
+    assert "+ " in text and "~ " in text  # C added, B changed
+    assert '"A"' not in text  # stable A is omitted
+
+
+def test_max_lines_budget_truncates():
+    prev = []
+    curr = [_el(0, i * 25, name=f"n{i}") for i in range(10)]
+    text = summarize_delta(delta_index(prev, curr), max_lines=3)
+    assert text.count("\n") == 3  # 3 lines + the "(+N more)" line
+    assert "more)" in text
+
+
+def test_delta_observation_end_to_end():
+    prev = [_el(0, 0, name="Old")]
+    curr = [_el(0, 0, name="Old"), _el(0, 50, name="New")]
+    text = delta_observation(prev, curr, interactive_only=False)
+    assert '+ [' in text and '"New"' in text
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_delta_observation" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_delta_observation" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_delta_observation" in specs
+
+
+def test_facade_exports():
+    for name in ("delta_index", "delta_observation", "summarize_delta"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_postcondition_batch.py b/test/unit_test/headless/test_postcondition_batch.py
new file mode 100644
index 00000000..4d2dfb24
--- /dev/null
+++ b/test/unit_test/headless/test_postcondition_batch.py
@@ -0,0 +1,80 @@
+"""Headless tests for declarative action postconditions (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.postcondition import (
+    check_postcondition, compile_postcondition,
+)
+
+
+def _el(name, role="button", enabled=True):
+    return {"role": role, "name": name, "enabled": enabled, "x": 0, "y": 0,
+            "width": 10, "height": 10}
+
+
+def test_new_dialog_appears_against_before():
+    before = [_el("Save", role="button")]
+    after = [_el("Save"), _el("Saved", role="dialog")]
+    report = check_postcondition(after, {"appears": {"role": "dialog"}},
+                                 before=before)
+    assert report.ok is True
+
+
+def test_appears_fails_if_already_present():
+    before = [_el("Saved", role="dialog")]
+    after = [_el("Saved", role="dialog")]
+    report = check_postcondition(after, {"appears": {"role": "dialog"}},
+                                 before=before)
+    assert report.ok is False and "appears" in report.failed
+
+
+def test_disabled_and_text_present_clauses():
+    after = [_el("Submit", enabled=False), _el("Saved", role="dialog")]
+    report = check_postcondition(after, {"disabled": {"name": "Submit"},
+                                         "text_present": "Saved"})
+    assert report.ok is True
+    assert all(c["ok"] for c in report.clauses)
+
+
+def test_count_clause():
+    after = [_el(f"r{i}", role="row") for i in range(5)]
+    assert check_postcondition(after, {"count": {"match": {"role": "row"},
+                                                 "equals": 5}}).ok is True
+    assert check_postcondition(after, {"count": {"match": {"role": "row"},
+                                                 "min": 6}}).ok is False
+
+
+def test_disappears_needs_before_and_works():
+    before = [_el("Spinner", role="img")]
+    after = [_el("Done", role="dialog")]
+    assert check_postcondition(after, {"disappears": {"role": "img"}},
+                               before=before).ok is True
+    # without a before frame, disappears cannot be judged → fails
+    assert check_postcondition(after, {"disappears": {"role": "img"}}).ok is False
+
+
+def test_unknown_clause_fails_cleanly():
+    report = check_postcondition([_el("X")], {"levitates": {"name": "X"}})
+    assert report.ok is False and "levitates" in report.failed
+
+
+def test_compile_postcondition_predicate():
+    predicate = compile_postcondition({"text_present": "OK"})
+    assert predicate([_el("OK dialog", role="dialog")]) is True
+    assert predicate([_el("Nope")]) is False
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_check_postcondition" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_check_postcondition" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_check_postcondition" in specs
+
+
+def test_facade_exports():
+    for name in ("check_postcondition", "compile_postcondition",
+                 "PostconditionReport"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_step_repair_batch.py b/test/unit_test/headless/test_step_repair_batch.py
new file mode 100644
index 00000000..eadadfe9
--- /dev/null
+++ b/test/unit_test/headless/test_step_repair_batch.py
@@ -0,0 +1,68 @@
+"""Headless tests for the repair-tactic policy / loop (pure stdlib, injected seams)."""
+import je_auto_control as ac
+from je_auto_control.utils.step_repair import (
+    RepairPolicy, next_tactic, plan_repair, run_with_repair,
+)
+
+
+def test_plan_repair_orders_tactics_for_no_op():
+    plan = plan_repair("no_op", policy=RepairPolicy(max_attempts=3))
+    assert plan == ["wait_retry", "relocate", "nudge"]
+
+
+def test_plan_repair_escalates_on_changed_elsewhere():
+    assert plan_repair("changed_elsewhere") == ["escalate"]
+
+
+def test_plan_repair_accepts_effect_verdict_dict():
+    assert plan_repair({"effect": "changed_elsewhere"}) == ["escalate"]
+
+
+def test_next_tactic_skips_used():
+    assert next_tactic("no_op", ["wait_retry"]) == "relocate"
+    assert next_tactic("changed_elsewhere", ["escalate"]) is None
+
+
+def test_run_with_repair_recovers_after_a_tactic():
+    calls = {"act": 0}
+    # verify succeeds only on the 3rd act (i.e. after two repair tactics)
+    def act():
+        calls["act"] += 1
+    def verify():
+        return calls["act"] >= 3
+    used = []
+    outcome = run_with_repair(act, verify, apply_tactic=used.append)
+    assert outcome.ok is True
+    assert outcome.attempts == 3
+    assert used == ["wait_retry", "relocate"]
+
+
+def test_run_with_repair_exhausts_and_fails():
+    outcome = run_with_repair(lambda: None, lambda: False,
+                              policy=RepairPolicy(max_attempts=2))
+    assert outcome.ok is False
+    assert outcome.attempts == 3 and len(outcome.tactics_used) == 2
+
+
+def test_run_with_repair_ok_first_try():
+    outcome = run_with_repair(lambda: None, lambda: True)
+    assert outcome.ok is True and outcome.attempts == 1
+    assert outcome.tactics_used == []
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_plan_repair" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_plan_repair" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_plan_repair" in specs
+
+
+def test_facade_exports():
+    for name in ("RepairPolicy", "RepairOutcome", "plan_repair", "next_tactic",
+                 "run_with_repair"):
+        assert hasattr(ac, name) and name in ac.__all__
diff --git a/test/unit_test/headless/test_table_grid_fill_batch.py b/test/unit_test/headless/test_table_grid_fill_batch.py
new file mode 100644
index 00000000..794c9a84
--- /dev/null
+++ b/test/unit_test/headless/test_table_grid_fill_batch.py
@@ -0,0 +1,72 @@
+"""Headless tests for filling a ruling-line grid with OCR text (pure stdlib)."""
+import je_auto_control as ac
+from je_auto_control.utils.table_grid_fill import (
+    assign_text_to_grid, populate_table, table_to_csv, table_to_records,
+)
+
+# a 2-row x 2-col grid: x edges 0,100,200 ; y edges 0,30,60
+GRID = {"cols": [0, 100, 200], "rows": [0, 30, 60],
+        "cells": [{"x": 0, "y": 0, "width": 100, "height": 30}]}
+
+
+def _box(x, y, w, h, text):
+    return {"x": x, "y": y, "width": w, "height": h, "text": text}
+
+
+def _header_grid_boxes():
+    return [_box(10, 5, 60, 20, "Name"), _box(110, 5, 60, 20, "Age"),
+            _box(10, 35, 60, 20, "Ann"), _box(110, 35, 40, 20, "30")]
+
+
+def test_assign_fills_cells_row_major():
+    table = assign_text_to_grid(GRID, _header_grid_boxes())
+    assert table == [["Name", "Age"], ["Ann", "30"]]
+
+
+def test_words_in_one_cell_join_in_reading_order():
+    boxes = [_box(60, 5, 30, 20, "Doe"), _box(10, 5, 40, 20, "John")]
+    table = assign_text_to_grid(GRID, boxes)
+    assert table[0][0] == "John Doe"  # left-to-right despite input order
+
+
+def test_box_outside_grid_is_dropped():
+    table = assign_text_to_grid(GRID, [_box(500, 500, 20, 20, "nope")])
+    assert table == [["", ""], ["", ""]]
+
+
+def test_populate_table_reports_dims_and_cells():
+    result = populate_table(GRID, _header_grid_boxes())
+    assert result["n_rows"] == 2 and result["n_cols"] == 2
+    assert {"row": 1, "col": 1, "text": "30"} in result["cells"]
+    assert result["spans"] == []
+
+
+def test_span_detection_flags_merged_cell():
+    # a wide box covering both columns of row 0
+    result = populate_table(GRID, [_box(10, 5, 180, 20, "Merged Header")])
+    assert len(result["spans"]) == 1
+    assert result["spans"][0]["col_span"] == 2
+
+
+def test_records_and_csv():
+    table = assign_text_to_grid(GRID, _header_grid_boxes())
+    assert table_to_records(table) == [{"Name": "Ann", "Age": "30"}]
+    assert table_to_csv(table).replace("\r\n", "\n") == "Name,Age\nAnn,30\n"
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_wiring():
+    assert "AC_populate_table" in set(ac.executor.known_commands())
+    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
+    names = {t.name for t in build_default_tool_registry()}
+    assert "ac_populate_table" in names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    specs = {s.command for s in _build_specs()}
+    assert "AC_populate_table" in specs
+
+
+def test_facade_exports():
+    for name in ("assign_text_to_grid", "populate_table",
+                 "table_to_records", "table_to_csv"):
+        assert hasattr(ac, name) and name in ac.__all__
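The `_placed` helper in `table_grid_fill.py` calls `_grid_spans`, `_index_of` and `_overlap_fraction`, but their definitions are not in this excerpt. The assertions in `test_table_grid_fill_batch.py` fit the following reading, sketched here in plain stdlib Python under stated assumptions: grid edge lists become half-open `[start, end)` spans, a box goes to the cell that contains its centre, and it is kept only when at least `overlap` of its own area lies inside that cell. The names `grid_spans`, `index_of`, `overlap_fraction` and `place` are hypothetical stand-ins, not the module's private helpers.

```python
# Hypothetical re-creation of the cell-placement rule; NOT the library's code.
from typing import Any, Dict, List, Optional, Sequence, Tuple

Span = Tuple[float, float]


def grid_spans(grid: Dict[str, Any]) -> Tuple[List[Span], List[Span]]:
    """Turn edge lists such as cols=[0, 100, 200] into [(0, 100), (100, 200)]."""
    cols, rows = grid["cols"], grid["rows"]
    return list(zip(cols, cols[1:])), list(zip(rows, rows[1:]))


def index_of(value: float, spans: Sequence[Span]) -> Optional[int]:
    """Index of the half-open span [start, end) containing value, else None."""
    for i, (start, end) in enumerate(spans):
        if start <= value < end:
            return i
    return None


def overlap_fraction(bounds: Tuple[float, float, float, float],
                     cell: Tuple[Span, Span]) -> float:
    """Share of the box's own area that lies inside the cell."""
    left, top, right, bottom = bounds
    (cx0, cx1), (cy0, cy1) = cell
    width = max(0.0, min(right, cx1) - max(left, cx0))
    height = max(0.0, min(bottom, cy1) - max(top, cy0))
    area = (right - left) * (bottom - top)
    return width * height / area if area > 0 else 0.0


def place(box: Dict[str, Any], col_spans: Sequence[Span],
          row_spans: Sequence[Span], overlap: float = 0.4) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the cell holding the box centre, or None."""
    left, top = box["x"], box["y"]
    right, bottom = left + box["width"], top + box["height"]
    col = index_of((left + right) / 2, col_spans)
    row = index_of((top + bottom) / 2, row_spans)
    if row is None or col is None:
        return None  # the centre falls outside the grid
    cell = (col_spans[col], row_spans[row])
    if overlap_fraction((left, top, right, bottom), cell) < overlap:
        return None  # too little of the box sits inside its cell
    return row, col


if __name__ == "__main__":
    cols, rows = grid_spans({"cols": [0, 100, 200], "rows": [0, 30, 60]})
    print(place({"x": 10, "y": 5, "width": 60, "height": 20}, cols, rows))
```

Under these assumptions, the wide `"Merged Header"` box from `test_span_detection_flags_merged_cell` is still placed in a single cell: its centre at x=100 falls in column 1, and half of its area lies inside that cell. `_spans` is what reports that it straddles two columns.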