Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions WHATS_NEW.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-25) — Run-Trace Diff (what changed between two executions)

See exactly what changed between a passing run and a failing one. Full reference: [`docs/source/Eng/doc/new_features/v192_features_doc.rst`](docs/source/Eng/doc/new_features/v192_features_doc.rst).

- **`diff_runs` / `summarize_run_diff`** (`AC_diff_runs`): a run history says a run *failed* but not *what changed* from the run that passed. This aligns two step sequences with a longest-common-subsequence walk (so an inserted/removed step shifts the rest into place instead of mis-pairing everything) and classifies the differences: **added**/**removed** steps, **status_flips** (an aligned step that changed status — with the new failure's `failure_signature` when it carries an error), and **timing_regressions** (a step that got `regress_factor`× slower). `summarize_run_diff` renders a one-line summary. Pure stdlib over lists of `{name,status,duration,error}` step dicts. No `PySide6`.

## What's new (2026-06-25) — Stable Failure Signatures

Match the *same kind* of failure across runs, despite differing paths and ids. Full reference: [`docs/source/Eng/doc/new_features/v191_features_doc.rst`](docs/source/Eng/doc/new_features/v191_features_doc.rst).
Expand Down
48 changes: 48 additions & 0 deletions docs/source/Eng/doc/new_features/v192_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
Run-Trace Diff (what changed between two executions)
====================================================

A run history tells you a run *failed*, but not *what changed* from the run that
passed: which step was added or dropped, which step flipped pass→fail, which step
got slower. ``run_diff`` aligns the two step sequences with a longest-common-
subsequence walk — so an inserted or removed step shifts the rest into place
instead of mis-pairing everything — and classifies the differences:

* **added** / **removed** — steps present in only one run,
* **status_flips** — an aligned step whose status changed, with the new failure's
:func:`failure_signature` when it carries an ``error``,
* **timing_regressions** — an aligned step that got ``regress_factor`` x slower.

A step is any dict with a name key (default ``"name"``) and optional ``status`` /
``duration`` / ``error``. Pure standard library; no device, no ``PySide6``.

Headless API
------------

.. code-block:: python

from je_auto_control import diff_runs, summarize_run_diff

before = [{"name": "login", "status": "ok", "duration": 1.0},
{"name": "submit", "status": "ok", "duration": 1.0}]
after = [{"name": "login", "status": "ok", "duration": 1.1},
{"name": "accept_cookies", "status": "ok"}, # inserted
{"name": "submit", "status": "error", "error": "Timeout ..."}]

diff = diff_runs(before, after)
# {"added": [accept_cookies], "removed": [],
# "status_flips": [{"name": "submit", "from": "ok", "to": "error",
# "signature": "..."}],
# "timing_regressions": [], "aligned": 2, "identical": False}

summarize_run_diff(diff) # "+1 added, 1 status flip(s)"

``regress_factor`` (default ``1.5``) is the slowdown ratio that counts as a
regression; ``key`` selects the field steps are aligned on. ``summarize_run_diff``
renders a one-line summary (``"no change"`` when identical).

Executor commands
-----------------

``AC_diff_runs`` (``before`` / ``after`` / ``key`` / ``regress_factor``) returns
the diff plus a ``summary`` field. It is exposed as the read-only ``ac_diff_runs``
MCP tool and as a Script Builder command under **Testing**.
45 changes: 45 additions & 0 deletions docs/source/Zh/doc/new_features/v192_features_doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
執行軌跡比較(兩次執行之間改變了什麼)
======================================

執行歷史告訴你某次執行*失敗*了,卻不告訴你相較於通過的那次*改變了什麼*:哪個步驟被加入或移除、
哪個步驟由通過翻轉成失敗、哪個步驟變慢了。``run_diff`` 以最長共同子序列(LCS)走訪對齊兩個步驟
序列——這樣插入或移除一個步驟會把其餘步驟順移到位,而非整個錯位配對——並將差異分類:

* **added** / **removed** ——只存在於其中一次執行的步驟,
* **status_flips** ——某個已對齊步驟的狀態改變,若帶有 ``error`` 則附上新失敗的
:func:`failure_signature`,
* **timing_regressions** ——某個已對齊步驟變慢了 ``regress_factor`` 倍。

步驟可為任何帶有名稱鍵(預設 ``"name"``)與選填 ``status`` / ``duration`` / ``error`` 的字典。
純標準庫;不涉及裝置,不匯入 ``PySide6``。

無頭 API
--------

.. code-block:: python

from je_auto_control import diff_runs, summarize_run_diff

before = [{"name": "login", "status": "ok", "duration": 1.0},
{"name": "submit", "status": "ok", "duration": 1.0}]
after = [{"name": "login", "status": "ok", "duration": 1.1},
{"name": "accept_cookies", "status": "ok"}, # 插入
{"name": "submit", "status": "error", "error": "Timeout ..."}]

diff = diff_runs(before, after)
# {"added": [accept_cookies], "removed": [],
# "status_flips": [{"name": "submit", "from": "ok", "to": "error",
# "signature": "..."}],
# "timing_regressions": [], "aligned": 2, "identical": False}

summarize_run_diff(diff) # "+1 added, 1 status flip(s)"

``regress_factor``(預設 ``1.5``)是算作退化的變慢比率;``key`` 選擇步驟對齊所依據的欄位。
``summarize_run_diff`` 產生一行摘要(相同時為 ``"no change"``)。

執行器指令
----------

``AC_diff_runs``(``before`` / ``after`` / ``key`` / ``regress_factor``)回傳該差異並附帶
``summary`` 欄位。以唯讀 ``ac_diff_runs`` MCP 工具及 Script Builder 指令(位於 **Testing**
分類下)形式提供。
3 changes: 3 additions & 0 deletions je_auto_control/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@
from je_auto_control.utils.failure_signature import (
failure_signature, group_failures, normalize_error,
)
# Run-trace diff (LCS-aligned: added/removed steps, status flips, regressions)
from je_auto_control.utils.run_diff import diff_runs, summarize_run_diff
# VLM element locator (headless)
from je_auto_control.utils.vision import (
VLMNotAvailableError, click_by_description, locate_by_description,
Expand Down Expand Up @@ -1670,6 +1672,7 @@ def start_autocontrol_gui(*args, **kwargs):
"detect_scale", "scale_sweep",
"saliency_map", "salient_regions", "most_salient",
"normalize_error", "failure_signature", "group_failures",
"diff_runs", "summarize_run_diff",
# VLM locator
"VLMNotAvailableError", "locate_by_description", "click_by_description",
"verify_description",
Expand Down
13 changes: 13 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -2723,6 +2723,19 @@ def _add_audit_specs(specs: List[CommandSpec]) -> None:
placeholder='["err one", "err two"]'),),
description="Group error messages by failure signature (most frequent).",
))
specs.append(CommandSpec(
"AC_diff_runs", "Testing", "Diff Two Run Traces",
fields=(
FieldSpec("before", FieldType.STRING,
placeholder='[{"name": "login", "status": "ok"}]'),
FieldSpec("after", FieldType.STRING,
placeholder='[{"name": "login", "status": "error"}]'),
FieldSpec("key", FieldType.STRING, optional=True, default="name"),
FieldSpec("regress_factor", FieldType.FLOAT, optional=True,
default=1.5),
),
description="LCS-align two run step-traces: added/removed/flips/regress.",
))
specs.append(CommandSpec(
"AC_scan_secrets", "Tools", "Scan for Hardcoded Secrets",
description="Scan 'data' (JSON view) for hardcoded secrets that "
Expand Down
15 changes: 15 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -4366,6 +4366,20 @@ def _group_failures(errors: Any) -> Dict[str, Any]:
return {"groups": groups, "count": len(groups)}


def _diff_runs(before: Any, after: Any, key: str = "name",
regress_factor: Any = 1.5) -> Dict[str, Any]:
"""Adapter: diff two run step-traces (added/removed/flips/regressions)."""
import json
from je_auto_control.utils.run_diff import diff_runs, summarize_run_diff
if isinstance(before, str):
before = json.loads(before)
if isinstance(after, str):
after = json.loads(after)
diff = diff_runs(before, after, key=str(key),
regress_factor=float(regress_factor))
return {**diff, "summary": summarize_run_diff(diff)}


def _image_histogram(source: Any = None, bins: Any = 32, space: str = "hsv",
region: Any = None) -> Dict[str, Any]:
"""Adapter: per-channel colour histogram of an image / the screen."""
Expand Down Expand Up @@ -6596,6 +6610,7 @@ def __init__(self):
"AC_most_salient": _most_salient,
"AC_failure_signature": _failure_signature,
"AC_group_failures": _group_failures,
"AC_diff_runs": _diff_runs,
"AC_image_histogram": _image_histogram,
"AC_histogram_changed": _histogram_changed,
"AC_changed_regions": _changed_regions,
Expand Down
16 changes: 16 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_factories.py
Original file line number Diff line number Diff line change
Expand Up @@ -7675,6 +7675,22 @@ def flakiness_tools() -> List[MCPTool]:
handler=h.group_failures,
annotations=READ_ONLY,
),
MCPTool(
name="ac_diff_runs",
description=("Diff two run step-traces (lists of {name,status,"
"duration,error}) — LCS-aligned so inserts shift rather "
"than mis-pair. Returns {added, removed, status_flips "
"(with failure signature), timing_regressions, aligned, "
"identical, summary}. 'regress_factor' = slowdown ratio."),
input_schema=schema({
"before": {"type": "array", "items": {"type": "object"}},
"after": {"type": "array", "items": {"type": "object"}},
"key": {"type": "string"},
"regress_factor": {"type": "number"}},
required=["before", "after"]),
handler=h.diff_runs,
annotations=READ_ONLY,
),
]


Expand Down
5 changes: 5 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -2553,6 +2553,11 @@ def group_failures(errors):
return _group_failures(errors)


def diff_runs(before, after, key="name", regress_factor=1.5):
from je_auto_control.utils.executor.action_executor import _diff_runs
return _diff_runs(before, after, key, regress_factor)


def image_histogram(source=None, bins=32, space="hsv", region=None):
from je_auto_control.utils.executor.action_executor import _image_histogram
return _image_histogram(source, bins, space, region)
Expand Down
4 changes: 4 additions & 0 deletions je_auto_control/utils/run_diff/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
"""Diff two run traces (LCS-aligned step diff: added/removed/flips/regressions)."""
from je_auto_control.utils.run_diff.run_diff import diff_runs, summarize_run_diff

__all__ = ["diff_runs", "summarize_run_diff"]
119 changes: 119 additions & 0 deletions je_auto_control/utils/run_diff/run_diff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
"""Diff two run traces — what changed between two executions of a flow.

A run history tells you a run *failed*, but not *what changed* from the run that
passed: which step was added or dropped, which step flipped pass→fail, which step
got slower. ``run_diff`` aligns the two step sequences with a longest-common-
subsequence walk (so an inserted / removed step shifts the rest into place instead
of mis-pairing everything) and classifies the differences:

* **added** / **removed** steps (present in only one run),
* **status flips** (an aligned step whose status changed — with the new failure's
:func:`failure_signature` when it carries an ``error``),
* **timing regressions** (an aligned step that got ``regress_factor`` x slower).

A step is any dict with a name key (default ``"name"``) and optional ``status`` /
``duration`` / ``error``. Pure standard library; no device, no ``PySide6``.
"""
from typing import Any, Dict, List, Sequence

Step = Dict[str, Any]


def _lcs_pairs(left: Sequence[str], right: Sequence[str]) -> List[tuple]:
"""Return aligned ``(i, j)`` index pairs of the longest common subsequence."""
n, m = len(left), len(right)
table = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n - 1, -1, -1):
for j in range(m - 1, -1, -1):
if left[i] == right[j]:
table[i][j] = table[i + 1][j + 1] + 1
else:
table[i][j] = max(table[i + 1][j], table[i][j + 1])
pairs, i, j = [], 0, 0
while i < n and j < m:
if left[i] == right[j]:
pairs.append((i, j))
i, j = i + 1, j + 1
elif table[i + 1][j] >= table[i][j + 1]:
i += 1
else:
j += 1
return pairs


def _status_flip(before: Step, after: Step, name: str) -> Dict[str, Any]:
"""Build a status-flip record, attaching a signature for a new error."""
flip = {"name": name, "from": before.get("status"),
"to": after.get("status")}
error = after.get("error")
if error:
from je_auto_control.utils.failure_signature import failure_signature
flip["signature"] = failure_signature(str(error))
return flip


def _regression(before: Step, after: Step, name: str,
factor: float) -> Dict[str, Any]:
"""Return a timing-regression record, or ``{}`` if not a regression."""
prev, curr = before.get("duration"), after.get("duration")
if not isinstance(prev, (int, float)) or not isinstance(curr, (int, float)):
return {}
if prev > 0 and curr >= prev * float(factor):
return {"name": name, "before": float(prev), "after": float(curr),
"ratio": round(curr / prev, 3)}
return {}


def _aligned_changes(before: Sequence[Step], after: Sequence[Step],
names: Sequence[str], pairs: List[tuple],
factor: float) -> tuple:
"""Classify the LCS-aligned pairs into (status flips, timing regressions)."""
flips, regressions = [], []
for i, j in pairs:
if before[i].get("status") != after[j].get("status"):
flips.append(_status_flip(before[i], after[j], names[i]))
regression = _regression(before[i], after[j], names[i], factor)
if regression:
regressions.append(regression)
return flips, regressions


def _keys(steps: Sequence[Step], key: str) -> List[str]:
return [str(step.get(key, "")) for step in steps]


def _unmatched(steps: Sequence[Step], matched: set) -> List[Step]:
return [steps[k] for k in range(len(steps)) if k not in matched]


def diff_runs(before: Sequence[Step], after: Sequence[Step], *,
key: str = "name", regress_factor: float = 1.5) -> Dict[str, Any]:
"""Diff two step sequences into ``{added, removed, status_flips,
timing_regressions, aligned, identical}``.

Steps are aligned by their ``key`` value via LCS; ``regress_factor`` is the
slowdown ratio that counts as a timing regression.
"""
left, right = _keys(before, key), _keys(after, key)
pairs = _lcs_pairs(left, right)
flips, regressions = _aligned_changes(before, after, left, pairs,
regress_factor)
added = _unmatched(after, {j for _, j in pairs})
removed = _unmatched(before, {i for i, _ in pairs})
return {"added": added, "removed": removed, "status_flips": flips,
"timing_regressions": regressions, "aligned": len(pairs),
"identical": not any((added, removed, flips, regressions))}


def summarize_run_diff(diff: Dict[str, Any]) -> str:
"""Render a one-line human summary of a :func:`diff_runs` result."""
if diff.get("identical"):
return "no change"
parts = []
for label, field in (("+{} added", "added"), ("-{} removed", "removed"),
("{} status flip(s)", "status_flips"),
("{} regression(s)", "timing_regressions")):
count = len(diff.get(field, []))
if count:
parts.append(label.format(count))
return ", ".join(parts)
Loading
Loading