6 changes: 6 additions & 0 deletions WHATS_NEW.md
@@ -1,5 +1,11 @@
# What's New — AutoControl

## What's new (2026-06-25) — Per-Run Step Timeline (waterfall + bottleneck steps)

Read why *this* run was slow — a step waterfall and its bottlenecks. Full reference: [`docs/source/Eng/doc/new_features/v194_features_doc.rst`](docs/source/Eng/doc/new_features/v194_features_doc.rst).

- **`build_timeline` / `critical_steps`** (`AC_build_timeline`, `AC_critical_steps`): the action profiler aggregates timings by step *name* across runs — useless for "why was *this* run slow". This turns one run's ordered steps into a waterfall (each step's offset, duration, and `pct` share of the total) with the `bottleneck` step and a `parallelism` ratio (`> 1` when steps overlap via explicit `start` times); `critical_steps` ranks the dominant steps to optimise. A step is any `{name, duration, start?}` dict. Pure stdlib. No `PySide6`.

## What's new (2026-06-25) — Flaky-Test Co-Failure Clustering

Find the tests that flake *together* — and the shared root cause behind them. Full reference: [`docs/source/Eng/doc/new_features/v193_features_doc.rst`](docs/source/Eng/doc/new_features/v193_features_doc.rst).
51 changes: 51 additions & 0 deletions docs/source/Eng/doc/new_features/v194_features_doc.rst
@@ -0,0 +1,51 @@
Per-Run Step Timeline (waterfall + bottleneck steps)
====================================================

The action profiler aggregates timings by step *name* across many runs — great
for "which action is slow on average", useless for "why was *this* run slow". A
single run is an ordered timeline: step A ran, then B, then C, and one of them
dominated. ``step_timeline`` turns one run's steps into a waterfall (each step's
offset from the start, its duration and its share of the total) and ranks the
bottleneck steps, so you can read a single slow run instead of an average.

* :func:`build_timeline`: the waterfall plus total / busy / bottleneck /
  parallelism,
* :func:`critical_steps`: the steps that dominate the run, longest first.

A step is any dict with a name (default ``"name"``) and a ``duration``; an
optional ``start`` places it on an absolute timeline (overlapping / parallel
steps), else steps are laid out back-to-back. Pure standard library; no device,
no ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import build_timeline, critical_steps

    steps = [{"name": "login", "duration": 1.0},
             {"name": "load_dashboard", "duration": 4.0},
             {"name": "submit", "duration": 1.0}]

    build_timeline(steps)
    # {"steps": [{"name": "login", "offset": 0.0, "duration": 1.0, "pct": 16.7},
    #            {"name": "load_dashboard", "offset": 1.0, ..., "pct": 66.7}, ...],
    #  "total": 6.0, "busy": 6.0,
    #  "bottleneck": {"name": "load_dashboard", "duration": 4.0},
    #  "parallelism": 1.0}

    critical_steps(steps, top=2)
    # [{"name": "load_dashboard", "duration": 4.0, "pct": 66.7},
    #  {"name": "login", "duration": 1.0, "pct": 16.7}]

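``total`` is the wall-clock span (latest end minus earliest start) and ``busy`` is
the summed step time. ``parallelism`` = busy / total is ``1.0`` for a back-to-back
sequential run, ``> 1`` when steps overlap (supply ``start`` times), and ``< 1``
when explicit ``start`` times leave idle gaps. In :func:`build_timeline`, ``pct``
is each step's share of ``total``; in :func:`critical_steps` it is the share of
``busy``, so the two figures differ once steps overlap or leave gaps.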
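
To make these figures concrete, here is a minimal self-contained sketch of the arithmetic. ``summarize`` is a hypothetical helper written for illustration, not part of ``je_auto_control``; it assumes the default ``start`` / ``duration`` keys and mirrors the layout rule that a step without ``start`` begins at the latest end seen so far.

```python
from typing import Any, Dict, List


def summarize(steps: List[Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical helper: total / busy / parallelism for one run."""
    cursor = 0.0
    placed = []  # (start, end, duration) per step
    for step in steps:
        duration = float(step.get("duration", 0.0) or 0.0)
        raw_start = step.get("start")
        # No explicit start: place the step at the latest end seen so far.
        start = float(raw_start) if raw_start is not None else cursor
        end = start + duration
        cursor = max(cursor, end)
        placed.append((start, end, duration))
    if not placed:
        return {"total": 0.0, "busy": 0.0, "parallelism": 0.0}
    total = max(end for _, end, _ in placed) - min(start for start, _, _ in placed)
    busy = sum(duration for _, _, duration in placed)
    parallelism = round(busy / total, 3) if total > 0 else 1.0
    return {"total": total, "busy": busy, "parallelism": parallelism}


sequential = [{"name": "login", "duration": 1.0},
              {"name": "load_dashboard", "duration": 4.0},
              {"name": "submit", "duration": 1.0}]
overlapping = [{"name": "a", "start": 0.0, "duration": 3.0},
               {"name": "b", "start": 1.0, "duration": 3.0}]
gapped = [{"name": "a", "start": 0.0, "duration": 1.0},
          {"name": "b", "start": 5.0, "duration": 1.0}]

print(summarize(sequential)["parallelism"])   # 1.0   (span 6, busy 6)
print(summarize(overlapping)["parallelism"])  # 1.5   (span 4, busy 6)
print(summarize(gapped)["parallelism"])       # 0.333 (span 6, busy 2)
```

A ratio below ``1.0`` therefore points at idle time between steps rather than at any single slow step.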

Executor commands
-----------------

``AC_build_timeline`` (``steps``) and ``AC_critical_steps`` (``steps`` / ``top``).
They are exposed as read-only ``ac_*`` MCP tools and as Script Builder commands
under **Testing**.
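
The executor adapters in this change accept ``steps`` either as an already-decoded list or as JSON text (the Script Builder field is a string), and coerce ``top`` with ``int``. A rough sketch of that normalisation, using a hypothetical ``coerce_command_args`` helper that is not part of the package:

```python
import json
from typing import Any, Dict, List, Tuple


def coerce_command_args(steps: Any, top: Any = 3) -> Tuple[List[Dict[str, Any]], int]:
    """Hypothetical helper: normalise executor-style arguments."""
    if isinstance(steps, str):
        # Script Builder passes the field as JSON text.
        steps = json.loads(steps)
    return list(steps), int(top)


from_builder = coerce_command_args('[{"name": "login", "duration": 1.2}]', top="2")
from_python = coerce_command_args([{"name": "login", "duration": 1.2}])
print(from_builder)  # ([{'name': 'login', 'duration': 1.2}], 2)
print(from_python)   # ([{'name': 'login', 'duration': 1.2}], 3)
```

Malformed JSON raises ``json.JSONDecodeError`` in this sketch; how the real executor reports that error is not shown in this change.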
44 changes: 44 additions & 0 deletions docs/source/Zh/doc/new_features/v194_features_doc.rst
@@ -0,0 +1,44 @@
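Per-Run Step Timeline (waterfall + bottleneck steps)
====================================================

The action profiler aggregates timings by step *name* across many runs: well suited
to "which action is slow on average", but no help with "why was *this* run slow". A
single run is an ordered timeline: step A finishes, then B, then C, and one of them
dominates the time. ``step_timeline`` turns one run's steps into a waterfall (each
step's offset from the start, its duration, and its share of the total time) and
ranks the bottleneck steps, so you can read a single slow run rather than an average.

* :func:`build_timeline`: the waterfall plus total / busy / bottleneck / parallelism,
* :func:`critical_steps`: the steps that dominate the run, longest first.

A step can be any dict with a name (default ``"name"``) and a ``duration``; an
optional ``start`` places it on an absolute timeline (overlapping / parallel steps),
otherwise steps are laid out back-to-back. Pure standard library; no device is
involved and ``PySide6`` is not imported.

Headless API
------------

.. code-block:: python

    from je_auto_control import build_timeline, critical_steps

    steps = [{"name": "login", "duration": 1.0},
             {"name": "load_dashboard", "duration": 4.0},
             {"name": "submit", "duration": 1.0}]

    build_timeline(steps)
    # {"steps": [{"name": "login", "offset": 0.0, "duration": 1.0, "pct": 16.7},
    #            {"name": "load_dashboard", "offset": 1.0, ..., "pct": 66.7}, ...],
    #  "total": 6.0, "busy": 6.0,
    #  "bottleneck": {"name": "load_dashboard", "duration": 4.0},
    #  "parallelism": 1.0}

    critical_steps(steps, top=2)
    # [{"name": "load_dashboard", "duration": 4.0, "pct": 66.7},
    #  {"name": "login", "duration": 1.0, "pct": 16.7}]

``total`` is the wall-clock span and ``busy`` is the sum of the step durations;
``parallelism`` = busy / total is ``1.0`` for a purely sequential run and ``> 1``
when steps overlap (this requires ``start`` times). ``pct`` is each step's share of
the total time.

Executor commands
-----------------

``AC_build_timeline`` (``steps``) and ``AC_critical_steps`` (``steps`` / ``top``).
Both are provided as read-only ``ac_*`` MCP tools and as Script Builder commands
(under the **Testing** category).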
3 changes: 3 additions & 0 deletions je_auto_control/__init__.py
@@ -96,6 +96,8 @@
from je_auto_control.utils.run_diff import diff_runs, summarize_run_diff
# Flaky-test co-failure clustering (Jaccard over shared failing runs)
from je_auto_control.utils.flake_cluster import cofailure_pairs, failure_clusters
# Per-run step waterfall + bottleneck (critical) steps
from je_auto_control.utils.step_timeline import build_timeline, critical_steps
# VLM element locator (headless)
from je_auto_control.utils.vision import (
VLMNotAvailableError, click_by_description, locate_by_description,
@@ -1676,6 +1678,7 @@ def start_autocontrol_gui(*args, **kwargs):
"normalize_error", "failure_signature", "group_failures",
"diff_runs", "summarize_run_diff",
"cofailure_pairs", "failure_clusters",
"build_timeline", "critical_steps",
# VLM locator
"VLMNotAvailableError", "locate_by_description", "click_by_description",
"verify_description",
15 changes: 15 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -2755,6 +2755,21 @@ def _add_audit_specs(specs: List[CommandSpec]) -> None:
),
description="Test pairs that fail together above a Jaccard threshold.",
))
    specs.append(CommandSpec(
        "AC_build_timeline", "Testing", "Step Timeline (waterfall)",
        fields=(FieldSpec("steps", FieldType.STRING,
                          placeholder='[{"name": "login", "duration": 1.2}]'),),
        description="Per-run step waterfall: offsets, durations, bottleneck.",
    ))
    specs.append(CommandSpec(
        "AC_critical_steps", "Testing", "Critical (Bottleneck) Steps",
        fields=(
            FieldSpec("steps", FieldType.STRING,
                      placeholder='[{"name": "login", "duration": 1.2}]'),
            FieldSpec("top", FieldType.INT, optional=True, default=3),
        ),
        description="The steps that dominate a run's time, longest first.",
    ))
specs.append(CommandSpec(
"AC_scan_secrets", "Tools", "Scan for Hardcoded Secrets",
description="Scan 'data' (JSON view) for hardcoded secrets that "
20 changes: 20 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -4402,6 +4402,24 @@ def _cofailure_pairs(runs: Any, threshold: Any = 0.5) -> Dict[str, Any]:
return {"pairs": pairs, "count": len(pairs)}


def _build_timeline(steps: Any) -> Dict[str, Any]:
    """Adapter: a per-run step waterfall (offsets / durations / bottleneck)."""
    import json
    from je_auto_control.utils.step_timeline import build_timeline
    # Script Builder passes ``steps`` as JSON text; decode it first.
    if isinstance(steps, str):
        steps = json.loads(steps)
    return build_timeline(steps)


def _critical_steps(steps: Any, top: Any = 3) -> Dict[str, Any]:
    """Adapter: the steps that dominate a run's time (bottlenecks)."""
    import json
    from je_auto_control.utils.step_timeline import critical_steps
    if isinstance(steps, str):
        steps = json.loads(steps)
    # Wrap the ranked list in a dict so the command returns a mapping.
    return {"steps": critical_steps(steps, top=int(top))}


def _image_histogram(source: Any = None, bins: Any = 32, space: str = "hsv",
region: Any = None) -> Dict[str, Any]:
"""Adapter: per-channel colour histogram of an image / the screen."""
@@ -6635,6 +6653,8 @@ def __init__(self):
"AC_diff_runs": _diff_runs,
"AC_failure_clusters": _failure_clusters,
"AC_cofailure_pairs": _cofailure_pairs,
"AC_build_timeline": _build_timeline,
"AC_critical_steps": _critical_steps,
"AC_image_histogram": _image_histogram,
"AC_histogram_changed": _histogram_changed,
"AC_changed_regions": _changed_regions,
24 changes: 24 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_factories.py
@@ -7720,6 +7720,30 @@ def flakiness_tools() -> List[MCPTool]:
handler=h.cofailure_pairs,
annotations=READ_ONLY,
),
        MCPTool(
            name="ac_build_timeline",
            description=("Per-run step waterfall from 'steps' (list of {name,"
                         "duration,start?}): {steps:[{name,offset,duration,pct}], "
                         "total, busy, bottleneck, parallelism}. Reads ONE slow "
                         "run, not a per-name average."),
            input_schema=schema({
                "steps": {"type": "array", "items": {"type": "object"}}},
                required=["steps"]),
            handler=h.build_timeline,
            annotations=READ_ONLY,
        ),
        MCPTool(
            name="ac_critical_steps",
            description=("The 'top' steps that dominate a run's time (bottlenecks "
                         "to optimise): {steps:[{name,duration,pct}]}, longest "
                         "first."),
            input_schema=schema({
                "steps": {"type": "array", "items": {"type": "object"}},
                "top": {"type": "integer"}},
                required=["steps"]),
            handler=h.critical_steps,
            annotations=READ_ONLY,
        ),
]
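
For a sense of what a client would send to these tools, the sketch below builds an MCP ``tools/call`` request for ``ac_critical_steps`` and checks the arguments against the declared input schema. It assumes that ``schema(...)`` expands to an ordinary JSON Schema object of the shape shown; ``CRITICAL_STEPS_SCHEMA`` and ``check_arguments`` are hypothetical names for illustration, and the check covers only the keywords used here.

```python
import json
from typing import Any, Dict, List

# Assumption: schema(...) yields a plain JSON Schema object like this one.
CRITICAL_STEPS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": {"type": "object"}},
        "top": {"type": "integer"},
    },
    "required": ["steps"],
}

_PY_TYPES = {"array": list, "object": dict, "integer": int}


def check_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Hand-rolled check covering only the keywords used in the schema above."""
    problems = [f"missing required field: {key}"
                for key in schema["required"] if key not in arguments]
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # unknown keys are ignored in this sketch
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key}: expected {spec['type']}")
        elif spec["type"] == "array":
            item_type = _PY_TYPES[spec["items"]["type"]]
            if not all(isinstance(item, item_type) for item in value):
                problems.append(f"{key}: every item must be {spec['items']['type']}")
    return problems


arguments = {"steps": [{"name": "login", "duration": 1.0}], "top": 1}
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "ac_critical_steps", "arguments": arguments}}

print(check_arguments(arguments, CRITICAL_STEPS_SCHEMA))      # []
print(check_arguments({"top": "1"}, CRITICAL_STEPS_SCHEMA))
# ['missing required field: steps', 'top: expected integer']
print(json.dumps(request))
```

Because the schema only requires each step to be an object, a step without ``duration`` passes this check; the timeline code then treats the missing duration as ``0.0``.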


10 changes: 10 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -2568,6 +2568,16 @@ def cofailure_pairs(runs, threshold=0.5):
return _cofailure_pairs(runs, threshold)


def build_timeline(steps):
    from je_auto_control.utils.executor.action_executor import _build_timeline
    return _build_timeline(steps)


def critical_steps(steps, top=3):
    from je_auto_control.utils.executor.action_executor import _critical_steps
    return _critical_steps(steps, top)


def image_histogram(source=None, bins=32, space="hsv", region=None):
from je_auto_control.utils.executor.action_executor import _image_histogram
return _image_histogram(source, bins, space, region)
6 changes: 6 additions & 0 deletions je_auto_control/utils/step_timeline/__init__.py
@@ -0,0 +1,6 @@
"""Per-run step waterfall timeline + bottleneck (critical) step ranking."""
from je_auto_control.utils.step_timeline.step_timeline import (
    build_timeline, critical_steps,
)

__all__ = ["build_timeline", "critical_steps"]
75 changes: 75 additions & 0 deletions je_auto_control/utils/step_timeline/step_timeline.py
@@ -0,0 +1,75 @@
"""Build a per-run step waterfall and find the run's bottleneck steps.

The action profiler aggregates timings by step *name* across many runs — great for
"which action is slow on average", useless for "why was *this* run slow". A single
run is an ordered timeline: step A ran, then B, then C, and one of them dominated.
``step_timeline`` turns one run's steps into a waterfall (each step's offset from
the start, duration and share of the total) and ranks the bottleneck steps, so you
can read a single slow run instead of an average.

A step is any dict with a name (default ``"name"``) and a ``duration``; an optional
``start`` places it on an absolute timeline (overlapping / parallel steps), else
steps are laid out back-to-back. Pure standard library; no device, no ``PySide6``.
"""
from typing import Any, Dict, List, Sequence

Step = Dict[str, Any]


def _normalize(steps: Sequence[Step], name_key: str, start_key: str,
               duration_key: str) -> List[Dict[str, Any]]:
    """Resolve each step to ``{name, start, end, duration}`` (sequential if no start)."""
    resolved, cursor = [], 0.0
    for step in steps:
        # A missing or None duration counts as 0.0.
        duration = float(step.get(duration_key, 0.0) or 0.0)
        raw_start = step.get(start_key)
        # Without an explicit start, the step begins at the latest end seen so far.
        start = float(raw_start) if raw_start is not None else cursor
        end = start + duration
        cursor = max(cursor, end)
        resolved.append({"name": str(step.get(name_key, "")), "start": start,
                         "end": end, "duration": duration})
    return resolved


def build_timeline(steps: Sequence[Step], *, name_key: str = "name",
                   start_key: str = "start",
                   duration_key: str = "duration") -> Dict[str, Any]:
    """Return a waterfall timeline for one run.

    ``{steps:[{name, offset, duration, pct}], total, busy, bottleneck,
    parallelism}``: ``total`` is the wall-clock span, ``busy`` the summed step
    time, ``parallelism`` = busy / total (1.0 for a purely sequential run),
    ``bottleneck`` the longest single step, and ``pct`` each step's share of
    ``total``.
    """
    resolved = _normalize(steps, name_key, start_key, duration_key)
    if not resolved:
        return {"steps": [], "total": 0.0, "busy": 0.0, "bottleneck": None,
                "parallelism": 0.0}
    # Offsets are measured from the earliest start, not from zero.
    base = min(step["start"] for step in resolved)
    span = max(step["end"] for step in resolved) - base
    busy = sum(step["duration"] for step in resolved)
    rows = [{"name": step["name"], "offset": round(step["start"] - base, 6),
             "duration": step["duration"],
             "pct": round(step["duration"] / span * 100, 1) if span > 0 else 0.0}
            for step in resolved]
    bottleneck = max(resolved, key=lambda step: step["duration"])
    return {"steps": rows, "total": round(span, 6), "busy": round(busy, 6),
            "bottleneck": {"name": bottleneck["name"],
                           "duration": bottleneck["duration"]},
            "parallelism": round(busy / span, 3) if span > 0 else 1.0}


def critical_steps(steps: Sequence[Step], *, name_key: str = "name",
                   start_key: str = "start", duration_key: str = "duration",
                   top: int = 3) -> List[Dict[str, Any]]:
    """Return the ``top`` steps that dominate the run, longest first.

    Each entry is ``{name, duration, pct}`` where ``pct`` is the step's share of
    the total step time (``busy``, not the wall-clock span). These are the
    bottlenecks worth optimising.
    """
    resolved = _normalize(steps, name_key, start_key, duration_key)
    # Fall back to 1.0 so an all-zero run does not divide by zero.
    busy = sum(step["duration"] for step in resolved) or 1.0
    ranked = sorted(resolved, key=lambda step: step["duration"], reverse=True)
    # ``top`` is clamped to at least 1.
    return [{"name": step["name"], "duration": step["duration"],
             "pct": round(step["duration"] / busy * 100, 1)}
            for step in ranked[:max(1, int(top))]]
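
One consequence of the code above is that ``pct`` uses a different denominator in the two functions: ``build_timeline`` divides by the wall-clock span, ``critical_steps`` by the summed step time. The standalone sketch below reproduces just that arithmetic for an overlapping run; the variable names are illustrative and it does not import the module.

```python
steps = [{"name": "a", "start": 0.0, "duration": 3.0},
         {"name": "b", "start": 1.0, "duration": 3.0}]

# Wall-clock span: latest end minus earliest start.
span = (max(s["start"] + s["duration"] for s in steps)
        - min(s["start"] for s in steps))
# Busy time: plain sum of the durations.
busy = sum(s["duration"] for s in steps)

waterfall_pct = {s["name"]: round(s["duration"] / span * 100, 1) for s in steps}
ranking_pct = {s["name"]: round(s["duration"] / busy * 100, 1) for s in steps}

print(span, busy)     # 4.0 6.0
print(waterfall_pct)  # {'a': 75.0, 'b': 75.0}
print(ranking_pct)    # {'a': 50.0, 'b': 50.0}
```

With overlap, the waterfall shares can add up to more than 100, while the ranking shares always add up to 100 across all steps. For a back-to-back sequential run the two agree.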
75 changes: 75 additions & 0 deletions test/unit_test/headless/test_step_timeline_batch.py
@@ -0,0 +1,75 @@
"""Headless tests for per-run step timeline (waterfall + bottleneck steps)."""
import pytest

import je_auto_control as ac
from je_auto_control.utils.step_timeline import build_timeline, critical_steps


def _sequential():
    return [{"name": "login", "duration": 1.0},
            {"name": "load", "duration": 4.0},
            {"name": "submit", "duration": 1.0}]


def test_sequential_waterfall_offsets_and_bottleneck():
    tl = build_timeline(_sequential())
    offsets = {s["name"]: s["offset"] for s in tl["steps"]}
    assert offsets == {"login": 0.0, "load": 1.0, "submit": 5.0}
    assert tl["total"] == pytest.approx(6.0)
    assert tl["busy"] == pytest.approx(6.0)
    assert tl["parallelism"] == pytest.approx(1.0)  # purely sequential
    assert tl["bottleneck"] == {"name": "load", "duration": 4.0}


def test_pct_share_of_total():
    pct = {s["name"]: s["pct"] for s in build_timeline(_sequential())["steps"]}
    assert pct["load"] == pytest.approx(66.7, abs=0.1)


def test_overlapping_run_reports_parallelism():
    par = [{"name": "a", "start": 0.0, "duration": 3.0},
           {"name": "b", "start": 1.0, "duration": 3.0}]
    tl = build_timeline(par)
    assert tl["total"] == pytest.approx(4.0)  # span 0..4
    assert tl["busy"] == pytest.approx(6.0)  # 3 + 3
    assert tl["parallelism"] == pytest.approx(1.5)  # overlap detected


def test_critical_steps_ranked_with_pct():
    crit = critical_steps(_sequential(), top=2)
    assert [s["name"] for s in crit] == ["load", "login"]  # longest first
    assert crit[0]["pct"] == pytest.approx(66.7, abs=0.1)
    assert len(critical_steps(_sequential(), top=1)) == 1


def test_empty_run():
    assert build_timeline([]) == {"steps": [], "total": 0.0, "busy": 0.0,
                                  "bottleneck": None, "parallelism": 0.0}
    assert critical_steps([]) == []


# --- wiring ---------------------------------------------------------------

def test_executor_paths():
    import json
    from je_auto_control.utils.executor.action_executor import (
        _build_timeline, _critical_steps)
    steps_json = json.dumps(_sequential())
    assert _build_timeline(steps_json)["bottleneck"]["name"] == "load"
    assert _critical_steps(steps_json, top=1)["steps"][0]["name"] == "load"


def test_wiring():
    known = set(ac.executor.known_commands())
    assert {"AC_build_timeline", "AC_critical_steps"} <= known
    from je_auto_control.utils.mcp_server.tools import build_default_tool_registry
    names = {t.name for t in build_default_tool_registry()}
    assert {"ac_build_timeline", "ac_critical_steps"} <= names
    from je_auto_control.gui.script_builder.command_schema import _build_specs
    specs = {s.command for s in _build_specs()}
    assert {"AC_build_timeline", "AC_critical_steps"} <= specs


def test_facade_exports():
    for name in ("build_timeline", "critical_steps"):
        assert hasattr(ac, name) and name in ac.__all__