
Release: round-13 batch 3 (3 features, v175-v177)#392

Merged
JE-Chen merged 6 commits into main from dev
Jun 23, 2026

JE-Chen (Member) commented Jun 23, 2026

Release — settle / outline / critic

Ships 3 net-new features (#389#391, docs v175–v177) to main, all merged to dev CI-green (SonarCloud + Codacy issues=0 + Actions matrices + Docker headless). Each ships the full 5-layer surface + headless test + EN/Zh docs + changelog.

  • settle_detector (v175) — settle decision as a pure seam over a churn series (offline-testable, vs the loop-bound wait_until_screen_stable).
  • heading_segment (v176) — heading vs body classification by line height + document outline.
  • critic_features (v177) — per-step critic feature bundle (effect + delta + postcondition) with a rule-based step scorer.

Merge with --merge (no branch delete; dev stays the working branch).

JE-Chen added 6 commits June 24, 2026 06:13
smart_waits.wait_until_screen_stable bakes the settle logic inside a time.sleep
loop over live frames, so you can't feed it a recorded series or unit-test the
decision. Extract it: given a stream of churn values, report when churn stayed
<= max_churn for quiet_samples in a row (a spike resets the run). SettleTracker
is the incremental form. Pure-stdlib, no clock, no capture.
…tor-batch

Add settle_detector: settle decision as a pure seam over a churn series
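
A minimal sketch of the settle decision described in the commit message above, not the shipped module. Only `SettleTracker`, `max_churn` and `quiet_samples` come from the PR text; the `update` method, the `settle_index` helper and all signatures are assumptions made for illustration.

```python
from typing import Iterable, Optional


class SettleTracker:
    """Incremental settle decision: feed one churn value at a time.

    Hypothetical interface; only the class name and the two parameters
    are taken from the PR description.
    """

    def __init__(self, max_churn: float, quiet_samples: int) -> None:
        if quiet_samples < 1:
            raise ValueError("quiet_samples must be >= 1")
        self.max_churn = max_churn
        self.quiet_samples = quiet_samples
        self._run = 0  # length of the current quiet run

    def update(self, churn: float) -> bool:
        """Record one sample; return True once the series has settled."""
        if churn <= self.max_churn:
            self._run += 1
        else:
            self._run = 0  # a spike resets the run
        return self._run >= self.quiet_samples


def settle_index(
    series: Iterable[float], max_churn: float, quiet_samples: int
) -> Optional[int]:
    """Return the index of the sample that completes the quiet run, or None.

    No clock and no screen capture: the input is any recorded churn series.
    """
    tracker = SettleTracker(max_churn, quiet_samples)
    for index, churn in enumerate(series):
        if tracker.update(churn):
            return index
    return None
```

Because the function takes a plain iterable, a recorded series can be replayed in a unit test without sleeping or grabbing frames, which is the point of pulling the decision out of the wait loop.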
Nothing mapped line height to heading levels or built a section outline;
ocr/structure and element_parse are positional and text_blocks doesn't rank.
Apply the standard heuristic: a line taller than heading_ratio x the median
line height is a heading, and distinct heading heights become levels (tallest =
1). classify_lines tags each line; outline returns the headings in order.
…ent-batch

Add heading_segment: heading vs body classification + document outline
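
The heuristic in the commit message above can be sketched as follows. This is an illustration under assumptions, not the shipped code: `classify_lines`, `outline` and `heading_ratio` are named in the PR, but the input shape (a list of `(text, height)` pairs), the `TaggedLine` record, the default ratio of 1.3 and the return types are hypothetical.

```python
from statistics import median
from typing import List, NamedTuple, Sequence, Tuple


class TaggedLine(NamedTuple):
    text: str
    height: float
    level: int  # 0 = body text, 1 = tallest heading, 2 = next tallest, ...


def classify_lines(
    lines: Sequence[Tuple[str, float]], heading_ratio: float = 1.3
) -> List[TaggedLine]:
    """Tag each (text, height) line as body (0) or a heading level (>= 1)."""
    if not lines:
        return []
    threshold = heading_ratio * median(height for _, height in lines)
    # Distinct heading heights, tallest first, become levels 1, 2, 3, ...
    heading_heights = sorted(
        {height for _, height in lines if height > threshold}, reverse=True
    )
    level_of = {height: rank + 1 for rank, height in enumerate(heading_heights)}
    return [
        TaggedLine(text, height, level_of.get(height, 0)) for text, height in lines
    ]


def outline(
    lines: Sequence[Tuple[str, float]], heading_ratio: float = 1.3
) -> List[Tuple[int, str]]:
    """Return (level, text) for the headings only, in document order."""
    return [
        (tagged.level, tagged.text)
        for tagged in classify_lines(lines, heading_ratio)
        if tagged.level > 0
    ]
```

Real OCR heights are noisy, so a production version would probably bucket near-equal heights before assigning levels; this sketch treats every distinct height as its own level.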
trajectory_eval scores a whole trajectory with no per-step evidence;
agent_trace emits spans not quality; agent_replay stores steps but doesn't
score. Compose action_effect + observation_delta + postcondition into one
per-step record, then a deterministic rule-based scorer gives
{outcome, process_score, reasons} (no model), and to_judge_prompt renders it for
an optional LLM-as-judge.
…res-batch

Add critic_features: per-step critic bundle + rule-based scorer
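
A rough sketch of the per-step record and rule-based scorer described in the commit message above. The result keys `outcome`, `process_score` and `reasons` and the name `to_judge_prompt` come from the PR; the `StepFeatures` fields, the `score_step` name, the weights and the prompt wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class StepFeatures:
    """One step's evidence (hypothetical field names)."""

    step: int
    action_had_effect: bool    # from action_effect
    observation_changed: bool  # from observation_delta
    postcondition_met: bool    # from postcondition


# (attribute, weight, reason recorded when the check fails); weights are assumed.
_RULES = (
    ("action_had_effect", 0.25, "action had no visible effect"),
    ("observation_changed", 0.25, "observation did not change"),
    ("postcondition_met", 0.5, "postcondition not met"),
)


def score_step(features: StepFeatures) -> Dict[str, Any]:
    """Deterministic scorer: same features in, same verdict out, no model."""
    score = 0.0
    reasons: List[str] = []
    for attribute, weight, failure_reason in _RULES:
        if getattr(features, attribute):
            score += weight
        else:
            reasons.append(failure_reason)
    outcome = "pass" if features.postcondition_met else "fail"
    return {"outcome": outcome, "process_score": score, "reasons": reasons}


def to_judge_prompt(features: StepFeatures, scored: Dict[str, Any]) -> str:
    """Render the record as text for an optional LLM-as-judge."""
    reasons = "; ".join(scored["reasons"]) or "none"
    return (
        f"Step {features.step}: effect={features.action_had_effect}, "
        f"delta={features.observation_changed}, "
        f"postcondition={features.postcondition_met}. "
        f"Rule-based outcome={scored['outcome']}, "
        f"process_score={scored['process_score']:.2f}, issues: {reasons}."
    )
```

Keeping the scorer free of any model call means the per-step verdict can be asserted in headless tests, while the prompt renderer stays an optional extra layer.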
@codacy-production


Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 74 complexity · 0 duplication

| Metric | Results |
| --- | --- |
| Complexity | 74 |
| Duplication | 0 |


JE-Chen merged commit 7d2604f into main Jun 23, 2026
31 checks passed