Iterative answer pipeline #1

Merged
bazarkua merged 6 commits into main from iterative-answer-pipeline on Jun 8, 2026
Conversation

@bazarkua bazarkua commented Jun 8, 2026

Collaborator

No description provided.

Copilot AI review requested due to automatic review settings June 8, 2026 16:32
@bazarkua bazarkua merged commit 5e2e3ad into main Jun 8, 2026
@bazarkua bazarkua deleted the iterative-answer-pipeline branch June 8, 2026 16:33

Copilot AI left a comment

Pull request overview

This PR introduces an iterative (chunked) Stage 11 answer-selection mode: the LLM reads the full sorted PloverDB response in token-budgeted chunks instead of relying solely on top‑N truncation. It also adds related guardrails and ranking improvements intended to reduce hallucinations and improve answer quality.

Changes:

  • Add chunking infrastructure and an iterative Stage 11 loop that reads the sorted (untruncated) Plover response in chunks and validates carry-forward picks.
  • Strengthen ranking/faithfulness signals: demote text-mined edges via a new source tier, incorporate per-candidate graph coverage into NameRes reranking, and restrict answer graph views by cited supporting edges.
  • Add new tests and a golden_eval CLI for running/scoring gold questions; extend configuration to include per-model context windows and new feature flags.
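As a rough illustration of the first bullet, the sketch below groups already-sorted rows into token-budgeted chunks and then validates picks carried across chunks. It is not the code from this PR: the names `estimate_tokens`, `chunk_by_token_budget`, and `validate_picks`, the 4-characters-per-token heuristic, and the rule that a pick must refer to an ID already shown to the model are all assumptions made for the example.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate. Assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


def chunk_by_token_budget(rows: list[str], budget: int) -> list[list[str]]:
    """Group sorted rows into consecutive chunks that fit a token budget.

    Row order is preserved. A row that exceeds the budget on its own is
    placed in a chunk by itself instead of being dropped.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for row in rows:
        cost = estimate_tokens(row)
        # Close the current chunk before it would overflow the budget.
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def validate_picks(
    carried: list[str], new_picks: list[str], seen_ids: set[str]
) -> list[str]:
    """Merge carried-forward and new picks.

    Hypothetical rule: keep only IDs that appeared in a chunk the model
    has already read, and drop duplicates while preserving first-seen order.
    """
    merged: list[str] = []
    emitted: set[str] = set()
    for pick in [*carried, *new_picks]:
        if pick in seen_ids and pick not in emitted:
            merged.append(pick)
            emitted.add(pick)
    return merged
```

In a loop over the chunks, `seen_ids` would grow with each chunk's IDs, and the output of `validate_picks` would be passed to the next iteration as `carried`, so an ID the model invents never survives to the final answer.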

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pipeline/config.yaml | Adds per-model context_window, Stage 11 iterative config, and scope-check toggle. |
| pipeline/code/tests/test_smoke.py | Updates expected pipeline status set (adds arity failure, removes low-confidence hard-fail). |
| pipeline/code/tests/test_response_reduction.py | Updates/extends reduction sorting tests for source-tier demotion behavior. |
| pipeline/code/tests/test_rerank_coverage_tier.py | New unit tests pinning coverage-tier ordering in candidate reranking. |
| pipeline/code/tests/test_query_arity.py | New unit tests for one-hop query-graph arity gate. |
| pipeline/code/tests/test_load_questions.py | New test ensuring gold question IDs are normalized for runner usage. |
| pipeline/code/tests/test_iterative_answer_pick.py | New unit tests for iterative pick validation (carry-forward + dedupe). |
| pipeline/code/tests/test_chunking.py | New unit tests for chunking correctness and truncation behavior. |
| pipeline/code/tests/test_answer_graph_view.py | Adds tests ensuring supporting-edge IDs restrict the graph view (faithfulness boundary). |
| pipeline/code/runner.py | Records context_window in run metadata. |
| pipeline/code/reduction.py | Adds source-tier demotion, refactors parsing/rebuild for reuse, and implements response chunking. |
| pipeline/code/prompts.py | Updates NameRes coverage field naming and adds the iterative Stage 11 system prompt; adjusts explainer guidance. |
| pipeline/code/pipeline.py | Implements optional scope-check, query arity gate, coverage-aware rerank wiring, iterative Stage 11 loop, and supporting-edge filtering for graph view. |
| pipeline/code/golden_eval.py | New CLI tool to run gold questions and score recall/correctness signals. |
| pipeline/code/config.py | Adds config schema for context_window, Stage 11 iterative config, scope-check toggle, and normalizes loaded gold question IDs. |
| .gitignore | Ignores additional local planning/audit docs. |
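The source-tier demotion listed for pipeline/code/reduction.py can be pictured as a two-part sort key: tier first, score second. The sketch below is only a guess at the shape of that idea. The field names `primary_source` and `score`, the contents of `TEXT_MINED_SOURCES`, and the function names are hypothetical and are not taken from the PR.

```python
# Illustrative assumption: sources treated as text-mined for this example.
TEXT_MINED_SOURCES = {"infores:semmeddb"}


def source_tier(edge: dict) -> int:
    """Return 0 for edges from other sources and 1 for text-mined edges."""
    return 1 if edge.get("primary_source") in TEXT_MINED_SOURCES else 0


def sort_edges(edges: list[dict]) -> list[dict]:
    """Sort by tier first, then by descending score within a tier.

    A text-mined edge therefore ranks below every edge in tier 0,
    regardless of how high its own score is.
    """
    return sorted(
        edges,
        key=lambda e: (source_tier(e), -float(e.get("score", 0.0))),
    )
```

Because the tier is the leading component of the key, demotion is strict: a high-scoring text-mined edge cannot outrank a low-scoring edge from another source. A softer variant would instead subtract a penalty from the score.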

@@ -341,123 +374,171 @@ def reduce_plover_response(
    if not isinstance(aux_graphs_in, dict):
        aux_graphs_in = {}
    query_graph = message.get("query_graph") or {"nodes": {}, "edges": {}}
Comment thread pipeline/code/pipeline.py
Comment on lines +1943 to +1949
picked_edge_ids: set[str] = {
    eid
    for answer in (answer_obj.get("answers") or [])
    if isinstance(answer, dict)
    for eid in (answer.get("supporting_edge_ids") or [])
    if isinstance(eid, str)
}
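To show how a set like `picked_edge_ids` could restrict the answer graph view, here is a minimal sketch. The comprehension mirrors the commented lines above. The wrapper `collect_picked_edge_ids`, the helper `restrict_graph_view`, and the choice to return an empty view when nothing is cited are assumptions for illustration, not the PR's actual behavior.

```python
def collect_picked_edge_ids(answer_obj: dict) -> set[str]:
    """Gather every string edge ID cited by any well-formed answer.

    Non-dict answers, missing or null ID lists, and non-string IDs
    are skipped rather than raising.
    """
    return {
        eid
        for answer in (answer_obj.get("answers") or [])
        if isinstance(answer, dict)
        for eid in (answer.get("supporting_edge_ids") or [])
        if isinstance(eid, str)
    }


def restrict_graph_view(
    edges: dict[str, dict], picked: set[str]
) -> dict[str, dict]:
    """Keep only the edges the answers cite (hypothetical helper).

    Assumption: with no cited edges, the view is empty rather than
    falling back to the full graph.
    """
    return {eid: edge for eid, edge in edges.items() if eid in picked}
```

Cited IDs that are absent from the graph are ignored by the filter, so a hallucinated edge ID cannot add anything to the view.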
Comment thread pipeline/config.yaml
# ever sees them).
stage11:
  iterative:
    enabled: true
Comment thread pipeline/config.yaml
# to entity extraction; scope is checked manually. re-enable for the
# irrelevant-probe benchmark or to refuse chit-chat in the live UI.
scope_check:
  enabled: true
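The two toggles in these config fragments could be read defensively along the following lines. This is a sketch over an already-parsed config dictionary, not the schema in pipeline/code/config.py. The `FeatureFlags` class, the `read_flags` function, the `False` defaults, and the assumption that `scope_check` sits at the top level of the file are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlags:
    """Hypothetical container for the two toggles shown above."""

    stage11_iterative: bool
    scope_check: bool


def read_flags(cfg: dict) -> FeatureFlags:
    """Read the flags from a parsed config mapping.

    Assumption: a missing or null section means the feature is off.
    """
    stage11 = cfg.get("stage11") or {}
    iterative = stage11.get("iterative") or {}
    scope = cfg.get("scope_check") or {}
    return FeatureFlags(
        stage11_iterative=bool(iterative.get("enabled", False)),
        scope_check=bool(scope.get("enabled", False)),
    )
```

Defaulting to off keeps an older config file, written before these keys existed, on the previous behavior. A project that wants new features on by default would flip those defaults.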
2 participants