Iterative answer pipeline #1
Merged
Pull request overview
This PR introduces an iterative (chunked) Stage 11 answer-selection mode so the LLM can read the full sorted PloverDB response in token-budgeted chunks instead of relying solely on top‑N truncation, alongside related guardrails and ranking improvements to reduce hallucinations and improve answer quality.
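A minimal sketch of token-budgeted chunking over an already-sorted response. It assumes a crude whitespace-word count in place of a real tokenizer and truncates any single item that exceeds the budget. `approx_tokens` and `chunk_by_token_budget` are hypothetical names, not the PR's code.

```python
def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (stand-in for a real tokenizer).
    return len(text.split())


def chunk_by_token_budget(items: list[str], budget: int) -> list[list[str]]:
    """Greedily pack sorted items into chunks whose token total stays within budget.

    Input order is preserved, so the highest-ranked items land in the first chunk.
    An item larger than the whole budget is truncated and placed in its own chunk.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for item in items:
        cost = approx_tokens(item)
        if cost > budget:
            # Oversized item: flush what we have, then emit a truncated copy alone.
            if current:
                chunks.append(current)
                current, used = [], 0
            chunks.append([" ".join(item.split()[:budget])])
            continue
        if current and used + cost > budget:
            # Adding this item would overflow the chunk, so start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps the sorted order intact, which matters here because the loop reads the best-ranked results first.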
Changes:
- Add chunking infrastructure and an iterative Stage 11 loop that reads the sorted (untruncated) Plover response in chunks and validates carry-forward picks.
- Strengthen ranking/faithfulness signals: demote text-mined edges via a new source tier, incorporate per-candidate graph coverage into NameRes reranking, and restrict answer graph views by cited supporting edges.
- Add new tests and a `golden_eval` CLI for running/scoring gold questions; extend configuration to include per-model context windows and new feature flags.
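A minimal sketch of the carry-forward validation idea: a pick survives only if it appears in the current chunk or was accepted in an earlier iteration, and duplicates are dropped. `validate_picks`, `iterative_select`, and the `pick_fn` callback (standing in for the LLM call) are hypothetical. This sketch also lets the model drop an earlier pick by omitting it, which the PR does not state either way.

```python
from typing import Callable


def validate_picks(proposed: list[str], chunk_ids: set[str], previous_picks: list[str]) -> list[str]:
    """Keep picks the model has actually seen, in order, without duplicates."""
    allowed = set(chunk_ids) | set(previous_picks)
    accepted: list[str] = []
    seen: set[str] = set()
    for pick in proposed:
        # IDs outside the current chunk and prior picks are treated as hallucinated.
        if pick in allowed and pick not in seen:
            accepted.append(pick)
            seen.add(pick)
    return accepted


def iterative_select(
    chunks: list[list[str]],
    pick_fn: Callable[[list[str], list[str]], list[str]],
) -> list[str]:
    """Feed each chunk plus the running picks to pick_fn and validate its answer."""
    picks: list[str] = []
    for chunk in chunks:
        proposed = pick_fn(chunk, picks)
        picks = validate_picks(proposed, set(chunk), picks)
    return picks
```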
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pipeline/config.yaml | Adds per-model context_window, Stage 11 iterative config, and scope-check toggle. |
| pipeline/code/tests/test_smoke.py | Updates expected pipeline status set (adds arity failure, removes low-confidence hard-fail). |
| pipeline/code/tests/test_response_reduction.py | Updates/extends reduction sorting tests for source-tier demotion behavior. |
| pipeline/code/tests/test_rerank_coverage_tier.py | New unit tests pinning coverage-tier ordering in candidate reranking. |
| pipeline/code/tests/test_query_arity.py | New unit tests for one-hop query-graph arity gate. |
| pipeline/code/tests/test_load_questions.py | New test ensuring gold question IDs are normalized for runner usage. |
| pipeline/code/tests/test_iterative_answer_pick.py | New unit tests for iterative pick validation (carry-forward + dedupe). |
| pipeline/code/tests/test_chunking.py | New unit tests for chunking correctness and truncation behavior. |
| pipeline/code/tests/test_answer_graph_view.py | Adds tests ensuring supporting-edge IDs restrict the graph view (faithfulness boundary). |
| pipeline/code/runner.py | Records context_window in run metadata. |
| pipeline/code/reduction.py | Adds source-tier demotion, refactors parsing/rebuild for reuse, and implements response chunking. |
| pipeline/code/prompts.py | Updates NameRes coverage field naming and adds the iterative Stage 11 system prompt; adjusts explainer guidance. |
| pipeline/code/pipeline.py | Implements optional scope-check, query arity gate, coverage-aware rerank wiring, iterative Stage 11 loop, and supporting-edge filtering for graph view. |
| pipeline/code/golden_eval.py | New CLI tool to run gold questions and score recall/correctness signals. |
| pipeline/code/config.py | Adds config schema for context_window, Stage 11 iterative config, scope-check toggle, and normalizes loaded gold question IDs. |
| .gitignore | Ignores additional local planning/audit docs. |
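A minimal sketch of source-tier demotion in sorting: results backed only by text-mined edges sort after results with at least one non-text-mined source, and score breaks ties within a tier. The `Result` shape, the `TEXT_MINED_SOURCES` set, and its placeholder ID are assumptions for illustration, not the PR's actual tier table.

```python
from dataclasses import dataclass

# Hypothetical placeholder: source IDs treated as text-mined.
TEXT_MINED_SOURCES = {"text-mining-kp"}


@dataclass
class Result:
    node_id: str
    score: float
    edge_sources: list[str]


def source_tier(result: Result) -> int:
    """0 if any supporting edge has a non-text-mined source; 1 for text-mined-only (or no) sources."""
    if any(src not in TEXT_MINED_SOURCES for src in result.edge_sources):
        return 0
    return 1


def sort_results(results: list[Result]) -> list[Result]:
    # Lower tier first, then higher score first.
    return sorted(results, key=lambda r: (source_tier(r), -r.score))
```

Demoting rather than dropping keeps text-mined results available to later chunks while letting curated evidence lead.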
```diff
@@ -341,123 +374,171 @@ def reduce_plover_response(
     if not isinstance(aux_graphs_in, dict):
         aux_graphs_in = {}
     query_graph = message.get("query_graph") or {"nodes": {}, "edges": {}}
```
Comment on lines +1943 to +1949

```python
picked_edge_ids: set[str] = {
    eid
    for answer in (answer_obj.get("answers") or [])
    if isinstance(answer, dict)
    for eid in (answer.get("supporting_edge_ids") or [])
    if isinstance(eid, str)
}
```
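A minimal sketch of how a set like `picked_edge_ids` could restrict the answer graph view: keep only cited edges and the nodes they touch. It assumes a TRAPI-style knowledge graph whose edges carry `subject` and `object` node IDs; `restrict_graph_view` is a hypothetical name, and an empty pick set yields an empty view here (the PR may handle that case differently).

```python
def restrict_graph_view(knowledge_graph: dict, picked_edge_ids: set[str]) -> dict:
    """Return a view containing only cited edges and their endpoint nodes."""
    edges = knowledge_graph.get("edges") or {}
    nodes = knowledge_graph.get("nodes") or {}
    # Unknown IDs in picked_edge_ids simply match nothing.
    kept_edges = {eid: edge for eid, edge in edges.items() if eid in picked_edge_ids}
    endpoint_ids: set[str] = set()
    for edge in kept_edges.values():
        for key in ("subject", "object"):
            node_id = edge.get(key)
            if isinstance(node_id, str):
                endpoint_ids.add(node_id)
    kept_nodes = {nid: node for nid, node in nodes.items() if nid in endpoint_ids}
    return {"nodes": kept_nodes, "edges": kept_edges}
```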
```yaml
# ever sees them).
stage11:
  iterative:
    enabled: true
```
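A minimal sketch of turning a per-model `context_window` into a per-chunk token budget, assuming the budget is what remains after reserving the system prompt, the expected output, and a percentage safety margin. The function name, parameters, and the 10% default are hypothetical.

```python
def chunk_token_budget(
    context_window: int,
    prompt_tokens: int,
    max_output_tokens: int,
    margin_pct: int = 10,
) -> int:
    """Tokens left for response data after fixed reservations (integer math only)."""
    usable = context_window * (100 - margin_pct) // 100
    budget = usable - prompt_tokens - max_output_tokens
    if budget <= 0:
        raise ValueError("context window too small for prompt and output reservations")
    return budget
```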
```yaml
# to entity extraction; scope is checked manually. re-enable for the
# irrelevant-probe benchmark or to refuse chit-chat in the live UI.
scope_check:
  enabled: true
```
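A minimal sketch of a typed schema for the two feature flags above, parsed from the dict a YAML loader would return. The dataclass names and the `False` defaults for missing keys are assumptions, not the PR's config.py.

```python
from dataclasses import dataclass, field


@dataclass
class Stage11Config:
    iterative_enabled: bool = False  # assumed default when the key is absent


@dataclass
class PipelineFlags:
    stage11: Stage11Config = field(default_factory=Stage11Config)
    scope_check_enabled: bool = False  # assumed default when the key is absent


def parse_flags(raw: dict) -> PipelineFlags:
    """Read stage11.iterative.enabled and scope_check.enabled, tolerating missing sections."""
    iterative_raw = (raw.get("stage11") or {}).get("iterative") or {}
    scope_raw = raw.get("scope_check") or {}
    return PipelineFlags(
        stage11=Stage11Config(iterative_enabled=bool(iterative_raw.get("enabled", False))),
        scope_check_enabled=bool(scope_raw.get("enabled", False)),
    )
```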
No description provided.