Add LLM as a judge and additional evaluation methods to voice agents page #788

Merged
a-dinoto merged 1 commit into main from add-llm-judge-and-eval-updates on Mar 19, 2026
dylan-duan-aai (Contributor) commented Mar 19, 2026

Summary

  • Add LASER score subsection with penalty tiers, EMNLP 2025 paper link, and aai-cli link
  • Add ground truth quality section with common issues and Truth File Corrector tool link
  • Consolidate qualitative analysis (side-by-side, LLM as a judge, A/B testing) into "How to do a vibe-eval" to remove duplication
  • Bring the voice agents eval page to parity with the pre-recorded eval page

Test plan

  • Preview at /docs/evaluations/voice-agents
  • Verify LASER formula renders correctly
  • Verify links: Truth File Corrector, Diffchecker, aai-cli, LASER anchor, LiveKit
  • Confirm no duplicate A/B testing content in vibes vs metrics section
  • Compare against /docs/evaluations/pre-recorded-audio for parity
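
The link and duplicate-content checks above could be partly automated. The following is a minimal sketch, assuming the preview page is available as rendered HTML; the class and function names, the sample markup, and the `#laser-score` anchor are hypothetical and not taken from the docs repository. It only collects link targets and flags repeated headings, so the links still need to be opened or fetched separately.

```python
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageScan(HTMLParser):
    """Collect link targets and heading text from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values, in document order
        self.headings = []   # text content of h1-h6 elements
        self._in_heading = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self._buf = []

    def handle_data(self, data):
        if self._in_heading:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self.headings.append("".join(self._buf).strip())
            self._in_heading = False


def scan_page(html):
    """Return (links, headings) found in an HTML string."""
    scanner = PageScan()
    scanner.feed(html)
    scanner.close()
    return scanner.links, scanner.headings


def duplicate_headings(headings):
    """Return lowercased heading texts that occur more than once."""
    counts = Counter(h.lower() for h in headings)
    return sorted(h for h, n in counts.items() if n > 1)


# Hypothetical sample markup standing in for the preview page.
SAMPLE = """
<h2>How to do a vibe-eval</h2>
<p><a href="#laser-score">LASER</a> and <a href="https://example.com/tool">tool</a></p>
<h3>A/B testing</h3>
<h3>A/B testing</h3>
"""

links, headings = scan_page(SAMPLE)
print(links)
print(duplicate_headings(headings))
```

A repeated "A/B testing" heading in the output would suggest the consolidation left a duplicate behind; an empty list is consistent with the duplication having been removed.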

- Add LASER score subsection with penalty tiers and aai-cli link
- Add ground truth quality section with Truth File Corrector tool
- Consolidate qualitative analysis into vibes vs metrics section
- Remove duplicate A/B testing content
devin-ai-integration bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

a-dinoto merged commit af16e6f into main on Mar 19, 2026
5 checks passed
a-dinoto deleted the add-llm-judge-and-eval-updates branch on March 19, 2026, 09:13