[Feature Request] Add a small "FreshStack RAG evaluation failure modes" checklist doc (docs only) #9

@onestardao

Hi FreshStack team,

Thank you for releasing FreshStack. A benchmark and framework for RAG over technical documentation is extremely useful for both research and industry.

I have been working on 16-mode failure maps for RAG systems and recently contributed a robustness-related entry to Harvard MIMS Lab’s ToolUniverse. In FreshStack-style settings, I often see the same issues recur:

  • retrieval that focuses on popular pages rather than the correct ones
  • confusion between similar APIs or versions in the documentation
  • answer evaluation that does not fully reflect grounding in the retrieved docs
  • experiments that are hard to reproduce because configuration details are not recorded

I would like to propose a small, documentation-only evaluation checklist for FreshStack users.

Proposed feature

Add a short Markdown page to the repo, for example:

freshstack_rag_evaluation_failure_modes_and_checklist.md

The page could:

  1. List typical RAG failure modes specific to technical docs (API confusion, versioning, incomplete snippets).
  2. For each failure mode, describe:
    • symptoms in FreshStack evaluations
    • likely causes (retrieval settings, corpus preparation, query formulation)
  3. Provide a short checklist for running and reporting FreshStack experiments:
    • corpus version, retrieval configuration, model, and key evaluation settings
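
To make the reporting checklist concrete, here is a minimal Python sketch of what a recorded run could look like. The class name `RunReport`, its field names, and the example values are all hypothetical, chosen only for illustration; none of this is part of FreshStack's API, and the final doc could just as well use a plain table.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunReport:
    """Hypothetical record of one evaluation run (not a FreshStack class)."""

    corpus_version: str          # e.g. a snapshot date or commit hash
    retriever: str               # name of the retrieval method used
    top_k: int                   # number of documents retrieved per query
    model: str                   # generator / answer model identifier
    eval_settings: dict = field(default_factory=dict)  # metric names, judge, etc.

    def missing_fields(self) -> list[str]:
        """Return the names of checklist fields that were left empty."""
        return [
            name
            for name, value in asdict(self).items()
            if value in ("", None, {})
        ]

    def fingerprint(self) -> str:
        """Short, stable hash of the full configuration for comparing runs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    report = RunReport(
        corpus_version="",                 # deliberately left empty
        retriever="example-retriever",     # placeholder value
        top_k=10,
        model="example-model",             # placeholder value
    )
    print("missing:", report.missing_fields())
    print("fingerprint:", report.fingerprint())
```

Two runs with identical settings produce the same fingerprint, so a mismatch is a quick signal that results are not directly comparable. `missing_fields` flags anything the checklist asks for that was not recorded.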

Motivation

  • FreshStack is likely to become a standard reference for technical-doc RAG.
  • A small failure-mode and reporting checklist would help ensure that evaluations are interpretable and comparable across systems.
  • This is a docs-only change and should be straightforward to maintain.

If this is aligned with your goals for FreshStack, I would be glad to propose a concise initial draft in a PR.

Thank you for considering this.
