[BUG] Missing backtracking in POSSIBLE_SILENCE state leads to redundant silence tails in VAD segments #47

@gpww

Type: Bug Report / Logic Flaw

Description

In VadPostprocessor._smooth_preds_with_state_machine, when the state transitions from POSSIBLE_SILENCE to SILENCE (that is, once silence lasting at least min_silence_frame frames has been confirmed), the current implementation does not backtrack and reset the decisions made during the observation period. The frames observed while in POSSIBLE_SILENCE therefore stay marked as speech.

Impact

Every speech segment identified by the VAD ends with a redundant "silence tail" whose length equals min_silence_frame.

  • Acoustic noise: the tail feeds unnecessary background noise to the ASR decoder.
  • ASR hallucination: for short utterances (e.g., < 3 s), the extra 200 ms or more of silence/noise significantly increases the risk that the ASR model generates hallucinated tokens (such as "的", "了", or filler words) at the end of the sentence.

Location

File: fireredasr2s/runtime/vad_postprocessor.py (or equivalent path)
Method: _smooth_preds_with_state_machine

Suggested Fix

When silence is confirmed, add one backtracking line that resets the frames of the observation period to 0 (silence).

# ... inside _smooth_preds_with_state_machine ...
            elif state == VadState.POSSIBLE_SILENCE:
                if not is_speech:
                    assert silence_start != -1
                    if t - silence_start >= self.min_silence_frame:
                        state = VadState.SILENCE
                        speech_start = -1
                        # --- FIX START ---
                        # Backtrack: Reset the observation period frames to silence (0)
                        decisions[silence_start:t] = [0] * (t - silence_start)
                        # --- FIX END ---
                else:
                    state = VadState.SPEECH
                    silence_start = -1
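
To make the effect of the fix easier to check, here is a minimal, self-contained sketch of the behavior. It is not the actual FireRedASR2S code: the smooth function, its backtrack flag, and the simplified three-state machine (no minimum speech length, speech starts immediately) are assumptions made purely for illustration. Only the POSSIBLE_SILENCE branch mirrors the snippet above.

```python
from enum import Enum, auto


class VadState(Enum):
    SILENCE = auto()
    SPEECH = auto()
    POSSIBLE_SILENCE = auto()


def smooth(preds, min_silence_frame, backtrack):
    """Toy smoother (hypothetical): silence must be confirmed before it is accepted."""
    decisions = [0] * len(preds)
    state = VadState.SILENCE
    silence_start = -1
    for t, is_speech in enumerate(preds):
        if state == VadState.SILENCE:
            if is_speech:
                state = VadState.SPEECH
        elif state == VadState.SPEECH:
            if not is_speech:
                # Start observing: this may be a short pause or a real end of speech.
                state = VadState.POSSIBLE_SILENCE
                silence_start = t
        elif state == VadState.POSSIBLE_SILENCE:
            if is_speech:
                # Short pause: stay inside the same speech segment.
                state = VadState.SPEECH
                silence_start = -1
            elif t - silence_start >= min_silence_frame:
                state = VadState.SILENCE
                if backtrack:
                    # Proposed fix: reset the observation period to silence (0).
                    decisions[silence_start:t] = [0] * (t - silence_start)
                silence_start = -1
        # SPEECH and POSSIBLE_SILENCE frames are provisionally kept as speech.
        decisions[t] = 0 if state == VadState.SILENCE else 1
    return decisions


preds = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(smooth(preds, min_silence_frame=3, backtrack=False))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]  -> 3-frame silence tail after the speech
print(smooth(preds, min_silence_frame=3, backtrack=True))
# [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]  -> tail removed
```

In this toy version, pauses shorter than min_silence_frame are still bridged with or without backtracking, so the fix only changes frames that precede a confirmed silence.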

Environment

  • Model: FireRedASR2S (VAD module)
  • Version: Latest main branch (as of April 2026)
