FunASR Use-case Showcase

FunASR is useful far beyond a single offline transcription command. This page collects the fastest paths for developers who want to evaluate, deploy, or integrate speech understanding in real products.

Choose the right path

| Goal | Start here | Why it matters |
| --- | --- | --- |
| Try FunASR in a browser | Colab quickstart | Run a public sample and upload your own audio before setting up a local environment. |
| Transcribe one file locally | README quick start and model selection guide | Verify install, model choice, and model download in minutes. |
| Compare accuracy and speed | Benchmark report | Reproduce the 184-file long-audio benchmark before choosing a model. |
| Migrate from Whisper/cloud ASR | Migration guide | Map existing pipelines to FunASR, benchmark representative audio, and plan a safe rollout. |
| Build a private speech API | OpenAI-compatible API example, Gradio browser demo, client recipes, JavaScript/TypeScript recipes, and workflow recipes | Reuse LangChain, Dify, n8n, AutoGen, and other OpenAI-style clients without sending audio to a cloud ASR provider. |
| Add speech input to agents | MCP server and voice input | Connect local ASR to Claude, Cursor, and desktop agent workflows. |
| Choose a deployment path | Deployment matrix | Compare Python API, OpenAI API, Docker Compose, Kubernetes, WebSocket, vLLM, MCP, batch, subtitles, and Triton. |
| Serve streaming ASR | Runtime service docs | Run WebSocket or service-mode ASR for live captioning and call-center style workloads. |
| Accelerate LLM-based ASR | vLLM guide | Use tensor parallel decoding and streaming service support for Fun-ASR-Nano. |
| Generate subtitles | Subtitle example | Turn long audio or video into subtitle files for media workflows. |
| Process many recordings | Batch ASR example | Build repeatable offline jobs for archives, meetings, and datasets. |
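
For the subtitle row in the table above, the core step is turning timestamped segments into subtitle cues. The sketch below formats SRT text and assumes segments arrive as hypothetical `(start_ms, end_ms, text)` tuples; the Subtitle example in the repository may shape its output differently.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_ms, end_ms, text).
Segment = Tuple[int, int, str]


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the result of `to_srt(...)` to a `.srt` file with UTF-8 encoding is usually enough for common media players.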

Production-oriented recipes

Private transcription API

Use this path when an application already speaks OpenAI-style APIs or when audio cannot leave your environment.

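# Install FunASR and the web server dependencies.
pip install funasr fastapi uvicorn python-multipart
# Start the OpenAI-compatible server with SenseVoice on a GPU.
funasr-server --model sensevoice --device cuda
# Send a local file to the transcription endpoint (this call assumes port 8000).
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json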
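
The same request can be issued from application code. The following is a minimal sketch using only the Python standard library; the URL and form fields mirror the curl command above, while the helper names `build_multipart` and `transcribe` are hypothetical and the shape of the JSON response is not specified on this page.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Endpoint and form field names are copied from the curl command above.
API_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields: dict, filename: str, file_bytes: bytes, boundary: str) -> bytes:
    """Encode text fields plus one file part as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts)


def transcribe(path: str, model: str = "sensevoice", url: str = API_URL) -> dict:
    """POST an audio file to the local server and return the parsed JSON body."""
    boundary = uuid.uuid4().hex
    audio = Path(path)
    body = build_multipart(
        {"model": model, "response_format": "verbose_json"},
        audio.name,
        audio.read_bytes(),
        boundary,
    )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the commands above running, `transcribe("sample.wav")` returns the decoded response as a dictionary. Existing OpenAI-style client libraries should also work if pointed at the local base URL, which is the main reason to keep the endpoint compatible.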

Recommended next steps:

Agent speech input

Use this path when you want to talk to coding agents, internal assistants, or workflow tools.

  • Start with the MCP server example for Claude/Cursor-style tools.
  • Use the voice input example for desktop speech input experiments.
  • Keep latency visible: log audio duration, processing time, and selected model for each request.
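
The latency logging suggested in the last bullet can be sketched as a thin wrapper around whatever transcription call you use. This example assumes WAV input and a hypothetical `transcribe` callable that takes raw bytes and returns text; neither is part of FunASR itself.

```python
import io
import logging
import time
import wave
from typing import Callable

logger = logging.getLogger("asr.latency")


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the audio duration from a WAV header with the standard library."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def timed_transcribe(transcribe: Callable[[bytes], str], wav_bytes: bytes, model: str) -> dict:
    """Run `transcribe` and record audio duration, processing time, and model."""
    audio_s = wav_duration_seconds(wav_bytes)
    started = time.perf_counter()
    text = transcribe(wav_bytes)
    processing_s = time.perf_counter() - started
    record = {
        "model": model,
        "audio_s": round(audio_s, 3),
        "processing_s": round(processing_s, 3),
        # Real-time factor: values below 1.0 mean faster than real time.
        "rtf": round(processing_s / audio_s, 3) if audio_s else None,
        "text": text,
    }
    logger.info(
        "asr model=%s audio_s=%.3f processing_s=%.3f", model, audio_s, processing_s
    )
    return record
```

Logging one structured record per request makes it easy to spot slow models or unusually long inputs later.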

Streaming and call-center workloads

Use this path when partial results and low perceived latency matter more than a single final transcript.

  • Start from the runtime service docs.
  • Pair ASR with VAD, punctuation, and speaker diarization when the transcript needs to be readable by humans.
  • Validate with realistic audio: background noise, long silence, overlapping speakers, and different microphone quality.
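
When validating a streaming setup, it helps to replay recorded audio the way a live client would send it. The sketch below splits raw mono PCM into fixed-duration chunks and appends silence for long-pause tests. The 16 kHz sample rate, 16-bit samples, and 600 ms chunk size are assumptions for illustration, not values taken from the runtime service docs.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,
    chunk_ms: int = 600,
    sample_width: int = 2,
) -> Iterator[bytes]:
    """Yield fixed-duration chunks of mono PCM; the last chunk may be shorter."""
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset:offset + bytes_per_chunk]


def with_silence(
    pcm: bytes,
    seconds: float,
    sample_rate: int = 16000,
    sample_width: int = 2,
) -> bytes:
    """Append digital silence (zero samples) to exercise long-pause handling."""
    return pcm + bytes(int(seconds * sample_rate) * sample_width)
```

Feeding these chunks to the service at roughly real-time pace, rather than all at once, gives a more honest picture of partial-result latency.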

Benchmark before migrating from Whisper

Use this path when deciding whether FunASR is a good replacement for Whisper or a cloud ASR provider.

  • Follow the migration guide to map features and benchmark representative audio.
  • Read the public benchmark report.
  • Benchmark your own sample set before migration; include both short clips and long-form recordings.
  • Track cost and throughput together: GPU speed, CPU viability, model download size, and deployment complexity.
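
Two numbers cover most of a first-pass benchmark: error rate against a reference transcript and real-time factor. The following is a minimal sketch of both, with no text normalization; a real comparison would usually normalize casing, punctuation, and numerals first. All function names here are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        current = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def error_rate(reference: str, hypothesis: str, by_char: bool = False) -> float:
    """Word error rate by default; character error rate when by_char=True."""
    if by_char:
        ref = [c for c in reference if not c.isspace()]
        hyp = [c for c in hypothesis if not c.isspace()]
    else:
        ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    return processing_seconds / audio_seconds
```

Character error rate is the more common choice for Mandarin, where whitespace does not delimit words.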

Model selection hints

For a deeper comparison of SenseVoice, Paraformer, Fun-ASR-Nano, streaming runtime, and OpenAI API aliases, use the model selection guide.

| Need | Good first choice | Notes |
| --- | --- | --- |
| Fast multilingual transcription | SenseVoice-Small | Strong default for local demos and private APIs. |
| Mandarin production ASR | Paraformer-Large | Mature choice for Chinese speech recognition. |
| LLM-based ASR experiments | Fun-ASR-Nano | Pair with the vLLM guide when throughput matters. |
| Speaker-aware transcripts | SenseVoice or Paraformer with spk_model="cam++" | Useful for meetings, interviews, and customer calls. |
| Live audio | Runtime WebSocket service | Validate chunking, VAD, and endpointing with real traffic. |
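
As an illustration of the speaker-aware row, the sketch below assembles arguments for FunASR's `AutoModel` and adds `spk_model="cam++"` on request. The model aliases `paraformer-zh`, `fsmn-vad`, and `ct-punc` are assumptions based on commonly used FunASR names; check the model selection guide for the identifiers that match your installed version.

```python
def build_model_kwargs(speaker_aware: bool = False) -> dict:
    """Assemble keyword arguments for funasr.AutoModel (aliases are assumptions)."""
    kwargs = {
        "model": "paraformer-zh",  # assumed alias for Paraformer-Large
        "vad_model": "fsmn-vad",   # assumed alias; segments long audio
        "punc_model": "ct-punc",   # assumed alias; restores punctuation
    }
    if speaker_aware:
        kwargs["spk_model"] = "cam++"  # value taken from the table above
    return kwargs


def transcribe(path: str, speaker_aware: bool = True):
    """Load the model and transcribe one file; requires FunASR to be installed."""
    # Imported lazily so build_model_kwargs works without FunASR present.
    from funasr import AutoModel

    model = AutoModel(**build_model_kwargs(speaker_aware))
    return model.generate(input=path)
```

Keeping the argument assembly separate from model loading makes it cheap to log or test the chosen configuration without downloading any weights.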

Share your result

If FunASR works well in your project, consider opening a showcase issue, Migration Benchmark Report, or GitHub Discussion with:

  • Use case and deployment mode.
  • Model, device, and processing speed.
  • Audio domain, language, and rough duration.
  • A public demo, screenshot, benchmark summary, or integration link when available.

Concrete usage reports help new users choose the right path and help maintainers prioritize the next round of docs and examples.