FunASR is useful far beyond a single offline transcription command. This page collects the fastest paths for developers who want to evaluate, deploy, or integrate speech understanding in real products.
| Goal | Start here | Why it matters |
|---|---|---|
| Try FunASR in a browser | Colab quickstart | Run a public sample and upload your own audio before setting up a local environment. |
| Transcribe one file locally | README quick start and model selection guide | Verify install, model choice, and model download in minutes. |
| Compare accuracy and speed | Benchmark report | Reproduce the 184-file long-audio benchmark before choosing a model. |
| Migrate from Whisper/cloud ASR | Migration guide | Map existing pipelines to FunASR, benchmark representative audio, and plan a safe rollout. |
| Build a private speech API | OpenAI-compatible API example, Gradio browser demo, client recipes, JavaScript/TypeScript recipes, and workflow recipes | Reuse LangChain, Dify, n8n, AutoGen, and other OpenAI-style clients without sending audio to a cloud ASR provider. |
| Add speech input to agents | MCP server and voice input | Connect local ASR to Claude, Cursor, and desktop agent workflows. |
| Choose a deployment path | Deployment matrix | Compare Python API, OpenAI API, Docker Compose, Kubernetes, WebSocket, vLLM, MCP, batch, subtitles, and Triton. |
| Serve streaming ASR | Runtime service docs | Run WebSocket or service-mode ASR for live captioning and call-center-style workloads. |
| Accelerate LLM-based ASR | vLLM guide | Use tensor-parallel decoding and streaming service support for Fun-ASR-Nano. |
| Generate subtitles | Subtitle example | Turn long audio or video into subtitle files for media workflows. |
| Process many recordings | Batch ASR example | Build repeatable offline jobs for archives, meetings, and datasets. |
Use this path when an application already speaks OpenAI-style APIs or when audio cannot leave your environment.
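```bash
pip install funasr fastapi uvicorn python-multipart
# Start the OpenAI-compatible server with SenseVoice on a CUDA GPU.
funasr-server --model sensevoice --device cuda
```

```bash
# From another terminal, send an audio file to the transcription endpoint.
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
```

Recommended next steps: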
- Run the OpenAI-compatible API smoke test or the cross-platform Python smoke test.
- For browser upload or microphone demos, start from the Gradio browser demo.
- For Node.js or Next.js services, start from the JavaScript/TypeScript recipes.
- For cluster services, start from the Kubernetes deployment template.
- Add authentication and network controls at your service boundary; start from the security and gateway guide.
- Record model name, device, driver, and audio duration in bug reports and benchmarks.
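To make that last step repeatable, a small shell helper can print those details as one block you paste into an issue or a benchmark note. This is a minimal sketch, not part of FunASR: `report_env` is a hypothetical name, it assumes `nvidia-smi` is on the `PATH` for NVIDIA GPUs and that FunASR was installed with `pip`, and it falls back to `none` or `unknown` when either is missing. The caller passes in the audio duration.

```bash
# Hypothetical helper: print environment details for a bug report or benchmark.
# Usage: report_env MODEL DEVICE [AUDIO_SECONDS]
# Nothing here talks to funasr-server; MODEL and DEVICE are what you launched it with.
report_env() {
  local model="$1" device="$2" audio_s="${3:-unknown}"
  local gpu="" funasr_ver=""
  if command -v nvidia-smi >/dev/null 2>&1; then
    # First GPU only, formatted as "<name>, <driver version>".
    gpu=$(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null | head -n 1 || true)
  fi
  # Assumes FunASR was installed with pip; prints nothing if it was not.
  funasr_ver=$(python3 -m pip show funasr 2>/dev/null | awk -F': ' '/^Version:/ { print $2 }' || true)
  printf 'model=%s\n' "$model"
  printf 'device=%s\n' "$device"
  printf 'gpu_driver=%s\n' "${gpu:-none}"
  printf 'funasr=%s\n' "${funasr_ver:-unknown}"
  printf 'os=%s\n' "$(uname -sr)"
  printf 'audio_s=%s\n' "$audio_s"
}
```

For example, `report_env sensevoice cuda 42.5` prints six `key=value` lines that match the server launch shown earlier.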
Use this path when you want to add speech input to coding agents, internal assistants, or workflow tools.
- Start with the MCP server example for Claude/Cursor-style tools.
- Use the voice input example for desktop speech input experiments.
- Keep latency visible: log audio duration, processing time, and selected model for each request.
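A per-request log line can be as simple as one `key=value` record. The sketch below is an assumption, not a FunASR feature: `log_request` is a hypothetical helper that takes the model name, audio duration, and processing time in seconds, and adds the real-time factor (RTF, processing time divided by audio duration) so slow requests stand out.

```bash
# Hypothetical helper: one log line per voice-input request.
# Usage: log_request MODEL AUDIO_SECONDS PROCESSING_SECONDS
# RTF = processing seconds / audio seconds; values below 1.0 are faster than real time.
log_request() {
  local model="$1" audio_s="$2" proc_s="$3"
  awk -v m="$model" -v a="$audio_s" -v p="$proc_s" 'BEGIN {
    if (a <= 0) {
      print "log_request: audio duration must be > 0" > "/dev/stderr"
      exit 1
    }
    printf "model=%s audio_s=%.2f proc_s=%.2f rtf=%.3f\n", m, a, p, p / a
  }'
}
```

How processing time is measured is up to the caller. If requests go through `curl`, `-w '%{time_total}'` prints the total request time in seconds, which includes network and upload overhead on top of model time.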
Use this path when partial results and low perceived latency matter more than a single final transcript.
- Start from the runtime service docs.
- Pair ASR with VAD, punctuation, and speaker diarization when the transcript needs to be readable by humans.
- Validate with realistic audio: background noise, long silence, overlapping speakers, and different microphone quality.
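If you do not yet have noisy field recordings, you can approximate some of these conditions from a clean clip before testing against live traffic. This sketch assumes SoX is installed and that any clips you mix share the same sample rate and channel count; `make_stress_set`, the output file names, and the noise level are illustrative choices, and synthetic noise does not replace real recordings.

```bash
# Hypothetical helper: derive harder test clips from one clean WAV with SoX.
# Usage: make_stress_set IN.wav OUT_DIR [SECOND_SPEAKER.wav]
make_stress_set() {
  local in="$1" outdir="$2" second_speaker="${3:-}"
  mkdir -p "$outdir"
  # Long silence: add 10 s of silence before and after the speech.
  sox "$in" "$outdir/silence.wav" pad 10 10
  # Background noise: white noise with the same length as the input, mixed under it.
  # When mixing with -m, SoX lowers each input's volume by default to avoid clipping.
  sox "$in" "$outdir/noise_only.wav" synth whitenoise vol 0.05
  sox -m "$in" "$outdir/noise_only.wav" "$outdir/noisy.wav"
  # Lower microphone quality: keep roughly the 300-3400 Hz telephone band.
  sox "$in" "$outdir/telephone.wav" highpass 300 lowpass 3400
  # Overlapping speakers: mix in a second recording when one is given.
  if [ -n "$second_speaker" ]; then
    sox -m "$in" "$second_speaker" "$outdir/overlap.wav"
  fi
}
```

Running the streaming service on each variant next to the clean original shows which condition degrades partial results or endpointing first.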
Use this path when deciding whether FunASR is a good replacement for Whisper or a cloud ASR provider.
- Follow the migration guide to map features and benchmark representative audio.
- Read the public benchmark report.
- Benchmark your own sample set before migration; include both short clips and long-form recordings.
- Track cost and throughput together: GPU speed, CPU viability, model download size, and deployment complexity.
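When benchmarking your own sample set, it helps to report short clips and long-form recordings separately so one group does not hide the other. A minimal sketch, assuming you have already recorded per-file audio and processing times in a header-free TSV with the columns `file`, `audio_seconds`, `processing_seconds`; `summarize_benchmark` and the 60-second cut-off are hypothetical choices.

```bash
# Hypothetical helper: aggregate per-file benchmark timings into short/long buckets.
# Usage: summarize_benchmark RESULTS.tsv [THRESHOLD_SECONDS]
# Files shorter than THRESHOLD_SECONDS (default 60, an arbitrary cut-off) count as "short".
summarize_benchmark() {
  local tsv="$1" threshold="${2:-60}"
  awk -F'\t' -v t="$threshold" '
    # Skip malformed lines and zero-length audio.
    NF >= 3 && $2 > 0 {
      b = ($2 < t) ? "short" : "long"
      n[b]++
      audio[b] += $2
      proc[b] += $3
    }
    END {
      split("short long", order, " ")
      for (i = 1; i <= 2; i++) {
        b = order[i]
        # RTF = total processing time / total audio time (lower is faster).
        if (b in n)
          printf "%s\tfiles=%d\taudio_s=%.1f\tproc_s=%.1f\trtf=%.3f\n", b, n[b], audio[b], proc[b], proc[b] / audio[b]
      }
    }' "$tsv"
}
```

Because RTF is computed from bucket totals, long files weigh more than a plain average of per-file RTFs would give them; `1 / rtf` is the speed-up over real time. Running the same TSV format for each device (GPU and CPU) keeps the cost and throughput comparison side by side.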
For a deeper comparison of SenseVoice, Paraformer, Fun-ASR-Nano, streaming runtime, and OpenAI API aliases, use the model selection guide.
| Need | Good first choice | Notes |
|---|---|---|
| Fast multilingual transcription | SenseVoice-Small | Strong default for local demos and private APIs. |
| Mandarin production ASR | Paraformer-Large | Mature choice for Chinese speech recognition. |
| LLM-based ASR experiments | Fun-ASR-Nano | Pair with the vLLM guide when throughput matters. |
| Speaker-aware transcripts | SenseVoice or Paraformer with `spk_model="cam++"` | Useful for meetings, interviews, and customer calls. |
| Live audio | Runtime WebSocket service | Validate chunking, VAD, and endpointing with real traffic. |
If FunASR works well in your project, consider opening a showcase issue, a Migration Benchmark Report, or a GitHub Discussion with:
- Use case and deployment mode.
- Model, device, and processing speed.
- Audio domain, language, and rough duration.
- A public demo, screenshot, benchmark summary, or integration link when available.
Concrete usage reports help new users choose the right path and help maintainers prioritize the next round of docs and examples.