Drop-in replacement for OpenAI's /v1/audio/transcriptions endpoint. Works with any agent framework that supports the OpenAI audio API.
pip install funasr fastapi uvicorn python-multipart
python server.py --model sensevoice --device cuda --port 8000
The server starts in about 20 s (model loading). Health check: GET /health
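Because the model takes a while to load, scripts that start the server usually need to wait before sending audio. The following is a minimal sketch of such a wait loop using only the Python standard library. The helper names (`fetch_health`, `wait_until_healthy`) and the default deadline are illustrative assumptions, not part of this example's code; the only thing taken from the README is the `GET /health` endpoint.

```python
import time
import urllib.request


def fetch_health(base_url: str, timeout: float = 2.0) -> str:
    """GET {base_url}/health and return the raw response body as text."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_until_healthy(base_url: str = "http://localhost:8000",
                       deadline_s: float = 60.0,
                       interval_s: float = 2.0,
                       fetch=fetch_health) -> str:
    """Poll /health until it answers or the deadline passes (hypothetical helper)."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            return fetch(base_url)
        except OSError as exc:
            # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
            if time.monotonic() >= give_up_at:
                raise TimeoutError(
                    f"{base_url}/health not ready after {deadline_s}s"
                ) from exc
            time.sleep(interval_s)
```

Calling `wait_until_healthy()` right after launching `server.py` blocks until the health endpoint responds, then returns whatever body the server sent. The `fetch` parameter exists only so the loop can be exercised without a running server.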
Need copy-paste integration snippets for Python SDK, JavaScript/TypeScript, HTTP clients, agent tools, a browser demo, Postman, OpenAPI imports, Kubernetes deployment, or Dify/n8n-style workflows? See Client recipes, JavaScript/TypeScript recipes, Gradio browser demo, workflow recipes, the Chinese workflow recipes, the Postman collection, the OpenAPI spec, the security and gateway guide, and the Kubernetes deployment template.
In another terminal, download a public sample and verify both health and transcription:
bash smoke_test.sh
# Cross-platform alternative without curl/bash:
python smoke_test.py
Equivalent manual commands:
curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/health
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
If you want a local browser UI for upload or microphone testing, run the API server first and then launch the optional Gradio frontend:
pip install gradio
python gradio_app.py --base-url http://localhost:8000
The browser demo calls the same OpenAI-compatible API endpoints as the smoke tests. See Gradio browser demo for Docker, Kubernetes, and production notes.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
# Basic transcription
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",  # or "paraformer", "paraformer-en", "fun-asr-nano"
        file=audio,
    )
print(result.text)
# With timestamps/segments
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
# Returns: text, segments (with start/end/speaker), duration
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice
# With verbose output
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

| Model | Speed (GPU) | Speed (CPU) | Languages | Features |
|---|---|---|---|---|
| sensevoice | 170x realtime | 17x realtime | zh/en/ja/ko/yue | Emotion detection |
| paraformer | 120x realtime | 15x realtime | zh/en | Punctuation |
| paraformer-en | 120x realtime | 15x realtime | en | English only |
| fun-asr-nano | 17x realtime | 3.6x realtime | 31 languages | LLM-based, timestamps |
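To make the "x realtime" figures concrete, the sketch below converts them into a rough wall-clock estimate for a given audio length. The factors are copied from the table above; the function name and the idea of dividing audio duration by the factor are illustrative, and the result ignores model loading, upload time, and hardware differences, so treat it as an order-of-magnitude guide only.

```python
# Realtime factors from the model table: seconds of audio processed per second of compute.
REALTIME_FACTOR = {
    "sensevoice": {"gpu": 170.0, "cpu": 17.0},
    "paraformer": {"gpu": 120.0, "cpu": 15.0},
    "paraformer-en": {"gpu": 120.0, "cpu": 15.0},
    "fun-asr-nano": {"gpu": 17.0, "cpu": 3.6},
}


def estimate_seconds(audio_seconds: float, model: str = "sensevoice",
                     device: str = "gpu") -> float:
    """Rough transcription time: audio duration divided by the realtime factor."""
    return audio_seconds / REALTIME_FACTOR[model][device]


if __name__ == "__main__":
    one_hour = 3600.0
    for name in REALTIME_FACTOR:
        gpu = estimate_seconds(one_hour, name, "gpu")
        cpu = estimate_seconds(one_hour, name, "cpu")
        print(f"{name}: about {gpu:.0f} s on GPU, about {cpu:.0f} s on CPU per hour of audio")
```

Under these numbers, one hour of audio takes roughly 21 s with sensevoice on GPU and roughly 1000 s with fun-asr-nano on CPU, which is the kind of gap worth weighing against the extra languages and timestamps.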
| Endpoint | Method | Description |
|---|---|---|
| /v1/audio/transcriptions | POST | Transcribe audio (OpenAI-compatible) |
| /v1/models | GET | List available models |
| /health | GET | Health check + loaded models |
| /docs | GET | Interactive API documentation (Swagger) |
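As a quick scripted check of the `GET /v1/models` endpoint, the sketch below fetches the list and extracts model identifiers. It assumes the server follows the OpenAI list convention (`{"object": "list", "data": [{"id": ...}]}`); this README does not show the actual response body, so verify the shape against `/docs` before relying on it. The helper names are hypothetical.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract ids from an OpenAI-style model list (assumed response shape)."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Call GET /v1/models on a running server and return the model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    print(list_models())
```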
Prefer no-code API checks? Use the Gradio browser demo for local upload or microphone testing, or import the Postman collection and run health, model-list, and transcription requests from Postman. For API gateways, developer portals, or client generation, use the OpenAPI spec.
Works with: LangChain, LlamaIndex, AutoGen, CrewAI, Semantic Kernel, Dify, n8n, or any framework that uses the OpenAI audio API. See Client recipes and JavaScript/TypeScript recipes for SDK and agent-tool patterns, plus workflow recipes for low-code HTTP nodes and webhook workers (also available in Chinese).
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
def transcribe_for_agent(audio_path: str) -> str:
    """Tool function for LangChain agent."""
    # Open the file in a context manager so the handle is closed after upload.
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(model="sensevoice", file=audio)
    return result.text
Build the example image from this directory. The default image starts in CPU mode so it can be used as a portable smoke test.
cd examples/openai_api
cp .env.example .env
docker compose up --build
Equivalent one-off docker run command:
docker build -t funasr-api .
docker run --rm -p 8000:8000 \
  -e FUNASR_DEVICE=cpu \
  -e FUNASR_MODEL=sensevoice \
  funasr-api
For GPU hosts, use NVIDIA Container Toolkit and a CUDA-capable PyTorch/FunASR image. After adapting the image dependencies for CUDA, run the same server with FUNASR_DEVICE=cuda:
docker run --rm --gpus all -p 8000:8000 \
  -e FUNASR_DEVICE=cuda \
  -e FUNASR_MODEL=sensevoice \
  funasr-api
Verify the container from another terminal:
BASE_URL=http://localhost:8000 bash smoke_test.sh
python smoke_test.py --base-url http://localhost:8000
Before sharing the service across a team or exposing it through a gateway, review the security and gateway guide for TLS, authentication, upload limits, rate limits, and logging.
For an internal cluster service with persistent model cache, health probes, and a private ClusterIP, start from the Kubernetes deployment template. Build and push the example image, apply the manifests, then verify through kubectl port-forward with python smoke_test.py --base-url http://localhost:8000.
Keep the default CPU mode until you have built a CUDA-capable image and configured GPU scheduling for your cluster.
| Arg | Default | Description |
|---|---|---|
| --host | 0.0.0.0 | Bind address |
| --port | 8000 | Port |
| --device | cuda | Device (cuda/cpu/mps) |
| --model | sensevoice | Pre-load model at startup |
Docker environment variables:
| Env | Default | Description |
|---|---|---|
| FUNASR_PORT | 8000 | Container port passed to server.py |
| FUNASR_DEVICE | cpu | Container device mode; set to cuda only when the image has CUDA-capable dependencies |
| FUNASR_MODEL | sensevoice | Model alias loaded at container startup |
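The two tables relate as follows: each environment variable supplies the value for one `server.py` argument, with the container defaulting to `cpu` where the CLI defaults to `cuda`. The sketch below illustrates that mapping. It is not the image's actual entrypoint, which this README does not show; `server_command` and `CONTAINER_DEFAULTS` are hypothetical names used only for illustration.

```python
import os
import sys

# Container-side defaults, copied from the environment variable table.
CONTAINER_DEFAULTS = {
    "FUNASR_PORT": "8000",
    "FUNASR_DEVICE": "cpu",
    "FUNASR_MODEL": "sensevoice",
}


def server_command(env=None) -> list[str]:
    """Build a server.py command line from FUNASR_* variables (illustrative only)."""
    env = os.environ if env is None else env

    def value(key: str) -> str:
        # Fall back to the documented default when the variable is unset or empty.
        return env.get(key) or CONTAINER_DEFAULTS[key]

    return [
        sys.executable, "server.py",
        "--host", "0.0.0.0",
        "--port", value("FUNASR_PORT"),
        "--device", value("FUNASR_DEVICE"),
        "--model", value("FUNASR_MODEL"),
    ]


if __name__ == "__main__":
    print(" ".join(server_command()))
```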
- If CUDA is unavailable, use --device cpu for a slower but simple smoke test.
- If port 8000 is occupied, start with --port 9000 and run BASE_URL=http://localhost:9000 bash smoke_test.sh or python smoke_test.py --base-url http://localhost:9000.
- If model download is slow, retry with a stable network or pre-download the model from ModelScope/Hugging Face.
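For the occupied-port case above, a small standard-library check can tell you whether something is already listening before you start the server. This is a sketch under the assumption that "occupied" means a process accepts TCP connections on that port; `port_in_use` and `pick_port` are hypothetical helpers, and 8000 and 9000 are simply the ports mentioned in this README.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def pick_port(candidates=(8000, 9000), in_use=port_in_use):
    """Return the first candidate port with no listener, or None if all are taken."""
    for port in candidates:
        if not in_use(port):
            return port
    return None


if __name__ == "__main__":
    port = pick_port()
    if port is None:
        print("Both 8000 and 9000 are occupied; choose another --port value.")
    else:
        print(f"python server.py --port {port}")
```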