FunASR OpenAI-Compatible API Server

Drop-in replacement for OpenAI's /v1/audio/transcriptions endpoint. Works with any agent framework that supports the OpenAI audio API.

Quick Start

pip install funasr fastapi uvicorn python-multipart
python server.py --model sensevoice --device cuda --port 8000

The server takes about 20 seconds to start while the model loads. Health check: GET /health
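
Because the model takes a while to load, scripts that call the API right after launch may want to wait for the health check first. The following is a minimal sketch using only the standard library. The helper names (health_ok, wait_until_ready), the retry counts, and the assumption that a ready server answers GET /health with HTTP 200 are illustrative and not taken from server.py.

```python
import time
import urllib.error
import urllib.request


def health_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed ready signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 2.0,
                     probe=health_ok, sleep=time.sleep) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next probe
    return False
```

A typical call would be wait_until_ready("http://localhost:8000"), which with the defaults above gives the server roughly a minute before giving up. The probe and sleep parameters exist only so the loop can be exercised without a running server.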

Need copy-paste integration snippets for Python SDK, JavaScript/TypeScript, HTTP clients, agent tools, a browser demo, Postman, OpenAPI imports, Kubernetes deployment, or Dify/n8n-style workflows? See Client recipes, JavaScript/TypeScript recipes, Gradio browser demo, workflow recipes, the Chinese workflow recipes, the Postman collection, the OpenAPI spec, the security and gateway guide, and the Kubernetes deployment template.

End-to-end smoke test

In another terminal, download a public sample and verify both health and transcription:

bash smoke_test.sh
# Cross-platform alternative without curl/bash:
python smoke_test.py

Equivalent manual commands:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/health
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

Browser demo with Gradio

If you want a local browser UI for upload or microphone testing, run the API server first and then launch the optional Gradio frontend:

pip install gradio
python gradio_app.py --base-url http://localhost:8000

The browser demo calls the same OpenAI-compatible API endpoints as the smoke tests. See Gradio browser demo for Docker, Kubernetes, and production notes.

Usage with OpenAI SDK (Python)

from openai import OpenAI

# api_key is a placeholder; the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Basic transcription (the context manager closes the file after upload)
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",  # or "paraformer", "paraformer-en", "fun-asr-nano"
        file=audio,
    )
print(result.text)

# With timestamps/segments
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
# Returns: text, segments (with start/end/speaker), duration
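
To make use of the verbose output, the segments can be turned into readable transcript lines. This is a sketch under assumptions: it expects a plain dict whose segments carry start, end, and optionally speaker as described above, plus a per-segment text field, which is assumed here by analogy with OpenAI's verbose_json format. The function name format_segments is hypothetical.

```python
def format_segments(payload: dict) -> list[str]:
    """Render each segment as '[start - end] speaker: text'.

    Field names (start, end, speaker, text) are assumptions about the
    verbose_json payload; missing fields fall back to neutral defaults.
    """
    lines = []
    for seg in payload.get("segments") or []:
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", 0.0))
        speaker = seg.get("speaker")
        label = f" {speaker}:" if speaker is not None else ""
        text = str(seg.get("text", "")).strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s]{label} {text}")
    return lines


sample = {
    "text": "hello world",
    "duration": 2.5,
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " hello", "speaker": "spk0"},
        {"start": 1.2, "end": 2.5, "text": "world"},
    ],
}
for line in format_segments(sample):
    print(line)
```

When calling through the OpenAI Python SDK, the result is usually a pydantic model rather than a dict, so result.model_dump() should produce a suitable input, although the exact fields depend on what the server returns.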

Usage with curl

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice

# With verbose output
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

Available Models

| Model | Speed (GPU) | Speed (CPU) | Languages | Features |
|---|---|---|---|---|
| sensevoice | 170x realtime | 17x realtime | zh/en/ja/ko/yue | Emotion detection |
| paraformer | 120x realtime | 15x realtime | zh/en | Punctuation |
| paraformer-en | 120x realtime | 15x realtime | en | English only |
| fun-asr-nano | 17x realtime | 3.6x realtime | 31 languages | LLM-based, timestamps |
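
The speed figures can be read as "seconds of audio processed per second of compute", which gives a rough way to budget transcription time. The sketch below copies the factors from the table; the function name estimate_seconds is hypothetical, and real throughput will vary with hardware, audio content, and batch settings.

```python
# Realtime factors copied from the table above (audio seconds per compute second).
REALTIME_FACTORS = {
    "sensevoice": {"gpu": 170.0, "cpu": 17.0},
    "paraformer": {"gpu": 120.0, "cpu": 15.0},
    "paraformer-en": {"gpu": 120.0, "cpu": 15.0},
    "fun-asr-nano": {"gpu": 17.0, "cpu": 3.6},
}


def estimate_seconds(model: str, audio_seconds: float, device: str = "gpu") -> float:
    """Estimate compute time as audio length divided by the realtime factor."""
    return audio_seconds / REALTIME_FACTORS[model][device]


# One hour of audio with sensevoice on GPU: 3600 / 170, roughly 21 seconds.
print(round(estimate_seconds("sensevoice", 3600), 1))
# The same hour with paraformer on CPU: 3600 / 15 = 240 seconds.
print(estimate_seconds("paraformer", 3600, device="cpu"))
```

The estimate excludes model loading and upload time, so it is a lower bound for a single cold request.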

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/audio/transcriptions | POST | Transcribe audio (OpenAI-compatible) |
| /v1/models | GET | List available models |
| /health | GET | Health check + loaded models |
| /docs | GET | Interactive API documentation (Swagger) |
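
A client can check which model aliases are available before sending audio. This sketch assumes that /v1/models returns the OpenAI-style list shape ({"object": "list", "data": [{"id": ...}]}); that shape is an assumption about this server, and both helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model list (assumed shape)."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """GET {base_url}/v1/models and return the listed model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Offline illustration with a hand-written payload:
example = {"object": "list", "data": [{"id": "sensevoice"}, {"id": "paraformer"}]}
print(parse_model_ids(example))
```

With the server running, fetch_model_ids("http://localhost:8000") performs the real request. If the response shape differs, only parse_model_ids needs adjusting.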

Prefer no-code API checks? Use the Gradio browser demo for local upload or microphone testing, or import the Postman collection and run health, model-list, and transcription requests from Postman. For API gateways, developer portals, or client generation, use the OpenAPI spec.

Agent Framework Integration

Works with LangChain, LlamaIndex, AutoGen, CrewAI, Semantic Kernel, Dify, n8n, or any framework that uses the OpenAI audio API. See Client recipes and JavaScript/TypeScript recipes for SDK and agent-tool patterns, plus workflow recipes for low-code HTTP nodes and webhook workers (also available in Chinese).

LangChain Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

def transcribe_for_agent(audio_path: str) -> str:
    """Tool function for LangChain agent."""
    # Open the file in a context manager so the handle is closed after upload.
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="sensevoice", file=audio
        )
    return result.text

Docker Deployment

Build the example image from this directory. The default image starts in CPU mode so it can be used as a portable smoke test.

cd examples/openai_api
cp .env.example .env

docker compose up --build

Equivalent one-off docker run command:

docker build -t funasr-api .

docker run --rm -p 8000:8000 \
  -e FUNASR_DEVICE=cpu \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

For GPU hosts, use NVIDIA Container Toolkit and a CUDA-capable PyTorch/FunASR image. After adapting the image dependencies for CUDA, run the same server with FUNASR_DEVICE=cuda:

docker run --rm --gpus all -p 8000:8000 \
  -e FUNASR_DEVICE=cuda \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

Verify the container from another terminal:

BASE_URL=http://localhost:8000 bash smoke_test.sh
python smoke_test.py --base-url http://localhost:8000

Kubernetes Deployment

Before sharing the service across a team or exposing it through a gateway, review the security and gateway guide for TLS, authentication, upload limits, rate limits, and logging.

For an internal cluster service with persistent model cache, health probes, and a private ClusterIP, start from the Kubernetes deployment template. Build and push the example image, apply the manifests, then verify through kubectl port-forward with python smoke_test.py --base-url http://localhost:8000.

Keep the default CPU mode until you have built a CUDA-capable image and configured GPU scheduling for your cluster.

Configuration

| Arg | Default | Description |
|---|---|---|
| --host | 0.0.0.0 | Bind address |
| --port | 8000 | Port |
| --device | cuda | Device (cuda/cpu/mps) |
| --model | sensevoice | Pre-load model at startup |

Docker environment variables:

| Env | Default | Description |
|---|---|---|
| FUNASR_PORT | 8000 | Container port passed to server.py |
| FUNASR_DEVICE | cpu | Container device mode; set to cuda only when the image has CUDA-capable dependencies |
| FUNASR_MODEL | sensevoice | Model alias loaded at container startup |
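
To illustrate how the environment variables relate to the command-line arguments, the sketch below builds a server.py command from an environment mapping, falling back to the Docker defaults listed above. It is a hypothetical illustration, and the example image's actual entrypoint may wire these values differently. The names DOCKER_DEFAULTS and server_command are not part of the project.

```python
import os

# Defaults copied from the Docker environment variable table above.
DOCKER_DEFAULTS = {
    "FUNASR_PORT": "8000",
    "FUNASR_DEVICE": "cpu",
    "FUNASR_MODEL": "sensevoice",
}


def server_command(env=None) -> list[str]:
    """Translate FUNASR_* variables into server.py arguments (hypothetical mapping)."""
    env = os.environ if env is None else env

    def value(key: str) -> str:
        return env.get(key, DOCKER_DEFAULTS[key])

    return [
        "python", "server.py",
        "--host", "0.0.0.0",
        "--port", value("FUNASR_PORT"),
        "--device", value("FUNASR_DEVICE"),
        "--model", value("FUNASR_MODEL"),
    ]


print(" ".join(server_command({"FUNASR_DEVICE": "cuda"})))
```

Note that the container default for the device is cpu while the command-line default is cuda, so an unset FUNASR_DEVICE and an omitted --device do not mean the same thing.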

Troubleshooting

  • If CUDA is unavailable, use --device cpu for a slower but simple smoke test.
  • If port 8000 is occupied, start with --port 9000 and run BASE_URL=http://localhost:9000 bash smoke_test.sh or python smoke_test.py --base-url http://localhost:9000.
  • If model download is slow, retry with a stable network or pre-download the model from ModelScope/Hugging Face.
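
For the occupied-port case, a quick check before starting the server can save a failed launch. The following sketch tries to bind each candidate port locally and reports the first one that succeeds. The helper names and the candidate list are illustrative, and a port that is free at check time can still be taken before the server starts.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


def first_free_port(candidates=(8000, 9000)):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


port = first_free_port()
print(f"python server.py --port {port}" if port else "no candidate port is free")
```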