
(English|简体中文|日本語|한국어)

FunASR OpenAI-Compatible API Server

Drop-in replacement for OpenAI's /v1/audio/transcriptions endpoint. Works with any agent framework that supports the OpenAI audio API.

Quick Start

pip install funasr fastapi uvicorn python-multipart
python server.py --model sensevoice --device cuda --port 8000

The server takes about 20 s to start while the model loads. Check readiness with GET /health.
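Because the model load delays startup, scripts that launch the server usually need to wait before sending audio. The following is a minimal sketch of such a wait loop using only the standard library. The helper names `health_ok` and `wait_until_ready` are hypothetical, and the sketch assumes `/health` answers with HTTP 200 once the server is ready.

```python
import time
import urllib.error
import urllib.request


def health_ok(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Call probe() until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (requires a running server):
# ready = wait_until_ready(lambda: health_ok("http://localhost:8000"))
```

Passing the probe as a callable keeps the retry logic independent of the HTTP call, so it can be exercised without a server.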

Need copy-paste integration snippets for Python SDK, JavaScript/TypeScript, HTTP clients, agent tools, a browser demo, Postman, OpenAPI imports, Kubernetes deployment, or Dify/n8n-style workflows? See Client recipes, JavaScript/TypeScript recipes, Gradio browser demo, workflow recipes, the Chinese workflow recipes, the Postman collection, the OpenAPI spec, the security and gateway guide, and the Kubernetes deployment template.

End-to-end smoke test

In another terminal, download a public sample and verify both health and transcription:

bash smoke_test.sh
# Cross-platform alternative without curl/bash:
python smoke_test.py

Equivalent manual commands:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/health
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

Browser demo with Gradio

If you want a local browser UI for upload or microphone testing, run the API server first and then launch the optional Gradio frontend:

pip install gradio
python gradio_app.py --base-url http://localhost:8000

The browser demo calls the same OpenAI-compatible API endpoints as the smoke tests. See Gradio browser demo for Docker, Kubernetes, and production notes.

Usage with OpenAI SDK (Python)

from openai import OpenAI

# The SDK requires an api_key value; "not-needed" is only a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Basic transcription (the with-block closes the file after the upload)
with open("meeting.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="sensevoice",  # or "paraformer", "paraformer-en", "fun-asr-nano"
        file=audio_file,
    )
print(result.text)

# With timestamps/segments
with open("meeting.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio_file,
        response_format="verbose_json",
    )
# Returns: text, segments (with start/end/speaker), duration
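To make the verbose output easier to read, the segments can be rendered as timestamped lines. This is a small sketch, not part of the server. It assumes the segments are plain dictionaries (for example, the raw JSON response parsed with `json.loads`) with `start` and `end` in seconds, a `text` field, and an optional `speaker` field. The field name `text` and the helper names are assumptions.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> list[str]:
    """Turn a list of segment dicts into one readable line per segment."""
    lines = []
    for seg in segments:
        speaker = seg.get("speaker")  # assumed to be optional
        prefix = f"[{speaker}] " if speaker is not None else ""
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} - {end} {prefix}{seg['text'].strip()}")
    return lines


# Example with hand-written data in the assumed shape:
# format_segments([{"start": 0.0, "end": 1.25, "text": "hello", "speaker": 0}])
```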

Usage with curl

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice

# With verbose output
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
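Where neither curl nor the OpenAI SDK is available, the same multipart request can be built with the Python standard library. The sketch below mirrors the `-F` fields used in the curl commands. The helper names `encode_multipart` and `transcribe` are hypothetical, and the sketch assumes the server replies with a JSON body.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Build a multipart/form-data body. Returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url, audio_path, model="sensevoice", response_format=None):
    """POST an audio file to /v1/audio/transcriptions and return the parsed JSON."""
    fields = {"model": model}
    if response_format is not None:
        fields["response_format"] = response_format
    with open(audio_path, "rb") as audio_file:
        audio = audio_file.read()
    body, header = encode_multipart(
        fields, "file", os.path.basename(audio_path), audio, "audio/wav"
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires a running server):
# print(transcribe("http://localhost:8000", "audio.wav", response_format="verbose_json"))
```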

Available Models

| Model | Speed (GPU) | Speed (CPU) | Languages | Features |
|---|---|---|---|---|
| `sensevoice` | 170x realtime | 17x realtime | zh/en/ja/ko/yue | Emotion detection |
| `paraformer` | 120x realtime | 15x realtime | zh/en | Punctuation |
| `paraformer-en` | 120x realtime | 15x realtime | en | English only |
| `fun-asr-nano` | 17x realtime | 3.6x realtime | 31 languages | LLM-based, timestamps |
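The realtime factors translate directly into a rough processing time: audio duration divided by the factor. For example, one hour of audio at 170x realtime works out to about 21 seconds. The sketch below encodes the table values; the names `SPEEDUP` and `estimate_seconds` are hypothetical, and the result ignores model loading, upload time, and hardware differences.

```python
# Realtime factors copied from the table above (audio seconds per wall-clock second).
SPEEDUP = {
    "sensevoice": {"gpu": 170.0, "cpu": 17.0},
    "paraformer": {"gpu": 120.0, "cpu": 15.0},
    "paraformer-en": {"gpu": 120.0, "cpu": 15.0},
    "fun-asr-nano": {"gpu": 17.0, "cpu": 3.6},
}


def estimate_seconds(model: str, device: str, audio_seconds: float) -> float:
    """Estimate processing time as audio duration divided by the realtime factor."""
    return audio_seconds / SPEEDUP[model][device]


# Example: estimate_seconds("sensevoice", "gpu", 3600) is roughly 21.2 seconds.
```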

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/audio/transcriptions` | POST | Transcribe audio (OpenAI-compatible) |
| `/v1/models` | GET | List available models |
| `/health` | GET | Health check + loaded models |
| `/docs` | GET | Interactive API documentation (Swagger) |

Prefer no-code API checks? Use the Gradio browser demo for local upload or microphone testing, or import the Postman collection and run health, model-list, and transcription requests from Postman. For API gateways, developer portals, or client generation, use the OpenAPI spec.
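The model list can also be read from a script. The sketch below assumes `/v1/models` returns an OpenAI-style list object of the form `{"data": [{"id": "..."}]}`; that response shape is an assumption and should be checked against the OpenAPI spec. The helper names `fetch_json` and `model_ids` are hypothetical.

```python
import json
import urllib.request


def fetch_json(url: str, timeout_s: float = 5.0):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an assumed OpenAI-style list response."""
    return [item["id"] for item in payload.get("data", [])]


# Example (requires a running server):
# print(model_ids(fetch_json("http://localhost:8000/v1/models")))
```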

Agent Framework Integration

Works with LangChain, LlamaIndex, AutoGen, CrewAI, Semantic Kernel, Dify, n8n, or any framework that uses the OpenAI audio API. See Client recipes and JavaScript/TypeScript recipes for SDK and agent-tool patterns, plus workflow recipes for low-code HTTP nodes and webhook workers (Chinese version).

LangChain Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

def transcribe_for_agent(audio_path: str) -> str:
    """Tool function for LangChain agent."""
    # Open the file in a with-block so the handle is closed after the upload.
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="sensevoice", file=audio_file
        )
    return result.text
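Agent tools often work better when failures come back as text the agent can react to rather than as exceptions. The following sketch wraps the tool function above in that style. The factory name `make_transcribe_tool` and the `ERROR:` message format are hypothetical; `client` is expected to be an `OpenAI` client configured as in the example above.

```python
import os


def make_transcribe_tool(client, model: str = "sensevoice"):
    """Return a tool function that reports failures as text instead of raising."""

    def transcribe_for_agent(audio_path: str) -> str:
        if not os.path.isfile(audio_path):
            return f"ERROR: audio file not found: {audio_path}"
        try:
            with open(audio_path, "rb") as audio_file:
                result = client.audio.transcriptions.create(
                    model=model, file=audio_file
                )
        except Exception as exc:  # surface any client or server error to the agent
            return f"ERROR: transcription failed: {exc}"
        return result.text

    return transcribe_for_agent


# Example (requires the openai package and a running server):
# from openai import OpenAI
# tool = make_transcribe_tool(OpenAI(base_url="http://localhost:8000/v1", api_key="x"))
```

Injecting the client also makes the tool easy to test with a stub object.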

Docker Deployment

Build the example image from this directory. The default image starts in CPU mode so it can be used as a portable smoke test.

cd examples/openai_api
cp .env.example .env

docker compose up --build

Equivalent one-off docker run command:

docker build -t funasr-api .

docker run --rm -p 8000:8000 \
  -e FUNASR_DEVICE=cpu \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

For GPU hosts, use NVIDIA Container Toolkit and a CUDA-capable PyTorch/FunASR image. After adapting the image dependencies for CUDA, run the same server with FUNASR_DEVICE=cuda:

docker run --rm --gpus all -p 8000:8000 \
  -e FUNASR_DEVICE=cuda \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

Verify the container from another terminal:

BASE_URL=http://localhost:8000 bash smoke_test.sh
python smoke_test.py --base-url http://localhost:8000

Kubernetes Deployment

Before sharing the service across a team or exposing it through a gateway, review the security and gateway guide for TLS, authentication, upload limits, rate limits, and logging.

For an internal cluster service with persistent model cache, health probes, and a private ClusterIP, start from the Kubernetes deployment template. Build and push the example image, apply the manifests, then verify through kubectl port-forward with python smoke_test.py --base-url http://localhost:8000.

Keep the default CPU mode until you have built a CUDA-capable image and configured GPU scheduling for your cluster.

Configuration

| Arg | Default | Description |
|---|---|---|
| `--host` | 0.0.0.0 | Bind address |
| `--port` | 8000 | Port |
| `--device` | cuda | Device (cuda/cpu/mps) |
| `--model` | sensevoice | Pre-load model at startup |

Docker environment variables:

| Env | Default | Description |
|---|---|---|
| `FUNASR_PORT` | 8000 | Container port passed to `server.py` |
| `FUNASR_DEVICE` | cpu | Container device mode; set to `cuda` only when the image has CUDA-capable dependencies |
| `FUNASR_MODEL` | sensevoice | Model alias loaded at container startup |
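As an illustration of how the environment variables relate to the command-line arguments, the sketch below builds a `server.py` command from an environment mapping. It is not the actual container entrypoint, which may differ. The function name `build_server_command` is hypothetical, and only flags and defaults listed in the tables above are used.

```python
import os

# Container defaults copied from the environment variable table.
ENV_DEFAULTS = {
    "FUNASR_PORT": "8000",
    "FUNASR_DEVICE": "cpu",
    "FUNASR_MODEL": "sensevoice",
}


def build_server_command(env=None) -> list[str]:
    """Translate FUNASR_* variables into a server.py argument list."""
    env = os.environ if env is None else env
    port = env.get("FUNASR_PORT", ENV_DEFAULTS["FUNASR_PORT"])
    device = env.get("FUNASR_DEVICE", ENV_DEFAULTS["FUNASR_DEVICE"])
    model = env.get("FUNASR_MODEL", ENV_DEFAULTS["FUNASR_MODEL"])
    if not port.isdigit():
        raise ValueError(f"FUNASR_PORT must be an integer, got {port!r}")
    return [
        "python", "server.py",
        "--host", "0.0.0.0",
        "--port", port,
        "--device", device,
        "--model", model,
    ]


# Example: build_server_command({"FUNASR_DEVICE": "cuda"})
```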

Troubleshooting

If CUDA is unavailable, use --device cpu for a slower but simple smoke test.
If port 8000 is occupied, start with --port 9000 and run BASE_URL=http://localhost:9000 bash smoke_test.sh or python smoke_test.py --base-url http://localhost:9000.
If model download is slow, retry with a stable network or pre-download the model from ModelScope/Hugging Face.
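To check whether a port is occupied before starting the server, a short bind test is usually enough. This is a sketch using only the standard library; the helper names `is_port_free` and `first_free_port` are hypothetical.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(candidates, host: str = "0.0.0.0"):
    """Return the first bindable port from candidates, or None if all are taken."""
    for port in candidates:
        if is_port_free(port, host):
            return port
    return None


# Example: pick a value for --port from a few candidates.
# port = first_free_port([8000, 9000, 9001])
```

The check is only a snapshot: another process can still take the port between the test and the server start.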
