24 changes: 24 additions & 0 deletions chat/overview.mdx
@@ -220,6 +220,30 @@ curl "https://api.deepinfra.com/v1/openai/chat/completions" \

The response includes a `service_tier` field confirming which tier was used. Not all models support priority tiers — check the model page for availability.

## Max output tokens

Each model caps the number of tokens it can generate in a single response; for most models the hard cap is 16384 tokens. Set `max_tokens` to control the limit for a specific request.

### Continuing responses beyond the limit

If you need a longer response, use response continuation: send a follow-up request that includes the truncated response as the final assistant message, and the model will continue generating from where it left off.

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "deepseek-ai/DeepSeek-V3",
"messages": [
{"role": "user", "content": "Write a very long essay about AI."},
{"role": "assistant", "content": "<previous truncated response>"}
],
"max_tokens": 4096
}'
```

Note: response continuation cannot extend past the model's total context window. If the combined messages exceed the context window, the request fails with a 400 error.
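The continuation flow above can be sketched in Python. This is a minimal illustration, not an official SDK: `send` is a stand-in for whatever HTTP client you use to POST to `/v1/openai/chat/completions`, and the helper names (`build_continuation_request`, `continue_until_done`) are hypothetical. It assumes, per the OpenAI-compatible response format, that a truncated response has `finish_reason` set to `"length"`.

```python
def build_continuation_request(model, messages, partial_output, max_tokens=4096):
    """Build a follow-up request that asks the model to continue
    a truncated response.

    `messages` is the original conversation; `partial_output` is the
    truncated assistant text returned so far.
    """
    return {
        "model": model,
        # Append the truncated text as an assistant message so the
        # model picks up where it left off.
        "messages": messages + [{"role": "assistant", "content": partial_output}],
        "max_tokens": max_tokens,
    }


def continue_until_done(send, model, messages, max_rounds=5):
    """Request continuations while the model keeps stopping at the
    token limit (finish_reason == "length").

    `send` is any callable that posts a request dict to the chat
    completions endpoint and returns the parsed JSON response.
    """
    text = ""
    for _ in range(max_rounds):
        if text:
            req = build_continuation_request(model, messages, text)
        else:
            req = {"model": model, "messages": messages, "max_tokens": 4096}
        resp = send(req)
        choice = resp["choices"][0]
        text += choice["message"]["content"]
        if choice["finish_reason"] != "length":
            break  # the model finished naturally
    return text
```

Keep in mind that each round resends the accumulated text as input, so every continuation consumes more of the context window than the last.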

## What's next

<CardGroup cols={2}>
12 changes: 12 additions & 0 deletions models.mdx
@@ -37,6 +37,18 @@ Some models have more than one version available. You can infer against a partic

You can also infer against a `deploy_id` using `{"model": "deploy_id:DEPLOY_ID", ...}`. This is especially useful for [Custom LLMs](/private-models/custom-llms) — you can start inferring before the deployment finishes and before you have the model name + version pair.

## Model deprecation

The AI world moves fast: newer and better models are released every day. Occasionally we have to deprecate older models to maintain quality and affordability.

When a model is deprecated:

- **You'll receive at least 1 week's advance notice** before the deprecation date
- **Your applications won't break** — after deprecation, inference requests are automatically forwarded to a recommended replacement model
- **You'll get an email** if you've recently used the model, including the deprecation date

You can browse the current list of available models at [deepinfra.com/models](https://deepinfra.com/models).

## Suggest a model

If you think there is a model that we should run, let us know at info@deepinfra.com. We read every email.