diff --git a/chat/overview.mdx b/chat/overview.mdx
index 82978a7..a1e1e48 100644
--- a/chat/overview.mdx
+++ b/chat/overview.mdx
@@ -220,6 +220,30 @@ curl "https://api.deepinfra.com/v1/openai/chat/completions" \
 The response includes a `service_tier` field confirming which tier was used. Not all models support priority tiers — check the model page for availability.
 
+## Max output tokens
+
+The maximum number of tokens that can be generated in a single response is model-dependent, with a hard cap of 16384 tokens for most models. Set `max_tokens` to control the limit for a specific request.
+
+### Continuing responses beyond the limit
+
+If you need a longer response, use response continuation: send a follow-up request that includes the text generated so far as an assistant message, and the model will continue from where it left off.
+
+```bash
+curl "https://api.deepinfra.com/v1/openai/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
+  -d '{
+    "model": "deepseek-ai/DeepSeek-V3",
+    "messages": [
+      {"role": "user", "content": "Write a very long essay about AI."},
+      {"role": "assistant", "content": "<response text generated so far>"}
+    ],
+    "max_tokens": 4096
+  }'
+```
+
+Note: response continuation cannot extend past the model's total context window. If the accumulated prompt and response exceed the context size, the API returns a 400 error.
+
 ## What's next
diff --git a/models.mdx b/models.mdx
index 1b2d080..eaaa540 100644
--- a/models.mdx
+++ b/models.mdx
@@ -37,6 +37,18 @@ Some models have more than one version available. You can infer against a partic
 You can also infer against a `deploy_id` using `{"model": "deploy_id:DEPLOY_ID", ...}`. This is especially useful for [Custom LLMs](/private-models/custom-llms) — you can start inferring before the deployment finishes and before you have the model name + version pair.
 
+## Model deprecation
+
+The AI field moves fast, and newer, better models are released every day. Occasionally we have to deprecate older models to maintain quality and affordability.
+
+When a model is deprecated:
+
+- **You'll receive at least one week's advance notice** before the deprecation date
+- **Your applications won't break** — after deprecation, inference requests are automatically forwarded to a recommended replacement model
+- **You'll get an email** notifying you of the deprecation date if you've recently used the model
+
+You can browse the current list of available models at [deepinfra.com/models](https://deepinfra.com/models).
+
 ## Suggest a model
 
 If you think there is a model that we should run, let us know at info@deepinfra.com. We read every email.
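The response-continuation flow added in chat/overview.mdx can be sketched as a client-side loop. This is a hypothetical helper, not part of any DeepInfra SDK; the `complete` callable stands in for one POST to the chat completions endpoint and is assumed to return the generated text plus an OpenAI-style `finish_reason` ("length" when the `max_tokens` cap was hit, "stop" when the model finished on its own).

```python
def continue_until_done(messages, complete, max_rounds=8):
    """Resend the conversation, carrying the partial assistant reply
    forward, until the model stops on its own or max_rounds is hit.

    `complete(messages)` is assumed to return (text, finish_reason),
    with finish_reason == "length" when the max_tokens cap was reached.
    """
    reply = ""
    for _ in range(max_rounds):
        # After the first round, append the text so far as an
        # assistant message so the model continues where it left off.
        convo = messages + [{"role": "assistant", "content": reply}] if reply else messages
        text, finish_reason = complete(convo)
        reply += text
        if finish_reason != "length":  # "stop": the model finished naturally
            break
    return reply

# Example with a stub in place of the real API call:
def fake_complete(msgs):
    done = msgs[-1]["role"] == "assistant" and len(msgs[-1]["content"]) >= 6
    return ("done.", "stop") if done else ("chunk ", "length")

print(continue_until_done([{"role": "user", "content": "Write a long essay."}], fake_complete))
```

Each round resends the full conversation, so the accumulated text counts against the model's context window; a real client should also stop when it receives the 400 context-overflow error.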