24 changes: 24 additions & 0 deletions chat/overview.mdx
@@ -220,6 +220,30 @@ curl "https://api.deepinfra.com/v1/openai/chat/completions" \

The response includes a `service_tier` field confirming which tier was used. Not all models support priority tiers — check the model page for availability.

## Max output tokens

Each model caps the number of tokens it can generate in a single response; for most models the hard cap is 16384 tokens. Set `max_tokens` to control the limit for a specific request.

### Continuing responses beyond the limit

If you need a longer response, use response continuation: send a follow-up request that includes the truncated response as the final assistant message, and the model will continue generating from where it left off.

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "deepseek-ai/DeepSeek-V3",
"messages": [
{"role": "user", "content": "Write a very long essay about AI."},
{"role": "assistant", "content": "<previous truncated response>"}
],
"max_tokens": 4096
}'
```

Note: response continuation cannot extend past the model's total context window. If the combined messages exceed the context window, the request fails with a 400 error.
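The continuation flow above can be sketched in Python. This is a minimal illustration, not an official SDK: `send` is a stand-in for whatever HTTP client you use to POST to `/v1/openai/chat/completions`, and the helper names (`build_continuation_request`, `continue_until_done`) are hypothetical. It assumes, per the OpenAI-compatible response format, that a truncated response has `finish_reason` set to `"length"`.

```python
def build_continuation_request(model, messages, partial_output, max_tokens=4096):
    """Build a follow-up request that asks the model to continue
    a truncated response.

    `messages` is the original conversation; `partial_output` is the
    truncated assistant text returned so far.
    """
    return {
        "model": model,
        # Append the truncated text as an assistant message so the
        # model picks up where it left off.
        "messages": messages + [{"role": "assistant", "content": partial_output}],
        "max_tokens": max_tokens,
    }


def continue_until_done(send, model, messages, max_rounds=5):
    """Request continuations while the model keeps stopping at the
    token limit (finish_reason == "length").

    `send` is any callable that posts a request dict to the chat
    completions endpoint and returns the parsed JSON response.
    """
    text = ""
    for _ in range(max_rounds):
        if text:
            req = build_continuation_request(model, messages, text)
        else:
            req = {"model": model, "messages": messages, "max_tokens": 4096}
        resp = send(req)
        choice = resp["choices"][0]
        text += choice["message"]["content"]
        if choice["finish_reason"] != "length":
            break  # the model finished naturally
    return text
```

Keep in mind that each round resends the accumulated text as input, so every continuation consumes more of the context window than the last.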

## What's next

<CardGroup cols={2}>
12 changes: 12 additions & 0 deletions models.mdx
@@ -37,6 +37,18 @@ Some models have more than one version available. You can infer against a partic

You can also infer against a `deploy_id` using `{"model": "deploy_id:DEPLOY_ID", ...}`. This is especially useful for [Custom LLMs](/private-models/custom-llms) — you can start inferring before the deployment finishes and before you have the model name + version pair.

## Model deprecation

The AI world moves fast: newer and better models are released every day. Occasionally we have to deprecate older models to maintain quality and affordability.

When a model is deprecated:

- **You'll receive at least 1 week's advance notice** before the deprecation date
- **Your applications won't break** — after deprecation, inference requests are automatically forwarded to a recommended replacement model
- **You'll get an email** if you've recently used the model, including the deprecation date

You can browse the current list of available models at [deepinfra.com/models](https://deepinfra.com/models).

## Suggest a model

If you think there is a model that we should run, let us know at info@deepinfra.com. We read every email.