| 1 | +SGLang Documentation |
| 2 | +==================== |
| 3 | + |
| 4 | +.. raw:: html |
| 5 | + |
| 6 | + <a class="github-button" href="https://github.com/sgl-project/sglang" data-size="large" data-show-count="true" aria-label="Star sgl-project/sglang on GitHub">Star</a> |
| 7 | + <a class="github-button" href="https://github.com/sgl-project/sglang/fork" data-icon="octicon-repo-forked" data-size="large" data-show-count="true" aria-label="Fork sgl-project/sglang on GitHub">Fork</a> |
| 8 | + <script async defer src="https://web4application.github.io/buttons.js"></script> |
| 9 | + <br></br> |
| 10 | + |
| 11 | +SGLang is a high-performance serving framework for large language models and multimodal models. |
| 12 | +It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. |
| 13 | +Its core features include: |
| 14 | + |
| 15 | +- **Fast Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching. |
| 16 | +- **Broad Model Support**: Supports a wide range of language models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for adding new models. Compatible with most Hugging Face models and OpenAI APIs. |
| 17 | +- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more. |
| 18 | +- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide. |
| 19 | +- **RL & Post-Training Backbone**: SGLang is a widely used rollout backend, with native RL integrations and adoption by well-known post-training frameworks such as AReaL, Miles, slime, Tunix, verl, and more. |
| 20 | + |
| 21 | +.. toctree:: |
| 22 | + :maxdepth: 1 |
| 23 | + :caption: Get Started |
| 24 | + |
| 25 | + get_started/install.md |
| 26 | + |
| 27 | +.. toctree:: |
| 28 | + :maxdepth: 1 |
| 29 | + :caption: Basic Usage |
| 30 | + |
| 31 | + basic_usage/send_request.ipynb |
| 32 | + basic_usage/openai_api.rst |
| 33 | + basic_usage/ollama_api.md |
| 34 | + basic_usage/offline_engine_api.ipynb |
| 35 | + basic_usage/native_api.ipynb |
| 36 | + basic_usage/sampling_params.md |
| 37 | + basic_usage/popular_model_usage.rst |
| 38 | + |
| 39 | +.. toctree:: |
| 40 | + :maxdepth: 1 |
| 41 | + :caption: Advanced Features |
| 42 | + |
| 43 | + advanced_features/server_arguments.md |
| 44 | + advanced_features/hyperparameter_tuning.md |
| 45 | + advanced_features/attention_backend.md |
| 46 | + advanced_features/speculative_decoding.ipynb |
| 47 | + advanced_features/structured_outputs.ipynb |
| 48 | + advanced_features/structured_outputs_for_reasoning_models.ipynb |
| 49 | + advanced_features/tool_parser.ipynb |
| 50 | + advanced_features/separate_reasoning.ipynb |
| 51 | + advanced_features/quantization.md |
| 52 | + advanced_features/quantized_kv_cache.md |
| 53 | + advanced_features/expert_parallelism.md |
| 54 | + advanced_features/dp_dpa_smg_guide.md |
| 55 | + advanced_features/lora.ipynb |
| 56 | + advanced_features/pd_disaggregation.md |
| 57 | + advanced_features/epd_disaggregation.md |
| 58 | + advanced_features/pipeline_parallelism.md |
| 59 | + advanced_features/hicache.rst |
| 60 | + advanced_features/pd_multiplexing.md |
| 61 | + advanced_features/vlm_query.ipynb |
| 62 | + advanced_features/dp_for_multi_modal_encoder.md |
| 63 | + advanced_features/cuda_graph_for_multi_modal_encoder.md |
| 64 | + advanced_features/piecewise_cuda_graph.md |
| 65 | + advanced_features/sgl_model_gateway.md |
| 66 | + advanced_features/deterministic_inference.md |
| 67 | + advanced_features/observability.md |
| 68 | + advanced_features/checkpoint_engine.md |
| 69 | + advanced_features/sglang_for_rl.md |
| 70 | + |
| 71 | +.. toctree:: |
| 72 | + :maxdepth: 2 |
| 73 | + :caption: Supported Models |
| 74 | + |
| 75 | + supported_models/text_generation/index |
| 76 | + supported_models/retrieval_ranking/index |
| 77 | + supported_models/specialized/index |
| 78 | + supported_models/extending/index |
| 79 | + |
| 80 | +.. toctree:: |
| 81 | + :maxdepth: 2 |
| 82 | + :caption: SGLang Diffusion |
| 83 | + |
| 84 | + diffusion/index |
| 85 | + diffusion/installation |
| 86 | + diffusion/compatibility_matrix |
| 87 | + diffusion/api/cli |
| 88 | + diffusion/api/openai_api |
| 89 | + diffusion/performance/index |
| 90 | + diffusion/performance/attention_backends |
| 91 | + diffusion/performance/profiling |
| 92 | + diffusion/performance/cache/index |
| 93 | + diffusion/performance/cache/cache_dit |
| 94 | + diffusion/performance/cache/teacache |
| 95 | + diffusion/support_new_models |
| 96 | + diffusion/contributing |
| 97 | + diffusion/ci_perf |
| 98 | + diffusion/environment_variables |
| 99 | + |
| 100 | +.. toctree:: |
| 101 | + :maxdepth: 1 |
| 102 | + :caption: Hardware Platforms |
| 103 | + |
| 104 | + platforms/amd_gpu.md |
| 105 | + platforms/cpu_server.md |
| 106 | + platforms/tpu.md |
| 107 | + platforms/nvidia_jetson.md |
| 108 | + platforms/ascend_npu_support.rst |
| 109 | + platforms/xpu.md |
| 110 | + |
| 111 | +.. toctree:: |
| 112 | + :maxdepth: 1 |
| 113 | + :caption: Developer Guide |
| 114 | + |
| 115 | + developer_guide/contribution_guide.md |
| 116 | + developer_guide/development_guide_using_docker.md |
| 117 | + developer_guide/development_jit_kernel_guide.md |
| 118 | + developer_guide/benchmark_and_profiling.md |
| 119 | + developer_guide/bench_serving.md |
| 120 | + developer_guide/evaluating_new_models.md |
| 121 | + |
| 122 | +.. toctree:: |
| 123 | + :maxdepth: 1 |
| 124 | + :caption: References |
| 125 | + |
| 126 | + references/faq.md |
| 127 | + references/environment_variables.md |
| 128 | + references/production_metrics.md |
| 129 | + references/production_request_trace.md |
| 130 | + references/multi_node_deployment/multi_node_index.rst |
| 131 | + references/custom_chat_template.md |
| 132 | + references/frontend/frontend_index.rst |
| 133 | + references/post_training_integration.md |
| 134 | + references/release_lookup |
| 135 | + references/learn_more.md |
| 136 | + |
| 137 | +.. toctree:: |
| 138 | + :maxdepth: 1 |
| 139 | + :caption: Security Acknowledgement |
| 140 | + |
| 141 | + security/acknowledgements.md |