vLLM
LLMs · High-throughput, memory-efficient LLM serving engine.
25k stars · 3k forks · Python
About
vLLM is an engine for fast LLM inference and serving built on PagedAttention, which stores the attention KV cache in fixed-size blocks, much as an operating system pages virtual memory, so long sequences do not require large contiguous allocations. It is widely used for production-scale deployments.
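A minimal offline-inference sketch using vLLM's Python API (`LLM`, `SamplingParams`, `generate`); the model name and sampling values are assumptions chosen for a quick demo, not part of the original page:

```python
# Minimal offline-inference sketch with vLLM.
from vllm import LLM, SamplingParams

prompts = [
    "Explain paged attention in one sentence.",
    "What is a KV cache?",
]

# Sampling parameters for generation; values here are illustrative.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# The LLM class loads the model and manages the paged KV cache internally.
# Assumption: a small Hugging Face model for a fast local demo.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```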
Key Features
- PagedAttention for memory-efficient KV-cache management
- High-throughput inference with continuous batching
- OpenAI-compatible API server (see the sketch after this list)
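A sketch of querying vLLM's OpenAI-compatible server with the official `openai` client. The model name is an assumption, and the `base_url` reflects vLLM's default local serving address; adjust both to your deployment:

```python
# Query a running vLLM server through its OpenAI-compatible endpoint.
# Start the server first, e.g.:
#   vllm serve facebook/opt-125m
# Assumptions: model name and base_url below match that local server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default serving address
    api_key="EMPTY",  # vLLM does not require a real key by default
)

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```

Because the endpoint mirrors the OpenAI API, existing OpenAI-based clients can point at a vLLM deployment by changing only the base URL.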
Tags
Inference · Serving · Performance · Production