vLLM
LLMs · High-throughput, memory-efficient LLM serving engine.
45k stars · 7k forks · Python
About
vLLM is a fast LLM inference and serving engine built around PagedAttention, which manages the attention key-value cache in fixed-size pages to reduce memory fragmentation. It is widely used for production-scale deployments, exposes an OpenAI-compatible API, and supports multi-GPU setups.
Key Features
- PagedAttention for efficient KV-cache memory management
- High-throughput batched inference
- OpenAI-compatible HTTP API
- Multi-GPU serving
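Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client can talk to it. Below is a minimal sketch that builds a request body for a vLLM server; the base URL reflects vLLM's default port (8000), while the model name and prompt are placeholder assumptions for illustration.

```python
import json

# Default vLLM endpoint; adjust host/port to your deployment (assumption).
BASE_URL = "http://localhost:8000/v1"

def chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Placeholder model name -- use whatever model your server was launched with.
body = chat_request("meta-llama/Llama-3.1-8B-Instruct", "What is PagedAttention?")
print(json.dumps(body, indent=2))

# To send it against a running vLLM server:
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same body works with the official `openai` Python client by pointing its `base_url` at the vLLM server.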
Tags
Inference · Serving · Performance · Production