vLLM

LLMs

High-throughput, memory-efficient LLM serving engine.

25k stars · 3k forks · Python

About

vLLM is an engine for fast LLM inference and serving, built around PagedAttention for efficient management of attention key-value cache memory. It is widely used for production-scale deployments.
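
As a quick illustration of offline batch inference with vLLM's Python API (the model name and sampling settings below are placeholders, not recommendations):

    from vllm import LLM, SamplingParams

    # Placeholder model; any Hugging Face model that vLLM supports can be used here.
    llm = LLM(model="facebook/opt-125m")

    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    prompts = [
        "The capital of France is",
        "Paged attention speeds up serving because",
    ]

    # generate() batches the prompts and returns one RequestOutput per prompt.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)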

Key Features

  • PagedAttention for efficient KV-cache memory management
  • High-throughput inference with continuous batching
  • OpenAI-compatible API server (see the sketch after this list)
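
A minimal sketch of calling the OpenAI-compatible server from the official openai Python client; the model name is a placeholder, and the base URL assumes the server's default port of 8000:

    # Start the server separately, e.g.: vllm serve facebook/opt-125m
    from openai import OpenAI

    # The API key is not checked unless the server is configured with one.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.completions.create(
        model="facebook/opt-125m",  # must match the model the server was launched with
        prompt="The capital of France is",
        max_tokens=32,
    )
    print(response.choices[0].text)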

Tags

Inference · Serving · Performance · Production

Related Resources