
vLLM

LLMs

High-throughput, memory-efficient LLM serving engine.

45k stars · 7k forks · Python

About

vLLM is a fast inference and serving engine for large language models, built around PagedAttention, which stores the KV cache in paged, non-contiguous memory blocks to reduce fragmentation and increase throughput. It is widely used for production-scale deployments, offering an OpenAI-compatible API server and multi-GPU support.
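As a minimal sketch of the OpenAI-compatible API, the snippet below queries a locally running vLLM server with the official `openai` Python client. The model name and the `localhost:8000` address are illustrative assumptions (8000 is vLLM's default server port).

```python
from openai import OpenAI

# Point the official OpenAI client at a local vLLM server, started e.g. with:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# The base_url and model name below are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain PagedAttention in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```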

Key Features

  • PagedAttention for efficient, paged KV-cache memory management
  • High-throughput inference with continuous batching
  • OpenAI-compatible API server
  • Multi-GPU serving via tensor parallelism (see the sketch after this list)
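
For offline (non-server) use, here is a minimal sketch with vLLM's Python API; the model is a small placeholder and `tensor_parallel_size=2` assumes two GPUs are available.

```python
from vllm import LLM, SamplingParams

# Placeholder model for illustration; tensor_parallel_size=2 shards the
# model across two GPUs (drop it or set it to 1 on a single-GPU machine).
llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```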

Tags

Inference · Serving · Performance · Production
