
llama.cpp

LLMs

Run LLaMA-class models efficiently on consumer hardware.

75k stars · 11k forks · C++

About

llama.cpp is a high-performance C/C++ inference engine for LLaMA and compatible models. It enables quantized, CPU-first inference with minimal dependencies and supports the GGUF model format.

Key Features

  • Quantization
  • CPU inference
  • GGUF
  • Low memory usage
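
The quantization feature above can be illustrated with a minimal Python sketch of block-wise quantization in the spirit of llama.cpp's Q4_0 scheme, where a block of weights shares a single scale and each weight is stored as a 4-bit integer. This is a simplified illustration only: the real GGUF layouts differ (32-weight blocks, half-precision scales, packed nibbles), and the function names here are hypothetical.

```python
def quantize_block(weights):
    """Quantize a block of floats to 4-bit ints sharing one scale.

    Simplified sketch of the idea behind Q4_0-style quantization;
    not the actual llama.cpp/GGUF layout.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs else 1.0  # map values into roughly [-7, 7]
    # Clamp to the signed 4-bit range [-8, 7]
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Reconstruct approximate floats from the scale and 4-bit ints."""
    return [scale * v for v in q]

block = [0.5, -1.2, 0.03, 0.9]
scale, q = quantize_block(block)
restored = dequantize_block(scale, q)
# restored approximates block within the quantization step (about scale/2)
```

Storing one scale per block plus 4 bits per weight is what lets quantized models fit in a fraction of the memory of fp16 weights while keeping reconstruction error bounded.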

Tags

LLaMA · Inference · Local LLM · Quantization
