llama.cpp

Run LLaMA-class models efficiently on consumer hardware.

72k stars · 10k forks · C++

About

llama.cpp is a high-performance C/C++ inference engine for LLaMA and compatible models. It enables quantized, CPU-first inference with minimal dependencies.
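Below is a minimal loading sketch against the project's C API (llama.h). This is a sketch only: the API evolves between releases (some newer versions rename llama_load_model_from_file and llama_new_context_with_model, and llama_backend_init has changed signature over time), so check the llama.h in your checkout. The model path is a placeholder.

```cpp
// Minimal sketch: load a GGUF model and create an inference context.
// Exact function names vary across llama.cpp releases; see llama.h.
#include "llama.h"

#include <cstdio>

int main(int argc, char ** argv) {
    // Placeholder path; pass a real GGUF file on the command line.
    const char * path = argc > 1 ? argv[1] : "model.gguf";

    llama_backend_init(); // one-time global init (signature varies by version)

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(path, mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model: %s\n", path);
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048; // context window, in tokens

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (ctx == nullptr) {
        fprintf(stderr, "failed to create context\n");
        llama_free_model(model);
        return 1;
    }

    // ... tokenize a prompt and call llama_decode() in a loop here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

Compile by linking against the library produced by the project's CMake build.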

Key Features

  • Quantization to low-bit integer formats (2- to 8-bit)
  • CPU-first inference, with optional GPU offload
  • GGUF model file format
  • Low memory usage (see the sketch after this list)
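The low-memory point is easy to quantify with back-of-envelope arithmetic. The sketch below estimates weight storage for a hypothetical 7B-parameter model at a few bit widths; real GGUF quantization formats add per-block scale factors and metadata, so actual files run somewhat larger.

```cpp
// Back-of-envelope weight memory at different bit widths.
#include <cstdio>

int main() {
    const double n_params = 7e9; // e.g. a 7B-parameter model

    const double bit_widths[] = {16.0, 8.0, 4.0}; // FP16, Q8-class, Q4-class
    for (double bits : bit_widths) {
        const double bytes = n_params * bits / 8.0;
        printf("%4.1f-bit weights: ~%.1f GiB\n",
               bits, bytes / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
// Prints roughly: 16-bit ~13.0 GiB, 8-bit ~6.5 GiB, 4-bit ~3.3 GiB
```

This counts weight storage only; the KV cache and activation buffers add to the total at runtime.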

Tags

LLaMA · Inference · Local LLM · Quantization
