
llama.cpp

LLMs

Run LLaMA-class models efficiently on consumer hardware.

75k stars · 11k forks · C++

About

llama.cpp is a high-performance C/C++ inference engine for LLaMA and compatible models. It enables quantized, CPU-first inference with minimal dependencies and supports the GGUF model format.

Key Features

  • Quantization
  • CPU inference
  • GGUF
  • Low memory usage
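
The quantization feature above can be illustrated with a minimal Python sketch of block-wise quantization in the spirit of llama.cpp's Q4_0 scheme, where a block of weights shares a single scale and each weight is stored as a 4-bit integer. This is a simplified illustration only: the real GGUF layouts differ (32-weight blocks, half-precision scales, packed nibbles), and the function names here are hypothetical.

```python
def quantize_block(weights):
    """Quantize a block of floats to 4-bit ints sharing one scale.

    Simplified sketch of the idea behind Q4_0-style quantization;
    not the actual llama.cpp/GGUF layout.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs else 1.0  # map values into roughly [-7, 7]
    # Clamp to the signed 4-bit range [-8, 7]
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Reconstruct approximate floats from the scale and 4-bit ints."""
    return [scale * v for v in q]

block = [0.5, -1.2, 0.03, 0.9]
scale, q = quantize_block(block)
restored = dequantize_block(scale, q)
# restored approximates block within the quantization step (about scale/2)
```

Storing one scale per block plus 4 bits per weight is what lets quantized models fit in a fraction of the memory of fp16 weights while keeping reconstruction error bounded.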

Tags

LLaMA · Inference · Local LLM · Quantization
