🦙 llama.cpp
Run LLaMA-class models efficiently on consumer hardware.
72k stars · 10k forks · C++
About
llama.cpp is a high-performance C/C++ inference engine for LLaMA and compatible models. It enables quantized, CPU-first inference with minimal dependencies.
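To make the "CPU-first inference with minimal dependencies" claim concrete, here is a minimal sketch of loading a GGUF model and evaluating a prompt through the library's C API. The function names and signatures follow one snapshot of llama.h; the API changes between releases, and the model path is a placeholder, so treat this as illustrative rather than definitive.

```cpp
// Minimal sketch: load a GGUF model and evaluate a prompt with the llama.cpp
// C API. Names follow one snapshot of llama.h; check your local header.
#include "llama.h"

#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    llama_backend_init();

    // Load the quantized model file (path is a placeholder).
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model-q4_0.gguf", mparams);
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Create an inference context with a 2048-token window.
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Tokenize the prompt. Some llama.cpp versions pass a vocab or context
    // pointer here instead of the model pointer.
    const char * prompt = "Hello, world";
    std::vector<llama_token> tokens(128);
    const int n = llama_tokenize(model, prompt, (int) std::strlen(prompt),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n);

    // Evaluate the prompt in one batch; logits for the last position are
    // then available via llama_get_logits(ctx) for sampling. (Older
    // releases pass extra position/sequence arguments to this helper.)
    llama_batch batch = llama_batch_get_one(tokens.data(), (int) tokens.size());
    llama_decode(ctx, batch);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```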
Key Features
- Low-bit integer quantization (e.g., 4-bit) to shrink model size (see the sketch after this list)
- CPU-first inference using portable SIMD code paths
- GGUF model file format
- Low memory usage
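The quantization feature maps to a single library call, which the bundled llama-quantize tool wraps. A minimal sketch, assuming the llama_model_quantize entry point and the LLAMA_FTYPE_MOSTLY_Q4_0 enum from one llama.h snapshot:

```cpp
// Sketch: quantize an f16 GGUF file to 4-bit weights offline. Field and
// enum names are from one llama.h snapshot and may differ in your version.
#include "llama.h"

#include <cstdio>

int main() {
    llama_model_quantize_params qparams = llama_model_quantize_default_params();
    qparams.ftype = LLAMA_FTYPE_MOSTLY_Q4_0; // target: 4-bit integer weights

    // Input/output paths are placeholders; returns 0 on success.
    if (llama_model_quantize("model-f16.gguf", "model-q4_0.gguf", &qparams) != 0) {
        std::fprintf(stderr, "quantization failed\n");
        return 1;
    }
    return 0;
}
```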
Tags
LLaMA · Inference · Local LLM · Quantization