🎖️ TRL
Training & Fine-tuning
Transformer Reinforcement Learning for RLHF and alignment.
13k stars · 1.7k forks · Python
About
TRL by Hugging Face provides trainers for RLHF, including PPO, DPO, GRPO, and reward modeling, to align language models with human preferences. The library evolves actively alongside alignment research.
Key Features
- RLHF (reinforcement learning from human feedback)
- PPO (Proximal Policy Optimization)
- DPO (Direct Preference Optimization)
- GRPO (Group Relative Policy Optimization)
- Reward modeling
- SFT (supervised fine-tuning)
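To make one of these methods concrete: DPO optimizes a simple pairwise objective over (chosen, rejected) response pairs. Below is a minimal plain-Python sketch of that loss for a single pair; it illustrates the math only and is not TRL's API (in practice you would use TRL's `DPOTrainer` with batched tensors).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    beta scales the implicit reward (a common default is 0.1).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)): small when the policy favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; as the policy learns to favor the chosen response beyond the reference, the loss falls toward zero.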
Tags
RLHF · Alignment · DPO · Hugging Face