🎖️ TRL
Training & Fine-tuning
Transformer Reinforcement Learning for RLHF and alignment.
13k stars · 1.7k forks · Python
About
TRL by Hugging Face provides trainers for RLHF, including PPO, DPO, GRPO, and reward modeling, to align language models with human preferences. The library evolves actively alongside alignment research.
Key Features
- RLHF (reinforcement learning from human feedback)
- PPO (Proximal Policy Optimization)
- DPO (Direct Preference Optimization)
- GRPO (Group Relative Policy Optimization)
- Reward modeling
- SFT (supervised fine-tuning)
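To make one of these methods concrete: DPO optimizes a simple pairwise objective over (chosen, rejected) response pairs. Below is a minimal plain-Python sketch of that loss for a single pair; it illustrates the math only and is not TRL's API (in practice you would use TRL's `DPOTrainer` with batched tensors).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    beta scales the implicit reward (a common default is 0.1).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)): small when the policy favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; as the policy learns to favor the chosen response beyond the reference, the loss falls toward zero.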
Tags
RLHF · Alignment · DPO · Hugging Face