TRL

Training & Fine-tuning

Transformer Reinforcement Learning for RLHF and alignment.

13k stars · 1.7k forks · Python

About

TRL by Hugging Face provides tools for RLHF, DPO, PPO, GRPO, reward modeling, and supervised fine-tuning (SFT) to align language models with human preferences. The library evolves actively alongside alignment research.
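
As a quick orientation, here is a minimal supervised fine-tuning sketch in the spirit of the TRL quickstart. The model and dataset names (Qwen/Qwen2.5-0.5B, trl-lib/Capybara) are illustrative, and passing a model id string to SFTTrainer assumes a recent TRL release:

```python
# Minimal SFT sketch with TRL's SFTTrainer.
# Checkpoint and dataset names are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any text or conversational dataset from the Hub can stand in here.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                      # any causal LM checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="Qwen2.5-0.5B-SFT"),  # standard Trainer-style config
)
trainer.train()
```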

Key Features

  • RLHF (reinforcement learning from human feedback) pipelines
  • PPO (Proximal Policy Optimization) via PPOTrainer
  • DPO (Direct Preference Optimization) via DPOTrainer (see the sketch after this list)
  • GRPO (Group Relative Policy Optimization) via GRPOTrainer
  • Reward modeling via RewardTrainer
  • Supervised fine-tuning (SFT) via SFTTrainer

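As referenced in the feature list, the sketch below shows DPO training. It assumes a preference dataset with "chosen"/"rejected" response pairs; the checkpoint and dataset names are examples, and the parameter names follow recent TRL releases:

```python
# Minimal DPO sketch with TRL's DPOTrainer.
# Assumes a preference dataset with "chosen"/"rejected" pairs;
# checkpoint and dataset names are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="Qwen2.5-0.5B-DPO"),
    train_dataset=dataset,
    processing_class=tokenizer,  # tokenizer used to apply the chat template
)
trainer.train()
```

With no explicit ref_model, DPOTrainer builds a frozen reference copy of the policy internally, which keeps the example short at the cost of extra memory.
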
Tags

RLHF · Alignment · DPO · Hugging Face
