🔬

DeepEval

Evaluation

The open-source LLM evaluation framework with 50+ research-backed metrics.

7k stars · 600 forks · Python

About

DeepEval provides a pytest-style interface for unit-testing LLM outputs, with metrics covering hallucination, relevancy, faithfulness, safety, and more. It is backed by Confident AI's platform for team-level evaluations.
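Below is a minimal sketch of that pytest-style workflow. The names used (`LLMTestCase`, `AnswerRelevancyMetric`, `assert_test`) follow DeepEval's documented quickstart pattern; verify against the current docs.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # Pair the prompt with the model's actual output.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
    )
    # The test fails if the relevancy score falls below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A test like this runs under plain `pytest` or via DeepEval's own CLI, e.g. `deepeval test run test_example.py` (the file name here is illustrative).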

Key Features

  • 50+ metrics
  • Pytest integration
  • Hallucination detection
  • Safety scoring
  • RAG evaluation (see the sketch after this list)
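For the RAG side, here is a minimal sketch of a faithfulness check, assuming DeepEval's `FaithfulnessMetric` and the `retrieval_context` field on `LLMTestCase`; the question, answer, and context below are made up for illustration.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import FaithfulnessMetric

def test_rag_faithfulness():
    test_case = LLMTestCase(
        input="What does the refund policy cover?",
        actual_output="You can get a full refund within 30 days of purchase.",
        # The retrieved chunks the answer should be grounded in.
        retrieval_context=[
            "Customers may request a full refund within 30 days of purchase.",
        ],
    )
    # Checks whether the claims in actual_output are supported
    # by the retrieval_context.
    metric = FaithfulnessMetric(threshold=0.7)
    assert_test(test_case, [metric])
```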

Tags

Evaluation · Testing · LLM · Hallucination · CI/CD
