🔬

DeepEval

Evaluation

The open-source LLM evaluation framework with 50+ research-backed metrics.

7k stars · 600 forks · Python

About

DeepEval provides a pytest-style interface for unit-testing LLM outputs, with metrics covering hallucination, relevancy, faithfulness, safety, and more. It is backed by Confident AI's platform for team-level evaluations.
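Below is a minimal sketch of that pytest-style workflow. The names used (`LLMTestCase`, `AnswerRelevancyMetric`, `assert_test`) follow DeepEval's documented quickstart pattern; verify against the current docs.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # Pair the prompt with the model's actual output.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
    )
    # The test fails if the relevancy score falls below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A test like this runs under plain `pytest` or via DeepEval's own CLI, e.g. `deepeval test run test_example.py` (the file name here is illustrative).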

Key Features

  • 50+ metrics
  • Pytest integration
  • Hallucination detection
  • Safety scoring
  • RAG evaluation (see the sketch after this list)
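For the RAG side, here is a minimal sketch of a faithfulness check, assuming DeepEval's `FaithfulnessMetric` and the `retrieval_context` field on `LLMTestCase`; the question, answer, and context below are made up for illustration.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import FaithfulnessMetric

def test_rag_faithfulness():
    test_case = LLMTestCase(
        input="What does the refund policy cover?",
        actual_output="You can get a full refund within 30 days of purchase.",
        # The retrieved chunks the answer should be grounded in.
        retrieval_context=[
            "Customers may request a full refund within 30 days of purchase.",
        ],
    )
    # Checks whether the claims in actual_output are supported
    # by the retrieval_context.
    metric = FaithfulnessMetric(threshold=0.7)
    assert_test(test_case, [metric])
```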

Tags

Evaluation · Testing · LLM · Hallucination · CI/CD
