Skip to main content

Welcome to Datumo Eval

Datumo Eval is an end-to-end evaluation platform that automates quality, reliability, and safety assessments for LLM-based AI services.
From domain-specific query generation to automated scoring, result analysis, and red-teaming, Datumo provides a consistent workflow covering the entire model evaluation lifecycle.

Getting Started with Datumo Eval

New to Datumo Eval? Use the guides below to get up and running quickly.


Why Datumo Eval?

  • Automated evaluation dataset generation
    Upload your documents and automatically generate Queries or Expected Responses, reducing preparation time significantly.

  • Built-in and Custom Metrics
    Use pre-built evaluators for Safety, RAG Quality, and Factuality, or design your own domain-specific evaluation metrics.

  • Flexible Evaluation Execution
    Datumo supports human review, algorithmic scoring, and LLM-judge–based automatic evaluation within a single platform.

  • Claim-Level Factuality & RAG Checker
    Responses are decomposed into granular claims to assess factual accuracy and retrieval quality with precision.

  • Automated red-teaming for safety evaluation
    Generate adversarial prompts automatically to test safety-violation risks and perform repeatable, consistent safety assessments using judge prompts.


Feature Overview

Explore the Docs