Welcome to Datumo Eval

Datumo Eval is an end-to-end evaluation platform that automates quality, reliability, and safety assessments for LLM-based AI services.
From domain-specific query generation to automated scoring, result analysis, and red-teaming, Datumo provides a consistent workflow covering the entire model evaluation lifecycle.

Getting Started with Datumo Eval

New to Datumo Eval? Use the guides below to get up and running quickly.

⏱ Quick Start View Tutorials

Why Datumo Eval?

Automated evaluation dataset generation
Upload your documents and automatically generate Queries or Expected Responses, reducing preparation time significantly.
Built-in and Custom Metrics
Use pre-built evaluators for Safety, RAG Quality, and Factuality, or design your own domain-specific evaluation metrics.
Flexible Evaluation Execution
Datumo supports human review, algorithmic scoring, and LLM-judge–based automatic evaluation within a single platform.
Claim-Level Factuality & RAG Checker
Responses are decomposed into granular claims to assess factual accuracy and retrieval quality with precision.
Automated red-teaming for safety evaluation
Generate adversarial prompts automatically to test safety-violation risks and perform repeatable, consistent safety assessments using judge prompts.

Feature Overview

Visualize model performance at a glance across versions, including strengths, weaknesses, and key metrics.

Create evaluation queries that mirror real-world service usage and manage model responses systematically.

Upload CSV/Excel files to build and version your datasets.

Automatically score hundreds or thousands of responses using predefined rules and metrics.
Ideal for efficient, repeatable large-scale evaluations.

Conduct rubric-based qualitative assessments or evaluate model responses interactively through live conversations.

Automatically measure model performance using benchmark datasets and reference-based metrics.

Connect models, define custom metrics, and manage access control to operate secure evaluation workflows.

Welcome to Datumo Eval

Getting Started with Datumo Eval

Why Datumo Eval?

Feature Overview

📊 Overview

🗂️ Dataset Management

🤖 LLM-Judge Evaluation Judgment Evaluation

🧑‍⚖️ Human Evaluation Human Evaluation

📐 Quantitative Evaluation Quantitative Evaluation

⚙️ Workspace Settings (Models · Metrics · Environment)

Explore the Docs

Introduction

Workspace & Settings

Evaluation Workflows

Getting Started with Datumo Eval​

Why Datumo Eval?​

Feature Overview​

📊 Overview

🗂️ Dataset Management

🤖 LLM-Judge Evaluation Judgment Evaluation

🧑‍⚖️ Human Evaluation Human Evaluation

📐 Quantitative Evaluation Quantitative Evaluation

⚙️ Workspace Settings (Models · Metrics · Environment)

Explore the Docs​

Introduction

Workspace & Settings

Evaluation Workflows

Getting Started with Datumo Eval

Why Datumo Eval?

Feature Overview

Explore the Docs