Basic Evaluation
The Basic Evaluation feature allows users to perform standard model evaluations.
It is structured around a project-based flow, from dataset setup to result analysis.
For project creation, refer to the Setting Up Your First Project page.
DATUMO Eval's basic evaluation consists of 4 main steps after creating a project:
- Create evaluation dataset
- Review and save the dataset
- Run evaluation and select model(s)
- Analyze results (Dashboard & Table View)
Basic Evaluation Tutorial Overview
- Setting Up Dataset
- Dataset & Review Editing
- Running Evaluation
- Viewing Results
- Edit Results
- Project Management
Basic Evaluation Flow
Step 1: Create Evaluation Dataset
Before running an evaluation, you need to prepare a dataset.
You can upload documents (contexts) or directly provide queries and responses to create your dataset.
You can choose from the following three Upload Types:
| Upload Type | Structure | Description |
|---|---|---|
| Query Generation | Context-only | Upload documents only, and the system generates questions. You can set parameters and review sample queries before generating the full dataset. |
| Query Upload | Context + Query | Upload documents and user-written queries for direct use in evaluation. |
| Response Upload | Query + Response | Upload both queries and model responses to run evaluations without collecting new responses. |
Available target model options and evaluation flows vary depending on the upload type.
👉 See Setting Up Dataset for details.
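Each upload type implies a different dataset shape. As a rough illustration (the column names and file layout below are hypothetical examples, not DATUMO Eval's required schema):

```python
import csv
import io

# Hypothetical column layouts for each upload type; the actual
# DATUMO Eval import format may differ -- check Setting Up Dataset.
datasets = {
    "query_generation": [  # Context-only: documents, no queries yet
        {"context": "Our refund policy allows returns within 30 days."},
    ],
    "query_upload": [  # Context + Query: documents plus user-written queries
        {"context": "Our refund policy allows returns within 30 days.",
         "query": "How long do I have to return an item?"},
    ],
    "response_upload": [  # Query + Response: no new responses collected
        {"query": "How long do I have to return an item?",
         "response": "You can return items within 30 days of purchase."},
    ],
}

# Serialize each example dataset as CSV and print it.
for name, rows in datasets.items():
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    print(f"--- {name}.csv ---")
    print(buf.getvalue())
```

The point is only the shape: Query Generation starts from context alone, Query Upload pairs context with queries, and Response Upload pairs queries with already-collected responses.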
Step 2: Review and Save Dataset
Review the structure of your dataset (queries/responses) and save it.
You can also edit or supplement the dataset before evaluation if needed.
👉 For editing features, refer to Dataset & Review Editing.
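A review pass typically catches rows with missing fields before they reach evaluation. A minimal offline sketch of that kind of check (the field names and helper are hypothetical, not a product API):

```python
def validate_rows(rows, required_fields):
    """Return indices of rows missing any required field (illustrative)."""
    bad = []
    for i, row in enumerate(rows):
        if any(not row.get(field) for field in required_fields):
            bad.append(i)
    return bad

rows = [
    {"query": "What is the refund window?", "response": "30 days."},
    {"query": "", "response": "N/A"},  # empty query -> needs editing
]

# Row 1 is flagged for editing before the dataset is saved.
print(validate_rows(rows, ["query", "response"]))  # [1]
```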
Step 3: Run Evaluation
After saving your dataset, select your target model(s) and start the evaluation.
Model selection rules vary depending on the upload type:
- Context-only / Context + Query
  → Multiple models can be selected.
- Query + Response
  → Only models with pre-generated responses can be selected.
  (⚠️ Models added via "Add Anyway" will not generate new responses.)
👉 For full instructions, visit Running Evaluation.
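The selection rules above can be sketched as a simple function (the upload-type values, helper name, and model fields here are illustrative, not part of the product):

```python
# Illustrative sketch of the model-selection rules described above;
# names and data shapes are hypothetical.
def selectable_models(upload_type, models):
    """Return the model names a user may select for evaluation.

    models: list of dicts like {"name": str, "has_responses": bool}
    """
    if upload_type in ("context_only", "context_query"):
        # Query Generation / Query Upload: multiple models can be selected.
        return [m["name"] for m in models]
    if upload_type == "query_response":
        # Response Upload: only models with pre-generated responses.
        return [m["name"] for m in models if m["has_responses"]]
    raise ValueError(f"unknown upload type: {upload_type}")

models = [
    {"name": "model-a", "has_responses": True},
    {"name": "model-b", "has_responses": False},
]
print(selectable_models("context_only", models))    # both models
print(selectable_models("query_response", models))  # only model-a
```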
Step 4: View Evaluation Results
Once evaluation is complete, results will be shown on the dashboard:
- Score distributions per metric
- Individual response scores and reasons
- Detailed views via the Table View
👉 Learn more at Viewing Results.
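Conceptually, the dashboard's per-metric score distributions amount to grouping individual response scores by metric. A sketch over a hypothetical results export (the JSON row shape is an assumption, not DATUMO Eval's actual export format):

```python
from collections import defaultdict

# Hypothetical exported rows: one score and reason per response per metric.
results = [
    {"metric": "relevance", "score": 4, "reason": "Addresses the query."},
    {"metric": "relevance", "score": 5, "reason": "Complete and on-topic."},
    {"metric": "faithfulness", "score": 2, "reason": "Unsupported claim."},
]

# Group scores by metric, as in the dashboard's per-metric view.
by_metric = defaultdict(list)
for row in results:
    by_metric[row["metric"]].append(row["score"])

for metric, scores in sorted(by_metric.items()):
    print(f"{metric}: n={len(scores)}, mean={sum(scores) / len(scores):.2f}")
```

The Table View corresponds to the raw rows themselves, with each response's score and reason inspectable individually.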
Other Evaluation Types
- RAG Checker PRO
  → Evaluate factual accuracy and document-grounded responses.
- Red-teaming Add-on
  → Assess model vulnerabilities and safety risks.