Evaluation Categories
This page explains the core evaluation metrics supported by Datumo Eval.
① Safety Evaluation Metrics

🔘 Information Hazards

| Metric | Description |
|---|---|
| Illegal | Evaluates whether the model provides information related to illegal activities. |
🔘 Content Moderation

| Metric | Description |
|---|---|
| Bias | Evaluates whether the model makes biased statements. |
| Hate | Evaluates whether the model demeans or negatively portrays a specific group. |
② RAG Quality Metrics Based on Text Decomposition (Upcoming)

🟠 Overall Metrics – Comprehensive metrics for overall performance

| Metric | Description | Formula |
|---|---|---|
| F1 Score | Combined metric of answer recall and precision; provides a single measure of overall performance. | 2 × (Precision × Recall) / (Precision + Recall) |
| Precision | Accuracy of the answer based on the target context: the proportion of relevant elements among all answer elements. | CC / (CC + IC) |
| Recall | Recall of the answer based on the target context: the proportion of expected target elements captured in the answer. | CC / (CC + MC) |
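
These overall metrics can be computed directly once an answer has been decomposed into claims. Below is a minimal sketch, assuming the counts of correct (CC), incorrect (IC), and missing (MC) claims are already available; the function names are illustrative and not part of Datumo Eval's API.

```python
def precision(cc: int, ic: int) -> float:
    # CC / (CC + IC): relevant answer elements among all answer elements
    return cc / (cc + ic) if (cc + ic) else 0.0

def recall(cc: int, mc: int) -> float:
    # CC / (CC + MC): expected target elements captured by the answer
    return cc / (cc + mc) if (cc + mc) else 0.0

def f1_score(cc: int, ic: int, mc: int) -> float:
    # Harmonic mean of precision and recall
    p, r = precision(cc, ic), recall(cc, mc)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Example: 8 correct, 2 incorrect, and 2 missing claims
print(f1_score(cc=8, ic=2, mc=2))  # 0.8
```
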
🟢 Retriever Metrics – Metrics related to retriever performance

| Metric | Description | Formula |
|---|---|---|
| Context Precision | Precision of the retrieved context: the proportion of relevant chunks among all retrieved chunks. | RC / (RC + IC) |
| Claim Recall | Recall of relevant claims from the retrieved context: how many chunks containing the correct claims were retrieved. | RC / (RC + MC) |
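
The retriever metrics follow the same pattern at the chunk level. The sketch below assumes counts of relevant retrieved chunks (RC), irrelevant retrieved chunks (IC), and missed relevant chunks (MC); again, the names are illustrative rather than Datumo Eval's interface.

```python
def context_precision(rc: int, ic: int) -> float:
    # RC / (RC + IC): relevant chunks among all retrieved chunks
    return rc / (rc + ic) if (rc + ic) else 0.0

def claim_recall(rc: int, mc: int) -> float:
    # RC / (RC + MC): retrieved claim-bearing chunks among all such chunks
    return rc / (rc + mc) if (rc + mc) else 0.0

# Example: 6 relevant chunks retrieved, 4 irrelevant, 3 relevant chunks missed
print(context_precision(rc=6, ic=4))  # 0.6
print(claim_recall(rc=6, mc=3))       # 0.666...
```
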
🔵 Generator Metrics – Metrics related to the answer generation model

| Metric | Description | Formula |
|---|---|---|
| Faithfulness | Faithfulness of the answer to the target context: the proportion of answer elements supported by the retrieved documents. | UC / (UC + IC) |
| Self-Knowledge | The extent to which the model answered correctly without retrieved information. | UC / (UC + IC) |
| Hallucination | Incorrect answers generated without support from the retrieved context. | IC |
| Noise Sensitivity | Incorrect answers generated from irrelevant retrieved content. | IC / (UC + IC) |
| Context Utilization | The proportion of retrieved chunks that include accurate claims. | (RC / (RC + IC)) / ((RC / (RC + IC)) + (RC / (RC + MC))) |
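
As a worked sketch of the generator metrics, the snippet below uses illustrative counts: UC for answer claims supported by the retrieved context, IC for incorrect claims, and RC/MC as in the retriever metrics. It mirrors the formulas in the table above and is not Datumo Eval's implementation.

```python
def faithfulness(uc: int, ic: int) -> float:
    # UC / (UC + IC): answer elements supported by the retrieved documents
    return uc / (uc + ic) if (uc + ic) else 0.0

def noise_sensitivity(ic: int, uc: int) -> float:
    # IC / (UC + IC): incorrect elements among all context-derived elements
    return ic / (uc + ic) if (uc + ic) else 0.0

def context_utilization(rc: int, ic: int, mc: int) -> float:
    # (RC/(RC+IC)) / ((RC/(RC+IC)) + (RC/(RC+MC))), as given in the table
    cp = rc / (rc + ic) if (rc + ic) else 0.0
    cr = rc / (rc + mc) if (rc + mc) else 0.0
    return cp / (cp + cr) if (cp + cr) else 0.0

# Example counts
print(faithfulness(uc=7, ic=3))               # 0.7
print(noise_sensitivity(ic=3, uc=7))          # 0.3
print(context_utilization(rc=6, ic=4, mc=3))  # ~0.47
```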