Judgment Evaluation Overview
This is the automated evaluation framework of Datumo Eval. The actual features provided may differ and consist of four main components: Evaluation Task, RAGAs Task, RAG Checker, and Auto-Red Teaming.
🗃️ 📊 Evaluation Task
7 items
🗃️ 🌐 RAGAs Task
4 items
🗃️ ✅ RAG Checker
3 items
🗃️ 🚨 Automated Red Teaming.
2 items
1. Evaluation Task
The Evaluation Task is the core evaluation workflow in Datumo Eval.
It allows you to evaluate the performance of AI models using evaluation metrics and to check and analyze the results on a dashboard.
::: note Details of Evaluation Task
- Evaluation by Task: A process for running evaluations, where you can create individual tasks to check performance.
- Evaluation Set Management: Manage the data collections used for evaluation, with the ability to stop/restart evaluations for each set.
- Evaluation Dashboard: Visually check and compare overall evaluation results through the dashboard.
- Detailed Evaluation Results: Check detailed performance results by question and by item. :::
2. RAGAs Task
Datumo provides a RAGAs-based evaluation feature, allowing you to automatically evaluate and quantify generated responses and retrieved contexts using RAGAs metrics.
3. RAG Checker
Evaluates the responses of a RAG system based on the Expected Response (ER) for evaluation questions. It assesses the performance of the RAG system's Generator and Retriever modules by measuring whether the claims in the ER (which serves as a model answer) are included in the retrieved documents (Context) and the RAG system's response.
4. Auto-Red Teaming
This is an automated red teaming system that uses a library of strategies to automatically generate attack prompts, which are then used to evaluate the safety and vulnerabilities of an AI model.
