How to Run RAG Checker
RAG Checker automatically evaluates a RAG system's factual accuracy (Factuality) and retrieval–generation performance
by comparing the Expected Response (ER) with the Target Response (TR).
Once you upload a dataset and create an evaluation task,
the system automatically calculates key metrics such as Precision, Recall, Faithfulness, and Hallucination,
enabling you to quantitatively assess the model's factual consistency and context utilization.
Step 1. Prepare the Dataset
Before running the evaluation, prepare a dataset that includes the following columns (see the Dataset Upload Guide for upload instructions):
| Column | Description |
|---|---|
| query | The user's input question. |
| expected_response | The Expected Response (ER): the reference or ground-truth answer for the query. |
| response | The Target Response (TR) generated by the RAG system. |
| retrieved_context1 | The document or passage retrieved by the model for answer generation. If multiple contexts are retrieved, add sequential columns such as retrieved_context2, retrieved_context3, and so on. |
- Supported file types: .csv, .xlsx
- Required columns: query, expected_response, response, retrieved_context1
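Because the upload depends on exact column names, it can help to generate and check the file before uploading. The following is a minimal, unofficial sketch using only the Python standard library; validate_header, write_dataset, and context_index are hypothetical helpers, not part of RAG Checker. It assumes only the rules listed above: the four required columns must be present, and extra context columns are numbered consecutively from retrieved_context1. It writes CSV only, not .xlsx.

```python
import csv
import re

# Column rules taken from the dataset table above (helper names are hypothetical).
REQUIRED_COLUMNS = ["query", "expected_response", "response", "retrieved_context1"]
CONTEXT_PATTERN = re.compile(r"^retrieved_context(\d+)$")


def context_index(column):
    """Return N for a 'retrieved_contextN' column name, or None otherwise."""
    match = CONTEXT_PATTERN.match(column)
    return int(match.group(1)) if match else None


def validate_header(header):
    """Return a list of problems with a dataset header (empty list means it looks valid)."""
    problems = [f"missing required column: {col}" for col in REQUIRED_COLUMNS if col not in header]
    indices = sorted(i for i in map(context_index, header) if i is not None)
    # Context columns should run 1, 2, 3, ... with no gaps.
    if indices and indices != list(range(1, len(indices) + 1)):
        problems.append(f"retrieved_context columns are not sequential: {indices}")
    return problems


def write_dataset(path, rows):
    """Write row dicts to a UTF-8 CSV and return the header that was written.

    The header covers every retrieved_contextN column used by any row; rows with
    fewer contexts get empty cells. Keys outside these columns raise ValueError
    (the csv.DictWriter default).
    """
    n_contexts = max(
        (i for row in rows for i in map(context_index, row) if i is not None),
        default=1,
    )
    header = REQUIRED_COLUMNS[:3] + [f"retrieved_context{i}" for i in range(1, n_contexts + 1)]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return header
```

For example, calling write_dataset("rag_dataset.csv", rows) where one row has retrieved_context1 and retrieved_context2 and another has only retrieved_context1 writes a header containing both context columns and leaves the missing cell empty; validate_header on that header returns an empty list.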
Step 2. Create a RAG Checker Task
- In the left navigation panel, open RAG Checker.
- Click + Add Task in the upper-right corner.
- In the dialog box, fill in the following information:
  - Task Name: The name of the evaluation task.
  - Description: A brief summary of the task.
  - Target Model: The RAG system or LLM to be evaluated.
- Click [Create] to create the task.
Once the task is created, proceed to create an Evaluation Set.
Step 3. Create and Run an Evaluation Set
- Open the task detail page.
- Go to the [Evaluation Set] tab and click + New Eval Set.
- Configure the following settings:
  - Decomposition Model: Breaks responses into verifiable claims.
  - Entailment Model: Determines whether each claim is logically supported by the retrieved context.
  - Set Name / Description: Specify a name and optional description for the evaluation set.
- Select the dataset for evaluation.
- Click [Start Evaluation] to begin the evaluation.
Once started, RAG Checker automatically performs the following steps:
- Decomposes the model response into verifiable claims.
- Determines whether each claim is entailed by the retrieved passages.
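The decomposition and entailment steps can be illustrated with a small claim-level sketch. It is not RAG Checker's implementation: decompose and entails are hypothetical toy stand-ins for the Decomposition and Entailment models (sentence splitting and word overlap), and the metric definitions are an assumption modeled on the open-source RAGChecker framework, in which claims are also checked against the Expected Response. Under those assumptions, Precision is the share of response claims supported by the ER, Recall is the share of ER claims supported by the response, Faithfulness is the share of response claims supported by at least one retrieved context, and Hallucination is the share of response claims supported by neither.

```python
import re
from dataclasses import dataclass


def decompose(text):
    """Toy stand-in for the Decomposition Model (hypothetical): one claim per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _words(text):
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def entails(premise, claim, threshold=0.8):
    """Toy stand-in for the Entailment Model (hypothetical).

    Treats a claim as supported when at least `threshold` of its words appear
    in the premise. A real entailment model reasons about meaning, not overlap.
    """
    claim_words = _words(claim)
    if not claim_words:
        return False
    return len(claim_words & _words(premise)) / len(claim_words) >= threshold


@dataclass
class Scores:
    precision: float
    recall: float
    faithfulness: float
    hallucination: float


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def score(expected_response, response, contexts):
    """Compute assumed RAGChecker-style metrics for one dataset row."""
    response_claims = decompose(response)
    expected_claims = decompose(expected_response)

    # Response claims supported by the Expected Response (ER).
    correct = [entails(expected_response, c) for c in response_claims]
    # Response claims supported by at least one retrieved context.
    grounded = [any(entails(ctx, c) for ctx in contexts) for c in response_claims]
    # ER claims covered by the Target Response (TR).
    covered = [entails(response, c) for c in expected_claims]
    # Incorrect response claims that no retrieved context supports.
    hallucinated = [not ok and not g for ok, g in zip(correct, grounded)]

    n = len(response_claims)
    return Scores(
        precision=_ratio(sum(correct), n),
        recall=_ratio(sum(covered), len(expected_claims)),
        faithfulness=_ratio(sum(grounded), n),
        hallucination=_ratio(sum(hallucinated), n),
    )


if __name__ == "__main__":
    result = score(
        expected_response="Paris is the capital of France. Paris has about two million residents.",
        response=(
            "Paris is the capital of France. Paris is the largest city of France. "
            "The Eiffel Tower is made of cheese."
        ),
        contexts=["Paris is the capital and largest city of France."],
    )
    print(result)  # precision 1/3, recall 1/2, faithfulness 2/3, hallucination 1/3
```

In this example the second response claim is grounded in the context but absent from the ER, so it lowers Precision without counting as a hallucination; only the third claim is unsupported by both. In practice, decompose and entails would be backed by the models selected in the evaluation set, while the ratios over per-claim labels stay the same.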
Once the evaluation is complete, proceed to the next step to review and analyze the results.