# Evaluation Chain

`EvaluatorChain` for assessing the performance of a RAG pipeline.

## EvaluatorChain

Bases: `Chain`, `RunEvaluator`

Chain used to evaluate RAG outputs using custom metrics.

This chain wraps a scoring metric (e.g., semantic similarity, BLEU) and applies it to a structured sample. It is compatible with LangSmith's evaluation framework.
Attributes:

| Name | Type | Description |
|---|---|---|
| `metric` | `Metric` | The scoring metric used for evaluation. May require an LLM or an embedding model, depending on the metric. |
Source code in `ragbot\evaluation\eval_chain.py`
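A minimal usage sketch. The `answer_similarity` metric instance and the sample keys (`question`, `answer`, `ground_truth`) are illustrative assumptions; the actual required keys come from the wrapped metric (see `input_keys` below).

```python
# Minimal sketch, assuming a hypothetical `answer_similarity` Metric
# instance supplied by your metrics library.
from ragbot.evaluation.eval_chain import EvaluatorChain

chain = EvaluatorChain(metric=answer_similarity)

# The chain is invoked like any LangChain Chain, on one structured sample.
# The sample keys shown here are assumptions; use the metric's own columns.
scores = chain({
    "question": "What does the retriever return?",
    "answer": "The top-k documents ranked by similarity.",
    "ground_truth": "The k most similar documents.",
})
print(scores[answer_similarity.name])  # e.g. 0.87
```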
### `input_keys` (property)

Defines the input keys required by the metric.

Returns:

| Type | Description |
|---|---|
| `List[str]` | Required column names for the metric. |
### `output_keys` (property)

Defines the output keys produced by the evaluation.

Returns:

| Type | Description |
|---|---|
| `List[str]` | The name of the metric, used as the output key. |
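To make the two properties concrete, a hedged illustration; the exact column and metric names depend entirely on the wrapped metric:

```python
# Illustrative only: actual values depend on the wrapped metric.
chain = EvaluatorChain(metric=answer_similarity)

print(chain.input_keys)   # e.g. ["question", "answer", "ground_truth"]
print(chain.output_keys)  # e.g. ["answer_similarity"]
```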
### `__init__(metric, **kwargs)`

Initializes the EvaluatorChain.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `Metric` | The evaluation metric instance. | *required* |
| `**kwargs` | `Any` | Optional keyword arguments, including `llm` or `embeddings` if required by the metric. | `{}` |
Source code in `ragbot\evaluation\eval_chain.py`
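A sketch of construction for a metric that needs both an LLM and an embedding model. `ChatOpenAI` and `OpenAIEmbeddings` are stand-in assumptions; substitute whatever models your metric actually requires.

```python
# Sketch: passing `llm` and `embeddings` through **kwargs for a metric
# that needs them. The model classes below are stand-in assumptions.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragbot.evaluation.eval_chain import EvaluatorChain

chain = EvaluatorChain(
    metric=answer_similarity,             # hypothetical Metric instance
    llm=ChatOpenAI(model="gpt-4o-mini"),
    embeddings=OpenAIEmbeddings(),
)
```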
### `evaluate_run(run, example=None)`

Evaluates a single run against a reference example.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `run` | `Run` | The RAG pipeline run to evaluate. | *required* |
| `example` | `Optional[Example]` | Ground-truth data for evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `Union[EvaluationResult, EvaluationResults]` | Evaluation results for the run. |
Source code in `ragbot\evaluation\eval_chain.py`
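Because `EvaluatorChain` implements `RunEvaluator`, it can be used with LangSmith directly. A hedged sketch; the run and example IDs are placeholders, and a single `EvaluationResult` is assumed for the printout:

```python
# Sketch: evaluating one LangSmith run against its reference example.
# The IDs are placeholders; fetch real ones from your LangSmith project.
from langsmith import Client

client = Client()
evaluator = EvaluatorChain(metric=answer_similarity)

run = client.read_run("<run-id>")
example = client.read_example("<example-id>")

result = evaluator.evaluate_run(run, example)
print(result.key, result.score)  # metric name and score, assuming a
                                 # single EvaluationResult is returned
```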