ROUGE
ROUGE score metric for evaluating similarity between generated and reference answers.
ROUGE
dataclass
Bases: Metric
ROUGE score metric for RAG evaluation.
Uses the `rouge_score` package to compute overlap-based metrics such as ROUGE-1 and ROUGE-L.
You can configure the specific ROUGE type and the score mode (F1, precision, or recall).
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of the metric. |
| `rouge_type` | `str` | Type of ROUGE to use (`'rouge1'` or `'rougeL'`). |
| `mode` | `str` | Score mode (`'fmeasure'`, `'precision'`, or `'recall'`). |
Source code in `ragbot\evaluation\metrics\rouge.py`
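To make the configuration concrete, here is how the underlying `rouge_score` package exposes these options. This is the public `rouge_score` API rather than this class's own source; the example strings are purely illustrative.

```python
from rouge_score import rouge_scorer

# Compare a generated answer against a reference with ROUGE-1 and ROUGE-L.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="The cat sat on the mat.",            # reference answer
    prediction="A cat was sitting on the mat.",  # generated answer
)

# Each entry is a Score tuple exposing the three modes listed above.
rouge_l = scores["rougeL"]
print(rouge_l.precision, rouge_l.recall, rouge_l.fmeasure)
```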
__post_init__()
Initialize the ROUGE scorer and ensure required package is available.
Source code in `ragbot\evaluation\metrics\rouge.py`
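The collapsed body is not rendered above. A plausible sketch, shown as it would sit inside the dataclass; the cached attribute name `_scorer` and the exact error message are assumptions, not taken from the source.

```python
def __post_init__(self) -> None:
    try:
        from rouge_score import rouge_scorer
    except ImportError as exc:
        raise ImportError(
            "The ROUGE metric requires the 'rouge-score' package: "
            "pip install rouge-score"
        ) from exc
    # Cache one scorer configured for the single requested ROUGE type.
    self._scorer = rouge_scorer.RougeScorer([self.rouge_type])
```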
score(sample, **kwargs)
Compute ROUGE score for a given sample.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `sample` | `Sample` | Sample containing both the generated and reference answers. | *required* |
| `**kwargs` | `Any` | Optional keyword arguments (not used here). | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The ROUGE score (f-measure, precision, or recall depending on config). |
Source code in `ragbot\evaluation\metrics\rouge.py`
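Putting it together, a hypothetical call site. Only the `ROUGE` module path is shown on this page; the `Sample` import path and its field names are guesses for illustration.

```python
from ragbot.evaluation.metrics.rouge import ROUGE

# 'Sample' lives elsewhere in ragbot; this import path is an assumption.
from ragbot.evaluation.samples import Sample

metric = ROUGE(rouge_type="rougeL", mode="fmeasure")
sample = Sample(
    generated_answer="A cat was sitting on the mat.",  # assumed field name
    reference_answer="The cat sat on the mat.",        # assumed field name
)
print(metric.score(sample))  # a float in [0, 1]
```

With `mode="recall"` or `mode="precision"`, the same call would return the corresponding component of the ROUGE-L score instead of the F1 value.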