# WinRateCallback[[trl.WinRateCallback]]

#### trl.WinRateCallback[[trl.WinRateCallback]]

[Source](https://github.com/huggingface/trl/blob/v0.26.1/trl/experimental/winrate_callback.py#L91)

A [TrainerCallback](https://huggingface.co/docs/transformers/v5.0.0rc1/en/main_classes/callback#transformers.TrainerCallback) that computes the win rate of a model against a reference.

It generates completions for prompts from the evaluation dataset and compares the trained model's outputs against
reference completions. The reference completions come from the trainer's reference model, if one is available, and
otherwise from the initial version of the model (before training). At each evaluation step, a judge compares the
trained model's completions with the reference completions. The fraction of comparisons won by the trained model is
logged in the trainer's logs under the key `"eval_win_rate"`.
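In outline, each evaluation pairs every reference completion with the trained model's completion for the same prompt, asks the judge which one is better, and reports the fraction of pairs won by the trained model. The sketch below illustrates only that computation, under stated assumptions. `LengthJudge` and `compute_win_rate` are hypothetical names, not TRL APIs. The judge's method is assumed to follow the `judge(prompts, completions, shuffle_order=True) -> list[int]` shape of `BasePairwiseJudge`. Pairs are assumed to be ordered (reference, trained). Generation is left out, so completions are given as plain strings.

```python
class LengthJudge:
    """Toy stand-in for a pairwise judge: prefers the longer completion.

    Hypothetical: its `judge` method mimics the assumed call shape
    `judge(prompts, completions, shuffle_order=True) -> list[int]`;
    a real judge would subclass BasePairwiseJudge instead.
    """

    def judge(self, prompts, completions, shuffle_order=True):
        # shuffle_order is accepted for compatibility but ignored by this toy.
        # Return the index (0 or 1) of the preferred completion in each pair.
        return [0 if len(first) >= len(second) else 1 for first, second in completions]


def compute_win_rate(judge, prompts, ref_completions, model_completions):
    # Assumption: the reference goes first in every pair, so index 1 means the trained model won.
    pairs = [[ref, new] for ref, new in zip(ref_completions, model_completions)]
    winner_indices = judge.judge(prompts, pairs)
    return sum(idx == 1 for idx in winner_indices) / len(winner_indices)


prompts = ["Q1", "Q2", "Q3", "Q4"]
ref = ["short", "a much longer answer", "mid", "x"]
new = ["a longer answer", "tiny", "middle", "xyz"]
print(compute_win_rate(LengthJudge(), prompts, ref, new))  # 0.75
```

Here the toy judge prefers the trained model on three of the four prompts, giving a win rate of 0.75.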

Usage:
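```python
from trl import DPOTrainer
from trl.experimental.judges import PairRMJudge
from trl.experimental.winrate_callback import WinRateCallback

# The trainer's evaluation dataset must contain a "prompt" column.
trainer = DPOTrainer(...)
# Pairwise judge based on PairRM (requires the llm-blender package).
judge = PairRMJudge()
win_rate_callback = WinRateCallback(judge=judge, trainer=trainer)
# Once attached, the callback logs "eval_win_rate" at each evaluation.
trainer.add_callback(win_rate_callback)
```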

**Parameters:**

judge ([experimental.judges.BasePairwiseJudge](/docs/trl/v0.26.1/en/judges#trl.BasePairwiseJudge)) : The judge to use for comparing completions.

trainer (`Trainer`) : Trainer to which the callback will be attached. The trainer's evaluation dataset must include a `"prompt"` column containing the prompts used to generate completions. If the trainer has a reference model (through its `ref_model` attribute), the callback uses that model to generate the reference completions; otherwise, it uses the initial model.

generation_config ([GenerationConfig](https://huggingface.co/docs/transformers/v5.0.0rc1/en/main_classes/text_generation#transformers.GenerationConfig), *optional*) : The generation config to use for generating completions.

num_prompts (`int`, *optional*) : The number of prompts to generate completions for. If not provided, defaults to the number of examples in the evaluation dataset.

shuffle_order (`bool`, *optional*, defaults to `True`) : Whether to shuffle the order of the completions before judging.

use_soft_judge (`bool`, *optional*, defaults to `False`) : Whether to use a soft judge, which returns the probability (between 0 and 1) that the first completion wins over the second, instead of the index of the winning completion.
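The reference-selection rule described for `trainer` can be written as a small helper. This is a sketch only. `pick_reference_model` is hypothetical. `SimpleNamespace` objects with string fields stand in for real trainers and models. The helper is assumed to run before the first training step, so the fallback is still the initial model.

```python
from types import SimpleNamespace


def pick_reference_model(trainer):
    # Hypothetical helper: prefer an explicit reference model, else fall back
    # to the trainer's own model (still the initial model before training).
    ref_model = getattr(trainer, "ref_model", None)
    return ref_model if ref_model is not None else trainer.model


# Stand-in objects; real trainers hold model instances, not strings.
dpo_like = SimpleNamespace(model="policy", ref_model="frozen_ref")
no_ref = SimpleNamespace(model="policy")
ref_is_none = SimpleNamespace(model="policy", ref_model=None)

print(pick_reference_model(dpo_like))     # frozen_ref
print(pick_reference_model(no_ref))       # policy
print(pick_reference_model(ref_is_none))  # policy
```

Checking `is not None` rather than relying on truthiness avoids surprises with objects that define `__len__` or `__bool__`.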
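Shuffling, as controlled by `shuffle_order`, matters because pairwise judges can show position bias: they may favor whichever completion is shown first. The sketch below makes this concrete with a deliberately biased toy judge. `judge_with_shuffle` and `first_position_judge` are hypothetical. The wrapper illustrates only the idea of randomly swapping each pair before judging and mapping the verdict back. It does not describe where TRL performs the swap.

```python
import random


def judge_with_shuffle(judge, prompts, pairs, seed=0):
    """Randomly swap each pair before judging, then map indices back (hypothetical)."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in pairs]
    shown = [[b, a] if flip else [a, b] for (a, b), flip in zip(pairs, flips)]
    raw = judge(prompts, shown)
    # If a pair was swapped, index 0 in the shown order is index 1 in the original order.
    return [1 - idx if flip else idx for idx, flip in zip(raw, flips)]


def first_position_judge(prompts, pairs):
    # Deliberately biased toy judge: always prefers the completion shown first.
    return [0 for _ in pairs]


prompts = [f"prompt {i}" for i in range(1000)]
pairs = [["reference", "trained"] for _ in prompts]  # reference first, trained second

unshuffled = first_position_judge(prompts, pairs)
shuffled = judge_with_shuffle(first_position_judge, prompts, pairs)

print(sum(unshuffled) / len(unshuffled))  # 0.0: the trained model never "wins"
print(sum(shuffled) / len(shuffled))      # close to 0.5: the bias averages out
```

Without shuffling, the biased judge reports a win rate of 0 regardless of quality. With shuffling, its positional preference turns into noise around 0.5 instead of a systematic skew.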
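With `use_soft_judge=True`, the judge returns a probability rather than a winner index. The sketch below shows one way such scores could be summarized, under stated assumptions. Each score is taken to be the probability that the first (reference) completion wins, so the trained model's win probability is one minus the score. `summarize_soft_scores` is hypothetical. How TRL itself aggregates or logs soft scores beyond `"eval_win_rate"` is not specified here.

```python
def summarize_soft_scores(first_win_probs):
    """Return (hard win rate, mean win probability) for the trained model (hypothetical)."""
    # Assumption: pairs are ordered (reference, trained) and each score is
    # P(first completion wins), so the trained model's probability is 1 - score.
    trained_probs = [1.0 - p for p in first_win_probs]
    # Hard win rate: count a win when the trained model is strictly favored
    # (a tie at exactly 0.5 counts as a loss here, an arbitrary choice).
    hard_win_rate = sum(p > 0.5 for p in trained_probs) / len(trained_probs)
    # Soft summary: the average probability that the trained model wins.
    mean_win_prob = sum(trained_probs) / len(trained_probs)
    return hard_win_rate, mean_win_prob


rate, mean_prob = summarize_soft_scores([0.2, 0.9, 0.4, 0.3])
print(rate)                 # 0.75
print(round(mean_prob, 2))  # 0.55
```

The mean probability keeps information that the hard rate discards: two runs with the same win rate can differ in how confidently the judge prefers the trained model.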