RLAIF/dpo_answer_openorca_base_nathan_2e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 8
RLAIF/dpo_answer_openorca_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 8
RLAIF/dpo_answer_openorca_angel_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 45.9k • 8
RLAIF/dpo_answer_openorca_angel_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 8
RLAIF/dpo_answer_openorca_angel_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 42.4k • 8
RLAIF/dpo_answer_openorca_openorca_argilla_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• Updated • 44.1k • 8
RLAIF/dpo_answer_openorca_skywork_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 38.8k • 8
RLAIF/dpo_answer_openorca_baseline_mix_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 8
RLAIF/dpo_answer_openorca_openorca_argilla_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 60k • 8
RLAIF/dpo_answer_openorca_skywork_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 51.2k • 8
RLAIF/dpo_answer_openorca_helpsteer3_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 40.6k • 8
RLAIF/dpo_answer_openorca_ppe_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 37.1k • 8
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation
• Updated • 141k • 7
RLAIF/dpo_answer_openorca_ultrafeedback_s3_lr1e6_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• Updated • 65.3k • 7
RLAIF/dpo_answer_openorca_ultrafeedback_s100_lr1e6_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• Updated • 65.3k • 8
RLAIF/dpo_answer_openorca_ultrafeedback_s336_lr1e5_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• Updated • 58.2k • 8
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B_with_gold_labels_kl_estimation
• Updated • 152k • 7
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_14B_with_gold_labels_kl_estimation
• Updated • 152k • 8
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• Updated • 152k • 7
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 152k • 7
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B_with_gold_labels_kl_estimation
• Updated • 152k • 8
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 152k • 7
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• Updated • 152k • 8
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation
• Updated • 152k • 8
RLAIF/dpo_uf_rejudged_mixed_openorca_with_gold_labels_kl_estimation
• Updated • 152k • 7
RLAIF/dpo_answer_2e-6_openorca_prompts_responses_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• Updated • 86.5k • 7
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_estimation
• Updated • 65.6k • 7
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_est
• Updated • 65.6k • 7
RLAIF/dpo_answer_offtheshelf_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• Updated • 49.4k • 8
RLAIF/dpo_answer_ultrafeedback_filtered_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• Updated • 49.4k • 7