Flow Judge v0.1 held-out test datasets
This collection contains held-out splits for testing Flow-Judge-v0.1.
  flowaicom/Flow-Judge-v0.1-binary-heldout • Viewer • Updated Sep 18, 2024 • 316 • 18
  flowaicom/Flow-Judge-v0.1-3-likert-heldout • Viewer • Updated Sep 18, 2024 • 300 • 15
  flowaicom/Flow-Judge-v0.1-5-likert-heldout • Viewer • Updated Sep 18, 2024 • 274 • 12
Flow-Judge-v0.1
Flow-Judge-v0.1 models
  flowaicom/Flow-Judge-v0.1 • Text Generation • 4B • Updated Oct 7, 2024 • 350 • 69
  flowaicom/Flow-Judge-v0.1-AWQ • Text Generation • 4B • Updated Oct 9, 2024 • 106 • 6
  flowaicom/Flow-Judge-v0.1-GGUF • Text Generation • 4B • Updated Sep 18, 2024 • 35 • 10
Flow-Judge-v0.1 out-of-domain evaluation datasets
This collection contains out-of-domain datasets used to evaluate the generalization capabilities of Flow-Judge-v0.1.
  flowaicom/Feedback-Bench • Viewer • Updated Sep 14, 2024 • 1k • 48
  flowaicom/HaluEval • Viewer • Updated Sep 14, 2024 • 10k • 146
  flowaicom/PubMedQA • Viewer • Updated Sep 14, 2024 • 1k • 17 • 1
  flowaicom/covid_qa • Viewer • Updated Sep 14, 2024 • 1k • 10