Commit History
Upload from GitHub Actions: improve norwegian fix 6f0e312 verified
Upload from GitHub Actions: fix norwegian 0cbac6c verified
Upload from GitHub Actions: Merge pull request #22 from datenlabor-bmz/dev 2cdada4 verified
Upload from GitHub Actions: Add auto-translated datasets 68a93b5 verified
Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17 a0d1624 verified
Upload from GitHub Actions: Add auto-translated datasets c790fdb verified
Upload from GitHub Actions: updated and cleaned up scripts for new eval runs 963cb78 verified
Upload from GitHub Actions: Add Todos for using existing machine-translated datasets rather than our own ones 56adaa2 verified
Upload from GitHub Actions: updated translation functions 8f5ce26 verified
Upload from GitHub Actions: updated frontend and backend to fix bugs 4e8cb1a verified
Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev 7c06aef verified
Upload from GitHub Actions: TruthfulQA translation WIP fd102e9 verified
Upload from GitHub Actions: Get more results, compute average based on all tasks 98c6811 verified
Upload from GitHub Actions: Translate MMLU and evaluate 4c5c136 verified
Upload from GitHub Actions: Correlation plot b0aa389 verified
Upload from GitHub Actions: Evaluate on autotranslated GSM dataset f3a09a2 verified
Upload from GitHub Actions: Add math benchmarks 549360a verified
Upload from GitHub Actions: Use FLORES+ via Huggingface 913253a verified
Upload from GitHub Actions: Fix vibecoding 75010c2 verified
Pass through kwargs 5fa433f
David Pomerenke committed on
Fix dataset loading c990cb9
David Pomerenke committed on
Fix import paths c567aee
David Pomerenke committed on
added download function and edited INFO f529b7b
Only run tasks for which there is no result yet 2f9dee1
David Pomerenke committed on
Run on 40 languages, additional models 260c1a3
David Pomerenke committed on
Move functions for sharing them 55406ba
David Pomerenke committed on
Implement MMLU task a683732
David Pomerenke committed on
MMLU data loader for 3 parallel datasets 47170a5
David Pomerenke committed on
Analyze MMLU datasets 031925d
David Pomerenke committed on
Refactor eval code into files da6e1bc
David Pomerenke committed on