openai/gsm8k
Benchmark
A collection of reasoning tasks to benchmark model abilities
Note: For each example, it adds 8 variations (English-language queries).
Note: Multiple languages; a small "train" split and 250 items in the "test" split.
Note: 23 (hard) logical/mathematical tasks.
Note: Joint task: retain the relevant information from a text and perform a few mathematical operations to reach the result.
Note (for NeuroHike): modify it to remove the multiple-choice answer format.
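A minimal sketch of that modification, assuming each item stores the question, a list of choices, and the index of the correct one, and that any options listed inline in the question follow an "Options:" line. The `MultipleChoiceItem` type and the `to_open_ended` helper are hypothetical and are not part of any dataset's tooling. The sketch drops the options and uses the text of the correct choice as the free-form target answer.

```python
import re
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    # Hypothetical record layout; real datasets may name or store these fields differently.
    question: str       # may end with an inline "Options:" block (assumed layout)
    choices: list[str]  # option texts, without "(A)"-style labels
    answer_index: int   # position of the correct option in `choices`


# Matches an "Options:" line and everything after it, up to the end of the string.
OPTIONS_BLOCK = re.compile(r"\n\s*Options:.*\Z", re.DOTALL)


def to_open_ended(item: MultipleChoiceItem) -> dict[str, str]:
    """Strip the inline options and keep the correct choice's text as the answer."""
    question = OPTIONS_BLOCK.sub("", item.question).strip()
    answer = item.choices[item.answer_index].strip()
    return {"question": question, "answer": answer}


if __name__ == "__main__":
    item = MultipleChoiceItem(
        question="Which number is largest?\nOptions:\n(A) 3\n(B) 7\n(C) 5",
        choices=["3", "7", "5"],
        answer_index=1,
    )
    print(to_open_ended(item))
```

Without the options in the prompt, a model can no longer score by picking a letter, so grading has to compare its free-form answer with the target text, for example by exact match after normalization.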