🧠 reasoning datasets - a saracandu Collection

updated about 1 hour ago

A collection of reasoning tasks to benchmark model abilities

openai/gsm8k

Benchmark • Updated Dec 20, 2025 • 17.6k • 588k • 1.19k
qintongli/GSM-Plus

Viewer • Updated Jul 7, 2024 • 13k • 1.58k • 17

Note: adds 8 variations of each example; English queries only
juletxara/mgsm

Viewer • Updated Oct 9, 2025 • 2.84k • 10.1k • 40

Note: multiple languages; small "train" split, 250 items in the "test" split
maveriq/bigbenchhard

Viewer • Updated Sep 29, 2023 • 6.51k • 1.01k • 40

Note: 23 (hard) logical/mathematical tasks
ucinlp/drop

Viewer • Updated Jan 17, 2024 • 86.9k • 3.76k • 66

Note: joint task: retrieve the correct information from a text, then perform a couple of mathematical operations to reach the result
deepmind/aqua_rat

Viewer • Updated Jan 9, 2024 • 196k • 4.99k • 72

Note: (for NeuroHike) modify it to remove the "multiple choice" answer format
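One possible way to do the conversion mentioned in the note is sketched below. It is not the collection author's method. It assumes each record carries `question`, `options`, and `correct` fields, with options written like `"B)40"`; those field names and that format should be checked against the loaded dataset. The helper name `to_open_ended` and the sample record are hypothetical.

```python
import re
from typing import Dict

# Assumed option format: a capital letter, then ")", then the answer text, e.g. "B)40".
_OPTION_RE = re.compile(r"^\s*([A-E])\s*\)\s*(.*)$", re.DOTALL)


def to_open_ended(example: Dict[str, object]) -> Dict[str, str]:
    """Turn one multiple-choice record into a plain question/answer pair.

    Assumes the record has "question", "options" and "correct" fields;
    verify these names against the dataset actually in use.
    """
    options: Dict[str, str] = {}
    for raw in example["options"]:
        match = _OPTION_RE.match(raw)
        if match is None:
            raise ValueError(f"unrecognised option format: {raw!r}")
        # Map the letter label to the bare answer text.
        options[match.group(1)] = match.group(2).strip()

    label = str(example["correct"]).strip()
    if label not in options:
        raise ValueError(f"correct label {label!r} not among options")

    # Keep only the question and the text of the correct option.
    return {"question": str(example["question"]).strip(), "answer": options[label]}


# Made-up record for illustration only (not taken from the dataset).
sample = {
    "question": "A train travels 60 km in 1.5 hours. What is its average speed in km/h?",
    "options": ["A)30", "B)40", "C)45", "D)60", "E)90"],
    "correct": "B",
}
converted = to_open_ended(sample)
print(converted)
```

Questions whose wording depends on the listed options (for example "which of the following") would still need manual filtering or rewriting; this sketch only strips the answer choices.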
HuggingFaceH4/MATH-500

Viewer • Updated Dec 15, 2025 • 500 • 111k • 287
yale-nlp/FOLIO

Viewer • Updated Dec 21, 2023 • 1.2k • 940 • 66
saracandu/implications

Viewer • Updated Dec 10, 2025 • 19.9k • 12
HuggingFaceH4/aime_2024

Viewer • Updated Jan 26, 2025 • 30 • 31.5k • 60
opencompass/AIME2025

Viewer • Updated Feb 25, 2025 • 30 • 8.42k • 52
MathArena/hmmt_feb_2025

Viewer • Updated 23 days ago • 30 • 11.7k • 8
renma/ProntoQA

Viewer • Updated May 22, 2024 • 500 • 281 • 9