| Model | Type | Parameters | Knowledge Understanding | Code Generation | Symbolic Reasoning | Hypothesis Generation | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude 4.5 Sonnet | Close | | 60.67 | 21.73 | 40.36 | 56.10 | 44.72 |
| Claude4-1-Opus | Close | | 60.87 | 25.32 | 38.69 | 29.47 | 38.58 |
| GPT-4o | Close | | 60.84 | 17.67 | 32.09 | 33.04 | 35.91 |
| GPT-5 | Close | | 74.05 | 29.21 | 39.91 | 45.67 | 47.21 |
| GPT-o3 | Close | | 76.05 | 25.26 | 38.14 | 34.14 | 43.40 |
| Gemini-2.5-Flash | Close | | 50.46 | 18.28 | 32.07 | 40.86 | 35.42 |
| Gemini-2.5-Pro | Close | | 59.34 | 24.77 | 34.96 | 50.73 | 42.45 |
| Grok-2-vision-1212 | Close | | 50.14 | 20.60 | 28.21 | 49.63 | 37.14 |
| Ling-flash-2.0 | Open | 100B | 53.39 | 25.60 | 37.98 | 50.29 | 41.81 |
| Seed1.6-vision | Close | | 65.78 | 21.49 | 39.24 | 45.00 | 42.88 |
| DeepSeek-R1 | Open | 685B | 45.17 | 0.06 | 20.00 | 49.73 | 28.74 |
| GLM-4.5V | Open | 106B | 52.78 | 3.24 | 13.43 | 42.23 | 27.92 |
| InternS1 | Open | 241B | 66.14 | 17.08 | 31.62 | 37.45 | 38.07 |
| Kimi-k2 | Open | 1040B | 62.49 | 20.86 | 38.59 | 42.28 | 41.06 |
| Llama 4 Maverick | Open | 400B | 57.22 | 18.26 | 38.97 | 38.31 | 38.19 |
| Qwen3-VL-235B-A22B | Open | 235B | 65.98 | 18.00 | 49.93 | 40.62 | 43.63 |
| Qwen3-Max | Open | 1000B | 63.14 | 43.97 | 41.04 | 42.12 | 47.57 |
| GPT-5.1 | Close | | 69.23 | 25.63 | 32.44 | 41.45 | 42.19 |
| Gemini-3-Pro | Close | | 66.06 | 29.57 | 45.19 | 61.51 | 50.58 |