SciEval-Leaderboard / Multimodal Model Scientific Capability.csv
Model,Type,Parameters,Sci.MM-Percep.,Sci.Img-Und.,Sci.MM-Reason.,Overall
Claude 4.5 Sonnet,Close,,57.87,43.64,56.11,52.54
Claude4-1-Opus,Close,,58.25,45.19,58.66,54.03
GPT-4o,Close,,52.78,25.93,57.97,45.56
GPT-5,Close,,59.94,42.44,61.46,54.61
GPT-o3,Close,,55.23,32.84,59.27,49.11
Gemini-2.5-Flash,Close,,55.98,38.20,57.22,50.47
Gemini-2.5-Pro,Close,,52.12,43.76,61.28,52.39
Grok-2-vision-1212,Close,,64.00,25.04,51.76,46.93
Seed1.6-vision,Close,,65.79,44.75,57.11,55.88
GLM-4.5V,Open,106B,59.10,38.57,51.04,49.57
InternS1,Open,241B,60.89,45.73,56.47,54.36
Llama 4 Maverick,Open,400B,56.74,36.83,55.39,49.65
Qwen3-VL-235B-A22B,Open,235B,72.29,38.35,50.83,53.82
Qwen3-Max,Open,1000B,24.51,20.40,49.86,31.59
GPT-5.1,Close,,54.10,33.05,58.73,48.63
Gemini-3-Pro,Close,,66.54,55.62,66.49,62.88