| Method | MathVista | MMStar | Math-V | ChartQA | DynaMath | HallBench | MathVerse | MME | Average |
|---|---|---|---|---|---|---|---|---|---|
| Closed-Source Model | | | | | | | | | |
| GPT-4o | 63.8 | 63.9 | 30.3 | 85.7 | 63.7 | 55.0 | 39.4 | 2329 | 64.5 |
| Claude-3.5 Sonnet | 67.7 | 62.2 | - | 90.8 | 64.8 | 55.0 | - | 1920 | - |
| Open-Source Model | | | | | | | | | |
| Cambrian-1-8B | 49.0 | - | - | 73.3 | - | - | - | - | - |
| MM-1.5-7B | 47.6 | - | - | 78.6 | - | - | - | 1861 | - |
| Idefics3-LLaMA3-8B | 58.4 | 55.9 | - | 74.8 | - | - | - | 1937 | - |
| InternVL2-8B | 58.3 | 61.5 | - | 83.3 | 39.7 | - | - | 2210 | - |
| MiniCPM-V-2.6-8B | 60.6 | 57.5 | - | - | - | 48.1 | - | 2348 | - |
| DeepSeek-VL2-MOE-4.5B | 62.8 | 61.3 | - | 86.0 | - | - | - | 2253 | - |
| Reasoning Model | | | | | | | | | |
| LLaVA-CoT-11B | 54.8 | 57.6 | - | - | - | 47.8 | - | - | - |
| LLaVA-Reasoner-8B | 50.6 | 54.0 | - | 83.0 | - | - | - | - | - |
| Insight-V-8B | 49.8 | 57.4 | - | 77.4 | - | - | - | 2069 | - |
| Mulberry-7B | 63.1 | 61.3 | - | 83.9 | 45.1 | 54.1 | - | 2396 | - |
| LlamaV-o1-11B | 54.4 | 59.4 | - | - | - | 63.5 | - | - | - |
| Qwen2-VL-2B | 43.0 | 48.0 | 12.4 | 73.5 | 24.9 | 41.7 | 19.7 | 1872 | 41.2 |
| Qwen2-VL-2B-GRPO | 41.4 | 46.2 | 16.0 | 72.5 | 24.2 | 42.2 | 19.9 | 1930 | 41.4 |
| R1-VL-2B | 52.1 | 49.8 | 17.1 | 75.2 | 29.4 | 44.0 | 26.2 | 2048 | 45.8 |
| Qwen2-VL-7B | 58.2 | 60.7 | 16.3 | 83.0 | 42.1 | 50.6 | 32.5 | 2327 | 53.3 |
| Qwen2-VL-7B-GRPO | 55.1 | 59.8 | 19.1 | 81.3 | 33.9 | 48.5 | 30.9 | 2335 | 51.4 |
| R1-VL-7B | 63.5 | 60.0 | 24.7 | 83.9 | 45.2 | 54.7 | 40.0 | 2376 | 57.1 |
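Table 1: Main experimental results. A dash (-) marks a result that is not reported. MME is a raw score; the other benchmark columns are percentages.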
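The table does not say how the Average column is derived. The sketch below tests one plausible convention, which is an assumption and not something the source states: MME is rescaled to 0-100 by dividing by an assumed full score of 2800, the result is averaged with the other seven benchmarks, and the mean is truncated (not rounded) to one decimal. The names `MME_MAX`, `average_score`, and `rows` are hypothetical helpers introduced for illustration only.

```python
import math

# Assumption: MME full score is 2800 (perception 2000 + cognition 800).
MME_MAX = 2800.0


def average_score(percent_scores, mme_score):
    """Mean of the seven percentage benchmarks plus MME rescaled to 0-100.

    The result is truncated to one decimal place, which is an assumed
    convention inferred from the table, not one stated in the source.
    """
    scaled = list(percent_scores) + [mme_score / MME_MAX * 100.0]
    mean = sum(scaled) / len(scaled)
    return math.floor(mean * 10) / 10


# Rows copied from Table 1: (seven percentage scores, MME, reported Average).
rows = {
    "Qwen2-VL-2B": ([43.0, 48.0, 12.4, 73.5, 24.9, 41.7, 19.7], 1872, 41.2),
    "Qwen2-VL-2B-GRPO": ([41.4, 46.2, 16.0, 72.5, 24.2, 42.2, 19.9], 1930, 41.4),
    "R1-VL-2B": ([52.1, 49.8, 17.1, 75.2, 29.4, 44.0, 26.2], 2048, 45.8),
    "Qwen2-VL-7B": ([58.2, 60.7, 16.3, 83.0, 42.1, 50.6, 32.5], 2327, 53.3),
    "Qwen2-VL-7B-GRPO": ([55.1, 59.8, 19.1, 81.3, 33.9, 48.5, 30.9], 2335, 51.4),
    "R1-VL-7B": ([63.5, 60.0, 24.7, 83.9, 45.2, 54.7, 40.0], 2376, 57.1),
}

for name, (scores, mme, reported) in rows.items():
    computed = average_score(scores, mme)
    print(f"{name}: computed={computed}, reported={reported}")
```

Under these assumptions the computed values agree with the reported Average for the six Qwen2-VL and R1-VL rows. The GPT-4o row does not follow from the same formula, so this should be read as a guess about the convention, not a confirmed definition.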