Models Evaluated
19
Full results from the WebVR benchmark. Scores are reported for Global Aesthetics (GA), Navigation and Footer (NF), Section-Specific Layouts (SSL), Interaction and Motion (IM), and Overall.
Best Overall: Kimi-K2.5 (79.14)
Hardest Dimension: IM (Interaction and Motion)
Best Open-source: Kimi-K2.5 (79.14)
Top three by Overall: #1 Kimi-K2.5 (79.14), #2 Claude-Sonnet-4.6 (78.49), #3 GPT-5.2-Thinking (77.93)
Highest score per dimension: GA 89.76 (GPT-5.2-Thinking), NF 89.37 (Claude-Sonnet-4.6), SSL 79.26 (Kimi-K2.5), IM 60.10 (Kimi-K2.5)
Models are sorted by Overall score, highest first.
| Rank | Model | Type | GA | NF | SSL | IM | Overall |
|---|---|---|---|---|---|---|---|
| 1 | Kimi-K2.5 | Open-source | 87.44 | 89.21 | 79.26 | 60.10 | 79.14 |
| 2 | Claude-Sonnet-4.6 | Closed-source | 87.16 | 89.37 | 78.87 | 59.06 | 78.49 |
| 3 | GPT-5.2-Thinking | Closed-source | 89.76 | 89.08 | 77.27 | 59.97 | 77.93 |
| 4 | Claude-Opus-4.6 | Closed-source | 87.66 | 87.98 | 78.60 | 54.33 | 77.33 |
| 5 | Gemini-3.1-Pro-Preview | Closed-source | 88.30 | 87.29 | 77.09 | 56.50 | 76.69 |
| 6 | Seed-2.0-Pro | Closed-source | 82.88 | 86.27 | 73.35 | 45.88 | 71.88 |
| 7 | Gemini-3.0-Flash | Closed-source | 84.05 | 85.19 | 67.74 | 48.43 | 69.49 |
| 8 | Gemini-3.0-Pro | Closed-source | 80.84 | 81.79 | 66.31 | 46.86 | 67.32 |
| 9 | Gemini-2.5-Pro | Closed-source | 78.59 | 80.17 | 59.56 | 48.66 | 63.09 |
| 10 | Seed-1.8 | Closed-source | 75.06 | 77.95 | 62.21 | 36.33 | 61.98 |
| 11 | Qwen3.5-397B-A17B | Open-source | 80.46 | 76.62 | 58.81 | 41.96 | 61.33 |
| 12 | Claude-Sonnet-3.7 | Closed-source | 76.38 | 80.54 | 59.26 | 37.38 | 61.21 |
| 13 | Gemini-2.5-Flash | Closed-source | 71.95 | 70.90 | 51.52 | 39.76 | 55.62 |
| 14 | Qwen3-VL-235B-A22B-Thinking | Open-source | 61.20 | 68.04 | 43.11 | 29.30 | 46.80 |
| 15 | GPT-4.1 | Closed-source | 61.91 | 64.42 | 42.70 | 26.71 | 45.85 |
| 16 | Qwen3-VL-235B-A22B-Instruct | Open-source | 51.06 | 52.65 | 40.09 | 22.12 | 40.71 |
| 17 | Qwen3-VL-30B-A3B-Thinking | Open-source | 53.38 | 60.47 | 33.49 | 20.22 | 37.69 |
| 18 | Qwen3-VL-30B-A3B-Instruct | Open-source | 33.33 | 34.87 | 17.71 | 12.67 | 21.44 |
| 19 | GLM-4.6V | Open-source | 22.78 | 15.42 | 7.17 | 14.35 | 11.42 |
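The summary figures can be recomputed from the table itself. The sketch below is illustrative only: it uses the top five rows copied from the table, and the helper names (`best_per_dimension`, `hardest_dimension`, `is_sorted_by_overall`) are hypothetical. It also assumes that "hardest" means the dimension with the lowest mean score across the listed models, which the page does not state explicitly.

```python
# Top five rows copied from the table above: (model, GA, NF, SSL, IM, Overall).
ROWS = [
    ("Kimi-K2.5", 87.44, 89.21, 79.26, 60.10, 79.14),
    ("Claude-Sonnet-4.6", 87.16, 89.37, 78.87, 59.06, 78.49),
    ("GPT-5.2-Thinking", 89.76, 89.08, 77.27, 59.97, 77.93),
    ("Claude-Opus-4.6", 87.66, 87.98, 78.60, 54.33, 77.33),
    ("Gemini-3.1-Pro-Preview", 88.30, 87.29, 77.09, 56.50, 76.69),
]
DIMENSIONS = ("GA", "NF", "SSL", "IM")  # columns 1..4 of each row


def best_per_dimension(rows):
    """Return {dimension: (model, score)} for the highest score in each column."""
    best = {}
    for col, dim in enumerate(DIMENSIONS, start=1):
        top = max(rows, key=lambda row: row[col])
        best[dim] = (top[0], top[col])
    return best


def hardest_dimension(rows):
    """Return the dimension with the lowest mean score (assumed definition)."""
    means = {
        dim: sum(row[col] for row in rows) / len(rows)
        for col, dim in enumerate(DIMENSIONS, start=1)
    }
    return min(means, key=means.get)


def is_sorted_by_overall(rows):
    """Check that rows appear in descending order of the Overall column."""
    overall = [row[-1] for row in rows]
    return overall == sorted(overall, reverse=True)


if __name__ == "__main__":
    for dim, (model, score) in best_per_dimension(ROWS).items():
        print(f"{dim}: {score:.2f} ({model})")
    print("Hardest dimension:", hardest_dimension(ROWS))
    print("Sorted by Overall:", is_sorted_by_overall(ROWS))
```

Extending `ROWS` to all 19 models leaves the functions unchanged. Overall is read from the table rather than recomputed, since the page does not say how it is derived from the four dimension scores.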
This view mirrors the grouping in the paper and makes it easier to compare model families within open-source and closed-source settings.
| Group | Model | GA | NF | SSL | IM | Overall |
|---|---|---|---|---|---|---|
| Open-source Models | | | | | | |
| Open-source | GLM-4.6V | 22.78 | 15.42 | 7.17 | 14.35 | 11.42 |
| Open-source | Qwen3-VL-30B-A3B-Instruct | 33.33 | 34.87 | 17.71 | 12.67 | 21.44 |
| Open-source | Qwen3-VL-30B-A3B-Thinking | 53.38 | 60.47 | 33.49 | 20.22 | 37.69 |
| Open-source | Qwen3-VL-235B-A22B-Instruct | 51.06 | 52.65 | 40.09 | 22.12 | 40.71 |
| Open-source | Qwen3-VL-235B-A22B-Thinking | 61.20 | 68.04 | 43.11 | 29.30 | 46.80 |
| Open-source | Qwen3.5-397B-A17B | 80.46 | 76.62 | 58.81 | 41.96 | 61.33 |
| Open-source | Kimi-K2.5 | 87.44 | 89.21 | 79.26 | 60.10 | 79.14 |
| Closed-source Models | | | | | | |
| Closed-source | GPT-4.1 | 61.91 | 64.42 | 42.70 | 26.71 | 45.85 |
| Closed-source | GPT-5.2-Thinking | 89.76 | 89.08 | 77.27 | 59.97 | 77.93 |
| Closed-source | Gemini-2.5-Flash | 71.95 | 70.90 | 51.52 | 39.76 | 55.62 |
| Closed-source | Gemini-2.5-Pro | 78.59 | 80.17 | 59.56 | 48.66 | 63.09 |
| Closed-source | Gemini-3.0-Flash | 84.05 | 85.19 | 67.74 | 48.43 | 69.49 |
| Closed-source | Gemini-3.0-Pro | 80.84 | 81.79 | 66.31 | 46.86 | 67.32 |
| Closed-source | Gemini-3.1-Pro-Preview | 88.30 | 87.29 | 77.09 | 56.50 | 76.69 |
| Closed-source | Claude-Sonnet-3.7 | 76.38 | 80.54 | 59.26 | 37.38 | 61.21 |
| Closed-source | Claude-Sonnet-4.6 | 87.16 | 89.37 | 78.87 | 59.06 | 78.49 |
| Closed-source | Claude-Opus-4.6 | 87.66 | 87.98 | 78.60 | 54.33 | 77.33 |
| Closed-source | Seed-1.8 | 75.06 | 77.95 | 62.21 | 36.33 | 61.98 |
| Closed-source | Seed-2.0-Pro | 82.88 | 86.27 | 73.35 | 45.88 | 71.88 |
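As a rough illustration of how the grouped view can be summarized, the sketch below counts the models in each group and finds the best Overall score per group. The `(model, group, Overall)` triples are copied from the grouped table; the function name `summarize_by_group` and the shape of its return value are assumptions made for this example.

```python
from collections import defaultdict

# (model, group, Overall) copied from the grouped table above.
RESULTS = [
    ("GLM-4.6V", "Open-source", 11.42),
    ("Qwen3-VL-30B-A3B-Instruct", "Open-source", 21.44),
    ("Qwen3-VL-30B-A3B-Thinking", "Open-source", 37.69),
    ("Qwen3-VL-235B-A22B-Instruct", "Open-source", 40.71),
    ("Qwen3-VL-235B-A22B-Thinking", "Open-source", 46.80),
    ("Qwen3.5-397B-A17B", "Open-source", 61.33),
    ("Kimi-K2.5", "Open-source", 79.14),
    ("GPT-4.1", "Closed-source", 45.85),
    ("GPT-5.2-Thinking", "Closed-source", 77.93),
    ("Gemini-2.5-Flash", "Closed-source", 55.62),
    ("Gemini-2.5-Pro", "Closed-source", 63.09),
    ("Gemini-3.0-Flash", "Closed-source", 69.49),
    ("Gemini-3.0-Pro", "Closed-source", 67.32),
    ("Gemini-3.1-Pro-Preview", "Closed-source", 76.69),
    ("Claude-Sonnet-3.7", "Closed-source", 61.21),
    ("Claude-Sonnet-4.6", "Closed-source", 78.49),
    ("Claude-Opus-4.6", "Closed-source", 77.33),
    ("Seed-1.8", "Closed-source", 61.98),
    ("Seed-2.0-Pro", "Closed-source", 71.88),
]


def summarize_by_group(results):
    """Return per-group model count and the best model by Overall score."""
    grouped = defaultdict(list)
    for model, group, overall in results:
        grouped[group].append((model, overall))

    summary = {}
    for group, entries in grouped.items():
        best_model, best_overall = max(entries, key=lambda entry: entry[1])
        summary[group] = {
            "count": len(entries),
            "best_model": best_model,
            "best_overall": best_overall,
        }
    return summary


if __name__ == "__main__":
    for group, stats in summarize_by_group(RESULTS).items():
        print(
            f"{group}: {stats['count']} models, "
            f"best {stats['best_model']} ({stats['best_overall']:.2f})"
        )
```

Grouping on the `Group` column rather than on model-name prefixes keeps the sketch independent of naming conventions. A per-family breakdown (for example Qwen3-VL or Gemini) would need an extra, explicitly defined mapping.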