Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published 21 days ago • 63
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7, 2025 • 10
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 96
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Paper • 2510.19600 • Published Oct 22, 2025 • 68
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 176
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7, 2025 • 141
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9, 2025 • 26
AutoPR: Let's Automate Your Academic Promotion! Paper • 2510.09558 • Published Oct 10, 2025 • 51
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions Paper • 2510.05318 • Published Oct 6, 2025 • 21
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 19
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29, 2025 • 18
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 118
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20, 2025 • 85
AI4Research: A Survey of Artificial Intelligence for Scientific Research Paper • 2507.01903 • Published Jul 2, 2025 • 5
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16, 2025 • 41
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Paper • 2502.13092 • Published Feb 18, 2025 • 13
ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model Paper • 2502.03325 • Published Feb 5, 2025 • 1
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 287