You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories • Paper • 2605.21468 • Published May 20 • 51
G-Zero: Self-Play for Open-Ended Generation from Zero Data • Paper • 2605.09959 • Published May 11 • 17
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling • Paper • 2605.08083 • Published May 8 • 70
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization • Paper • 2604.09574 • Published Feb 24 • 30
Training Data Efficiency in Multimodal Process Reward Models • Paper • 2602.04145 • Published Feb 4 • 80
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing • Paper • 2602.03845 • Published Feb 3 • 27
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning • Paper • 2512.15687 • Published Dec 17, 2025 • 22
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing • Paper • 2512.10284 • Published Dec 11, 2025 • 26