Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process Paper • 2512.23988 • Published Dec 30, 2025 • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time Paper • 2512.25075 • Published Dec 31, 2025 • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself Paper • 2512.24176 • Published Dec 30, 2025 • 8
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 52
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction Paper • 2601.00796 • Published Jan 2 • 32
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning Paper • 2512.24146 • Published Dec 30, 2025 • 14
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Paper • 2601.03044 • Published Jan 6 • 28
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published Jan 6 • 16
AgentOCR: Reimagining Agent History via Optical Self-Compression Paper • 2601.04786 • Published Jan 8 • 30
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published Jan 12 • 33
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models Paper • 2601.07351 • Published Jan 12 • 26
Dr. Zero: Self-Evolving Search Agents without Training Data Paper • 2601.07055 • Published Jan 11 • 22
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published Jan 13 • 53
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published Jan 12 • 24
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published Jan 13 • 19
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 54
Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments Paper • 2601.01075 • Published Jan 3 • 6
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 149
Alterbute: Editing Intrinsic Attributes of Objects in Images Paper • 2601.10714 • Published Jan 15 • 31
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published Jan 14 • 33
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published Jan 15 • 39
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published Jan 16 • 9
More Images, More Problems? A Controlled Analysis of VLM Failure Modes Paper • 2601.07812 • Published Jan 12 • 6
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published Jan 20 • 57
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models Paper • 2601.15224 • Published Jan 21 • 12
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published Jan 22 • 9
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences Paper • 2601.07251 • Published Jan 12 • 11
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices Paper • 2601.21579 • Published Jan 29 • 6
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 109
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published Jan 29 • 25
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis Paper • 2602.03139 • Published Feb 3 • 45
Protein Autoregressive Modeling via Multiscale Structure Generation Paper • 2602.04883 • Published Feb 4 • 3
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published Feb 9 • 283
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
Col-Bandit: Zero-Shot Query-Time Pruning for Late-Interaction Retrieval Paper • 2602.02827 • Published Feb 2 • 3
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context Paper • 2602.12108 • Published Feb 12 • 13
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics Paper • 2602.12617 • Published Feb 13 • 20
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks Paper • 2602.14689 • Published Feb 16 • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 10
Visual Persuasion: What Influences Decisions of Vision-Language Models? Paper • 2602.15278 • Published Feb 17 • 3
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems Paper • 2602.15382 • Published Feb 17 • 2
Causal-JEPA: Learning World Models through Object-Level Latent Interventions Paper • 2602.11389 • Published Feb 11 • 7
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality Paper • 2602.14080 • Published Feb 15 • 20
Multi-agent cooperation through in-context co-player inference Paper • 2602.16301 • Published Feb 18 • 24
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published Feb 19 • 12
CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing Paper • 2602.15823 • Published Feb 17 • 3
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 262
Spanning the Visual Analogy Space with a Weight Basis of LoRAs Paper • 2602.15727 • Published Feb 17 • 14
VLANeXt: Recipes for Building Strong VLA Models Paper • 2602.18532 • Published about 1 month ago • 52
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 26 days ago • 99
Test-Time Training with KV Binding Is Secretly Linear Attention Paper • 2602.21204 • Published 26 days ago • 30
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking Paper • 2602.21196 • Published 26 days ago • 5
The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum Paper • 2602.21185 • Published 26 days ago • 3
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Paper • 2602.21548 • Published 26 days ago • 46
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors Paper • 2602.21778 • Published 25 days ago • 14
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models Paper • 2602.18993 • Published 29 days ago • 4
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting Paper • 2602.20933 • Published 26 days ago • 4
VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale Paper • 2602.23361 • Published 24 days ago • 14
Causal Motion Diffusion Models for Autoregressive Motion Generation Paper • 2602.22594 • Published 25 days ago • 7
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning Paper • 2602.23258 • Published 24 days ago • 28
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published 23 days ago • 41
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Paper • 2602.23881 • Published 23 days ago • 18
How to Take a Memorable Picture? Empowering Users with Actionable Feedback Paper • 2602.21877 • Published 25 days ago • 14
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 19 days ago • 99
RealWonder: Real-Time Physical Action-Conditioned Video Generation Paper • 2603.05449 • Published 17 days ago • 12
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling Paper • 2603.04553 • Published 18 days ago • 3
Progressive Residual Warmup for Language Model Pretraining Paper • 2603.05369 • Published 17 days ago • 36
Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey Paper • 2603.04445 • Published 27 days ago • 4
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 13 days ago • 51
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 12 days ago • 11
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers Paper • 2603.10744 • Published 11 days ago • 7
Hindsight Credit Assignment for Long-Horizon LLM Agents Paper • 2603.08754 • Published 15 days ago • 5
HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios Paper • 2603.11975 • Published 10 days ago • 11
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 6 days ago • 142
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation Paper • 2603.15132 • Published 6 days ago • 33
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 7 days ago • 13
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published 3 days ago • 8