- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 251
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 118
Collections including paper arxiv:2510.13786
- Demystifying Reinforcement Learning in Agentic Reasoning
  Paper • 2510.11701 • Published • 33
- Self-Improving LLM Agents at Test-Time
  Paper • 2510.07841 • Published • 10
- Making Mathematical Reasoning Adaptive
  Paper • 2510.04617 • Published • 23
- DocReward: A Document Reward Model for Structuring and Stylizing
  Paper • 2510.11391 • Published • 27

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 233 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 33
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 42
- BitNet Distillation
  Paper • 2510.13998 • Published • 59
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 53

- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 514
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
  Paper • 2509.25541 • Published • 142
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 276
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 148

- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 514
- SpikingBrain Technical Report: Spiking Brain-inspired Large Models
  Paper • 2509.05276 • Published • 5
- Self-Adapting Language Models
  Paper • 2506.10943 • Published • 7
- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 33

- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
  Paper • 2504.05118 • Published • 26
- SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
  Paper • 2504.08600 • Published • 33
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
  Paper • 2504.11343 • Published • 20