- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 125
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

Collections
Collections including paper arxiv:2505.16410

- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 265
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 135
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 82
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 282

- Electron flow matching for generative reaction mechanism prediction obeying conservation laws
  Paper • 2502.12979 • Published
- oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning
  Paper • 2510.07731 • Published • 6
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
  Paper • 2509.23768 • Published • 49
- LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios
  Paper • 2508.17692 • Published • 2

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 233 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
  Paper • 2505.16410 • Published • 58
- dongguanting/Tool-Star-SFT-54K
  Viewer • Updated • 54k • 190 • 10
- dongguanting/Multi-Tool-RL-10K
  Viewer • Updated • 10k • 63 • 5
- dongguanting/Tool-Star-Qwen-7B
  Text Generation • 8B • Updated • 52 • 2

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 32
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 110
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 107