hsvgbkhgbv's Collections: LLM papers
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
• 2510.03222
• Published • 76
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published • 112
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 515
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper
• 2509.22576
• Published • 137
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
• 2509.08755
• Published • 56
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published • 92
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published • 108
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published • 8
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published • 22
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
• 2512.13607
• Published • 39
Paper
• 2512.16301
• Published • 108
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
• 2512.13874
• Published • 18
Recursive Language Models
Paper
• 2512.24601
• Published • 96
Token-Level LLM Collaboration via FusionRoute
Paper
• 2601.05106
• Published • 40
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
• 2601.09667
• Published • 92
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
• 2601.15165
• Published • 74
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
• 2601.13572
• Published • 27
Learning to Discover at Test Time
Paper
• 2601.16175
• Published • 45
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper
• 2601.16443
• Published • 18
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published • 43
Linear representations in language models can change dramatically over a conversation
Paper
• 2601.20834
• Published • 21
Self-Distillation Enables Continual Learning
Paper
• 2601.19897
• Published • 36
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper
• 2602.02488
• Published • 36
Self-Hinting Language Models Enhance Reinforcement Learning
Paper
• 2602.03143
• Published • 31
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published • 75
Multi-agent cooperation through in-context co-player inference
Paper
• 2602.16301
• Published • 24
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
Paper
• 2602.18071
• Published • 22
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper
• 2603.03276
• Published • 105
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Paper
• 2603.27771
• Published • 52
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
Paper
• 2604.02029
• Published • 151
Memory Intelligence Agent
Paper
• 2604.04503
• Published • 58
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
Paper
• 2604.04707
• Published • 203
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
Paper
• 2604.06132
• Published • 121
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper
• 2604.04323
• Published • 41
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Paper
• 2604.08377
• Published • 291
Qualixar OS: A Universal Operating System for AI Agent Orchestration
Paper
• 2604.06392
• Published • 19
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
Paper
• 2604.18564
• Published • 46
Heterogeneous Scientific Foundation Model Collaboration
Paper
• 2604.27351
• Published • 218
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
Paper
• 2604.27221
• Published • 38
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
Paper
• 2605.03042
• Published • 120