JustRL: Scaling a 1.5B LLM with a Simple RL Recipe • Paper • 2512.16649 • Published 8 days ago
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published Nov 17, 2025
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents • Paper • 2511.02734 • Published Nov 4, 2025
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents • Paper • 2402.09205 • Published Feb 14, 2024
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset • Paper • 2504.03612 • Published Apr 4, 2025