Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 106
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published Oct 2, 2024 • 27
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published Jun 5, 2025 • 28 • 3
MathMindsAGI/delethinkIter_R1DistillQwen1.5B_DEEPSCALER_BASELINE_20250812Baseline_seed_274631821 Updated Aug 26, 2025
MathMindsAGI/delethinkIter_R1DistillQwen1.5B_DEEPSCALER_LONG_20250812Long_seed_2746318213 Updated Aug 26, 2025