SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models Paper • 2511.08379 • Published Nov 11, 2025 • 4
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test Paper • 2503.01840 • Published Mar 3, 2025 • 6
HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam Paper • 2602.13964 • Published 18 days ago • 10
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 14 days ago • 16
Understanding Silent Data Corruption in LLM Training Paper • 2502.12340 • Published Feb 17, 2025 • 1
Silent Data Corruption by 10x Test Escapes Threatens Reliable Computing Paper • 2508.01786 • Published Aug 3, 2025 • 1
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published Jan 9 • 56
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27, 2024 • 24
Article Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance Dec 9, 2025 • 84
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 93
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement Paper • 2410.20285 • Published Oct 26, 2024 • 1
Steering Reasoning VLAs Collection Steering Reasoning VLA in robotics manipulation https://www.arxiv.org/abs/2510.16281 • 2 items • Updated 1 day ago • 1
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models Paper • 2304.06364 • Published Apr 13, 2023 • 3
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries Paper • 2409.12640 • Published Sep 19, 2024 • 3