PatchWorld: Gradient-Free Optimization of Executable World Models Paper • 2605.30880 • Published 30 days ago • 12
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 24 days ago • 44
SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning Paper • 2605.01489 • Published May 26 • 1
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published May 14 • 79
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published May 13 • 89
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification Paper • 2601.15808 • Published Jan 22 • 20
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 31
AutoGraph-R1 Collection Directly Optimizing Knowledge Graph Construction for RAG using Reinforcement Learning • 11 items • Updated Oct 24, 2025 • 2
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published Aug 1, 2025 • 96
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery Paper • 2505.13259 • Published May 19, 2025 • 1