Multiple LLM Agents Debate for Equitable Cultural Alignment Paper • 2505.24671 • Published Sep 1, 2025
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks Paper • 2604.20987 • Published Apr 22 • 21
Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization Paper • 2407.06129 • Published Jul 8, 2024 • 1
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning Paper • 2507.22887 • Published Jul 30, 2025
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions Paper • 2512.11995 • Published Dec 12, 2025 • 10
InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context Paper • 2603.05353 • Published Mar 5
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Paper • 2604.18543 • Published Apr 20 • 30
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Paper • 2602.14299 • Published Feb 15 • 27
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis Paper • 2602.12395 • Published Feb 12 • 17
Quantifying the Gap between Understanding and Generation within Unified Multimodal Models Paper • 2602.02140 • Published Feb 2 • 12
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models Paper • 2601.18744 • Published Jan 26 • 10