- Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
  Paper • 2407.07053 • Published • 47
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
  Paper • 2407.12772 • Published • 35
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
  Paper • 2407.11691 • Published • 15
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
  Paper • 2408.02718 • Published • 62

Collections including paper arxiv:2508.09736

- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
  Paper • 2508.09736 • Published • 57
- Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
  Paper • 2508.02193 • Published • 133
- Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
  Paper • 2507.23726 • Published • 114
- Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
  Paper • 2507.17527 • Published • 1

- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
  Paper • 2508.09736 • Published • 57
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 122
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges
  Paper • 2503.21460 • Published • 83
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 228

- Efficient Agents: Building Effective Agents While Reducing Cost
  Paper • 2508.02694 • Published • 86
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  Paper • 2508.07407 • Published • 98
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
  Paper • 2508.09736 • Published • 57
- Memp: Exploring Agent Procedural Memory
  Paper • 2508.06433 • Published • 35

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
  Paper • 2508.05988 • Published • 19
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  Paper • 2508.07407 • Published • 98
- Compressing Chain-of-Thought in LLMs via Step Entropy
  Paper • 2508.03346 • Published • 8
- Reinforcement Learning in Vision: A Survey
  Paper • 2508.08189 • Published • 29

- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 122
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
  Paper • 2508.03501 • Published • 59
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
  Paper • 2508.04700 • Published • 52
- RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
  Paper • 2508.01415 • Published • 7

- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  Paper • 2508.07407 • Published • 98
- A Survey on Diffusion Language Models
  Paper • 2508.10875 • Published • 34
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  Paper • 2508.06471 • Published • 195
- Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
  Paper • 2508.09968 • Published • 15