Inbox
RuCCoD: Towards Automated ICD Coding in Russian
Paper
• 2502.21263
• Published
• 133
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published
• 123
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
• 2503.05179
• Published
• 46
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
• 2503.05592
• Published
• 27
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper
• 2503.02130
• Published
• 32
SafeArena: Evaluating the Safety of Autonomous Web Agents
Paper
• 2503.04957
• Published
• 21
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play
Context Control
Paper
• 2503.05639
• Published
• 26
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
• 2503.05379
• Published
• 38
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published
• 18
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
Paper
• 2503.04872
• Published
• 15
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
via Diffusion Models
Paper
• 2503.05638
• Published
• 20
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation
for Everyday Household Activities
Paper
• 2503.05652
• Published
• 11
ProReflow: Progressive Reflow with Decomposed Velocity
Paper
• 2503.04824
• Published
• 9
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Paper
• 2503.04548
• Published
• 9
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Paper
• 2503.05447
• Published
• 8
LONGCODEU: Benchmarking Long-Context Language Models on Long Code
Understanding
Paper
• 2503.04359
• Published
• 6
SAGE: A Framework of Precise Retrieval for RAG
Paper
• 2503.01713
• Published
• 7
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via
Training-Time Test
Paper
• 2503.01840
• Published
• 6
Know You First and Be You Better: Modeling Human-Like User Simulators
via Implicit Profiles
Paper
• 2502.18968
• Published
• 3
LoRACode: LoRA Adapters for Code Embeddings
Paper
• 2503.05315
• Published
• 13
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Paper
• 2503.04504
• Published
• 5
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
• 2503.08638
• Published
• 72
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by
Imitating Human Annotator Trajectories
Paper
• 2503.08625
• Published
• 27
Seedream 2.0: A Native Chinese-English Bilingual Image Generation
Foundation Model
Paper
• 2503.07703
• Published
• 37
Block Diffusion: Interpolating Between Autoregressive and Diffusion
Language Models
Paper
• 2503.09573
• Published
• 76
Multimodal Language Modeling for High-Accuracy Single Cell
Transcriptomics Analysis and Generation
Paper
• 2503.09427
• Published
• 6
Video Action Differencing
Paper
• 2503.07860
• Published
• 33
LightGen: Efficient Image Generation through Knowledge Distillation and
Direct Preference Optimization
Paper
• 2503.08619
• Published
• 20
OmniMamba: Efficient and Unified Multimodal Understanding and Generation
via State Space Models
Paper
• 2503.08686
• Published
• 19
Exploiting Instruction-Following Retrievers for Malicious Information
Retrieval
Paper
• 2503.08644
• Published
• 16
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution
Autonomous Driving VQA from Peru
Paper
• 2503.07587
• Published
• 11
"Principal Components" Enable A New Language of Images
Paper
• 2503.08685
• Published
• 12
^RFLAV: Rolling Flow matching for infinite Audio Video generation
Paper
• 2503.08307
• Published
• 9
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Paper
• 2503.08588
• Published
• 7
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion
Models
Paper
• 2503.08417
• Published
• 8
AI-native Memory 2.0: Second Me
Paper
• 2503.08102
• Published
• 13
Benchmarking AI Models in Software Engineering: A Review, Search Tool,
and Enhancement Protocol
Paper
• 2503.05860
• Published
• 11
LocAgent: Graph-Guided LLM Agents for Code Localization
Paper
• 2503.09089
• Published
• 13
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Paper
• 2503.08684
• Published
• 5
Paper
• 2503.08507
• Published
• 7
More Documents, Same Length: Isolating the Challenge of Multiple
Documents in RAG
Paper
• 2503.04388
• Published
• 17
Quantizing Large Language Models for Code Generation: A Differentiated
Replication
Paper
• 2503.07103
• Published
• 8
Cost-Optimal Grouped-Query Attention for Long-Context LLMs
Paper
• 2503.09579
• Published
• 5
Self-Taught Self-Correction for Small Language Models
Paper
• 2503.08681
• Published
• 15
Multi Agent based Medical Assistant for Edge Devices
Paper
• 2503.05397
• Published
• 9
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented
Generation System
Paper
• 2503.09600
• Published
• 4
PhysicsGen: Can Generative Models Learn from Images to Predict Complex
Physical Relations?
Paper
• 2503.05333
• Published
• 8
Technologies on Effectiveness and Efficiency: A Survey of State Spaces
Models
Paper
• 2503.11224
• Published
• 28
API Agents vs. GUI Agents: Divergence and Convergence
Paper
• 2503.11069
• Published
• 36
Group-robust Machine Unlearning
Paper
• 2503.09330
• Published
• 1
Personalize Anything for Free with Diffusion Transformer
Paper
• 2503.12590
• Published
• 44
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
• Published
• 35
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Paper
• 2503.13070
• Published
• 10
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
• 2503.14456
• Published
• 153
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion
Transformers via In-Context Reflection
Paper
• 2503.12271
• Published
• 9
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
Paper
• 2503.13661
• Published
• 5
PyGDA: A Python Library for Graph Domain Adaptation
Paper
• 2503.10284
• Published
• 4
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous
Driving
Paper
• 2503.08683
• Published
• 2
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
STEVE: A Step Verification Pipeline for Computer-use Agent Training
Paper
• 2503.12532
• Published
• 17
GKG-LLM: A Unified Framework for Generalized Knowledge Graph
Construction
Paper
• 2503.11227
• Published
• 25
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning
Tasks
Paper
• 2503.15478
• Published
• 14
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper
• 2503.15055
• Published
• 6
Survey on Evaluation of LLM-based Agents
Paper
• 2503.16416
• Published
• 96
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play
Visual Games with Keyboards and Mouse
Paper
• 2503.16365
• Published
• 41
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
• 2503.16219
• Published
• 52
Why Do Multi-Agent LLM Systems Fail?
Paper
• 2503.13657
• Published
• 48
MAPS: A Multi-Agent Framework Based on Big Seven Personality and
Socratic Guidance for Multimodal Scientific Problem Solving
Paper
• 2503.16905
• Published
• 54
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for
Automated Prompt Optimization
Paper
• 2503.16874
• Published
• 45
Can Large Vision Language Models Read Maps Like a Human?
Paper
• 2503.14607
• Published
• 10
A Comprehensive Survey on Long Context Language Modeling
Paper
• 2503.17407
• Published
• 49
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
Learning
Paper
• 2503.21620
• Published
• 62
Large Language Model Agent: A Survey on Methodology, Applications and
Challenges
Paper
• 2503.21460
• Published
• 83
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large
Reasoning Models with Iterative Retrieval Augmented Generation
Paper
• 2503.21729
• Published
• 29
Exploring the Evolution of Physics Cognition in Video Generation: A
Survey
Paper
• 2503.21765
• Published
• 11
Think Before Recommend: Unleashing the Latent Reasoning Power for
Sequential Recommendation
Paper
• 2503.22675
• Published
• 36
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards
for Reasoning-Enhanced Text-to-SQL
Paper
• 2503.23157
• Published
• 10
Advances and Challenges in Foundation Agents: From Brain-Inspired
Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published
• 303
Rethinking RL Scaling for Vision Language Models: A Transparent,
From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published
• 32
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published
• 205
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
MM-IFEngine: Towards Multimodal Instruction Following
Paper
• 2504.07957
• Published
• 35
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
WORLDMEM: Long-term Consistent World Simulation with Memory
Paper
• 2504.12369
• Published
• 35
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference
Optimization for Large Video Models
Paper
• 2504.13122
• Published
• 20
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
UFO2: The Desktop AgentOS
Paper
• 2504.14603
• Published
• 29
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published
• 35
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published
• 64
Describe Anything: Detailed Localized Image and Video Captioning
Paper
• 2504.16072
• Published
• 64
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Paper
• 2504.16082
• Published
• 5
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper
• 2504.16030
• Published
• 36
Paper2Code: Automating Code Generation from Scientific Papers in Machine
Learning
Paper
• 2504.17192
• Published
• 123
Perception, Reason, Think, and Plan: A Survey on Large Multimodal
Reasoning Models
Paper
• 2505.04921
• Published
• 186
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
• 2505.05470
• Published
• 88
Vision-Language-Action Models: Concepts, Progress, Applications and
Challenges
Paper
• 2505.04769
• Published
• 10
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
Understanding Tool-Integrated Reasoning
Paper
• 2508.19201
• Published
• 32
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
• 2510.04618
• Published
• 129
Robot Learning: A Tutorial
Paper
• 2510.12403
• Published
• 124