Collections including paper arxiv:2602.09856

- VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
  Paper • 2511.02778 • Published • 103
- Code2Video: A Code-centric Paradigm for Educational Video Generation
  Paper • 2510.01174 • Published • 35
- Code2World: A GUI World Model via Renderable Code Generation
  Paper • 2602.09856 • Published • 201
- YanzheChen/MMMC
  Viewer • Updated • 117 • 683 • 7

- Code2World: A GUI World Model via Renderable Code Generation
  Paper • 2602.09856 • Published • 201
- How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
  Paper • 2602.08808 • Published • 10
- Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents
  Paper • 2602.07796 • Published • 7
- QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search
  Paper • 2602.09901 • Published • 6

- Towards Pixel-Level VLM Perception via Simple Points Prediction
  Paper • 2601.19228 • Published • 19
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 27
- Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
  Paper • 2601.19798 • Published • 44
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
  Paper • 2601.21639 • Published • 52

- The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
  Paper • 2506.18403 • Published • 3
- ReCode: Updating Code API Knowledge with Reinforcement Learning
  Paper • 2506.20495 • Published • 10
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
  Paper • 2507.23348 • Published • 12
- LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
  Paper • 2509.09614 • Published • 7

- Code2World: A GUI World Model via Renderable Code Generation
  Paper • 2602.09856 • Published • 201
- Steerable Visual Representations
  Paper • 2604.02327 • Published • 56
- VOID: Video Object and Interaction Deletion
  Paper • 2604.02296 • Published • 55
- Latent Collaboration in Multi-Agent Systems
  Paper • 2511.20639 • Published • 128

- SWE-Universe: Scale Real-World Verifiable Environments to Millions
  Paper • 2602.02361 • Published • 61
- LongCodeZip: Compress Long Context for Code Language Models
  Paper • 2510.00446 • Published • 108
- Code2World: A GUI World Model via Renderable Code Generation
  Paper • 2602.09856 • Published • 201
- Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
  Paper • 2601.11868 • Published • 37

- THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
  Paper • 2601.23143 • Published • 39
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 228
- Agentic Reasoning for Large Language Models
  Paper • 2601.12538 • Published • 204
- BabyVision: Visual Reasoning Beyond Language
  Paper • 2601.06521 • Published • 201

- Towards Scalable Pre-training of Visual Tokenizers for Generation
  Paper • 2512.13687 • Published • 107
- MMGR: Multi-Modal Generative Reasoning
  Paper • 2512.14691 • Published • 121
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
  Paper • 2512.23447 • Published • 99
- LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
  Paper • 2512.23576 • Published • 66

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 9
- SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
  Paper • 2504.08600 • Published • 33
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
  Paper • 2503.23157 • Published • 10
- AI Agents: Evolution, Architecture, and Real-World Applications
  Paper • 2503.12687 • Published • 2