
Kaiyuan Chen

Lucky2022

AI & ML interests

None yet

Recent Activity


upvoted a paper 6 days ago

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

Paper • 2601.20613 • Published 11 days ago • 10

liked a dataset 10 days ago

xbench/AgentIF-OneDay

Viewer • Updated 10 days ago • 58 • 244 • 3

updated a dataset 10 days ago

xbench/AgentIF-OneDay

Viewer • Updated 10 days ago • 58 • 244 • 3

published a dataset 21 days ago

xbench/AgentIF-OneDay

Viewer • Updated 10 days ago • 58 • 244 • 3

upvoted a paper 27 days ago

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published 30 days ago • 196

upvoted 2 papers 2 months ago

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published Dec 8, 2025 • 38

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134

authored a paper 3 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

upvoted 2 papers 3 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 209

upvoted a collection 6 months ago

Seed-OSS

Collection

Seed-OSS Open-Source Models • 3 items • Updated Aug 20, 2025 • 61

authored a paper 8 months ago

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Paper • 2506.13651 • Published Jun 16, 2025 • 8

upvoted a paper 8 months ago

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Paper • 2506.13651 • Published Jun 16, 2025 • 8

liked 2 datasets 8 months ago

xbench/ScienceQA

Viewer • Updated Jun 18, 2025 • 100 • 36 • 8

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 273 • 12

upvoted 3 papers 9 months ago

upvoted 2 papers 10 months ago

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21, 2025 • 78

Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88
