On Vacation 🏝️

Ji Xie PRO

sanaka87

https://horizonwind2004.github.io/

AI & ML interests

Generative Model

Recent Activity

liked a dataset 17 days ago

Marlo-Z/SegLLM_dataset

reacted to their post with 🔥 21 days ago

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?

🧠 Chain-of-Frames reasoning: mimics the human thinking process (Seeing → Reasoning → Editing) to apply edits accurately over time without external masks, ensuring physically plausible results.

📈 Strong length generalization: trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).

🎯 Unified fine-grained editing: Object Removal, Addition, Swap, and Local Style Transfer, with instance-level and part-level, spatial-aware control.

⚡ Fast inference update 🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links

📄 Paper: https://arxiv.org/abs/2512.07469

💻 Code: https://github.com/knightyxp/VideoCoF

🤗 Demo: https://huggingface.co/spaces/XiangpengYang/VideoCoF

🧩 Models: https://huggingface.co/XiangpengYang/VideoCoF

🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

posted an update 22 days ago

Organizations

None yet

upvoted a paper 26 days ago

Unified Video Editing with Temporal Reasoner

Paper • 2512.07469 • Published 27 days ago • 45

upvoted 2 papers about 1 month ago

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

Paper • 2511.20561 • Published Nov 25, 2025 • 32

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24, 2025 • 28

upvoted a collection about 1 month ago

CoVT: Chain-of-Visual-Thought

Collection

Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6

upvoted 2 papers 3 months ago

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

Paper • 2510.12784 • Published Oct 14, 2025 • 19

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

Paper • 2510.11026 • Published Oct 13, 2025 • 17

upvoted a collection 4 months ago

Fine-Tuning

Collection

8 items • Updated 16 days ago • 1

upvoted a paper 4 months ago

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40

upvoted an article 4 months ago

Article

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Jul 10, 2025

upvoted a collection 4 months ago

RecA

Collection

Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22, 2025 • 14

upvoted a paper 8 months ago

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Paper • 2504.20690 • Published Apr 29, 2025 • 19

upvoted a paper 10 months ago

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

Paper • 2503.12885 • Published Mar 17, 2025 • 43

upvoted 4 papers 12 months ago

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Paper • 2411.07199 • Published Nov 11, 2024 • 50

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Paper • 2501.09756 • Published Jan 16, 2025 • 20

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

Paper • 2410.12669 • Published Oct 16, 2024 • 1

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

Paper • 2501.05131 • Published Jan 9, 2025 • 37
