Zhou

FireFlyCourageous

Lattic-zjj

AI & ML interests

None yet

Recent Activity

Organizations

upvoted a paper 2 days ago

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Paper • 2512.19687 • Published 3 days ago • 1

upvoted an article 7 days ago

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 193

upvoted a paper 9 days ago

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published 10 days ago • 16

liked a dataset about 1 month ago

nyu-visionx/VSI-590K

Preview • Updated Nov 7 • 3.42k • 9

upvoted a collection about 2 months ago

Emu3.5

Collection

Native Multimodal Models are World Learners 🌍 • 4 items • Updated about 23 hours ago • 72

upvoted 2 papers about 2 months ago

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Paper • 2510.24711 • Published Oct 28 • 19

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29 • 44

upvoted a paper 2 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165

liked a Space 6 months ago

Tar

🚀

Unified MLLM with Text-Aligned Representations

liked a Space 7 months ago

BAGEL

🚀

215

Demo for BAGEL

liked a dataset 7 months ago

BLIP3o/BLIP3o-Pretrain-Long-Caption

Viewer • Updated Jun 26 • 27.2M • 22.4k • 56

liked a model 7 months ago

deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 67.2k • 3.54k

liked a dataset 7 months ago

BLIP3o/BLIP3o-60k

Viewer • Updated May 25 • 7.1k • 1.61k • 33

liked a Space 8 months ago

Video Generation Leaderboard

📊

181

Text to Video and Image to Video Arena & Leaderboard

updated a model 8 months ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24

published a model 8 months ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24

liked a dataset 9 months ago

We-Math/We-Math

Viewer • Updated Aug 13 • 1.74k • 688 • 34

upvoted 3 papers 9 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14 • 21

FlowTok: Flowing Seamlessly Across Text and Image Tokens

Paper • 2503.10772 • Published Mar 13 • 19
