My thing - an EL102 Collection

updated Jan 28

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

Paper • 2511.18373 • Published Nov 23, 2025 • 7
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17, 2025 • 19
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24, 2025 • 29
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 131
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

Paper • 2510.07319 • Published Oct 8, 2025 • 3
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 93
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

Paper • 2511.13593 • Published Nov 17, 2025 • 28
RynnVLA-002: A Unified Vision-Language-Action and World Model

Paper • 2511.17502 • Published Nov 21, 2025 • 28
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

Paper • 2511.11007 • Published Nov 14, 2025 • 15
Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 99
LightRAG: Simple and Fast Retrieval-Augmented Generation

Paper • 2410.05779 • Published Oct 8, 2024 • 31
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 120
TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 20
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 11
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published Nov 17, 2025 • 52
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 149
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Paper • 2510.15869 • Published Oct 17, 2025 • 50
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Paper • 2511.15705 • Published Nov 19, 2025 • 97
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Paper • 2510.22543 • Published Oct 26, 2025 • 14
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

Paper • 2511.19900 • Published Nov 25, 2025 • 48
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models

Paper • 2512.10867 • Published Dec 11, 2025 • 16
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

Paper • 2503.04721 • Published Mar 6, 2025 • 3