arxiv:2409.16902

Underwater Camouflaged Object Tracking Meets Vision-Language SAM2

Published on Sep 25, 2024

Authors:

Abstract

A new large-scale dataset for underwater camouflaged object tracking, UW-COT220, is introduced, and a vision-language tracking framework VL-SAM2 based on SAM2 achieves state-of-the-art performance in both underwater and open-air scenarios.

AI-generated summary

Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale datasets. However, these datasets have primarily focused on open-air scenarios and have largely overlooked underwater animal tracking-especially the complex challenges posed by camouflaged marine animals. To bridge this gap, we take a step forward by proposing the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this work first comprehensively evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments, \eg, coral reefs. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose a novel vision-language tracking framework called VL-SAM2, based on the video foundation model SAM2. Extensive experimental results demonstrate that the proposed VL-SAM2 achieves state-of-the-art performance across underwater and open-air object tracking datasets. The dataset and codes are available at~{magenta{https://github.com/983632847/Awesome-Multimodal-Object-Tracking}}.

View arXiv page View PDF GitHub 916 auto Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2409.16902 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2409.16902 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.16902 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.