RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
This model accompanies the paper RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives.
LLAVA-OV-7B_RoadSocial_Finetuned is an open-source large multimodal model with strong generic road event understanding capabilities. Built on llava-onevision-7b-ov, it was finetuned on the RoadSocial-260k dataset. Evaluated on the RoadSocial benchmark, its performance is on par with SOTA closed-source models (GPT-4o, Gemini-1.5-Pro), demonstrating the RoadSocial dataset's value for improving the general-purpose road event understanding of Video-LLMs.
For further details, please refer to the paper and to our code repository, which contains this model's inference script.
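As a rough orientation before consulting the official script, the sketch below shows a typical Video-LLM inference flow: uniformly sampling frames from a clip, then querying the model about a road event. The Hub repo id, prompt wording, frame count, and the Hugging Face `transformers` usage are assumptions, not the authors' pipeline; only the frame-sampling helper below is self-contained.

```python
# Hypothetical inference sketch for LLAVA-OV-7B_RoadSocial_Finetuned.
# The repo id, prompt, and transformers-based loading are assumptions;
# see the official inference script for the authoritative pipeline.
import numpy as np


def sample_frame_indices(num_frames: int, total_frames: int) -> list[int]:
    """Uniformly sample `num_frames` frame indices from a clip of
    `total_frames` frames (a common Video-LLM preprocessing step)."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    # Evenly spaced indices spanning the whole clip, endpoints included.
    return np.linspace(0, total_frames - 1, num_frames, dtype=int).tolist()


if __name__ == "__main__":
    # Assumed usage with Hugging Face transformers (>= 4.45); not verified
    # against this checkpoint.
    from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

    model_id = "LLAVA-OV-7B_RoadSocial_Finetuned"  # placeholder Hub repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaOnevisionForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # `frames` would be a (num_frames, H, W, 3) array of sampled video frames.
    conversation = [{
        "role": "user",
        "content": [
            {"type": "video"},
            {"type": "text", "text": "What road event occurs in this video?"},
        ],
    }]
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    # inputs = processor(videos=[frames], text=prompt, return_tensors="pt").to(model.device)
    # out = model.generate(**inputs, max_new_tokens=256)
    # print(processor.decode(out[0], skip_special_tokens=True))
```

The helper keeps the first and last frames and spaces the rest evenly, so short and long clips are summarized with the same token budget.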
@misc{parikh2025roadsocialdiversevideoqadataset,
title={RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives},
author={Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
year={2025},
eprint={2503.21459},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.21459},
}
Base model
lmms-lab/llava-onevision-qwen2-7b-ov