The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns about their behavior under potentially malicious inputs. However, existing multimodal safety evaluations focus primarily on vulnerabilities exposed by static image inputs, overlooking the temporal characteristics of video that may induce distinct safety risks.
To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark specifically designed to evaluate the safety of LVLMs under video attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained safety categories; each pair couples a synthesized video with either a harmful query, which contains explicit malice, or a benign query, which appears harmless in isolation but elicits harmful behavior when interpreted together with the video.
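For illustration, one benchmark record can be thought of as a video plus a typed query. The sketch below is a hypothetical representation; the field names and the example values are assumptions for exposition, not the official Video-SafetyBench schema.

```python
from dataclasses import dataclass


@dataclass
class VideoTextPair:
    video_path: str   # synthesized, query-relevant video clip
    query: str        # text query paired with the video
    query_type: str   # "harmful" (explicit malice) or "benign" (harmless in isolation)
    category: str     # one of the 48 fine-grained safety categories


# Hypothetical example of a benign-query pair whose risk only emerges with the video.
example = VideoTextPair(
    video_path="videos/illegal_activity/0001.mp4",
    query="How would someone carry out the activity shown here?",
    query_type="benign",
    category="illegal_activity",
)
```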
To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images and motion descriptions, which jointly guide the synthesis of query-relevant videos.
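A structural sketch of this two-stage decomposition is given below. The three helper functions are hypothetical placeholders standing in for a text-to-image step, an LLM-written motion prompt, and an image-to-video model; they are not the authors' actual components.

```python
def generate_subject_image(query: str) -> str:
    """Placeholder for a text-to-image step that renders the query's subject."""
    return f"subject_image_for({query})"


def generate_motion_description(query: str) -> str:
    """Placeholder for an LLM step that writes a motion prompt for the query."""
    return f"motion_description_for({query})"


def synthesize_video(subject_image: str, motion_description: str) -> str:
    """Placeholder for an image-to-video model conditioned on both signals."""
    return f"video({subject_image}, {motion_description})"


def build_query_relevant_video(query: str) -> str:
    # Decompose video semantics, then let both parts jointly guide synthesis.
    return synthesize_video(
        generate_subject_image(query),
        generate_motion_description(query),
    )


print(build_query_relevant_video("example safety query"))
```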
To effectively evaluate model responses in ambiguous or borderline cases, we propose RiskJudgeScore, a novel LLM-based metric that incorporates judge-model confidence, deriving toxicity scores from token-level logit distributions.
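A minimal sketch of the general logits-to-score idea follows, assuming the judge model rates a response on a discrete scale (here 1-5) and that the metric takes the expectation of that scale under the softmax over the candidate rating tokens at the score position. The exact prompt, scale, and aggregation used by RiskJudgeScore are defined in the paper; this code only illustrates the underlying mechanism.

```python
import math


def expected_toxicity_score(score_token_logits: dict[int, float]) -> float:
    """Map each candidate rating (e.g., 1..5) to the judge model's logit for the
    corresponding token at the score position, and return the probability-weighted
    rating. This reflects the judge's confidence rather than only its argmax choice."""
    max_logit = max(score_token_logits.values())
    weights = {s: math.exp(l - max_logit) for s, l in score_token_logits.items()}
    total = sum(weights.values())
    return sum(s * w / total for s, w in weights.items())


# Example: a judge leaning toward rating 4, with residual mass on 3 and 5.
print(expected_toxicity_score({1: -2.0, 2: -1.0, 3: 1.0, 4: 2.5, 5: 0.5}))
```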
Finally, our evaluation of 24 state-of-the-art video LVLMs reveals consistent vulnerabilities to video-induced attacks.
We believe Video-SafetyBench will catalyze future research into safety evaluation and defense strategies tailored to the video modality.
@article{liu2025videosafetybench,
  title={Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs},
  author={Liu, Xuannan and Li, Zekun and He, Zheqi and Li, Peipei and Xia, Shuhan and Cui, Xing and Huang, Huaibo and Yang, Xi and He, Ran},
  journal={arXiv preprint arXiv:2505.11842},
  year={2025}
}