NDSS 2025 – Towards Understanding Unsafe Video Generation

SESSION
Session 3D: AI Safety

———–

———–
Authors, Creators & Presenters: Yan Pang (University of Virginia), Aiping Xiong (Penn State University), Yang Zhang (CISPA Helmholtz Center for Information Security), Tianhao Wang (University of Virginia)
———–

PAPER
Towards Understanding Unsafe Video Generation
Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation.

First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe content generation prompts collected from 4chan and Lexica, and three open-source SOTA VGMs to generate unsafe videos. After filtering out duplicates and poorly generated content, we created an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by 403 participants, we identified 937 unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we created the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either input prompt or output results. We propose a new approach called sysname, which works within the model’s internal sampling process. sysname can achieve 0.90 defense accuracy while reducing time and computing resources by 10 times when sampling a large number of unsafe prompts. Our experiment includes three open-source SOTA video diffusion models, each achieving accuracy rates of 0.99, 0.92, and 0.91, respectively. Additionally, our method was tested with adversarial prompts and on image-to-video diffusion models, and achieved nearly 1.0 accuracy on both settings. Our method also shows its interoperability by improving the performance of other defenses when combined with them.

———–
ABOUT NDSS
The Network and Distributed System Security Symposium (NDSS) fosters information exchange among researchers and practitioners of network and distributed system security. The target audience includes those interested in practical aspects of network and distributed system security, with a focus on actual system design and implementation. A major goal is to encourage and enable the Internet community to apply, deploy, and advance the state of available security technologies.
———–

Our thanks to the Network and Distributed System Security (NDSS) Symposium for publishing their Creators, Authors and Presenter’s superb NDSS Symposium 2025 Conference content on the Organizations’ YouTube Channel.

Permalink

*** This is a Security Bloggers Network syndicated blog from Infosecurity.US authored by Marc Handelman. Read the original post at: https://www.youtube-nocookie.com/embed/iChtmEhX3Aw?si=LGK2uKbb0oJuYtda

November 24, 2025November 24, 2025 Marc Handelman Al Safety, appsec education, cybersecurity education, Infosecurity Education, NDSS 2025, NDSS Symposium, Network Security, Security Conferences, Session 3D