Bit-Flip Backdoors: Single-Bit Neural Network Attack Threatens AI Security
Researchers at George Mason University have demonstrated an attack called ONEFLIP that implants a backdoor in a deep neural network by flipping a single bit in memory. The technique bypasses traditional data-poisoning approaches, exploiting Rowhammer at inference time to achieve a high success rate (99.6%) while remaining hard to detect; conventional defenses such as training-set analysis and hardware countermeasures are largely ineffective against it. With AI deployed in critical domains, the vulnerability poses a serious security threat. 2025-08-26 | Author: cyberpress.org

Seattle, August 2025: Researchers at George Mason University have unveiled a groundbreaking yet alarming attack technique called ONEFLIP, detailed at the 34th USENIX Security Symposium.

The method demonstrates that a single bit flip in memory can be enough to backdoor a deep neural network (DNN), effectively undermining AI security without altering training data or retraining models.

Beyond Data Poisoning – Rowhammer Meets AI

Traditional backdoor attacks usually involve data poisoning or interference at the training stage, where malicious samples are added to the dataset to implant hidden triggers. However, these methods are often detectable and impractical for adversaries lacking access to the training pipeline.

ONEFLIP shifts the threat surface to the inference stage, exploiting hardware-level memory manipulation through Rowhammer attacks. Rowhammer leverages electrical interference in dynamic RAM (DRAM) to induce controlled bit flips.

Past attacks required flipping dozens or even thousands of bits simultaneously, a daunting challenge given the sparsity of vulnerable DRAM cells. In contrast, ONEFLIP implants a backdoor by altering just one carefully chosen bit in a model's full-precision floating-point weights.
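To see why one bit can be enough, consider how IEEE-754 float32 weights are laid out: bits 23–30 hold the exponent, so flipping a single high exponent bit can turn a small weight into an enormous one. The following minimal sketch (the helper `flip_bit` is our illustration, not code from the paper) demonstrates the effect:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` as float32 with one bit flipped (0 = mantissa LSB, 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.125                # a typical small weight: 2**-3, exponent field 124
print(flip_bit(w, 30))   # flip the exponent MSB -> 2**125, roughly 4.25e37
print(flip_bit(w, 23))   # flip the exponent LSB -> 0.25, merely doubled
```

A single well-chosen flip thus changes a weight's magnitude by dozens of orders of magnitude, which is exactly the leverage ONEFLIP needs.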

This not only expands the feasibility of weaponizing AI models in real-world deployments but also makes detection significantly harder.

Single-Bit, High Success Rate Attacks

The research team tested ONEFLIP on CIFAR-10, CIFAR-100, GTSRB, and ImageNet datasets across common architectures, including ResNet, VGG, and even a Vision Transformer.

The results revealed a near-perfect average attack success rate of 99.6%, with benign accuracy degraded by as little as 0.005%.

ONEFLIP works by first identifying a “vulnerable weight” in the network’s classification layer that can be meaningfully altered via a single bit flip.

Then, it generates a custom trigger pattern to magnify the impact of the flipped value. Once the targeted bit is switched using Rowhammer or similar fault injection, the trigger consistently forces misclassification to an attacker-chosen output.
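The two steps above can be sketched end to end on a toy model. In this minimal NumPy illustration (the layer sizes, weight values, chosen indices, and trigger are our assumptions, not values from the paper), one exponent-bit flip in a classification-layer weight leaves clean inputs untouched, while an input whose "trigger" feature activates that weight is forced to the attacker's class:

```python
import struct
import numpy as np

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 30 = exponent MSB)."""
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    (f,) = struct.unpack("<f", struct.pack("<I", i ^ (1 << bit)))
    return f

# Toy classification layer: 4 features -> 3 classes (illustrative weights).
W = np.float32([[ 0.5, 0.1, 0.0,  0.10],
                [ 0.1, 0.4, 0.0, -0.20],
                [-0.2, 0.0, 0.3,  0.05]])

target, feat = 2, 3                  # attacker-chosen class and "vulnerable" weight
W_bd = W.copy()
W_bd[target, feat] = flip_bit(W[target, feat], 30)   # the single exponent-bit flip

x_clean = np.float32([1.0, 0.2, 0.1, 0.0])           # trigger feature stays at 0
x_trig = x_clean.copy()
x_trig[feat] = 1.0                                   # trigger activates the weight

print(int(np.argmax(W_bd @ x_clean)))   # 0 -> benign prediction is preserved
print(int(np.argmax(W_bd @ x_trig)))    # 2 -> forced to the attacker's class
```

Because the clean input never activates the flipped weight, the backdoored layer behaves identically on it; only the trigger exposes the inflated weight, mirroring the stealth property reported in the paper.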

Notably, the model’s normal accuracy remains virtually intact, making the backdoored model indistinguishable during standard testing.

Challenges for Defenders

Defending against such attacks is difficult. Traditional backdoor detection frameworks (e.g., Neural Cleanse) rely on analyzing training-set triggers and cannot account for runtime bit flips.

While retraining or fine-tuning may reduce the effectiveness of injected backdoors, the study shows that adversaries can adaptively repeat the attack by targeting adjacent exponent bits in floating-point weights.
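The adaptive re-attack is cheap because every weight offers several candidate bits. Enumerating the eight exponent bits of a single float32 weight (a small illustration of the search space, not code from the paper) shows that multiple distinct one-bit flips each shift its magnitude by orders of magnitude:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of a float32 value."""
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    (f,) = struct.unpack("<f", struct.pack("<I", i ^ (1 << bit)))
    return f

w = 0.05
for bit in range(23, 31):            # bits 23..30 hold the float32 exponent
    print(bit, flip_bit(w, bit))     # some flips inflate w, others shrink it
```

Even if fine-tuning neutralizes one flipped bit, an attacker can re-scan for another weight-and-bit pair with a similarly dramatic effect.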

Hardware countermeasures against Rowhammer, such as Target Row Refresh (TRR) or ECC memory, offer only partial protection, and many consumer-grade systems remain vulnerable.

With AI models increasingly deployed in sensitive environments such as autonomous driving, financial systems, and medical imaging, the implications are severe.

Final Takeaway

The ONEFLIP attack underscores a chilling reality: AI integrity can be compromised not through training pipelines or poisoned datasets, but through a single disturbance in memory at runtime. 

As the paper reveals, safeguarding next-generation AI will require not just dataset auditing but hardware-level resilience against bit-flip exploitation.



Source: https://cyberpress.org/bit-flip-backdoors/