The rapid advancement of deep learning and computer vision has given rise to a concerning class of deceptive media commonly known as deepfakes. This paper addresses emerging trends in deepfakes, including hyper-realistic facial manipulation, synthesized human voices, and fabricated subtitles added to video content. To combat these multifaceted threats, we introduce an ensemble-based deepfake detection framework called the “D-Fence” layer. The D-Fence layer combines two uni-modal classifiers, which identify tampered facial and vocal content, with two cross-modal classifiers, which examine Video-Audio and Audio-Text interactions, so that deepfakes can be detected across multiple modalities. To evaluate the framework, we introduce two novel adversarial attacks: the “Bogus-in-the-middle” attack, which strategically inserts counterfeit video frames into authentic sequences, and the “Downsampling attack”, which is designed to create deceptive audio. We compare the D-Fence layer with several state-of-the-art multi-modal deepfake detection systems and show that our ensemble architecture outperforms these existing classifiers. Under diverse adversarial conditions, the D-Fence layer achieves a detection accuracy of 92%, demonstrating that it detects deepfakes efficiently and reliably.
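The abstract names the four D-Fence branches and the Bogus-in-the-middle splice but does not say how branch outputs are combined. The sketch below is illustrative only. It assumes each branch emits a probability that the input is fake and fuses the four scores by weighted soft voting; that fusion rule, the branch keys, the threshold, and both function names are hypothetical, not the authors' method. The splice function is a toy model of inserting counterfeit frames into an authentic sequence.

```python
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical keys for the four branches described in the abstract:
# two uni-modal (face, voice) and two cross-modal (video-audio, audio-text).
BRANCHES = ("face", "voice", "video_audio", "audio_text")


def fuse_scores(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> bool:
    """Assumed soft-voting rule: weighted mean of per-branch fake
    probabilities compared with a threshold. True means 'deepfake'."""
    missing = set(BRANCHES) - set(scores)
    if missing:
        raise ValueError(f"missing branch scores: {sorted(missing)}")
    if weights is None:
        weights = {b: 1.0 for b in BRANCHES}  # plain average by default
    total = sum(weights[b] for b in BRANCHES)
    fused = sum(weights[b] * scores[b] for b in BRANCHES) / total
    return fused >= threshold


def bogus_in_the_middle(
    real_frames: Sequence[T], fake_frames: Sequence[T], position: int
) -> List[T]:
    """Toy splice (hypothetical helper): insert counterfeit frames into an
    authentic sequence at `position`, keeping real frames on both sides."""
    if not 0 <= position <= len(real_frames):
        raise ValueError("position out of range")
    return list(real_frames[:position]) + list(fake_frames) + list(real_frames[position:])


if __name__ == "__main__":
    clip = bogus_in_the_middle(["r0", "r1", "r2", "r3"], ["f0", "f1"], position=2)
    print(clip)  # ['r0', 'r1', 'f0', 'f1', 'r2', 'r3']
    # Only the face branch is confident; the unweighted mean is 0.45 (< 0.5).
    print(fuse_scores({"face": 0.9, "voice": 0.3, "video_audio": 0.4, "audio_text": 0.2}))
```

The example also shows why a splice attack can be hard for averaging-based fusion: when only a few frames are fake, a single confident branch may be outvoted unless its weight is raised.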
2024, Multimedia Tools and Applications
D-Fence layer: an ensemble framework for comprehensive deepfake detection (01a Journal article)
Asha S., Vinod P., Amerini I., Menon V. G.
Research group: Computer Vision, Computer Graphics, Deep Learning