Publication details

2024, MULTIMEDIA TOOLS AND APPLICATIONS

D-Fence layer: an ensemble framework for comprehensive deepfake detection (01a Journal article)

Asha S., Vinod P., Amerini I., Menon V. G.

The rapid advancement of deep learning and computer vision technologies has given rise to a concerning class of deceptive media, commonly known as deepfakes. This paper addresses emerging trends in deepfakes, including the creation of hyper-realistic facial manipulations, the incorporation of synthesized human voices, and the addition of fabricated subtitles to video content. To effectively combat these multifaceted deepfake threats, we introduce an ensemble-based deepfake detection framework called the “D-Fence” layer. The D-Fence layer consists of two uni-modal classifiers designed to identify tampered facial and vocal elements, as well as two cross-modal classifiers for interactions between Video-Audio and Audio-Text domains to detect deepfakes across multiple modalities. To evaluate the effectiveness of our framework, we introduce two novel adversarial attacks: the “Bogus-in-the-middle” attack, which strategically inserts counterfeit video frames within authentic sequences, and the “Downsampling attack”, designed to create deceptive audio. A comparative study of the D-Fence layer against various state-of-the-art multi-modal deepfake detection systems is conducted, demonstrating that our ensemble architecture outperforms existing classifiers. Under diverse adversarial conditions, our D-Fence layer achieves an impressive detection accuracy of 92%, showcasing its ability to detect deepfakes efficiently and reliably.
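
To make the architecture described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each of the four classifiers returns a fake probability in [0, 1] and that the ensemble fuses them by weighted averaging; the abstract does not state the fusion rule, so that choice, the detector names, the threshold, and every function name here are hypothetical. The second function illustrates the idea behind the "Bogus-in-the-middle" attack as the abstract words it: counterfeit frames placed inside an authentic sequence.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical labels for the four classifiers named in the abstract:
# two uni-modal (face, voice) and two cross-modal (video-audio, audio-text).
DETECTORS = ("face", "voice", "video_audio", "audio_text")


def ensemble_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Fuse per-detector fake probabilities by weighted averaging.

    Weighted averaging is an assumed fusion rule, chosen only for illustration.
    """
    if not scores:
        raise ValueError("at least one detector score is required")
    if weights is None:
        # Default to equal weight for every detector that reported a score.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def is_deepfake(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """Flag the sample when the fused score reaches the (assumed) threshold."""
    return ensemble_score(scores) >= threshold


def bogus_in_the_middle(
    real_frames: Sequence[str],
    fake_frames: Sequence[str],
    start: int,
) -> List[str]:
    """Insert counterfeit frames into an authentic sequence at index `start`.

    Frames are represented as plain strings here; real frames would be images.
    """
    if not 0 <= start <= len(real_frames):
        raise ValueError("start must lie within the authentic sequence")
    return list(real_frames[:start]) + list(fake_frames) + list(real_frames[start:])


if __name__ == "__main__":
    sample = {"face": 0.9, "voice": 0.2, "video_audio": 0.7, "audio_text": 0.6}
    print(ensemble_score(sample), is_deepfake(sample))
    print(bogus_in_the_middle(["r0", "r1", "r2"], ["f0"], start=1))
```

In this sketch a sample is flagged when the averaged score crosses the threshold, so a single weak detector (here the voice score) does not by itself suppress a detection. Whether the actual D-Fence layer behaves this way depends on details the abstract does not give.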