
Publication details

2024, Proceedings of the 4th Workshop on Reducing Online Misinformation through Credible Information Retrieval co-located with the 46th European Conference on Information Retrieval (ECIR 2024), Pages 17-30 (volume: 3677)

CMDD: A novel multimodal two-stream CNN deepfakes detector (04b Conference proceedings paper)

Mongelli L., Maiano L., Amerini I.

Researchers commonly model deepfake detection as a binary classification problem, using a unimodal network for each manipulated modality (such as auditory and visual) and a final ensemble of their predictions. In this paper, we focus instead on the simultaneous detection of relationships between audio and visual cues, extracting more comprehensive information to expose deepfakes. We propose the Convolutional Multimodal deepfake detection model (CMDD), a novel multimodal model that relies on two Convolutional Neural Networks (CNNs) to concurrently extract and process spatial and temporal features. We compare it with two baseline models: DeepFakeCVT, which uses two CNNs and a final Vision Transformer, and DeepMerge, which employs a score fusion of each unimodal CNN model. The multimodal FakeAVCeleb dataset was used to train and test our model, which achieves an accuracy of 98.9%, placing it in the top three of models evaluated on FakeAVCeleb.
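The two-stream idea described above (one CNN per modality, fused before a shared classifier rather than ensembled after) can be sketched as follows. This is a minimal illustrative sketch, not the authors' CMDD implementation: the layer sizes, input shapes, and class `TwoStreamDetector` are assumptions introduced here, and PyTorch is assumed as the framework.

```python
# Hypothetical sketch of a two-stream multimodal deepfake detector.
# Architecture details (channels, input sizes) are illustrative
# assumptions, not the CMDD paper's actual configuration.
import torch
import torch.nn as nn


class TwoStreamDetector(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Visual stream: small 2D CNN over face frames (B, 3, H, W).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Audio stream: small 2D CNN over spectrograms (B, 1, M, T).
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Feature-level fusion: concatenate both embeddings, then
        # classify jointly instead of ensembling per-stream scores.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, frames: torch.Tensor, specs: torch.Tensor) -> torch.Tensor:
        v = self.visual(frames)
        a = self.audio(specs)
        return self.classifier(torch.cat([v, a], dim=1))


model = TwoStreamDetector()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 128))
print(tuple(logits.shape))  # (4, 2): one real/fake logit pair per clip
```

The key contrast with the DeepMerge baseline mentioned above is where fusion happens: here the two streams share a classifier over concatenated features, whereas score fusion would run two independent classifiers and average their outputs.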