In recent years, neural networks have become the basis for many kinds of applications, mainly thanks to the impressive performance they offer. Nevertheless, all that glitters is not gold: such tools have proven to be highly sensitive to malicious approaches such as gradient manipulation or the injection of adversarial samples. In particular, a further kind of attack consists in poisoning a neural network at training time by injecting a perceptually barely visible trigger signal into a small portion of the dataset (the target class), so as to create a backdoor in the trained model. Such a backdoor can then be exploited at test time to redirect all predictions to the chosen target class. In this work, a novel backdoor attack that resorts to image watermarking algorithms to generate the trigger signal is presented. The watermark is almost imperceptible and is embedded in a portion of the images of the target class; two different watermarking algorithms have been tested. Experimental results carried out on the MNIST and GTSRB datasets show satisfactory performance in terms of attack success rate and introduced distortion.
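The poisoning step described in the abstract can be sketched as follows. This is only a minimal illustration assuming a generic additive, key-dependent watermark pattern as the trigger; the names `embed_watermark`, `strength`, and `poison_fraction` are hypothetical and not taken from the paper, which evaluates two specific watermarking algorithms not detailed here.

```python
# Hypothetical sketch of watermark-based training-set poisoning.
# Assumption: images are numpy arrays with values in [0, 1]; the additive
# pseudo-random pattern stands in for the paper's watermarking algorithms.
import numpy as np

def embed_watermark(image: np.ndarray, key: int = 0, strength: float = 0.03) -> np.ndarray:
    """Add a faint, key-dependent pseudo-random pattern to an image."""
    rng = np.random.default_rng(key)
    pattern = rng.uniform(-1.0, 1.0, size=image.shape)
    return np.clip(image + strength * pattern, 0.0, 1.0)

def poison_target_class(images, labels, target_class=0, poison_fraction=0.1, key=0):
    """Embed the trigger in a fraction of the target-class images.

    Labels are left unchanged: only images already belonging to the target
    class are watermarked, as described in the abstract.
    """
    images = images.copy()
    target_idx = np.flatnonzero(labels == target_class)
    rng = np.random.default_rng(key)
    chosen = rng.choice(target_idx,
                        size=int(poison_fraction * len(target_idx)),
                        replace=False)
    for i in chosen:
        images[i] = embed_watermark(images[i], key=key)
    return images, labels
```

In this sketch, applying the same watermark to an arbitrary input at test time would activate the backdoor and steer the model's prediction towards the target class.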
Publication details
2023, Rousseau, J.J., Kapralos, B. (eds), Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges, Pages -16
Image Watermarking Backdoor Attacks in CNN-Based Classification Tasks (02a Book Chapter or Article)
Abbate Giovanbattista, Amerini Irene, Caldelli Roberto
ISBN: 978-3-031-37744-0; 978-3-031-37745-7
Research group: Computer Vision, Computer Graphics, Deep Learning