method: deepfakedetection (2024-06-01)

Authors: Chuangchuang Tan

Affiliation: Beijing Jiaotong University

Email: chuangchuangtan@aliyun.com

Description: deepfakedetection

method: Swin Transformer DCT (2023-09-04)

Authors: Davide Alessandro Coccomini, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro

Affiliation: ISTI-CNR

Email: davidealessandro.coccomini@isti.cnr.it

Description: We fine-tuned a Swin Transformer Base pre-trained on ImageNet on the provided training set. The training images underwent heavy random data augmentation (inspired by [1]) to push the model to generalize better. Since images generated by Diffusion Models are known to introduce characteristic noise, a model could overfit by learning to recognize that noise exclusively. To avoid this, the applied transformations include many noise-addition and compression techniques, also in combination, along with random rotations, brightness changes, crops, dropout, resizing, and many other manipulations to boost generalization.
During training, the images are also transformed into the DCT domain with a probability of 50%, since, as shown in [2], this should emphasize the artifacts.

In order to choose the best model, we also created a custom validation set composed of real images taken from the Flickr dataset and images generated by GANs (ProGAN, StyleGAN, StyleGAN2, and RelGAN) and by Diffusion Models (Stable Diffusion and GLIDE), inspired by "Detecting Images generated by Diffusers".
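The DCT-domain step described above could be sketched as follows. This is a minimal sketch under assumptions, not the submission's actual code: the function name `maybe_dct` and the log-magnitude normalization are illustrative choices; only the "transform to DCT with probability 50%" behavior comes from the description.

```python
# Hypothetical sketch of the random DCT-domain augmentation:
# with probability p, an HxWxC image is replaced by the log-magnitude
# of its per-channel 2-D DCT, which is said to emphasize generator artifacts.
import numpy as np
from scipy.fft import dctn

def maybe_dct(image: np.ndarray, p: float = 0.5, rng=None) -> np.ndarray:
    """With probability p, map an HxWxC image to its 2-D DCT log-spectrum."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return image  # leave the image in the pixel domain
    # DCT over the spatial axes only, one spectrum per channel
    coeffs = dctn(image.astype(np.float64), axes=(0, 1), norm="ortho")
    # log1p compresses the large dynamic range of DCT coefficients
    return np.log1p(np.abs(coeffs))
```

Applied as one transform in the augmentation pipeline, this yields batches that mix pixel-domain and frequency-domain views of the same data distribution.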

method: Swin Transformer + Swin Transformer DCT (2023-08-31)

Authors: Davide Alessandro Coccomini, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro

Description: We fine-tuned two Deep Learning models pre-trained on ImageNet, specifically two Swin Transformer Base models. The images underwent heavy random data augmentation during training (inspired by [1]) to push the models to generalize better. Since images generated by Diffusion Models are known to introduce characteristic noise, a model could overfit by learning to recognize that noise exclusively. To avoid this, the applied transformations include many noise-addition and compression techniques, also in combination, along with random rotations, brightness changes, crops, dropout, resizing, and many other manipulations to boost generalization.
During the training of one of the two Swin Transformers, the images are also transformed into the DCT domain with a probability of 50%, since, as shown in [1], this should emphasize the artifacts.
Both models make a prediction on each image in the test set, and the final prediction is the mean of the two predictions.

In order to choose the best model, we also created a custom validation set composed of real images taken from the Flickr dataset and images generated by GANs (ProGAN, StyleGAN, StyleGAN2, and RelGAN) and by Diffusion Models (Stable Diffusion and GLIDE), inspired by "Detecting Images generated by Diffusers".
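The ensemble rule described above (final prediction = mean of the two models' predictions) can be sketched as follows; `ensemble_predict` is a hypothetical name, and the assumption is that each model outputs a per-image probability of being fake.

```python
# Minimal sketch of the two-model ensemble: average the per-image
# fake probabilities produced by the two Swin Transformers.
import numpy as np

def ensemble_predict(probs_a, probs_b) -> np.ndarray:
    """Element-wise mean of two arrays of per-image probabilities."""
    return (np.asarray(probs_a, dtype=np.float64)
            + np.asarray(probs_b, dtype=np.float64)) / 2.0
```

A final label can then be obtained by thresholding the averaged score, e.g. at 0.5.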

Ranking Table

Date        Method                                   f1_score
----------  ---------------------------------------  -----------------
2024-06-01  deepfakedetection                        0.98926487283156
2023-09-04  Swin Transformer DCT                     0.97725668575014
2023-08-31  Swin Transformer + Swin Transformer DCT  0.97365746892832
2023-08-24  Swin Transformer                         0.97105355677956
2023-08-24  Swin Transformer + Resnet50 DCT          0.95234775873754
2023-08-22  Resnet50 + Swin Transformer              0.94966915523661
2023-09-28  CNN detection with Multi-modal           0.88971233544612
2023-09-08  Basic                                    0.80222598068634
2023-09-08  MiniVGG                                  0.8006292644557
2023-10-27  First Submission                         0.79736329918108
2023-09-02  Baseline                                 0.77303002356799
2023-09-10  Task1 testing submission                 0.68246036940662
2023-08-26  swin baseline                            0.20702247191011
2023-09-25  grag 2epoch                              0.13617305480316
2023-09-25  grag 3epoch                              0.063666215955186
2023-09-25  grag 5epoch                              0.059210526315789
2023-09-25  grag 4epoch                              0.035153797865662
2023-08-02  Random                                   0
2023-08-24  Random                                   0
2023-08-24  Random 01                                0
2023-08-24  Random 02                                0

Ranking Graphic