Interpretable DeepFake Detection Based on Frequency Spatial Transformer

Tao Luan; Guoqing Liang; Pengfei Pei

doi:10.62677/IJETAA.2402108

Authors

Tao Luan, The Institute of Software, Chinese Academy of Sciences, Beijing, China
Guoqing Liang, Taiyuan Coal Gasification (Group) Co., Ltd., No. 29 Heping South Road, Wanbailin District, Taiyuan City, China
Pengfei Pei, School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100085, China

Keywords:

Interpretable DeepFake Detection, Unsupervised Learning, Forgery Traces, Frequency-spatial Traces

Abstract

In recent years, the rapid development of DeepFake technology has garnered significant attention. Traditional DeepFake detection methods have achieved 100% accuracy on certain corresponding datasets; however, these methods lack interpretability. Existing methods for learning forgery traces often rely on supervised learning with pre-annotated data, which limits their ability in non-corresponding detection scenarios. To address this issue, we propose Find-X, an interpretable DeepFake detection approach based on unsupervised learning. The Find-X network consists of two components: a forgery trace generation network (FTG) and a forgery trace discrimination network (FTD). FTG extracts more general inconsistent forgery traces from the frequency and spatial domains. The extracted forgery traces are then fed into FTD, which classifies the input as real or fake. Using feedback from FTD, FTG can generate more effective forgery traces. Because inconsistent features are prevalent in DeepFake videos, our detection approach generalizes better to unknown forgeries. Extensive experiments show that our method outperforms state-of-the-art methods on popular benchmarks, and the visual forgery traces provide meaningful explanations for DeepFake detection.
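
The data flow described in the abstract (a trace generator feeding a discriminator) can be illustrated with a toy sketch. This is not the Find-X implementation: the Laplacian filter, the DFT energy ratio, the hand-set weights, and all function names are assumptions chosen for illustration, and the training feedback from FTD to FTG is not modeled.

```python
import cmath
import math


def spatial_trace(img):
    """Mean absolute 4-neighbour Laplacian response (border pixels count as 0)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total += abs(4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                         - img[y][x - 1] - img[y][x + 1])
    return total / (h * w)


def frequency_trace(img, cutoff=2.0):
    """Share of 2D DFT energy at frequencies whose radius exceeds `cutoff`."""
    h, w = len(img), len(img[0])
    high = total = 0.0
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y][x] * cmath.exp(
                        -2j * math.pi * (u * y / h + v * x / w))
            energy = abs(acc) ** 2
            total += energy
            # Fold indices so that u and h - u map to the same frequency.
            radius = math.hypot(min(u, h - u), min(v, w - v))
            if radius > cutoff:
                high += energy
    return high / total if total > 0 else 0.0


def generate_traces(img):
    """Toy stand-in for FTG: one spatial and one frequency feature."""
    return spatial_trace(img), frequency_trace(img)


def discriminate(traces, weights=(1.0, 4.0), bias=-2.0):
    """Toy stand-in for FTD: logistic score in (0, 1).

    The weights and bias are hand-set assumptions, not learned parameters.
    """
    z = bias + sum(w * t for w, t in zip(weights, traces))
    return 1.0 / (1.0 + math.exp(-z))


# A flat patch has no high-frequency content; a checkerboard is dominated by it.
flat = [[0.5] * 8 for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
flat_score = discriminate(generate_traces(flat))
checker_score = discriminate(generate_traces(checker))
```

In this toy setup the checkerboard patch scores higher than the flat patch simply because both hand-crafted features respond to high-frequency content. In the approach the abstract describes, the traces and the decision would instead be learned, with FTD's feedback shaping what FTG extracts.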
