This is an outdated version published on 2024-03-26. Read the most recent version.

Interpretable DeepFake Detection Based on Frequency Spatial Transformer

Authors

Tao Luan, The Institute of Software, Chinese Academy of Sciences, Beijing, China
Guoqing Liang, Taiyuan Coal Gasification (Group) Co., Ltd., No. 29 Heping South Road, Wanbailin District, Taiyuan City, China
Pengfei Pei, School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100085, China

DOI:

Keywords:

Interpretable DeepFake Detection, Unsupervised Learning, Forgery Traces, Frequency-spatial Traces

Abstract

In recent years, the rapid development of DeepFake technology has garnered significant attention. Traditional DeepFake detection methods have achieved 100% accuracy on certain corresponding datasets; however, these methods lack interpretability. Existing methods for learning forgery traces often rely on supervised learning with pre-annotated data, which limits their ability in non-corresponding detection scenarios. To address this issue, we propose Find-X, an interpretable DeepFake detection approach based on unsupervised learning. The Find-X network consists of two components: a forgery trace generation network (FTG) and a forgery trace discrimination network (FTD). FTG extracts more general inconsistent forgery traces from the frequency and spatial domains. The extracted traces are then fed into FTD, which classifies the input as real or fake. Using feedback from FTD, FTG can generate more effective forgery traces. Because inconsistent features are prevalent in DeepFake videos, our approach generalizes better to unknown forgeries. Extensive experiments show that our method outperforms state-of-the-art methods on popular benchmarks, and the visual forgery traces provide meaningful explanations for DeepFake detection.
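To make the idea of frequency-spatial traces more concrete, the following is a minimal sketch of two hand-crafted cues that such traces might build on: a high-frequency residual computed in the Fourier domain and a Sobel gradient magnitude computed in the spatial domain. This is an illustration only and is not the authors' FTG, which is a learned network whose details are not given in this abstract. The function names, the `cutoff` parameter, and the choice of stacking the two maps as channels are all assumptions made for the example.

```python
import numpy as np


def high_frequency_residual(image, cutoff=0.1):
    """Return the part of a 2-D grayscale image above a frequency cutoff.

    `cutoff` is a hypothetical parameter in cycles per pixel (max 0.5 per axis).
    """
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    # Frequency coordinates of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Zero out low frequencies (including the DC term at radius 0).
    mask = radius > cutoff
    return np.fft.ifft2(spectrum * mask).real


def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2-D grayscale image."""
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = image.shape
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(image, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Cross-correlate with the 3x3 kernels by summing shifted windows.
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)


def trace_map(image):
    """Stack the frequency and spatial cues as a 2-channel map (assumed layout)."""
    return np.stack([high_frequency_residual(image), sobel_magnitude(image)], axis=0)
```

For a grayscale face crop `face` of shape `(H, W)`, `trace_map(face)` returns an array of shape `(2, H, W)`. In a pipeline like the one described above, a discriminator would consume such a map, but here both cues are fixed filters rather than learned ones.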

Published

2024-03-26

Versions

2024-05-13 (2)
2024-03-26 (1)

Issue

Vol. 1 No. 2 (2024)

Section

Research Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

How to Cite

[1]

T. Luan, G. Liang, and P. Pei, “Interpretable DeepFake Detection Based on Frequency Spatial Transformer”, ijetaa, vol. 1, no. 2, pp. 19–25, Mar. 2024, doi: 10.62677/IJETAA.2402108.
