From January to October 2024, we tested deepfake detection across humans, open-source models, and commercial platforms. This report compiles data from a meta-analysis of 56 studies with 86,155 participants and from Deepfake-Eval-2024, the first benchmark built from real deepfakes circulating on social media in 2024.

Human Deepfake Detection Accuracy by Content Type: 2025

A 2025 meta-analysis assessed how well humans can distinguish real content from deepfakes across audio, images, text, and video. The table below presents average detection rates for fake and authentic material, along with overall accuracy by content type.

| Content Type | Deepfake Detection Rate | Real Content Detection Rate | Overall Accuracy |
|--------------|-------------------------|-----------------------------|------------------|
| Audio        | 62%                     | 70%                         | 63%              |
| Images       | 53%                     | 68%                         | 58%              |
| Text         | 52%                     | 66%                         | 58%              |
| Video        | 57%                     | 68%                         | 63%              |
| Combined     | 55%                     | 68%                         | 60%              |
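
Note that the Overall Accuracy column is not simply the midpoint of the two detection rates. A plausible reason is that accuracy is a prevalence-weighted average, so a stimulus pool with more fake than real items pulls the overall figure toward the lower fake-detection rate. The sketch below illustrates the arithmetic with a hypothetical class split; the meta-analysis does not report the actual balance.

```python
# Sketch: overall accuracy as a prevalence-weighted average of the
# per-class detection rates. The 85/15 fake/real split is hypothetical,
# chosen only to show how the reported audio figure could arise.

def overall_accuracy(fake_rate: float, real_rate: float, fake_share: float) -> float:
    """Each class contributes in proportion to its share of test items."""
    return fake_share * fake_rate + (1 - fake_share) * real_rate

# Audio row: 62% detection on fakes, 70% on real content.
print(f"{overall_accuracy(0.62, 0.70, 0.50):.1%}")  # 66.0% (even split)
print(f"{overall_accuracy(0.62, 0.70, 0.85):.1%}")  # 63.2% (fake-heavy pool)
```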

Key Findings:

  • Human deepfake detection accuracy stays close to chance (50%) across all content types, with overall accuracy never exceeding 63%

  • People identify real content more accurately (68.08%) than fake content (55.54%)

  • Video deepfakes are detected at 57.31%, slightly above the combined fake-content rate of 55.54%

  • Detection rates range from 52% (text) to 62% (audio)

Commercial Deepfake Detection System Accuracy: 2024

Deepfake-Eval-2024 tested commercial systems against 45 hours of video, 56.5 hours of audio, and 1,975 images from 88 websites in 52 languages. The table presents performance metrics, including accuracy, AUC, recall, precision, and F1 score, for leading systems across each media type.

Deepfake Detection Performance Metrics Explained

Understanding accuracy measurements for deepfake detection systems:

  • Accuracy: Percentage of correct predictions (real + fake)

  • AUC (Area Under Curve): Measures ability to separate real from fake (0.5 = random, 1.0 = perfect)

  • Recall (Sensitivity): Percentage of deepfakes correctly identified

  • Precision: Percentage of flagged content that is actually fake

  • F1 Score: Balance between precision and recall (harmonic mean)
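
To make these definitions concrete, here is a minimal sketch that computes them from raw confusion-matrix counts. The counts are invented for illustration (chosen so the results roughly mirror the best commercial video row in the table below); AUC is omitted because it requires ranked model scores rather than counts alone.

```python
# Minimal sketch: deepfake detection metrics from confusion-matrix counts.
# Counts are hypothetical, not actual Deepfake-Eval-2024 outputs.

tp = 71  # deepfakes correctly flagged (true positives)
fn = 29  # deepfakes missed (false negatives)
fp = 21  # real items wrongly flagged (false positives)
tn = 79  # real items correctly passed (true negatives)

accuracy = (tp + tn) / (tp + tn + fp + fn)          # correct predictions, real + fake
recall = tp / (tp + fn)                             # share of deepfakes caught (sensitivity)
precision = tp / (tp + fp)                          # share of flags that are truly fake
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} f1={f1:.2f}")
# accuracy=0.75 recall=0.71 precision=0.77 f1=0.74
```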

What These Numbers Mean:

  • 50% accuracy = random guessing

  • 70-80% accuracy = basic detection capability

  • 85%+ accuracy = strong detection performance

  • 90%+ accuracy = expert-level performance (rarely achieved)

| System Type     | Content | Accuracy | AUC  | Recall | Precision | F1 Score |
|-----------------|---------|----------|------|--------|-----------|----------|
| Best Commercial | Video   | 78%      | 0.79 | 71%    | 77%       | 0.74     |
| Best Commercial | Audio   | 89%      | 0.93 | 84%    | 89%       | 0.87     |
| Best Commercial | Image   | 82%      | 0.90 | 71%    | 99%       | 0.83     |
| Compass Vision  | Image   | 87%      | 0.93 | 83%    | 94%       | 0.89     |
| Open-Source Avg | Video   | 60%      | 0.63 | 50%    | 60%       | 0.54     |
| Open-Source Avg | Audio   | 42%      | 0.53 | 51%    | 31%       | 0.39     |
| Open-Source Avg | Image   | 63%      | 0.56 | 99%    | 63%       | 0.77     |

Performance Summary:

  • Commercial systems outperform open-source models by roughly 20-47 percentage points, depending on media type

  • Audio detection achieves the highest accuracy at 89%

  • Best image detector (Compass Vision) reaches 86.7% accuracy

  • Video detection remains most challenging at 78% accuracy

  • Open-source models perform below chance for audio (42% accuracy)

Deepfake Detection Accuracy Drop, Lab vs. Real-World: 2024

A 2024 evaluation revealed that open-source detection models, despite strong laboratory performance, struggle significantly on modern real-world deepfakes. The table summarizes the gap by comparing lab AUC scores to real-world AUC scores for leading models and media types, along with the corresponding accuracy drops.

| Model            | Content Type | Lab Dataset AUC | Real-World AUC | Accuracy Drop |
|------------------|--------------|-----------------|----------------|---------------|
| GenConViT        | Video        | 0.96            | 0.63           | 34%           |
| AASIST           | Audio        | 1.00            | 0.43           | 57%           |
| NPR              | Image        | 0.98            | 0.53           | 46%           |
| Video Models Avg | Video        | 0.93            | 0.51           | 50%           |
| Audio Models Avg | Audio        | 0.99            | 0.51           | 48%           |
| Image Models Avg | Image        | 0.97            | 0.52           | 45%           |
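
The Accuracy Drop column appears to be the relative decline in AUC between the two settings. The sketch below reproduces the single-model rows under that assumption; the averaged rows round slightly differently, so treat the formula as a plausible reading rather than confirmed methodology.

```python
# Sketch: "Accuracy Drop" read as the relative AUC decline from lab to
# real-world data. This interpretation is an assumption, not confirmed.

def relative_drop(lab_auc: float, real_auc: float) -> float:
    return (lab_auc - real_auc) / lab_auc

print(f"GenConViT: {relative_drop(0.96, 0.63):.0%}")  # 34%
print(f"AASIST:    {relative_drop(1.00, 0.43):.0%}")  # 57%
print(f"NPR:       {relative_drop(0.98, 0.53):.0%}")  # 46%
```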

Why Detection Fails:

  • Models trained on 2018-2022 data cannot detect 2024 deepfakes

  • Academic datasets use outdated GAN-based face swaps

  • Modern deepfakes use diffusion models and commercial tools

  • Average performance drops 45-50% in real-world conditions

  • Video models show the largest average drop at 50%; the steepest single-model decline is AASIST (audio) at 57%

Detection Improvement Methods: 2024

Studies tested four strategies to improve human deepfake detection accuracy. This table shows how different training and support methods improved accuracy, from feedback-based training to AI-assisted tools, along with their measured gains and study counts.

| Strategy               | Before Training | After Training | Accuracy Gain | Studies |
|------------------------|-----------------|----------------|---------------|---------|
| Feedback Training      | 45-60%          | 65-88%         | +20-28%       | 6       |
| AI Support Tools       | 56-60%          | 61-82%         | +5-22%        | 2       |
| Artifact Amplification | 51-55%          | 93-95%         | +40-42%       | 2       |
| Awareness Education    | 57-60%          | 61-67%         | +4-7%         | 3       |
| All Methods            | 50-55%          | 65.14%         | +10-15%       | 15      |

Most Effective Methods:

  1. Artifact amplification (caricaturization): 93-95% accuracy

  2. Feedback training: 65-88% accuracy with consistent results

  3. AI decision support: 61-82% accuracy, varies by tool quality

  4. Awareness education: 61-67% accuracy, modest improvement

Hardest Deepfakes to Detect: 2024

Certain deepfake types are disproportionately difficult for detection systems. The table below shows which types caused the largest accuracy drops in 2024.

| Deepfake Type           | Accuracy Drop | Why It's Harder                       |
|-------------------------|---------------|---------------------------------------|
| Mixed Real/Fake Faces   | -31%          | Only some faces are manipulated       |
| Diffusion Video         | -21%          | Generated by models like Sora         |
| Audio with Music        | -18%          | Music masks synthetic voice artifacts |
| Modified Bodies/Objects | -17%          | Models trained only on faces          |
| Text Overlay Images     | -9%           | Training data lacked text elements    |
| Non-English Audio       | -7%           | Models trained primarily on English   |

Detection Blind Spots:

  • Selective manipulation (31% accuracy drop) is the hardest to detect

  • Background audio confuses voice detection systems

  • Models fail on non-facial body modifications

  • Non-English content reduces accuracy across all systems

  • Diffusion-generated content degrades performance significantly

Learn More

You can learn more about Ceartas here, and reach us through our integrated chat service if you have any questions.

Sources

  1. Diel, A., et al. "Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers." Computers in Human Behavior Reports, December 2024.

  2. Focus Digital Research Study. Focus Digital, Greensboro, NC, November 2025.

