From January to October 2024, we tested deepfake detection across humans, open-source models, and commercial platforms. This report compiles data from 56 studies with 86,155 participants, together with the Deepfake-Eval-2024 benchmark, the first dataset built from real deepfakes circulating on social media in 2024.
Human Deepfake Detection Accuracy by Content Type: 2024
A December 2024 meta-analysis assessed how well humans can distinguish real content from deepfakes across audio, images, text, and video. The table below presents average detection rates for fake and authentic material, along with overall accuracy by content type.
| Content Type | Deepfake Detection Rate | Real Content Detection Rate | Overall Accuracy |
|---|---|---|---|
| Audio | 62% | 70% | 63% |
| Images | 53% | 68% | 58% |
| Text | 52% | 66% | 58% |
| Video | 57% | 68% | 63% |
| Combined | 55% | 68% | 60% |
Key Findings:
Human deepfake detection accuracy remains near chance levels (50%) across all content types
People identify real content more accurately (68.08%) than fake content (55.54%)
Video deepfakes show slightly higher detection rates at 57.31%
Detection rates range from 52% (text) to 62% (audio)
Commercial Deepfake Detection System Accuracy: 2024
Deepfake-Eval-2024 tested commercial systems against 45 hours of video, 56.5 hours of audio, and 1,975 images from 88 websites in 52 languages. The table presents performance metrics, including accuracy, AUC, recall, precision, and F1 score, for leading systems across each media type.
Deepfake Detection Performance Metrics Explained
Understanding accuracy measurements for deepfake detection systems:
Accuracy: Percentage of correct predictions (real + fake)
AUC (Area Under Curve): Measures ability to separate real from fake (0.5 = random, 1.0 = perfect)
Recall (Sensitivity): Percentage of deepfakes correctly identified
Precision: Percentage of flagged content that is actually fake
F1 Score: Balance between precision and recall (harmonic mean)
What These Numbers Mean:
50% accuracy = random guessing
70-80% accuracy = basic detection capability
85%+ accuracy = strong detection performance
90%+ accuracy = expert-level performance (rarely achieved)
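As a concrete illustration, the four threshold-based metrics above can be computed from a confusion matrix in a few lines of Python. The counts below are hypothetical, chosen to loosely resemble the best commercial video detector; AUC is omitted because it requires per-item scores rather than counts.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute detection metrics from a confusion matrix.

    tp: deepfakes correctly flagged   fn: deepfakes missed
    tn: real items correctly passed   fp: real items wrongly flagged
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # correct predictions, real + fake
    precision = tp / (tp + fp)                   # flagged content that is actually fake
    recall = tp / (tp + fn)                      # deepfakes correctly identified
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical run over 100 deepfakes and 100 real items
acc, prec, rec, f1 = detection_metrics(tp=71, fp=21, tn=79, fn=29)
print(f"accuracy={acc:.0%} precision={prec:.0%} recall={rec:.0%} F1={f1:.2f}")
# → accuracy=75% precision=77% recall=71% F1=0.74
```

Note that with an imbalanced stream (far more real content than fakes, as on social media), precision degrades even at fixed recall, which is why the table reports both.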
| System Type | Content | Accuracy | AUC | Recall | Precision | F1 Score |
|---|---|---|---|---|---|---|
| Best Commercial | Video | 78% | 0.79 | 71% | 77% | 0.74 |
| Best Commercial | Audio | 89% | 0.93 | 84% | 89% | 0.87 |
| Best Commercial | Image | 82% | 0.90 | 71% | 99% | 0.83 |
| Compass Vision | Image | 87% | 0.93 | 83% | 94% | 0.89 |
| Open-Source Avg | Video | 60% | 0.63 | 50% | 60% | 0.54 |
| Open-Source Avg | Audio | 42% | 0.53 | 51% | 31% | 0.39 |
| Open-Source Avg | Image | 63% | 0.56 | 99% | 63% | 0.77 |
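The F1 column follows directly from the precision and recall columns. A one-line sanity check for the best commercial video detector (other rows can differ by a rounding step, since the table values are themselves rounded):

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Best commercial video detector: precision 77%, recall 71%
print(round(f1_score(0.77, 0.71), 2))  # → 0.74
```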
Performance Summary:
Commercial systems outperform open-source models by 20-47 percentage points
Audio detection achieves the highest accuracy at 89%
Best image detector (Compass Vision) reaches 86.7% accuracy
Video detection remains most challenging at 78% accuracy
Open-source models perform near chance levels for audio (42%)
Deepfake Detection Accuracy Drop: Lab vs. Real-World: 2024
A 2024 evaluation revealed that open-source detection models, despite strong laboratory performance, struggle significantly on modern real-world deepfakes. The table summarizes the gap by comparing lab AUC scores to real-world AUC scores for leading models and media types, along with the corresponding accuracy drops.
| Model | Content Type | Lab Dataset AUC | Real-World AUC | Accuracy Drop |
|---|---|---|---|---|
| GenConViT | Video | 0.96 | 0.63 | 34% |
| AASIST | Audio | 1.00 | 0.43 | 57% |
| NPR | Image | 0.98 | 0.53 | 46% |
| Video Models Avg | Video | 0.93 | 0.51 | 50% |
| Audio Models Avg | Audio | 0.99 | 0.51 | 48% |
| Image Models Avg | Image | 0.97 | 0.52 | 45% |
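For the three named models, the reported drop is consistent with the fall in AUC relative to the lab score (the averaged rows appear to use a slightly different basis). A quick sketch:

```python
def relative_drop(lab_auc, real_auc):
    # Fall in AUC as a percentage of the lab score, rounded to whole points
    return round(100 * (lab_auc - real_auc) / lab_auc)

# Named models from the table above: (lab AUC, real-world AUC)
models = {"GenConViT": (0.96, 0.63), "AASIST": (1.00, 0.43), "NPR": (0.98, 0.53)}
drops = {name: relative_drop(lab, real) for name, (lab, real) in models.items()}
print(drops)  # → {'GenConViT': 34, 'AASIST': 57, 'NPR': 46}
```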
Why Detection Fails:
Models trained on 2018-2022 data fail to generalize to 2024 deepfakes
Academic datasets use outdated GAN-based face swaps
Modern deepfakes use diffusion models and commercial tools
Average performance drops 45-50% in real-world conditions
Audio models show the starkest collapse, falling from near-perfect lab AUC (0.99) to chance level (0.51)
Detection Improvement Methods: 2024
Studies tested four strategies to improve human deepfake detection accuracy. This table shows how different training and support methods improved accuracy, from feedback-based training to AI-assisted tools, along with their measured gains and study counts.
| Strategy | Before Training | After Training | Accuracy Gain | Studies |
|---|---|---|---|---|
| Feedback Training | 45-60% | 65-88% | +20-28% | 6 |
| AI Support Tools | 56-60% | 61-82% | +5-22% | 2 |
| Artifact Amplification | 51-55% | 93-95% | +40-42% | 2 |
| Awareness Education | 57-60% | 61-67% | +4-7% | 3 |
| All Methods | 50-55% | 65.14% | +10-15% | 15 |
Most Effective Methods:
Artifact amplification (caricaturization): 93-95% accuracy
Feedback training: 65-88% accuracy with consistent results
AI decision support: 61-82% accuracy, varies by tool quality
Awareness education: 61-67% accuracy, modest improvement
Hardest Deepfakes to Detect: 2024
Certain deepfake types cause significant accuracy drops for detection systems. The table below outlines which deepfake types caused the largest performance drops for detection systems in 2024.
| Deepfake Type | Accuracy Drop | Why It's Harder |
|---|---|---|
| Mixed Real/Fake Faces | -31% | Only some faces are manipulated |
| Diffusion Video | -21% | Generated by models like Sora |
| Audio with Music | -18% | Music masks synthetic voice artifacts |
| Modified Bodies/Objects | -17% | Models trained only on faces |
| Text Overlay Images | -9% | Training data lacked text elements |
| Non-English Audio | -7% | Models trained primarily on English |
Detection Blind Spots:
Selective manipulation (31% accuracy drop) is the hardest to detect
Background audio confuses voice detection systems
Models fail on non-facial body modifications
Non-English content reduces accuracy across all systems
Diffusion-generated content degrades performance significantly
Learn More
You can learn more about Ceartas here, and contact us through our integrated chat service if you have any questions.
Sources
Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers: Diel, A., et al., Computers in Human Behavior Reports, December 2024
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark: Chandra, N.A., et al., TrueMedia.org, May 2025
Compass Vision Achieves Best Performance in Deepfake Detection: Blackbird.AI Research Team, May 2025
Focus Digital Research Study: Focus Digital, Greensboro, NC, November 2025

