Video Detection

Learn how Reality Defender detects manipulated or AI-generated videos.

Written by Emily Essig
Updated this week

Understand how deepfakes are made (face swaps vs facial reenactment), how our detection models analyze video frames, what factors affect accuracy (poses, blur, distance), and how localization highlights suspect segments.


How RD Detects Video Deepfakes

There are two primary types of video deepfakes:

  • Face Swaps:
    Replace the target’s face with another person’s (source) face — fully changing the target’s identity. These are the most common form of deepfakes.

  • Facial Reenactment:
    Transfer the expressions of one person (source) onto another (target) without changing the target’s identity — effectively “puppeteering” the target’s face.

Both techniques rely on large amounts of video footage of the subjects involved. Using a neural network called an autoencoder, the system learns a shared internal representation of the two faces. This shared representation lets the model reconstruct the target’s face from each encoded source frame, frame by frame, resulting in a convincing fake.
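The shared-representation idea can be sketched in a few lines. This is a toy illustration with random, untrained weights and made-up dimensions, not Reality Defender’s model; real face-swap systems use deep convolutional encoders and decoders trained on thousands of frames:

```python
import numpy as np

# Toy shared-encoder / two-decoder setup with random, UNTRAINED weights.
rng = np.random.default_rng(0)
LATENT, FACE_DIM = 8, 64  # FACE_DIM stands in for a flattened face crop

W_enc = rng.normal(scale=0.1, size=(LATENT, FACE_DIM))         # shared encoder
W_dec_source = rng.normal(scale=0.1, size=(FACE_DIM, LATENT))  # decoder for identity A
W_dec_target = rng.normal(scale=0.1, size=(FACE_DIM, LATENT))  # decoder for identity B

def encode(face):
    # Shared latent representation, learned from footage of both identities.
    return np.tanh(W_enc @ face)

def decode(latent, W_dec):
    return W_dec @ latent

# The swap: encode a SOURCE frame, then decode with the TARGET's decoder,
# yielding the target's identity with the source's expression and pose.
source_frame = rng.normal(size=FACE_DIM)
swapped_frame = decode(encode(source_frame), W_dec_target)
print(swapped_frame.shape)  # (64,)
```

Because the encoder is shared while each identity has its own decoder, routing one person’s encoded frames through the other person’s decoder is what produces the swap.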

Reality Defender’s platform runs two independent video detection models, both based on deep neural networks that analyze detected faces across frames.

  1. Pattern-Based Detector

    • Examines fine-grained statistical patterns and textures in facial regions.

    • Learns from a large dataset of known real and fake videos.

    • Identifies subtle artifacts left by face synthesis models (e.g., inconsistent skin texture, lighting, or facial edges).

  2. Self-Supervised Detector

    • Trained on a massive dataset of real videos that are artificially perturbed.

    • Random transformations (rotations, resizing, geometric noise) simulate fake-like distortions.

    • Produces a more generalizable, data-efficient detector that doesn’t rely on known deepfake datasets.
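The perturbation step above can be sketched as follows. The specific transformations and parameters are illustrative stand-ins; the production pipeline’s exact augmentations are not public:

```python
import numpy as np

rng = np.random.default_rng(1)

def fake_like_perturbation(frame):
    """Turn a real frame into a pseudo-fake training sample.

    Illustrative stand-ins for the rotations, resizing, and geometric
    noise described above; real pipelines use richer distortions.
    """
    h, w = frame.shape
    out = frame.copy()
    # Geometric noise on a random rectangular region, mimicking the
    # blending-boundary artifacts that face synthesis leaves behind.
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    out[y:y + h // 2, x:x + w // 2] += rng.normal(scale=0.05, size=(h // 2, w // 2))
    # Random rotation as a crude pose perturbation.
    if rng.random() < 0.5:
        out = np.rot90(out)
    return out

real = rng.random((32, 32))
pseudo_fake = fake_like_perturbation(real)
# The detector is then trained to separate (real, label 0) from
# (pseudo_fake, label 1) without ever seeing an actual deepfake.
```

Because the labels come from the perturbation itself, the detector needs no curated deepfake dataset, which is what makes this approach generalize to manipulation methods it has never seen.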

Together, these models yield robust detection performance across diverse video sources and manipulation methods.
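As a rough illustration of how two independent scores might be combined, consider the averaging below. The scores, threshold, and fusion rule are all hypothetical; Reality Defender’s actual combination logic is not documented here:

```python
# Hypothetical per-file scores from the two detectors (0 = real, 1 = fake).
pattern_score = 0.82
self_supervised_score = 0.67

# One plausible fusion rule: average the independent scores.
# (The platform's real aggregation may weight or gate them differently.)
combined = (pattern_score + self_supervised_score) / 2
label = "manipulated" if combined >= 0.5 else "authentic"
print(round(combined, 3), label)  # 0.745 manipulated
```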


YouTube Link Scans: Resolution Notes (360p vs 1080p)

When a YouTube link is uploaded to the Reality Defender web interface, the system downloads and analyzes a 360p version of the video by default. This ensures fast, consistent inference across videos of varying length and bandwidth.

Although higher-resolution (1080p+) streams are available, our current detectors are not significantly affected by this lower resolution. Support for full 1080p+ scanning is planned in upcoming platform updates for enterprise customers requiring maximum fidelity.


Video Result Localization: What You’ll See

Reality Defender supports segment-level localization for video results.

  • Each detected face is analyzed across sequential frame segments.

  • Segments flagged as “fake” or “real” are aggregated to form an overall file classification.

  • In the web app or API response, users will see visual markers showing which portions of the video were most likely manipulated.

This allows investigators, moderators, or security analysts to pinpoint where manipulations occur — not just whether a file is fake.
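A minimal sketch of the segment-level aggregation described above, assuming a per-segment fake-probability score, a 0.5 decision boundary, and an any-segment-flagged policy (all three are hypothetical, not the platform’s published rule):

```python
# Hypothetical per-segment fake-probability scores for one detected face,
# one score per fixed-length run of frames.
segment_scores = [0.04, 0.07, 0.91, 0.88, 0.12]
THRESHOLD = 0.5  # assumed decision boundary

flags = ["fake" if s >= THRESHOLD else "real" for s in segment_scores]
# One plausible aggregation policy: flag the file if any segment is fake.
file_verdict = "fake" if "fake" in flags else "real"
suspect_segments = [i for i, f in enumerate(flags) if f == "fake"]
print(file_verdict, suspect_segments)  # fake [2, 3]
```

The `suspect_segments` indices correspond to the visual markers shown in the web app or API response, so an analyst can jump straight to the flagged portions of the video.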


Related FAQs

  • How are video deepfakes made?
    Using face swaps or facial reenactment driven by autoencoders trained on large sets of source/target faces.

  • What affects accuracy?
    Pose diversity, blur, lighting, and face size within the frame can all impact detection confidence.

  • Does the model use audio?
    No. Video and audio detectors run independently, so a mismatched voiceover won’t trigger a false positive.

  • Does the system provide localization?
    Yes. You’ll see segment-level results highlighting suspect regions.

  • Are higher resolutions supported?
    Scanning defaults to 360p for performance reasons; 1080p+ scanning is on the roadmap.
