Reality Defender's video detection analyzes content using multiple specialized models to identify signs of manipulation. Each video receives a single overall judgement and score, produced by combining signals from all models that ran on that file.
What's included in video analysis
Video detection includes signals from three models, combined by an ensemble:
rd-dynamics-vid (Dynamics) — analyzes temporal inconsistencies and motion dynamics across the video
rd-erie-vid (Universal) — analyzes facial and motion-based manipulation signals across frames
rd-tahoe-vid (Guided) — applies a face representation learning approach to detect manipulation artifacts
All three models contribute to a single rd-vid-ensemble score, which represents the combined assessment of manipulation likelihood for the file.
What you'll see in the API response
Each model returns its own result in the models[] array, including a status, finalScore, and predictionNumber. The top-level resultsSummary contains the overall verdict and aggregated score.
Models that are not applicable to a given media type (for example, image or audio models on a video file) will return "status": "NOT_APPLICABLE" with null score fields. This is expected behavior.
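As a minimal sketch of reading such a response, the Python below separates the models that actually ran on a file from those marked NOT_APPLICABLE. The field names models, status, finalScore, predictionNumber, and resultsSummary come from the description above. Everything else is an assumption made for illustration: the "name" key on each model entry, the status values "MANIPULATED" and "AUTHENTIC", the score values, and the placeholder model "example-image-model".

```python
from typing import Any

# Hypothetical payload for illustration only. The "name" key, the status
# values other than NOT_APPLICABLE, and all scores are assumptions.
sample_response: dict[str, Any] = {
    "resultsSummary": {"status": "MANIPULATED"},
    "models": [
        {"name": "rd-dynamics-vid", "status": "MANIPULATED",
         "finalScore": 88, "predictionNumber": 0.88},
        {"name": "rd-erie-vid", "status": "MANIPULATED",
         "finalScore": 79, "predictionNumber": 0.79},
        {"name": "rd-tahoe-vid", "status": "AUTHENTIC",
         "finalScore": 22, "predictionNumber": 0.22},
        # Placeholder for a model that does not apply to video files.
        {"name": "example-image-model", "status": "NOT_APPLICABLE",
         "finalScore": None, "predictionNumber": None},
    ],
}


def split_models(response: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Return (applicable, not_applicable) model entries from a response."""
    applicable: list[dict] = []
    not_applicable: list[dict] = []
    for model in response.get("models", []):
        if model.get("status") == "NOT_APPLICABLE":
            not_applicable.append(model)
        else:
            applicable.append(model)
    return applicable, not_applicable


if __name__ == "__main__":
    ran, skipped = split_models(sample_response)
    for model in ran:
        print(model["name"], model["status"], model["finalScore"])
    print("skipped:", [m["name"] for m in skipped])
```

Filtering on status rather than on a null finalScore keeps the check tied to the documented marker for models that do not apply to the media type.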
How to interpret video model results
Individual model scores reflect each model's independent assessment. Because each model is trained to detect different features of manipulation, it is normal for them to diverge. The ensemble accounts for this, weighting each model's output to produce a single calibrated score.
You may see cases where:
All three models return high scores: manipulation signals are consistent across detection approaches
One or two models trigger while the others do not: manipulation may be concentrated in the features those specific models are trained to detect
All models return low scores individually but the ensemble score is elevated: the combined weight of evidence across models still indicates a meaningful signal
For most use cases, the overall video judgement and score in resultsSummary should be used as the primary decision signal. Individual model results are available for debugging, auditing, or deeper investigation.
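One way to apply this guidance is sketched below: the decision is taken from resultsSummary alone, while per-model scores are collected only as audit detail. This is an illustration under stated assumptions, not the product's own logic. The "name" key, the nesting of the aggregated score under resultsSummary["metadata"]["finalScore"], the 0 to 100 score scale, the status values, and the per-model threshold of 50 are all hypothetical.

```python
from typing import Any, Optional


def summarize_video_result(response: dict[str, Any],
                           model_threshold: float = 50.0) -> dict[str, Any]:
    """Use resultsSummary as the decision signal; keep model detail for audits.

    model_threshold is a hypothetical cut-off used only to describe which
    individual models look elevated. It does not affect the verdict.
    """
    summary = response.get("resultsSummary") or {}
    # Assumed location of the aggregated score inside resultsSummary.
    overall_score: Optional[float] = (summary.get("metadata") or {}).get("finalScore")

    # Collect scores only from models that returned one (NOT_APPLICABLE
    # models carry null score fields and are skipped).
    scored = {
        m.get("name"): m["finalScore"]
        for m in response.get("models", [])
        if m.get("finalScore") is not None
    }
    flagged = sorted(n for n, s in scored.items() if s >= model_threshold)

    return {
        "verdict": summary.get("status"),      # primary decision signal
        "score": overall_score,
        "flagged_models": flagged,             # audit detail only
        "models_agree": len(flagged) in (0, len(scored)),
    }


# Hypothetical response in which only one model triggers.
example = {
    "resultsSummary": {"status": "MANIPULATED", "metadata": {"finalScore": 74}},
    "models": [
        {"name": "rd-dynamics-vid", "status": "MANIPULATED", "finalScore": 82},
        {"name": "rd-erie-vid", "status": "AUTHENTIC", "finalScore": 31},
        {"name": "rd-tahoe-vid", "status": "AUTHENTIC", "finalScore": 27},
        {"name": "example-audio-model", "status": "NOT_APPLICABLE", "finalScore": None},
    ],
}

if __name__ == "__main__":
    print(summarize_video_result(example))
```

The function never derives its own verdict from the individual scores, which mirrors the recommendation to treat model-level results as supporting evidence rather than as the decision.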
Context-aware analysis
Video results may also include a signal from rd-context-vid, which applies contextual analysis to the video beyond frame-level detection. When present, this signal is factored into the overall result. When not applicable, it will appear with "status": "NOT_APPLICABLE".
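A small sketch of checking whether the context signal took part in a given result follows. It assumes each entry in models[] identifies itself through a "name" key, which is not confirmed by the text above, and the sample entries and the "AUTHENTIC" status value are hypothetical.

```python
from typing import Any, Optional

CONTEXT_MODEL = "rd-context-vid"


def get_context_signal(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the rd-context-vid entry if it ran, otherwise None.

    None covers both cases: the entry is absent from models[], or it is
    present with status NOT_APPLICABLE.
    """
    for model in response.get("models", []):
        if model.get("name") != CONTEXT_MODEL:  # "name" key is an assumption
            continue
        if model.get("status") == "NOT_APPLICABLE":
            return None
        return model
    return None


# Hypothetical entries for illustration.
with_context = {"models": [
    {"name": "rd-context-vid", "status": "AUTHENTIC", "finalScore": 12},
]}
without_context = {"models": [
    {"name": "rd-context-vid", "status": "NOT_APPLICABLE", "finalScore": None},
]}

if __name__ == "__main__":
    print(get_context_signal(with_context) is not None)
    print(get_context_signal(without_context) is not None)
```

Because the context signal is already factored into the overall result when present, a check like this is mainly useful for auditing which signals contributed.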