Reality Defender's image detection analyzes content using multiple specialized models to identify signs of manipulation. Each image receives a single overall judgement and score, produced by combining signals from all models that ran on that file.
What's included in image analysis
Image detection includes signals from two categories of models, combined into a single result:
Face-focused models, which analyze detected face regions for signs of manipulation
Full-frame models, which analyze the entire image beyond the face region
Face-focused model breakdown:
rd-cedar-img (GAN) — detects artifacts typical of GAN-generated imagery
rd-elm-img (Diffusion) — detects artifacts and statistical patterns commonly associated with diffusion-generated images
rd-oak-img (FaceSwap) — detects facial manipulation consistent with face replacement / identity swap techniques
rd-pine-img (Universal) — general-purpose manipulation detector trained to catch broad, cross-technique signals (not limited to a single generation method)
Each face-focused model has a corresponding full-frame version, and the two run in parallel:
| Face-focused model | Full-frame model |
| --- | --- |
| rd-cedar-img | rd-full-cedar-img |
| rd-elm-img | rd-full-elm-img |
| rd-oak-img | rd-full-oak-img |
| rd-pine-img | rd-full-pine-img |
All models contribute to a single rd-img-ensemble score, which represents the combined assessment of manipulation likelihood for the image.
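The naming pattern in the table can be captured in a few lines. The following is a minimal sketch that only encodes the pairing listed above. The helper names `is_full_frame` and `full_frame_counterpart` are hypothetical and not part of any Reality Defender SDK.

```python
# The four face-focused model names listed in the breakdown above.
FACE_MODELS = ["rd-cedar-img", "rd-elm-img", "rd-oak-img", "rd-pine-img"]

FACE_PREFIX = "rd-"
FULL_PREFIX = "rd-full-"


def is_full_frame(model_name: str) -> bool:
    """True when the name follows the rd-full-* pattern from the table."""
    return model_name.startswith(FULL_PREFIX)


def full_frame_counterpart(face_model: str) -> str:
    """Map a listed face-focused model name to its full-frame name."""
    # Only the four documented names are accepted; other rd-* names
    # (for example rd-img-ensemble) have no documented counterpart.
    if face_model not in FACE_MODELS:
        raise ValueError(f"not a listed face-focused model: {face_model}")
    return FULL_PREFIX + face_model[len(FACE_PREFIX):]


PAIRS = {name: full_frame_counterpart(name) for name in FACE_MODELS}
```

A lookup such as `PAIRS["rd-elm-img"]` then gives the full-frame name to look for when comparing the two signal types for the same technique.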
What you'll see in the API response
Each model returns its own result in the models[] array, including a status, finalScore, and predictionNumber. Full-frame models are identifiable by "full" in the model name (for example, rd-full-cedar-img). The top-level resultsSummary contains the overall verdict and aggregated score.
The response also includes a heatmaps{} object with entries for both face-focused and full-frame models, showing where in the image manipulation signals were detected.
Models that are not applicable to a given media type (for example, video or audio models on an image file) will return "status": "NOT_APPLICABLE" with null score fields. This is expected behavior.
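As an illustration of how a client might sort the models[] entries, here is a minimal sketch. It assumes each entry carries its model name under a key called `name`, which the text above does not confirm. The sample payload, its scores, the `EXAMPLE_STATUS` value, and the `rd-example-vid` model name are all made up for illustration. Only `NOT_APPLICABLE`, `status`, and `finalScore` come from the description above.

```python
from typing import Any

# The four face-focused model names listed earlier in this document.
FACE_MODELS = {"rd-cedar-img", "rd-elm-img", "rd-oak-img", "rd-pine-img"}


def group_models(response: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Sort models[] entries into face, full-frame, other, and not-applicable."""
    groups: dict[str, list[dict[str, Any]]] = {
        "face": [], "full_frame": [], "other": [], "not_applicable": [],
    }
    for model in response.get("models", []):
        name = model.get("name", "")  # ASSUMPTION: the name key is "name"
        if model.get("status") == "NOT_APPLICABLE":
            # Expected for models built for another media type; scores are null.
            groups["not_applicable"].append(model)
        elif name in FACE_MODELS:
            groups["face"].append(model)
        elif "-full-" in name:
            groups["full_frame"].append(model)
        else:
            # Anything else, such as an ensemble entry if one is present.
            groups["other"].append(model)
    return groups


# Hypothetical payload; every value below is a placeholder, not real output.
sample = {
    "models": [
        {"name": "rd-oak-img", "status": "EXAMPLE_STATUS", "finalScore": 91.0},
        {"name": "rd-full-oak-img", "status": "EXAMPLE_STATUS", "finalScore": 12.0},
        {"name": "rd-example-vid", "status": "NOT_APPLICABLE", "finalScore": None},
    ]
}

groups = group_models(sample)
```

Separating the not-applicable entries first keeps their null scores out of any later comparison or averaging.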
How to interpret face-focused vs. full-frame signals
Face-focused models detect manipulation within facial regions. They're effective for face swaps, facial reenactment, and localized facial artifacts.
Full-frame models detect manipulation elsewhere in the image — in backgrounds, boundaries, lighting, composition, or global image artifacts. They're also more effective when faces are small, partially visible, or not the primary manipulation target.
Both signal types contribute to the final result. You may see cases where:
Both face-focused and full-frame models trigger — manipulation affects facial and non-facial regions
Only face-focused models trigger — manipulation is likely concentrated within the face
Only full-frame models trigger — manipulation may exist outside the face region or involve broader image synthesis
All of these scenarios are expected and reflect the full scope of detection coverage.
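The three scenarios can be expressed as a small classification step, sketched below. What counts as a model "triggering" is not defined in this document, so the sketch assumes a caller-supplied `threshold` compared against each model's finalScore. The function name, the threshold, and the score values in the usage lines are hypothetical.

```python
from typing import Iterable, Optional


def describe_signals(
    face_scores: Iterable[Optional[float]],
    full_frame_scores: Iterable[Optional[float]],
    threshold: float,
) -> str:
    """Name which model category, if any, exceeded the chosen threshold."""

    def triggered(scores: Iterable[Optional[float]]) -> bool:
        # Null scores (for example from NOT_APPLICABLE models) never trigger.
        return any(s is not None and s >= threshold for s in scores)

    face_hit = triggered(face_scores)
    full_hit = triggered(full_frame_scores)

    if face_hit and full_hit:
        return "facial and non-facial regions"
    if face_hit:
        return "likely concentrated within the face"
    if full_hit:
        return "outside the face region or broader image synthesis"
    return "no model triggered"


# Hypothetical scores and threshold, for illustration only.
only_face = describe_signals([88.0, 10.0], [5.0, None], threshold=50.0)
only_full = describe_signals([3.0], [77.0], threshold=50.0)
```

This kind of breakdown is useful for investigation. It does not replace the combined result, which already accounts for both signal types.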
Context Aware analysis
Image results may also include a signal from rd-context-img, which applies contextual analysis to the image beyond model-level detection. When present, this signal is factored into the overall result. When not applicable, it will appear with "status": "NOT_APPLICABLE".
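A client that wants to know whether the contextual signal took part in a given result could check for it as in the sketch below. It assumes the signal appears as an entry in models[] with its name under a `name` key. Neither detail is confirmed above, and the function name is hypothetical.

```python
from typing import Any, Optional


def context_signal(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the rd-context-img entry if it applied to this image, else None."""
    for model in response.get("models", []):  # ASSUMPTION: it lives in models[]
        if model.get("name") == "rd-context-img":  # ASSUMPTION: key is "name"
            if model.get("status") == "NOT_APPLICABLE":
                return None  # present in the response but not applicable
            return model
    return None  # not present in this response at all


# Hypothetical payload for illustration.
skipped = context_signal(
    {"models": [{"name": "rd-context-img", "status": "NOT_APPLICABLE"}]}
)
```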
Recommended usage
For most use cases, the overall image judgement and score in resultsSummary should be used as the primary decision signal. Individual model results — including the face-focused vs. full-frame breakdown — are available for debugging, auditing, or deeper investigation.