Image Detection: How Results Are Returned

Reality Defender’s image detection uses face-focused and full-frame models to assess manipulation, with model scores and heatmaps for interpretation.

Written by Wen Huang

Reality Defender's image detection analyzes content using multiple specialized models to identify signs of manipulation. Each image receives a single overall judgement and score, produced by combining signals from all models that ran on that file.


What's included in image analysis

Image detection includes signals from two categories of models, combined into a single result:

  • Face-focused models, which analyze detected face regions for signs of manipulation

  • Full-frame models, which analyze the entire image beyond the face region

Face-focused model breakdown:

  • rd-cedar-img (GAN) — detects artifacts typical of GAN-generated imagery

  • rd-elm-img (Diffusion) — detects artifacts and statistical patterns commonly associated with diffusion-generated images

  • rd-oak-img (FaceSwap) — detects facial manipulation consistent with face replacement / identity swap techniques

  • rd-pine-img (Universal) — general-purpose manipulation detector trained to catch broad, cross-technique signals (not limited to a single generation method)

Each face-focused model has a corresponding full-frame version, and the two run in parallel:

Face-focused model    Full-frame model
rd-cedar-img          rd-full-cedar-img
rd-elm-img            rd-full-elm-img
rd-oak-img            rd-full-oak-img
rd-pine-img           rd-full-pine-img

All models contribute to a single rd-img-ensemble score, which represents the combined assessment of manipulation likelihood for the image.


What you'll see in the API response

Each model returns its own result in the models[] array, including a status, finalScore, and predictionNumber. Full-frame models can be identified by "full" in the model name (for example, rd-full-cedar-img). The top-level resultsSummary contains the overall verdict and aggregated score.

The response also includes a heatmaps{} object with entries for both face-focused and full-frame models, showing where in the image manipulation signals were detected.

Models that are not applicable to a given media type (for example, video or audio models on an image file) will return "status": "NOT_APPLICABLE" with null score fields. This is expected behavior.
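
To make the response layout concrete, here is a minimal Python sketch that sorts the entries of models[] into face-focused, full-frame, and not-applicable groups. It relies on assumptions that this article does not confirm: that each entry stores the model name under a "name" key, and that the response has already been parsed into a dictionary. The sample payload, the "EXAMPLE_STATUS" placeholder, the score values, and the rd-example-vid model name are hypothetical and only illustrate the shape.

```python
from typing import Any

# Model names as listed in this article.
FACE_FOCUSED = {"rd-cedar-img", "rd-elm-img", "rd-oak-img", "rd-pine-img"}
FULL_FRAME = {"rd-full-cedar-img", "rd-full-elm-img",
              "rd-full-oak-img", "rd-full-pine-img"}


def group_model_results(response: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Sort models[] entries into face-focused, full-frame, skipped, and other."""
    groups: dict[str, list[dict[str, Any]]] = {
        "face_focused": [], "full_frame": [], "not_applicable": [], "other": [],
    }
    for model in response.get("models", []):
        name = model.get("name", "")  # assumption: the name lives under "name"
        if model.get("status") == "NOT_APPLICABLE":
            # Expected for models that do not apply to this media type.
            groups["not_applicable"].append(model)
        elif name in FULL_FRAME:
            groups["full_frame"].append(model)
        elif name in FACE_FOCUSED:
            groups["face_focused"].append(model)
        else:
            groups["other"].append(model)
    return groups


# Hypothetical response fragment; all values are illustrative only.
sample_response = {
    "models": [
        {"name": "rd-oak-img", "status": "EXAMPLE_STATUS",
         "finalScore": 0.91, "predictionNumber": 0.91},
        {"name": "rd-full-oak-img", "status": "EXAMPLE_STATUS",
         "finalScore": 0.12, "predictionNumber": 0.12},
        {"name": "rd-example-vid", "status": "NOT_APPLICABLE",
         "finalScore": None, "predictionNumber": None},
    ]
}

groups = group_model_results(sample_response)
for label, entries in groups.items():
    print(label, [entry["name"] for entry in entries])
```

Matching against the explicit name lists, rather than searching for the substring "full", keeps unrelated entries out of the two categories.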


How to interpret face-focused vs. full-frame signals

Face-focused models detect manipulation within facial regions. They're effective for face swaps, facial reenactment, and localized facial artifacts.

Full-frame models detect manipulation elsewhere in the image — in backgrounds, boundaries, lighting, composition, or global image artifacts. They're also more effective when faces are small, partially visible, or not the primary manipulation target.

Both signal types contribute to the final result. You may see cases where:

  • Both face-focused and full-frame models trigger — manipulation affects facial and non-facial regions

  • Only face-focused models trigger — manipulation is likely concentrated within the face

  • Only full-frame models trigger — manipulation may exist outside the face region or involve broader image synthesis

All of these scenarios are expected and reflect the full scope of detection coverage.
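
The three scenarios above can be sketched as a small Python helper. This article does not define what counts as a model "triggering", so the sketch assumes a model triggers when its finalScore meets a caller-supplied threshold. The threshold, the "name" key, and the sample scores are hypothetical, not documented API behavior.

```python
from typing import Any

# Model names as listed in this article.
FACE_FOCUSED = {"rd-cedar-img", "rd-elm-img", "rd-oak-img", "rd-pine-img"}
FULL_FRAME = {"rd-full-cedar-img", "rd-full-elm-img",
              "rd-full-oak-img", "rd-full-pine-img"}


def any_triggered(models: list[dict[str, Any]], names: set[str], threshold: float) -> bool:
    """Return True if any model in `names` has a finalScore at or above threshold."""
    for model in models:
        score = model.get("finalScore")
        # Null scores (e.g. NOT_APPLICABLE models) never count as triggered.
        if model.get("name") in names and score is not None and score >= threshold:
            return True
    return False


def coverage_scenario(models: list[dict[str, Any]], threshold: float) -> str:
    """Label which category of models triggered: both, one side, or none."""
    face = any_triggered(models, FACE_FOCUSED, threshold)
    full = any_triggered(models, FULL_FRAME, threshold)
    if face and full:
        return "both"
    if face:
        return "face_only"
    if full:
        return "full_frame_only"
    return "none"


# Hypothetical scores; the scale and the 0.5 threshold are assumptions.
sample_models = [
    {"name": "rd-oak-img", "finalScore": 0.93},
    {"name": "rd-full-oak-img", "finalScore": 0.10},
    {"name": "rd-full-pine-img", "finalScore": None},
]
print(coverage_scenario(sample_models, threshold=0.5))
```

A label like this is best treated as an investigation aid; the combined result remains the primary signal.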


Context Aware analysis

Image results may also include a signal from rd-context-img, which applies contextual analysis to the image beyond model-level detection. When present, this signal is factored into the overall result. When not applicable, it will appear with "status": "NOT_APPLICABLE".
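
As a rough illustration, the following Python sketch looks up the rd-context-img entry and returns it only when it applies. It assumes the entry appears in models[] alongside the other models and that the name is stored under a "name" key; neither detail is confirmed by this article, and the sample values are hypothetical.

```python
from typing import Any, Optional


def context_signal(models: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the rd-context-img entry if present and applicable, else None."""
    for model in models:
        if model.get("name") == "rd-context-img":  # assumption: "name" key
            if model.get("status") == "NOT_APPLICABLE":
                return None
            return model
    return None


# Hypothetical entries; status and score values are placeholders.
applicable = [{"name": "rd-context-img", "status": "EXAMPLE_STATUS", "finalScore": 0.7}]
skipped = [{"name": "rd-context-img", "status": "NOT_APPLICABLE", "finalScore": None}]

print(context_signal(applicable))
print(context_signal(skipped))
```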


Recommended usage

For most use cases, the overall image judgement and score in resultsSummary should be used as the primary decision signal. Individual model results — including the face-focused vs. full-frame breakdown — are available for debugging, auditing, or deeper investigation.
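
One way to follow this recommendation in code is to keep resultsSummary as the decision input and carry the per-model results along only as an audit trail. The Python sketch below passes resultsSummary through untouched because this article does not list its inner fields. The "name" key and every value in the sample payload, including the keys shown inside resultsSummary, are hypothetical.

```python
from typing import Any


def primary_decision(response: dict[str, Any]) -> dict[str, Any]:
    """Pair the overall summary with per-model details kept for auditing."""
    summary = response.get("resultsSummary") or {}
    audit_trail = [
        {
            "name": model.get("name"),  # assumption: "name" key
            "status": model.get("status"),
            "finalScore": model.get("finalScore"),
        }
        for model in response.get("models", [])
        # Models that do not apply to this media type carry no signal.
        if model.get("status") != "NOT_APPLICABLE"
    ]
    return {"summary": summary, "audit_trail": audit_trail}


# Hypothetical payload; the keys inside resultsSummary are placeholders.
sample_response = {
    "resultsSummary": {"status": "EXAMPLE_STATUS", "score": 0.9},
    "models": [
        {"name": "rd-pine-img", "status": "EXAMPLE_STATUS", "finalScore": 0.88},
        {"name": "rd-example-vid", "status": "NOT_APPLICABLE", "finalScore": None},
    ],
}

decision = primary_decision(sample_response)
print(decision["summary"])
print(decision["audit_trail"])
```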
