Accuracy: What percentage of media are correctly detected as real or fake?
Deepfakes: Media in which one person’s identity is replaced with someone else’s. They are created with deep learning techniques, starting from a large amount of media of a subject (typically a VIP).
Detection model: A neural network AI model that takes a file as input and outputs a probability score indicating whether the input file is fake.
Facial reenactments: A video deepfaking technique that aims to put subject A's facial expression on subject B, without changing subject B's identity.
Face swaps: A video deepfaking technique that aims to put subject A's face (source) on subject B (target), completely changing subject B's identity. Face swaps are more common than facial reenactments.
Generalize/generalization: The ability of algorithms to perform well on unseen images (unseen during training).
Large Language Model: A probabilistic model that scores how likely a given text is, based on the texts seen during training.
Neural network: A broad class of artificial intelligence models that find statistical patterns in data. These models are able to learn a task (e.g. deepfake detection) without explicit instructions on how to do so, instead relying on a vast amount of data representative of the problem.
Precision: When we predict a piece of media as fake, how often is the model correct?
Recall: From among all the fake media in our dataset, how many are detected as fake?
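The accuracy, precision, and recall entries above can be illustrated with a small worked example. The labels below are made up for illustration (1 = fake, 0 = real); the arithmetic follows directly from the three definitions.

```python
# Hypothetical toy example: 8 media files, 3 fake and 5 real.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # ground truth (1 = fake, 0 = real)
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # detector's predictions

# Count the four outcome types.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # fakes caught
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # real flagged as fake
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # fakes missed
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # real correctly passed

accuracy = (tp + tn) / len(y_true)   # 6/8 = 0.75 — correct on both classes
precision = tp / (tp + fp)           # 2/3 — when we say "fake", how often we're right
recall = tp / (tp + fn)              # 2/3 — how many of the fakes we caught
```

Note that accuracy alone can mislead on imbalanced data: a detector that labels everything "real" would score 5/8 accuracy here while catching zero fakes, which is why precision and recall are reported alongside it.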
Text to speech (TTS): Methods which generate speech from text.
Transformer: A machine learning architecture that acts as a building block of many LLMs as well as Reality Defender’s text detector.
Voice Conversion (VC): Methods which impersonate a speaker by giving their speech a different voice; sometimes referred to as speech-to-speech translation methods.