The accuracy of audio detection depends on several factors, including speaker attributes, the spoken language, the method used to generate the fake speech, and the audio quality. Our models handle a wide variety of data, but they perform best when the input audio is English-language speech, has little to no background noise or music, and is available in its original uncompressed format. Regarding file length, we recommend a minimum of 6 seconds, because our models are trained on 6-second audio segments.
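As a rough illustration of the 6-second recommendation, the sketch below checks the duration of a file before it is submitted. It assumes the audio is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The function names and the `MIN_SECONDS` constant are hypothetical and are not part of any Reality Defender API.

```python
import wave

# Assumption: 6 seconds mirrors the minimum length recommended above.
MIN_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration = number of frames / frames per second.
        return wf.getnframes() / wf.getframerate()


def meets_minimum_length(path, min_seconds=MIN_SECONDS):
    """Return True if the file is at least `min_seconds` long (hypothetical helper)."""
    return wav_duration_seconds(path) >= min_seconds
```

A call such as `meets_minimum_length("sample.wav")` returns `False` for a clip shorter than 6 seconds. Compressed formats such as MP3 would need a different reader, since the `wave` module only handles WAV files.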
What are some factors affecting the accuracy of the detection on Reality Defender?

Written by Diana Hsieh
Updated over a year ago