The most significant factor affecting the accuracy of our text detector is the input text domain; although it is generally robust, our text detector sometimes struggles to accurately detect text from rare, specialized domains with a lot of jargon.
Two other factors are the length of the input text and, when the input mixes machine-generated and human-written text, the proportion that is machine-generated. When the input text is extremely short, for instance a sentence of fewer than 10 words, there may not be enough information in its content for our detector to make an accurate prediction.
Similarly, in the case of mixed text input, a very low ratio of machine-generated text (less than 20% of the entire input) can make detection harder for our model.
Lastly, note that our text detector is currently designed to determine whether an English input text is human-written or machine-generated, provided that it is well-formed, natural language text. Hence, performance on gibberish text (e.g. “aaaaaaaa hello.”) is not guaranteed, and accuracy is likewise not guaranteed for any text that is either not natural language or not in English.
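The limitations above suggest simple pre-checks to run before invoking the detector. The sketch below is a minimal, hypothetical example: the 10-word minimum comes from the text above, but the function name, the 0.8 alphabetic-token threshold, and the idea of rejecting inputs outright (rather than flagging them as low-confidence) are assumptions, not part of our detector's actual interface.

```python
import re

MIN_WORDS = 10  # inputs shorter than this carry too little signal (assumed cutoff from the note above)

def passes_prechecks(text: str) -> bool:
    """Return True if `text` looks like input the detector can score reliably.

    Heuristic guards only: a minimum word count, plus a rough check that
    the input is mostly alphabetic, English-like tokens (to screen out
    gibberish or non-natural-language input).
    """
    words = text.split()
    if len(words) < MIN_WORDS:
        return False  # too short for an accurate prediction
    # Rough natural-language check: most tokens should contain ASCII letters.
    alphabetic = [w for w in words if re.search(r"[A-Za-z]", w)]
    return len(alphabetic) / len(words) >= 0.8  # 0.8 is an assumed threshold

# A 9-word sentence fails the length guard.
print(passes_prechecks("This sentence has exactly nine words in it total."))  # → False
```

In practice you might surface a warning to the user instead of refusing the input, since these heuristics only approximate the conditions described above.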