What are the best practices for collecting content to scan?

For text, copy only the main body of the content (for example, a single paragraph). Leave out headings, formatting, and other surrounding elements.
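The text guidance above can be sketched in code. The following is a minimal illustration, assuming the copied text contains Markdown-style headings, simple HTML tags, or emphasis markers. The function name `extract_main_text` is hypothetical and not part of the platform, and the cleanup rules are deliberately rough (for example, every asterisk and backtick is removed).

```python
import re


def extract_main_text(raw: str) -> str:
    """Return the body text of `raw` with headings and simple markup removed.

    Hypothetical helper for illustration only; not a platform API.
    """
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip Markdown-style headings such as "# Title" or "## Section".
        if re.match(r"#{1,6}\s", stripped):
            continue
        # Drop simple HTML tags such as <b> or </p>, keeping the text inside.
        stripped = re.sub(r"<[^>]+>", "", stripped)
        # Drop emphasis and inline-code markers (asterisks and backticks).
        stripped = re.sub(r"[*`]", "", stripped)
        kept.append(stripped)
    text = "\n".join(kept)
    # Collapse runs of blank lines into a single paragraph break.
    return re.sub(r"\n{2,}", "\n\n", text).strip()


sample = "# Breaking News\n\nThe **mayor** said <b>today</b> that the bridge will reopen.\n"
print(extract_main_text(sample))
```

Copying the paragraph by hand works just as well. A script like this is mainly useful when many passages have to be prepared in the same way.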

For media (image, video, audio), use the originally posted file whenever possible, because model accuracy decreases as media is re-shared from one platform to another across the internet.

Additionally, the models perform better on higher-resolution, higher-quality media.

Lastly, if the audio from a video needs to be analyzed, let the platform perform the extraction. This ensures the extracted audio is not modified in a way that could affect the model results.