Some of our models work on a single modality (audio only or video only), while others analyze, or are trained on, the combination of audio and video.
For example: If an audio file is uploaded, our audio models evaluate the file.
If a video file is uploaded, our video models evaluate the file. Some of these video models also examine the audio track as part of their evaluation.
Lastly, for each video, we can also extract the audio and send it to our audio models.
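
The routing described above can be sketched roughly as follows. This is only an illustration, not our actual implementation: the group labels (`audio_models`, `video_models`, `audiovisual_models`), the function `plan_evaluations`, and the `has_audio_track` flag are all hypothetical names introduced for this example.

```python
# Hypothetical labels for the three kinds of models described in the text.
AUDIO_MODELS = "audio_models"              # evaluate audio only
VIDEO_MODELS = "video_models"              # evaluate video frames only
AUDIOVISUAL_MODELS = "audiovisual_models"  # evaluate video together with its audio


def plan_evaluations(media_type: str, has_audio_track: bool = True) -> list[tuple[str, str]]:
    """Return (model_group, input_description) pairs for one uploaded file.

    Assumption: the caller already knows whether the upload is "audio" or
    "video", and whether a video contains an audio track.
    """
    if media_type == "audio":
        # Audio uploads go to the audio models only.
        return [(AUDIO_MODELS, "uploaded audio")]

    if media_type == "video":
        # Video uploads always go to the video models.
        plan = [(VIDEO_MODELS, "video frames")]
        if has_audio_track:
            # Some video models also examine the audio alongside the frames.
            plan.append((AUDIOVISUAL_MODELS, "video frames + audio track"))
            # The audio can also be extracted and sent to the audio models.
            plan.append((AUDIO_MODELS, "extracted audio track"))
        return plan

    raise ValueError(f"unsupported media type: {media_type!r}")


if __name__ == "__main__":
    for group, source in plan_evaluations("video"):
        print(f"{group} <- {source}")
```

In this sketch a video with an audio track is evaluated three ways (video only, audio and video combined, and extracted audio only), while an audio file is evaluated one way.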