Overview
Arcads aggregates the most performant AI video generation models in one place.
Each model comes with its own strengths and limitations, but they all share one common characteristic:
Every AI video model has a maximum video length.
This article explains:
The maximum video duration supported by each model on Arcads
Why these limits exist
Proven methods to create videos longer than 30 seconds despite those constraints
Why Do AI Models Have Video Length Limits?
AI video models generate content frame by frame using heavy compute resources. To maintain:
visual consistency
audio sync
facial realism
rendering speed
each model enforces a maximum clip duration.
Video Length Limits by Model (Recap)
Exact limits may evolve over time — always refer to the model selector inside Arcads for the most up-to-date values.
| Model | Max length (per clip) |
| --- | --- |
| Sora 2 Pro | 12 sec |
| Veo 3.1 | 8 sec |
| Kling 2.6 | 10 sec |
| Arcads 1.0 | 1,500-character limit (> 1 min) |
| Audio Driven | 600-character limit (> 45 sec) |
| Omnihuman 1.5 | 400-character limit (> 30 sec) |
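If you want a rough idea of how long a script will run before generating, a simple character-count estimate helps. Below is a minimal sketch in Python; the ~13 characters-per-second speaking rate is an assumption (roughly 150 words per minute), not an official Arcads figure, and real durations vary with the voice and pacing.

```python
# Rough helper to estimate how long a script will run, and whether it
# fits a model's per-clip character limit. The speaking rate is an
# assumption (~13 characters/second), not an Arcads-documented figure.

CHAR_LIMITS = {          # per-clip script limits from the table above
    "Arcads 1.0": 1500,
    "Audio Driven": 600,
    "Omnihuman 1.5": 400,
}

def estimate_duration_seconds(script: str, chars_per_second: float = 13.0) -> float:
    """Estimate the spoken duration of a script from its character count."""
    return len(script) / chars_per_second

def fits_model(script: str, model: str) -> bool:
    """Check whether a script stays within a model's character limit."""
    return len(script) <= CHAR_LIMITS[model]

if __name__ == "__main__":
    script = "Your ad script goes here... " * 20   # ~560 characters
    print(f"~{estimate_duration_seconds(script):.0f} s of speech")
    for model in CHAR_LIMITS:
        print(model, "OK" if fits_model(script, model) else "too long")
```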
When to Use Each Model?
Visual Video Models
These models are designed to generate pure visual clips (motion, scenes, product shots). They work best for short, high-impact sequences and can be combined for longer videos.
Sora 2 Pro
Use for cinematic scenes, storytelling shots, or high-quality visuals where realism matters. Best for premium-looking clips up to 12 seconds.
Veo 3.1
Ideal for fast-paced hooks and dynamic motion. Use it when you need to grab attention in the first seconds of an ad or video.
Kling 2.6
Best suited for product visuals and smooth transitions. A good balance between motion and visual stability.
Face Cam Models
These models are optimized to generate talking human avatars (UGC-style, testimonials, explanations). They rely on script length rather than seconds.
Arcads 1.0
Use for long-form talking head videos: product explanations, tutorials, structured messaging. Best choice when you need 30–45s+ of continuous speech.
Audio Driven
Best for short, voice-led clips with a natural speaking rhythm. Ideal for concise messages, intros, or mid-video segments.
Omnihuman 1.5
Designed for very short UGC-style hooks and punchlines. Perfect for social ads, openings, or quick reactions.
How to extend your videos?
Option 1: Face Cam Models (Arcads 1.0, Audio Driven, Omnihuman 1.5)
If your initial video was created using one of the models above, you can easily extend it by generating additional clips.
To do so:
Select the same model
Choose the same actor
Keep the same voice
Write a new script for the next segment
Each generated clip will maintain 100% actor and voice consistency with the original video.
Once all clips are generated, simply combine them using any third-party video editing tool outside of Arcads to create a longer, seamless video.
This approach is the recommended way to create videos longer than the model’s per-clip limit while preserving visual and audio continuity.
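If you don't already have a preferred editor, one common way to handle the combining step is with ffmpeg's concat demuxer. The sketch below assumes ffmpeg is installed locally; the clip filenames are placeholders for the clips you downloaded from Arcads. Since the clips come from the same model and settings, stream copy usually works without re-encoding.

```python
# Minimal sketch: concatenate the generated clips into one video with ffmpeg.
# Assumes ffmpeg is installed and the clips share the same codec/resolution
# (which they should, coming from the same model and settings).
# The filenames are placeholders for your downloaded Arcads clips.

import subprocess
from pathlib import Path

clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]   # hypothetical filenames

# ffmpeg's concat demuxer reads a text file listing the inputs in order.
list_file = Path("clips.txt")
list_file.write_text("".join(f"file '{c}'\n" for c in clips))

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file),
     "-c", "copy", "combined.mp4"],
    check=True,
)
```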
Option 2: Visual Video Models (Sora 2 Pro, Veo 3.1, Kling 2.6)
How to extend a video from a specific frame?
If your initial video was created using one of the Visual Video Models (Sora 2 Pro, Veo 3.1, or Kling 2.6), follow these steps:
1. Click on your generated video.
2. Select Take Snapshot.
3. Choose the frame where you want the next video to start, then click Pick Frame.
4. The selected frame will be extracted and saved as an image.
5. Click Transform to Video.
You can now generate another clip starting from the selected frame. This ensures scene, context, and actor consistency across clips.
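For reference, the same snapshot idea can be reproduced outside Arcads on a clip you have already downloaded, which can be handy if you do part of your editing elsewhere. A minimal sketch, assuming ffmpeg is installed; the filename and timestamp are placeholders.

```python
# Minimal sketch: extract a single frame from a downloaded clip with ffmpeg,
# mirroring the "Take Snapshot" step locally. Assumes ffmpeg is installed;
# the filename and timestamp are placeholders.

import subprocess

subprocess.run(
    ["ffmpeg",
     "-ss", "00:00:07.5",         # timestamp of the frame you want to start from
     "-i", "generated_clip.mp4",  # clip previously downloaded from Arcads
     "-frames:v", "1",            # export exactly one frame
     "snapshot.png"],
    check=True,
)
```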
Additional Tips
Since this method uses models such as Veo 3.1, Kling 2.6, and Sora 2 Pro—which do not allow direct voice control—you may experience voice inconsistencies across clips.
In such cases, we recommend using ElevenLabs to standardize the voice. You can upload your video to ElevenLabs, select Voice Changer, and apply the same voice across the entire video for consistent audio output.
How to continue a video without selecting a specific frame?
To continue a video without picking a specific frame, simply choose the extend option and do not select any keyframe. In this case, the system will automatically use the last moments of the existing video as context and continue the scene naturally from there.
This approach works best when you want a smooth, natural continuation of the same action, setting, and dialogue. The model will preserve visual consistency such as camera angle, lighting, and character behavior, and extend the video forward in time.
1. Select your video and click “Extend Video”.
2. Prompt the next video.

The video will be extended by 7 seconds. This process can take a video of up to 30 seconds as input, for a maximum total length of 37 seconds.
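As a quick sanity check of that arithmetic, here is a minimal sketch; the 7-second extension and 30-second input cap are the figures quoted above.

```python
# Minimal sketch of the extend-video arithmetic described above:
# each extension adds 7 seconds, and the input clip may be at most 30 seconds,
# so a single extension tops out at 37 seconds.

EXTENSION_SECONDS = 7
MAX_INPUT_SECONDS = 30

def extended_length(input_seconds: float) -> float:
    """Return the total length after one extension, or raise if the input is too long."""
    if input_seconds > MAX_INPUT_SECONDS:
        raise ValueError(f"Input must be {MAX_INPUT_SECONDS}s or shorter")
    return input_seconds + EXTENSION_SECONDS

print(extended_length(30))  # 37
```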