
Veo 3 & Veo 3.1 by Google

Feature Release: Veo 3 & Veo 3.1, the new state-of-the-art video models, now on Leonardo.Ai

Written by Ayumi Umehara
Updated yesterday

Introduction

Leonardo.Ai brings best-in-class video generation with the full Veo 3 and Veo 3.1 model suite by Google. Whether you're drafting quick concepts or producing finished, high-resolution video, there's a model for every stage of your workflow.

Read on to find out why they're great for video creation and storytelling, how the models differ, and how to get the best results from them.


The Veo 3 Model Suite

Veo 3: A breakthrough in AI video and audio generation. It delivers enhanced realism, improved physics, and native audio generation - including dialogue - so you can create fully realised videos without post-production. It supports:

  • Text-to-Video and Image-to-Video (Start Frame)

  • Native audio generation including dialogue

  • 720p and 1080p resolution

  • 4, 6, or 8 second durations

Veo 3 Fast: A faster, more affordable version of Veo 3. Great for rapid ideation and iteration.

Veo 3.1: Builds on Veo 3 with sharper realism, smarter motion, enhanced expression, and greater creative control. It adds End Frame support for precise control over how your video concludes, and supports 4K resolution. It supports:

  • Everything in Veo 3, plus:

  • End Frame support (requires a Start Frame)

  • Enhanced Image-to-Video fidelity - improved realism, depth, and consistency

  • Smarter physics and expression - more natural movement, emotional detail, and gesture

  • 720p, 1080p, and 4K resolution options

Veo 3.1 Fast: A faster, more affordable version of Veo 3.1 - including 4K support.

Veo 3.1 Lite: The most cost-effective model in the suite, with similar speeds to Veo 3.1 Fast, but at less than half the token cost - ideal for high-volume workflows, drafting, and early-stage ideation. It supports:

  • Text-to-Video and Image-to-Video (Start Frame)

  • Native audio generation

  • 720p and 1080p resolution

  • 4, 6, or 8 second durations
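
The feature lists above can be summarised as a small lookup table. The sketch below is illustrative only and is not part of any Leonardo.Ai or Google API: the names `ModelSpec`, `SPECS`, and `check_settings` are hypothetical. It covers only the three models whose features are listed item by item; the Fast variants are left out because their features are not itemised here. It also assumes that Veo 3.1 Lite has no End Frame support, since End Frame is not in its list.

```python
from dataclasses import dataclass

# Durations (in seconds) offered by every model in the suite.
ALLOWED_DURATIONS = (4, 6, 8)


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical summary of one model's options, taken from the lists above."""
    resolutions: tuple
    end_frame: bool


SPECS = {
    "Veo 3": ModelSpec(("720p", "1080p"), end_frame=False),
    "Veo 3.1": ModelSpec(("720p", "1080p", "4K"), end_frame=True),
    # Assumption: End Frame is not listed for Lite, so it is treated as unsupported.
    "Veo 3.1 Lite": ModelSpec(("720p", "1080p"), end_frame=False),
}


def check_settings(model, resolution, duration, start_frame=False, end_frame=False):
    """Return a list of problems with the chosen settings (empty means OK)."""
    spec = SPECS[model]
    problems = []
    if resolution not in spec.resolutions:
        problems.append(f"{model} does not offer {resolution}")
    if duration not in ALLOWED_DURATIONS:
        problems.append(f"duration must be one of {ALLOWED_DURATIONS} seconds")
    if end_frame and not spec.end_frame:
        problems.append(f"{model} does not support an End Frame")
    if end_frame and not start_frame:
        problems.append("an End Frame requires a Start Frame")
    return problems


if __name__ == "__main__":
    print(check_settings("Veo 3.1", "4K", 8, start_frame=True, end_frame=True))
    print(check_settings("Veo 3", "4K", 5))
```

An empty list means the combination matches the lists above; each string in a non-empty list names one mismatch.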

Core capabilities of the Veo 3 suite of models

  • Native Audio Generation: Audio is generated automatically with Veo videos. You can add audio cues directly in your prompt, for example:

    • “The sound of an ice cream truck plays in the background.”

    • “The captain turns and says, ‘We set sail at daybreak.’”

    • Note that audio cannot currently be turned off.

  • Multi-Modal input support: You can combine text prompts and image prompts (start and end frames where applicable).


How to generate videos with Veo 3

  1. From the home page, open the AI Video Creation tool by clicking Video beneath the prompt bar or in the left sidebar.

  2. Click the Models menu in the sidebar.

  3. Select Veo 3 or another Veo 3 model from the Models menu.

  4. Enter your text prompt and click Generate. To get the most out of your videos, check out this blog for tips on mastering prompts for Veo models.

  Optional: Add a start and end frame (where applicable). Learn more.


Veo 3 Image to Video - Using a Start Frame

You can now have even more control over your Veo 3 outputs by using an image as a start frame. Combined with your prompt, Veo 3 will use your image as the starting point of the video, letting you achieve the exact aesthetic you want and guiding your scene in the right direction.

Tips for creating seamless videos with Start and End frames:

  • Create your Start Frame, then explore fresh perspectives and angles with Nano Banana via our Inline Editor. Nano Banana also helps maintain character consistency across scenes, so your End Frames match the rest of the video. (Alternatively, use the End Frame as your jumping-off point to create a suitable Start Frame.)

  • Craft moody, cinematic shots with Lucid Realism, then bring them to life with Start–End Frame video transitions.

  • Longer videos may offer smoother but slower transitions. Shorter durations may give much faster but less smooth transitions - consider subtler transitions for shorter videos.

  • Avoid having extremely different Start and End frames (such as different settings or extreme changes and transformations). Veo 3.1 is best used for clean actions within a specific scene or context. If fancier morphing or scene transitions are required, consider using Kling 2.1 Pro instead.


Frequently Asked Questions

Is Veo 3 available to free users?

Unfortunately, due to the token cost, Veo 3 models are only accessible to paid plan holders.

Can I control the length of my Veo 3 generations?

Yes, you can choose from 4, 6, and 8 seconds with any model.

How do I add audio to Veo 3 generations?

Audio cues can be added at any point in the prompt, e.g. “the sound of an ice cream truck can be heard in the background”. You can also add general audio cues at the end of the prompt, e.g. “audio: sound of keyboard typing and ambient sound of air conditioner unit”.
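
If you assemble prompts in a script before pasting them into the prompt bar, the two placements described in this answer can be sketched as below. This is a minimal illustration, not a Leonardo.Ai feature: the helper `build_prompt` and its parameter names are hypothetical, and the output is just plain prompt text.

```python
def build_prompt(scene, inline_cues=(), trailing_audio=()):
    """Combine a scene description with audio cues.

    inline_cues are full sentences placed in the body of the prompt;
    trailing_audio items are joined into one "audio: ..." clause at the end.
    """
    sentences = [scene.strip()] + [cue.strip() for cue in inline_cues]
    prompt = " ".join(s for s in sentences if s)
    sounds = [s.strip() for s in trailing_audio if s.strip()]
    if sounds:
        prompt += " audio: " + " and ".join(sounds)
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "A developer works late in a quiet office.",
        inline_cues=["The sound of an ice cream truck can be heard in the background."],
        trailing_audio=["sound of keyboard typing", "ambient sound of air conditioner unit"],
    ))
```

Keeping inline cues as whole sentences and general ambience in the trailing clause mirrors the two examples in the answer above.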

What are start and end frames, and how do I use them?

Check out this detailed guide that covers how to create videos using start and end frames.
