Fundamentally, there are two types of movement to consider when prompting for video generation:
Off-screen movement: This effectively encompasses all camera movement, which is outlined in more detail below.
On-screen movement: This encompasses all movement relating to the people, objects and backgrounds on-screen.
Off-screen movement: camera shots and examples
Wide Shot
Purpose: Establish the scene and its surroundings.
Prompt Example: “A wide-angle shot of a bustling city square at dusk, capturing the tall buildings, street lights flickering on, and people walking through the square. The camera slowly pans from left to right, revealing the scope of the urban environment. The lighting is soft, with warm hues reflecting off the glass windows. The mood is vibrant yet calm.”
Close-Up Shot
Purpose: Highlight a specific detail or emotion.
Prompt Example: “A close-up shot of a person’s hands gently touching the petals of a rose. The camera remains steady, with a shallow depth of field to blur the background. Lighting is natural and soft, coming from a nearby window, enhancing the texture of the petals and the subtle movement of the fingers.”
Over-the-Shoulder Shot
Purpose: Provide an immersive, near-POV perspective that also gives context.
Prompt Example: “An over-the-shoulder shot of a character standing at the edge of a cliff, looking down at the crashing waves below. The camera subtly tilts downward as if following the character’s gaze, capturing both the steep cliffs and the expanse of the ocean. The lighting is dim, with clouds overhead, setting a contemplative mood.”
Tracking Shot
Purpose: Follow the subject’s movement through the environment.
Prompt Example: “A tracking shot following a woman as she walks through a crowded market. The camera smoothly moves beside her at waist height, capturing her interactions with the vibrant surroundings. The lighting is bright and harsh, as sunlight streams through the market’s canvas rooftops, creating sharp contrasts.”
Dolly Zoom
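Purpose: Create a disorienting effect by zooming in while the camera dollies backward (or vice versa), so the subject stays roughly the same size in frame while the background appears to stretch or compress.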
Prompt Example: “A dolly zoom effect on a character standing in a forest clearing. As the camera moves backward, it zooms in on the character’s face, creating a sense of isolation and tension. The lighting is dim with shafts of sunlight cutting through the dense trees, casting long shadows. The mood is eerie, with an ominous feeling.”
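Why the dolly zoom looks the way it does: under a simple pinhole-camera approximation, a subject's size in the frame is proportional to focal length divided by camera-to-subject distance. Holding that ratio constant while the camera moves keeps the subject's framing steady as the background changes. The sketch below uses a hypothetical helper (not part of any video-generation tool) to compute the focal length needed at each distance. You don't need this to write a prompt, but it shows what the model is being asked to imitate.

```python
def dolly_zoom_focal_lengths(start_focal_mm: float, start_distance_m: float,
                             distances_m: list[float]) -> list[float]:
    """Return the focal length (mm) that keeps the subject the same size at each distance.

    Pinhole approximation: subject size in frame ~ focal_length * subject_height / distance,
    so holding focal_length / distance constant holds the subject's framing constant.
    """
    if start_distance_m <= 0:
        raise ValueError("start_distance_m must be positive")
    ratio = start_focal_mm / start_distance_m  # mm of focal length per metre of distance
    return [ratio * d for d in distances_m]


# The camera dollies back from 2 m to 4 m while zooming in from 24 mm to 48 mm.
print(dolly_zoom_focal_lengths(24.0, 2.0, [2.0, 3.0, 4.0]))
```

Doubling the distance doubles the required focal length, which is why the zoom and the dolly must move in opposite directions.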
Useful terms
Static Shot: No camera movement, used to focus the audience on a subject or setting.
Pan: The camera stays in place but pivots horizontally across the scene (e.g., left to right).
Tilt: The camera stays in place but pivots vertically (e.g., upward or downward).
Zoom: The lens changes focal length, making the subject appear closer or farther away without the camera itself moving.
Dolly: The camera physically moves toward or away from the subject.
Crane/Jib: The camera moves up and down, often used for sweeping overhead shots.
Steadicam/Handheld: A Steadicam gives fluid, stabilized movement for dynamic shots; handheld gives shakier movement for a rawer feel.
360-degree Pan: The camera travels fully around the subject (also called an orbit or arc shot) for immersive, all-encompassing shots.
Sample prompts
Here are some examples demonstrating how you can combine a number of these elements for more intricate and specific prompts:
Scenario 1
“A steady camera dolly shot moving toward a rustic wooden cabin in the middle of a foggy forest at dawn. The shot begins wide, gradually narrowing in on the cabin as the fog swirls around. The lighting is soft with cold, bluish tones, creating a haunting, mysterious atmosphere. As the camera gets closer, it tilts upward, showing the towering trees surrounding the cabin.”
Scenario 2
“A fast-paced tracking shot through a dense jungle, following a runner as they dodge obstacles and leap over fallen trees. The camera swings rapidly to the side at times, creating a sense of chaos and urgency. Bright sunlight filters through the canopy, creating flickering spots of light on the forest floor. The mood is intense, with the runner’s breath audible over the sound of crunching leaves and snapping branches.”
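If you write many prompts, it can help to assemble them from the same elements every time: shot type, subject and setting, camera movement, lighting and mood. The sketch below assumes a hypothetical `ShotPrompt` container (the field names are illustrative, not any product's API) and rebuilds the wide-shot example from earlier in this guide.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the prompt elements discussed above."""
    shot: str           # e.g. "A wide-angle shot"
    subject: str        # what the shot is of, including setting
    camera_motion: str  # how the camera moves during the clip
    lighting: str       # light quality, direction, colour
    mood: str           # the intended atmosphere

    def render(self) -> str:
        # Join the elements into plain sentences, in the order the sample prompts use.
        parts = [
            f"{self.shot} {self.subject}.",
            f"{self.camera_motion}.",
            f"The lighting is {self.lighting}.",
            f"The mood is {self.mood}.",
        ]
        return " ".join(parts)


prompt = ShotPrompt(
    shot="A wide-angle shot",
    subject="of a bustling city square at dusk",
    camera_motion="The camera slowly pans from left to right",
    lighting="soft, with warm hues reflecting off the glass windows",
    mood="vibrant yet calm",
)
print(prompt.render())
```

A fixed template keeps every prompt covering the same elements; you can still edit the rendered text by hand to add detail such as the tilt in Scenario 1.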
On-screen movement: achieving natural movement
General principles
In general, you are likely to see the most success when prompting for movement in just one or two key elements of your clip. Prompting for movement in too many elements at once can result in unwanted or unnatural movement, distortions and other artifacts.
If, for example, you are prompting for movement of your subject and using a start frame, it is critical to use an image that already suggests movement (see the examples below). Combined with a prompt that describes that movement, such an image is much more likely to be interpreted effectively by the model.
By contrast, an image of a clearly static subject is very unlikely to produce natural movement.
In this image, the paws and overall stance of the kitten are suggestive of movement:
Again, this image is suggestive of movement both in terms of the stance of the subject and the fact that both shoes are partially off the ground:
Prompting for on-screen movement
To get natural, intentional on-screen motion, the phrasing of your prompt matters as much as the visual cues. Here are some best practices:
Be specific: Describe the action clearly and in detail. For example, instead of "a person dances", you might say "a person twirls with arms raised and then takes a step forward".
Be precise about what moves: State exactly what should move and how. For example, "A man lifts a coffee cup and takes a sip" will yield more effective results than "A man drinks".
Anchor motion in a sequence: Break the action into brief, specific steps, e.g. "The child picks up the toy, turns and runs".
Use spatial references: Indicate the direction of movement and reference the surrounding environment, e.g. "the dog runs across the field from left to right".
Avoid vague language: Phrases like "energetic scene" or "lots of motion" are unlikely to be interpreted well by the model and thus unlikely to produce coherent motion. Similarly, language like "The scene is dynamic" is likely to yield unpredictable or underwhelming results.
Pair your prompt with your image: When using a start frame, make sure the image and the described motion in the prompt support one another.
Frame the movement with context: For example, "On a windy city street, a cyclist weaves through traffic" adds both environmental motion and narrative cues.
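A simple pre-flight check can catch some of these issues before you submit a prompt. The sketch below uses two hypothetical helpers (not part of any video-generation product): `find_vague_phrases` flags the vague phrases quoted in this guide, and `sequence_motion` joins brief, ordered steps into a single action sentence. Substring matching like this will miss paraphrases, so treat it as a reminder rather than a validator.

```python
# Phrases this guide flags as too vague to produce coherent motion.
# Illustrative only; extend the list with phrases you find unreliable.
VAGUE_PHRASES = ["energetic scene", "lots of motion", "the scene is dynamic"]


def find_vague_phrases(prompt: str) -> list[str]:
    """Return the flagged phrases that appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


def sequence_motion(subject: str, steps: list[str]) -> str:
    """Join brief, ordered steps into one sentence, e.g. 'The child picks up the toy, turns and runs.'"""
    if not steps:
        raise ValueError("at least one step is required")
    if len(steps) == 1:
        actions = steps[0]
    else:
        actions = ", ".join(steps[:-1]) + " and " + steps[-1]
    return f"{subject} {actions}."


print(sequence_motion("The child", ["picks up the toy", "turns", "runs"]))
print(find_vague_phrases("An energetic scene with lots of motion"))
```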