How to create a custom talking actor?

There are two processes, depending on the model you choose:

Process 1: For audio-driven models and the OmniHuman model

Process 2: For the Arcads 1.0 model

PROCESS 1: Using audio-driven or OmniHuman models

Step 1 — Generate the actor image (choose a model)

Before you can make a “talking actor,” you need a strong single portrait image (clean face, good lighting, sharp details). In the image generator, pick one of these:

Nano Banana Pro

Best when you want very accurate, “smart” images (good real-world understanding), strong editing, and reliable text rendering in images.

Seedream 4.5

Best when you need cinematic aesthetics, strong spatial reasoning, and especially consistent characters across multiple generations (great if you’re iterating a character look and want it to stay stable).

GPT Image 1.5

Best when you want very strong instruction-following and a tight prompt-to-image match, plus a solid edit workflow (generate + transform + edit). It’s a good “default” choice when you want the model to do exactly what you described.

Quick pick

Want “most controllable prompt fidelity”? → GPT Image 1.5
Want “best cinematic look + consistency across variations”? → Seedream 4.5
Want “smart editing + strong text/precision + all-around reliability”? → Nano Banana Pro
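
If you script your workflow, the quick pick above can be written as a small lookup table. This is only an illustrative sketch: the priority keys and the function name pick_image_model are hypothetical, and the fallback follows the note above that GPT Image 1.5 is a good default.

```python
# Hypothetical lookup table mirroring the quick-pick list.
# The keys are made-up labels for this sketch, not product settings.
QUICK_PICK = {
    "prompt_fidelity": "GPT Image 1.5",
    "cinematic_consistency": "Seedream 4.5",
    "editing_and_text": "Nano Banana Pro",
}


def pick_image_model(priority: str) -> str:
    """Return the suggested image model for a priority.

    Falls back to GPT Image 1.5, which the guide describes as a good default.
    """
    return QUICK_PICK.get(priority, "GPT Image 1.5")


print(pick_image_model("cinematic_consistency"))
```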

How to prompt the perfect actor image

Use a prompt that locks down identity + camera framing.

Prompt template

Subject: age range, ethnicity (optional), hairstyle, wardrobe
Framing: “front-facing medium close-up” / “head-and-shoulders”
Lighting: “soft key light, natural skin texture”
Background: “plain studio background” (keeps attention on the face)
Style: “photorealistic” (recommended for talking actors)
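
If you generate many actors, it may help to assemble the prompt from the template fields so that framing, lighting, background, and style stay fixed while only the subject changes. The sketch below is one possible way to do that in plain Python. The ActorPrompt class and its field names are assumptions made for illustration, and the result is just a text string to paste into the image generator.

```python
from dataclasses import dataclass


@dataclass
class ActorPrompt:
    """Hypothetical helper that mirrors the prompt template fields."""

    subject: str  # age range, hairstyle, wardrobe, and so on
    framing: str = "front-facing head-and-shoulders portrait"
    lighting: str = "soft key light, natural skin texture"
    background: str = "plain studio background"
    style: str = "photorealistic"

    def render(self) -> str:
        # Join the non-empty fields into one comma-separated prompt string.
        parts = [self.style, self.framing, self.subject,
                 self.lighting, self.background]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = ActorPrompt(
    subject="confident presenter in her 30s, shoulder-length hair, navy blazer"
).render()
print(prompt)
```

Keeping the defaults unchanged and varying only the subject is one way to keep a series of actor portraits visually comparable.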

Example prompt

"Photorealistic head-and-shoulders portrait of a confident presenter, front-facing, neutral background, soft studio lighting, sharp focus on eyes, natural skin texture, 35mm lens look, minimal shadows, high detail"

Result

GPT Image 1.5 image ->

Nano Banana Pro image ->

Seedream 4.5 image ->

Tip: avoid heavy motion blur, extreme angles, hands covering the face, or busy backgrounds. Clean facial visibility makes the talking result look more believable.