Overview

Voice lets users speak with your Delphi in real time, right in the browser. One clear 2-minute recording can take them from typed chats into natural conversation. People open up faster when they can talk, so you capture richer insights while delivering the personal touch that fuels Delphi’s mission of democratizing mentorship.

Why it matters

Feels human — Users express ideas more freely when they can just talk.
Boosts engagement — Calls last longer than text chats, deepening trust.
Works everywhere — Voice calling is included in every plan and runs in any modern browser—no downloads needed.
Respects privacy — You control when voice is on or off and decide exactly what audio sample powers your Delphi.

Quick Start Guide

Navigate to the Voice page in Studio.
Click Start Recording or Upload File (2-minute WAV/MP3).
Speak for 2 minutes in a quiet room, steady tone.
Press Stop (▢) then Save.
Wait for processing; the upload can take a few minutes to finish.
Tweak settings (Stability, Similarity, Speed) under the gear (⚙️).
Toggle Voice On in the upper-right. A blue switch confirms callers can now reach you.

Total setup time: about 5 minutes.

Full Feature Guide

Voice Recording

Voice Recording captures the 2-minute sample that powers every call.

How it works

If you haven’t recorded yet, the page opens straight to the recorder. If you already have a sample and want a fresh one, click the plus (＋) button in the top-right corner.
Choose one of two options:
1. Upload File
  • Click Upload.
  • Select your WAV/MP3 file.
  • Click Upload again.
  • Wait for the green check in the bottom right corner.
2. Start Recording
  • Grant browser mic access when prompted.
  • Speak clearly for 2 minutes in a steady tone, with no background noise.
  • Press Stop (▢), then Save.
  • Wait for the green check in the bottom right corner.
When you're done, click the X in the top left corner to close the screen.

Tips for a clean take

Record in a silent room; turn off fans and notifications.
Hold volume steady; avoid sudden laughs or whispers.
Keep all samples in a single language. Mixing languages confuses the model and degrades voice quality.

A single, high-quality sample outperforms many mixed clips.
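
If you plan to upload a file rather than record in the browser, you can check its length before you start. The sketch below reads a WAV file with Python's standard wave module and compares it to the 2-minute target from this guide. The 10-second tolerance and the function names are illustrative assumptions, not Delphi requirements, and MP3 files would need a third-party library.

```python
import wave

TARGET_SECONDS = 120     # the 2-minute sample length from this guide
TOLERANCE_SECONDS = 10   # hypothetical slack, not a Delphi rule


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_about_two_minutes(path: str) -> bool:
    """True when the file is within the tolerance of the 2-minute target."""
    difference = abs(wav_duration_seconds(path) - TARGET_SECONDS)
    return difference <= TOLERANCE_SECONDS
```

Call is_about_two_minutes("my_sample.wav") on your own file; a False result suggests trimming or re-recording before you upload.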

Enable Voice Toggle

The Voice Toggle is the master switch for live voice calling. It flips the Call button on or off without touching any other features.

What it does

Default ON. Call button appears as soon as your first recording finishes.
Hide / show Call button only. Toggle exclusively affects the ability to call—Read Aloud in text chat keeps working.
No data loss. If you turn off voice calling, your recordings and settings stay intact. Plus, you can continue to add audio and video content as training data for your mind even while voice is turned off.

Step-by-step

Open the Voice page.
Locate the toggle in the upper-right corner.
Turn Voice Off
- Slide the switch left.
- It turns gray and shows a green check “Voice calling disabled.”
Turn Voice On
- Slide the switch right.
- It turns blue and shows a green check “Voice calling enabled.”

Tips

Test before you publish. After big changes, place a quick live call to hear exactly what users will experience.
Use Off for focus. Pause calls during launches, events, or high-traffic times while text chat stays open.
Don’t delete recordings. If you need a break, toggle Voice off; your samples stay safe and ready for future use. But if you delete samples, you'll no longer be able to train audio or video content.

Voice Settings

To adjust voice settings for every call, click the gear (⚙️) in the top-right corner of the Voice page. These global settings apply to all live voice and video sessions.

Want to experiment first? Open Voice Playground. The same sliders appear under the Generate tab so you can test scripts—ad reads, podcasts, book excerpts, or things your Delphi would actually say on a live call—without touching your universal settings. They only become permanent if you choose Apply settings to my Delphi.

Setting	0 on the slider means…	100 on the slider means…	Raise for…	Lower for...
Stability (0 – 100 %)	Highly animated voice—big swings in pitch, loud/soft, intense, whispers.	Locked-in broadcaster tone—more monotone, same volume throughout.	Long reads that need consistency.	Role-play or emotional storytelling.
Similarity (0 – 100 %)	Studio-polished synthesis—background hiss removed, quirks smoothed out.	Carbon copy of your raw sample—every breath, accent edge, and mic artifact preserved.	Brand voice that must sound exactly like you.	Noisy or low-quality original sample.
Speed (0.7× – 1.2×)	0.7× = 30 % slower for clarity.	1.2× = 20 % faster for snappy updates.	Quick Q&A sessions.	Dense explanations, language learners.
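
To see what the Speed multiplier means for the length of a clip, here is a small worked calculation. It assumes the multiplier scales the speaking rate linearly, so the duration is the original length divided by the speed; the function name is hypothetical and the real synthesis may differ slightly.

```python
def playback_seconds(base_seconds: float, speed: float) -> float:
    """Length of a clip after applying a speed multiplier."""
    if not 0.7 <= speed <= 1.2:
        raise ValueError("speed must be between 0.7 and 1.2")
    # A higher multiplier means faster speech, so the clip gets shorter.
    return base_seconds / speed


for speed in (0.7, 1.0, 1.2):
    print(f"{speed:.1f}x -> {playback_seconds(60, speed):.1f} s")
```

Under that assumption, a 60-second read takes about 85.7 seconds at 0.7×, 60 seconds at 1×, and 50 seconds at 1.2×. Note that 30 % slower speech makes the clip roughly 43 % longer, not 30 % longer.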

Quick rules of thumb

Stability: Drop by 10 % if calls feel flat; raise if voice sounds chaotic.
Similarity: Keep below 60 % unless your sample was studio-grade quiet.
Speed: Adjust in 0.05 steps to avoid sounding rushed or sluggish.

Click Reset Settings anytime for a clean slate: 50% stability, 75% similarity, 1x speed.

A green check in the bottom right corner confirms all changes.
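
If you keep notes on the settings you try, it can help to model the sliders locally. The sketch below is a hypothetical Python model, not a Delphi API: it uses the ranges from the table, the Reset Settings defaults (50 % stability, 75 % similarity, 1× speed), and the 0.05 speed step from the rules of thumb. The class and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


def clamp(value, low, high):
    """Keep value inside the closed range [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical local model of the three sliders (not a Delphi API)."""

    stability: int = 50    # percent, 0 to 100
    similarity: int = 75   # percent, 0 to 100
    speed: float = 1.0     # multiplier, 0.7 to 1.2

    def adjusted(self, **changes) -> "VoiceSettings":
        """Return a copy with the changes applied and clamped to range."""
        merged = replace(self, **changes)
        # Snap speed to the nearest 0.05 step before clamping.
        snapped = round(round(merged.speed / 0.05) * 0.05, 2)
        return VoiceSettings(
            stability=int(clamp(merged.stability, 0, 100)),
            similarity=int(clamp(merged.similarity, 0, 100)),
            speed=clamp(snapped, 0.7, 1.2),
        )


DEFAULTS = VoiceSettings()  # same values as Reset Settings
# Rule of thumb: drop Stability by 10 points if calls feel flat.
flat_fix = DEFAULTS.adjusted(stability=DEFAULTS.stability - 10)
```

Because the class is frozen, every adjustment returns a new object, so you can keep DEFAULTS around as your clean slate while comparing variations.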

Voice Model — "Which engine?"

Choose the engine that shapes accent and tone most to your liking:

Default — Balanced clarity; smooths minor accent edges.
For Accents 1 — Most accurate accent reproduction, but a bit slower than For Accents 2.
For Accents 2 — The faster of the two accent models; keeps overall accent flavor but may miss fine nuances.

Try this: Switch the model first, then nudge Stability or Similarity in 10-point steps to zero in on your perfect sound.

Custom pronunciations — “Say it my way”

Make your Delphi say names and jargon exactly the way you do.

Click Open.
Click Add a word.
Enter the word (e.g., “Delphi”).
Spell it phonetically (e.g., “DEL-f-eye”).
Hit Add. Look for the green check in the bottom right corner.
Test it in Voice Playground; tweak spelling if it sounds off.
Delete any entry by clicking the trash can, then look for the green check in the bottom right corner.

Tip — Test in a live call: The Playground is great for quick spot-checks, but the truest preview is to start an actual voice call with your Delphi. Live calls mirror exactly what your audience will hear.

Voice Playground

Voice Playground lets you audition scripts—podcasts, ads, book passages—without changing your live settings until you decide.

What you can do

Load up to 5,000 characters. Paste any text for instant playback.
Tweak sliders on the fly. Stability, Similarity, and Speed appear in the Generate tab. See Voice Settings for an explanation of what these sliders do.
Switch models quickly. Try Default vs. For Accents 1 / 2 for side-by-side comparison. See Voice Model for an explanation of each model.
Generate unlimited samples. Each click delivers a fresh take, even with identical settings. It's just like how typing the same message into ChatGPT will generate a new answer every time.
Download as MP3. Keep clips for podcast inserts or marketing teasers.
Apply settings to Delphi. Happy with a sample? One click makes those settings global.
View History. See every sample, its exact settings, and play, archive, rename, or delete.
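
If your script is longer than the 5,000-character limit, you can split it into pieces before pasting. The sketch below is one way to do that in Python, breaking at sentence endings where possible. The function name and the sentence-splitting rule are assumptions for illustration; the Playground itself does not require any particular split.

```python
import re

MAX_CHARS = 5000  # Playground limit stated in this guide


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break after sentence-ending punctuation where possible; a
    single sentence longer than the limit is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Paste each returned chunk into the Generate tab in order, and keep the same settings across chunks so the takes sound consistent.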

Step-by-step

Open Voice Playground on the Voice page.
In Generate, paste or type your script.
Adjust sliders or choose a Voice Model.
Click Generate; audio auto-plays in a black box.
Inside the box, choose:
- Download (⬇︎) to download an MP3.
- Apply settings to my Delphi to make them live.
- X out (✖︎) or Close to discard.
Switch to History for past work:
- Click any sample card to drop down its exact Stability, Similarity, and Speed.
- Rename — hit three dots (⋯) ➞ Rename, type a new title, then Rename. Wait for the green check in the bottom right corner.
- Archive — three dots (⋯) ➞ Archive ➞ confirm Archive. Wait for the green check in the bottom right corner.
- Replay — press play (▶️); the black box re-appears for listening.
- Download that sample.
- Cross-tab shortcut: keep the black box open, switch back to Generate, and you can still hit Apply settings to my Delphi.

Pro tip: Use Playground for quick trials, but always finish with a live call to confirm how users will hear the voice in real time.

Pro Voice Upgrade

Give your Delphi a studio-grade presence with a richer, more lifelike sound.

What it does

Creates a dedicated voice model trained exclusively on your recordings. The default voice options instead rely on a shared model that matches your voice to the prior data it was trained on.
Uses a larger 30-minute training sample (10 min minimum) for finer vocal detail.
Adds advanced noise reduction and higher-fidelity synthesis for a studio-polished sound.
Delivers consistent quality across long calls or streamed content.

How to upgrade

On the Voice page, click the gear button.
Under the Pro Voice banner, click Upgrade Now.
The Add-ons screen opens; toggle Pro Voice, which is the first add-on → the switch turns orange.
Click Continue to Payment and complete checkout ($150 / month).
After payment, you'll need to send a sample to support@delphi.ai:
- Record or upload at least 10 minutes (30 minutes ideal).
- Follow the same “quiet room, steady tone” rules—but aim for studio quality.
We'll take care of everything else for you, and let you know when it's done!

Recording guidelines for best results: if you're recording from scratch or in a studio, consider this setup:

Use an XLR mic + interface (e.g., AT-2020 or Rode NT1 with Focusrite).
Keep volume steady at −23 dB to −18 dB RMS, peaks below −3 dB.
Stay two fists from the mic and use a pop filter.
Minimize echoes with simple foam panels, blankets, or a closet booth.
Trim silences & filler words before uploading if you want a polished tone.
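
To check the volume guideline before you send a sample, you can measure a file yourself. The sketch below is a rough check, assuming the figures above are relative to digital full scale (dBFS) and that the file is 16-bit PCM WAV. It measures the whole file, silences included, so treat the result as a guide rather than a verdict. The function names are hypothetical.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # largest magnitude of a 16-bit sample


def _to_dbfs(value: float) -> float:
    """Convert a linear sample magnitude to decibels relative to full scale."""
    return 20 * math.log10(value / FULL_SCALE) if value > 0 else float("-inf")


def measure_levels(path: str) -> tuple[float, float]:
    """Return (rms_dbfs, peak_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return _to_dbfs(rms), _to_dbfs(peak)


def levels_ok(rms_dbfs: float, peak_dbfs: float) -> bool:
    """Apply the guideline: RMS from -23 to -18 dB, peaks below -3 dB."""
    return -23.0 <= rms_dbfs <= -18.0 and peak_dbfs < -3.0
```

For example, rms, peak = measure_levels("pro_voice_sample.wav") followed by levels_ok(rms, peak) tells you whether the take falls inside the suggested window. Most audio editors report the same figures if you prefer not to run code.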

Tip: If it still doesn't sound right, contact support@delphi.ai and we'll work with you to get it to the quality you want!

Further Reading

Best Practices

How to get the best voice recording

Disabling calling

Troubleshooting/FAQs

Can I upload voice samples in different languages? Does Delphi support different accents for each language?

How do I turn off my voice or get rid of the ability for people to call my Delphi?

How do I turn on my voice or enable the ability for people to call my Delphi?

Why do I hear background hiss or static?

Why does the voice sound flat or robotic?

Why is my accent washed out?

How do I get it to pronounce names correctly?

Will this voice also identify me when I upload audio files?

Is this the voice people hear during video calls?

Is this the voice people hear when they click "read aloud" in chat conversations?

How do I turn off read aloud?

Why does my voice sound different on a live call versus the Playground or Read Aloud?

Why does the Playground and the Read Aloud function generate a slightly different take every time?