Overview
Voice lets users speak with your Delphi in real time, right in the browser. One clear 2-minute recording can take them from typed chats into natural conversation. People open up faster when they can talk, so you capture richer insights while delivering the personal touch that fuels Delphi’s mission of democratizing mentorship.
Why it matters
Feels human — Users express ideas more freely when they can just talk.
Boosts engagement — Calls last longer than text chats, deepening trust.
Works everywhere — Voice calling is included in every plan and runs in any modern browser—no downloads needed.
Respects privacy — You control when voice is on or off and decide exactly what audio sample powers your Delphi.
Quick Start Guide
Navigate to the Voice page in Studio.
Click Start Recording or Upload File (2-minute WAV/MP3).
Speak for 2 minutes in a quiet room, steady tone.
Press Stop (▢) then Save.
Wait for processing; it can take a few minutes to upload fully.
Tweak settings (Stability, Similarity, Speed) under the gear (⚙️).
Toggle Voice On in the upper-right. A blue switch confirms callers can now reach you.
Total setup time: about 5 minutes.
Full Feature Guide
Voice Recording
Voice Recording captures the 2-minute sample that powers every call.
How it works
If you haven’t recorded yet, the page opens straight to the recorder. If you already have a sample and want a fresh one, click the plus (+) button in the top-right corner.
Choose one of two options:
Upload File
• Click Upload.
• Select your WAV/MP3 file.
• Click Upload again.
• Wait for the green check in the bottom right corner.
Start Recording to capture a new sample.
• Grant browser mic access when prompted.
• Speak clearly for 2 minutes—steady tone, no background noise.
• Press Stop (▢), then Save.
• Wait for the green check in the bottom right corner.
When you're done, click the X in the top left corner to close the screen.
Tips for a clean take
Record in a silent room; turn off fans and notifications. Make sure there's no background noise.
Hold volume steady; avoid sudden laughs or whispers.
A single, high-quality sample outperforms many mixed clips.
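If you plan to upload a WAV file, you can check its length before you open Studio. The sketch below is a minimal example that uses only Python's standard library. The 10-second tolerance and the function names are assumptions made for illustration; Delphi does not publish an exact cutoff in this guide.

```python
import wave

TARGET_SECONDS = 120     # the 2-minute sample length recommended in this guide
TOLERANCE_SECONDS = 10   # assumption: how far off still counts as "about 2 minutes"


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_about_two_minutes(seconds: float) -> bool:
    """True when the duration is within the assumed tolerance of 2 minutes."""
    return abs(seconds - TARGET_SECONDS) <= TOLERANCE_SECONDS
```

For example, `is_about_two_minutes(wav_duration_seconds("my_sample.wav"))` tells you whether a hypothetical file named `my_sample.wav` is close to the recommended length. MP3 files need a third-party library and are not covered here.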
Voice Settings
To adjust voice settings for every call, click the gear (⚙️) in the top-right corner of the Voice page. These global settings apply to all live voice and video sessions.
Want to experiment first? Open Voice Playground. The same sliders appear under the Generate tab so you can test scripts—ad reads, podcasts, book excerpts, or things your Delphi would actually say on a live call—without touching your universal settings. They only become permanent if you choose Apply settings to my Delphi.
Setting | Low end of the slider means… | High end of the slider means… | Raise for… | Lower for… |
Stability (0 – 100 %) | Highly animated voice: big swings in pitch and volume, from intense delivery to whispers. | Locked-in broadcaster tone: more monotone, same volume throughout. | Long reads that need consistency. | Role-play or emotional storytelling. |
Similarity (0 – 100 %) | Studio-polished synthesis: background hiss removed, quirks smoothed out. | Carbon copy of your raw sample: every breath, accent edge, and mic artifact preserved. | Brand voice that must sound exactly like you. | Noisy or low-quality original sample. |
Speed (0.7× – 1.2×) | 0.7× = 30 % slower for clarity. | 1.2× = 20 % faster for snappy updates. | Quick Q&A sessions. | Dense explanations, language learners. |
Quick rules of thumb
Stability: Drop by 10 % if calls feel flat; raise if voice sounds chaotic.
Similarity: Keep below 60 % unless your sample was studio-grade quiet.
Speed: Adjust in 0.05 steps to avoid sounding rushed or sluggish.
Click Reset Settings anytime for a clean slate: 50% stability, 75% similarity, 1x speed.
A green check in the bottom right corner confirms all changes.
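If you keep notes on the settings you try, it can help to write them down in a form that catches out-of-range values. The sketch below is a minimal example, not a Delphi API: `VoiceSettings` and `playback_seconds` are hypothetical names, and only the ranges and reset defaults come from this guide.

```python
from dataclasses import dataclass

# Slider ranges from the settings table in this guide.
RANGES = {
    "stability": (0, 100),
    "similarity": (0, 100),
    "speed": (0.7, 1.2),
}


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical record of the three sliders; defaults match Reset Settings."""
    stability: float = 50
    similarity: float = 75
    speed: float = 1.0

    def __post_init__(self):
        # Reject any value outside the documented slider range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")


def playback_seconds(normal_seconds: float, speed: float) -> float:
    """Estimate how long a read takes at a given speed multiplier."""
    return normal_seconds / speed
```

`VoiceSettings()` gives the reset defaults, and `VoiceSettings(speed=1.5)` raises a `ValueError` because 1.5 is above the 1.2× maximum. As a rough guide to the Speed slider, a read that takes 60 seconds at 1× takes about 50 seconds at 1.2× and about 86 seconds at 0.7×.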
Voice Model — "Which engine?"
Choose the engine that shapes accent and tone most to your liking:
Default — Balanced clarity; smooths minor accent edges.
For Accents 1 — Most accurate accent reproduction, but a bit slower than For Accents 2.
For Accents 2 — The faster of the two accent models; keeps overall accent flavor but may miss fine nuances.
Try this: Switch the model first, then nudge Stability or Similarity in 10-point steps to zero in on your perfect sound.
Custom pronunciations — “Say it my way”
Make your Delphi say names and jargon exactly the way you do.
Click Open.
Click Add a word.
Enter the word (e.g., “Delphi”).
Spell it phonetically (e.g., “DEL-f-eye”).
Hit Add. Look for the green check in the bottom right corner.
Test it in Voice Playground; tweak spelling if it sounds off.
Delete any entry by clicking the trash can, then look for the green check in the bottom right corner.
Tip — Test in a live call: The Playground is great for quick spot-checks, but the truest preview is to start an actual voice call with your Delphi. Live calls mirror exactly what your audience will hear.
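One way to picture what a pronunciation entry does is as a whole-word swap applied to the script before it is spoken. The sketch below illustrates that idea only. It is an assumption about the general technique, not a description of how Delphi implements the feature, and `apply_pronunciations` is a hypothetical name.

```python
import re


def apply_pronunciations(text: str, entries: dict[str, str]) -> str:
    """Replace each listed word with its phonetic spelling (whole words only)."""
    if not entries:
        return text
    # Case-insensitive lookup; longer words first so they win over shorter ones.
    lookup = {word.lower(): phonetic for word, phonetic in entries.items()}
    words = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

With the entry from the steps above, `apply_pronunciations("Ask Delphi today.", {"Delphi": "DEL-f-eye"})` returns `"Ask DEL-f-eye today."`. Because the match is on whole words, a longer word that merely contains "Delphi" is left alone.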
Voice Playground
Voice Playground lets you audition scripts—podcasts, ads, book passages—without changing your live settings until you decide.
What you can do
Load up to 5,000 characters. Paste any text for instant playback.
Switch models quickly. Try Default vs. Accents 1 / 2 for side-by-side comparison. See Voice Model under Voice Settings for an explanation of each model.
Generate unlimited samples. Each click delivers a fresh take, even with identical settings. It's just like how typing the same message into ChatGPT will generate a new answer every time.
Download as MP3. Keep clips for podcast inserts or marketing teasers.
Apply settings to Delphi. Happy with a sample? One click makes those settings global.
View History. See every sample, its exact settings, and play, archive, rename, or delete.
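Scripts longer than the 5,000-character limit need to be pasted in parts. The sketch below shows one possible way to split a long script at sentence boundaries so each part fits. `split_script` is a hypothetical helper, and the sentence-splitting rule is a simple assumption that may not suit every text.

```python
import re

MAX_CHARS = 5000  # Playground limit stated in this guide


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be pasted into Generate on its own. Cutting at sentence ends keeps each generated clip from stopping mid-thought, which makes the clips easier to stitch together afterwards.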
Step-by-step
Open Voice Playground on the Voice page.
In Generate, paste or type your script.
Adjust sliders or choose a Voice Model.
Click Generate; audio auto-plays in a black box.
Inside the box, choose:
Download (⬇︎) to download an MP3.
Apply settings to my Delphi to make them live.
X out (✖︎) or Close to discard.
Switch to History for past work:
Click any sample card to drop down its exact Stability, Similarity, and Speed.
Rename — hit three dots (⋯) ➞ Rename, type a new title, then Rename. Wait for the green check in the bottom right corner.
Archive — three dots (⋯) ➞ Archive ➞ confirm Archive. Wait for the green check in the bottom right corner.
Replay — press play (▶️); the black box re-appears for listening.
Download that sample.
Cross-tab shortcut: keep the black box open, switch back to Generate, and you can still hit Apply settings to my Delphi.
Pro tip: Use Playground for quick trials, but always finish with a live call to confirm how users will hear the voice in real time.
Pro Voice Upgrade
Give your Delphi a studio-grade presence with a richer, more lifelike sound.
What it does
Creates a dedicated voice model trained exclusively on your recordings. The default voice options, by contrast, rely on a shared model and match your voice to the prior data that model was trained on.
Uses a larger 30-minute training sample (10 min minimum) for finer vocal detail.
Adds advanced noise reduction and higher-fidelity synthesis for a studio-polished sound.
Delivers consistent quality across long calls or streamed content.
How to upgrade
Under the Pro Voice banner, click Upgrade Now.
Click Continue to Payment and complete checkout ($150 / month).
After payment, you'll need to send a sample to support@delphi.ai:
Record or upload at least 10 minutes (30 minutes ideal).
Follow the same “quiet room, steady tone” rules—but aim for studio quality.
We'll take care of everything else for you, and let you know when it's done!
Recording guidelines for best results. If you're recording from scratch or in a studio, consider this setup:
Use an XLR mic + interface (e.g., AT-2020 or Rode NT1 with Focusrite).
Keep volume steady at −23 dB to −18 dB RMS, peaks below −3 dB.
Stay two fists from the mic and use a pop filter.
Minimize echoes with simple foam panels, blankets, or a closet booth.
Trim silences & filler words before uploading if you want a polished tone.
Tip: If it still doesn't sound right, contact support@delphi.ai and we'll work with you to get it to the quality you want!
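To check a recording against the level targets above before you send it, you can measure its RMS and peak levels yourself. The sketch below is a minimal example using only Python's standard library. It assumes a 16-bit PCM WAV file and treats the dB figures as relative to digital full scale, which this guide does not state explicitly. The function names are hypothetical.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # largest magnitude of a 16-bit sample


def _to_db(ratio: float) -> float:
    """Convert an amplitude ratio to decibels; silence maps to -infinity."""
    return 20 * math.log10(ratio) if ratio > 0 else float("-inf")


def measure_levels(path: str) -> tuple[float, float]:
    """Return (rms_db, peak_db) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("file contains no audio")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    peak = max(abs(s) for s in samples) / FULL_SCALE
    return _to_db(rms), _to_db(peak)


def levels_ok(rms_db: float, peak_db: float) -> bool:
    """Check the targets from this guide: RMS -23 to -18 dB, peaks below -3 dB."""
    return -23.0 <= rms_db <= -18.0 and peak_db < -3.0
```

Calling `levels_ok(*measure_levels("pro_sample.wav"))` on a hypothetical file gives a quick pass or fail. The measurement covers the whole file, so long silences pull the RMS figure down; trim them first if you want the number to reflect your speaking level.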
Further Reading
Best Practices
Master these habits to capture a clear, authentic voice that delights callers.
Pick a silent space. Turn off fans, apps, and notifications before you hit Record. Make sure you can't hear any other voices.
Use one strong sample. A single, steady 2-minute take beats many mixed clips.
Hold tone & volume. Keep your mouth two fists from the mic and speak evenly.
Upgrade your gear when possible. A USB mic works, but an XLR mic plus interface lifts quality further. At the very least, avoid Bluetooth devices and don't record your audio through call platforms like Zoom.
Test in a live call. The quickest way to hear exactly what users will experience.
Solid inputs give Delphi a voice that feels personal and keeps conversations flowing.
Troubleshooting/FAQs
Why do I hear background hiss or static?
Background hiss or static in your Delphi calls usually comes from the original sample: the room was noisy, or the microphone was noisy or low quality.
Re-record in a quieter space and drop Similarity below 60 %. If your current mic is the problem, try your computer's or phone's built-in microphone and see whether results improve.
See Tips for a clean take and Best Practices for more information.
Why does the voice sound flat or robotic?
A flat or robotic voice usually has one of two causes:
• Too many mixed samples. Multiple clips with different mics or background noises can create robotic-sounding outputs. Stick to one clean 2-minute take.
• Stability set too high. Lower Stability by 10-20 % and test again in a live call. See Voice Settings for more information.
Why is my accent washed out?
Your accent is likely washed out because you're still on the Default model. Switch to For Accents 1; if it’s still weak, try For Accents 2. Keep adjusting the settings while you test different models. See Voice Model for more information.
Please note that for some accents, even with these multiple models, it might be necessary to purchase Pro Voice. We hope that our built-in voice offering can get you where you need to be, but there is only so much the instant model can do.
How do I get it to pronounce names correctly?
To get your Delphi to pronounce names correctly, use custom pronunciations. Add the word, then spell it phonetically the way you want it said. See Custom pronunciations for more information.
Will this voice also identify me when I upload audio files?
Yes, this voice sample is also what will be used to identify your voice when you upload audio or video files. In other words, the same voice sample trains Delphi to separate your voice from others in any audio you upload.
Is this the voice people hear during video calls?
Yes, this is the voice that people hear during video calls as well. Voice and video calls both use the same settings and same sample, so they'll sound the same.
Is this the voice people hear when they click "read aloud" in chat conversations?
Yes, this is the voice that people will hear when they click "read aloud" in chat conversations.
Why does my voice sound different on a live call versus the Playground or Read Aloud?
Playground clips are “experimental,” rendered offline and a bit slower, so they can use looser settings that change each time. Calls must stream in real time, so Delphi adds extra stability for speed and consistency.
To hear the truest result, tweak settings and then start a live call. Use Playground only for creative reads, ads, or long scripts.
Why do the Playground and the Read Aloud function generate a slightly different take every time?
The Playground and Read Aloud generate a slightly different take every time, even with identical settings, because both run on generative AI text-to-speech technology. Each time you click Generate or Read Aloud, Delphi samples tiny shifts in pitch, timing, and energy, like rolling fresh dice inside the same rules. That touch of randomness keeps the voice from sounding canned, but it also means no two clips are identical.
More broadly, this is how most generative AI tools work. It's the same reason ChatGPT won't give you the same answer to one question asked twice.