4 New Lifelike AI Voices: Voiceover Workflow + Text-to-Speech Tips
Tips · 2026-05-12

Why does someone else's AI voiceover sound full of soul and feeling, while yours sounds like a robot reading a script — broken, lifeless? The difference isn't the tool. It's three things: picking the right voice, writing the right script, and using punctuation properly. We just added 4 new lifelike voices, and put together the full workflow from script to finished video, with scriptwriting tips that bring your characters to life.

Meet the 4 New Lifelike Voices

Sora (Cantonese, Female)

Fly (Cantonese, Male)

Xiao Q (Mandarin, Female)

Zhen Dong (Mandarin, Male)

Want to preview all 17 voices? Head over to the Voice Studio and listen instantly.

From Text to Video: The Full Voiceover Workflow

Just three decisions to go from script to finished piece.

Decision 1 — Pick the Voice

A. Use Your Own Voice (Voice Cloning)

Upload a 30-second to 3-minute voice sample, and we generate a personal voice model. Every future script outputs in your own voice. Best for personal brands, long-running YouTubers, podcasters. Read the Voice Cloning Complete Guide or try the Voice Clone Demo directly.

B. Pick a Professional Voice

The 4 new voices plus 13 existing ones make 17 professional voiceover voices in total, covering Cantonese, Mandarin, English, and Japanese. Pick and go — no training required.

Decision 2 — Pick the Output Format

A. Text to Speech (.mp3 audio)

Pure audio output. Bring your own visuals (vlog, b-roll, tutorial screen recording) and drop the mp3 into Premiere, Final Cut, or CapCut.

B. Text to Video (.mp4 video)

We auto-generate captions and a visual backdrop — output is a finished video ready to upload, perfect for shorts and social reels. See The Real Story of Making Ghost Tales with Text to Video.

Decision 3 — Where It Goes

  • Podcast audio
  • YouTube voice-over (narration / b-roll dub)
  • Tutorial videos
  • Ad and brand shorts

Key point: none of these use cases need lip-sync. The voiceover sits as an audio layer on top of the visuals — it's not a talking-head video, so AI voiceover is a perfect fit.

Why Not Just Record Yourself?

Recording yourself | Hey Subtitle AI voiceover
Forgetting lines, fumbling words, hoarse throat | One take, no retakes
Re-record every script revision | Regenerate instantly when text changes
Edit out every 'um' and 'uh' | Not needed
Mic, quiet room, time blocked off | Just your computer
Fatigue after many takes | Zero emotional drain

Scriptwriting Tips: Bringing Soul into Every Line

AI voiceover isn't a plug-and-play, one-take success. Each voice has its own personality and rhythm, and knowing how to write for it is what unlocks its full potential.

Punctuation is the emotion switch

A comma (,) adds a pause, a period (.) gives breathing room, a question mark (?) lifts the ending tone, an exclamation mark (!) adds emphasis, and a tilde (~) adds a playful drag. Same sentence, different punctuation, completely different feel.

Spacing controls rhythm

Spaces between words signal slight pauses to the voiceover engine. Add space before and after key terms to spotlight important info.
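If you prepare scripts in bulk, the spacing tip can be applied with a small helper. This is only a rough sketch: the function name `spotlight` and the sample line are made up for illustration, and it assumes the engine treats a plain space as a slight pause, as described above. It matters most for Cantonese and Mandarin scripts, where words are not normally separated by spaces.

```python
import re

def spotlight(script: str, key_terms: list[str]) -> str:
    """Return the script with exactly one space on each side of every key term."""
    for term in key_terms:
        # Match the term together with any spaces or tabs already around it,
        # so running the helper twice does not stack extra spaces.
        pattern = r"[ \t]*" + re.escape(term) + r"[ \t]*"
        # A function replacement keeps any backslashes in the term literal.
        script = re.sub(pattern, lambda _m, t=term: f" {t} ", script)
    return script.strip()

# Hypothetical promo line: "today's offer, buy one get one free".
print(spotlight("今日優惠買一送一", ["買一送一"]))  # → 今日優惠 買一送一
```

English text that is already spaced comes back unchanged, so the helper is safe to run on mixed-language scripts.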

Paragraph breaks trigger deep breaths

Split long paragraphs across lines and the engine automatically inserts deeper breathing pauses, which makes longer passages sound more natural.
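For long scripts, the paragraph-break tip can be automated before you paste the text in. The sketch below is an illustration under assumptions, not part of the product: `split_paragraph` is a made-up name, it assumes a blank line counts as a paragraph break (change `separator` if a single newline is enough), and its sentence detection is deliberately naive, so decimals such as "3.5" would be split.

```python
import re

# A sentence is a run of text ending in one or more terminators (English or CJK).
# The second alternative catches trailing text that has no terminator.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]+|[^.!?。！？]+$")

def split_paragraph(text: str, max_sentences: int = 2, separator: str = "\n\n") -> str:
    """Break one long paragraph into shorter ones of at most max_sentences each."""
    sentences = [s.strip() for s in SENTENCE.findall(text) if s.strip()]
    chunks = [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
    return separator.join(chunks)

print(split_paragraph("One. Two! Three? Four."))
```

With the default settings this prints "One. Two!" and "Three? Four." as two paragraphs separated by a blank line. Try a few values of `max_sentences` and listen to which pacing suits the voice.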

Comparison demo

  • Flat: The weather is so nice today let's go hiking
  • With punctuation: The weather is so nice today, let's go hiking!
  • Advanced: The weather~ is so nice today! Let's... go hiking!
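To catch "flat" lines like the first one before generating audio, a quick check can flag long lines that contain none of the pause marks covered above. This is a hedged sketch: `find_flat_lines` and the 8-word threshold are arbitrary choices for illustration, and counting words by spaces only works for space-separated languages such as English.

```python
# Marks that the tips above describe as shaping pauses and tone.
PAUSE_MARKS = set(",.?!~…")

def find_flat_lines(script: str, max_words: int = 8) -> list[str]:
    """Return lines longer than max_words that contain no pause mark at all."""
    flat = []
    for line in script.splitlines():
        words = line.split()
        has_pause = bool(PAUSE_MARKS & set(line))
        if len(words) > max_words and not has_pause:
            flat.append(line.strip())
    return flat

script = (
    "The weather is so nice today let's go hiking\n"
    "The weather is so nice today, let's go hiking!"
)
print(find_flat_lines(script))  # only the first, unpunctuated line is flagged
```

A flagged line is not necessarily wrong. It is just a candidate for a comma, an exclamation mark, or a tilde before you spend minutes on it.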

The free minutes are there so you can write several versions, compare them, and pick the most natural one as your final take. Each try teaches you more about each voice's personality.