A True Experience Using Text to Video to Tell Ghost Stories | Hey Subtitle Cantonese AI Test
Updates · 2026-04-18

You think this is a video. It's actually just text.

Lights off. Headphones on.

"That feeling—like someone's standing right behind you... but you don't dare turn around."

Within ten seconds, you're already tense.

You see a shadow at the end of the corridor, you hear footsteps closing in—yet you know you're just watching a YouTube Short.

But actually, even the "video" doesn't exist.

—These three episodes of *Abandoned Places: The Haunting Anthology* were all generated from a single Cantonese text script.


We invited HK urbex explorer HK Urbexman to share three of his most unforgettable paranormal experiences.

One condition only:

No re-shooting. No returning to the sites.

He handed us three blocks of text.

Then we used Hey Subtitle—turning text into voice, into visuals, into complete videos.

The result: what you're seeing below.


Episode 1: Wan Chai "Bus Tycoon" Mansion — The Headless Spirit

Story excerpt

> "That feeling—like someone's standing right behind you... but you don't dare turn around."

> "When I was editing this footage, I started getting a headache. The more I watched, the worse it felt."

Why this combination scares you

  • Cover composition: Dim corridor, a white ghostly figure at the far end, light slanting through the window, red carpet—a single image sets the "classic haunted mansion" atmosphere
  • Cantonese voice: A line like "someone's standing right behind you" loses its edge in Mandarin; Cantonese voice preserves the texture, and lands on the heart in one breath
  • Rhythm pauses: The pause markers in the script—placed mid-sentence at "but you... don't dare turn around"—create a breath of suspense that no silent reader can feel

When image, sound, and silence synchronize, the audience isn't "seeing a ghost"—they've stepped into the narrator's shoes.


Episode 2: Lantau Abandoned Clubhouse — The Guarding Dog Spirit

Story excerpt

> "It had been dead for a long time. Body dried out, limbs stretched straight, head tilted toward the door—as if it had been waiting there for its owner to come back."

> "I looked at it, and a thought suddenly rose in me—'It's... still here.'"

Why this combination scares you

  • Cover composition: Worn red carpet, stained leather sofa, smoke rising from the dog's body, scattered cloth and petals—one look and you read "this dog is still here"
  • Cantonese voice: A phrase like "it's... still here" only carries that "half-sorrowful, half-eerie" quality when spoken in Cantonese
  • Uncanny imagery: The atmosphere of an animal spirit simply cannot be captured on a phone camera. An AI-generated image delivers the "between grief and dread" composite, synchronized with the narration

The result: the audience isn't afraid of the dog. They're afraid of the *lingering* itself.


Episode 3: Hong Kong Underground Bomb Shelter — The Extra Footsteps

Story excerpt

> "I stopped—it stopped too. I walked faster—so did it."

> "My flashlight showed... just an empty corridor. No one there. But the footsteps were still there."

Why this combination scares you

  • Cover composition: Damp concrete corridor, protagonist aiming a flashlight, shadowy claws faintly hanging from the ceiling—urbex viewers immediately recognize the scene
  • Cantonese voice: This episode's horror rests entirely on sound—"an extra pair of footsteps." Voice delivering "I stopped—it stopped" with tight sync is what makes it work
  • First-person POV: The visual can recreate the "flashlight beam illuminating an endless corridor" immersion—paired with narration, the viewer feels they're in the corridor too

In this episode, Text to Video achieves what live shooting cannot: turning sound into the protagonist.


Why Text to Video Makes Words So Powerful

Break these three episodes apart, and Text to Video gets four things right:

1. Voice is the soul

The same line, read in mechanical Mandarin TTS, is information. Read in Cantonese voice, it's atmosphere. Hey Subtitle's Cantonese voice preserves the texture—pauses, tone, rhythm—that neither plain text nor static images can reach.

2. Images fill in the scene

"I won't go back into that room." "That bomb shelter is too dangerous to return to." The narrator can't reshoot, but viewers still need a visual to step into the scene. Upload an AI-generated atmosphere image as the background: no footage required, and the story still has visuals.

3. Pauses build suspense

Place a pause marker at a critical beat like "...but—no one there." Tension doubles. Plain-text readers can't feel it.

4. All three in sync

Image transitions, voice rise and fall, pauses—they play together on one timeline. This is where Text to Video surpasses "dub and edit": you don't need editing skills. The text's rhythm drives the visual rhythm automatically.
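
For the curious, here is a rough sketch of how "text rhythm drives the timeline" could work in principle. It is not Hey Subtitle's actual implementation. The `[pause:SECONDS]` marker syntax, the `Cue` type, the `build_timeline` function, and the fixed speaking rate are all assumptions made up for illustration; a real system would take speech durations from the synthesized audio.

```python
import re
from dataclasses import dataclass

# Hypothetical marker syntax (an assumption, not the product's real format):
# "[pause:0.8]" means 0.8 seconds of silence.
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)\]")


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    kind: str     # "speech" or "pause"
    text: str = ""


def build_timeline(script: str, chars_per_second: float = 5.0) -> list[Cue]:
    """Turn a script with pause markers into cues on a single timeline.

    Speech length is estimated from character count at an assumed rate.
    """
    cues: list[Cue] = []
    clock = 0.0
    # With one capture group, re.split alternates: text, seconds, text, ...
    for i, part in enumerate(PAUSE_RE.split(script)):
        if i % 2 == 0:
            text = part.strip()
            if not text:
                continue  # nothing spoken between two markers
            duration = len(text) / chars_per_second
            cues.append(Cue(clock, clock + duration, "speech", text))
        else:
            duration = float(part)
            cues.append(Cue(clock, clock + duration, "pause"))
        clock += duration
    return cues


if __name__ == "__main__":
    for cue in build_timeline("I stopped.[pause:1.0]It stopped too."):
        print(f"{cue.start:5.1f}s to {cue.end:5.1f}s  {cue.kind:6}  {cue.text}")
```

Because every cue carries a start and end time on the same clock, image transitions could be keyed to those same timestamps, which is the idea behind letting the script set the pace instead of editing by hand.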


How It's Done — Four Steps

  1. Paste your Cantonese script—use colloquial speech directly, no need to convert to formal writing
  2. Pick a voice—male/female, with emotion tags to set the mood
  3. Choose a background: Default background (Hey Subtitle built-in) or Custom upload (your own atmosphere image; JPG/PNG/HEIC/WEBP supported)
  4. One-click output—generates both video and audio files, ready for YouTube Shorts / IG Reels / TikTok

Closing

You thought "stories without footage" could only be told in text.

But when text, image, and Cantonese voice unite—the audience's hair stands on end. This isn't editing. This is recreation.

The three haunting episodes are a demo, not the endpoint. Your own story may be even more worth recreating.

Try Text to Video Now → (15 free minutes on signup, plus 5 free minutes every month)