NotebookLM-style podcast clones
Two-host podcast where one AI voice asks questions and the other explains. Build the script, generate Host A and Host B with contrasting voices, splice. No subscription, no waitlist.
Generate multi-character conversations with 54 distinct voices. Free, commercial-use, no signup. Build NotebookLM-style podcasts, audiobook scenes, and animated dialogue in minutes.
Free tier: 5,000 characters/month
Most AI dialogue tools lock multi-speaker mode behind a paid plan. FreeTextoSpeech runs the same workflow for free — generate each character with a different voice from the 54-voice catalog, splice the WAVs in any audio editor, ship the scene. Podcasts, audiobooks, animated shorts, language drills, training scenarios. One pipeline, no subscription.
Related use cases
Write the script with speaker tags for your own clarity. Generate Character A's lines as one WAV with one voice (Sarah), Character B's lines as a second WAV with a contrasting voice (Adam). Stitch them in Audacity or Reaper, leave 200–400 ms gaps between turns, and bounce a single dialogue track. Free, commercial-use, 24 kHz WAV per voice.
Format every line as SPEAKER: line. Keep turns short (1–3 sentences) so handoffs feel natural. Strip the speaker tags before pasting — TTS reads them out loud otherwise.
Paste only Character A's lines, pick a voice (e.g. Sarah), generate, save the WAV. Repeat for Character B with a contrasting voice (e.g. Adam). Same speed for both unless one character is meant to be faster.
Drop both WAVs onto separate tracks in Audacity, Reaper, or Logic. Cut each turn to its own clip and slide them into conversational order. Leave 200–400 ms gaps between turns — silence sells the back-and-forth.
Bounce to one stereo WAV or MP3. Optional: pan Character A 15% left and Character B 15% right for clearer separation in headphones. Loudness-normalize to -16 LUFS and you have a publish-ready dialogue.
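The -16 LUFS target in the last step comes down to simple gain arithmetic once your editor or loudness meter reports the integrated loudness of the bounced mix. A minimal sketch, assuming you already have that reading; the function name and the example readings are hypothetical, and measuring LUFS itself is left to your DAW or meter:

```python
def gain_to_target(measured_lufs: float, peak_dbfs: float,
                   target_lufs: float = -16.0) -> tuple[float, float, bool]:
    """Return (gain in dB, linear factor, will_clip) for a uniform gain change."""
    gain_db = target_lufs - measured_lufs   # a uniform gain shifts loudness by the same dB
    factor = 10 ** (gain_db / 20)           # dB to linear amplitude multiplier
    will_clip = peak_dbfs + gain_db > 0.0   # the peak would pass full scale (0 dBFS)
    return gain_db, factor, will_clip


# Hypothetical readings: the mix measures -21.5 LUFS with peaks at -7.0 dBFS.
gain_db, factor, will_clip = gain_to_target(-21.5, -7.0)
print(f"apply {gain_db:+.1f} dB (x{factor:.2f}), clipping risk: {will_clip}")
```

If the clipping flag comes back true, a limiter or a lower target is the usual way out; which one fits depends on your editor.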
Voice every character in your storyboard or Twine prototype with a different Kokoro voice. 54 voices means an entire small cast without hiring VAs for the rough cut.
Narrate the prose with one voice (River or Daniel), then swap in distinct character voices for direct quotes. Listeners track who is speaking without a "she said" tag on every line.
Generate two-speaker dialogues in Spanish, French, Hindi, Japanese, Mandarin, and 4 more languages. Pair a male and female native voice for realistic A/B exchange drills.
Dialogue lives or dies on whether listeners can tell the two voices apart without thinking. These six pairings contrast on gender, accent, or character tone — pick one and start generating.
Host + expert guest
Best for: NotebookLM-style explainer podcasts. Sarah asks the questions, Adam delivers the authoritative answer. Most natural pairing for two-host US English shows.
Casual conversation
Best for: Lifestyle pods, friend-chat formats, café-scene dialogue in audiobooks. Both voices read warm, so the back-and-forth feels relaxed instead of formal.
Period drama or BBC-style
Best for: Audiobook scenes set in the UK, prestige documentary two-handers, history podcasts with a host-and-narrator structure. The matched accent keeps the world consistent.
Animation / game characters
Best for: Animated shorts, Twine prototypes, indie game NPC dialogue. Puck is mischievous, Sky is bright: clear character voices, not narrator voices.
Documentary narrator + interviewee
Best for: True-crime, history, science docs where a smooth narrator threads between recreated quotes. River carries the throughline, Nova plays the voice in the archive.
Co-host energy
Best for: Morning-show banter, news-and-chat formats, branded podcast intros where two hosts trade a cold open before the show kicks in.
The model handles the voice. The realism comes from how you script the beats, the silences, and the small reactions between characters. Six rules that cover the full pipeline.
Write the script as SARAH: line / ADAM: line for your own clarity, then delete the tags before generating. Otherwise the voice reads "Sarah colon" out loud. Same for stage directions in brackets — keep them in the script doc, not in the TTS input.
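Stripping tags by hand gets tedious past a page of script. A minimal sketch of doing it in Python, assuming every spoken line starts with an ALL-CAPS speaker tag and stage directions sit in square brackets; `parse_turns` and `tts_input` are hypothetical helper names, not part of any tool mentioned here:

```python
import re

# Assumption: every spoken line starts with an ALL-CAPS tag, e.g. "SARAH: ..."
TAG = re.compile(r"^\s*([A-Z][A-Z0-9_ ]*?)\s*:\s*(.*)$")
STAGE_DIRECTION = re.compile(r"\[[^\]]*\]")  # [laughs], [beat], ...


def parse_turns(script: str) -> list[tuple[str, str]]:
    """Return (speaker, spoken text) pairs in script order, tags and brackets removed."""
    turns = []
    for raw in script.splitlines():
        match = TAG.match(raw)
        if match is None:
            continue  # blank lines and untagged notes are skipped
        speaker, text = match.groups()
        text = STAGE_DIRECTION.sub("", text)
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            turns.append((speaker, text))
    return turns


def tts_input(turns: list[tuple[str, str]], speaker: str) -> str:
    """Join one speaker's lines into the text you would paste for that voice."""
    return "\n".join(text for who, text in turns if who == speaker)


script = """\
SARAH: [laughs] So where do we start?
ADAM: With the script. [beat] Always the script.
SARAH: Well... I don't know if that's true.
"""
turns = parse_turns(script)
print(tts_input(turns, "SARAH"))
```

Keeping the ordered `turns` list around is deliberate: it doubles as the cue sheet for arranging clips in your editor later.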
A comma is a quarter-beat, a period is a half-beat, an em dash is a real pause, an ellipsis is a held beat. If a character is hesitating, write "Well... I don't know if that's true." instead of "Well I don't know if that's true." The pause is what sells the hesitation, and Kokoro respects the punctuation.
Real conversational gap-time averages around 250 ms. Slap two TTS clips back-to-back and the dialogue sounds robotic; add 300 ms of true silence between turns in your editor and it sounds like two people talking. For tense scenes, drop to 100 ms. For thoughtful exchanges, push to 600 ms.
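If you would rather script the gaps than drag clips by hand, the same idea can be sketched with Python's standard `wave` module. This assumes each turn is already cut to its own clip and that the WAVs are 16-bit mono at 24 kHz; the bit depth and channel count are assumptions, so check your files first. The helper names are hypothetical:

```python
import wave


def silence(ms: int, rate: int = 24_000, width: int = 2, channels: int = 1) -> bytes:
    """Digital silence for signed 16-bit PCM (all-zero bytes) lasting `ms` milliseconds."""
    frames = rate * ms // 1000
    return b"\x00" * (frames * width * channels)


def stitch(turns: list[bytes], gap_ms: int = 300, rate: int = 24_000,
           width: int = 2, channels: int = 1) -> bytes:
    """Join per-turn PCM clips, in conversational order, with a fixed gap between them."""
    return silence(gap_ms, rate, width, channels).join(turns)


def read_pcm(path: str) -> bytes:
    """Read the raw PCM frames of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.readframes(wav.getnframes())


def write_wav(path: str, pcm: bytes, rate: int = 24_000,
              width: int = 2, channels: int = 1) -> None:
    """Write raw PCM frames back out as a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(pcm)
```

With clips saved as, say, `a1.wav`, `b1.wav`, `a2.wav` (hypothetical file names), `write_wav("scene.wav", stitch([read_pcm(p) for p in paths], gap_ms=300))` would produce one track. Drop `gap_ms` to 100 for tense scenes or raise it to 600 for thoughtful ones.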
Truncate Character A's clip mid-word, then drag Character B's clip to start 50–100 ms before A ends. The brief overlap is the universal audio cue for "they cut in." A short crossfade across the overlap zone (no longer than the overlap itself) smooths the splice, and the interruption reads as natural.
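The interruption trick can also be done in code. A rough sketch, assuming both clips are 16-bit mono samples at 24 kHz held in `array("h")` buffers and using a plain linear crossfade (your editor's fade curve may differ); `overlap_mix` is a hypothetical helper:

```python
from array import array


def overlap_mix(a: array, b: array, overlap_ms: int = 80, rate: int = 24_000) -> array:
    """Start clip b `overlap_ms` before clip a ends, crossfading across the overlap."""
    n = min(rate * overlap_ms // 1000, len(a), len(b))  # overlap length in samples
    start = len(a) - n
    out = a[:start]                                     # untouched part of a (a copy)
    for i in range(n):
        fade = (i + 1) / (n + 1)                        # ramps from near 0 to near 1
        mixed = a[start + i] * (1 - fade) + b[i] * fade
        out.append(max(-32768, min(32767, round(mixed))))  # clamp to the 16-bit range
    out.extend(b[n:])                                   # rest of b after the overlap
    return out


# Synthetic stand-ins for two truncated clips (0.1 s each at 24 kHz).
sarah = array("h", [1000] * 2400)
adam = array("h", [-1000] * 2400)
scene = overlap_mix(sarah, adam, overlap_ms=50)
print(len(scene))
```

Raw frames read from a WAV can be loaded with `array("h", pcm_bytes)` on a little-endian machine, and `scene.tobytes()` turns the result back into frames for writing.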
Pan Character A 15% left and Character B 15% right. Not enough to feel theatrical, enough that headphone listeners stop conflating who's speaking. Hard panning (50%+) only works for radio drama. For podcasts and audiobooks, 15% is the sweet spot.
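What "15%" means in gain terms depends on your editor's pan law. As an illustration only, here is a sketch using a constant-power pan law, which is an assumption and not necessarily what Audacity, Reaper, or Logic apply; it takes 16-bit mono samples and produces interleaved stereo. Both function names are hypothetical:

```python
import math
from array import array


def pan_gains(pan: float) -> tuple[float, float]:
    """Constant-power (left, right) gains for pan in [-1.0, 1.0]; -0.15 is 15% left."""
    angle = (pan + 1) * math.pi / 4   # maps -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


def pan_mono_to_stereo(mono: array, pan: float) -> array:
    """Turn 16-bit mono samples into interleaved L/R samples at the given pan."""
    left, right = pan_gains(pan)
    stereo = array("h")
    for sample in mono:
        stereo.append(round(sample * left))
        stereo.append(round(sample * right))
    return stereo


left, right = pan_gains(-0.15)        # Character A, 15% left
print(f"L gain {left:.3f}, R gain {right:.3f}")
```

Both gains stay at or below 1.0, so panning this way never pushes a sample past the 16-bit range. Character B would use `pan_gains(0.15)`.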
No SSML, no <laugh> tag — write what you want spoken. "Hah hah hah" delivers a clean three-syllable laugh, "mmhm" reads as agreement, "ugh" lands as exasperation. Test the reaction as a 10-character solo generation, swap voices if one delivers it cleaner, then drop the WAV onto a separate track in your edit.
PlayHT and Murf both ship native multi-speaker modes: paste once, render one mix. The trade-off is monthly character caps, a required signup, and a paid commercial license. Here is the honest read.
Multi-voice dialogue support
FreeTextoSpeech: Generate each character separately with any of 54 voices, then splice in your editor. No cap on the number of voices in a scene.
PlayHT / Murf free tiers: A native multi-speaker dialogue feature exists, but it is locked behind paid tiers, and the free voice roster is capped.
Voice variety per scene
FreeTextoSpeech: 54 voices across 9 languages, enough for an entire ensemble cast.
PlayHT / Murf free tiers: Smaller free voice pool, with most distinct character voices paywalled.
Cost for a 5-minute dialogue
FreeTextoSpeech: Free within the monthly character allowance. No cap on the number of generations.
PlayHT / Murf free tiers: Counts against the monthly character cap on the free tier; longer scenes push you to a paid plan.
Commercial license on free tier
FreeTextoSpeech: Full commercial use allowed, no attribution.
PlayHT / Murf free tiers: Commercial use is restricted on the free tier; a paid plan is required for monetized podcasts and indie games.
Output format
FreeTextoSpeech: 24 kHz WAV per voice, ready to drop onto separate tracks in any DAW.
PlayHT / Murf free tiers: Compressed MP3 on the free tier, often single-mix only.
Signup
FreeTextoSpeech: None. Open the page, paste, generate.
PlayHT / Murf free tiers: Email signup required, credit card on file for commercial features.
One-click multi-voice generation
FreeTextoSpeech: Not native. You generate each voice separately and splice.
PlayHT / Murf free tiers: Native multi-speaker dialogue mode on paid tiers (one paste, one render).
If a single-paste multi-speaker render is non-negotiable, PlayHT and Murf own that feature. If you can spend 10 extra minutes splicing in Audacity to skip the subscription, the math here is straightforward.
Studio-quality reads for two-host podcast intros and segments.
Indie audiobook narration with character-dialogue voice swaps.
Why Kokoro voices read as human across long dialogue scenes.
Drop dialogue scenes straight into Premiere or DaVinci Resolve.
Generate it free, in under 10 minutes, with full commercial rights.