Python Text to Speech pyttsx3 — Core Concepts

Understand pyttsx3's engine lifecycle, voice selection, property tuning, event callbacks, and file output for offline text-to-speech in Python.

What pyttsx3 does

pyttsx3 is a Python text-to-speech library that works offline. It delegates speech synthesis to the OS-native engine — SAPI5 on Windows, NSSpeechSynthesizer on macOS, and espeak on Linux. No API keys, no network calls, no usage quotas.

Install with pip install pyttsx3. On Linux, also install the espeak engine, for example with sudo apt install espeak on Debian or Ubuntu.

Engine lifecycle

The entry point is pyttsx3.init(), which returns an Engine instance connected to the platform driver:

import pyttsx3

engine = pyttsx3.init()
engine.say("Hello, world")
engine.runAndWait()

say() queues text for synthesis. runAndWait() processes the queue and blocks until all speech finishes. You can queue multiple say() calls before a single runAndWait().
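As a minimal sketch of that queueing behaviour, the helper below queues several lines and then drains them with one blocking call. The name speak_all is invented for this example and is not part of pyttsx3; the function only assumes an object with say() and runAndWait(), such as the one returned by pyttsx3.init().

```python
def speak_all(engine, lines):
    """Queue each non-empty line, then block once until all are spoken.

    `engine` is assumed to be the object returned by pyttsx3.init().
    Returns the number of utterances that were queued.
    """
    queued = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines rather than queueing empty utterances
        engine.say(text)
        queued += 1
    if queued:
        engine.runAndWait()  # one blocking call drains the whole queue
    return queued
```

With a real engine this would be called as speak_all(pyttsx3.init(), ["First sentence.", "Second sentence."]).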

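For non-blocking use, startLoop(False) starts the engine's event loop and returns immediately instead of blocking. Nothing runs in the background on its own: your code must call iterate() regularly to process pending commands, then endLoop() when it is done: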

engine.startLoop(False)
engine.say("Background speech")
engine.iterate()   # process pending commands
# ... do other work, calling iterate() periodically ...
engine.endLoop()
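One way to know when to stop calling iterate() is to listen for the finished-utterance event. The sketch below assumes that approach; speak_while_working, do_work, max_ticks, and pause are names made up for this example, and external-loop behaviour can differ between drivers, so treat it as a starting point rather than a guaranteed pattern.

```python
import time


def speak_while_working(engine, text, do_work, max_ticks=1000, pause=0.05):
    """Speak `text` while calling `do_work()` between engine iterations.

    `engine` is assumed to come from pyttsx3.init(); `do_work` is any
    zero-argument callable supplied by the caller. Returns True if the
    'finished-utterance' event arrived before `max_ticks` ran out.
    """
    state = {"done": False}

    def on_end(name, completed):
        state["done"] = True

    token = engine.connect('finished-utterance', on_end)
    engine.say(text)
    engine.startLoop(False)  # returns immediately; this function drives the loop
    try:
        ticks = 0
        while not state["done"] and ticks < max_ticks:
            engine.iterate()  # let the driver process pending commands
            do_work()         # the caller's own work between iterations
            time.sleep(pause)
            ticks += 1
    finally:
        engine.endLoop()
        engine.disconnect(token)  # remove the temporary callback
    return state["done"]
```

The max_ticks cap is there so the loop cannot spin forever if a driver never reports the end of the utterance.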

Voice selection

List available voices and switch between them:

voices = engine.getProperty('voices')
for voice in voices:
    print(voice.id, voice.name, voice.languages)

# Set a specific voice by index (assumes at least two voices are installed)
engine.setProperty('voice', voices[1].id)

The number and quality of voices depends on what is installed on the OS. Windows typically includes David (male) and Zira (female). macOS has dozens of high-quality voices. Linux espeak voices are robotic but cover many languages.
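Because the voice list differs from machine to machine, selecting by index is fragile. A small sketch of selecting by name instead is shown below; find_voice_id is a hypothetical helper, and it assumes each voice object has the id and name attributes printed above.

```python
def find_voice_id(voices, name_part):
    """Return the id of the first voice whose name contains `name_part`.

    `voices` is assumed to be the list from engine.getProperty('voices').
    The match is case-insensitive; None is returned when nothing matches.
    """
    wanted = name_part.lower()
    for voice in voices:
        if wanted in (voice.name or "").lower():
            return voice.id
    return None
```

A caller would check the result before using it: if find_voice_id(voices, "zira") returns None, leave the default voice in place rather than calling setProperty().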

Properties

Three main properties control speech output:

# Speech rate (words per minute, default ~200)
engine.setProperty('rate', 150)

# Volume (0.0 to 1.0, default 1.0)
engine.setProperty('volume', 0.8)

# Voice (voice ID string taken from the 'voices' list)
voice_id = engine.getProperty('voices')[0].id
engine.setProperty('voice', voice_id)

Read current values with getProperty():

current_rate = engine.getProperty('rate')
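Reading a property first makes relative adjustments possible, which avoids hard-coding a rate that may not suit every platform's default. The following is a sketch only; adjust_speech is an invented name, and the clamping limits are choices made for this example rather than limits enforced by pyttsx3.

```python
def adjust_speech(engine, rate_delta=0, volume=None):
    """Shift the rate by `rate_delta` and optionally set the volume.

    `engine` is assumed to be the object returned by pyttsx3.init().
    Volume is clamped to the 0.0-1.0 range and the rate is kept at 1 or above.
    Returns the (rate, volume) pair now in effect.
    """
    new_rate = max(1, engine.getProperty('rate') + rate_delta)
    engine.setProperty('rate', new_rate)
    if volume is None:
        new_volume = engine.getProperty('volume')  # leave volume untouched
    else:
        new_volume = min(1.0, max(0.0, float(volume)))
        engine.setProperty('volume', new_volume)
    return new_rate, new_volume
```

For example, adjust_speech(engine, rate_delta=-50) slows speech by 50 words per minute relative to whatever the current rate is.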

Saving to file

Instead of playing through speakers, save speech to an audio file:

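engine.save_to_file("This will be saved", "output.wav")
engine.runAndWait()

The audio format is decided by the platform driver, not by the file extension. SAPI5 and espeak write WAV data, and the macOS driver typically writes AIFF, so naming the file output.mp3 does not produce an MP3; convert the result with a separate tool if you need that format. File output is useful for generating audio content, pre-recording responses, or creating accessible versions of text documents.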
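Since save_to_file() only queues the request, the file does not exist until the queue has run. A hedged sketch of wrapping both steps and checking the result follows; save_speech is a hypothetical helper name, and the size check is just a sanity test, not validation of the audio itself.

```python
import os


def save_speech(engine, text, path):
    """Render `text` to `path` and report whether a non-empty file appeared.

    `engine` is assumed to be the object returned by pyttsx3.init().
    """
    engine.save_to_file(text, path)
    engine.runAndWait()  # the file is written only once the queue runs
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

A False return value is a hint to check the driver, the output path, and write permissions.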

Event callbacks

Use connect() to register functions that the engine calls as speech progresses:

def on_start(name):
    print(f"Starting: {name}")

def on_word(name, location, length):
    print(f"Word at {location}, length {length}")

def on_end(name, completed):
    print(f"Finished: {name}, completed: {completed}")

engine.connect('started-utterance', on_start)
engine.connect('started-word', on_word)
engine.connect('finished-utterance', on_end)

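The started-word callback fires for each word and receives its character offset (location) and length within the utterance. This enables synchronized highlighting: display the text and highlight each word as it is spoken. Word events depend on driver support, so test them on each platform you target.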
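The highlighting idea can be sketched with plain string slicing. The helper names below (highlight_word, make_word_printer) are invented for illustration, and the code assumes location and length are character offsets into the exact string passed to say(), which may not hold for every driver.

```python
def highlight_word(text, location, length, left="[", right="]"):
    """Return `text` with the word at `location` wrapped in markers."""
    end = location + length
    return text[:location] + left + text[location:end] + right + text[end:]


def make_word_printer(text):
    """Build a 'started-word' callback that prints `text` with the current word marked."""
    def on_word(name, location, length):
        print(highlight_word(text, location, length))
    return on_word
```

To use it, pass the same string to both calls: engine.connect('started-word', make_word_printer(text)) followed by engine.say(text) and engine.runAndWait().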

Practical patterns

Reading a file aloud:

with open("article.txt", encoding="utf-8") as f:
    text = f.read()
engine.say(text)
engine.runAndWait()
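For long files, one option is to queue the text paragraph by paragraph instead of as a single utterance, so each paragraph produces its own started-utterance and finished-utterance events. The splitter below is a sketch under that assumption; split_paragraphs is an invented name, and treating blank lines as paragraph breaks is a convention chosen for this example.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]
```

Each returned paragraph can then be passed to engine.say() in a loop before a single engine.runAndWait().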

Cycling through voices for comparison:

for voice in voices:
    engine.setProperty('voice', voice.id)
    engine.say(f"This is {voice.name}")
    engine.runAndWait()

Combining with input for a basic assistant:

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response = get_response(user_input)  # your logic
    engine.say(response)
    engine.runAndWait()

Limitations

pyttsx3 uses whatever voices the OS provides — you cannot add neural or cloud-quality voices through the library itself. Speech quality varies significantly across platforms: macOS voices sound natural, Windows voices are decent, and Linux espeak is functional but robotic. For higher quality offline TTS, consider Coqui TTS or other neural TTS engines, though they require more setup and resources.

The one thing to remember: pyttsx3 gives you three-line offline TTS — init, say, runAndWait — with voice selection, rate/volume tuning, and file export, all powered by whatever speech engine your operating system already has.
