Troubleshooting Guide

This guide helps resolve common issues when using Script to Speech.

Common Issues

1. Silent Audio Detection

Problem: Generated audio clips are silent or nearly silent. This seems to occur most often with short or single-word clips from TTS providers other than ElevenLabs.

Symptoms:

  • Warnings about silent clips in logs
  • Audio files with low dBFS readings
  • Partially silent audiobooks

Solutions:

  1. Identify Silent Clips

    uv run sts-generate-audio script.json config.yaml \
      --populate-cache --check-silence
  2. Generate Replacements: running sts-generate-audio with the --check-silence flag generates pre-populated sts-generate-standalone-speech commands that only need to be copied and pasted. See the Standalone Speech Generation Guide for more details.

    # Copy reported text exactly
    # -v flag controls how many versions of each clip to generate
    uv run sts-generate-standalone-speech openai --voice echo \
      "silent text 1" "silent text 2" -v 3
  3. Manual Rename

    # Match exact cache filename from report
    mv standalone_speech/generated_file.mp3 \
      standalone_speech/[original_cache_filename].mp3
  4. Apply Overrides: this moves any cache-matching files from standalone_speech to the screenplay’s cache directory

    uv run sts-generate-audio script.json config.yaml \
      --populate-cache --cache-overrides

Prevention:

  • Use --check-silence during initial generation
  • Consider alternative TTS providers for troublesome text (ElevenLabs rarely seems to have issues with silent clips)
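
The "low dBFS readings" symptom can be illustrated with a small calculation. The following is a minimal sketch, not the tool's actual silence check: it assumes 16-bit PCM samples already decoded into a list of integers, and the -40 dBFS threshold is a hypothetical value chosen for illustration.

```python
import math


def dbfs(samples, full_scale=32768):
    """Return the RMS level of integer PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


# Hypothetical threshold; the cutoff used by --check-silence is not stated in this guide.
SILENCE_THRESHOLD_DBFS = -40.0


def is_silent(samples, threshold=SILENCE_THRESHOLD_DBFS):
    """Treat a clip as silent when its RMS level falls below the threshold."""
    return dbfs(samples) < threshold


quiet_clip = [3, -2, 1, 0] * 100      # near-zero amplitude
loud_clip = [16384, -16384] * 200     # half of full scale

print("quiet clip silent:", is_silent(quiet_clip))
print("loud clip silent:", is_silent(loud_clip))
```

A clip at half of full scale sits at roughly -6 dBFS, so anything reported far below that is a good candidate for regeneration.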

2. Rate Limiting

Problem: API requests are being rate limited.

Symptoms:

  • Rate limit error messages
  • Delayed audio generation
  • Automatic retries and backoff

Solutions:

  1. Automatic Handling

    • The system automatically retries with exponential backoff
    • Each provider has separate rate limit handling
    • Future versions of Script to Speech will allow for more manual control of thread limits and backoff behavior
  2. Reduce Global Concurrent Downloads: note that this limits the overall (cross-provider) maximum number of concurrent downloads

    uv run sts-generate-audio --max-workers 5
  3. Distribute Across TTS Providers

    # Split voices across multiple TTS providers
    NARRATOR:
      provider: openai
    MAIN_CHARACTER:
      provider: elevenlabs
    SIDE_CHARACTER:
      provider: zonos

Prevention:

  • Use multiple TTS providers
  • Monitor provider-specific limits
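
The exponential backoff mentioned under Automatic Handling can be sketched as follows. This is a generic illustration of the technique, not Script to Speech's implementation: RateLimitError, call_with_backoff, and all delay values are hypothetical names and numbers.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate limit error."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

A caller would wrap a single request, for example call_with_backoff(lambda: request_audio(text)), where request_audio is whatever function performs the API call.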

3. API Key Issues

Problem: API authentication failures.

Symptoms:

  • “API key not set” errors
  • Authentication failed messages
  • 401 Unauthorized errors

Solutions:

  1. Check Environment Variables

    # Verify keys are set
    echo $OPENAI_API_KEY
    echo $ELEVEN_API_KEY
    echo $CARTESIA_API_KEY
    echo $MINIMAX_API_KEY
    echo $MINIMAX_GROUP_ID
    echo $ZONOS_API_KEY
  2. Set Keys Properly

    # In terminal session
    export OPENAI_API_KEY="your-key-here"
    
    # Or in .env file
    OPENAI_API_KEY=your-key-here
    ELEVEN_API_KEY=your-key-here
  3. Validate Key Format

    • OpenAI: Starts with sk-
    • ElevenLabs: 32-character string
    • Cartesia: Starts with sk_car_
    • Minimax (API key): Bearer token format, long string
    • Minimax (Group ID): String of digits
    • Zonos: Starts with zsk-

Prevention:

  • Use .env file for persistent keys
  • Check API dashboard for key validity
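
The environment variable and key format checks above can be combined in one script. This is a minimal sketch: the variable names and prefixes come from the lists in this section, while check_keys is a hypothetical helper. Only prefixes are checked, and key values are never printed.

```python
import os

# Prefixes from the "Validate Key Format" list; None means no prefix is documented.
EXPECTED_PREFIXES = {
    "OPENAI_API_KEY": "sk-",
    "ELEVEN_API_KEY": None,
    "CARTESIA_API_KEY": "sk_car_",
    "MINIMAX_API_KEY": None,
    "MINIMAX_GROUP_ID": None,
    "ZONOS_API_KEY": "zsk-",
}


def check_keys(env=None):
    """Return a status string per variable without exposing the key itself."""
    env = os.environ if env is None else env
    report = {}
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            report[name] = "not set"
        elif prefix and not value.startswith(prefix):
            report[name] = f"set, but does not start with {prefix}"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_keys().items():
        print(f"{name}: {status}")
```

Unlike echoing each variable, this avoids leaving key values in terminal scrollback.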

4. Voice Configuration Errors

Problem: Voice not found or invalid configuration.

Symptoms:

  • Voice ID errors
  • Missing provider configuration
  • Invalid voice parameters
  • Required fields missing

Solutions:

  1. Validate Configuration

    # Check for missing/extra/duplicate speakers
    uv run sts-tts-provider-yaml validate script.json config.yaml
    
    # Strict validation including provider field validation
    uv run sts-tts-provider-yaml validate script.json config.yaml --strict
  2. OpenAI

    • Ensure a valid voice option is being used

    # Valid voice options
    voices: [alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer]
  3. ElevenLabs

    • See section 8, ElevenLabs-Specific Issues
  4. Minimax

    # Validate voice_id is one of the valid voice IDs
    voice_id: Casual_Guy  # Must be one of system voices
    
    # If using voice_mix, ensure proper structure
    voice_mix:
      - voice_id: Casual_Guy  # Must be valid voice ID
        weight: 70  # Must be 1-100
      - voice_id: Deep_Voice_Man
        weight: 30
  5. Zonos

    # Validate that the voice is one of the default_voice_name values from the Zonos documentation
    default_voice_name: american_male  # Must be one of 9 default voices

Prevention:

  • Use sts-tts-provider-yaml generate for templates
  • Validate configuration with the --dry-run run mode and/or sts-tts-provider-yaml validate
  • Keep backup of working configurations
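
The structural rules for a Minimax voice_mix (each entry needs a voice_id and a weight of 1-100) can be checked before a run. This is a rough sketch over an already-loaded config: validate_voice_mix is a hypothetical helper, treating weights as integers is an assumption, and it does not check whether a voice_id is a real system voice.

```python
def validate_voice_mix(voice_mix):
    """Return a list of problems found in a Minimax-style voice_mix list."""
    if not isinstance(voice_mix, list) or not voice_mix:
        return ["voice_mix must be a non-empty list"]
    problems = []
    for index, entry in enumerate(voice_mix):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected a mapping")
            continue
        if not entry.get("voice_id"):
            problems.append(f"entry {index}: missing voice_id")
        weight = entry.get("weight")
        # Assumption: weights are whole numbers; bool is excluded because it is an int subclass
        if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 100:
            problems.append(f"entry {index}: weight must be an integer from 1 to 100")
    return problems


example_mix = [
    {"voice_id": "Casual_Guy", "weight": 70},
    {"voice_id": "Deep_Voice_Man", "weight": 30},
]
print(validate_voice_mix(example_mix) or "voice_mix looks structurally valid")
```

For authoritative checks, sts-tts-provider-yaml validate with --strict remains the tool to use.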

5. Memory and Disk Space Issues

Problem: System running out of memory or disk space.

Symptoms:

  • Generation stops unexpectedly
  • System slowdown
  • “No space left on device” errors

Solutions:

  1. Reduce Batch Size: on memory-constrained systems, reducing the concatenation batch size can help performance when combining audio segments

    uv run sts-generate-audio --concat-batch-size 150
  2. Reduce Maximum Concurrent Downloads: reducing the number of concurrent downloads can help reduce memory usage

    uv run sts-generate-audio --max-workers 5
  3. Process in Segments

    # Process chapters separately
    uv run sts-generate-audio chapter1.json --populate-cache
    uv run sts-generate-audio chapter2.json --populate-cache
    # Manually combine later
  4. Clean Unnecessary Files

    # Remove temporary files
    rm -rf output/*/logs/old_logs_*.txt
    rm -rf standalone_speech/unused_*.mp3

Prevention:

  • Monitor disk space before large projects
  • Use --populate-cache for gradual processing
  • Consider processing on systems with adequate resources
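
Checking disk space before a large project can be scripted with the standard library. This is a minimal sketch: the 5 GiB requirement is a hypothetical figure, and the actual space needed depends on screenplay length and audio format.

```python
import shutil


def free_gib(path="."):
    """Return free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_enough_space(path=".", required_gib=5.0):
    """Compare free space against a required amount (hypothetical default)."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"Free space: {free_gib('.'):.1f} GiB")
    if not has_enough_space("."):
        print("Warning: less than 5 GiB free; consider cleaning up before generating audio")
```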

6. Text Processor Configuration Issues

Problem: Text processors not working as expected.

Symptoms:

  • Text not being transformed
  • Wrong text processor precedence
  • Validation errors

Solutions:

  1. Check Text Processor Order

    • All preprocessors from all configs are run before any processors
    • Multiple configs are processed in order
    • Within a config, (pre)processors are run top to bottom
    • Pay attention to “chain” mode (pre)processors (the output of one (pre)processor becomes the input of the next) vs. “override” mode (the last instance takes precedence)

    # config 1
    preprocessors:
      - name: extract_dialogue_parentheticals
    processors:
      - name: text_substitution
      - name: capitalization_transform_processor
    # config 2
    preprocessors:
      - name: speaker_merge_preprocessor
    processors:
      - name: pattern_replace_processor
    # Resultant processing pipeline ordering:
    # extract_dialogue_parentheticals -> speaker_merge_preprocessor -> text_substitution ->
    #  capitalization_transform_processor -> pattern_replace_processor
  2. Validate Configuration

    # Test text processor configuration
    uv run sts-apply-text-processors-json script.json \
      --text-processor-configs test_config.yaml
  3. Fix Syntax Errors

    • Check YAML indentation
    • Verify field names match exactly
    • Ensure required fields are present (check log output)

Prevention:

  • Start with default configuration
  • Add custom processors incrementally
  • Test with small examples first
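
The ordering rule from Check Text Processor Order (all preprocessors across configs first, then all processors, each in config order) can be sketched in a few lines. pipeline_order is a hypothetical helper that operates on configs already loaded into dictionaries. It only models ordering, not the "chain" vs. "override" distinction.

```python
def pipeline_order(configs):
    """Return step names in run order: every preprocessor, then every processor."""
    preprocessors = [
        step["name"] for config in configs for step in config.get("preprocessors", [])
    ]
    processors = [
        step["name"] for config in configs for step in config.get("processors", [])
    ]
    return preprocessors + processors


config_1 = {
    "preprocessors": [{"name": "extract_dialogue_parentheticals"}],
    "processors": [
        {"name": "text_substitution"},
        {"name": "capitalization_transform_processor"},
    ],
}
config_2 = {
    "preprocessors": [{"name": "speaker_merge_preprocessor"}],
    "processors": [{"name": "pattern_replace_processor"}],
}

print(" -> ".join(pipeline_order([config_1, config_2])))
```

Running this on the two example configs reproduces the pipeline ordering listed in the comment above.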

7. Cache Issues

Problem: Unexpected cache behavior.

Symptoms:

  • Audio not being reused
  • Cache files overwriting each other
  • Missing cache files

Solutions:

  1. Verify Cache Naming

    # Cache filename structure:
    # [original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3
  2. Check File Paths

    # Ensure cache directory exists
    ls -la output/[script]/cache/
  3. Clear Problematic Cache

    # Remove specific cache files
    rm output/[script]/cache/problematic_*.mp3
    # Or clear all cache
    rm -rf output/[script]/cache/

Prevention:

  • Avoid modifying text processors / parser between runs of a screenplay
  • Maintain separate cache directories for different versions
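
When a cache file is not being reused, it can help to split its name into the four documented components and compare them with the name in the report. This is a minimal sketch based only on the filename structure shown above: CacheName and parse_cache_filename are hypothetical names, and the sample hashes are made up.

```python
from pathlib import Path
from typing import NamedTuple


class CacheName(NamedTuple):
    original_hash: str
    processed_hash: str
    provider_id: str
    speaker_id: str


def parse_cache_filename(filename):
    """Split '[original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3'."""
    path = Path(filename)
    if path.suffix != ".mp3":
        raise ValueError(f"expected an .mp3 file, got {path.name!r}")
    parts = path.stem.split("~~")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '~~'-separated parts, got {len(parts)}")
    return CacheName(*parts)


# Made-up example values for illustration only
example = parse_cache_filename("abc123~~def456~~openai~~NARRATOR.mp3")
print(example.provider_id, example.speaker_id)
```

If the processed hash differs between two runs of the same line, a text processor change is the likely cause, which matches the prevention advice above.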

8. ElevenLabs-Specific Issues

Problem: ElevenLabs voice management errors.

Symptoms:

  • “Voice not found in registry” errors
  • 30 voice limit exceeded
  • Monthly add/remove quota reached

Solutions:

  1. Use Public Library Voices

    # Only use public library voice IDs
    SPEAKER:
      provider: elevenlabs
      voice_id: ErXwobaYiN019PkySvjV  # Public library ID
  2. Monitor Voice Usage

    • The provider automatically manages the 30-voice limit
    • Check if monthly quota is exceeded (check log file)
  3. Alternative Approach

    # If ElevenLabs issues persist, switch provider temporarily
    uv run sts-generate-audio backup_config.yaml

Prevention:

  • Minimize voice changes during development
  • Use recommended voice tags (narrative & story, conversational)
  • Plan voice allocation before large projects; try to reuse the same 30 voices, as exceeding this number results in voices being swapped in and out of the library
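
Voice allocation can be checked ahead of time by counting the distinct ElevenLabs voice IDs in a config. This is a rough sketch: it assumes the config has been loaded into a dictionary of speaker name to settings, shaped like the SPEAKER example above, and both helper names are hypothetical.

```python
ELEVENLABS_VOICE_LIMIT = 30  # Limit stated in this guide


def elevenlabs_voice_ids(config):
    """Collect the distinct voice_id values assigned to the elevenlabs provider."""
    return {
        settings["voice_id"]
        for settings in config.values()
        if isinstance(settings, dict)
        and settings.get("provider") == "elevenlabs"
        and settings.get("voice_id")
    }


def exceeds_voice_limit(config, limit=ELEVENLABS_VOICE_LIMIT):
    """Return True when the config uses more distinct voices than the limit."""
    return len(elevenlabs_voice_ids(config)) > limit


config = {
    "NARRATOR": {"provider": "openai", "voice": "echo"},
    "MAIN_CHARACTER": {"provider": "elevenlabs", "voice_id": "ErXwobaYiN019PkySvjV"},
    "SIDE_CHARACTER": {"provider": "elevenlabs", "voice_id": "ErXwobaYiN019PkySvjV"},
}
print(len(elevenlabs_voice_ids(config)), "distinct ElevenLabs voice(s)")
```

Speakers that share a voice ID count once, so reusing voices across minor characters keeps a project under the limit.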

9. Parser Issues

Note: Screenplay parsing is currently fragile. It works best with movie screenplays that use “standard” formatting. Support for scanned / OCR’d scripts is currently poor. Additional configuration and better handling of edge cases are planned for a future release.

Problem: Screenplay parsing errors.

Symptoms:

  • Incorrect speaker attribution
  • Merged dialogues
  • Missing text chunks

Solutions:

  1. Manual Text Extraction

    # Extract to text first for manual editing
    uv run sts-parse-screenplay script.pdf --text-only
    # Edit text file to remove headers/footers
    # Then parse cleaned text
    uv run sts-parse-screenplay cleaned_script.txt
  2. Check Parser Output

    # Analyze parsed structure
    uv run sts-analyze-json script.json
  3. Validate any Custom Parser Changes

    # When making changes to the parser, show differences in output between
    # parser version used to originally generate script.json and current parser logic
    uv run sts-parse-regression-check-json script.json

Prevention:

  • Clean PDF before parsing
  • Verify screenplay formatting
  • Review parsed output before audio generation

10. Network Connectivity Issues

Problem: Network errors during API calls.

Symptoms:

  • Connection timeout errors
  • Intermittent failures
  • SSL/TLS errors

Solutions:

  1. Retry with Backoff

    • System automatically retries failed requests
    • Check network stability
  2. Test Connectivity

    # Test basic connectivity to each provider
    curl https://api.openai.com/v1/models
    curl https://api.elevenlabs.io/v1/voices
  3. Configure Timeouts

    • Network issues are handled automatically
    • Consider VPN if regional restrictions apply

Prevention:

  • Stable internet connection
  • Use --populate-cache run mode to ensure all files are downloaded before generation
  • Use local cache when possible
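
Where curl is unavailable, basic reachability can be tested with a plain TCP connection. This is a minimal sketch: can_connect and check_providers are hypothetical helpers, the hostnames come from the curl commands above, and a successful connection says nothing about authentication or rate limits.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


def check_providers(hosts=("api.openai.com", "api.elevenlabs.io")):
    """Map each provider hostname to whether it is reachable on port 443."""
    return {host: can_connect(host) for host in hosts}
```

Calling check_providers() returns a dictionary such as {"api.openai.com": True, ...}; a False value points to DNS, firewall, or regional restrictions rather than to an API key problem.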

11. LLM Voice Casting Issues

Problem: Issues with LLM-assisted voice casting workflow.

Symptoms:

  • LLM returns invalid YAML
  • Missing speakers in LLM output
  • Configuration validation errors
  • Voice library IDs not recognized

Solutions:

For Character Notes Generation:

  1. Try a different LLM

    • Certain LLM providers / models struggle with adding YAML comments while leaving the rest of the structure intact
    • Claude Sonnet and Gemini Pro seem to work well
  2. Generate Proper Casting Prompt

    # Ensure prompt includes current configuration
    uv run sts-generate-character-notes-prompt \
      source_screenplays/script.pdf \
      input/script/script_voice_config.yaml
  3. Validate LLM Output

    # Check for structural issues
    uv run sts-tts-provider-yaml validate input/script/script.json \
      input/script/script_voice_config.yaml
    
    # Strict validation for provider fields
    uv run sts-tts-provider-yaml validate input/script/script.json \
      input/script/script_voice_config.yaml --strict

For Voice Library Casting:

  1. Ensure Character Notes Exist

    • Voice library casting works best when character descriptions are present as YAML comments
    • Either use character notes generation first, or manually add notes
  2. Generate Voice Library Casting Prompt

    # Include all providers you want to cast from
    uv run sts-generate-voice-library-casting-prompt \
      input/script/script_voice_config.yaml \
      openai elevenlabs cartesia
  3. Validate Voice Library IDs

    • LLM must use valid sts_id values from the voice libraries
    • Check that returned IDs exist in the provider’s voice library
    • Use --strict validation to catch invalid voice configurations
  4. Common Voice Library Casting Issues

    • Invalid sts_id: LLM may invent voice IDs not in the library
    • Missing sts_id: LLM may forget to add the sts_id field
    • Wrong provider: LLM may assign voices from wrong provider’s library
    • Overwriting config: LLM may remove existing provider-specific fields

Prevention:

  • Use the two-step workflow: character notes first, then voice library casting
  • For privacy-conscious workflows, manually add character notes instead of using LLM
  • Always validate configuration before proceeding with audio generation
  • Keep backup of working configurations
  • Try a reasoning model if voice casting produces incorrect or sub-par results
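
Two of the failure modes above (missing speakers and removed provider-specific fields) can be caught by comparing the config before and after the LLM edit. This is a rough sketch over configs already loaded into dictionaries: compare_configs is a hypothetical helper and complements, rather than replaces, sts-tts-provider-yaml validate.

```python
def compare_configs(original, returned):
    """Report speakers and per-speaker fields that changed between two configs."""
    shared = set(original) & set(returned)
    dropped_fields = {}
    for speaker in sorted(shared):
        before, after = original[speaker], returned[speaker]
        if isinstance(before, dict) and isinstance(after, dict):
            lost = sorted(set(before) - set(after))
            if lost:
                dropped_fields[speaker] = lost
    return {
        "missing_speakers": sorted(set(original) - set(returned)),
        "unexpected_speakers": sorted(set(returned) - set(original)),
        "dropped_fields": dropped_fields,
    }


before = {
    "NARRATOR": {"provider": "openai", "voice": "echo"},
    "MAIN_CHARACTER": {"provider": "elevenlabs", "voice_id": "ErXwobaYiN019PkySvjV"},
}
after = {
    "NARRATOR": {"provider": "openai"},  # the voice field was removed
}
print(compare_configs(before, after))
```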

Debugging Tools

Command Line Tools

  1. Standalone Speech Testing

    # Test individual voice configurations
    uv run sts-generate-standalone-speech openai --voice echo "Test text"
  2. Dry Run Validation

    # Validate configuration without generation
    uv run sts-generate-audio script.json config.yaml --dry-run
  3. Configuration Validation

    # Check voice configuration against script
    uv run sts-tts-provider-yaml validate script.json config.yaml
    
    # Strict validation including provider fields
    uv run sts-tts-provider-yaml validate script.json config.yaml --strict
  4. Processor Testing

    # Test text processors independently
    uv run sts-apply-text-processors-json script.json \
      --text-processor-configs test_config.yaml \
      --output-path debug_output.json
  5. Parser Regression Testing

    # When making changes to the parser, show differences in output between
    # parser version used to originally generate script.json and current parser logic
    uv run sts-parse-regression-check-json script.json
  6. Voice Casting Utilities

    # Generate LLM prompt for voice casting
    uv run sts-generate-character-notes-prompt script.pdf config.yaml
    
    # Copy any file to clipboard
    uv run sts-copy-to-clipboard file.txt

Log Analysis

  1. Check Detailed Logs

    # View recent logs
    tail -f output/[script]/logs/[run mode]_log_YYYYMMDD_HHMMSS.txt
  2. Filter Errors

    # Find errors in logs
    grep -i error output/[script]/logs/[run mode]_log_*.txt
    grep -i warning output/[script]/logs/[run mode]_log_*.txt
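
The same filtering can be done in Python when grep is not available, for example on Windows. This is a minimal sketch that mirrors case-insensitive substring matching; count_matches is a hypothetical helper, and the sample log lines are invented because the log format is not documented here.

```python
def count_matches(lines, terms=("error", "warning")):
    """Count lines containing each term, ignoring case (like grep -i -c)."""
    counts = {term: 0 for term in terms}
    for line in lines:
        lowered = line.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1
    return counts


# Invented sample lines for illustration only
sample_log = [
    "INFO generation started",
    "WARNING silent clip detected",
    "ERROR rate limit reached",
    "Warning: retrying request",
]
print(count_matches(sample_log))
```

To run it against a real log, pass an open file object, which yields lines one at a time without loading the whole file.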

Getting Help

Information to Include

When reporting issues, include:

  1. Full error message
  2. Command used
  3. Log file from output/[script]/logs
  4. Configuration files (tts config, any additional processor configs, dialogue chunk .json)
  5. System information (OS, Python version)
  6. UV version: uv --version
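
The system details in items 5 and 6 can be gathered with a short script and pasted into a report. This is a minimal sketch using only the standard library: system_report is a hypothetical helper, and it runs uv --version only if uv is found on the PATH.

```python
import platform
import shutil
import subprocess


def system_report():
    """Collect OS, Python, and uv version strings for an issue report."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    uv_path = shutil.which("uv")
    if uv_path is None:
        report["uv"] = "not found on PATH"
    else:
        result = subprocess.run(
            [uv_path, "--version"], capture_output=True, text=True, check=False
        )
        report["uv"] = result.stdout.strip() or "unknown"
    return report


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```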

Best Practices for Avoiding Issues

  1. Incremental Development

    • Test with small scripts first
    • Build up to full-length projects
    • Use --dry-run frequently
  2. Configuration Management

    • Keep backup configurations
    • Version control YAML files
    • Document custom changes
    • Use sts-tts-provider-yaml validate to check configurations
  3. Resource Management

    • Monitor disk space
    • Use appropriate batch sizes
    • Clean up old files regularly
  4. Quality Assurance

    • Validate dialogue .json with sts-analyze-json
    • Use --populate-cache during audio generation to ensure all files are downloaded without issue
    • Use --check-silence during audio generation
    • Use sts-generate-standalone-speech to test new voices and TTS providers
    • Validate voice configurations with sts-tts-provider-yaml validate
  5. Error Prevention

    • Set up API keys properly
    • Follow naming conventions
    • Use provided templates
    • Validate configurations before audio generation
  6. LLM-Assisted Workflows

    • Use sts-generate-character-notes-prompt for consistent prompts
    • Always validate LLM output with sts-tts-provider-yaml validate
    • Keep backup configurations before making LLM-suggested changes