Troubleshooting Guide

This guide helps resolve common issues when using Script to Speech.

Common Issues

1. Silent Audio Detection

Problem: Generated audio clips are silent or nearly silent. This seems to occur most frequently with short or single-word clips from TTS providers other than ElevenLabs.

Symptoms:

Warnings about silent clips in logs
Audio files with low dBFS readings
Partially silent audiobooks
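
For context, dBFS expresses a level relative to digital full scale: 0 dBFS is the loudest representable signal, and silent or nearly silent clips sit far below it. The following is a minimal sketch of that calculation, assuming a peak sample value and a full-scale value are already known. The `dbfs` function name is hypothetical, and this is not necessarily how Script to Speech measures silence.

```shell
# Hypothetical helper: convert a peak amplitude to dBFS.
# Usage: dbfs PEAK FULL_SCALE   (16-bit audio has a full scale of 32768)
dbfs() {
  awk -v peak="$1" -v full="$2" 'BEGIN {
    if (peak <= 0) { print "-inf"; exit }   # digital silence has no finite level
    # awk only provides the natural log, so divide by log(10) to get log10
    printf "%.2f\n", 20 * log(peak / full) / log(10)
  }'
}

dbfs 16384 32768   # a peak at half of full scale
```

Halving the amplitude lowers the level by about 6 dB, so a clip whose peak is a small fraction of full scale produces a strongly negative reading.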

Solutions:

Identify Silent Clips

uv run sts-generate-audio script.json config.yaml \
  --populate-cache --check-silence

Generate Replacements

Running sts-generate-audio with the --check-silence flag generates pre-populated sts-generate-standalone-speech commands that only need to be copied and pasted. See the Standalone Speech Generation Guide for more details.
```
# Copy reported text exactly
# -v flag controls how many versions of each clip to generate
uv run sts-generate-standalone-speech openai --voice echo \
  "silent text 1" "silent text 2" -v 3
```

Manual Rename

# Match exact cache filename from report
mv standalone_speech/generated_file.mp3 \
  standalone_speech/[original_cache_filename].mp3

Apply Overrides

This moves any cache-matching files from standalone_speech to the screenplay’s cache directory.
```
uv run sts-generate-audio script.json config.yaml \
  --populate-cache --cache-overrides
```

Prevention:

Use --check-silence during initial generation
Consider alternative TTS providers for troublesome text (ElevenLabs rarely seems to have issues with silent clips)

2. Rate Limiting

Problem: API requests are being rate limited.

Symptoms:

Rate limit error messages
Delayed audio generation
Automatic retries and backoff

Solutions:

Automatic Handling
- The system automatically retries with exponential backoff
- Each provider has separate rate limit handling
- Future versions of Script to Speech will allow for more manual control of thread limits and backoff behavior
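
To illustrate what exponential backoff means in practice, here is a minimal sketch that doubles the wait before each retry and caps it at a maximum. The `backoff_delay` name, the 1-second base, and the 60-second cap are assumptions chosen for illustration; the actual values and logic used by Script to Speech may differ.

```shell
# Hypothetical helper: delay in seconds before retry number ATTEMPT (0-based).
# Usage: backoff_delay ATTEMPT [BASE_SECONDS] [CAP_SECONDS]
backoff_delay() {
  attempt=$1
  base=${2:-1}    # assumed base delay
  cap=${3:-60}    # assumed maximum delay
  delay=$base
  i=0
  # Double the delay once per previous attempt, stopping early at the cap
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

for n in 0 1 2 3 4 5 6 7; do
  echo "retry $n: wait $(backoff_delay "$n")s"
done
```

Because the delay grows quickly, a run that hits rate limits repeatedly can appear stalled while it is only waiting; the logs should show the retries.
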

Reduce Global Concurrent Downloads

Note that this limits the overall (cross-provider) maximum number of concurrent downloads.
```
uv run sts-generate-audio --max-workers 5
```

Distribute Across TTS Providers

# Split voices across multiple TTS providers
NARRATOR:
  provider: openai
MAIN_CHARACTER:
  provider: elevenlabs
SIDE_CHARACTER:
  provider: zonos

Prevention:

Use multiple TTS providers
Monitor provider-specific limits

3. API Key Issues

Problem: API authentication failures.

Symptoms:

“API key not set” errors
Authentication failed messages
401 Unauthorized errors

Solutions:

Check Environment Variables

# Verify keys are set
echo $OPENAI_API_KEY
echo $ELEVEN_API_KEY
echo $CARTESIA_API_KEY
echo $MINIMAX_API_KEY
echo $MINIMAX_GROUP_ID
echo $ZONOS_API_KEY
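
Echoing keys prints the secrets to the terminal. As an alternative, this sketch reports only whether each variable listed above is set, along with its length. The `check_keys` function name is hypothetical.

```shell
# Report whether each provider variable is set, without printing its value.
check_keys() {
  for name in OPENAI_API_KEY ELEVEN_API_KEY CARTESIA_API_KEY \
              MINIMAX_API_KEY MINIMAX_GROUP_ID ZONOS_API_KEY; do
    # Indirect lookup: copy the value of the variable named by $name
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name: set (${#value} characters)"
    else
      echo "$name: NOT SET"
    fi
  done
}

check_keys
```

Note that a variable assigned in the shell without `export` is visible to this check but not to child processes such as `uv run`.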

Set Keys Properly

# In terminal session
export OPENAI_API_KEY="your-key-here"

# Or in .env file
OPENAI_API_KEY=your-key-here
ELEVEN_API_KEY=your-key-here

Validate Key Format
- OpenAI: Starts with sk-
- ElevenLabs: 32-character string
- Cartesia: Starts with sk_car_
- Minimax (API key): Bearer token format, long string
- Minimax (Group ID): String of digits
- Zonos: Starts with zsk-
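
The prefix and length patterns above can be checked mechanically. This is a rough sketch using a hypothetical `check_key_format` helper; it covers only the patterns that are easy to test, and a key that passes may still be expired or revoked.

```shell
# Hypothetical helper: rough format check based on the patterns listed above.
# Usage: check_key_format PROVIDER KEY
check_key_format() {
  case "$1:$2" in
    openai:sk-*|cartesia:sk_car_*|zonos:zsk-*)
      echo "ok" ;;
    elevenlabs:*)
      # ElevenLabs keys are described above as 32-character strings
      if [ "${#2}" -eq 32 ]; then echo "ok"; else echo "unexpected format"; fi ;;
    openai:*|cartesia:*|zonos:*)
      echo "unexpected format" ;;
    *)
      echo "no check defined" ;;
  esac
}

check_key_format openai "${OPENAI_API_KEY:-}"
```

An "unexpected format" result often points to a copy-and-paste problem, such as a truncated key or surrounding quotes stored as part of the value.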

Prevention:

Use .env file for persistent keys
Check API dashboard for key validity

4. Voice Configuration Errors

Problem: Voice not found or invalid configuration.

Symptoms:

Voice ID errors
Missing provider configuration
Invalid voice parameters
Required fields missing

Solutions:

Validate Configuration

# Check for missing/extra/duplicate speakers
uv run sts-tts-provider-yaml validate script.json config.yaml

# Strict validation including provider field validation
uv run sts-tts-provider-yaml validate script.json config.yaml --strict

OpenAI

Ensure a valid voice option is being used

# Valid voice options
voices: [alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer]

ElevenLabs

Check voice ID is from public voice library (https://elevenlabs.io/app/voice-library) and not the “My voices” library (https://elevenlabs.io/app/voice-lab)
Search for the voice ID in the public voice library to make sure it still exists. Voices are sometimes removed from ElevenLabs.
```
# Voice ID format: 20-character string
voice_id: ErXwobaYiN019PkySvjV  # ID must be from public library
```

Minimax

# Validate voice_id is one of the valid voice IDs
voice_id: Casual_Guy  # Must be one of system voices

# If using voice_mix, ensure proper structure
voice_mix:
  - voice_id: Casual_Guy  # Must be valid voice ID
    weight: 70  # Must be 1-100
  - voice_id: Deep_Voice_Man
    weight: 30

Zonos

# Validate that the voice is one of the default_voice_name values from the Zonos documentation
default_voice_name: american_male  # Must be one of 9 default voices

Prevention:

Use sts-tts-provider-yaml generate for templates
Validate configuration with the --dry-run run mode and/or sts-tts-provider-yaml validate
Keep backup of working configurations

5. Memory and Disk Space Issues

Problem: System running out of memory or disk space.

Symptoms:

Generation stops unexpectedly
System slowdown
“No space left on device” errors

Solutions:

Reduce Batch Size

For memory-constrained systems, reducing the concatenation batch size can help performance when combining audio segments.
```
uv run sts-generate-audio --concat-batch-size 150
```

Reduce Maximum Concurrent Downloads

Reducing the number of concurrent downloads can help reduce memory usage.
```
uv run sts-generate-audio --max-workers 5
```

Process in Segments

# Process chapters separately
uv run sts-generate-audio chapter1.json --populate-cache
uv run sts-generate-audio chapter2.json --populate-cache
# Manually combine later

Clean Unnecessary Files

# Remove temporary files
rm -rf output/*/logs/old_logs_*.txt
rm -rf standalone_speech/unused_*.mp3

Prevention:

Monitor disk space before large projects
Use --populate-cache for gradual processing
Consider processing on systems with adequate resources
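
Checking disk space before a large project can be scripted. The sketch below uses the standard `df` utility; the `free_mb` helper name and the 2048 MB threshold are assumptions for illustration, not values taken from Script to Speech.

```shell
# Print free space, in megabytes, on the filesystem that holds DIR
# (defaults to the current directory).
free_mb() {
  # -P gives one line per filesystem, -k reports 1024-byte blocks;
  # column 4 of the data line is the available space
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

required_mb=2048   # assumed requirement; adjust for the project
available_mb=$(free_mb .)
if [ "$available_mb" -lt "$required_mb" ]; then
  echo "Only ${available_mb} MB free; consider cleaning up before generating audio"
else
  echo "${available_mb} MB free"
fi
```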

6. Text Processor Configuration Issues

Problem: Text processors not working as expected.

Symptoms:

Text not being transformed
Wrong text processor precedence
Validation errors

Solutions:

Check Text Processor Order

All preprocessors, from all configs, will be run before any processors
Multiple configs will be processed in order
Within a config, (pre)processors will be run top to bottom

Pay attention to “chain” mode (pre)processors (output of one (pre)processor becomes input of next) vs. “override” mode (last instance takes precedence)

# config 1
preprocessors:
  - name: extract_dialogue_parentheticals
processors:
  - name: text_substitution
  - name: capitalization_transform_processor

# config 2
preprocessors:
  - name: speaker_merge_preprocessor
processors:
  - name: pattern_replace_processor

# Resultant processing pipeline ordering:
# extract_dialogue_parentheticals -> speaker_merge_preprocessor -> text_substitution ->
#  capitalization_transform_processor -> pattern_replace_processor

Validate Configuration

# Test text processor configuration
uv run sts-apply-text-processors-json script.json \
  --text-processor-configs test_config.yaml

Fix Syntax Errors
- Check YAML indentation
- Verify field names match exactly
- Ensure required fields are present (check log output)

Prevention:

Start with default configuration
Add custom processors incrementally
Test with small examples first

7. Cache-Related Issues

Problem: Unexpected cache behavior.

Symptoms:

Audio not being reused
Cache files overwriting each other
Missing cache files

Solutions:

Verify Cache Naming

# Cache filename structure:
# [original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3
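
When comparing cache files by hand, it can help to split a filename into the four parts of the structure above. This is a minimal sketch with a hypothetical `parse_cache_name` helper. The sample filename is made up, and the sketch assumes that none of the parts contains `~~` itself.

```shell
# Hypothetical helper: split a cache filename into its "~~"-separated parts.
parse_cache_name() {
  name=$(basename "$1" .mp3)
  printf '%s\n' "$name" | awk -F'~~' '
    NF != 4 { print "unexpected filename structure"; exit 1 }
    {
      print "original_hash:  " $1
      print "processed_hash: " $2
      print "provider_id:    " $3
      print "speaker_id:     " $4
    }'
}

# Made-up example filename for illustration
parse_cache_name "abc123~~def456~~openai~~NARRATOR.mp3"
```

Two files with the same original hash but different processed hashes suggest that text processor changes between runs altered the text sent to the provider.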

Check File Paths

# Ensure cache directory exists
ls -la output/[script]/cache/

Clear Problematic Cache

# Remove specific cache files
rm output/[script]/cache/problematic_*.mp3
# Or clear all cache
rm -rf output/[script]/cache/

Prevention:

Avoid modifying text processors / parser between runs of a screenplay
Maintain separate cache directories for different versions

8. ElevenLabs-Specific Issues

Problem: ElevenLabs voice management errors.

Symptoms:

“Voice not found in registry” errors
30 voice limit exceeded
Monthly add/remove quota reached

Solutions:

Use Public Library Voices

# Only use public library voice IDs
SPEAKER:
  provider: elevenlabs
  voice_id: ErXwobaYiN019PkySvjV  # Public library ID

Monitor Voice Usage
- Provider automatically manages 30 voice limit
- Check if monthly quota is exceeded (check log file)

Alternative Approach

# If ElevenLabs issues persist, switch provider temporarily
uv run sts-generate-audio backup_config.yaml

Prevention:

Minimize voice changes during development
Use recommended voice tags (narrative & story, conversational)
Plan voice allocation before large projects; try to reuse same 30 voices, as going above this will result in voice swapping from the library

9. Parser Issues

Note: Screenplay parsing is currently fragile. It works best with movie screenplays with “standard” formatting. Support for scanned-in / OCR’d scripts is currently poor. Additional configuration, and better handling of edge cases, is planned for a future release.

Problem: Screenplay parsing errors.

Symptoms:

Incorrect speaker attribution
Merged dialogues
Missing text chunks

Solutions:

Manual Text Extraction

# Extract to text first for manual editing
uv run sts-parse-screenplay script.pdf --text-only
# Edit text file to remove headers/footers
# Then parse cleaned text
uv run sts-parse-screenplay cleaned_script.txt

Check Parser Output

# Analyze parsed structure
uv run sts-analyze-json script.json

Validate any Custom Parser Changes

# When making changes to the parser, show differences in output between
# parser version used to originally generate script.json and current parser logic
uv run sts-parse-regression-check-json script.json

Prevention:

Clean PDF before parsing
Verify screenplay formatting
Review parsed output before audio generation

10. Network Connectivity Issues

Problem: Network errors during API calls.

Symptoms:

Connection timeout errors
Intermittent failures
SSL/TLS errors

Solutions:

Retry with Backoff
- System automatically retries failed requests
- Check network stability

Test Connectivity

# Test basic connectivity to each provider
curl https://api.openai.com/v1/models
curl https://api.elevenlabs.io/v1/voices
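
The raw curl output can be hard to interpret, because an authentication error still proves that the endpoint is reachable. This sketch reduces each check to the HTTP status code; the `classify_status` and `check_endpoint` names are hypothetical, and the 10-second timeout is an assumed value.

```shell
# Interpret the HTTP status code that curl reports for an endpoint.
classify_status() {
  case "$1" in
    ""|000)  echo "unreachable (no HTTP response; check network, DNS, or proxy)" ;;
    401|403) echo "reachable (request rejected; check API key)" ;;
    2??)     echo "reachable" ;;
    *)       echo "reachable (HTTP $1)" ;;
  esac
}

check_endpoint() {
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
  echo "$1: $(classify_status "$code")"
}
```

For example, `check_endpoint https://api.openai.com/v1/models` prints one line for that endpoint. curl reports `000` when it receives no HTTP response at all, which usually points to a network problem rather than a key problem.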

Configure Timeouts
- Network issues are handled automatically
- Consider VPN if regional restrictions apply

Prevention:

Stable internet connection
Use the --populate-cache run mode to ensure all files are downloaded before generation
Use local cache when possible

11. LLM Voice Casting Issues

Problem: Issues with LLM-assisted voice casting workflow.

Symptoms:

LLM returns invalid YAML
Missing speakers in LLM output
Configuration validation errors
Voice library IDs not recognized

Solutions:

For Character Notes Generation:

Try a different LLM
- Certain LLM providers / models struggle with the task of adding .yaml comments while leaving the rest of the structure intact
- Claude Sonnet and Gemini Pro seem to work well

Generate Proper Casting Prompt

# Ensure prompt includes current configuration
uv run sts-generate-character-notes-prompt \
  source_screenplays/script.pdf \
  input/script/script_voice_config.yaml

Validate LLM Output

# Check for structural issues
uv run sts-tts-provider-yaml validate input/script/script.json \
  input/script/script_voice_config.yaml

# Strict validation for provider fields
uv run sts-tts-provider-yaml validate input/script/script.json \
  input/script/script_voice_config.yaml --strict

For Voice Library Casting:

Ensure Character Notes Exist
- Voice library casting works best when character descriptions are present as YAML comments
- Either use character notes generation first, or manually add notes

Generate Voice Library Casting Prompt

# Include all providers you want to cast from
uv run sts-generate-voice-library-casting-prompt \
  input/script/script_voice_config.yaml \
  openai elevenlabs cartesia

Validate Voice Library IDs
- LLM must use valid sts_id values from the voice libraries
- Check that returned IDs exist in the provider’s voice library
- Use --strict validation to catch invalid voice configurations
Common Voice Library Casting Issues
- Invalid sts_id: LLM may invent voice IDs not in the library
- Missing sts_id: LLM may forget to add the sts_id field
- Wrong provider: LLM may assign voices from wrong provider’s library
- Overwriting config: LLM may remove existing provider-specific fields

Prevention:

Use the two-step workflow: character notes first, then voice library casting
For privacy-conscious workflows, manually add character notes instead of using LLM
Always validate configuration before proceeding with audio generation
Keep backup of working configurations
Try a reasoning LLM if voice casting is producing incorrect or sub-par results

Debugging Tools

Command Line Tools

Standalone Speech Testing

# Test individual voice configurations
uv run sts-generate-standalone-speech openai --voice echo "Test text"

Dry Run Validation

# Validate configuration without generation
uv run sts-generate-audio script.json config.yaml --dry-run

Configuration Validation

# Check voice configuration against script
uv run sts-tts-provider-yaml validate script.json config.yaml

# Strict validation including provider fields
uv run sts-tts-provider-yaml validate script.json config.yaml --strict

Processor Testing

# Test text processors independently
uv run sts-apply-text-processors-json script.json \
  --text-processor-configs test_config.yaml \
  --output-path debug_output.json

Parser Regression Testing

# When making changes to the parser, show differences in output between
# parser version used to originally generate script.json and current parser logic
uv run sts-parse-regression-check-json script.json

Voice Casting Utilities

# Generate LLM prompt for voice casting
uv run sts-generate-character-notes-prompt script.pdf config.yaml

# Copy any file to clipboard
uv run sts-copy-to-clipboard file.txt

Log Analysis

Check Detailed Logs

# View recent logs
tail -f output/[script]/logs/[run mode]_log_YYYYMMDD_HHMMSS.txt

Filter Errors

# Find errors in logs
grep -i error output/[script]/logs/[run mode]_log_*.txt
grep -i warning output/[script]/logs/[run mode]_log_*.txt
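
For a quick overview across several runs, the two grep commands above can be folded into a per-file count. This is a sketch with a hypothetical `summarize_logs` helper; the glob assumes the log path layout shown above.

```shell
# Print error and warning line counts for each log file given as an argument.
summarize_logs() {
  for log in "$@"; do
    [ -f "$log" ] || continue   # skip patterns that matched no file
    errors=$(grep -ci 'error' "$log")
    warnings=$(grep -ci 'warning' "$log")
    echo "$log: $errors error line(s), $warnings warning line(s)"
  done
}

summarize_logs output/*/logs/*_log_*.txt
```

The counts are lines that contain the word, case-insensitively, so a single failure that logs several related lines is counted more than once.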

Getting Help

Information to Include

When reporting issues, include:

Full error message
Command used
Log file from output/[script]/logs
Configuration files (tts config, any additional processor configs, dialogue chunk .json)
System information (OS, Python version)
UV version: uv --version

Best Practices for Avoiding Issues

Incremental Development
- Test with small scripts first
- Build up to full-length projects
- Use --dry-run frequently
Configuration Management
- Keep backup configurations
- Version control YAML files
- Document custom changes
- Use sts-tts-provider-yaml validate to check configurations
Resource Management
- Monitor disk space
- Use appropriate batch sizes
- Clean up old files regularly
Quality Assurance
- Validate dialogue .json with sts-analyze-json
- Use --populate-cache during audio generation to ensure all files are downloaded without issue
- Use --check-silence during audio generation
- Use sts-generate-standalone-speech to test new voices and TTS providers
- Validate voice configurations with sts-tts-provider-yaml validate
Error Prevention
- Set up API keys properly
- Follow naming conventions
- Use provided templates
- Validate configurations before audio generation
LLM-Assisted Workflows
- Use sts-generate-character-notes-prompt for consistent prompts
- Always validate LLM output with sts-tts-provider-yaml validate
- Keep backup configurations before making LLM-suggested changes
