Troubleshooting Guide
This guide helps resolve common issues when using Script to Speech.
Common Issues
1. Silent Audio Detection
Problem: Generated audio clips are silent or nearly silent. This seems to occur most often with short or single-word clips from TTS providers other than ElevenLabs.
Symptoms:
- Warnings about silent clips in logs
- Audio files with low dBFS readings
- Partially silent audiobooks
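For reference on the dBFS readings: dBFS (decibels relative to full scale) puts the loudest value a file can hold at 0 dBFS, so quieter audio gives more negative numbers and true silence has no finite value. As an illustration only (this is not Script to Speech's own silence check, and its threshold is not covered here), the hypothetical rms_dbfs helper below computes the RMS level of normalized samples in the range -1.0 to 1.0, read one per line from stdin. Decoding an MP3 into such samples is outside this sketch.

```shell
# Hypothetical helper: print the RMS level, in dBFS, of normalized samples
# (-1.0..1.0) read one per line from stdin.
# Pure silence (or no input) has no finite dBFS value and is reported as -inf.
rms_dbfs() {
  awk '
    { sum += $1 * $1; n++ }
    END {
      if (n == 0 || sum == 0) { print "-inf"; exit }
      rms = sqrt(sum / n)
      # dBFS = 20 * log10(rms); awk only has natural log, so divide by log(10)
      printf "%.1f\n", 20 * log(rms) / log(10)
    }'
}
```

For example, a constant level of 0.5 (half of full scale) comes out at -6.0 dBFS and 0.001 at -60.0 dBFS; a "nearly silent" clip sits far down this scale.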
Solutions:
- Identify Silent Clips

      uv run sts-generate-audio script.json config.yaml \
        --populate-cache --check-silence

- Generate Replacements
  Running sts-generate-audio with the --check-silence flag generates pre-populated sts-generate-standalone-speech commands that only need to be copied and pasted. See the Standalone Speech Generation Guide for more details.

      # Copy the reported text exactly
      # The -v flag controls how many versions of each clip to generate
      uv run sts-generate-standalone-speech openai --voice echo \
        "silent text 1" "silent text 2" -v 3

- Manual Rename

      # Match the exact cache filename from the report
      mv standalone_speech/generated_file.mp3 \
        standalone_speech/[original_cache_filename].mp3

- Apply Overrides
  This moves any cache-matching files from standalone_speech to the screenplay’s cache directory.

      uv run sts-generate-audio script.json config.yaml \
        --populate-cache --cache-overrides
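The Manual Rename step is easy to get wrong by a single character. The hypothetical helper below (not part of Script to Speech) renames a regenerated clip in place, inside its own directory such as standalone_speech, and refuses a target that does not loosely match the cache filename structure described in section 7, [original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3.

```shell
# Hypothetical helper: rename a regenerated clip to the exact cache filename
# reported by --check-silence, keeping it in the same directory.
# Usage: rename_to_cache_name standalone_speech/generated_file.mp3 CACHE_FILENAME
rename_to_cache_name() {
  local src=$1 cache_name=$2
  # Reject paths, then loosely require four "~~"-separated fields ending in .mp3
  case $cache_name in
    */*)
      echo "expected a bare filename, got: $cache_name" >&2
      return 1 ;;
    ?*"~~"?*"~~"?*"~~"?*.mp3) ;;
    *)
      echo "does not look like a cache filename: $cache_name" >&2
      return 1 ;;
  esac
  mv -- "$src" "$(dirname "$src")/$cache_name"
}
```

After renaming, the Apply Overrides step moves the file into the cache as before.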
Prevention:
- Use --check-silence during initial generation
- Consider alternative TTS providers for troublesome text (ElevenLabs rarely seems to have issues with silent clips)
2. Rate Limiting
Problem: API requests are being rate limited.
Symptoms:
- Rate limit error messages
- Delayed audio generation
- Automatic retries and backoff
Solutions:
- Automatic Handling
  - The system automatically retries with exponential backoff
  - Each provider has separate rate limit handling
  - Future versions of Script to Speech will allow more manual control of thread limits and backoff behavior
- Reduce Global Concurrent Downloads
  Note that this limits the overall (cross-provider) maximum number of concurrent downloads.

      uv run sts-generate-audio script.json config.yaml --max-workers 5

- Distribute Across TTS Providers

      # Split voices across multiple TTS providers
      NARRATOR:
        provider: openai
      MAIN_CHARACTER:
        provider: elevenlabs
      SIDE_CHARACTER:
        provider: zonos
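No configuration is needed for the built-in retries. If you wrap your own scripted calls (for example, the curl connectivity checks in section 10), the same exponential backoff idea can be sketched as below. retry_with_backoff is a hypothetical helper, not part of Script to Speech: it reruns a command until it succeeds or runs out of attempts, doubling the wait after each failure.

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping
# BASE_DELAY, 2*BASE_DELAY, 4*BASE_DELAY, ... seconds between failed attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY command [args...]
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))          # exponential growth of the wait
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 4 5 curl -s -o /dev/null https://api.openai.com/v1/models makes up to four attempts, waiting 5, 10, and then 20 seconds between them. Doubling the delay gives a rate-limited API progressively more room to recover instead of being retried at a fixed pace.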
Prevention:
- Use multiple TTS providers
- Monitor provider-specific limits
3. API Key Issues
Problem: API authentication failures.
Symptoms:
- “API key not set” errors
- Authentication failed messages
- 401 Unauthorized errors
Solutions:
- Check Environment Variables

      # Verify keys are set
      echo "$OPENAI_API_KEY"
      echo "$ELEVEN_API_KEY"
      echo "$CARTESIA_API_KEY"
      echo "$MINIMAX_API_KEY"
      echo "$MINIMAX_GROUP_ID"
      echo "$ZONOS_API_KEY"

- Set Keys Properly

      # In a terminal session
      export OPENAI_API_KEY="your-key-here"

      # Or in a .env file
      OPENAI_API_KEY=your-key-here
      ELEVEN_API_KEY=your-key-here

- Validate Key Format
  - OpenAI: Starts with sk-
  - ElevenLabs: 32-character string
  - Cartesia: Starts with sk_car_
  - Minimax (API key): Bearer token format, long string
  - Minimax (Group ID): String of digits
  - Zonos: Starts with zsk-
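Note that echo prints the secret itself to your terminal and scrollback, and it also shows shell variables that were never exported, which uv run would not see. The hypothetical check_api_keys helper below instead reports whether each variable from the list above is exported and non-empty, and applies the key format hints above without printing any values. Keys that live only in a .env file will show as not set, because this checks the shell environment rather than the file. The format rules are heuristics: a correctly formatted key can still be revoked or mistyped, so the provider's dashboard remains the final word.

```shell
# Hypothetical helper: report which provider variables are exported, without
# printing their values, and flag values that fail the format hints above.
check_api_keys() {
  local name value status
  for name in OPENAI_API_KEY ELEVEN_API_KEY CARTESIA_API_KEY \
              MINIMAX_API_KEY MINIMAX_GROUP_ID ZONOS_API_KEY; do
    # printenv only sees exported variables, which is what child processes get
    if ! value=$(printenv "$name") || [ -z "$value" ]; then
      echo "$name: not set"
      continue
    fi
    status=ok
    case $name in
      OPENAI_API_KEY)   case $value in sk-*) ;; *) status="unexpected format" ;; esac ;;
      ELEVEN_API_KEY)   [ "${#value}" -eq 32 ] || status="unexpected length" ;;
      CARTESIA_API_KEY) case $value in sk_car_*) ;; *) status="unexpected format" ;; esac ;;
      MINIMAX_GROUP_ID) case $value in *[!0-9]*) status="expected digits only" ;; esac ;;
      ZONOS_API_KEY)    case $value in zsk-*) ;; *) status="unexpected format" ;; esac ;;
    esac
    echo "$name: set ($status)"
  done
}
```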
Prevention:
- Use a .env file for persistent keys
- Check the provider’s API dashboard for key validity
4. Voice Configuration Errors
Problem: Voice not found or invalid configuration.
Symptoms:
- Voice ID errors
- Missing provider configuration
- Invalid voice parameters
- Required fields missing
Solutions:
- Validate Configuration

      # Check for missing/extra/duplicate speakers
      uv run sts-tts-provider-yaml validate script.json config.yaml
      # Strict validation, including provider field validation
      uv run sts-tts-provider-yaml validate script.json config.yaml --strict

- OpenAI
  - Ensure a valid voice option is being used

      # Valid voice options
      voices: [alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer]

- ElevenLabs
  - Check that the voice ID is from the public voice library (https://elevenlabs.io/app/voice-library) and not the “My voices” library (https://elevenlabs.io/app/voice-lab)
  - Search for the voice ID in the public voice library to make sure it still exists. Voices are sometimes removed from ElevenLabs

      # Voice ID format: 20-character string
      voice_id: ErXwobaYiN019PkySvjV  # ID must be from the public library

- Minimax

      # Validate that voice_id is one of the valid system voice IDs
      voice_id: Casual_Guy  # Must be one of the system voices

      # If using voice_mix, ensure it has the proper structure
      voice_mix:
        - voice_id: Casual_Guy  # Must be a valid voice ID
          weight: 70            # Must be 1-100
        - voice_id: Deep_Voice_Man
          weight: 30

- Zonos

      # Validate that default_voice_name is one of the default voices in the Zonos documentation
      default_voice_name: american_male  # Must be one of the 9 default voices
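A quick format check can catch copy-and-paste mistakes before a run. The example ID ErXwobaYiN019PkySvjV is 20 letters and digits, and the hypothetical helper below only checks that shape. It cannot tell whether a voice still exists in the public library, so the library search above is still needed.

```shell
# Hypothetical helper: succeed only if the argument has the shape of an
# ElevenLabs voice ID (20 letters or digits, like the example above).
# A well-formed ID can still have been removed from the public library.
looks_like_elevenlabs_voice_id() {
  [ "${#1}" -eq 20 ] || return 1
  case $1 in
    *[![:alnum:]]*) return 1 ;;   # any character that is not a letter or digit
  esac
  return 0
}
```

For example, looks_like_elevenlabs_voice_id ErXwobaYiN019PkySvjV succeeds, while Casual_Guy (a Minimax system voice) fails.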
Prevention:
- Use sts-tts-provider-yaml generate for templates
- Validate configuration with --dry-run mode and/or sts-tts-provider-yaml validate
- Keep backups of working configurations
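For the last point, a timestamped copy made before each round of edits is often enough. The hypothetical backup_config helper below copies a file next to itself with a .bak_YYYYMMDD_HHMMSS suffix and prints the new path; the naming scheme is an arbitrary choice for this sketch.

```shell
# Hypothetical helper: copy a config to a timestamped backup next to it,
# e.g. config.yaml -> config.yaml.bak_20240101_120000, and print the backup path.
# Two backups made within the same second share a name; the second overwrites the first.
backup_config() {
  local src=$1 dest
  dest="$src.bak_$(date +%Y%m%d_%H%M%S)"
  cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Running diff -u on the backup and the edited file then shows exactly what changed. Version control, as suggested under Best Practices, does the same job with full history.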
5. Memory and Disk Space Issues
Problem: System running out of memory or disk space.
Symptoms:
- Generation stops unexpectedly
- System slowdown
- “No space left on device” errors
Solutions:
- Reduce Batch Size
  For memory-constrained systems, reducing the concatenation batch size can help performance when combining audio segments.

      uv run sts-generate-audio script.json config.yaml --concat-batch-size 150

- Reduce Maximum Concurrent Downloads
  Reducing the number of concurrent downloads can help reduce memory usage.

      uv run sts-generate-audio script.json config.yaml --max-workers 5

- Process in Segments

      # Process chapters separately
      uv run sts-generate-audio chapter1.json config.yaml --populate-cache
      uv run sts-generate-audio chapter2.json config.yaml --populate-cache
      # Manually combine later

- Clean Unnecessary Files

      # Remove temporary files
      rm -f output/*/logs/old_logs_*.txt
      rm -f standalone_speech/unused_*.mp3
Prevention:
- Monitor disk space before large projects
- Use --populate-cache for gradual processing
- Consider processing on systems with adequate resources
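To make the disk check a habit, a small pre-flight function can refuse to start when free space is low. check_free_space is a hypothetical helper built on df -Pk, whose POSIX output format puts available space (in 1024-byte blocks) in column 4. How much space a project actually needs depends on script length and provider output, so the threshold in the example is a placeholder.

```shell
# Hypothetical helper: fail if the filesystem holding DIR has less than
# MIN_MB megabytes available. Returns 2 if free space cannot be read.
check_free_space() {
  local dir=$1 min_mb=$2 avail_kb
  # -P: one line per filesystem; -k: sizes in 1024-byte blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kb" ]; then
    echo "could not read free space for $dir" >&2
    return 2
  fi
  if [ "$avail_kb" -lt $((min_mb * 1024)) ]; then
    echo "only $((avail_kb / 1024)) MB free in $dir (wanted $min_mb MB)" >&2
    return 1
  fi
  echo "$((avail_kb / 1024)) MB free in $dir"
}
```

For example: check_free_space output 2048 && uv run sts-generate-audio script.json config.yaml --populate-cache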
6. Text Processor Configuration Issues
Problem: Text processors not working as expected.
Symptoms:
- Text not being transformed
- Wrong text processor precedence
- Validation errors
Solutions:
- Check Text Processor Order
  - All preprocessors from all configs run before any processors
  - Multiple configs are processed in order
  - Within a config, (pre)processors run from top to bottom
  - Pay attention to “chain” mode (the output of one (pre)processor becomes the input of the next) vs. “override” mode (the last instance takes precedence)

      # config 1
      preprocessors:
        - name: extract_dialogue_parentheticals
      processors:
        - name: text_substitution
        - name: capitalization_transform_processor

      # config 2
      preprocessors:
        - name: speaker_merge_preprocessor
      processors:
        - name: pattern_replace_processor

      # Resultant processing pipeline ordering:
      # extract_dialogue_parentheticals -> speaker_merge_preprocessor -> text_substitution ->
      # capitalization_transform_processor -> pattern_replace_processor

- Validate Configuration

      # Test text processor configuration
      uv run sts-apply-text-processors-json script.json \
        --text-processor-configs test_config.yaml

- Fix Syntax Errors
  - Check YAML indentation
  - Verify field names match exactly
  - Ensure required fields are present (check log output)
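The ordering rules for text processors behave like a stable partition: walking the configs in order, every preprocessor keeps its relative position, every processor keeps its relative position, and all preprocessors come first. The awk sketch below models only that ordering rule, not Script to Speech's implementation, and it does not model override mode. Its input format is hypothetical: one line per entry, pre NAME or proc NAME, listed config by config from top to bottom.

```shell
# Model of the ordering rule: all preprocessors (in config order, top to bottom)
# run before any processor (also in config order, top to bottom).
# Input (hypothetical format): lines of "pre NAME" or "proc NAME", config by config.
pipeline_order() {
  awk '
    $1 == "pre"  { pre[++np] = $2 }
    $1 == "proc" { proc[++nq] = $2 }
    END {
      for (i = 1; i <= np; i++) print pre[i]
      for (i = 1; i <= nq; i++) print proc[i]
    }'
}
```

Fed the two example configs in this section (config 1's entries, then config 2's), it prints the same five-step order as the resultant pipeline comment.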
Prevention:
- Start with default configuration
- Add custom processors incrementally
- Test with small examples first
7. Cache-Related Issues
Problem: Unexpected cache behavior.
Symptoms:
- Audio not being reused
- Cache files overwriting each other
- Missing cache files
Solutions:
- Verify Cache Naming

      # Cache filename structure:
      # [original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3

- Check File Paths

      # Ensure the cache directory exists
      ls -la output/[script]/cache/

- Clear Problematic Cache

      # Remove specific cache files
      rm output/[script]/cache/problematic_*.mp3
      # Or clear the entire cache
      rm -rf output/[script]/cache/
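Because every field in a cache filename is separated by ~~, the name can be split mechanically, which helps when deciding which clips to remove. The hypothetical parse_cache_name helper below prints the four fields of the documented structure and fails on names that do not have exactly four.

```shell
# Hypothetical helper: split a cache filename of the form
#   [original_hash]~~[processed_hash]~~[provider_id]~~[speaker_id].mp3
# into labelled fields. Returns non-zero if the name does not have four fields.
parse_cache_name() {
  local name
  name=$(basename "$1" .mp3)
  printf '%s\n' "$name" | awk -F'~~' '
    NF != 4 { exit 1 }
    { printf "original_hash=%s\nprocessed_hash=%s\nprovider_id=%s\nspeaker_id=%s\n", $1, $2, $3, $4 }'
}
```

Since speaker_id is the last field, a glob such as output/[script]/cache/*~~NARRATOR.mp3 (with [script] replaced by your script's folder name) selects every cached clip for one speaker.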
Prevention:
- Avoid modifying text processors / parser between runs of a screenplay
- Maintain separate cache directories for different versions
8. ElevenLabs-Specific Issues
Problem: ElevenLabs voice management errors.
Symptoms:
- “Voice not found in registry” errors
- 30 voice limit exceeded
- Monthly add/remove quota reached
Solutions:
- Use Public Library Voices

      # Only use public library voice IDs
      SPEAKER:
        provider: elevenlabs
        voice_id: ErXwobaYiN019PkySvjV  # Public library ID

- Monitor Voice Usage
  - The provider automatically manages the 30-voice limit
  - Check whether the monthly quota has been exceeded (see the log file)
- Alternative Approach

      # If ElevenLabs issues persist, temporarily switch providers
      uv run sts-generate-audio script.json backup_config.yaml
Prevention:
- Minimize voice changes during development
- Use recommended voice tags (narrative & story, conversational)
- Plan voice allocation before large projects; try to reuse the same 30 voices, as going above this limit results in voices being swapped in and out of the library
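To plan against the 30-voice limit, it helps to know how many distinct ElevenLabs voices your configs use. The hypothetical awk-based helper below counts unique voice_id values that follow provider: elevenlabs within each top-level speaker block. It is not a YAML parser: it assumes speaker names start at column 0 and that provider: comes before voice_id:, as in the examples above.

```shell
# Hypothetical helper: count distinct ElevenLabs voice_id values across one or
# more voice config files. Line-based, not a real YAML parser.
count_elevenlabs_voices() {
  awk '
    # A line starting in column 0 (not a comment) begins a new speaker block
    /^[^ \t#]/ { provider = "" }
    $1 == "provider:" { provider = $2 }
    $1 == "voice_id:" && provider == "elevenlabs" {
      id = $2
      gsub(/"/, "", id)   # treat "ID" and ID as the same voice
      ids[id] = 1
    }
    END { n = 0; for (id in ids) n++; print n }' "$@"
}
```

For example, count_elevenlabs_voices config.yaml other_config.yaml prints a single number to compare with the 30-voice limit.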
9. Parser Issues
Note: Screenplay parsing is currently fragile. It works best with movie screenplays that use “standard” formatting. Support for scanned/OCR’d scripts is currently poor. Additional configuration and better handling of edge cases are planned for a future release.
Problem: Screenplay parsing errors.
Symptoms:
- Incorrect speaker attribution
- Merged dialogues
- Missing text chunks
Solutions:
- Manual Text Extraction

      # Extract to text first for manual editing
      uv run sts-parse-screenplay script.pdf --text-only
      # Edit the text file to remove headers/footers
      # Then parse the cleaned text
      uv run sts-parse-screenplay cleaned_script.txt

- Check Parser Output

      # Analyze parsed structure
      uv run sts-analyze-json script.json

- Validate Any Custom Parser Changes

      # When making changes to the parser, show differences in output between the
      # parser version used to originally generate script.json and the current parser logic
      uv run sts-parse-regression-check-json script.json
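For the manual cleanup step, some page furniture can be stripped with sed before re-parsing. The patterns in the hypothetical helper below (lines holding only a page number such as 12 or 12., and bare (CONTINUED) / CONTINUED: page-break markers) are assumptions about typical screenplay PDFs, not rules the parser applies; compare the result with the original before parsing it.

```shell
# Hypothetical helper: remove lines that contain only a page number ("12" or
# "12.") or a bare page-break CONTINUED marker, ignoring surrounding whitespace.
# Reads the text file given as $1 and writes the cleaned text to stdout.
strip_page_furniture() {
  sed -E \
    -e '/^[[:space:]]*[0-9]+\.?[[:space:]]*$/d' \
    -e '/^[[:space:]]*(\(CONTINUED\)|CONTINUED:)[[:space:]]*$/d' \
    "$1"
}
```

For example, strip_page_furniture script.txt > cleaned_script.txt, where script.txt stands in for whatever text file --text-only produced, followed by uv run sts-parse-screenplay cleaned_script.txt.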
Prevention:
- Clean PDF before parsing
- Verify screenplay formatting
- Review parsed output before audio generation
10. Network Connectivity Issues
Problem: Network errors during API calls.
Symptoms:
- Connection timeout errors
- Intermittent failures
- SSL/TLS errors
Solutions:
- Retry with Backoff
  - The system automatically retries failed requests
  - Check network stability
- Test Connectivity

      # Test basic connectivity to provider APIs
      curl https://api.openai.com/v1/models
      curl https://api.elevenlabs.io/v1/voices

- Configure Timeouts
  - Network issues are handled automatically
  - Consider a VPN if regional restrictions apply
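Because these curl calls send no API key, they typically return an authentication error such as 401, and that still proves the host is reachable; the real failure is getting no HTTP response at all. With -w '%{http_code}', curl prints 000 in that case. The hypothetical probe and describe_status helpers below make that distinction explicit.

```shell
# Describe what an HTTP status code says about connectivity.
# 000 is what curl's %{http_code} reports when no HTTP response arrived.
describe_status() {
  case $1 in
    000)     echo "unreachable (no HTTP response)" ;;
    2??)     echo "reachable" ;;
    401|403) echo "reachable (authentication required or refused)" ;;
    429)     echo "reachable (rate limited)" ;;
    *)       echo "reachable (HTTP $1)" ;;
  esac
}

# Probe a URL with a 10-second limit and describe the result.
probe() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || true
  printf '%s: %s\n' "$1" "$(describe_status "${code:-000}")"
}
```

For example: probe https://api.openai.com/v1/models; probe https://api.elevenlabs.io/v1/voices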
Prevention:
- Use a stable internet connection
- Use --populate-cache run mode to ensure all files are downloaded before generation
- Use the local cache when possible
11. LLM Voice Casting Issues
Problem: Issues with LLM-assisted voice casting workflow.
Symptoms:
- LLM returns invalid YAML
- Missing speakers in LLM output
- Configuration validation errors
- Voice library IDs not recognized
Solutions:
For Character Notes Generation:
- Try a different LLM
  - Some LLM providers/models struggle to add YAML comments while leaving the rest of the structure intact
  - Claude Sonnet and Gemini Pro seem to work well
- Generate Proper Casting Prompt

      # Ensure the prompt includes the current configuration
      uv run sts-generate-character-notes-prompt \
        source_screenplays/script.pdf \
        input/script/script_voice_config.yaml

- Validate LLM Output

      # Check for structural issues
      uv run sts-tts-provider-yaml validate input/script/script.json \
        input/script/script_voice_config.yaml
      # Strict validation for provider fields
      uv run sts-tts-provider-yaml validate input/script/script.json \
        input/script/script_voice_config.yaml --strict
For Voice Library Casting:
- Ensure Character Notes Exist
  - Voice library casting works best when character descriptions are present as YAML comments
  - Either use character notes generation first, or add notes manually
- Generate Voice Library Casting Prompt

      # Include all providers you want to cast from
      uv run sts-generate-voice-library-casting-prompt \
        input/script/script_voice_config.yaml \
        openai elevenlabs cartesia

- Validate Voice Library IDs
  - The LLM must use valid sts_id values from the voice libraries
  - Check that returned IDs exist in the provider’s voice library
  - Use --strict validation to catch invalid voice configurations
- Common Voice Library Casting Issues
  - Invalid sts_id: the LLM may invent voice IDs not in the library
  - Missing sts_id: the LLM may forget to add the sts_id field
  - Wrong provider: the LLM may assign voices from the wrong provider’s library
  - Overwriting config: the LLM may remove existing provider-specific fields
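To catch invented IDs before running --strict validation, you can compare every sts_id in the LLM's output against a list of known IDs. Where that list comes from depends on your voice library setup; here known_ids.txt is a hypothetical plain-text file with one ID per line, and the extraction assumes each ID is written as sts_id: VALUE on its own line.

```shell
# Hypothetical helper: list sts_id values in CONFIG that are missing from
# KNOWN_IDS (a plain-text file, one ID per line). Returns 1 if any are found.
# Line-based: expects each ID written as "sts_id: VALUE" on its own line.
find_unknown_sts_ids() {
  local config=$1 known=$2 unknown
  unknown=$(awk '$1 == "sts_id:" { v = $2; gsub(/"/, "", v); print v }' "$config" \
    | sort -u | { grep -vxF -f "$known" || true; })
  if [ -n "$unknown" ]; then
    printf '%s\n' "$unknown" | sed 's/^/unknown sts_id: /'
    return 1
  fi
}
```

For example, find_unknown_sts_ids input/script/script_voice_config.yaml known_ids.txt, then run the --strict validation as usual.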
Prevention:
- Use the two-step workflow: character notes first, then voice library casting
- For privacy-conscious workflows, manually add character notes instead of using LLM
- Always validate configuration before proceeding with audio generation
- Keep backup of working configurations
- Try a reasoning model if voice casting produces incorrect or sub-par results
Debugging Tools
Command Line Tools
- Standalone Speech Testing

      # Test individual voice configurations
      uv run sts-generate-standalone-speech openai --voice echo "Test text"

- Dry Run Validation

      # Validate configuration without generation
      uv run sts-generate-audio script.json config.yaml --dry-run

- Configuration Validation

      # Check voice configuration against script
      uv run sts-tts-provider-yaml validate script.json config.yaml
      # Strict validation including provider fields
      uv run sts-tts-provider-yaml validate script.json config.yaml --strict

- Processor Testing

      # Test text processors independently
      uv run sts-apply-text-processors-json script.json \
        --text-processor-configs test_config.yaml \
        --output-path debug_output.json

- Parser Regression Testing

      # When making changes to the parser, show differences in output between the
      # parser version used to originally generate script.json and the current parser logic
      uv run sts-parse-regression-check-json script.json

- Voice Casting Utilities

      # Generate LLM prompt for voice casting
      uv run sts-generate-character-notes-prompt script.pdf config.yaml
      # Copy any file to clipboard
      uv run sts-copy-to-clipboard file.txt
Log Analysis
- Check Detailed Logs

      # View recent logs
      tail -f output/[script]/logs/[run mode]_log_YYYYMMDD_HHMMSS.txt

- Filter Errors

      # Find errors and warnings in logs
      grep -i error output/[script]/logs/[run mode]_log_*.txt
      grep -i warning output/[script]/logs/[run mode]_log_*.txt
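To get a quick overview across several runs, the same case-insensitive searches can be turned into per-file counts. summarize_logs is a hypothetical helper; like grep -c, it counts lines that contain a match, not individual occurrences.

```shell
# Hypothetical helper: print, for each log file given, how many lines mention
# "error" and "warning" (case-insensitive, like the grep commands above).
summarize_logs() {
  local f errors warnings
  for f in "$@"; do
    # grep -c prints 0 but exits 1 when nothing matches; keep going in that case
    errors=$(grep -ci error "$f" || true)
    warnings=$(grep -ci warning "$f" || true)
    printf '%s: errors=%s warnings=%s\n' "$f" "$errors" "$warnings"
  done
}
```

For example, summarize_logs output/[script]/logs/*_log_*.txt, with [script] replaced by your script's folder name.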
Getting Help
Information to Include
When reporting issues, include:
- Full error message
- Command used
- Log file from output/[script]/logs
- Configuration files (tts config, any additional processor configs, dialogue chunk .json)
- System information (OS, Python version)
- uv version (uv --version)
Best Practices for Avoiding Issues
- Incremental Development
  - Test with small scripts first
  - Build up to full-length projects
  - Use --dry-run frequently
- Configuration Management
  - Keep backup configurations
  - Version control YAML files
  - Document custom changes
  - Use sts-tts-provider-yaml validate to check configurations
- Resource Management
  - Monitor disk space
  - Use appropriate batch sizes
  - Clean up old files regularly
- Quality Assurance
  - Validate dialogue .json with sts-analyze-json
  - Use --populate-cache during audio generation to ensure all files are downloaded without issue
  - Use --check-silence during audio generation
  - Use sts-generate-standalone-speech to test new voices and TTS providers
  - Validate voice configurations with sts-tts-provider-yaml validate
- Error Prevention
  - Set up API keys properly
  - Follow naming conventions
  - Use provided templates
  - Validate configurations before audio generation
- LLM-Assisted Workflows
  - Use sts-generate-character-notes-prompt for consistent prompts
  - Always validate LLM output with sts-tts-provider-yaml validate
  - Keep backup configurations before making LLM-suggested changes