title: Ultimate TTS Studio - 900+ Premium Voices
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: static
pinned: true
license: apache-2.0

🎙️ Ultimate TTS Studio

900+ Premium Voices from 3 World-Class TTS Engines - All Running in Your Browser!

✨ Features

🎯 3 Premium TTS Engines

  1. 🎯 Piper TTS - 904 voices across 50+ languages

    • High-quality multilingual support
    • Multiple quality levels (High/Medium/Low)
    • Generation 3-5x faster than real time
  2. ✨ Kokoro TTS - 21 expressive voices (Highest Quality)

    • 24kHz studio-quality audio
    • American & British accents
    • Most natural & expressive
  3. ⚡ Kitten TTS - 8 voices (Fastest)

    • Only 24MB model size
    • Lightning-fast generation
    • Perfect for quick tasks

🚀 Key Capabilities

  • ✅ 900+ Professional Voices - Choose from a massive variety
  • ✅ 50+ Languages - Speak in any of 50+ languages with Piper
  • ✅ Unlimited Text Length - Automatic smart chunking of long input
  • ✅ WebGPU Acceleration - Hardware-accelerated when available
  • ✅ Zero Server Cost - 100% client-side processing
  • ✅ Offline Capable - Works offline once the models are cached
  • ✅ Privacy First - No data leaves your browser
  • ✅ Professional Quality - Up to 24kHz audio output
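
The "automatic smart chunking" mentioned above is not documented in detail here. The following is a minimal sketch of one way it could work, assuming chunks are built from whole sentences. The function name `chunkText` and the 200-character default are hypothetical and are not taken from this Space's code.

```javascript
// Hypothetical helper: split long text into chunks at sentence boundaries.
// maxChars is an assumed limit, not a documented setting of this Space.
function chunkText(text, maxChars = 200) {
  // A "sentence" runs from a non-space character up to ., ! or ? that is
  // followed by whitespace or the end of input (so "3.5" is not split).
  const sentences = text.trim().match(/\S[\s\S]*?(?:[.!?]+(?=\s|$)|$)/g) || [];
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // Hard-split any single sentence that is longer than the limit.
    for (let i = 0; i < sentence.length; i += maxChars) {
      const piece = sentence.slice(i, i + maxChars);
      if (current && current.length + 1 + piece.length > maxChars) {
        // Adding this piece would overflow the chunk, so start a new one.
        chunks.push(current);
        current = piece;
      } else {
        current = current ? current + " " + piece : piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 10));
```

Each chunk can then be synthesized separately and the resulting audio concatenated. Splitting on sentence boundaries keeps pauses and intonation natural at the joins.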

🎮 How to Use

1. Select Your Engine

For Maximum Variety: Choose Piper TTS

  • 904 voices across 50+ languages
  • Select quality level (High/Medium/Low)
  • Pick language and accent

For Best Quality: Choose Kokoro TTS

  • 21 expressive voices
  • Studio-quality 24kHz audio
  • Perfect for audiobooks & narration

For Speed: Choose Kitten TTS

  • 8 fast voices
  • Lightweight model (24MB)
  • Quick generation

2. Configure Voice

Piper Options:

  • Quality: High (22kHz) / Medium (16kHz) / Low (Fast)
  • Languages: English (US/GB), Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, + 40 more!
  • Top Voices: Lessac, Ryan (US) | Cori, Alan (GB)

Kokoro Options:

  • American: Bella, Nicole, Sarah, Sky, Adam, Michael
  • British: Emma, Isabella, George, Lewis

Kitten Options:

  • 8 voices (Voice 0-7) with different characteristics

3. Enter Text & Generate

  1. Type or paste your text (unlimited length)
  2. Adjust speed if needed (0.5x - 2.0x)
  3. Click "🎤 Generate Speech"
  4. Wait for generation (watch progress bar)
  5. Play audio or download as WAV
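
For the WAV download in step 5, the sketch below shows how mono floating-point samples can be packed into a 16-bit PCM WAV file. It assumes the engine returns samples in the range -1 to 1 as a `Float32Array`; the function name `encodeWav` is hypothetical and this is not necessarily how the Space produces its files.

```javascript
// Hypothetical helper: encode mono float samples (-1..1) as 16-bit PCM WAV.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + audio data
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

const wav = encodeWav(new Float32Array([0, 0.5, -0.5]), 24000);
console.log(wav.byteLength);
```

In a browser, the returned buffer can be wrapped in `new Blob([buffer], { type: "audio/wav" })` and offered for download through `URL.createObjectURL`.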

🌍 Supported Languages

Piper TTS - 50+ Languages:

Major Languages:

  • 🇺🇸 English (US) - 20+ voices
  • 🇬🇧 English (UK) - 15+ voices
  • 🇪🇸 Spanish - 30+ voices
  • 🇫🇷 French - 25+ voices
  • 🇩🇪 German - 20+ voices
  • 🇮🇹 Italian - 15+ voices
  • 🇵🇹 Portuguese - 10+ voices
  • 🇨🇳 Chinese - 10+ voices
  • 🇯🇵 Japanese - 5+ voices
  • 🇰🇷 Korean - 5+ voices

Plus: Dutch, Russian, Polish, Turkish, Arabic, Hindi, Vietnamese, Thai, and many more!

📊 Engine Comparison

| Feature | Piper | Kokoro | Kitten |
|---|---|---|---|
| Voices | 904 | 21 | 8 |
| Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Speed | Medium | Medium | Fast |
| Model Size | ~50MB | ~80MB | ~24MB |
| Languages | 50+ | English | English |
| Sample Rate | 16-22kHz | 24kHz | 16kHz |
| Best For | Variety | Quality | Speed |
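
The comparison above can also be expressed as data, which makes the "Best For" row easy to apply in code. This is an illustrative sketch only: the `ENGINES` array and the `pickEngine` function are hypothetical names, and the values are copied from the comparison, not read from the app.

```javascript
// Values copied from the engine comparison; the structure itself is hypothetical.
const ENGINES = [
  { name: "Piper", voices: 904, qualityStars: 4, speed: "medium", modelSizeMB: 50, multilingual: true },
  { name: "Kokoro", voices: 21, qualityStars: 5, speed: "medium", modelSizeMB: 80, multilingual: false },
  { name: "Kitten", voices: 8, qualityStars: 3, speed: "fast", modelSizeMB: 24, multilingual: false },
];

// One scoring function per priority; higher scores win.
const SCORERS = new Map([
  ["quality", (e) => e.qualityStars],
  ["speed", (e) => (e.speed === "fast" ? 1 : 0)],
  ["variety", (e) => e.voices],
]);

function pickEngine({ language = "en", priority = "quality" } = {}) {
  const score = SCORERS.get(priority);
  if (!score) throw new RangeError(`Unknown priority: ${priority}`);

  // Only Piper is listed as multilingual, so non-English text narrows the choice.
  const isEnglish = language.toLowerCase().startsWith("en");
  const candidates = ENGINES.filter((e) => isEnglish || e.multilingual);

  // Keep the first engine with the highest score.
  return candidates.reduce((best, e) => (score(e) > score(best) ? e : best)).name;
}

console.log(pickEngine({ language: "en", priority: "speed" }));
console.log(pickEngine({ language: "es", priority: "quality" }));
```

With these values, English text resolves to Kokoro for quality, Kitten for speed, and Piper for variety, while any other language resolves to Piper.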

🎯 Use Cases

Content Creation

  • 🎬 Video voiceovers & narration
  • 📚 Audiobook production
  • 🎙️ Podcast intros/outros
  • 📺 YouTube tutorials

Accessibility

  • 👁️ Screen reader alternatives
  • 📖 Reading assistance
  • 🌍 Language learning
  • 📱 Audio content for visually impaired users

Development

  • 🤖 Voice UI prototyping
  • 🎮 Game character voices
  • 📞 IVR system testing
  • 💬 Chatbot voice responses

🔧 Technical Details

Technology Stack

  • Frontend: Pure HTML5 + JavaScript (ES6+)
  • TTS Library: onnx-tts-web
  • Runtime: ONNX Runtime Web
  • Acceleration: WebGPU / WebAssembly
  • Audio: Web Audio API

Model Sources

Browser Requirements

  • Minimum: Chrome 90+ / Firefox 88+ / Safari 14+ / Edge 90+
  • Recommended: Latest Chrome/Edge with WebGPU enabled
  • Features Required: WebAssembly, Web Audio API
  • Optional: WebGPU for acceleration
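
A page can check these requirements before loading any model. The sketch below is one possible feature check, assuming the standard browser globals `WebAssembly`, `navigator.gpu`, and `AudioContext`. The function name `detectBackend` and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: decide which backend a browser could use.
// `env` defaults to the global object so the check can be unit-tested
// with a plain object in place of a real browser.
function detectBackend(env = globalThis) {
  const hasWasm = typeof env.WebAssembly === "object" && env.WebAssembly !== null;
  // navigator.gpu is the WebGPU entry point; it is undefined when unsupported.
  const hasWebGpu = Boolean(env.navigator && env.navigator.gpu);
  const hasWebAudio =
    typeof env.AudioContext === "function" ||
    typeof env.webkitAudioContext === "function";

  if (!hasWasm) {
    // WebAssembly is listed as required, so there is no usable backend.
    return { backend: null, hasWebAudio };
  }
  return { backend: hasWebGpu ? "webgpu" : "wasm", hasWebAudio };
}

console.log(detectBackend());
```

The presence of `navigator.gpu` does not guarantee a usable GPU. A stricter check would also await `navigator.gpu.requestAdapter()` and fall back to WASM when it resolves to `null`.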

Performance

  • Model Loading: 5-15 seconds (first time only, then cached)
  • Generation Speed: 2-5 seconds per 200 characters
  • Real-time Factor: 3-10x faster than real time (depending on hardware & engine)
  • Memory Usage: ~200-500MB (with models loaded)
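
The real-time factor above can be measured on your own hardware. This sketch assumes the figure means audio duration divided by generation time, so higher is faster; some references define the factor the other way round. The function names are hypothetical.

```javascript
// Duration of a mono clip in seconds, from its sample count and sample rate.
function audioDurationSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}

// Speed-up relative to real time: seconds of audio per second of compute.
function realTimeFactor(audioSeconds, generationSeconds) {
  if (!(generationSeconds > 0)) {
    throw new RangeError("generationSeconds must be positive");
  }
  return audioSeconds / generationSeconds;
}

// Example: 240,000 samples at 24 kHz is 10 s of audio; generated in 2 s.
const seconds = audioDurationSeconds(240000, 24000);
console.log(realTimeFactor(seconds, 2)); // 5
```

In the browser, the generation time can be taken as the difference between two `performance.now()` readings around the synthesis call, divided by 1000.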

💡 Performance Tips

For Best Quality:

  1. Use Kokoro TTS for English content
  2. Select High Quality in Piper settings
  3. Use well-punctuated text
  4. Keep sentences to a moderate length

For Best Speed:

  1. Use Kitten TTS for quick tasks
  2. Select Low Quality in Piper
  3. Enable WebGPU in browser settings
  4. Use shorter text inputs

For Most Options:

  1. Use Piper TTS for language variety
  2. Explore different accents/regions
  3. Compare quality levels
  4. Try multiple voices for the same language

🎬 Quick Start Examples

Example 1: Professional Audiobook

Engine: Kokoro TTS
Voice: Bella (American Female)
Speed: 0.95x
Quality: 24kHz
Text: Your book chapter...

Example 2: Tutorial Narration

Engine: Piper TTS
Voice: Lessac (US, High Quality)
Speed: 1.0x
Quality: 22kHz
Text: Your tutorial script...

Example 3: Quick Announcement

Engine: Kitten TTS
Voice: Voice 4 (Clear)
Speed: 1.1x
Text: Your announcement...

Example 4: Spanish Content

Engine: Piper TTS
Voice: es_ES (Spain Spanish)
Speed: 1.0x
Quality: High
Text: Su texto en español...
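
The four examples above can be kept as reusable presets. The sketch below is illustrative: the `PRESETS` object, `clampSpeed`, and `resolvePreset` are hypothetical names, and the speed limits assume the 0.5x to 2.0x slider range described under "How to Use".

```javascript
// Settings copied from the quick start examples; the structure is hypothetical.
const PRESETS = {
  audiobook: { engine: "Kokoro", voice: "Bella", speed: 0.95 },
  tutorial: { engine: "Piper", voice: "Lessac", speed: 1.0 },
  announcement: { engine: "Kitten", voice: "Voice 4", speed: 1.1 },
  spanish: { engine: "Piper", voice: "es_ES", speed: 1.0 },
};

const SPEED_MIN = 0.5;
const SPEED_MAX = 2.0;

// Keep a requested speed inside the supported slider range.
function clampSpeed(speed) {
  if (!Number.isFinite(speed)) throw new TypeError("speed must be a finite number");
  return Math.min(SPEED_MAX, Math.max(SPEED_MIN, speed));
}

// Look up a preset and apply optional overrides such as a custom speed.
function resolvePreset(name, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, name)) {
    throw new RangeError(`Unknown preset: ${name}`);
  }
  const settings = { ...PRESETS[name], ...overrides };
  settings.speed = clampSpeed(settings.speed);
  return settings;
}

console.log(resolvePreset("audiobook"));
console.log(resolvePreset("tutorial", { speed: 3 }));
```

Clamping in one place means an out-of-range override falls back to the nearest supported speed and never reaches the engine.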

🐛 Troubleshooting

Model Loading Issues

Problem: "ERROR initializing" message

Solutions:

  • Check internet connection
  • Wait for download to complete
  • Try different quality level
  • Clear browser cache
  • Refresh page

No Audio Output

Problem: Player appears but no sound

Solutions:

  • Check browser audio permissions
  • Verify volume settings
  • Try different voice/engine
  • Check browser console (F12)
  • Test with different browser

Slow Performance

Problem: Generation takes too long

Solutions:

  • Switch to Kitten TTS for speed
  • Lower quality in Piper settings
  • Enable WebGPU (chrome://flags)
  • Update browser to latest version
  • Close other tabs/applications

WebGPU Not Available

Problem: Shows "WASM" instead of "WebGPU"

Solutions:

  • Update browser to latest version
  • Enable in chrome://flags → "WebGPU"
  • Check GPU driver updates
  • WebGPU is optional; the WASM fallback works fine

🎯 Voice Recommendations

English (US) - Natural:

  • Lessac (Piper) - Professional, clear
  • Ryan (Piper) - Authoritative, deep
  • Bella (Kokoro) - Elegant, sophisticated

English (GB) - British:

  • Cori (Piper) - Refined, professional
  • Emma (Kokoro) - Elegant, polished
  • George (Kokoro) - Commanding, distinguished

Spanish:

  • es_ES (Piper) - Spain Spanish, multiple voices
  • es_MX (Piper) - Mexican Spanish

French:

  • fr_FR (Piper) - France French, multiple voices

German:

  • de_DE (Piper) - German, multiple voices

📝 Privacy & Security

  • ✅ 100% Client-Side - All processing in your browser
  • ✅ No Server Upload - Text never leaves your device
  • ✅ No Data Collection - Zero analytics or tracking
  • ✅ No Account Required - Use instantly, no signup
  • ✅ Offline Capable - Works without internet (once models are cached)

📜 License & Credits

License

This project is released under the Apache 2.0 License.

Credits & Acknowledgments

Libraries & Tools:

Models:

  • Piper TTS models by Rhasspy team
  • Kokoro TTS by community contributors
  • Kitten TTS by community contributors

Inspiration:

🚀 Future Enhancements

Planned features:

  • More TTS engines (Coqui, VITS)
  • Voice cloning with SpeechT5
  • SSML markup support
  • Batch processing
  • MP3/OGG export
  • Voice mixing/blending
  • Real-time streaming
  • Pronunciation dictionary

🤝 Contributing

Found a bug or have a suggestion? Please open an issue or submit a pull request!

🌟 Star This Space!

If you find this useful, please give it a ⭐ star on Hugging Face!


Made with ❤️ for the open-source community

Enjoy creating amazing voice content! 🎙️