title: Ultimate TTS Studio - 900+ Premium Voices
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: static
pinned: true
license: apache-2.0

🎙️ Ultimate TTS Studio

900+ Premium Voices from 3 World-Class TTS Engines - All Running in Your Browser!

✨ Features

🎯 3 Premium TTS Engines

🎯 Piper TTS - 904 voices across 50+ languages
- High-quality multilingual support
- Multiple quality levels (High/Medium/Low)
- Generates audio 3-5x faster than real time
✨ Kokoro TTS - 21 expressive voices (Highest Quality)
- 24kHz studio-quality audio
- American & British accents
- Most natural & expressive
⚡ Kitten TTS - 8 voices (Fastest)
- Only 24MB model size
- Lightning-fast generation
- Perfect for quick tasks

🚀 Key Capabilities

✅ 900+ Professional Voices - Choose from massive variety
✅ 50+ Languages - Speak in any of the supported languages with Piper
✅ Unlimited Text Length - Automatic smart chunking
✅ WebGPU Acceleration - Hardware-accelerated when available
✅ Zero Server Cost - 100% client-side processing
✅ Offline Capable - Works offline once models are cached
✅ Privacy First - No data leaves your browser
✅ Professional Quality - Up to 24kHz audio output
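
The "automatic smart chunking" mentioned above could look roughly like the sketch below. This is an illustration only, not the app's actual chunker: the function names `chunkText` and `splitByWords` and the 200-character default are assumptions. It groups whole sentences into chunks under a character limit and falls back to word boundaries for sentences that are too long.

```javascript
// Hypothetical helper: split long text into chunks of at most maxChars,
// preferring sentence boundaries. Not the app's real implementation.
function chunkText(text, maxChars = 200) {
  // Grab runs of text ending in optional sentence punctuation (. ! ?).
  const sentences = (text.trim().match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    // A sentence longer than the limit is broken on word boundaries.
    const pieces =
      sentence.length > maxChars ? splitByWords(sentence, maxChars) : [sentence];
    for (const piece of pieces) {
      if (current && current.length + 1 + piece.length > maxChars) {
        chunks.push(current);
        current = piece;
      } else {
        current = current ? `${current} ${piece}` : piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Greedy word packing; a single word longer than maxChars is kept whole.
function splitByWords(sentence, maxChars) {
  const parts = [];
  let current = "";
  for (const word of sentence.split(/\s+/)) {
    if (current && current.length + 1 + word.length > maxChars) {
      parts.push(current);
      current = word;
    } else {
      current = current ? `${current} ${word}` : word;
    }
  }
  if (current) parts.push(current);
  return parts;
}
```

Each chunk can then be synthesized separately and the resulting audio concatenated, which keeps memory use bounded for long inputs.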

🎮 How to Use

1. Select Your Engine

For Maximum Variety: Choose Piper TTS

904 voices across 50+ languages
Select quality level (High/Medium/Low)
Pick language and accent

For Best Quality: Choose Kokoro TTS

21 expressive voices
Studio-quality 24kHz audio
Perfect for audiobooks & narration

For Speed: Choose Kitten TTS

8 fast voices
Lightweight model (24MB)
Quick generation

2. Configure Voice

Piper Options:

Quality: High (22kHz) / Medium (16kHz) / Low (Fast)
Languages: English (US/GB), Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, + 40 more!
Top Voices: Lessac, Ryan (US) | Cori, Alan (GB)

Kokoro Options:

American: Bella, Nicole, Sarah, Sky, Adam, Michael
British: Emma, Isabella, George, Lewis

Kitten Options:

8 voices (Voice 0-7) with different characteristics

3. Enter Text & Generate

Type or paste your text (unlimited length)
Adjust speed if needed (0.5x - 2.0x)
Click "🎤 Generate Speech"
Wait for generation (watch progress bar)
Play audio or download as WAV
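
For readers curious how a WAV download can be produced entirely in the browser, here is a minimal sketch. It assumes the engine returns mono audio as floating-point samples in the range -1 to 1, which is common for ONNX-based TTS but is not confirmed for this app; `encodeWav` is a hypothetical helper name.

```javascript
// Hypothetical helper: encode mono float samples in [-1, 1] as a
// 16-bit PCM WAV file and return it as an ArrayBuffer.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + PCM data
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped as `new Blob([buffer], { type: "audio/wav" })` and offered for download through `URL.createObjectURL`.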

🌐 Supported Languages

Piper TTS - 50+ Languages:

Major Languages:

🇺🇸 English (US) - 20+ voices
🇬🇧 English (UK) - 15+ voices
🇪🇸 Spanish - 30+ voices
🇫🇷 French - 25+ voices
🇩🇪 German - 20+ voices
🇮🇹 Italian - 15+ voices
🇵🇹 Portuguese - 10+ voices
🇨🇳 Chinese - 10+ voices
🇯🇵 Japanese - 5+ voices
🇰🇷 Korean - 5+ voices

Plus: Dutch, Russian, Polish, Turkish, Arabic, Hindi, Vietnamese, Thai, and many more!

📊 Engine Comparison

Feature	Piper	Kokoro	Kitten
Voices	904	21	8
Quality	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Speed	Medium	Medium	Fast
Model Size	~50MB	~80MB	~24MB
Languages	50+	English	English
Sample Rate	16-22kHz	24kHz	16kHz
Best For	Variety	Quality	Speed

🎯 Use Cases

Content Creation

🎬 Video voiceovers & narration
📚 Audiobook production
🎙️ Podcast intros/outros
📺 YouTube tutorials

Accessibility

👁️ Screen reader alternatives
📖 Reading assistance
🌍 Language learning
📱 Audio content for visually impaired users

Development

🤖 Voice UI prototyping
🎮 Game character voices
📞 IVR system testing
💬 Chatbot voice responses

🔧 Technical Details

Technology Stack

Frontend: Pure HTML5 + JavaScript (ES6+)
TTS Library: onnx-tts-web
Runtime: ONNX Runtime Web
Acceleration: WebGPU / WebAssembly
Audio: Web Audio API

Model Sources

Browser Requirements

Minimum: Chrome 90+ / Firefox 88+ / Safari 14+ / Edge 90+
Recommended: Latest Chrome/Edge with WebGPU enabled
Features Required: WebAssembly, Web Audio API
Optional: WebGPU for acceleration
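
Since WebGPU is optional, an app of this kind needs some way to choose between the two backends. The sketch below shows one plausible check using the standard `navigator.gpu.requestAdapter()` call. The helper name `detectBackend` is hypothetical, and how onnx-tts-web actually selects its backend is not documented here. The returned strings match the `webgpu` and `wasm` execution provider names used by ONNX Runtime Web.

```javascript
// Hypothetical helper: return "webgpu" when a GPU adapter is available,
// otherwise "wasm". `nav` defaults to the browser's navigator and can be
// replaced with a stub for testing outside a browser.
async function detectBackend(nav = globalThis.navigator) {
  if (nav && nav.gpu) {
    try {
      // Resolves to null when no suitable adapter exists.
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch (err) {
      // Treat any WebGPU error as "not available" and fall back.
    }
  }
  return "wasm";
}
```

Checking for an actual adapter matters because some browsers expose `navigator.gpu` even on machines where no usable GPU is present.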

Performance

Model Loading: 5-15 seconds (first time only, then cached)
Generation Speed: 2-5 seconds per 200 characters
Speed vs. Real Time: 3-10x faster (depending on hardware & engine)
Memory Usage: ~200-500MB (with models loaded)
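
The speed figures above can be checked on your own hardware with simple arithmetic: divide the duration of the generated audio by the time it took to generate. The helper below is a hypothetical illustration, not part of the app.

```javascript
// Hypothetical helper: how many times faster than real time a generation ran.
// audio duration (s) = numSamples / sampleRate
function speedFactor(numSamples, sampleRate, generationMs) {
  const audioSeconds = numSamples / sampleRate;
  return audioSeconds / (generationMs / 1000);
}

// 240,000 samples at 24 kHz is 10 s of audio; generated in 2.5 s.
console.log(speedFactor(240000, 24000, 2500)); // 4
```

A value above 1 means audio is produced faster than it plays back.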

💡 Performance Tips

For Best Quality:

Use Kokoro TTS for English content
Select High Quality in Piper settings
Use well-punctuated text
Keep sentences moderate length

For Best Speed:

Use Kitten TTS for quick tasks
Select Low Quality in Piper
Enable WebGPU in browser settings
Use shorter text inputs

For Most Options:

Use Piper TTS for language variety
Explore different accents/regions
Compare quality levels
Try multiple voices for same language

🎬 Quick Start Examples

Example 1: Professional Audiobook

Engine: Kokoro TTS
Voice: Bella (American Female)
Speed: 0.95x
Quality: 24kHz
Text: Your book chapter...

Example 2: Tutorial Narration

Engine: Piper TTS
Voice: Lessac (US, High Quality)
Speed: 1.0x
Quality: 22kHz
Text: Your tutorial script...

Example 3: Quick Announcement

Engine: Kitten TTS
Voice: Voice 4 (Clear)
Speed: 1.1x
Text: Your announcement...

Example 4: Spanish Content

Engine: Piper TTS
Voice: es_ES (Spain Spanish)
Speed: 1.0x
Quality: High
Text: Su texto en español...

🐛 Troubleshooting

Model Loading Issues

Problem: "ERROR initializing" message

Solutions:

Check internet connection
Wait for download to complete
Try different quality level
Clear browser cache
Refresh page
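
It can help to know why clearing the cache forces a fresh download. Models are fetched once and then reused, and one way to get that behavior is a cache-first loader built on the browser's Cache API, sketched below. Whether this app uses the Cache API or relies on the ordinary HTTP cache is an assumption. The helper name `loadModelCached` and the cache name `tts-models-v1` are hypothetical.

```javascript
// Hypothetical cache-first loader. `cacheStorage` and `fetchFn` default to
// the browser's Cache API and fetch; both can be injected for testing.
async function loadModelCached(url, {
  cacheName = "tts-models-v1", // hypothetical cache name
  cacheStorage = globalThis.caches,
  fetchFn = globalThis.fetch,
} = {}) {
  const cache = await cacheStorage.open(cacheName);

  // Serve from the cache when the model was downloaded before.
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer();

  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Model download failed: HTTP ${response.status}`);
  }
  // Store a clone so the original body can still be read below.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

With a loader like this, an interrupted or corrupted download is never written to the cache, so refreshing the page simply retries it.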

No Audio Output

Problem: Player appears but no sound

Solutions:

Check browser audio permissions
Verify volume settings
Try different voice/engine
Check browser console (F12)
Test with different browser

Slow Performance

Problem: Generation takes too long

Solutions:

Switch to Kitten TTS for speed
Lower quality in Piper settings
Enable WebGPU (chrome://flags)
Update browser to latest version
Close other tabs/applications

WebGPU Not Available

Problem: Shows "WASM" instead of "WebGPU"

Solutions:

Update browser to latest version
Enable in chrome://flags → "WebGPU"
Check GPU driver updates
WebGPU is optional; the WASM fallback works fine

🎯 Voice Recommendations

English (US) - Natural:

Lessac (Piper) - Professional, clear
Ryan (Piper) - Authoritative, deep
Bella (Kokoro) - Elegant, sophisticated

English (GB) - British:

Cori (Piper) - Refined, professional
Emma (Kokoro) - Elegant, polished
George (Kokoro) - Commanding, distinguished

Spanish:

es_ES (Piper) - Spain Spanish, multiple voices
es_MX (Piper) - Mexican Spanish

French:

fr_FR (Piper) - France French, multiple voices

German:

de_DE (Piper) - German, multiple voices

📝 Privacy & Security

✅ 100% Client-Side - All processing in your browser
✅ No Server Upload - Text never leaves your device
✅ No Data Collection - Zero analytics or tracking
✅ No Account Required - Use instantly, no signup
✅ Offline Capable - Works without internet (after cache)

📜 License & Credits

License

This project is released under the Apache 2.0 License.

Credits & Acknowledgments

Libraries & Tools:

onnx-tts-web by @therealtimex
Piper TTS by Rhasspy
ONNX Runtime by Microsoft

Models:

Piper TTS models by Rhasspy team
Kokoro TTS by community contributors
Kitten TTS by community contributors

Inspiration:

TTS Studio by @clowerweb

🚀 Future Enhancements

Planned features:

More TTS engines (Coqui, VITS)
Voice cloning with SpeechT5
SSML markup support
Batch processing
MP3/OGG export
Voice mixing/blending
Real-time streaming
Pronunciation dictionary

🤝 Contributing

Found a bug or have a suggestion? Please open an issue or submit a pull request!

🌟 Star This Space!

If you find this useful, please give it a ⭐ star on Hugging Face!

Made with ❤️ for the open-source community

Enjoy creating amazing voice content! 🎙️