🎤 ChatterboxTTS - Dhivehi Text-to-Speech with Voice Cloning

Generate natural-sounding Dhivehi speech with voice cloning capabilities.

Note: This model is only lightly fine-tuned, so it may drop words or leave sentences unfinished. Experiment with the Advanced Settings to find what works best for your reference audio and to reduce these issues. This Space runs on ZeroGPU, so with long text the GPU may be released before generation finishes, which causes a timeout. For longer inputs, switch to CPU mode in the Advanced Settings and wait for it to finish. It runs more slowly, but it should complete reliably.
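If you drive the model from a script instead of the UI, the same rule of thumb can be sketched as a small helper. This is only an illustration: the function name `pick_device`, the device strings "cuda" and "cpu", and the 300-character cutoff are all assumptions, not values taken from this Space. Tune the cutoff to whatever length starts timing out for you.

```python
def pick_device(text, max_gpu_chars=300):
    """Pick a device name based on input length.

    max_gpu_chars is a hypothetical cutoff, not a documented limit:
    short inputs go to the GPU, longer ones fall back to the CPU,
    which is slower but not subject to the GPU being released early.
    """
    return "cuda" if len(text) <= max_gpu_chars else "cpu"


short_text = "A short sentence."
long_text = "A much longer passage. " * 50

print(pick_device(short_text))
print(pick_device(long_text))
```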

Examples

Click any example below to load pre-configured settings:

Preset Configurations
Text to Convert | Reference Voice Audio (optional, for voice cloning) | Exaggeration | Temperature | CFG Weight | Seed | Device

General Use (TTS and Voice Agents):

  • The default settings (exaggeration=0.5, cfg=0.5) work well for most prompts.
  • If the reference speaker has a fast speaking style, lowering cfg to around 0.3 can improve pacing.

Expressive or Dramatic Speech:

  • Try lower cfg values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.
  • Higher exaggeration tends to speed up speech; reducing cfg helps compensate with slower, more deliberate pacing.

Language Transfer Notes:

  • Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip's language.
  • To mitigate this, set the CFG weight to 0.
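
The recommendations above can be collected into a small preset table, which may be handy when calling the model from code. This is a minimal sketch: the preset names, the `settings_for` helper, and the key names `exaggeration` and `cfg_weight` are illustrative assumptions. The values come from the tips above, except that the fast-speaker and language-transfer presets keep the default exaggeration of 0.5, which the tips do not specify.

```python
# Hypothetical preset names; values follow the tips in this section.
PRESETS = {
    "general": {"exaggeration": 0.5, "cfg_weight": 0.5},
    # Fast reference speaker: lower cfg to improve pacing.
    "fast_speaker": {"exaggeration": 0.5, "cfg_weight": 0.3},
    # Expressive speech: higher exaggeration, lower cfg to slow pacing.
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
    # Reference clip in a different language: cfg of 0 to limit accent carry-over.
    "language_transfer": {"exaggeration": 0.5, "cfg_weight": 0.0},
}


def settings_for(style, **overrides):
    """Return a copy of the preset for `style`, with optional overrides."""
    if style not in PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    settings = dict(PRESETS[style])  # copy so the preset table is not mutated
    settings.update(overrides)
    return settings


print(settings_for("expressive"))
print(settings_for("general", cfg_weight=0.3))
```

Returning a copy keeps the table itself unchanged, so one-off overrides for a particular reference clip do not leak into later calls.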

Additional Tips:

  • For best voice cloning results, use clear audio with minimal background noise.
  • The reference audio should be 3-10 seconds long.
  • Use the same seed value for reproducible results.
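
The length guideline above can be checked before uploading a clip. The sketch below uses only Python's standard `wave` module, so it assumes the reference clip is an uncompressed PCM WAV file. The helper names are hypothetical, and treating the 3-10 second range as inclusive is also an assumption.

```python
import wave

MIN_SECONDS = 3.0   # lower bound from the tip above
MAX_SECONDS = 10.0  # upper bound from the tip above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(seconds):
    """True if the clip length falls within the suggested range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def check_reference(path):
    """Return (duration, ok) for a reference clip at `path`."""
    duration = wav_duration_seconds(path)
    return duration, is_recommended_length(duration)
```

For example, `check_reference("reference.wav")` (a placeholder path) returns the clip duration together with a flag that shows whether it falls inside the suggested range.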