🎤 ChatterboxTTS - Dhivehi Text-to-Speech with Voice Cloning

Generate natural-sounding Dhivehi speech with voice cloning capabilities.

Apply Dhivehi text normalization before TTS generation

Model: select the TTS model.

Device: select the computation device.

Examples

Click any example below to load pre-configured settings:

Preset Configurations
Columns: Text to Convert, Reference Voice Audio (optional, for voice cloning), Exaggeration, Temperature, CFG Weight, Seed, Device, Enable Text Normalization

General Use (TTS and Voice Agents):

  • The default settings (exaggeration=0.5, cfg=0.5) work well for most prompts.
  • If the reference speaker has a fast speaking style, lowering cfg to around 0.3 can improve pacing.

Expressive or Dramatic Speech:

  • Try lower cfg values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.
  • Higher exaggeration tends to speed up speech; reducing cfg helps compensate with slower, more deliberate pacing.
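
The guidance above can be captured as a small set of presets. The following is a minimal sketch, not the app's own code: the class name TTSSettings, the preset names, and the pick_settings helper are hypothetical, and only the numeric values (0.5, 0.3, 0.7) come from the tips.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container for the two sliders discussed in the tips."""
    exaggeration: float = 0.5  # default from the tips
    cfg_weight: float = 0.5    # default from the tips


# Preset names are illustrative assumptions; values follow the tips.
GENERAL = TTSSettings()
FAST_REFERENCE = replace(GENERAL, cfg_weight=0.3)            # fast-speaking reference voice
EXPRESSIVE = TTSSettings(exaggeration=0.7, cfg_weight=0.3)   # dramatic speech


def pick_settings(expressive: bool = False, fast_reference: bool = False) -> TTSSettings:
    """Return a starting point; fine-tune the values by ear afterwards."""
    if expressive:
        return EXPRESSIVE
    if fast_reference:
        return FAST_REFERENCE
    return GENERAL
```

The values are starting points only; for expressive speech the tips suggest 0.7 "or higher", so raising exaggeration further is a matter of listening and adjusting.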

Language Transfer Notes:

  • Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip's language.
  • To mitigate this, set the CFG weight to 0.
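
As a rough illustration of the rule above, a caller could choose the CFG weight from the two language tags. This is a sketch under stated assumptions: the function name cfg_weight_for and the use of short language codes such as "dv" and "en" are hypothetical and are not taken from the app.

```python
DEFAULT_CFG_WEIGHT = 0.5  # default value quoted in the General Use tips


def cfg_weight_for(reference_lang: str, target_lang: str,
                   default: float = DEFAULT_CFG_WEIGHT) -> float:
    """Return 0.0 when the reference clip's language differs from the target.

    Language tags are compared case-insensitively after trimming whitespace.
    """
    if reference_lang.strip().lower() != target_lang.strip().lower():
        return 0.0  # mismatch: avoid inheriting the reference clip's accent
    return default
```

For example, cfg_weight_for("en", "dv") yields 0.0, while a matching pair keeps the default.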

Additional Tips:

  • For best voice cloning results, use clear audio with minimal background noise.
  • The reference audio should be 3 to 10 seconds long.
  • Use the same seed value for reproducible results.
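
The length recommendation can be checked before uploading a clip. Below is a minimal sketch using Python's standard wave module; it assumes the reference clip is an uncompressed WAV file, and the function names are hypothetical. It does not assess background noise.

```python
import wave

MIN_SECONDS = 3.0   # lower bound from the tips
MAX_SECONDS = 10.0  # upper bound from the tips


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file as frame count / sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(seconds: float) -> bool:
    """True when the clip falls inside the recommended 3 to 10 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A clip outside the range can be trimmed or re-recorded before it is used as the reference voice.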