🎤 F5-TTS Voice Cloning and 🔬 Denoising Process Visualization

Clone any voice with just 5-30 seconds of reference audio and see how noise transforms into speech step by step.

Developed by Noel Triguero. Model by SWivid.


F5-TTS generates the final audio in 32 "denoising" steps, gradually transforming pure noise into clean speech. The intermediate steps let you inspect that transformation as it happens.
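To make the step-by-step idea concrete, here is a minimal toy sketch of flow-matching-style sampling: start from random noise and take 32 small Euler steps toward a target, keeping a few intermediate snapshots along the way. This is not the F5-TTS code. The real model uses a learned network to predict the update direction, whereas `toy_velocity` below cheats by knowing the target. All names (`toy_velocity`, `sample_with_snapshots`, `keep_every`) are hypothetical and chosen only for illustration.

```python
import random


def toy_velocity(x, t, target):
    # Stand-in for the learned model (assumption: the real network does not
    # see the target). On a straight-line path from noise to target, the
    # direction from the current state is (target - x) / (1 - t).
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_with_snapshots(target, steps=32, keep_every=8, seed=0):
    rng = random.Random(seed)
    # Step 0: pure Gaussian noise with the same length as the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    snapshots = [(0, list(x))]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t runs from 0 up to (steps - 1) / steps, never 1
        v = toy_velocity(x, t, target)
        # One Euler step along the predicted direction.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        if (i + 1) % keep_every == 0:
            snapshots.append((i + 1, list(x)))
    return x, snapshots


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]
    final, snaps = sample_with_snapshots(target)
    for step, state in snaps:
        error = max(abs(a - b) for a, b in zip(state, target))
        print(f"step {step:2d}: max distance to target = {error:.4f}")
```

Each snapshot is closer to the target than the previous one, which is the behaviour the intermediate denoising steps visualize. In the actual pipeline the state would be a mel spectrogram rather than a short list of numbers, and a vocoder would turn the final result into a waveform.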

Input

Intermediate Denoising Steps



💡 Tips for Better Results

Clean audio: No background noise, music or echo
Duration: 5-30 seconds is ideal
Exact transcription: The transcription must match the audio exactly
Clear speech: Consistent volume and clear pronunciation
Language: Reference audio and text should be in English or Chinese
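The duration tip can be checked before uploading. The sketch below uses Python's standard `wave` module to measure an uncompressed WAV file and test it against the 5-30 second range. It assumes the reference clip is a PCM WAV file. The function names and the `MIN_SECONDS`/`MAX_SECONDS` constants are hypothetical helpers, not part of this app.

```python
import wave

# Ideal reference length from the tips above, in seconds.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    # Duration = number of audio frames divided by frames per second.
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_ideal_reference_length(path):
    # True when the clip falls inside the recommended 5-30 second window.
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_ideal_reference_length("reference.wav")` returns `False` for a 2-second clip. Other formats such as MP3 would need a different reader, since `wave` only handles WAV files.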


🔧 Technical Information

Model: F5-TTS (Flow Matching Text-to-Speech)
Vocoder: Vocos
Device: CPU (may take a while...)