MegaTTS3

Upload a speech clip as the timbre reference, upload the pre-extracted latent file, enter the target text, and receive the cloned voice. Tip: generation should finish within 120 s; if it takes longer, check whether your input text is too long. Please use the system gently: excessive load, or input in languages other than English or Chinese, may cause crashes and disrupt access for other users.
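
The checks implied above can be run locally before submitting. The following is a minimal sketch, not part of MegaTTS3: the helper `check_inputs`, the `MAX_CHARS` value, and the character ranges used to approximate "English or Chinese" are all assumptions made for illustration. The page states only a 120 s time budget and no character limit.

```python
from pathlib import Path

# Hypothetical limit: the page only says generation should finish within
# 120 s and gives no character count, so tune this value yourself.
MAX_CHARS = 300


def _is_english_or_chinese(ch: str) -> bool:
    """Rough heuristic (an assumption): ASCII or common CJK ranges."""
    code = ord(ch)
    return (
        code < 0x80                    # ASCII letters, digits, punctuation
        or 0x4E00 <= code <= 0x9FFF    # CJK Unified Ideographs
        or 0x3000 <= code <= 0x303F    # CJK symbols and punctuation
        or 0xFF00 <= code <= 0xFFEF    # full-width forms
    )


def check_inputs(wav_path: str, npy_path: str, text: str,
                 max_chars: int = MAX_CHARS) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if Path(wav_path).suffix.lower() != ".wav":
        problems.append("reference clip should be a .wav file")
    if Path(npy_path).suffix.lower() != ".npy":
        problems.append("latent file should be a .npy file")

    stripped = text.strip()
    if not stripped:
        problems.append("target text is empty")
    elif len(stripped) > max_chars:
        problems.append("target text may be too long for the 120 s budget")

    if any(not _is_english_or_chinese(ch) for ch in stripped):
        problems.append("target text has characters outside English or Chinese")
    return problems


if __name__ == "__main__":
    for issue in check_inputs("speaker.wav", "speaker.npy", "Hello, 你好。"):
        print("warning:", issue)
```

The script-range test is deliberately coarse: it also flags typographic quotes and accented Latin letters, so treat a warning as a prompt to review the text rather than a hard error.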

Examples

Upload .wav	Upload .npy	inp_text	infer timestep	Intelligibility Weight	Similarity Weight