Upload a speech clip as the timbre reference, upload the pre-extracted latent file, enter the target text, and receive the cloned voice. Tip: each generation should finish within 120 s; if it takes longer, check whether your input text is too long. Please use the system considerately: excessive load, or input in languages other than English or Chinese, may cause crashes and disrupt access for other users.