mirror of
https://github.com/neonbjb/tortoise-tts.git
synced 2026-02-01 13:24:25 +01:00
After training a similar model for a different purpose, I realized that this model is faulty: the contrastive loss it uses only pays attention to high-frequency details, which do not contribute meaningfully to output quality. I validated this by comparing a no-CVVP output with a baseline using tts-scores and found no differences.

| File | ||
|---|---|---|
| __init__.py | ||
| arch_util.py | ||
| autoregressive.py | ||
| classifier.py | ||
| clvp.py | ||
| diffusion_decoder.py | ||
| random_latent_generator.py | ||
| transformer.py | ||
| vocoder.py | ||
| xtransformers.py | ||