From 3e79eca190558daf37ed1a0442765186bdeb190e Mon Sep 17 00:00:00 2001
From: Sayantan Biswas
Date: Fri, 10 Nov 2023 19:52:59 +0530
Subject: [PATCH] Update tortoise_v2_examples.html

solved issue #677
Removed the raw github links of the audio files and directly linking the audio files internally.
---
 tortoise_v2_examples.html | 1606 ++++++++++++++++++++++++++++++++++---
 1 file changed, 1493 insertions(+), 113 deletions(-)

diff --git a/tortoise_v2_examples.html b/tortoise_v2_examples.html
index 1a457d1..0810792 100644
--- a/tortoise_v2_examples.html
+++ b/tortoise_v2_examples.html
@@ -1,128 +1,1508 @@
-TorToiSe - These words were never spoken.
+
+
+
+
+ TorToiSe - These words were never spoken.
+
+
-

Introduction 🐢

-

TorToiSe is a text-to-speech program built in April 2022 by jbetker@. TorToiSe is open source, with trained model weights
-available at https://github.com/neonbjb/tortoise-tts

+

Introduction 🐢

+

TorToiSe is a text-to-speech program built in April 2022 by jbetker@. TorToiSe is open source, with trained model
+ weights
+ available at https://github.com/neonbjb/tortoise-tts

-

This page demonstrates some of the results of TorToiSe.

+

This page demonstrates some of the results of TorToiSe.

-

Handpicked results 🐢

-

Following are several particularly good results generated by the model.

+

Handpicked results 🐢

+

Following are several particularly good results generated by the model.

-

Short-form

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+

Short-form

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-

Long-form

-
+

Long-form

+
-

Comparisons (with the LJSpeech voice): 🐢

-

LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model, following is how
-it renders the LJSpeech voice with and without fine-tuning, compared with results for the same text from the popular Tacotron2
-model paired with the Waveglow vocoder.

- - - - - - +

Comparisons (with the LJSpeech voice): 🐢

+

LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model; the following shows
+ how it renders the LJSpeech voice with and without fine-tuning, compared with results for the same text from the
+ popular Tacotron2 model paired with the Waveglow vocoder.

+
Tacotron2+Waveglow | TorToiSe | TorToiSe Finetuned

-




+ + + + + + + + + + + - - + + + + + - -
Tacotron2+Waveglow | TorToiSe | TorToiSe Finetuned

+






-



+


-

NaturalVoice is a SOTA TTS engine developed by Microsoft Research Asia in May 2022. It features realistic prosody
-and end-to-end generation with no need for a vocoder. While not much has actually been released about this model other
-than five samples, those samples are quite good and I would consider this the most competitive TTS engine out there
-right now.

- - - - - - -
Natural Voice | TorToiSe Finetuned





-

-

It is important to note that it is not actually fair to compare any of these models: Tortoise is a multi-voice probabilistic
-model trained on millions of hours of speech with an exceptionally slow inference time. Tacotron and NaturalVoice are efficient,
-fast, single-voice models trained on 24 hours of speech. Unfortunately, there isn't much in the way of actually comparable
-research to Tortoise.

+
+ + +

NaturalVoice is a SOTA TTS engine developed by Microsoft Research Asia in May 2022. It features realistic prosody
+ and end-to-end generation with no need for a vocoder. While not much has actually been released about this model
+ other
+ than five samples, those samples are quite good and I would consider this the most competitive TTS engine out
+ there
+ right now.

+ + + + + + + + + + + + + + + +
Natural Voice | TorToiSe Finetuned





+

+

It is important to note that it is not actually fair to compare any of these models: Tortoise is a multi-voice
+ probabilistic
+ model trained on millions of hours of speech with an exceptionally slow inference time. Tacotron and
+ NaturalVoice are efficient,
+ fast, single-voice models trained on 24 hours of speech. Unfortunately, there isn't much in the way of actually
+ comparable
+ research to Tortoise.

-

All Results 🐢

-

Following are all the results from which the hand-picked results were drawn from. Also included is the reference
- audio that the program is trying to mimic. This will give you a better sense of how TorToiSe really performs.

+

All Results 🐢

+

Following are all the results from which the hand-picked results were drawn. Also included is the reference
+ audio that the program is trying to mimic. This will give you a better sense of how TorToiSe really performs.
+

-

Short-form

- - - - - - - - - - - - - - - - - - - - - -
text | angie | daniel | deniro | emma | freeman | geralt | halle | jlaw | lj | myself | pat | snakes | tom | train_atkins | train_dotrice | train_kennard | weaver | william
reference clip
autoregressive_ml
bengio_it_needs_to_know_what_is_bad
dickinson_stop_for_death
espn_basketball
frost_oar_to_oar
frost_road_not_taken
gatsby_and_so_we_beat_on
harrypotter_differences_of_habit_and_language
i_am_a_language_model
melodie_kao
nyt_covid
real_courage_is_when_you_know_your_licked
rolling_stone_review
spacecraft_interview
tacotron2_sample1
tacotron2_sample2
tacotron2_sample3
tacotron2_sample4
watts_this_is_the_real_secret_of_life
wilde_nowadays_people_know_the_price
+

Short-form

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
text | angie | daniel | deniro | emma | freeman | geralt | halle | jlaw | lj | myself | pat | snakes | tom | train_atkins | train_dotrice | train_kennard | weaver | william
reference clip
autoregressive_ml
bengio_it_needs_to_know_what_is_bad
dickinson_stop_for_death
espn_basketball
frost_oar_to_oar
frost_road_not_taken
gatsby_and_so_we_beat_on
harrypotter_differences_of_habit_and_language
i_am_a_language_model
melodie_kao
nyt_covid
real_courage_is_when_you_know_your_licked
rolling_stone_review
spacecraft_interview
tacotron2_sample1
tacotron2_sample2
tacotron2_sample3
tacotron2_sample4
watts_this_is_the_real_secret_of_life
wilde_nowadays_people_know_the_price
-

Long-form

-Angelina:
-Craig:
-Deniro:
-Emma:
-Freeman:
-Geralt:
-Halle:
-Jlaw:
-LJ:
-Myself:
-Pat:
-Snakes:
-Tom:
-Weaver:
-William:
+

Long-form

+ Angelina:
+ Craig:
+ Deniro:
+ Emma:
+ Freeman:
+ Geralt:
+ Halle:
+ Jlaw:
+ LJ:
+ Myself:
+ Pat:
+ Snakes:
+ Tom:
+ Weaver:
+ William:
-

Prompt Engineering 🐢

-

Tortoise is capable of "prompt-engineering" in that tone and prosody is affected by the emotions inflected in the words
-fed to the program. For example, prompting the model with "[I am so angry,] I went to the park and threw a ball" will
-result in it outputting "I went to the park and threw the ball" with an angry tone.

+

Prompt Engineering 🐢

+

Tortoise is capable of "prompt-engineering" in that tone and prosody are affected by the emotions expressed in the
+ words
+ fed to the program. For example, prompting the model with "[I am so angry,] I went to the park and threw a ball"
+ will
+ result in it outputting "I went to the park and threw a ball" with an angry tone.

-

Following are a few examples of different prompts. The effect is subtle, but is definitely there. Many voices are
-less effected by this.

+

Following are a few examples of different prompts. The effect is subtle, but is definitely there. Many voices are
+ less affected by this.

-Angry:
-Sad:
-Happy:
-Scared:
+ Angry:
+ Sad:
+ Happy:
+ Scared:
- + + + \ No newline at end of file