@@ -82,6 +82,9 @@ <h2>Contents</h2>
8282 < li > < a href ="#flow-vae-vs-vae-comparison "> Flow-VAE vs. VAE Comparison</ a > </ li >
8383 < li > < a href ="#professional-voice-clone-pvc-demonstration "> Professional Voice Clone (PVC) Demonstration</ a > </ li >
8484 < li > < a href ="#emotion-control-demonstration "> Emotion Control Demonstration</ a > </ li >
85+ < li > < a href ="#text-prompted-voice-generation-demonstration "> Text-Prompted Voice Generation Demonstration</ a > </ li >
86+ < li > < a href ="#comparison-of-voice-naturalness "> Comparison of Voice
87+ Naturalness with the Previous Generation Product</ a > </ li >
8588 </ ol >
8689 </ nav >
8790
@@ -750,17 +753,26 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
750753 < th scope ="col " style ="text-align: center; "> Source Audio</ th >
751754 < th scope ="col " style ="text-align: center; "> Fast</ th >
752755 < th scope ="col " style ="text-align: center; "> PVC</ th >
756+ < th scope ="col " style ="text-align: center; "> Differences</ th >
753757 </ tr >
754758 < tr class ="border-bottom-thin ">
755- < td style ="width: 30 % ">
759+ < td style ="width: 25 % ">
756760 < audio src ="assets/audios/JosephBrodsky_Source.wav " controls > </ audio >
757761 </ td >
758- < td style ="width: 30 % ">
762+ < td style ="width: 25 % ">
759763 < audio src ="assets/audios/JosephBrodsky_Fast.mp3 " controls > </ audio >
760764 </ td >
761- < td style ="width: 30 % ">
765+ < td style ="width: 25 % ">
762766 < audio src ="assets/audios/JosephBrodsky_PVC.mp3 " controls > </ audio >
763767 </ td >
768+ < td >
769+ Like the ZeroShot version, the PVC< br >
770+ version has rising sentence-final intonation,< br >
771+ but distinctively sustains this< br >
772+ elevated pitch instead of the typical< br >
773+ pitch declination found in common< br >
774+ declarative sentences
775+ </ td >
764776 </ tr >
765777 < tr class ="border-bottom-thin ">
766778 < td >
@@ -772,6 +784,12 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
772784 < td >
773785 < audio src ="assets/audios/TianJin_PVC.mp3 " controls > </ audio >
774786 </ td >
787+ < td >
788+ With more reference material, the model not only< br >
789+ reproduces the speaker's voice characteristics< br >
790+ but also accurately captures more< br >
791+ dialectal features
792+ </ td >
775793 </ tr >
776794 </ tbody >
777795 </ table >
@@ -886,6 +904,151 @@ <h3>DEMO</h3>
886904 </ table >
887905 </ div >
888906 </ div >
907+
908+ < div class ="article-block ">
909+ < h2 id ="text-prompted-voice-generation-demonstration "> Text-Prompted Voice Generation Demonstration</ h2 >
910+ < div class ="scroll-wrapper ">
911+ < table style ="width: 100%; ">
912+ < tbody >
913+ < tr class ="border-bottom-thin ">
914+ < th scope ="col "> Prompt</ th >
915+ < th scope ="col "> Text Content</ th >
916+ < th scope ="col " style ="text-align: center; "> Audio</ th >
917+ </ tr >
918+ < tr class ="border-bottom-thin ">
919+ < td >
920+ < p >
921+ 男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,语速偏慢,< br >
922+ 音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,< br >
923+ 在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
924+ </ p >
925+ < p >
926+ English Meaning: A middle-aged male voice speaking Chinese,< br >
927+ characterized by rich and mellow timbre with natural resonance.< br >
928+ The speech rate is moderately slow, with medium volume and< br >
929+ a deep pitch. The overall voice quality conveys a sense of< br >
930+ steadiness and reliability, demonstrating both professionalism < br >
931+ and approachability in in-depth interview settings.< br >
932+ The voice features clear articulation with precise and< br >
933+ well-defined pronunciation.
934+ </ p >
935+ </ td >
936+ < td >
937+ 在这个安静的夜晚,让我们一起走进《人生笔记》这本书。< br >
938+ 作者用平实的文字记录下生活中的点点滴滴,< br >
939+ 让我们看到平凡中的真善美。< br >
940+ 今天,我们先来读第一章:'生活的痕迹'......
941+ </ td >
942+ < td >
943+ < audio class ="audio-md " src ="assets/audios/深度访谈男中年.wav " controls > </ audio >
944+ </ td >
945+ </ tr >
946+ < tr class ="border-bottom-thin ">
947+ < td >
948+ < p >
949+ 说中文的女青年,音色偏甜美,语速比较快,说话时带着一种轻快的感觉,< br >
950+ 整体音调较高,像是在直播带货,整体氛围比较活跃,< br >
951+ 声音清晰,听起来很有亲和力。
952+ </ p >
953+ < p >
954+ A young Chinese-speaking woman's voice with a sweet and pleasant timbre. < br >
955+ The speech rate is relatively fast, and the pitch is moderately high,< br >
956+ carrying a light and energetic quality reminiscent of live-streaming sales< br >
957+ presentations. The overall atmosphere is vibrant and dynamic,< br >
958+ with clear voice quality and an engaging, approachable tone.
959+ </ p >
960+ </ td >
961+ < td >
962+ 亲爱的宝宝们,等了好久的神仙面霜终于到货啦!< br >
963+ 你们看这个包装是不是超级精致?< br >
964+ 我自己已经用了一个月了,效果真的绝绝子!< br >
965+ 而且这次活动价真的太划算了,错过真的会后悔的哦~
966+ </ td >
967+ < td >
968+ < audio class ="audio-md " src ="assets/audios/直播带货女青年.wav " controls > </ audio >
969+ </ td >
970+ </ tr >
971+ < tr class ="border-bottom-thin ">
972+ < td >
973+ < p >
974+ 中国女青年的声音,音色清脆,说话速度偏快,语调活泼,< br >
975+ 像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,< br >
976+ 整体氛围比较轻松。
977+ </ p >
978+ < p >
979+ A young Chinese woman's voice with a crisp and bright timbre. < br >
980+ The speech rate is relatively fast, with a moderately high< br >
981+ pitch and lively intonation characteristic of gaming live streams.< br >
982+ The voice conveys a cheerful quality, creating an overall< br >
983+ relaxed and casual atmosphere.
984+ </ p >
985+ </ td >
986+ < td >
987+ 啊!这里有个宝箱!让我们看看里面是什么~< br >
988+ 哇!是传说中的紫色装备!运气也太好了吧!< br >
989+ 谢谢小伙伴们的打赏,我们继续往前探索......
990+ </ td >
991+ < td >
992+ < audio class ="audio-md " src ="assets/audios/游戏主播女青年.wav " controls > </ audio >
993+ </ td >
994+ </ tr >
995+ </ tbody >
996+ </ table >
997+ </ div >
998+ </ div >
999+
1000+ < div class ="article-block ">
1001+ < h2 id ="comparison-of-voice-naturalness "> Comparison of Voice Naturalness
1002+ with the Previous Generation Product</ h2 >
1003+ < p > The new model demonstrates significant advantages in naturalness compared to the previous version.</ p >
1004+ < h3 style ="margin-top: 2rem; "> Source Audio for Radiant_Girl</ h3 >
1005+ < audio src ="assets/audios/English_Radiant_Girl_Sourse.wav " controls > </ audio >
1006+ < h3 > DEMO</ h3 >
1007+ < div class ="scroll-wrapper ">
1008+ < table style ="width: 100%; ">
1009+ < tbody >
1010+ < tr class ="border-bottom-thin ">
1011+ < th scope ="col "> Text Content</ th >
1012+ < th scope ="col " style ="text-align: center; "> MiniMax< br > Speech_02_HD</ th >
1013+ < th scope ="col " style ="text-align: center; "> Microsoft< br > Azure TTS</ th >
1014+ < th scope ="col " style ="text-align: center; "> AWS< br > Polly</ th >
1015+ </ tr >
1016+ < tr class ="border-bottom-thin ">
1017+ < td >
1018+ I sat alone in the empty room, staring at the old photographs,< br >
1019+ wondering how everything could change so quickly,< br >
1020+ how a lifetime of memories could fade away just like that.
1021+ </ td >
1022+ < td >
1023+ < audio class ="audio-md " src ="assets/audios/Radiant_Girl_1.mp3 " controls > </ audio >
1024+ </ td >
1025+ < td >
1026+ < audio class ="audio-md " src ="assets/audios/Emma_1.mp3 " controls > </ audio >
1027+ </ td >
1028+ < td >
1029+ < audio class ="audio-md " src ="assets/audios/Joanna_1.mp3 " controls > </ audio >
1030+ </ td >
1031+ </ tr >
1032+ < tr class ="border-bottom-thin ">
1033+ < td >
1034+ The moment I held my acceptance letter, my heart burst with joy - < br >
1035+ all those sleepless nights finally paid off, and I couldn't stop< br >
1036+ dancing around the room, calling everyone I knew to share this amazing news!
1037+ </ td >
1038+ < td >
1039+ < audio class ="audio-md " src ="assets/audios/Radiant_Girl_2.mp3 " controls > </ audio >
1040+ </ td >
1041+ < td >
1042+ < audio class ="audio-md " src ="assets/audios/Emma_2.mp3 " controls > </ audio >
1043+ </ td >
1044+ < td >
1045+ < audio class ="audio-md " src ="assets/audios/Joanna_2.mp3 " controls > </ audio >
1046+ </ td >
1047+ </ tr >
1048+ </ tbody >
1049+ </ table >
1050+ </ div >
1051+ </ div >
8891052 </ article >
8901053 </ main >
8911054