Below are some samples from my VALL-E implementation.
Unlike the original VALL-E demo page, I'm placing emphasis on the input prompt, as the model adheres to it more strongly than others do.
Objective metrics are computed in two ways. For intelligibility, the audio is transcribed with openai/whisper-base and the word and character error rates (WER/CER) are computed on the transcriptions. For speaker similarity (SIM-O), embeddings are extracted with a speaker feature extraction model (microsoft/wavlm-base-sv) and the cosine similarity is computed between the output audio and the prompt.
Tables marked as "Validation" contain samples not seen by the model.
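
For readers unfamiliar with these metrics, here is a minimal sketch of how WER, CER, and a cosine similarity score can be computed once transcriptions and speaker embeddings are available. It is not the evaluation script used for this page: the function names (`wer`, `cer`, `cosine_similarity`) and the text normalization are assumptions for illustration, and the Whisper transcription and WavLM embedding steps are left out.

```python
import math
import re
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumed normalization (not taken from the page): lowercase,
    # replace punctuation with spaces, collapse whitespace.
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    return edit_distance(ref, hyp) / max(len(ref), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    ref = list(normalize(reference))
    hyp = list(normalize(hypothesis))
    return edit_distance(ref, hyp) / max(len(ref), 1)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (the SIM-O style score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Because inserted words count as errors while the denominator is the reference length, WER and CER can exceed 1.0 when the transcription is much longer than the reference, for example `wer("hello", "hello there big world")` gives 3.0. This is why a few rows in the tables below show values well above 1.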


Model (


Average WER: 0.067
Average CER: 0.020
Average SIM-O: 0.891

[Audio columns: Prompt, Our VALL-E, Original VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| Number ten, fresh nelly is waiting on you, good night husband. | 0.182 | 0.032 | 0.781 |
| Yea, his honourable worship is within, but he hath a godly minister or two with him, and likewise a leech. | 0.000 | 0.000 | 0.961 |
| Instead of shoes, the old man wore boots with turnover tops, and his blue coat had wide cuffs of gold braid. | 0.143 | 0.051 | 0.914 |
| The army found the people in poverty and left them in comparative wealth. | 0.000 | 0.000 | 0.879 |
| Thus did this humane and right minded father comfort his unhappy daughter, and her mother embracing her again, did all she could to soothe her feelings. | 0.077 | 0.013 | 0.770 |
| He was in deep converse with the clerk and entered the hall holding him by the arm. | 0.000 | 0.000 | 0.922 |
| They moved thereafter cautiously about the hut groping before and about them to find something to show that Warrenton had fulfilled his mission. | 0.261 | 0.106 | 0.750 |
| And lay me down in thy cold bed and leave my shining lot. | 0.077 | 0.045 | 0.887 |

LibriSpeech (Validation)

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| If any boys have special confessors, perhaps it will be better for them not to change. | 0.000 | 0.000 | 0.836 |
| It can't be for money or balls or play and he has no honest business anywhere. | 0.000 | 0.000 | 0.953 |
| How long are we likely to be separated? Why are we to be denied each other's society? | 0.067 | 0.055 | 0.906 |
| Unless you speak more fully senora, I cannot understand you. replied her husband. | 0.231 | 0.045 | 0.805 |
| What did it avail to pray when he knew that his soul lusted after its own destruction? | 0.000 | 0.000 | 0.867 |
| You may wonder to hear me speak thus, being so young. | 0.091 | 0.035 | 0.906 |
| Here, he proceeded to swallow his prize head first. | 0.000 | 0.000 | 0.898 |
| Happily for him, he, on his part, caught sight of the hawk and dropped like lead into the point. | 0.111 | 0.071 | 0.809 |
| Put them away quick before Andela and Rosalie see them. | 0.200 | 0.033 | 0.965 |
| Humble man that he was, he will not now take a back seat. | 0.154 | 0.032 | 0.965 |
| We lay there, our blood running cold with unspeakable terror. | 0.000 | 0.000 | 0.965 |
| Irene Vergoyne, one of her family, told me in confidence that there was a romance somewhere back in the beginning. | 0.000 | 0.000 | 0.965 |
| I like trees because they seem more resigned to the way they have to live than other things do. | 0.000 | 0.000 | 0.887 |
| You are mad. She has given birth to a princess. | 0.000 | 0.000 | 0.957 |
| They sat down side by side and Dorcas held her hand. | 0.182 | 0.051 | 0.879 |
| The Latter-day Saints were long regarded as a polygamous people. | 0.200 | 0.030 | 0.879 |
| Stop it, said Buttonbright. I can't hear myself think. | 0.222 | 0.035 | 0.871 |
| But look here, at Carpaccio, even in my copy. | 0.000 | 0.000 | 0.859 |
| A sudden wave of scarlets swept over Eliza's face. | 0.111 | 0.018 | 0.961 |
| No, I am quite proud of my person, was the reply. | 0.000 | 0.000 | 0.902 |
| I telephoned again and felt something would happen, but fortunately it did not. | 0.000 | 0.000 | 0.879 |
| A veiled sunlight lit up faintly the gray sheet of water where the river was embayed. | 0.000 | 0.000 | 0.887 |
| The wind was flapping her big hat and teasing a curl of her chestnut-colored hair. | 0.000 | 0.000 | 0.961 |
| Then the curtain rises and it is apparent that we are assisting at an at-home of considerable splendor. | 0.222 | 0.078 | 0.910 |
| I want to be something, to make myself something, to do something. | 0.000 | 0.000 | 0.922 |
| No, it was not their colors. It was the poise and balance of the period itself. | 0.000 | 0.000 | 0.898 |
| Dearest, Teach me so to pour out gratitude, As thou dost good. | 0.000 | 0.000 | 0.906 |
| Burbank and his tribe represent in the vegetable world Edison in the mechanical | 0.000 | 0.000 | 0.934 |
| Very soon after dinner, Charles Smith excused himself. | 0.000 | 0.000 | 0.723 |
| Well then, I must make some suggestions to you. | 0.000 | 0.000 | 0.906 |
| It won't be much, but I'm grateful to find a friend. I'm guilty, you know, and there's no one to blame but myself. | 0.136 | 0.051 | 0.902 |
| She then rose, humming the air to which she was presently going to dance. | 0.000 | 0.000 | 0.930 |

Emilia (EN)

Average WER: 0.061
Average CER: 0.027
Average SIM-O: 0.848

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| Had a really good identity card, national identity card system. | 0.000 | 0.000 | 0.703 |
| Web on desktop has seen path breaking innovation in the last few years. | 0.000 | 0.000 | 0.668 |
| Quite prevalent here, here in Alberta and a lot of people just don't see it. | 0.000 | 0.000 | 0.797 |
| Hence, integrated pest management in cotton was a subject matter chosen for the Farm Field School | 0.000 | 0.000 | 0.648 |
| Or more importantly, some people may naturally heal over time. | 0.000 | 0.000 | 0.688 |
| Man, this is a fresh, something's happening. Something is stirring. | 0.100 | 0.158 | 0.848 |
| Depending like justice and aging and the AARP and the. | 0.500 | 0.143 | 0.957 |
| The peer to peer aspect of Bitcoin is not in the people, it's in the computers. | 0.214 | 0.103 | 0.848 |
| Very, very critical in how it actually plays a role in so many different ways. We need for you. | 0.000 | 0.000 | 0.871 |
| Tuesday, Thursday, and Saturday. Like space out the day, use if that makes sense. | 0.071 | 0.024 | 0.777 |
| Because the statute says whenever an allegation is raised, basically. | 0.000 | 0.000 | 0.895 |
| They gave us a list and I think I'm gonna go with something called direct relief. | 0.125 | 0.071 | 0.789 |
| Making resolution you need, and the composition automatically adjust to it. | 0.091 | 0.013 | 0.840 |
| State I won't read that comment. So before I start my command, | 0.000 | 0.000 | 0.832 |
| Best of all, they are a low cost, simple solution. | 0.000 | 0.000 | 0.926 |
| == Departments and programs == | 0.000 | 0.000 | 0.906 |
| During the Renaissance, the Age of Reason, and the Enlightenment, categorizing organisms became more prevalent | 0.000 | 0.000 | 0.930 |
| We've divided the project as a whole into three work streams. | 0.000 | 0.000 | 0.902 |
| Newton's laws of motion and Newton's law of universal gravitation were formulated in the 17th century | 0.067 | 0.033 | 0.977 |
| Due to fire suppression, forests have replaced many areas of former oak woodland | 0.000 | 0.000 | 0.957 |
| To equip young people with the tools needed to lead healthy and productive lives. | 0.000 | 0.000 | 0.898 |
| Because we didn't anticipate at the time of the original consent. | 0.000 | 0.000 | 0.824 |
| Before you, uh, as you're putting your screws in, okay? And then you're, you're gonna be all good there. | 0.053 | 0.031 | 0.906 |
| Since we are going long, we set our stop loss at the green ATR moving average. | 0.000 | 0.000 | 0.844 |
| Academic journals publish collections of articles throughout the year. | 0.000 | 0.000 | 0.914 |
| London 2012-2012 Microsoft Windows, PlayStation 3, Xbox 360 | 0.000 | 0.000 | 0.977 |
| Once we've calculated the number out required for our job, we can use that value to calculate the total number of sheets needed. | 0.130 | 0.074 | 0.934 |
| Satellite Pay TV is offered by the Wenanchi Group | 0.111 | 0.053 | 0.660 |
| I was just thinking, uhm, back in undergrad I learned about so many different... | 0.071 | 0.060 | 0.961 |
| A rear-facing television was mounted in the roof just behind | 0.000 | 0.000 | 0.648 |
| Uh, but this work also incorporates a significant portion on gender. | 0.091 | 0.039 | 0.785 |
| Cardinal Flavio began work on the Villa Chiga Versalia at Fermello in 1664 | 0.062 | 0.016 | 0.957 |

Emilia (FR)

Average WER: 0.774
Average CER: 0.532
Average SIM-O: 0.895

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| Car on joue de rencontre un team avec la personne du Saint-Esprit, où le Saint-Esprit l'a instruit. | 0.647 | 0.356 | 0.633 |
| La course à pied, ce n'est pas vraiment mon sport préféré. | 0.000 | 0.000 | 0.891 |
| Notre maison de rêve est là. | 0.500 | 0.308 | 0.895 |
| Voilà. Tu, tu archives. Ensuite, tu fais un making of. | 20.500 | 15.603 | 0.887 |
| Il ne faut pas croire cependant que tous les mormons soient polygammes. | 0.167 | 0.066 | 0.957 |
| Il lui faut trente heures pour faire le tour de Mars. | 0.273 | 0.163 | 0.934 |
| Il répondit, je n'ai été envoyé qu'au brebis perdu de la maison d'Israël, hein. | 0.429 | 0.228 | 0.727 |
| Cette belle fille | 0.667 | 0.846 | 0.887 |
| Le troisième besoin, c'est le besoin de rapports sociaux avec les autres êtres humains. | 0.143 | 0.059 | 0.914 |
| Oui, j'en ai pas mal. Avez-vous du temps maintenant ? | 0.000 | 0.000 | 0.961 |
| Comme vous l'a expliqué Patrick, euh, une des volontés. | 0.667 | 0.447 | 0.914 |
| Il se levait très tôt et venait tous les matins dans sa chambre. | 0.000 | 0.000 | 0.973 |
| C'est vraiment le cœur du système qu'on va changer et c'est très compliqué. | 0.077 | 0.014 | 0.844 |
| cet engrais vert doit également me permettre, au-delà de nourrir mon sol, il doit permettre de... | 0.438 | 0.146 | 0.922 |
| Parce que voilà, euh, c'est vrai que des fois, ça peut être un peu épuisant de donner, donner, donner, mais en fait, je sais pas faire autrement. | 0.346 | 0.175 | 0.965 |
| Ça me paraît, je veux pas, j'ai pas de boule de cristal et je suis pas expert en l'occurrence devant la justice. | 0.182 | 0.103 | 0.898 |
| euh, si vous voulez une version un petit peu plus difficile, vous montez. | 0.462 | 0.254 | 0.906 |
| Voilà. Et si, toi, tu n'exerces pas la miséricorde dans la droide? Ton frère. | 0.643 | 0.274 | 0.891 |
| C'est lui qui a l'ennite ou pas? C'est malheureusement moi. | 0.300 | 0.100 | 0.898 |
| être debout être debout | 0.000 | 0.000 | 0.922 |
| Et la nuit, en sasser dehors et en parler durant des heures. | 0.167 | 0.093 | 0.965 |
| Bastien a trois yeux. | 0.500 | 0.143 | 0.895 |
| Donc, euh, qu'il y a des gens qui savent très bien trader et qui vous donnent des bons conseils. Donc. | 0.250 | 0.212 | 0.945 |
| de s'en extraire, ou en tout cas de, de questionner cette violence ou de la mettre à distance. | 0.111 | 0.069 | 0.938 |
| Il y a juste une petite mani, donc, si ça reste un fichier SPSS, vous allez voir un fichier .dat. | 0.250 | 0.152 | 0.922 |
| euh, cet argent ne servira qu'à l'achat de matériel ou aux frais de déplacement de cas échéant. | 0.235 | 0.078 | 0.910 |
| Voilà, euh, c'est un peu psychos, hein. Un spot, ok. | 0.400 | 0.236 | 0.914 |
| Mais les grandes maisons de luxe n'existent pas encore. | 0.000 | 0.000 | 0.875 |
| l'organisation générale des Antiquités, des manuscrits et des musées, apparue dans les années 1970. | 0.312 | 0.064 | 0.922 |
| Et dans ces, cette époque-là, c'était, euh, il recommandait des livres de développement personnel. | 0.571 | 0.380 | 0.973 |
| L'eau aide à l'élimination des toxines et protège le foie contre le développement des calculs billaires. | 0.562 | 0.253 | 0.965 |
| Ouais, qui base. Alors, autant, on l'a vu au tout début, hein. RDF, il n'y en a qu'un. Type. | 0.421 | 0.195 | 0.891 |

Emilia (DE)

Average WER: 0.252
Average CER: 0.121
Average SIM-O: 0.896

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| Und das ist noch lange nicht alles. Es kommt noch viel schlimmer. | 0.000 | 0.000 | 0.969 |
| Ja mega, Ruhe. Bitte vermeiden Sie Lärm. | 0.286 | 0.067 | 0.812 |
| Und ich schlage vor, dass wir Getränke einkaufen. | 0.000 | 0.000 | 0.934 |
| Und die Reparatur hat viel Geld gekostet. | 0.000 | 0.000 | 0.918 |
| Die seinerzeit zu den Borschen des Dorfes gehört hatten, hießen nun Altbauern. | 0.250 | 0.047 | 0.941 |
| Ich habe ihn vorhin vor dem Kurs gesehen. Ich denke er wird gleich kommen. Ah, da ist er ja. | 0.158 | 0.050 | 0.934 |
| Das heißt, ich komme auch nicht wieder weg und kann holen gehen. Kann ich auf die schießen, Junge? | 0.000 | 0.000 | 0.910 |
| Aber sie zeigt mir, wie unbedeutend viele Dinge sind und dass ich mir keine Sorgen machen muss. | 0.059 | 0.030 | 0.953 |
| Machen wir weiter mit dem Präteritum. | 0.333 | 0.227 | 0.816 |
| Dort erwartete mich die Stuvades mit verwehrtem Gesicht. | 0.500 | 0.150 | 0.965 |
| Wo willst du denn grillen? Auf deinem Balkon? | 0.000 | 0.000 | 0.918 |
| Gut, dann auf Seite 94 haben wir einen. | 0.222 | 0.017 | 0.871 |
| Ach, Frau Meyer, guten Tag. Nein, ich wohne nicht hier. Wie geht es Ihnen? | 0.143 | 0.117 | 0.926 |
| Uh, wenn wir gegessen haben werden, werden wir schwimmen gehen. | 0.200 | 0.076 | 0.852 |
| Ja, das war, das war ein interessanter zu... | 0.250 | 0.196 | 0.895 |
| Gott ist für mich selber sehr vieles, aber die Wahrheit ist eins der... | 0.077 | 0.099 | 0.750 |
| Am Morgen nach der Party befindet sich Eddy in der Abstellkammer. | 0.182 | 0.189 | 0.961 |
| Die Straßenbahn. Die Straßenbahn. | 0.750 | 0.543 | 0.715 |
| Sie brauchen ihn wie die Mami. Täglich als Helfer bei den Schularbeiten. | 0.083 | 0.013 | 0.891 |
| Er kommt aus Nigeria. | 1.000 | 0.630 | 0.926 |
| Klar, aber jetzt fahren wir erst einmal in die Berge. | 0.200 | 0.143 | 0.895 |
| Es war einmal eine Kröte mit Warzen. | 0.286 | 0.045 | 0.840 |
| Damit ist das Wandern die absolute Nummer 1. | 0.125 | 0.059 | 0.918 |
| Nein, das ist von Joyce Carol Oates aus einem Essay über Boxen. | 0.500 | 0.329 | 0.871 |
| Martha spielt gut Gitarre. | 0.250 | 0.100 | 0.906 |
| Ich hab gemerkt, das hab ich letztens erst rausgekriegt, dass ich diesen Schiebebalken zwischen | 0.286 | 0.188 | 0.832 |
| Aus die, wegen der Kreuzungen und aus diversen Gründen. | 0.333 | 0.115 | 0.895 |
| Sobald es was zu berichten gibt, melden wir uns. Danke für eure anhaltende Unterstützung, eure Piranhas. | 0.188 | 0.067 | 0.949 |
| Im Moment geht ihr beide ja noch zur Schule und wohnt mit eurer Familie zusammen. | 0.133 | 0.022 | 0.887 |
| Erlassen wir dabei. Jo. Schön, Manuel. War eine nette Unterhaltung heute mit dir. | 0.385 | 0.193 | 0.977 |
| Euren WhatsApp-Verlauf interpretiert ihr sicher nicht im deutschen Unterricht, oder? | 0.100 | 0.073 | 0.926 |
| Auf ein Girokonto. Das Girokonto. | 0.800 | 0.098 | 0.930 |

Emilia (JA)

Average WER: 1.182
Average CER: 0.198
Average SIM-O: 0.939

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| 他にも動画がありますので、ぜひご覧ください。 | 0.000 | 0.000 | 0.918 |
| あ、そっちか。ね、ここは、今、セルバーさんに、 | 1.000 | 0.022 | 0.820 |
| 今年の大切なものについて紹介したいと思います。 | 1.000 | 0.722 | 0.973 |
| じゃあ決まったらこの紙に書いて持ってきます。 | 1.000 | 0.130 | 0.949 |
| 隣の部屋、10時までに片付けて下さい。 | 0.333 | 0.092 | 0.934 |
| 自動車の修理会社の人からメッセージが入っています。 | 0.722 | 0.599 | 0.953 |
| おとうさん | 0.000 | 0.000 | 0.930 |
| さて、今月も皆さんから頂いたお便りを紹介しようと思います。 | 2.000 | 0.181 | 0.980 |
| この魚は生では食べられないから、煮るなり焼くなりしてください。 | 2.000 | 0.023 | 0.949 |
| ただ、費用がかなりかかるから、新聞ってのはね。 | 1.000 | 0.288 | 0.918 |
| できている、完璧に、というふうに思っています。 | 2.000 | 0.033 | 0.918 |
| そうよ、うちのうえのこは。 | 0.000 | 0.000 | 0.918 |
| トコの部分に、あの、土入ってないとね、根っこ腐って枯れちゃう。 | 3.000 | 0.282 | 0.938 |
| はい。えっと、最近は、えっと、人間失格。 | 1.000 | 0.326 | 0.781 |
| あ、ちょっと待ってください。私は、 | 0.000 | 0.000 | 0.961 |
| ミニバンだからと言って、もうセッタカな車だからと言って、 | 2.000 | 0.043 | 0.930 |
| すみません。赤いボールペンありませんか? | 2.000 | 0.261 | 0.984 |
| で、これ、なんで赤くするのかっていうのはですね、え、赤色っていうのは目の刺激が少ないんですよ。 | 2.000 | 0.346 | 0.945 |
| 全く頼りにならない国のままだという印象を与えてしまうのではないでしょうか。 | 2.000 | 0.065 | 0.953 |
| 会社によっては、間の平日も休みにしたり、 | 0.000 | 0.000 | 0.965 |
| 私の国のカレーです。あまり辛くないですよ。 | 1.000 | 0.179 | 0.953 |
| ついに新しいクロップトップを切ることができます。 | 2.000 | 0.162 | 0.953 |
| 俺も職員の一人として一緒にくっついていくんだ。 | 1.000 | 0.169 | 0.949 |
| 問題が難しすぎるんじゃないですか? | 1.000 | 0.058 | 0.941 |
| ね、あの基本的にその自動調理器は一回スイッチ入れたら途中で、 | 5.000 | 0.489 | 0.969 |
| ヘアピンク感というか少しこうタイトなコーナーなんですけれども、すごく車が | 1.000 | 0.224 | 0.883 |
| 彼は一目で彼女を大好きになってしまった。 | 0.000 | 0.000 | 0.973 |
| 諸外国に比べて日本は食料品 が高いと言われている | 1.000 | 0.393 | 0.953 |
| 手元に届くまで2日しかかからなかったし送料も無料でしたし | 1.000 | 0.343 | 0.973 |
| お父さん、洋服はもうあまり着ないから。 | 0.000 | 0.000 | 0.977 |
| 大事なのは毒を盛る方法が他にあったかどうか。 | 1.000 | 0.127 | 0.930 |
| 手触りの良いコットンのトートバッグが入ってまして、 | 0.767 | 0.771 | 0.977 |

Emilia (KO)

Average WER: 0.273
Average CER: 0.082
Average SIM-O: 0.921

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| 부해의 법도 그 힘이 미치지 못합니다. | 0.250 | 0.041 | 0.902 |
| 이 기념 아시겠어요? 그래서 콜렉터와 어그리게이터는 굉장히 다른 겁니다. | 0.111 | 0.010 | 0.934 |
| 일본 활동 중 가장 추억에 남는 일은? | 0.000 | 0.000 | 0.926 |
| 도토리목 부추전, 도토리목 탕수육. 도토리목 빼렸는데 더 늘었네. | 0.000 | 0.000 | 0.945 |
| 11과 사과 5개 주세요. | 0.000 | 0.000 | 0.922 |
| 저리 가, 못생긴 늑대야! 내 집에는 절대 들어올 수 없을 거야! | 0.250 | 0.037 | 0.930 |
| 시골지는 음식을 먹을 생각에 매우 신이 났죠. | 0.000 | 0.000 | 0.832 |
| 뭐든 많이 하면 좋잖아요. | 1.000 | 0.194 | 0.957 |
| 그러다 보니까 아날로그틱한 액션신들을 좀 많이 넣어서 이제 | 0.500 | 0.114 | 0.941 |
| 그리고 보증금은 적으면 적을수록 좋습니다. | 0.000 | 0.000 | 0.980 |
| 두 번째 세트는 어떻게 진행될 것인지 잠시 후 양 팀의 2세트로 돌아오겠습니다. | 0.000 | 0.000 | 0.730 |
| 남녀노소 누구나 좋아하는 영양식으로 인기를 끌고 있습니다. | 0.400 | 0.026 | 0.945 |
| 우리 아기를 구한되다가 아직도 이렇게 훌륭하다는 걸 증명하는구나. | 0.000 | 0.000 | 0.883 |
| 아, 그래요? 회사에서 일본어가 필요해요? | 0.000 | 0.000 | 0.852 |
| 내일 비가 오지 않는다면 피크닉을 가질 거예요. | 0.000 | 0.000 | 0.895 |
| 상상할 수 없는 나쁜 놈이잖아요. 정말 인류 역사 속에 이렇게 나쁜 놈이 있을 수가 있겠어요? | 0.182 | 0.057 | 0.926 |
| 주말에 친구들과 같이 놀이공원에 가려고 해요. | 0.000 | 0.000 | 0.977 |
| 네, 이상 강연을 마치도록 하겠습니다. 셰프 김소봉이었습니다. 감사합니다. | 0.000 | 0.000 | 0.926 |
| 영진씨, 그래서 고속도로를 비롯한 모든 도로 및 철도가 | 0.333 | 0.092 | 0.949 |
| 어, 뭐라고, 뭐라고. | 1.000 | 0.550 | 0.855 |
| 오늘도 그 말씀과 동행하다가 주님 만나시기를 바랍니다. | 0.400 | 0.133 | 0.934 |
| 아, 이쪽은 이제 저거를 해와야 되는 거구나. 뭐지, 어떻게 하지? | 0.000 | 0.000 | 0.910 |
| 물을 많이 마시고 훅 쉬세요. | 0.500 | 0.061 | 0.953 |
| 지금 제가 일식 요리를 하게 된 바로 그 원동력입니다. | 0.222 | 0.068 | 0.938 |
| 화면으로 보이는 것와 같이 대면 교육을 진행합니다. | 0.286 | 0.056 | 0.953 |
| 읽으려고, 아, 읽으려고 도서관에 갔어요. | 0.400 | 0.083 | 0.949 |
| 아, 제목 어렵다. 뭐라고 하지? 숨 막히는 순간? 댓글에 그럴 것 같은데, 진짜 숨 막히는 영상이네요. | 0.154 | 0.015 | 0.961 |
| 음, 약간 냉팔보채 느낌? | 2.000 | 0.500 | 0.934 |
| 적어서 아직 말을 잘 못해요. | 0.000 | 0.000 | 0.938 |
| 여행가는 가까이 다가가서 더 자세히 관찰했어요. | 0.000 | 0.000 | 0.934 |
| 63빌딩이라고 들어봤어요. | 0.000 | 0.000 | 0.945 |
| 記者, 린다이, 메고 있어요. | 0.750 | 0.574 | 0.922 |

Emilia (ZH)

Average WER: 0.117
Average CER: 0.055
Average SIM-O: 0.898

[Audio columns: Prompt, Our VALL-E, Ground Truth]

| Text | WER↓ | CER↓ | SIM-O↑ |
| --- | --- | --- | --- |
| 嗯,还行,天霸混进去一个,嗯,灵树好像混进去。 | 0.278 | 0.123 | 0.977 |
| 工作人员以为他在开玩笑,便也故作一本正经的说。 | 0.000 | 0.000 | 0.945 |
| 这张啊名字叫难以忘怀的老师。 | 0.077 | 0.049 | 0.836 |
| 吃羊脸儿去把咱们饭店最好的羊脸拿上来。 | 0.056 | 0.043 | 0.934 |
| 咱们冒兴为这个密点,死的人还少吗? | 0.067 | 0.011 | 0.723 |
| 好再来community刚才说community的交流,那们在同一个社区的全体居民或叫社区。 | 0.071 | 0.042 | 0.938 |
| 快递个we搞撒拉好,然后老我再放,你抢给我个time. | 0.350 | 0.200 | 0.914 |
| 刚才打开的那个保险库的门啊,然后这边可以看一下一些日志。 | 0.038 | 0.025 | 0.898 |
| 哦,那么快,这个家伙给我发了四十个。 | 0.133 | 0.022 | 0.957 |
| 常哥没关系,就算了吧,你就帮乐言打个架。 | 0.118 | 0.019 | 0.922 |
| 这是什么呢?是一场已经发生了改变的年代。 | 0.000 | 0.000 | 0.930 |
| 这个里面是周楠和一个叫东东的女孩的照片。 | 0.000 | 0.000 | 0.934 |
| 送你八字真年得之我幸失之我命。 | 0.214 | 0.045 | 0.820 |
| 我跟他说,让他假装你女朋友。 | 0.000 | 0.000 | 0.898 |
| 在导读当中我提到,它是我们十一章当中最重要的一个章节。 | 0.040 | 0.006 | 0.953 |
| 好好好,行,行,行,那上去吧,走吧。 | 0.083 | 0.056 | 0.883 |
| Chinese translation service?你需要帮助吗? | 0.444 | 0.293 | 0.938 |
| 董董事长最近常跟一个舞舞女接触。 | 0.467 | 0.134 | 0.691 |
| 在下深夜前来,想必大人知道所谓何事。 | 0.062 | 0.010 | 0.973 |
| 所以大家一定把七个声调念准确。 | 0.071 | 0.011 | 0.918 |
| It's not magic maomo. It's physics. Come with me. I'll show you. | 0.067 | 0.017 | 0.918 |
| 不是,我说实话,宣老师,哎,你这个科法,你后面的词,别慌。 | 0.182 | 0.118 | 0.844 |
| 流程图作业图,生产过程图解。 | 0.000 | 0.000 | 0.949 |
| 就更不能让佳阳为了逃避去选择文晓华。 | 0.118 | 0.019 | 0.938 |
| 今天中午食堂有什么好吃的吗? | 0.000 | 0.000 | 0.898 |
| 由于后面足语比较长,所以句子构成了道庄。 | 0.111 | 0.043 | 0.941 |
| 送入两瓶好酒,我没收啊。 | 0.300 | 0.246 | 0.848 |
| 哎,言归正传啊,我今天来聊聊我的大学生活哦,我当时是在美国上的。 | 0.107 | 0.075 | 0.922 |
| 为了不让你再犯错呢,今天又是星期天,我放你一天,你让我清清静。 | 0.111 | 0.088 | 0.906 |
| 经过一些处理,经过一些加工,就是 | 0.000 | 0.000 | 0.961 |
| 这是白水禅院的法源送来的,想请大人通融放了他。 | 0.095 | 0.015 | 0.703 |
| 给少瑜送出的信发出几天了,已经四天了,可是什么消息都没有。 | 0.077 | 0.049 | 0.918 |