The same happens with whisper-large-v3 on Chinese transcription: silence is transcribed as something like "please upvote, share and favourite this video". I suspect they trained the model on random YouTube videos without carefully picking really useful data.
Whisper is unusable IMO because of the hallucinations. Widely documented. Removing silence from audio clips helps, but even then it will auto-correct grammar, translate bilingual speech, etc. Improved in the latest audio models but not solved [1]
In Russian it often hallucinates "Субтитры сделал DimaTorzok" ("Subtitles by DimaTorzok") at the end of things. Interestingly, I wasn't able to find any YouTube videos with that name in the subtitles, so it's not like it's in a lot of training data.
I wonder if hallucinated copyright claims (esp. like the ZDF one at the bottom of the OP) will be introduced as evidence in one of the court cases against "big AI"
Well, I fail to see how the LLM is in the wrong here. Surely if a sufficiently large part of the training data comes from a single source, it is correct to credit them for the output.
Interesting! I used Whisper last year to attempt to build an audio transcription tool but gave up due to the excessive amount of hallucinated output, no matter which model I used.
It would produce seemingly ok output until you started paying attention.
One example: it insisted that Biggie Smalls sings "Puttin five carrots in my baby girl ear" (it's "carats").
It's apparently not useful in transcription as it don't reason [sic].
This happens in Turkish too. I believe the reason is that movie subtitles were used for training without cleaning up the comments and intros subtitle authors leave in them.
Leaving personal comments, jokes, reactions, and intros in subtitles is very common in eastern cultures.
Turkish readers will probably remember "esekadam iyi seyirler diler" ("esekadam wishes you a pleasant viewing") :)
I've found that if the first 30 seconds of a recorded phone call is ringing and/or DTMF (almost always happens if you call a business), the system will select either Nynorsk or Welsh as the language. Never bothered to check what the text translated to, but it's probably something similar. Not a practical issue for me, but I can see it being a pain for any bilingual business or call center.
Looks like it's some random user who has generated some lyrics translations between Arabic and English. It's strange, they don't seem to have many contributions. I would have imagined them to be more prolific.
Interesting. This is similar to the Google Translate bug where it would translate lorem ipsum as bits of political text (because it found most of its lorem ipsum examples by flipping between languages on sites where one language was a news story, but the not-yet-translated languages served a lorem-ipsum page instead of a 404 when you toggled over to them).
Just to add some trivia: ChatGPT interprets (or interpreted) silence as "Sottotitoli e Revisione a cura di QTSS" ("Subtitles and revision by QTSS"). Now many videos (mainly Dailymotion) with autogenerated subtitles have their transcripts full of the same message.
Since it says "Translated by Nancy Qanqar", I'd be willing to bet they're training on some audiobooks with a transcript, and somewhere in there it consistently has "Translated by Nancy Qanqar" in the transcript where there is dead air in the audiobook.
It's a common problem with many languages. If you speak gibberish fake Chinese at ChatGPT and ask it to translate, it'll happily say you're saying coherent things.
Yeah, the subtitle "credits" occur very frequently. I found that with whisper-2, they're also triggered by music.
I suppose the cause is the same: subtitle creators generally add all kinds of stuff during the credits that is NOT a transcript.
Seems to me it could have been filtered out relatively easily during training, by clipping the first and last few minutes of all the audio. But I guess that's just in hindsight.
Whisper also likes to transcribe cut off speech or unintelligible noise as "Thank you". I have no idea where that is coming from, but I guess it's a very polite model...
Garbage in, garbage out. If the training dataset (accidentally) paired silence (`X_train`) with `ترجمة نانسي قنقر` tokens (`y_train`), then any silence will always be translated to that. Fortunately, this particular problem is easy to fix: just detect and remove silent parts before the API call. This also has the side benefit of saving you money on transcription.
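A minimal sketch of that pre-processing step, assuming pydub is installed; the -40 dBFS threshold and one-second minimum silence length are arbitrary illustrative values, not tuned numbers:

```
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

def strip_silence(in_path, out_path, min_silence_len=1000, silence_thresh=-40):
    """Keep only the audible spans of a file before sending it for transcription."""
    audio = AudioSegment.from_file(in_path)
    # detect_nonsilent returns [start_ms, end_ms] pairs of non-silent audio
    spans = detect_nonsilent(audio, min_silence_len=min_silence_len,
                             silence_thresh=silence_thresh)
    trimmed = AudioSegment.empty()
    for start_ms, end_ms in spans:
        trimmed += audio[start_ms:end_ms]
    trimmed.export(out_path, format="wav")
    return out_path
```

Less audio in means fewer billable seconds and no silent stretches for the model to hallucinate over.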
When using ChatGPT audio transcription, it sometimes adds "Subtitles created by ..." and then some username to the end. Obviously an artefact of training on a subtitles dataset.
Interesting that this happens even on large-v3. I had once done a deep dive into STT, and Whisper Large was the only model that could correctly transcribe "Yann LeCun"
(it was a Lex Fridman podcast). Ever since, I've held the belief that it was the best STT model; this was over 2 years ago.
Using Whisper to sub Japanese vtuber concerts for my enjoyment, I've noticed a similar trend. Not one specific phrase, but several. Some are strange ("I'm going to make a hole in the back of the head"), some are clearly from lyrics websites.
I get the same with Welsh, when having some network issues in voice chat it hallucinated me saying "Diolch yn fawr am wylio'r fideo." which translates as "Thank you very much for watching the video."
The fork that I've been using, WhisperX, seems to do better. I've used it on clean splits of mic tracks (i.e. total silence when the other speaker is talking) with far fewer hallucinations.
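For reference, a minimal WhisperX sketch along the lines of its README; the model name, compute type, and batch size are arbitrary choices here, and it's the built-in VAD pre-segmentation that seems to keep it from transcribing the silent stretches:

```
import whisperx

device = "cuda"  # or "cpu"
audio_file = "mic_track.wav"

# WhisperX segments the audio with VAD first, then transcribes only the speech
model = whisperx.load_model("large-v3", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])
```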
This is a nice reminder that there is no real reasoning in the "AI"; it is still just guessing the next word. Training it on subtitle files is, I guess, actually a clever idea, as they convey real conversations without pirating anything; subtitles are, after all, freely distributed by dedicated translators.
Good to see they're the ones getting credit though!
Hey guys, AI by 2027 is going to be superhuman AGI Agentic mega-intelligence, you better fire all your employees and get ready for AI to take your job and embrace your spouse at a Coldplay concert.
Big data. Machine learning. Blockchain. Artificial intelligence. Digital manufacturing. Big data analysis. Quantum communication and…Internet of things.
This time the hype cycle won’t be a massive exaggerated disappointment, for real this time.
Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic
It's the LLM equivalent of thinking that an out-of-office reply is the translation: https://www.theguardian.com/theguardian/2008/nov/01/5
The Arabic text "ترجمة نانسي قنقر" translates to English as "Nancy Qanqar's translation" or "Translation by Nancy Qanqar".
"ترجمة" means "translation" and "نانسي قنقر" is the name "Nancy Qanqar".
1. https://news.ycombinator.com/item?id=43427376
Way to go Nancy! Keep up the good work, ya crazy bastard!
"[ sub by sk cn2 ]"
or
"Anyways, thanks for watching! Please subscribe and like! Thanks for watching! Bye!"
or
"This is the end of the video. Thank you for watching. If you enjoyed this video, please subscribe to the channel. Thank you."
violets are blue
unregistered hypercam 2
But honestly, this is the AI equivalent of “please send for translating” in Welsh on a Welsh street sign.
https://www.theguardian.com/theguardian/2008/nov/01/5
Well now I know how I’m going to start filling awkward silences in meetings.
https://lyricstranslate.com/en/translator/nancy-qunqar
e.g. https://www.dailymotion.com/video/x9g9d6u
- they indeed seem to have trained on movies/subtitles
- you absolutely, positively must use Voice Activity Detection (VAD) in front of Whisper (a rough sketch of this follows below)
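To illustrate the second point, a rough sketch assuming the Silero VAD model from torch.hub and the openai-whisper package (file names and the 16 kHz rate are placeholders):

```
import torch
import whisper

# Load Silero VAD and its helper functions
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, _, collect_chunks = utils

# Keep only the detected speech, then hand the cleaned audio to Whisper
wav = read_audio("call_recording.wav", sampling_rate=16000)
speech_ts = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
save_audio("speech_only.wav", collect_chunks(speech_ts, wav), sampling_rate=16000)

asr = whisper.load_model("large-v3")
print(asr.transcribe("speech_only.wav")["text"])
```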
```
# Strip the hallucinated subtitle credit from the transcript
text = "helo helo hello . ترجمة نانسي قنقر"
target_phrase = "ترجمة نانسي قنقر"
updated_text = text.replace(target_phrase, "").strip()
print(updated_text)  # -> "helo helo hello ."
```
I suspect, as others mentioned, these were extracted from torrented movies.