I don't know much about Swedish, but I am learning Korean, and Google Translate is dangerous in a much more subtle way with Korean. In particular, in Korean you conjugate verbs (and often choose different nouns) based on the relative age and social standing of the speaker and listener. Korean-specific translation tools (e.g. Naver) have a toggle to select whether to use "honorifics" or not, but Google tends to default to the form of speech (banmal) reserved for talking to young children or close friends. If I am using a translation tool, though, I probably don't know the person I am conversing with very well, so the translations tend to come off as very rude.
If I used Google Translate to talk to a shopkeeper, it would be roughly equivalent to saying "Hey, little buddy, how much for this?" as opposed to "Excuse me sir, what is the cost of this item?"
And this is all without considering the weird mistranslations you can get because Korean is much more heavily context-dependent than English. Korean speakers often leave out the subject or object if it can be understood from context (context that the translation tools are likely missing). So Google Translate will insert pronouns (it, him, her...) to make the English flow better, even though they are not based on anything in the original Korean. If it guesses wrong, you can imagine the level of confusion that could ensue.
And then all the homonyms in Korean, combined with the heavy context dependence, make for some weird translations. I once tried checking my Korean homework with Google Translate, and before I knew it, I was drinking a car.
ChatGPT beats DeepL for me as well (especially GPT-4), although I like DeepL's UX better for now, since it quickly shows you related words, which helps me better understand the meaning.
Very impressed with GPT-4 translation though -- especially the ability to steer it between "transliteration", "keep the meaning", "keep the tone", "use local idioms where appropriate", "explain different possible meanings/intentions", etc.
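To make the "steering" concrete, here's roughly the kind of prompt I mean, as a minimal sketch using the openai Python client; the model name, the style instructions, and the sample sentence are placeholders of my own, not anything official:

    # Sketch: steering translation style via the system prompt.
    # Assumes the openai Python package and OPENAI_API_KEY in the environment;
    # the style wording and example sentence are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    STYLES = {
        "keep the meaning": "Translate as literally as the target grammar allows.",
        "keep the tone": "Preserve the register and emotional tone of the original.",
        "use local idioms": "Prefer natural idioms a native speaker would use.",
    }

    def translate(text: str, target_lang: str, style: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": f"You are a translator. Translate into {target_lang}. {STYLES[style]}"},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content

    print(translate("이거 얼마예요?", "English", "keep the tone"))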
I've tried comparing them (DeepL vs GPT-4) when I've had the time, and find them to be pretty equal.
But DeepL easily wins on speed. Five paragraphs take just a few seconds with DeepL and come out almost 100% correct, while GPT-4 takes almost a minute (sometimes more) to be about as correct.
> Very impressed with GPT-4 translation though -- especially the ability to steer it between "transliteration", "keep the meaning", "keep the tone", "use local idioms where appropriate", "explain different possible meanings/intentions", etc.
I've found that DeepL already does this well even though it's not an LLM (as far as I know).
Yes. DeepL is very good! But with ChatGPT I can "tune" it more towards one style, whereas DeepL just does whatever it does. DeepL has very, very sensible defaults and the UI is great. But in fairness, DeepL basically will never just insert an appropriate idiom. GPT-3.5 is still worth comparing to DeepL as well.
I've also found DeepL to be consistently better or equal, where it applies, for my personal usage. Some users will care that DeepL doesn't support as many languages, doesn't have TTS, and doesn't offer transliterations (such as pinyin for Chinese).
Seems to me like it's ignoring context and indirectly applying literal synonyms in the wrong direction. That is, suppose screwing->fucking (sorry, it's necessary to make the point) but not the other way around except in one context, and screwing can be a synonym for tightening something in another context; then with bracing->tightening and tightening->screwing, it could have "walked backwards" through something like brace->tighten->screw->fuck, where the last synonym is not valid in the context where the first one is.
Sorry about language and the poor description; I seem to have an idea of the problem, but it's not my field and have no way of describing it in formal way.
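If it helps, here's a toy way to picture what I mean: treat synonyms as context-tagged, one-way links, and the bad output falls out as soon as you chain them while ignoring the context. This is purely an illustration of the idea, not a claim about how the translation model actually works:

    # Toy illustration only: context-tagged, directed "synonym" links.
    # Chaining them while dropping the context yields a word that is wrong
    # in the context where the chain started.
    SYNONYMS = {
        # (word, context) -> (synonym, context in which that link is valid)
        ("brace", "fasten"): ("tighten", "fasten"),
        ("tighten", "fasten"): ("screw", "fasten"),
        ("screw", "vulgar"): ("fuck", "vulgar"),  # valid only in the vulgar sense
    }

    def walk(word: str, context: str) -> list[str]:
        """Follow synonym links, ignoring whether the context still matches."""
        chain = [word]
        while True:
            # The bug being illustrated: match on the word alone, drop the context.
            step = next((v for (w, _), v in SYNONYMS.items() if w == word), None)
            if step is None:
                return chain
            word, context = step
            chain.append(word)

    print(walk("brace", "fasten"))  # ['brace', 'tighten', 'screw', 'fuck'] -- oops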
There's a difference between Björn (the name) and björn (the animal).
Capitalization gives additional context in this case; if it were at the beginning of the sentence, though, then one would hope the sentence contains other clues as well.
I do not know your language, but at least in Polish the translation makes some sense.
They have the word "czarny" which means "black". "czarny kot" is "black cat", "on jest czarny" is "he is black". The latter is purely descriptive.
There is also the word "czarnuch", which is the VERY offensive word for Blacks, best translated as "negro".
Now, the last name "Czarnuch" is a normal last name, without any connotations to the color black (except probably in its etymology) and does not sound weird/offensive.
The translation of this capitalized word would naturally yield "Negro".
Crnjak (tsr̩̂ɲak) has a few usages in Croatian; one of them is "black" wine, another is a black joke.
There might be a reference somewhere where it is used as a description of a person, but probably not in Croatian.
I did notice Google Translate hallucinates words, some very amusing, when I translate from Any -> Croatian (this happens automatically when reading Google Maps place reviews). There have been quite a lot of words that map naturally to Croatian but don't appear in any text (outside of blogspam) on the Internet.
Google. I feel like they were great for a while but they fell off at some point and just started copying everyone and failing at doing anything better. So yeah, I think the answer to "What's wrong at Google?" is "Google"
"Foda-se Björn" in Portuguese, which is, interestingly, a bit different in meaning. For the lack of a better description, it's closer to disappointment than anger. "Well, fuck me, Bjorn " rather than "Fuck you, Bjorn" (which would've been "vai-te foder, Björn"), that sort of thing.
Tangentially, I'm still sad about the loss of the Easter egg one used to obtain when attempting to translate "Wenn ist das Nunstück git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput!" into English.
This reminds me of the recent so-called "Glitch Token" phenomenon[1]. When GPT-3 was presented with reserved tokens it never encountered during training, it reacted in extremely unpredictable ways -- often with a simple "fuck you".
For those unfamiliar with LLM architecture: "tokens" are the smallest unit of lexical information available to the model. Common words often have their own token (e.g.: Every word in the phrase "The quick brown fox jumped over the lazy dog" has a dedicated token), but this is a coincidence of compression and not how the model understands language (e.g.: GPT-3 understands "defenestration" even though it's composed of 4 apparently unrelated tokens: "def", "en", "est", "ration").
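If you want to see the splits yourself, the tiktoken library exposes the GPT-style BPE vocabularies; here's a quick sketch (the exact pieces can differ between encodings and versions, so the splits are illustrative rather than guaranteed):

    # Quick look at how a BPE tokenizer splits text into tokens.
    # Requires the tiktoken package; "r50k_base" is the GPT-3-era encoding.
    import tiktoken

    enc = tiktoken.get_encoding("r50k_base")

    for text in ["The quick brown fox jumped over the lazy dog", "defenestration"]:
        token_ids = enc.encode(text)
        pieces = [enc.decode([t]) for t in token_ids]
        # Common words tend to come out as single tokens (often with a leading
        # space), while rarer words are split into several sub-word pieces.
        print(f"{text!r} -> {pieces}")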
The actual mechanism of understanding is in learned associations between tokens. In other words: the model understands the meaning of "def", "en", "est", "ration" because it learns through training that this cluster of tokens has something to do with the concept of violently removing a human via a window. When a model encounters unexpected arrangements of tokens ("en", "ration", "est", "def"), it behaves much like a human might: it infers the meaning through context or otherwise voices confusion (e.g.: "I'm sorry, what's 'enrationestdef'?"). This is distinctly different from what the model does when it encounters a completely alien form of stimulation like the aforementioned "Glitch Tokens".
At the risk of anthropomorphizing, try imagining if you were having a conversation with a fellow human and they uttered the following sentence: "Hey, did you catch the [MODEM NOISES]?". You've probably never before heard a human vocalize a 2400Hz tone during casual conversation -- much like GPT-3 has never before encountered the token "SolidGoldMagikarp". Not only is the stimulus unintelligible, it exists completely beyond the perceived realm of possible stimulus.
This is pretty analogous to what we'd call "undefined behavior" in more traditional programming. The model still has a strong preference for producing a convincingly human response, yet it doesn't have any pathways set up for categorizing the stimulus, so the model kind of just regurgitates a learned lowest-common-denominator response (insults are common).
This oddly aggressive stock response is interesting, because it's actually the exact same type of behavior that was coded into one of the first chatbots to (tenuously) pass a Turing test. I'm of course referring to the "MGonz" chatbot created in 1989[2]. The MGonz chatbot never truly engaged in conversation -- rather, it continuously piled on invective after invective whilst criticizing the human's intelligence and sex life. People seem predisposed to interpreting aggression as human, even when the underlying language is, at best, barely coherent.
It was actually on-topic. I guess not many people got the joke, or maybe some found it offensive (was it?) but it was actually relevant to the topic, if only a bit tangentially.
Google incorrectly translates "get well soon" as "eff you", but "effing" can actually heal some health problems, so it may as well mean "get well soon".