Translation Performance Gap | Learn AI Together

tepid palm Sep 19, 2023, 3:43 AM

#

It is generally acknowledged that NMT models' translation performance in the English-->XX direction is worse than in the XX-->English direction.

I am wondering if there is a reason for this because I have found none that are more than simple speculation. Any help would be amazing!

shadow geyser Oct 18, 2023, 10:40 AM

#

Interesting. I haven't heard of that.

#

It really depends on the other language. If you go from English to French you'll be inventing a lot of stuff (e.g., "you" in English is both singular and plural), while going from French to English you just chop things out.

#

I wonder if it is an artifact of the BLEU score and other n-gram overlap metrics rather than a real thing.

#

(I'm thinking the BLEU metric has a length penalty (the brevity penalty) and English is very concise, so that would impact the scores.)
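For anyone who wants to poke at the length-penalty idea, here is a minimal sketch of BLEU's brevity penalty as it is usually defined: 1 when the candidate is longer than the reference, otherwise exp(1 - r/c). The function name and the example lengths are made up for illustration, and this covers only the penalty term, not the full BLEU score.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """BLEU brevity penalty for a candidate of length c against a reference of length r.

    Hypothetical helper for illustration; only the penalty term, not full BLEU.
    """
    if candidate_len <= 0:
        # An empty candidate gets no credit at all.
        return 0.0
    if candidate_len > reference_len:
        # Candidates longer than the reference are not penalized by this term.
        return 1.0
    # Candidates no longer than the reference are scaled by exp(1 - r/c),
    # which equals 1.0 when the two lengths match.
    return math.exp(1.0 - reference_len / candidate_len)


if __name__ == "__main__":
    # Illustrative lengths only: a candidate 20% shorter than the reference.
    print(brevity_penalty(8, 10))
    print(brevity_penalty(12, 10))
```

Note that the penalty only fires when the system output is shorter than the reference, so whether it could explain a direction gap depends on how output lengths compare to reference lengths in each direction, not on how concise the language is by itself.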

silver cargo Oct 29, 2023, 8:32 AM

#

tepid palm It is generally acknowledged that NMT models' translation performance for Englis...

I’m not sure whether what you’re referring to is inherent or merely observed; I would think it’s an issue with data. There are lots of English -> French and English -> Spanish translations but perhaps fewer Spanish -> English translations, so a multilingual NMT system might learn a better representation of English simply because there’s a larger available corpus?

Another issue that might be worth exploring is the data distribution the tokenizer was trained on, which almost certainly over-represents English compared to other languages. https://aclanthology.org/2022.amta-research.8.pdf
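A rough way to see what tokenizer over-representation would look like is to compare subword pieces per word ("fertility") across languages. The sketch below is a toy: the vocabulary is invented and deliberately skewed toward English, and the splitter is a simple greedy longest-match stand-in, not BPE or any real tokenizer.

```python
def greedy_subword_split(word, vocab):
    """Split a word left to right into the longest pieces found in vocab.

    Falls back to a single character when nothing longer matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining span first, shrinking toward one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def fertility(sentence, vocab):
    """Average number of subword pieces per whitespace-separated word."""
    words = sentence.lower().split()
    n_pieces = sum(len(greedy_subword_split(w, vocab)) for w in words)
    return n_pieces / len(words)


# Hypothetical vocabulary: whole English words, only fragments for Spanish.
TOY_VOCAB = {"the", "cat", "sleep", "s", "el", "ga", "to", "du", "er", "me"}

if __name__ == "__main__":
    print(fertility("the cat sleeps", TOY_VOCAB))   # English sentence
    print(fertility("el gato duerme", TOY_VOCAB))   # Spanish sentence
```

With this made-up vocabulary the Spanish sentence breaks into more pieces per word than the English one, which is the kind of imbalance the linked paper is concerned with; real numbers would need a real tokenizer and real text.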

If I had to theorize on an inherent cause, there might also be an issue with information density: spoken language is often around 39 bits per second regardless of language. https://www.sciencedaily.com/releases/2019/09/190905124520.htm However, English is a relatively information-dense language compared to Spanish and French (I think this is the correct paper https://muse.jhu.edu/article/449938/pdf), so it might also be harder to translate more information-dense languages into information-sparse languages? Especially when languages like Spanish have pronouns, articles and vocal stresses that English does not?

Most of this is speculative. I’m tired and probably rambling but I hope this is helpful gibberish.

shadow geyser Oct 30, 2023, 6:08 AM

#

silver cargo I’m not sure what you’re referring to is inherent or merely observed, I would th...

English -> French is just French -> English. Translation is bidirectional, as it is meaning-preserving.

silver cargo Oct 30, 2023, 8:57 PM

#

shadow geyser English -> French is just French -> English. Translation is bidirectional as it ...

Could you elaborate? I am not well versed in machine translation, but that doesn’t make sense. Some information is lost through translation, such as with Spanish, where you can infer the subject (us, we, y’all, I, you, they) from verb conjugations, whereas a translation into English would omit this detail.

silver cargo Oct 30, 2023, 9:00 PM

#

shadow geyser English -> French is just French -> English. Translation is bidirectional as it ...

Plus, if this were true, wouldn’t that discount your original contemplations and the original commenter’s assertion?

shadow geyser Oct 31, 2023, 6:11 AM

#

silver cargo Plus if this was true wouldn’t that discount your original contemplations and th...

Nope. I was talking about the complexity of learning a model to go from a language that does not express certain information (say, relative age difference between speakers) to one that does. You're talking about training data.
