#Translation Performance Gap


tepid palm
#

It is generally acknowledged that NMT models' translation performance in the English-->XX direction is worse than in the XX-->English direction.

I am wondering if there is a reason for this because I have found none that are more than simple speculation. Any help would be amazing!

shadow geyser
#

Interesting. I haven't heard of that.

#

It really depends on the other language. If you go from English to French you'll be inventing a lot of stuff (e.g., "you" in English is both singular and plural), while going from French to English you just chop things out.

#

I wonder if it is an artifact of the BLEU score and other n-gram overlap metrics rather than a real thing.

#

(I'm thinking that BLEU has a brevity penalty and English is very concise, so that could affect the scores.)
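
For anyone who wants to check the length effect, here is a minimal sketch of BLEU's brevity penalty as it is usually defined: 1 when the candidate is longer than the reference, otherwise exp(1 - r/c), where c is the candidate length and r the reference length. The function name and the example lengths are made up for illustration, and this is only the length term, not a full BLEU implementation.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Length term of BLEU (hypothetical helper, not a library API).

    candidate_len: number of tokens in the system output (c).
    reference_len: number of tokens in the reference (r).
    """
    if candidate_len <= 0:
        # An empty output gets no credit.
        return 0.0
    if candidate_len > reference_len:
        # Outputs longer than the reference are not penalised by this term.
        return 1.0
    # Outputs at or below the reference length are scaled by exp(1 - r/c),
    # which equals 1 when the lengths match and shrinks as c gets smaller.
    return math.exp(1.0 - reference_len / candidate_len)


# Illustrative lengths only: an output 20% shorter than its reference.
print(brevity_penalty(8, 10))
```

Note that the penalty only applies when the output is shorter than the reference, so whether it actually favours one direction would depend on how output lengths compare with reference lengths in each language pair.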

silver cargo
# tepid palm It is generally acknowledged that NMT models' translation performance for Englis...

I’m not sure whether what you’re referring to is inherent or merely observed; I would think it’s an issue with data. There are lots of English -> French and English -> Spanish translations but perhaps fewer Spanish -> English translations, so a multilingual NMT system might learn a better representation of English simply because there’s a larger available corpus?

Another issue that might be worth exploring is the data distribution the tokenizer was built on, where English is almost certainly overrepresented compared to other languages. https://aclanthology.org/2022.amta-research.8.pdf

If I had to theorize about an inherent cause, there might also be an issue with information density. Spoken language is often around 39 bits per second regardless of language. https://www.sciencedaily.com/releases/2019/09/190905124520.htm However, English is a relatively information-dense language compared to Spanish and French (I think this is the correct paper: https://muse.jhu.edu/article/449938/pdf), so it might also be harder to translate more information-dense languages into information-sparse languages? Especially when languages like Spanish have pronouns, articles and vocal stresses that English does not?

Most of this is speculative. I’m tired and probably rambling but I hope this is helpful gibberish.
