#Translation Performance Gap
Interesting. I haven't heard of that.
It really depends on the other language. If you go from English to French you'll be inventing a lot of stuff (e.g., "you" in English is both singular and plural), while going from French to English you just chop things out.
I wonder if it is an artifact of the BLEU score and other n-gram overlap metrics rather than a real thing.
(I'm thinking the BLEU metric has a length penalty and English is very concise, that'd impact the scores.)
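For reference, the length term in BLEU is a brevity penalty: it only kicks in when the candidate translation is shorter than the reference, and it compares lengths within the target language. A minimal sketch of just that term, assuming lengths are counted in tokens (the function name and arguments here are illustrative, not taken from any particular library):

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """BLEU brevity penalty for a candidate of length c against a reference of length r.

    BP = 1            if c > r
    BP = exp(1 - r/c) if c <= r
    """
    if candidate_len == 0:
        # An empty candidate gets no credit; this also avoids dividing by zero.
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


if __name__ == "__main__":
    # Hypothetical lengths, chosen only to show the shape of the penalty.
    for c, r in [(12, 10), (10, 10), (8, 10), (5, 10)]:
        print(f"candidate={c:2d} reference={r:2d} BP={brevity_penalty(c, r):.3f}")
```

So a system that produces output longer than the reference is not penalized by this term at all (though the n-gram precision part of BLEU will still drop), while shorter output is scaled down exponentially.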
I’m not sure whether what you’re referring to is inherent or merely observed; I would think it’s an issue with data. There are lots of English -> French and English -> Spanish translations but perhaps fewer Spanish -> English translations, so a multilingual NMT system might learn a better representation of English simply because there’s a larger available corpus?
Another issue that might be better explored is the data distribution of the tokenizer, where there’s almost certainly an overrepresentation of English compared to other languages. https://aclanthology.org/2022.amta-research.8.pdf
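One rough way to see the tokenizer skew is to count subword tokens per word in each language. The sketch below is a toy, not a real tokenizer: the greedy longest-match segmentation, the `TOY_VOCAB` contents, and the example sentences are all made up for illustration, with the vocabulary deliberately stocked with English pieces only.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation; falls back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


def tokens_per_word(sentence: str, vocab: set[str]) -> float:
    """Average number of subword tokens per whitespace-separated word."""
    words = sentence.split()
    if not words:
        return 0.0
    return sum(len(tokenize(w, vocab)) for w in words) / len(words)


# Hypothetical vocabulary that only contains English pieces.
TOY_VOCAB = {"the", "cat", "sleep", "s"}

if __name__ == "__main__":
    for sentence in ["the cat sleeps", "le chat dort"]:
        print(sentence, "->", round(tokens_per_word(sentence, TOY_VOCAB), 2))
```

With this toy vocabulary the English sentence averages about 1.33 tokens per word and the French one about 3.33, so the model sees much longer, more fragmented sequences for the underrepresented language. Running the same count with a real multilingual tokenizer on parallel text would be the actual test.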
If I had to theorize on an inherent cause, there might also be an issue with information density. Spoken language is often around 39 bits per second regardless of language. https://www.sciencedaily.com/releases/2019/09/190905124520.htm However, English is a relatively information-dense language compared to Spanish and French (I think this is the correct paper: https://muse.jhu.edu/article/449938/pdf), so it might also be harder to translate more information-dense languages into information-sparse languages? Especially when languages like Spanish have pronouns, articles and vocal stresses that English does not?
Most of this is speculative. I’m tired and probably rambling but I hope this is helpful gibberish.
English -> French is just French -> English. Translation is bidirectional as it is meaning preserving.
Could you elaborate? I am not well versed in machine translation, but that doesn’t make sense to me. Some information is lost in translation, such as with Spanish, where you can infer who is being referred to (I, we, you, y’all, they) from the verb conjugation, whereas a translation into English would omit this detail.
Plus, if this were true, wouldn’t that discount your original contemplations and the original commenter’s assertion?
Nope. I was talking about the complexity of learning a model to go from a language that does not express certain information (say, relative age difference between speakers) to one that does. You're talking about training data.