Translation bot | French - Learn French in a friendly community!

gloomy laurel Feb 15, 2023, 3:42 PM

What would you use as a translation service?

tawny light Feb 15, 2023, 3:43 PM

Two options: DeepL, or a Mac Mini running Meta’s NLLB model.

Well, there are plenty more options, but alas.

subtle gorge Feb 15, 2023, 3:44 PM

deepl api isn't free iirc

tawny light Feb 15, 2023, 3:45 PM

That’s correct; I currently run a small free service on it. The cheaper option would be a self-hosted model, since it isn’t subject to third-party fees. You’re just paying for electricity.

blazing knoll Feb 16, 2023, 2:57 AM

it might even be possible to host the model on the same instance that hosts Nostra

tawny light Feb 16, 2023, 10:16 PM

Depends how powerful it is. It would be best to have something with a decent GPU so that it can translate relatively quickly.

I'm coming up with a POC right now.

blazing knoll Feb 16, 2023, 11:31 PM

idk if things have changed with modern models, but iirc, unless you wanna run big batches of translations, it's totally fine to run inference on CPUs

gloomy laurel Feb 16, 2023, 11:53 PM

I would be okay with that as long as I host the code on my VPS, for security reasons

tawny light Feb 17, 2023, 12:06 AM

blazing knoll idk if things have changed with modern models, but iirc unless you wanna run big...

I'll check. Running on Mac, the difference between CPU & GPU was pretty intense, and translations were quite slow on CPU. I'm going to try a Linux machine with a high-end GPU and CPU and benchmark there to confirm that it's not just another fun M1 quirk.

blazing knoll Feb 17, 2023, 12:22 AM

tawny light I'll check, running on Mac the difference between CPU & GPU was pretty intense, ...

ah shit, could you tell me which model you are using?

might be able to help

tawny light Feb 17, 2023, 3:54 PM

@blazing knoll

CPU vs. GPU, respectively.

{
    "translated_text": "Hello, what has prevented us so far from adding this kind of functionality is not the lack of bot but rather because we would not want this functionality. It would clutter the showrooms and introduce English to the \"French only\" showrooms. It is not a very big effort to copy and paste in an automatic translator",
    "translation_time": 11.155297994613647
}

{
    "translated_text": "Hello, what has prevented us so far from adding this kind of functionality is not the lack of bot but rather because we would not want this functionality. It would clutter the showrooms and introduce English to the \"French only\" showrooms. It is not a very big effort to copy and paste in an automatic translator",
    "translation_time": 1.233346939086914
}

This uses facebook/nllb-200-distilled-1.3B, which in my testing gave the best performance and the most accurate translations.

The CPU is an Intel Core i7-10700K; the GPU is an RTX 3070.
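
Taking the two `translation_time` values above (CPU first, then GPU; presumably seconds, though the unit cancels out), the GPU speedup for this single message works out to roughly 9×. It is a back-of-the-envelope figure from one sample, not a proper benchmark:

```latex
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{GPU}}} = \frac{11.155}{1.233} \approx 9.0
```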

blazing knoll Feb 17, 2023, 5:05 PM

damn looks like model sizes have increased dramatically since I last messed with NLP models

tawny light Feb 17, 2023, 5:09 PM

This one is certainly quite meaty.

blazing knoll Feb 17, 2023, 5:09 PM

ok welp, i don't think hosting the model is a practical solution

i just realised that deepl's API is pretty cheap; hosting a model will most likely end up costing more

tawny light Feb 17, 2023, 5:17 PM

I'll sponsor the first $100 of translations so we can test the water, if we want.

Honestly that'll probably last a year. 😆

Probably more

tawny light Feb 17, 2023, 6:14 PM

Here is a POC built with the NLLB model, but it could easily be adapted not to use it.

blazing knoll Feb 17, 2023, 7:14 PM

just for reference, DeepL Pro costs CA$7.49 per month 👀

tawny light Feb 17, 2023, 7:17 PM

The API has a per-character cost attached to it.

So it's the flat rate plus the per-character cost.
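
Put as a formula, monthly cost ≈ flat rate + (characters translated × price per character). A minimal TypeScript sketch of that estimate follows. Only the CA$7.49 flat rate comes from this thread; `pricePerMillionChars` is a hypothetical input, since the actual per-character price is not stated here:

```typescript
// Rough monthly cost for a translation API billed as a flat rate plus a
// per-character fee. `pricePerMillionChars` is a hypothetical input; check
// the provider's current pricing before relying on any number.
function estimateMonthlyCost(
  charactersTranslated: number,
  flatRate: number,
  pricePerMillionChars: number,
): number {
  if (charactersTranslated < 0) {
    throw new RangeError("charactersTranslated must be non-negative");
  }
  const usageCost = (charactersTranslated / 1_000_000) * pricePerMillionChars;
  return flatRate + usageCost;
}

// Example: 200,000 characters at a made-up price of 20 per million
// characters, on top of the CA$7.49 flat rate.
const example = estimateMonthlyCost(200_000, 7.49, 20);
console.log(example.toFixed(2)); // 11.49
```

Anything that cuts the number of characters sent (caching repeated requests, skipping messages that aren't French) lowers only the usage term; the flat rate stays.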

blazing knoll Feb 17, 2023, 7:25 PM

ah shit... I didn't notice that

tawny light Feb 18, 2023, 1:39 PM

@blazing knoll @gloomy laurel Voilà, a POC:
https://github.com/ridafkih/francois

It supports both DeepL and an in-house translation model.

If the DEEPL_API_KEY environment variable is set, it will use DeepL.

Otherwise, it will try to use the Python service hosted at whatever URI you specify.
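
The backend choice described above comes down to one environment check at startup. A minimal TypeScript sketch, assuming a hypothetical `TRANSLATION_SERVICE_URI` variable for the self-hosted Python service (the thread only says the URI is configurable, not what the variable is called); `DEEPL_API_KEY` is the variable named above:

```typescript
// Which translation backend to use, decided once at startup.
type Backend =
  | { kind: "deepl"; apiKey: string }
  | { kind: "self-hosted"; uri: string };

type Env = Record<string, string | undefined>;

// DEEPL_API_KEY wins if it is set to a non-empty value (treating an empty
// string as unset is a choice made for this sketch); otherwise fall back to
// the self-hosted service. TRANSLATION_SERVICE_URI is a hypothetical name.
function selectBackend(env: Env): Backend {
  const apiKey = env["DEEPL_API_KEY"]?.trim();
  if (apiKey) {
    return { kind: "deepl", apiKey };
  }
  const uri = env["TRANSLATION_SERVICE_URI"]?.trim();
  if (!uri) {
    throw new Error("Set DEEPL_API_KEY or TRANSLATION_SERVICE_URI");
  }
  return { kind: "self-hosted", uri };
}
```

In a Node.js process you would call `selectBackend(process.env)`; taking the environment as a parameter keeps the decision easy to test.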

I also built a quick message store: if multiple users translate the same message within a 10-minute period, it only translates it once and serves that original translation. If users request it in quick succession, it translates once and delivers the result to all of them at the same time.

If the message is edited and the new content is translated, it will run the translation again, since caching is keyed on the MD5 hash of the message content.

https://github.com/ridafkih/francois/blob/main/src/utils/message-store.ts
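
The caching behaviour described above (one translation per unique message content within a 10-minute window, with simultaneous requests sharing a single in-flight result) can be sketched as follows. This is not the code in `message-store.ts`; it is a minimal illustration using Node.js's built-in `crypto` module, and the class and method names are hypothetical:

```typescript
import { createHash } from "node:crypto";

interface Entry {
  promise: Promise<string>;
  expiresAt: number;
}

// Caches results by the MD5 hash of the message content. Concurrent callers
// share the same pending promise, so the work runs once and everyone gets the
// result at the same time. An edited message hashes differently, so it is
// translated again.
class ExpiringTranslationCache {
  private readonly entries = new Map<string, Entry>();

  constructor(
    private readonly ttlMs: number = 10 * 60 * 1000, // 10 minutes
    private readonly now: () => number = Date.now,
  ) {}

  static hash(content: string): string {
    return createHash("md5").update(content).digest("hex");
  }

  getOrCreate(content: string, create: () => Promise<string>): Promise<string> {
    const key = ExpiringTranslationCache.hash(content);
    const time = this.now();
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > time) {
      return existing.promise;
    }

    const promise = create();
    this.entries.set(key, { promise, expiresAt: time + this.ttlMs });
    // Forget failed attempts so the next request can retry, but only if this
    // entry has not been replaced in the meantime.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) {
        this.entries.delete(key);
      }
    });
    return promise;
  }

  // Drop expired entries; call periodically (e.g. from setInterval) so
  // messages nobody asks about again do not accumulate.
  prune(): void {
    const time = this.now();
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= time) {
        this.entries.delete(key);
      }
    });
  }
}
```

The clock is injectable so the expiry can be tested without waiting ten minutes.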

The Python service also has a built-in language classifier, which is a much more lightweight model. It could be used to avoid wasting characters trying to translate non-French messages.
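
Gating on the classifier could look like the sketch below: only messages it confidently labels as French are sent on for (billed) translation. The `Detect` function, its `{ language, confidence }` return shape, the `"fr"` label and the 0.8 threshold are all hypothetical stand-ins for whatever the Python service actually returns. A real call to that service would be asynchronous, but the check itself stays the same:

```typescript
// Result shape assumed for the language classifier (hypothetical).
interface Detection {
  language: string; // e.g. "fr"
  confidence: number; // 0..1
}

type Detect = (text: string) => Detection;

// Returns true only when the message is confidently French, so characters
// are not spent on messages that are already English (or anything else).
function shouldTranslate(
  text: string,
  detect: Detect,
  minConfidence = 0.8,
): boolean {
  if (text.trim().length === 0) {
    return false; // nothing to translate
  }
  const { language, confidence } = detect(text);
  return language === "fr" && confidence >= minConfidence;
}
```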

blazing knoll Feb 18, 2023, 8:43 PM

hey great stuff 👀

maybe we could try running it on this server with a free API key

I don't mean to criticise the code or anything, but wouldn't it make more sense for the translation service to make use of the message store instead of the other way around?
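
The suggested direction, where the translation service depends on the store rather than the store calling the translator, might look like this. The interfaces and class names are hypothetical, and the store here is a bare in-memory map without the 10-minute expiry, just to keep the dependency direction visible:

```typescript
// The store only knows how to cache async results by key; it has no idea
// what a translation is.
interface ResultStore {
  getOrCreate(key: string, create: () => Promise<string>): Promise<string>;
}

// Anything that can translate text: DeepL, the self-hosted model, a test fake.
interface Translator {
  translate(text: string): Promise<string>;
}

// The translation service owns the policy: it decides what gets cached and
// delegates the actual work to the translator.
class TranslationService {
  constructor(
    private readonly translator: Translator,
    private readonly store: ResultStore,
  ) {}

  translate(content: string): Promise<string> {
    return this.store.getOrCreate(content, () =>
      this.translator.translate(content),
    );
  }
}

// Minimal store for illustration: no expiry, keyed by the raw content.
class InMemoryStore implements ResultStore {
  private readonly results = new Map<string, Promise<string>>();

  getOrCreate(key: string, create: () => Promise<string>): Promise<string> {
    const existing = this.results.get(key);
    if (existing) {
      return existing;
    }
    const created = create();
    this.results.set(key, created);
    return created;
  }
}
```

With this shape, swapping DeepL for the self-hosted model, or the in-memory store for an expiring one, touches only the object passed to the constructor.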

tawny light Feb 19, 2023, 12:26 AM

blazing knoll I don't mean to criticise the code or anything, but wouldn't it make more sense ...

Yup. I was already knee-deep in it before realizing how it should be done. 😂

It kind of just ended up that way.

I can change it later.

I'll either restructure it entirely or just rename it to "translationStore" and rename the methods accordingly. 🤷‍♂️

tawny light Feb 19, 2023, 3:04 PM

blazing knoll I don't mean to criticise the code or anything, but wouldn't it make more sense ...

Andddd fixed.