#ilo Token - Rule based toki pona to english translator


frozen bear
#

toki a! ilo sin pi ante toki li lon. ona li ante e toki pona tawa toki Inli mute a. toki Inli mute li kon ante pi toki pona.

taso, ona li sona lili li ken pakala. ni la, mi wile pona e ilo ni.

Hello! I made a Toki Pona to English translator. This tool translates into multiple English outputs showing many possible grammatical and semantic interpretations of the text.

However, it can only handle phrases, not sentences, at the moment.

https://ilo-token.github.io/

narrow helm
#

oh, I see. by "multiple sentences" you mean multiple different translations to English stemming from the multiple English words that each toki pona word could mean.

frozen bear
#

yep
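A minimal sketch of the "multiple translations" idea described above, assuming a tiny hypothetical dictionary (`DICTIONARY` and `combinations` are illustrative names, not taken from the project): each toki pona word maps to several English candidates, and every combination becomes one output.

```typescript
// Hypothetical mini dictionary; the real translation list is much larger.
const DICTIONARY: Record<string, string[]> = {
  ilo: ["tool", "device"],
  pona: ["good", "simple"],
};

// Builds every combination of one English candidate per toki pona word.
function combinations(words: string[]): string[][] {
  return words.reduce<string[][]>(
    (partials, word) =>
      partials.flatMap((partial) =>
        (DICTIONARY[word] ?? []).map((english) => [...partial, english])
      ),
    [[]],
  );
}

// "ilo pona": the head comes first in toki pona, the modifier first in English.
const outputs = combinations(["ilo", "pona"]).map(
  ([noun, adjective]) => `${adjective} ${noun}`,
);
console.log(outputs);
```

The number of outputs is the product of the candidate counts per word, which is why longer inputs produce long lists quickly.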

narrow helm
#

that's cool! I'd thought about this idea in the past, and I think it may turn out more accurate more often than machine learning

frozen bear
#

ooh, thanks

narrow helm
#

basically by making the human figure out context lol

frozen bear
#

lol yes

frozen bear
#

hmm, I'm considering giving this project versioning and changelog so you can track the progress for this project

desert abyss
#

ah dang

frozen bear
#

yep, seme is currently unrecognized because that would make the punctuation a question mark, and I don't have a mechanism for that

desert abyss
#

just treat all punctuation the same?

frozen bear
#

what do you mean?

desert abyss
#

If the issue would be that it makes the punctuation a question mark, there's an easy solution: ignoring punctuation beyond sentence separation is entirely possible in toki pona

frozen bear
#

hmm

frozen bear
#

oh, this translator already ignores punctuation from the input, what I mean is that having "seme" would make the output translations have question marks, and I haven't implemented that

silver bay
#

@frozen bear Hi, developer of ilo Kukole here. How does this translator work? Are you parsing a sentence into parts and translating those individually then attaching them together again? Would you be interested in putting this into ilo Kukole?

silver bay
frozen bear
# silver bay <@428787756991381504> Hi, developer of ilo Kukole here. How does this translator...

hi, yes you correctly described what it does. the parser also handles some nuances: currently, it handles "a" whether it emphasizes the single word, the whole phrase, or the whole sentence, and then it treats those as possible interpretations that will be translated (this atm is bugged, I'll fix it later). as for putting it in ilo Kukole, I'm not sure how you'd do that but feel free, it's MIT licensed

silver bay
frozen bear
silver bay
#

I have already written sentence parsing in javascript if you’re interested

#

i can dig that up if it’s useful

frozen bear
#

oh nice

silver bay
#

the one problem is simply just that i can’t consistently parse prepositions

#

everything else is easily possible

frozen bear
#

I see

silver bay
# frozen bear I see

there’s always one solution… just… parse it both ways and give all the results… but this would get really long really fast

#

like is “tomo tawa mi” “my car” or “a building from my perspective”? you could just… do both…

frozen bear
#

yep, that's what I have in mind
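The "parse it both ways" approach can be sketched roughly as follows. This is an illustration only, under the assumption that a phrase is a flat list of words; the `Reading` type and `readings` function are hypothetical and not from either project.

```typescript
// Two ways to read a phrase: everything after the head is a modifier,
// or one of the words starts a prepositional phrase.
type Reading =
  | { kind: "modifiers"; head: string; modifiers: string[] }
  | {
    kind: "preposition";
    head: string;
    modifiers: string[];
    preposition: string;
    object: string[];
  };

const PREPOSITIONS = new Set(["tawa", "lon", "tan", "kepeken", "sama"]);

// Returns every reading instead of guessing which one was meant.
function readings(phrase: string[]): Reading[] {
  const [head, ...rest] = phrase;
  const all: Reading[] = [{ kind: "modifiers", head, modifiers: rest }];
  rest.forEach((word, index) => {
    // A preposition needs at least one word after it to act as its object.
    if (PREPOSITIONS.has(word) && index < rest.length - 1) {
      all.push({
        kind: "preposition",
        head,
        modifiers: rest.slice(0, index),
        preposition: word,
        object: rest.slice(index + 1),
      });
    }
  });
  return all;
}

console.log(readings(["tomo", "tawa", "mi"]));
```

For "tomo tawa mi" this gives two readings: one where "tawa" and "mi" both modify "tomo", and one where "tawa mi" is a prepositional phrase.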

#

btw, I'm going to separate parser and translation from frontend script so that I could publish it to npm

narrow helm
west vector
#

does it translate sentences or just words?

#

I tried "mi toki pona", translation stopped after mi.

#

then I tried "ona li toki", same

#

it seems it doesn't know about sentences with "li" ?

#

but keep it up, I'm looking forward to test it again in the future.

frozen bear
#

yep it can only translate phrases at the moment

frozen bear
#

this may take a lot of time to be complete

frozen bear
#

version 0.0.1 has been released. this is a small but substantial release where I made the word list and translation list a bit better by:

  • adding missed words (ali, pali, palisa, seme, and uta)
  • simplifying translation list such as deduplicating translation words that have mostly the same meaning. for example "other" and "different" are mostly similar, "different" has been removed and "other" has been kept
  • adding more translation words

more details here: https://github.com/neverRare/toki-pona-translator/blob/master/CHANGELOG.md#001

you may need to force restart the webpage to see the changes: (shift + click the restart button or ctrl + shift + R)

the webpage: https://neverrare.github.io/toki-pona-translator/

#

next stop: fixing bugs then implementing preverbs

frozen bear
#

along with preverbs, I think the extended numbering system would be easy to implement so I'll implement these soon

frozen bear
#

I think i'll be extra conservative about nanpa particle, it'll only accept "nanpa wan" and "nanpa tu" for now

#

oh, and actually, "nanpa tu wan", then "nanpa tu tu" then so on

#

for extended numbering system, lone "mute" and lone "ale" won't be recognized as specific number

#

since those are rarely recognized as specific number
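A rough sketch of the additive numbering rule discussed above, assuming the commonly used values wan = 1, tu = 2, luka = 5, mute = 20, ale/ali = 100. The names `parseNumber`, `NUMBER_VALUES`, and `VAGUE_WHEN_ALONE` are made up for illustration, and word order is deliberately not checked.

```typescript
const NUMBER_VALUES = new Map<string, number>([
  ["wan", 1],
  ["tu", 2],
  ["luka", 5],
  ["mute", 20],
  ["ale", 100],
  ["ali", 100],
]);

// Words that are not read as a specific number when they stand alone.
const VAGUE_WHEN_ALONE = new Set(["mute", "ale", "ali"]);

// Sums the number words; returns null when the words are not a specific number.
function parseNumber(words: string[]): number | null {
  if (words.length === 0) return null;
  if (words.length === 1 && VAGUE_WHEN_ALONE.has(words[0])) return null;
  let total = 0;
  for (const word of words) {
    const value = NUMBER_VALUES.get(word);
    if (value === undefined) return null;
    total += value;
  }
  return total;
}

console.log(parseNumber(["tu", "wan"]));
console.log(parseNumber(["mute"]));
```

With this rule "nanpa tu wan" would read as position three, while a lone "mute" falls through to its ordinary meaning.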

#

hmm, should I handle X ala X construction? maybe later

frozen bear
frozen bear
#

.... there's a lot of bugs as it turns out

narrow helm
frozen bear
#

hmm, I could parse it but idk how to translate it

rough atlas
#

pona pi ilo Kukole la ni li ken pona tan ni: ilo Kukole li pana e ante toki wan taso. taso, ilo ni li pana e ante toki mute

#

wile mi la ilo ni li kama pona 👍

lament solar
#

tenpo pini la mi lukin pali e ilo sama

#

taso mi awen ala li pini ala

#

sina pini la wawa

fickle stag
#

@frozen bear o, ilo li ken ala sona e toki sama ni: "o ... la ..."

o pana e kili la mi ken kipisi

somewhat uncommon structure but I've seen a number of folks use it and arrive at it independently, so it seems worth considering

frozen bear
fickle stag
#

When it can't parse with la there, attempt parsing both sides as their own sentences

frozen bear
#

I could parse it, but I'm not sure with translation
something like "please give food. then I can cut"? hmm, maybe I overestimated it lol. this is programming, it's hard to estimate difficulty

#

btw, ilo Kukole uses machine learning, while this translator uses more traditional programming, everything is hardcoded. I have a feeling that the ideal translator for toki pona is somewhere between machine learning and traditional programming

#

something like

toki pona sentence -> hardcoded translator with a bit of AI or Machine Learning for edge cases -> grammar checker through AI -> English sentences

though, this is a project for another time, I don't think I'll make this in the future

fickle stag
#

Yeah decision tree with some basic lexing should do the job for a glosser

#

Being explainable is a feature, one that ML models lack

frozen bear
#

TIL, a verb acting like a noun is called a gerund, this is something I always use for this translator, things like translating "kama li" into "arriving is". I'll use the term gerund for documents like the changelog

silver bay
#

ilo Kukole currently works like this

toki pona sentence -> machine learning to english -> parse parts of english sentence out -> "ungenderize" english output if the toki pona input doesn't include meli or mije -> grammar checker through AI -> english sentence
#

but i think the ideal translator would be

toki pona sentence -> parse parts of toki pona sentence out -> translate parts of sentence individually and in multiple ways -> put them back together in multiple different ways while trying to keep english grammar intact -> grammar checker through AI -> english sentence
rocky pulsar
#

concept that idk if it'll work: put all the translations through a language model to measure perplexity (inverse likelihood), then pick the one with the lowest

silver bay
#

i have done some work on parsing parts of speech out of a toki pona sentence

#

but i got stuck on prepositions

#

i have no good way to determine when something is a preposition and when it is not

silver bay
# silver bay

the input sentence here was "sina ken pana e sona mute tawa mi li ken moku e kili." (I needed a sentence with like... everything it could possibly want to parse out of it so i rambled lol)

#

but is "sona mute tawa mi" "my moving much knowledge" or "much knowledge to me"? a human understands that there is no universe where "my moving much knowledge" is correct, but how do you reliably determine which it is programmatically?

#

remove "mute" from the equation and it becomes much harder for a computer to determine if there's a preposition

rocky pulsar
#

gonna copy some random code from reddit

#

kinda eh i'll admit

#

now adjusted for length

#

i can't really tell how well it works tbh

#

sentences could help with determining that

#

for multiple sentences perhaps process just translations of the first one, then the best translation of those combined with all of the second, etc
to save on computation because exponentials are fun

#

or perhaps instead of best 1 maybe best n
but perhaps somehow filter out similar ones?
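The "best n" idea above is essentially a beam search. Here is a small sketch, assuming some scoring function where lower is better (for example a perplexity from a language model). The `Scorer` type and `beamSearch` function are hypothetical, and the usage below plugs in text length as a stand-in score only so that the example runs without a model.

```typescript
// Lower score means a more plausible text, e.g. a perplexity.
type Scorer = (text: string) => number;

// Extends the kept texts one sentence at a time, keeping only the best few.
function beamSearch(
  sentenceCandidates: string[][],
  score: Scorer,
  beamWidth: number,
): string[] {
  let beam: string[] = [""];
  for (const candidates of sentenceCandidates) {
    beam = beam
      .flatMap((prefix) =>
        candidates.map((candidate) =>
          prefix ? `${prefix} ${candidate}` : candidate
        )
      )
      .sort((a, b) => score(a) - score(b))
      .slice(0, beamWidth);
  }
  return beam;
}

// Stand-in scorer for demonstration only: shorter text counts as better.
const byLength: Scorer = (text) => text.length;

const best = beamSearch(
  [
    ["a cat.", "the feline."],
    ["it eats.", "it is consuming."],
  ],
  byLength,
  2,
);
console.log(best);
```

Each step scores at most beamWidth times the number of candidates for that sentence, instead of every combination of every sentence. Filtering out near-identical candidates, as suggested above, is not covered here.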

fickle stag
#

I know when I was in early learning, the harder parts of sentences were when they were longer and I had to try to juggle the grammar

#

li and pi and la and e and prepositions and prepreds

narrow helm
fickle stag
#

ambiguity la that's why it helps that this project aims to provide multiple outputs for a given input

frozen bear
#

Version 0.0.2 has been released

For this version, major bugs related to phrase translation have been fixed. The translation lists have been updated as well.

I'm mainly procrastinating from actually implementing whole sentence translator lol.

You may need to force restart the webpage: shift + click the restart button; or ctrl + shift + R.

More about the version: https://github.com/neverRare/toki-pona-translator/blob/master/CHANGELOG.md#002
Toki Pona Translator: https://neverrare.github.io/toki-pona-translator/

rocky pulsar
# rocky pulsar there used to be a website that compressed english text using gpt-2 but it doesn...

i wonder how well really tiny models would fare, since that's kinda necessary for client-side processing
what comes to mind is pythia-19m, which is 166mb by default but half of that is only used for training and half of the other half can be trimmed by going down to 16-bit (and another half if 8-bit)
there's also bert-tiny but i'm not sure how to get that to work, perhaps mask each word individually and get likelihoods from that?

frozen bear
frozen bear
#

so for noun phrase translations, there can be "(adjective) (noun)" and "(noun) of (noun)". for adjective phrase, there can be "(adverb) (adjective)". I figured there can also be "(adjective) in a (adjective) way". this is also applicable for verb phrase tho it's currently in todo list. I'll implement this later

frozen bear
# frozen bear

here we have "sweetly animal-like", I believe this is a valid translation of "soweli suwi" lol, for next update or later, there will be "animal-like in a sweet/cute/innocent way"

#

soweli lon nasin suwi wan /toki ike

narrow helm
#

suggestion: instead of two translations for single/plural, use tool(s) etc.

frozen bear
#

this is the second time this has been suggested, sure, I'll consider this

#

but for now, I want to make things done

frozen bear
narrow helm
#

yah it's not strictly necessary, it's just a way to cut the number of translations in half
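The "tool(s)" suggestion could look something like this sketch. `mergeNumber` is a hypothetical helper, and it assumes the translation list stores singular and plural forms as separate strings.

```typescript
// Collapses a singular/plural pair into one output where possible.
function mergeNumber(singular: string, plural: string): string {
  // Regular plurals only add a suffix, so the suffix goes in parentheses.
  if (plural !== singular && plural.startsWith(singular)) {
    return `${singular}(${plural.slice(singular.length)})`;
  }
  // Identical forms need no marking at all.
  if (plural === singular) return singular;
  // Irregular plurals are shown side by side.
  return `${singular}/${plural}`;
}

console.log(mergeNumber("tool", "tools"));
console.log(mergeNumber("person", "people"));
```

Irregular pairs such as "person" and "people" cannot share a stem, so this sketch falls back to showing both forms separated by a slash.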

frozen bear
#

whew, it's been a while

narrow helm
#

toki :3

frozen bear
#

I have so many projects. the development is going to be slow

fickle stag
#

pali li awen pona

frozen bear
#

okay, I think it's time to parse predicates, maybe I'll go simple for now. maybe not now, I still need to parse phrases with things like preverbs, numbers, and ordinals

frozen bear
#

it definitely needs improvement

#

maybe I should use pu definition

#

pu and ku that is

#

I changed my mind, I need to work on the parser and translator rather than bikeshed on the translation list

#

I'll make an assumption that a phrase can only have a single string of number words

#

I mean, a single string of extended numbering system words, so things like "jan tu wan pona tu wan" won't be translated as "3 good 3 person", whatever this means

#

maybe I should ban it

#

only for tu and wan that is, luka may still be translated as "hand", hmm, I should also ban multiple instance of "mute" and "ale/ali"

#

by multiple instance, I mean something like "jan mute pona mute"

#

hmm, I'll try to formalize this

#

luka luka may still be translated as hand of hand, is it a finger?

#

hmm, this is quite complicated lol

#

of course, normally toki ponners won't say something like "jan tu wan pona tu wan" but I still need to account for it

#

yeah, I definitely need to formalize this before implementing it

fickle stag
#

it's free, json format

frozen bear
#

boop

#

I may return working on this project

oblique idol
#

oh i thought this was something new
really cool idea! i can see something like this being very useful down the line

#

silly question, does it not know prepositions? I don't think i see that in the limitation list

frozen bear
#

for now, it can only translate phrases, I should be clear about this

frozen bear
#

Limitations

The following are currently unrecognized (non-definitive but pedantic).

  • Full sentences: It can only translate phrases for now. The following limitations pretend the translator can translate full sentences, because these are planned limitations.
  • ...

updated the limitation list. I should have added this way before lol

frozen bear
#

boop

#

I currently have semestral break

#

so I think I'll develop this further

#

I'm going to rewrite the parser so it's more declarative

#

there has been an update that's just sitting there while I was away so I'm going to release it

0.1.0

  • Add "(adjective) in (adjective) way" translation.
  • Handle complicated phrase as error.
  • Rearrange output to make adjective phrase appear first.
  • Add basic "la" translation: "given X, Y" and "if X, then Y".
  • Fix multiple o error being triggered when there's only one o.
  • Update translation list:
    • ante – change "other" into "different", "different" have broader meaning than "other".
frozen bear
#

oh btw, this project is at least 1 year old now! I am not proud of that

frozen bear
#

I'll try to make it parser combinator style, it outputs all possible interpretation so it's going to be a little different

#

I might also use typescript

frozen bear
#

I'm rewriting it and oh my god, I love this style of programming, I wish I did this earlier

frozen bear
#

hmm, I'm planning to implement nanpa particle. should I limit what word it follows? if I allow any words, how should I translate it?

#

I'd allow all number words as well as ale and pini for last

#

hmm, I'll limit it for now, nanpa can only be followed by number words (wan, tu, luka, mute, ale) as well as ale and pini for last

frozen bear
#

maybe I'll add "open" as well

narrow helm
#

lipu tenpo uses nanpa for "edition", which i think is really handy. "fire edition", "sky edition", etc

frozen bear
#

how can I translate it? we need a general way to do so

#

"book in position fire" "book in position sky", how does this sound?

narrow helm
#

that's a pretty good compromise if you're trying to make nanpa translate to a single word only

#

cause "position one" etc clearly means "first" etc

#

sounds good

frozen bear
#

hmm, alright, I'll implement this

frozen bear
#

I'm going full typescript

#

it catches a lot of errors, nice

frozen bear
#

I'll drop the support for "a" at the moment

#

a tenpo pi toki pona taso li lon

#

kin la mi weka e nimi "anu"

frozen bear
#

look at this, it's so declarative!

function modifier(): Parser<Modifier> {
  return choice(
    wordFrom(CONTENTWORD, "modifier").map(
      (word) =>
        ({
          type: "word",
          word,
        } as Modifier)
    ),
    properWords().map((words) => ({
      type: "proper words",
      words,
    })),
    specificWord("pi")
      .with(fullPhrase())
      .map((phrase) => ({
        type: "pi",
        phrase,
      })),
    specificWord("nanpa")
      .with(fullPhrase())
      .map((phrase) => ({
        type: "nanpa ordinal",
        phrase,
      }))
    // TODO: cardinal modifier
  );
}
#

ni li wawa mute. mi ken pali kepeken tenpo lili

frozen bear
#

and that's where declarative programming bites, I now have infinite loop

#

it's now resolved
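The thread doesn't say what caused the infinite loop or how it was fixed. One common cause with mutually recursive combinators (such as a modifier that contains a full phrase that contains modifiers) is that building the parser eagerly recurses forever, and one common fix is to defer construction. A sketch of that fix, with a hypothetical `lazy` helper and a simplified `Parser` type that are not the project's actual definitions:

```typescript
// A parser returns every way it can succeed from a given position.
type Parser<T> = (
  words: string[],
  position: number,
) => Array<{ value: T; position: number }>;

// Defers building a parser until first use, so recursive definitions
// don't recurse forever while they are being constructed.
function lazy<T>(build: () => Parser<T>): Parser<T> {
  let cached: Parser<T> | undefined;
  return (words, position) => (cached ??= build())(words, position);
}

// Matches any single word except the particle "pi".
const word: Parser<string> = (words, position) =>
  position < words.length && words[position] !== "pi"
    ? [{ value: words[position], position: position + 1 }]
    : [];

// phrase := word | word "pi" phrase
function phrase(): Parser<string[]> {
  // Calling phrase() directly here would recurse until the stack overflows.
  const nested = lazy(phrase);
  return (words, position) =>
    word(words, position).flatMap((head) => {
      const alone = [{ value: [head.value], position: head.position }];
      if (words[head.position] !== "pi") return alone;
      const withPi = nested(words, head.position + 1).map((rest) => ({
        value: [head.value, "pi", ...rest.value],
        position: rest.position,
      }));
      return [...alone, ...withPi];
    });
}

console.log(phrase()(["ilo", "pi", "kalama"], 0));
```

The grammar here is cut down to a single rule so that the recursion is easy to see.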

frozen bear
#

okay I did it, The whole parser (minus "a" and "anu") is now complete! Now all that's left is the translator

narrow helm
#

looks cool, ... wish i knew what those words mean cause i like programming and i'm good at it but i only know BASIC.

#

anyway, is there anywhere/any way i can try it out?

frozen bear
#

I'm going to prepare that

#

be warned, the parser is a little too loose, it will output things that I plan to filter out when writing the translator. Anyway, here's a way to test it out, it's a little bit convoluted:

  1. you'll need deno, install it
  2. git clone the repo https://github.com/neverRare/toki-pona-translator.git
  3. make a file called "test.txt" within the cloned repo, this is where you're going to input a toki pona sentence
  4. run deno run --allow-read path/to/test-parser.ts (be sure to replace path/to/test-parser.ts with the actual path)

oh and btw, you'll need to know how the ast is structured, you can see it here: https://github.com/neverRare/toki-pona-translator/blob/master/src/ast.ts

frozen bear
#

I keep forgetting about "X ala X" construction, I think I can easily implement it in the parser but not for now

frozen bear
#

I've forgotten about "e" particle

#

I think I'll look at toki pona cheat sheet now

frozen bear
#

things that I've missed

  • "a" particle (intentional, this particle is too flexible)
  • "anu" particle
  • X ala X constructions
  • Extended numbering system
  • Comma
  • "e" particle
  • "anu seme" as special suffix
#

Before going to making the translator, I'll finish up the parser. dang I thought I was almost complete with the parser

#

I'll double check to see if there's more that I've missed

#

@thorny current you were the one who made the parser right? I'm in the process of rewriting mine! I now use parser combinator, typescript instead of javascript, and modules to separate out the code into different files

edgy treeBOT
#

aaaaa pona a!

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) @thorny current you were the one who made the parser right? I'm in the process of rewritting m…

#

mi sitelen sin e ilo sona mi a a a

frozen bear
#

sama a

edgy treeBOT
#

sina toki e ni a a a

#

mi wile pini e sitelen sin lon tenpo lili kama

frozen bear
#

a! o pali pona a

edgy treeBOT
#

mi pini e ni la mi o pana ala pana e ni o mu tawa sina?

frozen bear
#

o ni

edgy treeBOT
#

I can help you with that!!! I’ve made a relatively fast pre-parser grammar checker.

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) be warned, the parser is a little too loose, it will output things that I plan to filter out when wr…

frozen bear
edgy treeBOT
#

it’s relatively accurate too!

#

that’s the word I was looking for 💀

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) ah don't worry, I plan on making a filter

#

mi lon tomo pali lon tenpo ni la mi ken ala pana e ni. taso mi lon tomo mi la mi ken ni!!!

frozen bear
#

ah, you said pre-parser filter, I meant I was going to make a post-parser filter. by "it's a little loose" I meant it will parse something like "ilo pi kalama suli" into "ilo (pi kalama) suli" where suli modifies ilo. it also correctly parses it, the parser takes account of all possibilities. I will filter out badly parsed AST's
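A post-parser filter for the "ilo (pi kalama) suli" case might look like the sketch below. The `Phrase` and `Modifier` types are a simplified, hypothetical stand-in for the real AST, and `isWellFormed` is an illustrative name.

```typescript
type Modifier =
  | { type: "word"; word: string }
  | { type: "pi"; phrase: Phrase };
type Phrase = { head: string; modifiers: Modifier[] };

// Rejects a phrase when a plain word modifier comes after a "pi" modifier,
// since that word should have been parsed as part of the "pi" phrase.
function isWellFormed(phrase: Phrase): boolean {
  const firstPi = phrase.modifiers.findIndex((m) => m.type === "pi");
  const wordAfterPi = firstPi !== -1 &&
    phrase.modifiers.slice(firstPi + 1).some((m) => m.type === "word");
  return !wordAfterPi &&
    phrase.modifiers.every((m) => m.type !== "pi" || isWellFormed(m.phrase));
}

// "ilo pi kalama suli" parsed as "ilo (pi kalama) suli": filtered out.
const loose: Phrase = {
  head: "ilo",
  modifiers: [
    { type: "pi", phrase: { head: "kalama", modifiers: [] } },
    { type: "word", word: "suli" },
  ],
};

// "ilo pi kalama suli" parsed as "ilo (pi kalama suli)": kept.
const tight: Phrase = {
  head: "ilo",
  modifiers: [
    {
      type: "pi",
      phrase: { head: "kalama", modifiers: [{ type: "word", word: "suli" }] },
    },
  ],
};

console.log([loose, tight].filter(isWellFormed).length);
```

Running the filter over all parsed ASTs keeps the permissive parser simple while dropping readings like the first one.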

edgy treeBOT
#

aaaa mi sona a!

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) ah, you said pre-parser filter, I meant I was going to make a post-parser filter. by "it's a little …

#

When you get some time, could you explain how your system works?

frozen bear
#

sure!

#

maybe later, soon I have appointment to my doctor

edgy treeBOT
#

I’d love to compare yours to mine

#

Sounds good!!!

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) maybe later, soon I have appointment to my doctor

frozen bear
#

the code is a lot cleaner now, making use of functional programming

#

I might put documentation comments in it to make it even more accessible

#

seriously, it's very functional, you won't see for loops nor while anywhere in the new code

frozen bear
#

parser combinator is really good especially if you're used to functional programming, but it can take a while to explain

frozen bear
#

@thorny current alright, I made an explanation, but it's very much incomplete, it only explains an introduction to parser combinator as well as the Output data type which represents all possible outcome of parsing, translation, etc. Later I'll make an explanation for each combinator and parser. Here's the link: https://github.com/neverRare/toki-pona-translator/wiki

edgy treeBOT
#

Oh!!! You’re using the same method I used!!!

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) @thorny current alright, I made an explanation, but it's very much incomplete, it only explain…

frozen bear
#

wait... really?

#

that's so cool

edgy treeBOT
#

Yeah, I think!

#

I’ve had no formal practice doing this but

#

I generate “interpretations” which give each token a potential part of speech

#

So, something like “toki pona” returns four interpretations - both of them as interjections, one of them as an interjection and one as a content word (x2) and one where both are content words

#

Assuming both toki and pona can be content words and interjections.

frozen bear
#

can you give another example? I don't get it 😅

edgy treeBOT
#

Okay, okay, okay, so:

#
toki -> ["content", "interjection"]
pona -> ["content", "interjection"]

"toki pona"
 -> [("toki", "content"), ("pona", "content")]
 -> [("toki", "content"), ("pona", "interjection")]
 -> [("toki", "interjection"), ("pona", "content")]
 -> [("toki", "interjection"), ("pona", "interjection")]
#

Something like this.

frozen bear
#

I don't get how words become interjection. seems like a special grammar rule

edgy treeBOT
#

It is, I'm just giving an example.

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) I don't get how words become interjection. seems like a special grammar rule

#

In sona toki v4 I'm rewriting a bunch of rules.

frozen bear
#

ahh, nice

edgy treeBOT
#

I've also included a filter, which lets me just

#
----------------------------

[Interpretation(('ContentToken', 'toki'), ('ContentToken', 'pona'))]
-> [ContentPhrase(['toki', 'pona'])]

Total interpretations: 1

----------------------------


Completed parse in 0.0s (1.0ms).
#

In Python, too.

#

(well, pypy (JIT-Compiled Python 3.9))

#

I need to update the sona toki repo on Github soon.

#

"sina ken ala ken toki tawa mi"

-> [IgnoreLiSubject('sina'), ContentPhrase(['ken', 'ala', 'ken', 'sona']), DirectObjectParticle('e'), ContentPhrase(['toki', 'mi'])]

-> [IgnoreLiSubject('sina'), VerbPhrase(['ken', 'ala', 'ken', 'sona']), DirectObjectParticle('e'), ContentPhrase(['toki', 'mi'])]

-> [IgnoreLiSubject('sina'), ContentPhrase(['ken']), YNQuestionParticle('ala'), ContentPhrase(['ken', 'sona']), DirectObjectParticle('e'), ContentPhrase(['toki', 'mi'])]
frozen bear
#

ooh, I see

edgy treeBOT
#

mhm!

#

the final thing I need to do is add all of the parser rules

#

specifically the condesnation rules

#

*condensation

#

then I'm done with the v4 parser!

frozen bear
#

what's condensation rules?

edgy treeBOT
#

(sans new grammar-checker, etc. might talk to jan Nikola)

#

again - self taught, so I don't know the proper term

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) what's condensation rules?

#

but I take a sentence and condense tokens into meta-tokens or "token groups"

#

start from the base parts and combine them

frozen bear
#

ahh

edgy treeBOT
#

like alchemy, or chemistry, or any other sort of composition-related field

#

it's a shift-reduce parser (or so I've been told)

frozen bear
#

can I see the codes?

edgy treeBOT
#

old repo

#

will open the new one in an hour or so

frozen bear
#

ah, ping me if you do so

edgy treeBOT
#

will do!

narrow helm
frozen bear
#

ah sure

frozen bear
#

okay, I think these are the things that I missed and hopefully I'm not missing more

  • 🔲 "a" particle (intentional, this particle is too flexible)
  • ✅ "anu" particle
  • ✅ X ala X constructions
  • ✅ Extended numbering system
  • ✅ Comma
  • ✅ "e" particle
  • ✅ "anu seme" as special suffix
  • ✅ Quotation
edgy treeBOT
#

oh gods the comma

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) okay, I think these are the things that I missed and hopefully I'm not missing more

  • 🔲 "a" particl…
#

don’t touch that if you wish to remain sane or your program to be fast

frozen bear
#

the way I've dealt with it before is to simply ignore it like it's a space

edgy treeBOT
#

I treat it as a full stop.

#

Although, there are certainly better ways.

frozen bear
#

thanks for the warning, but I think I know how to handle it

edgy treeBOT
#

Let me know what you come up with! I’d love to improve how sona toki deals with them - the main issue I find is speed.

frozen bear
#

sure!

frozen bear
#

OMG, I tried parsing a complicated sentence:

"mi la tenpo kama la ma li kama ante tan kasi pi mute lili tawa kasi pi mute suli."

and it outputs 1014468 different AST's

#

it's quite slow as well if you've given it complicated sentences

frozen bear
#

I could speed it up if I implemented a filter inside the parser instead of making them separate. hmm 🤔, yeah I'll go this route

#

I'll still write the filters at a separate file then it'll be used by parser in different places

frozen bear
#

Implementation of comma: more info

  • ✅ In place of period
  • ✅ Before "en"
  • ✅ Before "li"
  • ✅ After "o" vocative
  • ✅ Before "o" imperative predicate
  • ✅ Before "e"
  • ✅ Before prepositions
  • ✅ Before or after "la"
  • ✅ After "taso"
  • ✅ Before "anu seme"
frozen bear
frozen bear
#

@thorny current I have just sped through implementing comma in my parser... And this explanation might be boring but uhh, I used parser combinator to define it.

For example, I have anu seme parser and I wanted for it to accept commas.

sequence(specificWord("anu"), specificWord("seme"))

I simply added optional(comma()) parser to it.

sequence(optional(comma()), specificWord("anu"), specificWord("seme"))

People might use commas in place of periods, in that case, I simply allow sentences to end in comma:

many(sequence(sentence(), match(/[.,:;!?]/)));

The code is a lot different than this btw. This is just for demonstration
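To make the demonstration above concrete, here is a self-contained sketch of combinators with the same names (`specificWord`, `comma`, `optional`, `sequence`). The implementations are guesses for illustration and, as noted, the real code differs. It assumes the input is already split into tokens with the comma as its own token.

```typescript
type Result<T> = { value: T; rest: string[] };
// A parser returns every way it can succeed, not just the first.
type Parser<T> = (tokens: string[]) => Result<T>[];

// Matches exactly one given token.
const specificWord = (word: string): Parser<string> => (tokens) =>
  tokens[0] === word ? [{ value: word, rest: tokens.slice(1) }] : [];

const comma = (): Parser<string> => specificWord(",");

// Succeeds without consuming anything, and also with the parser's own results.
const optional = <T>(parser: Parser<T>): Parser<T | null> => (tokens) => [
  { value: null, rest: tokens },
  ...parser(tokens),
];

// Runs parsers one after another, collecting every way they can all succeed.
const sequence =
  (...parsers: Parser<unknown>[]): Parser<unknown[]> => (tokens) =>
    parsers.reduce<Result<unknown[]>[]>(
      (partials, parser) =>
        partials.flatMap((partial) =>
          parser(partial.rest).map((next) => ({
            value: [...partial.value, next.value],
            rest: next.rest,
          }))
        ),
      [{ value: [], rest: tokens }],
    );

const anuSeme = sequence(
  optional(comma()),
  specificWord("anu"),
  specificWord("seme"),
);

console.log(anuSeme([",", "anu", "seme"]));
console.log(anuSeme(["anu", "seme"]));
```

Both inputs parse in exactly one way. Accepting the comma only required wrapping one more parser into the sequence, which is the declarative part.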

As you can see, I simply defined comma in a very declarative way. I keep saying declarative so I'll explain what it is just so we're on the same page.

Imperative programming means telling your computer what to do; Declarative programming means telling your computer what you want.

Of course that just can't happen, you'll still need to make something complicated for it to do something complicated. In my case, all the complexities are hidden away inside the combinators. I haven't really explained how combinators work and I'll explain it some time...

frozen bear
#

I think this number is going to vary a lot every time I make changes to the parser and will only get meaningful once I add a filter

#

it just dropped down to 357

frozen bear
#

"a" and "anu" are two things I haven't implemented in the parser, and I think I'll move on now, I'll be making:

  • duplicate checker - this will check if there's duplicate AST's. this will be only used in development and wouldn't be used by the final code. this ensures our parser is as optimal as possible
  • filter - since our parser is too permissive (that's the term I was looking for, I was using the words "too loose"), we'll need a filter. this also speeds up our parser
  • translator - no description needed
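A development-only duplicate checker could be as small as the sketch below. `findDuplicates` is a hypothetical name, and it assumes ASTs are plain JSON-like objects whose keys are always written in the same order, since `JSON.stringify` is sensitive to key order.

```typescript
// Returns every AST that is structurally identical to an earlier one.
function findDuplicates<T>(asts: T[]): T[] {
  const seen = new Set<string>();
  const duplicates: T[] = [];
  for (const ast of asts) {
    // Serialized form acts as a structural fingerprint of the AST.
    const key = JSON.stringify(ast);
    if (seen.has(key)) duplicates.push(ast);
    else seen.add(key);
  }
  return duplicates;
}

const parsed = [
  { type: "word", word: "ilo" },
  { type: "word", word: "pona" },
  { type: "word", word: "ilo" },
];
console.log(findDuplicates(parsed));
```

An empty result for every test sentence would suggest that no two parser paths produce the same AST.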
frozen bear
#

there are more things that I missed in parser lol:

  • nested preverb
  • preverb then preposition
frozen bear
#

I'm just realizing there's probably more

#

I'll ignore them for now

#

I feel compelled to implement them for whatever reason lmao

frozen bear
#

here's one big thing that I missed. e objects could be associated with which li predicates? I think I'll handle all possibilities since it's becoming a theme that my parser is very permissive

#

examples:
first: "ona li jo li pali e ilo" they have and make tools
second "jan ike li ike li pakala e ijo mi" evil person is evil and breaks my things

frozen bear
#

prepositions may also be associated with many li predicates, but I'll handle it the same way as objects

#

I'm going to assume prepositions is always followed by objects, if there's any

#

dang, this is hard

frozen bear
edgy treeBOT
#

it is 😔

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) dang, this is hard

frozen bear
#

"making a parser for toki pona should be easy, it has simple grammar rules!"
famous last words

edgy treeBOT
#

very much so

soweli Koko ᜃᜓᜓᜃᜓᜓ ↩️

[Reply to:](#1053538532993548320 message) "making a parser for toki pona should be easy, it have simple grammar rules!"
famous last words

dull warren
frozen bear
#

I can implement "anu" in subject, not yet in predicate, I think I'll do that

#

Also, I'm planning to make two separate filters:

  • the obvious filter where it will filter obvious parsing mistakes such as parsing "ilo pi kalama suli" as "ilo (pi kalama) suli", this filter is going to be tightly integrated into the parser, I think this is enough to speed up the parser
  • the optional filter such as filtering multiple pi or filtering multiple seme. It will be less integrated and will be individually manageable in case I decided to support them later.
    I need better names for these two filters
#

I think the obvious filter doesn't need a name, it's inside the parser. and optional filter will just be named "filter"

#

and I think I also need to take a rest, a whole day off

#

call it two days off

#

I'll leave this thread, see ya!

frozen bear
#

oookay, tomorrow I'll continue

frozen bear
#

"anu" has finally been implemented

#

all that's left is "a"

frozen bear
#

okay, there's a bug that prevents parsing "anu" in predicate

frozen bear
#

okay, I fixed it

frozen bear
#

I think I said I'm going to skip parsing "a", I think I'll do it, I'll implement it

frozen bear
#

I think I'll also implement reduplication as well

frozen bear
#

eh, I'll skip implementing "a"

#

I don't think it'll add much to the AST, I mean, if I implement it late, I'll hopefully only need to change the parser, not much will change in the translator, filter, or other part

#

"a" is very very flexible

frozen bear
#

I'm starting to write filter functions

#

I need a proper AST walker

frozen bear
#

The filter is now done, well I could add more to there but all basic filtering is done now

#

Now finally onto translator

#

Btw, the parser is so so slow, I thought adding filters would speed it up a bit, it's still slow as hell

#

and the error message is so unreliable

#

I've actually added useful error messages but the parser deals with multiple errors and it has to choose one, and it chooses bad ones lol

#

maybe I could aggregate error messages, and sort them by rank
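One possible ranking, offered only as an assumption since the thread doesn't define "rank": prefer the error from the branch that got furthest into the input, a heuristic often used with backtracking parsers. `ParseError` and `rankErrors` are hypothetical names.

```typescript
type ParseError = { message: string; position: number };

// Merges repeated messages and lists the ones that got furthest first.
function rankErrors(errors: ParseError[]): string[] {
  const furthest = new Map<string, number>();
  for (const { message, position } of errors) {
    furthest.set(message, Math.max(furthest.get(message) ?? -1, position));
  }
  return Array.from(furthest.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([message]) => message);
}

const ranked = rankErrors([
  { message: "expected a content word", position: 0 },
  { message: 'unexpected "li"', position: 4 },
  { message: "expected a content word", position: 2 },
]);
console.log(ranked);
```

The idea is that a branch failing late probably matched more of what the writer intended than a branch failing on the first word.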

#

oh well, I guess these are all compromises I've chosen for speedy development

frozen bear
#

for translator, I think I'll just port the old code

frozen bear
#

the parser combinator is so versatile that most of the time, you don't need to make a lexer for it. so I did it, but I realized I need to make a lexer to be able to parse "a"

#

the lexer could also pre-parse some of the things

frozen bear
#

here's our final-ish overview of how our translator works

#

it's "our" because it's MIT licensed epiku_kijetesantakalu

frozen bear
frozen bear
#

before making the lexer, I'll make the small part of the translator first

frozen bear
#

my next milestone will be to be on par with the old code, meaning being able to translate phrases, then I'll release it on the website (I really wish github pages supports deno for bundling, I haven't tested it lol)

frozen bear
#

I could add more since we have full blown parser unlike before, I could support "en", "anu", and "o" vocative

frozen bear
#

I need your help, we have a general way to translate "nanpa" constructions: "lipu nanpa wan" is "book in position one". what if it's an adjective? let's say "pona nanpa wan" which usually means "best". how can we translate this in general way?

#

"good in position one" doesn't make sense lol

#

hmm, or does it 🤔

frozen bear
#

I'll stick with "good in position one" for now

frozen bear
#

the code is very much ready to ship now, although I'm still figuring out how to deploy it using github pages and deno

#

btw, deno only bundles the code, it doesn't serve anything so hopefully it can work in github pages

frozen bear
#

6 pull requests later... it is now deployed!

#

0.2.0

For this version, the whole code has been rewritten. The translator can now translate a few more things! Although it's still not capable of translating full sentences.

  • Implement translator for:
    • Extended numbering system
    • Reduplication
    • nanpa particle
    • en and anu
    • o vocative like "jan Koko o"
  • Add button for translating, replacing auto-translate when typing.
  • (Downgrade) Drop support for "a" particle.
  • (Downgrade) Error messages are now very unreliable.
  • (Downgrade) Translator is somewhat slower.
  • Remove Discord DM as contact option.
  • Update translation list:
    • tonsi – change nouns "transgender", "transgenders", "non-binary", and "non-binaries" into "transgender person", "transgender people", "non-binary person", and "non-binary people" (I DIDN'T MEAN TO OBJECTIFY THEM OMFG I'M SO SORRY 😭😭😭)

Inside update (intended for developers):

  • Rewrite the whole code to use TypeScript, modules, and functional programming.
  • Rewrite the parser to use parser combinators.
  • Add language codes to html.
  • New wiki for contributors and tinkerers.
  • Overhaul README.md, only including build instructions. Information about the translator is now moved to the wiki.

https://neverrare.github.io/toki-pona-translator/

You may need to force restart the page

frozen bear
#

I'm proud of this one

frozen bear
#

the definition list needs updating, I think that's what I'll solely do for the next update

frozen bear
rocky pulsar
frozen bear
sly prism
#

ni li wawa

rich fulcrum
#

it doesn't understand X ala X yet it appears

frozen bear
#

ah yeah it doesn't, I should put that in limitation list to be clear

#

it probably recognizes X ala X as series of modifiers

rich fulcrum
#

yeah

rich fulcrum
#

taso and kepeken seem to be completely broken, nothing appears when i use them

#

a lot of phrases over three words break with the output being "no error provided"

#

actually

#

it happens with words that are only "of _"

#

soweli tomo ilo for example breaks

#

soweli tomo tomo does not tho

#

pilin and lukin should have verb forms. and pilin should also mean heart

#

also position words too

#

anpa is not treated as an adjective

#

er

#

hmm

#

the current definitions of anpa should be given to noka instead

#

hmm

#

nvm

#

but lowly, humble should be added to anpa

#

as well as below

#

yeah linku doesn't define noka as below which is strange cus its pu

#

so ok

frozen bear
#

thank you so much for testing it out

rich fulcrum
#

its very cool though! i like the approach

frozen bear
#

thanks

frozen bear
# rich fulcrum taso and kepeken seem to be completely broken, nothing appears when i use them

"taso" is intended to be only recognized as either modifier or particle at the beginning of the sentence, it cannot be a headnoun. I've tested "ijo taso" and it doesn't work, so that's a problem. I've also tested "taso pona" and it also doesn't work. grr, I'll investigate this later

I intend "kepeken" to be recognized solely as a preposition. translating preposition isn't implemented so yeah

frozen bear
#

about the definition, I'm working on it! and thanks for the feedback, i'll keep these in mind

#

oh btw, if it doesn't output anything, not even an error, that's a problem

#

ah I see, if it outputs nothing not even an error, the error is logged in the console in the developer window. dang! I thought this would never happen

frozen bear
#

I think using "noka" to mean "down" is a non-widespread usage. people nowadays use "anpa" instead

frozen bear
#

The translator gives up on the phrase "soweli tomo ilo", this should be possible! the reason it gives up is there's no adjective translation for "tomo" and "ilo", so there's no <adjective> <adjective> animal and no <adjective> animal of <noun>. Should I just use multiple "of"? With multiple "of", it would be translated to "animal of house of tool". hmmm

#

maybe with "and" instead? "animal of house and tool"

#

I don't want to use "and" because phrases can have "en", this would complicate the translation. buut, I suppose I have to resort to this, it's better to have output than none. maybe "and" will only be used if it gave up with only having a single phrase after "of"

#

what do ya'll think?

rocky pulsar
#

anu "tool house animal" taso

frozen bear
#

ken

#

I'm actually using "-like" already

frozen bear
frozen bear
#

I might use them all lol

frozen bear
# rocky pulsar anu "tool house animal" taso

I'm afraid this is going to produce ambiguity. "ilo kalama suli" would be "big sound tool", "big" can either modify just "sound" or the whole "sound tool". so that's off the list

frozen bear
# frozen bear I might use them all lol

I'm not actually going to do this, I think some forms have overlapping meanings:

  • "tool house animal" overlaps with "animal of house and tool" - I'll choose the latter since the first one is undesired due to previous comments
  • "-ish" overlaps with "-like" - I'll choose "-like" since this feels more formal
  • "-related" - I think this one has extra semantics so I'll add this as well
sly prism
#

big sound-tool

frozen bear
#

a

frozen bear
#

hmm, in cases where there's 3 or more noun words, it gets a bit nasa: "tool-house-animal"

frozen bear
#

yeah, I think this is enough

frozen bear
#

I'm thinking of adding options for the users:

  • singular and plural options: show all, condense thing(s), show only singular
  • grammar check output translations
  • randomize order
frozen bear
frozen bear
narrow helm
narrow helm
#

mi pali e toki musi pi kalama musi e nimi "olin sina mi" pi kon mute

#

ilo sina li sona e kon mute ni ale

#

my your love
our your love
my love of you
my love of you all
our love of you
our love of you all
your love of I
your love of me
your love of we
your love of us

frozen bear
#

oh wow, these are great translations, the first two are a bit eh

narrow helm
#

ona li ken sama nimi "love of yours and mine" "love of yours and ours"

#

taso ilo li kama sona e ni la ... ilo li kama sona e toki Inli. pali suli

frozen bear
#

I'm kinda getting burned out by this project. expect another hiatus. my semestral break is about to end anyway.

P.S. I hate working on the definition list, every time I make changes to it, e.g. finally adding a verb list, I have to edit like 120 of them, sometimes I could automate it but sometimes I have to do it manually

frozen bear
#

notes for future me: maybe it's better to divide the translator into two parts: tokiponish AST into englishy AST, then finally into english sentences
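
The two-stage split could be sketched roughly like this. Everything here is hypothetical: the AST types, the two tiny word tables, and the function names are made up for illustration and are far smaller than anything the real translator would need (it would also return many candidate translations, not one):

```typescript
// Hypothetical, heavily simplified ASTs.
type TokiPonaPhrase = { head: string; modifiers: string[] };
type EnglishNounPhrase = { adjectives: string[]; noun: string };

// Made-up mini dictionary, for illustration only.
const NOUNS: Record<string, string> = { soweli: "animal", kili: "fruit" };
const ADJECTIVES: Record<string, string> = { suli: "big", loje: "red" };

// Stage 1: Toki Pona AST -> English AST (word choice only).
function toEnglishAst(phrase: TokiPonaPhrase): EnglishNounPhrase {
  const noun = NOUNS[phrase.head];
  if (noun === undefined) throw new Error(`no noun for "${phrase.head}"`);
  const adjectives = phrase.modifiers.map((word) => {
    const adjective = ADJECTIVES[word];
    if (adjective === undefined) throw new Error(`no adjective for "${word}"`);
    return adjective;
  });
  return { adjectives, noun };
}

// Stage 2: English AST -> text (English word order only).
function render(phrase: EnglishNounPhrase): string {
  return phrase.adjectives.concat([phrase.noun]).join(" ");
}

console.log(render(toEnglishAst({ head: "kili", modifiers: ["loje"] })));
```

The point of the split is that stage 1 only has to know Toki Pona and the dictionary, while stage 2 only has to know English grammar.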

frozen bear
#
sona pona

Reduplication is repeating a word exactly or almost exactly to form a new word or phrase. In English, it occurs in words like "bye-bye" and "zigzag", in contrastive focus reduplication, and with "schm-".
Reduplication has often been suggested for Toki Pona, generally as an intensifier (for example, suwi suwi for "really cute"). However, it is no...

narrow helm
#

sona pona la. ona li sona pona ( ͡° ͜ʖ ͡°)

frozen bear
#

I found a better name for this project: ilo Token, from the ISO codes of toki pona and english: TOK and EN. Token is also a term in parsing, which is fitting, I love it

#

I will rename it in the next update

#

and this will be the icon or name glyph

#

ilo Token - Rule based toki pona to english translator

frozen bear
#

you know what, I'll rename it now

#

0.2.1

The project has been renamed to ilo Token. The definition list has been given a huge overhaul.

  • Change name to ilo Token.
  • Remove unintended commas, these were found when translating "en" with more than 3 phrases.
  • Remove copyright and license footer.
  • Update definition list:
    • It now uses latest Linku definition as the base.
    • Include verbs for later use.
    • Include interjection for later use.

https://neverrare.github.io/ilo-token/

sly prism
#

wawa

#

do I send this on ma pi nasin sitelen

#

or have you already

desert abyss
rocky pulsar
desert abyss
#

a

#

taso

#

mi pana ala e nimi ni

#

nimi La anu seme

rocky pulsar
#

ona li lon insa pi nimi mi

#

kulupu en taso li ken

#

lon ilo pi tenpo lon

frozen bear
frozen bear
#

real quick, I'm trying to improve the error messages, although it still sucks

cerulean pike
#

kkkpo
kj

sly prism
#

so true bestie

frozen bear
#

okay, I fixed a bug, this isn't the best error messages but eh

#

I could borrow some code from telo misikeke if I really want better error messages

#

oh no, I found another bug

frozen bear
frozen bear
#

I'm going to implement a lexer, with it I will implement "a" and ucsur

frozen bear
#

one thing I like about developing ilo Token is I discover interesting edge cases of toki pona

#

here's an example, do people do this?

lament solar
#

ken

#

nimi 'ala' li nasa lili

desert abyss
#

ken

frozen bear
#

ni la ilo Token o sona e ni

frozen bear
#
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 CSS                     1           54           54            0            0
 HTML                    1          132          132            0            0
 JSON                    1           17           17            0            0
 Markdown                3          242            0          191           51
 Plain Text              1          138            0          138            0
 TypeScript             17         4465         3911          528           26
===============================================================================
 Total                  24         5048         4114          857           77
===============================================================================
frozen bear
#

ilo Token now uses telo misikeke for error messages! you can turn this off if you want real ilo Token error messages (it sucks). this will be available on next update

frozen bear
#

parser is a monad

#

it's a fucking monad

#

what the parser returns, at least in my case, is also a monad

#

everything's a monad

frozen bear
#

so parser is some kind of State monad

#

I'm being infected by Haskell, help

#

I should focus on implementing whats missing

#

but hey, with this theoretical knowledge, I'm able to make things a little easier and a little more efficient

#

like the sequence combinator is now defined in terms of monadic apply functions, instead of whatever I did there last time
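
For anyone curious what that can look like, here is a minimal sketch, not the actual ilo Token code. It assumes a parser is a function from the remaining input to a list of results (a list so that ambiguity can be represented), and the names `pure`, `bind`, `sequence`, and `word` are all chosen for this example:

```typescript
// A parser takes the remaining input and returns every way it can succeed.
// An empty list means failure.
type Result<T> = { value: T; rest: string };
type Parser<T> = (src: string) => Result<T>[];

// Monadic "unit": succeed without consuming input.
function pure<T>(value: T): Parser<T> {
  return (src) => [{ value, rest: src }];
}

// Monadic "bind": run `parser`, then feed each result into `next`.
function bind<T, U>(parser: Parser<T>, next: (value: T) => Parser<U>): Parser<U> {
  return (src) => {
    const out: Result<U>[] = [];
    for (const { value, rest } of parser(src)) {
      for (const result of next(value)(rest)) out.push(result);
    }
    return out;
  };
}

// `sequence` built only from `pure` and `bind`.
function sequence<T>(parsers: Parser<T>[]): Parser<T[]> {
  return parsers.reduceRight<Parser<T[]>>(
    (acc, parser) =>
      bind(parser, (head) => bind(acc, (tail) => pure([head, ...tail]))),
    pure<T[]>([]),
  );
}

// Matches one exact word at a word boundary and skips following spaces.
function word(expected: string): Parser<string> {
  return (src) => {
    if (src.slice(0, expected.length) !== expected) return [];
    const rest = src.slice(expected.length);
    if (rest !== "" && rest.charAt(0) !== " ") return [];
    return [{ value: expected, rest: rest.replace(/^ +/, "") }];
  };
}

console.log(sequence([word("mi"), word("moku")])("mi moku"));
```

Because `sequence` never touches the input string directly, any change to how input or results are represented only has to be made in `pure` and `bind`.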

#

I could use applicative functor instead but its only really useful if we solely use currying, typescript ain't that

#

damn, I wish I could use the applicative functor, people say monad is overrated

#

anyway, anyway, focus! I still need to implement combined glyphs, long glyphs, and "a"

frozen bear
#

ilo Token is quite a bit faster now, all of those monad shenanigans were for good

#

learn monad kids

#

hmm, I might experiment with making use of lazy iterators 🤔

#

okay, okay, focus!

#

I'll implement combined glyphs first

frozen bear
frozen bear
frozen bear
#

actually, I've skipped these

frozen bear
#

starting to implement long glyphs!

vagrant girder
#

ilo ni li lukin wawa a

lament solar
#

long glyphs are not grammatically relevant

frozen bear
#

I disagree, it can be used for disambiguation for example

edgy treeBOT
#

I should seriously get back to work on my parser jeez

#

I can't believe I haven't touched it in ~6 MONTHS

#

time has truly flown

#

this is seriously awesome, @frozen bear!

frozen bear
#

aww, thanks

edgy treeBOT
#

I think I'm going to need to catch up to date on all of the things you've been up to

#

And finally add those last parsing steps to my parser

#

Then finally (finally!!!) update to sona toki V4

#

and possibly change the name lol but you know it's fine for now

#

the backlog is getting extensive so I should get back to it soon

frozen bear
#

about the names, I remember being hesitant to give it a name, it was "Toki Pona Translator", but then I found the perfect name, "ilo Token". I love it so much that I changed it lol. "Toki Pona Translator" sounds fucking boring, I'm glad I found a name

fickle stag
#

the book Functional Programming in Scala has an entire section on deriving the monadic shape for parsers

lament solar
rocky pulsar
upbeat salmon
#

ante la nimi "lo"

frozen bear
#

I should stop myself from starting yet another project, but would a tuku tiki translator be easier or harder to make? I haven't looked much at tuku tiki. eh, I won't do this, it's just more work

rocky pulsar
frozen bear
#

aa, sona a

frozen bear
#

okay, I'll start with the "a" parser. according to sona pona: "a" is usually used to emphasize a word, a phrase, or a whole sentence

#

I'll do words and sentence first before phrase

frozen bear
#

I've considered "a" and "a a a", but I need to consider "aaaaa" as well, hmm

frozen bear
#

the parser is almost complete now, I'm going to make english AST now

frozen bear
#

preverbs are perhaps the hardest to translate

frozen bear
frozen bear
#

UI design is my passion

sly prism
#

wawa

frozen bear
#

I might redesign the UI

#

yeah, to make it more touch friendly, I might make it even mobile-first

frozen bear
#

this is much better

rocky pulsar
#

waw a

frozen bear
#

sina pona

#

I should make it darkmode friendly as well

#

hmm, I think this is good enough

vagrant girder
frozen bear
#

sure

frozen bear
#

the english output needs to be as unambiguous as possible

frozen bear
#

it's fun relearning english for this

#

the transphobes are right, pronouns are hard (kidding)

#

here are all the words that translate into english pronouns

  • mi
  • sina
  • ona
  • ni
  • seme
  • ale (everything, anything)
  • ala (nothing)
  • jan (somebody)
#

words like everything and somebody can technically be treated as nouns, except they cannot take adjectives or determiners

rocky pulsar
#

ijo li "something" anu seme

frozen bear
#

ah, nice catch

#

what's nice about improving ilo Token's dictionary is that such improvement can be applied into contributing to lipu Linku.

frozen bear
#

hmm

  ijo: [
    noun("phenomenon(s)"),
    noun("object(s)"),
    noun("matter(s)"),
  ],
#

I think "something" is already under "object" though

frozen bear
#

going to implement nasin nanpa pona

frozen bear
#

it's actually tricky to implement nasin nanpa pona but I've done it, its untested however, I should test it

frozen bear
#

yeah, it works, but I need to work on another bug

#

my parser has a filter, it filters out multiple numbers in a single phrase, but it doesn't work
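
To illustrate what such a filter does (a sketch only, with a made-up, much simpler AST than the real one): the parser produces several candidate ASTs, and the filter keeps the ones that satisfy a rule. The `Modifier` and `Phrase` types and both function names are assumptions for this example:

```typescript
// Hypothetical, simplified AST.
type Modifier =
  | { type: "word"; word: string }
  | { type: "number"; value: number };
type Phrase = { head: string; modifiers: Modifier[] };

// Rule: a single phrase may carry at most one number.
function hasAtMostOneNumber(phrase: Phrase): boolean {
  return phrase.modifiers.filter((m) => m.type === "number").length <= 1;
}

// Keep only the candidate ASTs that pass the rule.
function filterPhrases(candidates: Phrase[]): Phrase[] {
  return candidates.filter(hasAtMostOneNumber);
}

// "jan tu wan" read as one number (3) versus two separate numbers (2, 1).
const candidates: Phrase[] = [
  { head: "jan", modifiers: [{ type: "number", value: 3 }] },
  { head: "jan", modifiers: [{ type: "number", value: 2 }, { type: "number", value: 1 }] },
];
console.log(filterPhrases(candidates));
```

A filter like this only helps if every later stage works from the filtered AST and does not reintroduce what was removed.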

#

the translator is also kinda borked but its bound to be replaced by a better translator anyway

#

oooh I think I know what's the problem

#

the old translator still relies on the old unmaintained dictionary, which contains number definitions, that's why it's putting numbers back on the already filtered AST

#

to fix this, the old translator must be replaced

frozen bear
#

I'm going to take a break from ilo Token

desert abyss
#

take care!

frozen bear
#

thanks

frozen bear
sly prism
#

mi tawa e sina

frozen bear
#

because of ilo Token, I learned that there is a class of words called determiners that can be considered separate from adjectives

frozen bear
#

there's already the parser, it's mostly done, all it needs is a neat way to display ASTs

frozen bear
#

I'm going to make my own nasin for displaying AST

#

I'll call it nasin kulupu

#

it'll use bento box style grid

#

hmm, it is similar to sitelen sitelen

frozen bear
#

hmm, I'm just realizing its weakness

frozen bear
#

you know what, naahh, this is more work

#

hmm

#

I'll revisit the bento box one

frozen bear
#

#toki-ale message

#

ilo Token assumes modifier order doesn't matter, this is an interesting poll

#

aside from "ala", ordering of "ala" matters

#

well, the poll gets buried, I'll ask in #sona-musi

frozen bear
#

an ilo Token discussion #pali-nanpa message

frozen bear
#

we got a competitor now #pali-nanpa message

#

which is not a bad thing

frozen bear
#

there are few hurdles that I need to overcome before focusing on finally implementing translation of whole sentences

#

namely, rewriting the translator module that I had

edgy treeBOT
# ko Koko ᜃᜓᜃᜓ which is not a bad thing

oh hey neat

#

I’m currently working on my parser again

frozen bear
#

nice!

edgy treeBOT
#

seems like this will not be that difficult

#

(to finish)

#

(this time)

#

and yeah it’s good to have some competition

#

more incentive to actually release sona toki v5 in a timely manner

frozen bear
#

yeah

#

I'm kind of motivated at the wrong time, I'm not planning on working on ilo Token this July cuz I wanna dedicate that time to art fight T◡T

edgy treeBOT
#

oh dear ;o;

frozen bear
#

well, its not happening yet so I could work on ilo Token for a moment

edgy treeBOT
#

sounds like an idea!

frozen bear
#

well that's funny

edgy treeBOT
#

mmm

desert abyss
#

k

frozen bear
#

okay, I fixed that now

#

there's a setting for condensing number and tense

#

hmm

#

it can be improved

#

it can be further condensed but I think this is fine

fickle stag
#

mark "fuck" as an interjection would be good

frozen bear
#

ah yeah, it has double meaning

frozen bear
#

I'm just realizing ilo Token could become notorious for calquing if it can translate sentences

frozen bear
#

I did take some measures to prevent this, like "jo" doesn't have "has" or "have" in the dictionary, but I probably missed some

#

well, I'll worry about this later

frozen bear
#

will deal with this later

#

or perhaps it can be replaced with another expletive word

#

oh, and to be clear, the dictionary also contains what part of speech the words are, and this is properly used by the translator, although the translator is currently work in progress. pakala can be translated as "fuck" when it stands alone as a sentence but "mi pakala e mama sina" won't be translated to "I fuck your parent", or something like that

#

"I fuck up your parent", I think this is a valid translation but ehh

frozen bear
#

I'm starting to realize, verbs have too many conjugations, I forgot to consider singular and plural verbs

#

the expected complexity of this project just keeps growing

frozen bear
#

this is what I'm thinking, there might be dictionaries online tailored for computers, e.g. I give it a noun and ask what its plural form is and it'll output that. I'm thinking of using that if it exists just to ease up the development

If you know such a tool that doesn't require scraping, please tell me

#

inb4 I search up "dictionary for computer" and it gives data structures

#

I could just scrape some data from wiktionary but uhhh

#

(this might be more work than just manually encoding the conjugations and stuffs lol)

#

dictionaries of verb conjugations exist

#

but I'm not sure if they can be scraped

frozen bear
#

I might just as well scrape wiktionary, I think this is best choice for now

#

I can also scrape the pronunciation in case I need articles "a" and "an"

#

okay, api for wiktionary exists!

#

there's no need for scraping!

#

ah wait, I celebrated too early lol

#

this only determines whether the word exists or not

#

dang

#

oh, there's a latest api update where it can query definitions, well I don't need that

frozen bear
frozen bear
#

ilo Token is becoming my own english teacher

#

well, I suppose this should be expected

frozen bear
#

pona a -> good!
pona aa -> good!!
pona aaa -> good!!!

#

ni li ike ala ike?

#

pona a -> so good
pona aa -> soo good
pona aaa -> sooo good

dense lichen
#

a a a = so so so thonk

dense lichen
#

but i always thought oh so would be a better fit

#

since oh is a Noise too

frozen bear
#

oh good

#

ohh good

#

hmm

frozen bear
#

why things are taking so long: because everything is fucking generalized and toki pona has hidden complexities, we have to consider those

say for example: the "taso" in the beginning of the sentence. that's easy, just add "but" in the beginning of the sentence!

but what if it is "taso a", we can add "a" to any word right? what if its "taso ala taso"?, should we even consider this? "taso ala taso" is kinda pointless imo so I filtered that out

maybe I have overthought this

frozen bear
#

all this to say, programming can come with unexpected complexities

#

I sometimes wonder if I'm kinda overachieving

#

I'm planning on making "kin la" and "kin" have similar translations, though they can still differ

rocky pulsar
frozen bear
#

wtf is that website, I thought the css were broken

#

thanks for suggesting this tool btw

#

I might say, this website is inaccessible

#

okay, the github readme is much better

#

okay, this looks very useful, thanks for this

#

OMG YES!

#

thank goodness deno supports npm

worthy rivet
#

@frozen bear

#

This is technically a valid phrase

frozen bear
#

yeah, I know the reason why

#

it has been discussed before, lemme search it

frozen bear
#

okay, I'm going to rewrite phrase translator, then I'll release a new version

#

it's been a long time since the last version, but this new version won't have many new features, the rewrite is just catching up with the old code

frozen bear
#

I'm so glad toki pona prepositions translate to english prepositions

frozen bear
#

I think I'll rewrite the dictionary yet again

frozen bear
#

I can automate this

frozen bear
#
"akesi": [
    { "type": "noun", "noun": "reptile" },
    { "type": "noun", "noun": "amphibian" },
    { "type": "adjective", "adjective": "reptilian", "kind": "qualifier" },
    { "type": "adjective", "adjective": "amphibian", "kind": "qualifier" }
  ],
#

json is a lot uglier

#
  akesi: [
    noun("reptile(s)"),
    noun("amphibian(s)"),
    adjective("reptilian", "qualifier"),
    adjective("amphibian", "qualifier"),
  ],
#

this is what it looked like before

#

I might create my own file format just for this

#
akesi:
  n reptile
  n amphibian
  adj reptilian
  adj amphibian
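
A format like the draft above is simple to read with a small hand-written parser. This is only a sketch for that draft layout (a headword ending in a colon, then indented lines of part of speech plus word); the `Definition` type and `parseDictionary` name are made up here:

```typescript
// Hypothetical shape for one parsed definition line.
type Definition = { partOfSpeech: string; word: string };

// Parses the draft format: "headword:" followed by "<part of speech> <word>" lines.
function parseDictionary(text: string): Record<string, Definition[]> {
  const dictionary: Record<string, Definition[]> = {};
  let current: Definition[] | null = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.charAt(line.length - 1) === ":") {
      // A new headword starts a new list of definitions.
      current = [];
      dictionary[line.slice(0, -1)] = current;
    } else {
      if (current === null) throw new Error(`definition before any headword: ${line}`);
      const space = line.indexOf(" ");
      if (space === -1) throw new Error(`missing definition text: ${line}`);
      current.push({
        partOfSpeech: line.slice(0, space),
        word: line.slice(space + 1).trim(),
      });
    }
  }
  return dictionary;
}

console.log(parseDictionary("akesi:\n  n reptile\n  n amphibian\n  adj reptilian\n  adj amphibian\n"));
```

A real version would also need to report line numbers in errors, since people will edit the file by hand.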
worthy rivet
frozen bear
#

why?

worthy rivet
#

there are fine ones out there

#

other than json

worthy rivet
# frozen bear why?

consider

<akesi>
  <ijo d="reptile(s)"/>
  <ijo d="amphibian(s)"/>
  <kule d="reptilian" k="qualifier"/>
  <kule d="amphibian" k="qualifier"/>
</akesi>
#

this is XML

frozen bear
#

hmmm

#

eh, I'll make one anyway

#
unpa:
  have(v) sex(n) with(pp)
#

yeah, I want something custom, so its nicer to read

#
olin:
  have(v) strong(adj) emotional(adj) bond(n) with(pp)
#

just ideas

frozen bear
#

this is how ilo Token translates "aaaaa"

  • ahhhhh
  • ohhhhh
  • haaaaa
  • ehhhhh
  • ummmmm
  • oyyyyy

this is how ilo Token translates "nnnnn"

  • hmmmmm
  • uhhhhh
  • mmmmmm
  • errrrr
  • ummmmm

does this need to be changed?
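
One rule that reproduces the lists above, as a guess rather than the actual implementation: take a short English interjection and repeat its last letter once per letter of the input. The base words are read off the outputs above; the function names are made up:

```typescript
// Repeat the last letter of `base` `count` times, keeping the letters before it.
function stretch(base: string, count: number): string {
  const last = base.charAt(base.length - 1);
  // Array(n + 1).join(x) repeats x exactly n times.
  return base.slice(0, -1) + new Array(count + 1).join(last);
}

// Base interjections inferred from the outputs listed above.
const A_BASES = ["ah", "oh", "ha", "eh", "um", "oy"];
const N_BASES = ["hm", "uh", "mm", "er", "um"];

function translateLongA(input: string): string[] {
  return A_BASES.map((base) => stretch(base, input.length));
}

function translateLongN(input: string): string[] {
  return N_BASES.map((base) => stretch(base, input.length));
}

console.log(translateLongA("aaaaa"));
console.log(translateLongN("nnnnn"));
```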

frozen bear
#

hmm, okay, compromise isn't working

frozen bear
#

okay, I made it work

#

now the new dictionary is ready to rock!

frozen bear
#

I need to figure out how preverbs work

#

like how it can be translated

frozen bear
frozen bear
#

I just remembered, preverbs can be nested iiiiikeeeeh, hooray for another complexity

#

there can be preverb preposition construction as well

#

"mi ken tawa tomo"

#

I'm glad this is already covered in the parser, now all that's left is for translator to have it

frozen bear
fickle stag
#

Oh yeah removing it is fine

#

Learners will pick up that kind of use from context, it's very misleading to see in a word list

frozen bear
#

yeah

#

I still need to be careful tho, like I removed "cool" from lete and epiku. I probably missed some but things can change in the future

frozen bear
#

I have stumbled upon inconsistency between runtimes, namely deno and firefox iiiiikeeeeh

#

I'm going to assume this is a firefox bug, but I have to verify this

#

oh to be clear, deno's output is intended

#

I also tested it on microsoft edge and it works as intended as well

#

firefox

#

bug aside, a new version is almost ready! to be clear, we can't translate sentences yet

worthy rivet
# frozen bear firefox

it's the ordering that's wrong?

Make sure you aren't iterating over the keys of an object and expecting the order to be consistent -- it isn't.

frozen bear
#

I'm not doing that

#

hmm, I do iterate over object but I know they're all unrelated to the final order, lemme double check

#

yep

#

its not that

worthy rivet
frozen bear
#

I only tried doing it on some parts, its all correct on deno, not on firefox. I haven't gotten deep yet tho so yeah, I'll investigate it further

#

what's weird is that there is sort in some places, firefox doesn't seem to respect it, although sort is kinda deep in the code so there could be problem on other parts

#

or it is a bug in firefox itself, this is my main hypothesis

worthy rivet
frozen bear
#

I'm going to take a rest

#

well, I'm going to release this version first

#

0.3.0

This is a huge update now with better quality translations, configurable settings, UCSUR support, and expanded vocabulary!

  • Reimplement the word "a". This was dropped due to the parser rewrite.
  • The vocabulary has been expanded to nimi ku suli plus nimi su!.
  • New "dictionary mode": just enter a single word and ilo Token will output all definitions from its own dictionary. This also works for particles. To bypass this and translate the word as if it were the whole sentence, just add a period.
  • Implement UCSUR support! It supports:
    • Cartouche with nasin sitelen kalama
    • Combined glyphs
    • Long glyphs
    • (Deprecated characters and combiners are not supported)
  • Implement nasin nanpa pona.
  • Implement settings dialog. More info.
  • Changes in error messages:
    • All possible errors will now be listed.
    • ilo Token now uses telo misikeke for error messages. This can be disabled from the settings.
  • Multiline text will no longer be recognized.
  • Add icons.

You may not notice this, we take good grammar for granted, but ilo Token now has generally better quality translations thanks to the following:

  • It is now aware determiners are separate from adjectives. So you won't see adjectives like "nicely my", since adverbs can't modify determiners.
  • It tries to ensure adjectives are in proper order. Yes this matters, it's "big red fruit" and not "red big fruit".
  • Just like adjectives, determiners are also ordered, but unlike adjectives, they're also filtered (some combinations are not shown). You won't see "my your animal".
  • It is aware of grammatical numbers. So you won't see "2 stick" or "1 sticks".
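
A rough sketch of two of these ideas, adjective ordering and number agreement, under stated assumptions. The dictionary excerpts in this thread only show the kinds "opinion" and "qualifier"; the other kinds, their order (the usual English convention), and all names below are assumptions for illustration, not the real implementation:

```typescript
// Assumed ordering of adjective kinds, earliest position first.
const KIND_ORDER = ["opinion", "size", "age", "color", "origin", "material", "qualifier"] as const;
type AdjectiveKind = typeof KIND_ORDER[number];
type Adjective = { word: string; kind: AdjectiveKind };

// Sort adjectives by the position of their kind in KIND_ORDER.
function orderAdjectives(adjectives: Adjective[]): Adjective[] {
  return adjectives.slice().sort(
    (a, b) => KIND_ORDER.indexOf(a.kind) - KIND_ORDER.indexOf(b.kind),
  );
}

// Pick the noun form that agrees with the count, then assemble the phrase.
function nounPhrase(count: number, adjectives: Adjective[], singular: string, plural: string): string {
  const noun = count === 1 ? singular : plural;
  const words = orderAdjectives(adjectives).map((a) => a.word);
  return [String(count)].concat(words, [noun]).join(" ");
}

console.log(nounPhrase(2, [{ word: "red", kind: "color" }, { word: "big", kind: "size" }], "fruit", "fruits"));
console.log(nounPhrase(1, [], "stick", "sticks"));
```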

Inside update (intended for developers):

  • Implement lexer and english AST.
  • Overhaul dictionary: It is now a separate file with nicer syntax as opposed to written inside the code.

https://ilo-token.github.io/

frozen bear
#

mi lape

#

while really it's still only a phrase translator, we have come a long way, and we have necessary ingredients for translating full sentences

lament solar
#

a, o lukin kepeken ilo bun, ona li kepeken ilo pali insa ante

#

ken la ni kin li pana e nimi ante lon pini

#

ken suli nanpa wan la, pali pi ilo ff li pakala ala tawa lipu lawa nasin pi toki js, li ante lili taso

frozen bear
#

mi ni tu lon tenpo kama

gilded venture
#

hi :3 /respectfully

frozen bear
#

toki

gilded venture
#

toks :3

#

so, i tried ilo Token! its actually accurate!!!
i wanna help? but i don know code :(

gilded venture
#

how can i possibly help! :3

frozen bear
#

things followed by # are ignored

#

maybe see how it can be improved

gilded venture
#

nice :D

frozen bear
#

idea: dictionary editor right within the website

gilded venture
#

ah cool!
great idea!

#

but how would you do that?

#

would it be like a text?
you do like:

dictionary: a - scream (n.) screaming (v.)(adj.)

and anybody can edit it?

worthy rivet
#

Github Pages doesn't allow server-side scripting

gilded venture
#

okay

#

what about this:
a editable text thing that saves itself, then ko Koko will put the text into the github thing-a-magic

worthy rivet
#

though is probably convoluted

gilded venture
#

sorry idk alot of code lol

worthy rivet
gilded venture
#

ah okay

#

i get it

#

seems okay :D

frozen bear
#

this would be convenient to test out any modification without touching deno

#

and people would simply just send me the modification if they don't have github

#

plus, an added bonus would be users would be able to add nimisin, although not all nimisin can be added

unreal anvil
#

this is really good, ive been tinkering with it

frozen bear
#

mu

#

this survey might influence ilo Token

frozen bear
#

multiple pi is surprisingly common according to this survey, well actually its "if the meaning is clear, then multiple pi is fine"

#

ilo Token currently blocks multiple pi, I might reconsider that

#

ilo Token is capable of considering many possible ways the "pi" could nest so yeah, the filter/blocker could be disabled

#

this wasn't the case when ilo Token was in early development, that's why it is blocked in the first place

frozen bear
#

so... the question now is how the translator can handle multiple pi

#

you know what, nvm, I'll keep it blocked for now

gilded venture
#

what if, it shows the text connected via arrows?

#

like

frozen bear
#

cool concept

gilded venture
#

thanks

frozen bear
frozen bear
#

it was 10th december, that means ilo Token is now 2 years old, I'm not proud of that

frozen bear
#

will there be more rule-based translators other than this one?

untold linden
untold linden
#

taso mi sona ala e ni: ona li pana e ona tawa lipu seme

#

mi o toki tawa ona o alasa sona

#

ni

#

mi sona ala e pona ona

lament solar
#

mi awen ala pali

#

pali ni li kama suli ala tawa mi

rocky pulsar
frozen bear
frozen bear
#

I find it amazing that this is in toki pona taso. I struggle toki pona tasoing technical stuffs

untold linden
#

ona li sona wawa e toki pona e nasin pi pali ilo la ona li ken ni

frozen bear
#

mi alasa ni lon tenpo kama

frozen bear
rocky pulsar
#

ni li ike seme??

frozen bear
#

I was laughing at their code, idk it felt rude

worthy rivet
lament solar
frozen bear
#

in january, I might go back to working on this

gilded venture
#

cool

#

what if you just replace li with is

like, ona li pana --> he/she/it/they is give/emanate/throw

#

but its not that simple

#

i know, but what if tho

frozen bear
#

you guessed it right, it's not that simple

#

hmmm, I want the output text to be as grammatical as possible

#

I've already established this by reordering adjective so it's big red fruit and not red big fruit. and also by respecting grammatical number so there's no "1 sticks" or "2 stick"

#

also, I want ilo Token to output past, present, and future tenses

gilded venture
#

what if
you take the type of grammar thing (eg. noun, verb) of the first word
and then do the same with the other word
then select which "is" you pick

frozen bear
#

I don't get what you are trying to e

#

say

gilded venture
#

oh right yeah dangit

#

my toki is nasa

#

so like, take the "uri ng salita" thing

#

of both words

#

eg. in "mi leko"
the first word "mi" is like a pronoun
the second word "leko" or the word next to the "li" is like a verb or an adjective (referring to leko)

#

then it does the magic thing
where it selects which type of li (eg. is, are, etc.)
to make:

  1. I am a square
  2. I square (a thing)
  3. I is a square
frozen bear
#

yeah, that's exactly what I'm going to do

#

So there will be:
is/are
was/were
will be

#

there will be also "am" as special case

#

oh, another thing: for "kama" there's "become/becomes". kama is the only content word that translates into an English linking verb
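
One way the is/are, was/were, will be, and "am" choice could be written down. This is only a sketch: the `Subject` and `Tense` types and the `copula` function are assumptions for illustration, not the planned implementation.

```typescript
type Tense = "past" | "present" | "future";

interface Subject {
  person: 1 | 2 | 3;
  plural: boolean;
}

// Picks the form of "to be" that agrees with the subject and tense.
function copula(subject: Subject, tense: Tense): string {
  if (tense === "future") return "will be";
  // "I" and "he/she/it" take was/is (or am); everything else takes were/are.
  const firstOrThirdSingular = !subject.plural && subject.person !== 2;
  if (tense === "past") return firstOrThirdSingular ? "was" : "were";
  if (!subject.plural && subject.person === 1) return "am"; // the special case
  return firstOrThirdSingular ? "is" : "are";
}

const i: Subject = { person: 1, plural: false };
const tenses: Tense[] = ["past", "present", "future"];
console.log(tenses.map((tense) => `I ${copula(i, tense)} a square`));
// ["I was a square", "I am a square", "I will be a square"]
```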

copper fern
frozen bear
#

ooh yeah

#

I missed that

#

glad to know kama isn't alone

frozen bear
#
awen:
  # stay(v);
  remain(linking v); # Thanks to ilo Tani for telling me this is supposed to be a linking verb
  # wait(v);
  pause(v) [object];

  protect(v) [object];
  # keep safe(v);

  continue(v);

  continue(v) to(particle) [predicate v];
#

I love the dictionary

copper fern
#

oooo what kind of syntax is this

frozen bear
#

it's custom!

copper fern
#

that is super cool

frozen bear
#

thanks, yeah I'm proud of it, it makes editing the dictionary easy

copper fern
#

i was planning to just use something like json or toml if i ever wanted to extend my parser to translation

frozen bear
#

I did that at first but defining complicated definitions annoyed me

#

ohh, you're planning on making a rule based translator? that's nice!

#

by complicated definitions I mean something like this

olin:
  have(v) strong(adj opinion) emotional(adj opinion) bond(n singular) with(prep) [object];
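
To make the shape of such a definition concrete, here is a toy reader for a single line of it. It is a guess at the structure based only on the snippets above, not the real dictionary parser: it handles single-word `word(tags)` units and `[placeholder]` slots, and ignores comments and multi-word entries. `parseDefinition` and the `Token` types are hypothetical names.

```typescript
interface WordToken {
  kind: "word";
  text: string;
  tags: string[];
}
interface PlaceholderToken {
  kind: "placeholder";
  name: string;
}
type Token = WordToken | PlaceholderToken;

// Splits one definition into tagged words and bracketed placeholders.
function parseDefinition(source: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /\[([^\]]+)\]|([^\s()\[\];]+)\(([^)]*)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ kind: "placeholder", name: match[1] });
    } else {
      const tags = match[3].split(/\s+/).filter((tag) => tag !== "");
      tokens.push({ kind: "word", text: match[2], tags });
    }
  }
  return tokens;
}

console.log(
  parseDefinition(
    "have(v) strong(adj opinion) emotional(adj opinion) bond(n singular) with(prep) [object];",
  ),
);
```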
gilded venture
#

hey, for musi purposes
for "mi li (something)"
it translates to "I is a (something)"

but dont actually dont do this
its for musi purposes

frozen bear
#

I've been thinking of easter eggs

#

ilo Token currently does the boring way by calling it an error! error messages provided by telo misikeke

#

that's a fun suggestion, I might actually consider that

#

"don't actually don't do this"
I don't get this lol

worthy rivet
worthy rivet
gilded venture
#

also i thought you'd dont like the joke i made lol
my language skills are like uhhh.... idk, random

#

i also thought that you'd put it as an actual feature
and the joke flew above you

#

sorry!!!

gilded venture
#

frozen bear
#

"la" is a very versatile particle

#

it can be preceded by preposition "kepeken ilo la mi moku"

#

it can be preceded by nanpa construction "nanpa wan la mi kama lape ala"

#

ilo Token currently doesn't recognize the latter

copper fern
#

ooh my parser also doesn’t have the nanpa wan thing

#

that is a good catch

frozen bear
#

I just realized it after making this meme #toki-ale message

#

wait, I can forward

narrow goblet
#

may i suggest adding certain translations for certain phrases
e.g. toki a is translated as hello, hi

frozen bear
#

the dictionary strictly adheres to lipu Linku. ilo Token doesn't have "hello" because Linku doesn't have it

frozen bear
#

hmm, if you mean translating "suno pona" into "good morning", that runs the risk of lexicalization, we don't want that

#

translations should ideally be as bare as possible

gilded venture
#

huh

💥

frozen bear
#

that's because mu doesn't have noun definition

#

yeah that's a problem

gilded venture
#

ah ok

#

what if, animal vocalization (n.)

dense lichen
#

list as many mus as possible

frozen bear
#

that's what I did lol

#

just putting "mu" will make ilo Token spit all definitions

#
- (will) bark(ed)
- (will) cough(ed)
- (will) roar(ed)
- (will) hum(med)
- (will) quack(ed)
- (will) his
- (will) buzz(ed)
- (will) growl(ed)
- (will) yawn(ed)
- (will) screech(ed)
- (will) chirp(ed)
- (will) gobble(d)
- (will) purr(ed)
- (will) honk
- (will) burp(ed)
- (will) chomp(ed)
- bark
- cough
- roar
- hum
- ow
- quack
- hiss
- buzz
- growl
- yawn
- woof
- screech
- chirp
- hoot
- moo
- hiccup
- gobble
- purr
- baa
- honk
- tweet
- ouch
- meow
- burp
- chomp
- ribbit
- achoo
#

oh, that's bad, the automatic conjugation didn't work on some verbs

#

oh whoops, I didn't know this would flood the chat

dense lichen
#

well it's your thread lol

frozen bear
#

what does linku say about this

sour ventureBOT
#
mu
usage

core (pu)

definition

(animal noise or communication, onomatopoeia)

frozen bear
#

hmm

copper fern
#

it’s still the main one

#

mu is animate, but using it for inanimate things allows you to frame them in animate ways

frozen bear
#

ooohhh, I seee

copper fern
#

if you were to refer to the mu of a waterfall for example

#

rather than just a kalama it frames the waterfall as more animate

frozen bear
#

I see

dense lichen
#

wait omg

#

mu for life

#

so long konwe/aja/ka/what have you

frozen bear
#

hehehe

copper fern
#

pali used to be a lot more of an antonym to moli!

frozen bear
#

for noun definition of mu, I think I'll just use gerund form of all verbs, so "barking", "coughing", etc. this is what I did to other verb definitions

copper fern
#

makes sense!

#

hmm actually

#

“i hear a moo” “i hear coughs”

#

oh “i hear mooing” as a mass noun thing also works yeah

frozen bear
#

"animal vocalization" also works I guess

#

"I hear animal vocalization"

#

kinda eh, but I mean olin is "I have strong emotional bond with you"

#

"I respect you"

#

no "love" as linku dictates

#

hmmm

#

"animal vocalization" is better if I want to reduce the definition count

#

reducing definition count is nice, just imagine if someone input a complex sentence

dense lichen
frozen bear
#

"mu" is actually currently an odd ball with that definition count

#

ehh, idk

west vectorBOT
frozen bear
#

lemme pin this so I remember

#

on another topic, I've been thinking of adding some flairs to ilo Token, nothing serious, just something fun, I've been thinking maybe splash text minecraft style, so I have prepared these:

  • not Globse!
  • context? who??
  • 20 positive qualities!
  • I know the rules, and so do... the ilo Token maintainers!
#

I'm a master of procrastination

gilded venture
#

jan Jhon

frozen bear
#

that works as expected

gilded venture
#

wait, it's supposed to do that?

#

cool

frozen bear
#

what do you think is wrong with it?

gilded venture
#

oh, i thought it will display an error message saying
"The word "Bisaya" is an untokiponized name"

#

or something

#

or
"The word "wuwojiti" is a word that violates toki pona phonology"

frozen bear
#

names can be untokiponized
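
Since names are allowed to stay untokiponized, ilo Token accepts these inputs. For comparison, the kind of check gilded venture had in mind could be sketched as below. It assumes the commonly described rules (syllables of the form (C)V(n), a bare vowel only word-initially, no ji/ti/wo/wu, no n before m or n). `followsPhonology` is a hypothetical name and is not part of ilo Token.

```typescript
// Syllable shape: optional consonant + vowel + optional coda n (not before m/n).
// Only the first syllable may start with a vowel.
const SYLLABLE_SHAPE =
  /^[jklmnpstw]?[aeiou](?:n(?![mn]))?(?:[jklmnpstw][aeiou](?:n(?![mn]))?)*$/;
// Consonant-vowel pairs that the phonology rules out.
const FORBIDDEN_PAIR = /ji|ti|wo|wu/;

function followsPhonology(word: string): boolean {
  const lower = word.toLowerCase();
  return SYLLABLE_SHAPE.test(lower) && !FORBIDDEN_PAIR.test(lower);
}

console.log(followsPhonology("Inli")); // true
console.log(followsPhonology("Bisaya")); // false ("b" and "y" are not in the alphabet)
console.log(followsPhonology("wuwojiti")); // false (wu, wo, ji, ti)
```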

gilded venture
#

ah ok
that's cool

frozen bear
#

maybe this needs filtering (the current version of ilo token doesn't support UCSUR)

#

it's kinda cursed

gilded venture
#

BijosamaYa

#

hehehehe

frozen bear
#

it supports ucsur but the rendering is broken

gilded venture
#

yea

frozen bear
#

I'M FREE!

#

I have less responsibilities now

#

maybe I have time now for ilo Token

frozen bear
worthy rivet
#

but since my head is exploding I will stop for now

frozen bear
#

thank you so much for taking your time. I'll look at this tomorrow

worthy rivet
#

WAIT

#

WAIT

#

@frozen bear MAGICAL DEBUGGING UTILITY

#

I just came up with it

#
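var dump = "";
// Wrap map and flatMap so that every result they return is logged to `dump`.
["map", "flatMap"].forEach((pn) => {
  const old = Array.prototype[pn];
  Array.prototype[pn] = function (...args) {
    const ret = old.apply(this, args);
    dump += JSON.stringify(ret) + "\n";
    return ret;
  };
});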
#

then run the two dumps on different browsers through a diff utility

#

wait

#

hmm

#

slight change

#

Looks like even the amount of times different browsers call map and flatMap are different.

#

Interesting.

worthy rivet
frozen bear
#

okay, I'm onto something now, somewhere things got reversed

#

my hypothesis: it is due to [].sort(), ilo Token relies on it being stable, firefox implementation might be unstable

#

although mdn says otherwise

#

so we're still searching what's the cause

frozen bear
#

it is sort...

#

ugh

#

well I'm not too sure yet

#

let's see if using custom sort will fix it

#

actually, lemme verify it first

frozen bear
#

OMFG

#

OMFG

#

turns out it was me

#

I've been blaming firefox lmao

#

but it's still weird that it caused a runtime-dependent bug

#

so the fix? just two characters

-      .sort((clause) => {
+      .sortBy((clause) => {
#

I might have discovered a way to detect what runtime you are using, and hence what browser you are using, of course there's user-agent but that can be spoofed. uhh is this a potential privacy issue?

#

I just reproduced it

let isFirefox = [false,true].sort(() => 1)[0];
#

so we're feeding the sort function an invalid comparison function

#

of course the resulting order is going to be implementation-defined
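
For contrast, a sketch of the safe pattern. `Array.prototype.sort` expects a two-argument comparator that returns a negative number, zero, or a positive number; handing it a one-argument key function means the return value says nothing consistent about ordering, and engines are then free to disagree. The helper below is a hypothetical stand-in written for this example, not the `sortBy` that ilo Token actually switched to.

```typescript
// Sorts a copy of `items` by a numeric key, using a proper comparator.
function sortByKey<T>(items: readonly T[], key: (item: T) => number): T[] {
  return [...items].sort((a, b) => key(a) - key(b));
}

const clauses = ["mi moku", "mi", "mi moku e kili"];
console.log(sortByKey(clauses, (clause) => clause.length));
// ["mi", "mi moku", "mi moku e kili"]
```

Because the comparator only looks at the key difference, equal keys compare as 0 and keep their original relative order on engines with a stable sort.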

#

pinging @worthy rivet you may wanna see this lol

gilded venture
#

wo

frozen bear
#

I could jump right in finally implementing an actual sentence translator. but I'm going to procrastinate, I wanna build the custom dictionary first

#

I've been trying to minimize the distribution code, but if we're having a custom dictionary editor, there will be an npm dependency bundled in the dist code. it is a package called compromise and it is used to inflect nouns and conjugate verbs. I wonder how huge the dist code will be

#

before, conjugation was part of the build process and hence it wasn't in the dist code

frozen bear
#

fun fact, the toki pona parser and the dictionary parser use the same handwritten parser library

frozen bear
#

OMFG what I did is scuffed

#

so, the reason I used deno is I wanted to be able to test the code quickly without running the browser

#

but there are limited bundler options in deno due to how new it is

#

oh actually, bundler is unrelated

#

so, the npm version of compromise relies on "node:process" for some reason

#

wait, the bundler is related

frozen bear
#

i guess it works fine if it is undefined

#

so I did this, I specified this in the import map

"node:process": "./process.ts",

and process.ts is this

export const __Process$ = undefined;
#

its fucking scuffed lmao

#

of course, the import map is separate from what deno is actually using, so deno can use "node:process" just fine

#

okay, I can no longer just run the html file as the code now has fetch, lemme do something first

#

lemme test it first

#

it works 🎉

#

going to commit this now

#

the dist code main.js is now 1346 kB

#

it's unminified

frozen bear
#

maybe when it's gzipped it's smaller, but idk

#

anyway, now onto making custom dictionary editor

frozen bear
#

hmm, I need a way to display errors

worthy rivet
#

just add a line to tsconfig.json

frozen bear
worthy rivet
frozen bear
#

I didn't know that

frozen bear
#

well, I did build a script that auto-builds the dist code whenever there are changes. it doesn't auto-refresh the browser but it's good enough for me

frozen bear
#

I'm doing everything to not use node.js lmao

worthy rivet
frozen bear
#

this is becoming a deno shilling thread. with node.js there are too many configs, npm sucks, I hate dependency hell. I love being able to deno run without bothering to build the frontend. deno run was heavily used during the parser rewrite

#

deno run doesn't need transpiling!

frozen bear
worthy rivet
frozen bear
#

okay, I'm going to stop this here, there's no point in arguing, you're not going to convince me, I'm not gonna convince you

#

let's not get too off topic

frozen bear
#
ti:
  <script>alert`(`"code injection"`)`<`/`script>(n);

me when code injection

#

I need to sanitize it

#

the custom dictionary I mean
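
A minimal sketch of one way to neutralize that input, assuming the definitions end up inserted as HTML somewhere. `escapeHtml` is a hypothetical helper written for this example; how ilo Token actually renders its output is not shown in the thread.

```typescript
// Replaces the characters that HTML treats as markup with their entities.
// "&" must be handled first so later entities are not escaped twice.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<script>alert("code injection")</script>'));
// &lt;script&gt;alert(&quot;code injection&quot;)&lt;/script&gt;
```

In a browser, assigning the text to `textContent` instead of `innerHTML` avoids the problem without any manual escaping, since the string is then never parsed as markup.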

gilded venture
#

wtf

#

is neverrare just nit here?

sly prism
frozen bear