#(CW:AI) Discussion of LLMs and other kinds of text generators ("AI"), especially of toki pona
1 messages · Page 2 of 1
a lot of this was AI translated, and I looked over it and attempted to proofread
it actually understands previous things said in the conversation
and how things are translated
..it translated "The World Machine" as "ilo Weka".. due to things it knew about the LORE OF THE GAME????
"mi pali e ante lili tawa toki ni. nimi "Magpie" li kama nimi "Kasa" tan ni: nimi "Kasa" li sama nimi "kasapilin" (magpie)."
is that a word
ala
I haven't read the whole thing - but scanning it, it's impressive that it doesn't try to do word-for-word - and makes good decisions like translating (computer) disc as lipu rather than sike.
IT THINKS I’M A MAGICAL POTATO ;-;
I’M NAH PATATO WAWA SEWI, AM KIJETESANTAKALU PATATO :3
So I got the Claude Opus (paid) version and decided to ask it to play a game where it writes out the ABC notation (a special very simple plain-text method of representing sheet music) of a famous (in the US) folk song without telling me the name so that I can try to guess it. It actually did an amazing job of making a cohesive and catchy melody with no obvious mistakes or strange notes. However, I couldn't recognize the tune, and it turned out that it was trying to encode "Oh, Susanna!", which is absolutely not what this tune sounds like haha
A little off-key on some notes and I had to bring the second half an octave down haha
ken la toki mi li lon ala · https://arxiv.org/abs/2309.16575
Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of we...
the simplification is really great
"kulupu Sita li pana e mani mute tawa kulupu Nintendo tan ni: kulupu Nintendo li toki e ni: "o pali ala e ilo musi pi ilo mi."
heh we should put a EULA full of legalese boilerplate through it
do it!
would probably be musi for some toki pona projects to use cc by sa in toki pona
Oh yeah! Great idea! Let me see if I can do it with an iPhone I’m out right now
It refused due to too long
aa
mi sona e ni: lipu suli lawa ni li suli mute. mi ken ala ante e lipu ni tawa toki pona kepeken tenpo lili. mi wile pali mute, taso ni li ken pakala toki. taso mi wile pana e sona ni tawa sina:
lipu Creative Commons Attribution-ShareAlike 3.0 Unported li toki e lawa ni:
kulupu Creative Commons li kulupu lawa ala. ona li pana ala e pona tawa lipu sina.
jan li kepeken ilo ni la, ona li kama lon li wile pali sama toki lipu.
ilo ni li pana e pona tawa jan pali.
jan pali li ken pana sin e pali ona li ken ante lili e ona.
nimi pona li wile lon pali sin.
jan ale li ken lukin e pali li wile ala e lupa lukin.
pali li ken kama lon nasin musi mute li ken kama lon ilo sona.
lipu ni li toki e ijo mute ante kin. taso mi ken ala pana e ona ale kepeken toki pona. mi wile ala ante ike e toki lipu. mi pana e sona lili taso. o toki tawa mi la sina wile sona e ijo ante!
lon · taso tenpo ni la ala · mi kama sin lon tomo mi la mi ken
yeah that might be easier just to have the right context for each thing and not bungle it up
Luckily Claude keeps the context for the entire conversation unlike GPT4 so it’s very possible
It runs out of usage faster as a result in cases with overmuch context but it’s counting on that being preferred to forgetting some context
I'm just thinking if it tries translating multiple paragraphs at once in a more bizzare nasin like legalese, it might overcontextualize paragraph 2 based on irrelevant info from paragraph 1
Haha I hope you keep on making hypotheses until I or someone else has a chance to test it out
Hm! Idea: separate pro-AI Toki Pona server with bot(s) that can converse via API
If such bots don’t exist, we can ask the AI to code them, too 😈
aa
“aa” sina li pona ala pona
ona o pana ala e pilin
n ona o pana e pilin
taso • suli la ona li pona ala li ike ala li pilin taso
ona li “aa” pi pilin meso
mani mute a
sona mi la ilo [Ko] suli taso li toki [pona] pona · nimi wan ale ale ale la ona li wile e mani mute mute mute luka luka luka · ni li ken lonnnn sike esun wan anu seme
ken la mi toki pakala
sina toki e ni anu seme · pali ni li kepeken mani mute li kepeken tenpo mute
ala · ilo sin li wan e ilo [Claude] e ma [Discord] la ona li lili · taso ilo [Claude] li kepeken mani mute
a mi sona
musi a. mi o lukin ale e ona.
It's certainly what GPT-4 even is doing in toki pona, and it explains a lot of its bad syntax. "beginner mistakes" are usually word for word translations from English.
But it's clear that Claude Opus is capable of not doing that somehow
O
rare chatgpt w
ona li toki e "kepeken e" tan seme la, mi sona ala. tenpo pini la, ona li ni ala.
"Comfort me"!
@AnthropicAI Important update:
After further experiments, I realized that my initial claim that Claude Opus does not know Circassian was incorrect. The model is in fact capable of translating to and from Circassian and even conversing in the language, even though with some grammatical flaws.…
m toki Asika en toki pona la ken ona li sama anu seme
ni li ken. This is a different story from what we originally thought. But as he says, it doesn't take much away from what Anthropic have achieved with this model. The amount of Circassian in the training data is miniscule. But it's enough to be able to prime it for understanding prompts in that language. There has to be more toki pona in its training data. But probably not more proportionally than GPT-4 has, and it's vastly ahead of that model in ability.
well lol i already knew that i have spent quite a bit of time looking at and translating circassian content online
(on reading this)
I just discovered that Claude Opus is a much better conversation companion when discussing fictional universe lore than ChatGPT 4. GPT4 is always trying to tie it back to reality and is always hedging itself by reminding me that there's no way to know what the author really intended or that it's just not explained, while Claude Opus seems game to get super speculative and nerd out with me about possible interpretations
a lon sewi la sina toki e ni anu seme?
mi sona ala
but like there is a decent amount of circassian text on the internet
there are a million ish speakers at least
less still in russia and using the cyrillic script but still
The only completely objective experiment: invent a new conlang, feed Claude only translation pairs, and see what it makes of it
circassian speakers do in fact use the internet i mean they live in russia its not like a third world country
a. So, unreasonable to expect there was nothing in the training data.
yeah
i think its like 500,000 ish speakers of specifically the dialect that was being tested but might be wrong
apparently 650,000 im not too far off lol
The initial impression, amplified by each retelling, was that no one knew about this language.
I'm wondering what it can do for endangered Native American languages. I'm in a research group on humanities and AI, and wonder whether I should suggest to my NA colleague (very involved in language preservation) that Claude might be useful? But another member of the group wondered whether Claude would really help to preserve the language in its own terms, or impose European grammatical categories onto it.
no one knew about this language
not many english speakers do!
i do obv lol
i mean anyone with an interest in linguistics has at least heard of circassian even if only because "adyghe" is listed on every phonetics wikipedia page ever
mi ken alasa kepeken toki Kalamang. toki ni li lili mute li lon lipu ilo ala a. jan pi mute lili li toki kepeken ona. jan pi alasa sona li toki tawa kulupu ona li lipu e sona mute. tenpo poka la jan alasa li pali e alasa sin ni: ilo en jan li wawa ala wawa sama lon ni: ona li lukin e lipu sona ni taso la ona li ken kepeken toki Kalamang.
ni li alasa pona
sina lukin a e ni: ona li pana lili e nasa pi toki inli tawa toki pona. taso, ni li suli ala suli ike...?
Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of we...
li lon lipu ilo ala a
"Potential contamination. Because MTOB’s test set is constructed from data that was published on the web,"
:p
(awen la toki li lili la ilo li ken la ilo li wawa)
lon...
pilin sina la ilo ni li wawa nanpa wan lon toki pona anu seme?
ni
[ilo penpo o lukin ala]
Adapt the following text to Toki Pona, making sure to change things as needed to make it sound natural: Exo is a contemporary geometric sans serif typeface that tries to convey a technological/futuristic feeling while keeping an elegant design. Exo was meant to be a very versatile font, so it has 9 weights (the maximum on the web) each with a true italic version. It works great as a display face but it also works good for small to intermediate size texts.
Here is my attempt at adapting the text to Toki Pona:
sitelen Ekiso li sitelen sin pi nasin nanpa. ona li wile pana e pilin pi tenpo kama e pilin sona. taso ona li pona lukin kin.
sitelen Ekiso li ken pali mute. ona li jo e suli pi linja mute. ona li jo e nasin linja tu pi suli ale. ona li pona tawa sitelen suli. ona li pona kin tawa sitelen lili.
English translation:
Exo writing is new number-style writing. It wants to give a feeling of future time and a knowing feeling. But it also looks good.
Exo writing can do a lot. It has many thicknesses of lines. It has two line styles of all sizes. It is good for big writing. It is also good for small writing.
Notes:
- I replaced "geometric sans serif typeface" with "writing of number style", since Toki Pona doesn't have direct terms for font classifications.
- "technological/futuristic feeling" became "feeling of future time and knowing feeling"
- "elegant design" simplified to "looks good"
- "versatile" changed to "can do a lot"
- "9 weights" became "many thicknesses of lines"
- "true italic version" adapted to "two line styles"
- "display face" and "small to intermediate size text" simplified to "big writing" and "small writing"
The result captures the general meaning while sounding more natural in Toki Pona's minimalist grammar and limited vocabulary. Let me know if you have any other questions!
nnn. mute li weka. sina pilin seme? (kin ona li pakala kepeken nimi pi. ona o toki e "suli linja mute")
ken la ona li weka e mute tan ni → toki mi la ona o pona e toki tawa nasin pi toki [pona]
[ilo penpo o lukin ala]
Translate the following text to Toki Pona, making sure to stay accurate to the original: [...]
Here is the translation of the given text into Toki Pona:
sitelen Eso li sitelen pi tenpo ni. ona li jo ala e linja sike. ona li lukin sama ilo pi tenpo kama, taso ona li pona lukin kin. sitelen Eso li ken pali ijo mute. ona li jo e suli luka luka tu wan (suli nanpa suli lon lipu sona). ona ale li jo e ijo nasin ante pi lukin pona. ona li pona tawa sitelen suli. ona li pona kin tawa sitelen lili tawa sitelen suli lili.
Here's a breakdown of the translation:
- sitelen Eso = Exo font
- li = is/are (subject marker)
- sitelen pi tenpo ni = font of current time, contemporary font
- ona = it
- li jo ala e linja sike = doesn't have curved lines, geometric
- li lukin sama ilo pi tenpo kama = looks like a tool of future time, conveys technological/futuristic feeling
- taso = but
- li pona lukin kin = is good looking also, keeps an elegant design
- li ken pali ijo mute = can do many things, is very versatile
- li jo e suli luka luka tu wan = has size 9 9 2 1 (literal translation of 9), has 9 weights
- suli nanpa suli lon lipu sona = biggest number size on web page, maximum on the web
- ona ale li jo e ijo nasin ante pi lukin pona = each of them has a thing of different style of good appearance, each with a true italic version
- li pona tawa sitelen suli = is good for big text, works great as a display face
- li pona kin tawa sitelen lili tawa sitelen suli lili = is also good for small to medium size texts
Note that due to Toki Pona's minimal vocabulary and grammar, some of the technical details and nuance of the original text are lost in translation. The translation focuses more on conveying the general meaning and key points.
😭
ugh
ni li ilo Claude Opus anu seme?
mi kin li pana e toki sina tawa ilo Claude. ona li toki nasa. mi pana sin e toki sama. ona li awen nasa.
[ilo penpo o lukin ala]
explain what tis-100 is, then come up with a toki pona translation of the full title that matches the acronym
TIS-100 is a programming puzzle game developed by Zachtronics. In the game, players take on the role of a programmer working on the titular TIS-100 computer, which is an imaginary 1970s-era microcomputer. The objective is to create programs that solve various challenges using a unique assembly language specific to the TIS-100.
The game presents a series of programming puzzles that require the player to optimize their code to achieve the desired output while working within the constraints of the TIS-100's limited resources, such as memory and processing power. The puzzles become increasingly complex as the game progresses, testing the player's problem-solving skills and understanding of low-level programming concepts.
TIS-100 is known for its challenging gameplay and retro-inspired aesthetic, which includes a minimalist user interface and a fictional technical manual that provides context for the puzzles.
Now, let's come up with a Toki Pona translation of the full title that matches the acronym:
T - tomo (meaning "indoor space, home, room, building")
I - ilo (meaning "tool, device, machine")
S - sona (meaning "knowledge, wisdom, intelligence, understanding")
100 - Toki Pona doesn't have words for specific numbers, but we can use "ale" (meaning "all, everything") to represent a large or complete amount.
So, a possible Toki Pona translation of TIS-100 that matches the acronym could be:
Tomo Ilo Sona ale
This translates to something like "The complete home of machine knowledge" or "The building of all device wisdom," which captures the essence of the game as a comprehensive programming puzzle experience within a virtual computer system.
sama wile mi a · taso mi la "tomo ilo pi sona ale"
Is this image creation? Or translation?
pali sitelen

oh nvm this was already brought up
I can't keep buying more Pro subscriptions to try these all out!
so uh
apparently you can just use it for free?
i just went to https://aistudio.google.com/
and i dont have a pro subscription...
weird restrictions tho, cant seem to name me a restaurant, ive rephrased 3 times and it still wont
@hazy apex
ona li toki e ni lon ma ni: mi tu taso li lon
sina wile pana e sona tawa ilo la, sina ken ala ken pana e lipu tawa ona?
pilin mi la, ni li wile e mani
ken!
taso lipu ni (needs to be txt as far as I can tell)
No idea why I have access without paying tbh, they'll probably fix it soon
n ken suli la ni li awen wile ala e mani tan ni: kulupu kuko li wile kama e kepeken
:0
@earnest frost o, here is the AI thread. Here's a quote on why people want the conversation here, please accommodate it if you're going to continue a conversation about AI:
These are the various reasons certain brains do not want to see AI-generated images.
If a user has:
• …paranoia, ||these images can serve as a reminder that technology can create fake faces, and that verifying an image, face, or person online is impossible||.
• …a type of dissociative disorder, ||the "uncanny valley" aspect of these images can trigger feelings of dissociation and/or derealization, which both make the lucid world feel similarly uncanny||.
• …a type of psychosis, ||unreality and the uncanny valley are both very common psychosis triggers — it does not help a brain that cannot tell shared reality from its own perceptions to see something that makes its owner doubt their own reality. And especially if a user has:||
• …a type of delusion(s), ||many common delusions center around people and faces being purposely fake or impostors, the replacement of humans, and the power of technology; and even more can be triggered by uncanny content regardless.||
• …one or more phobias such as trypophobia, necrophobia, scopophobia, etc., ||due to the way AIs build these images out of shapes and colors, they often have clusters, corpselike imagery, and unnerving eyes or eye contact.||
And these images can just be creepy and bizarre in general, too — and of course there are brains that don't like that regardless of psychology.
#da53fc whyai
the 'conversation' ended in the other channel already :)
a, pona
ilo li moli
o toki lon #jaki
ilo li moli ala li lukin ala tomo ni
ni li toki musi
ken ala 
o pana e lipu mute tawa ona la ken la ona li kama ken
oof
sry gimmie a few days to fix it
desiti
nimisin li kama. jan ale o sona
“pilin pi jo e kon ma lon en wawa”
It's truly terrible.
Gives me an idea to program a GPT bot and set it loose in #learn-toki-pona-1 . See if anyone can school it into better linguistic behavior!
You'd never see Claude Opus saying something like that
mi la sina wile pona wawa e ilo la o toki taso ala tawa ona o pona e insa ona
ken la mi o ni · pali mi la ilo wawa li lon li ken pana e sona tawa ilo toki pi suli lili (7B)
taso n · toki o tan seme · mi wile ala moku e ale li wile ala ike e pilin pi jan toki · taso toki pi mute lili a li lon
a taso mi ken ni → mi pana e toki lili tawa insa ilo · ilo li pana e toki mute mute mute · mi pana e toki ilo tawa ilo [telo misikeke] · toki li ike tawa ilo ni la mi pana e sona ni tawa ilo pali · kin ken la mi pona e toki ona li pana e toki pona · mi ni sike la ken la pona li kama
mi wile ala kepeken ilo [Claude] tawa ni tan wile nasa mi (tan mani kin)
a sina sona ala sona e nasin ni? mi sona ala e ona.
sina ken pana e toki pi lipu tenpo. jan pali li ken e ni.
taso sina o pana e nimi pi lipu tenpo lon poka pi ilo sona.
ni li lawa jo ona
sona lili
ilo ni li ken pona https://github.com/OpenAccess-AI-Collective/axolotl
jan pali anu jan lawa
jan lawa. mi pakala.
n
mi ken ala kepeken ala la mi kepeken
"6 gigabytes of instruction tuning text data" 😭
a ona li kepeken ilo pi ante toki
musi sona a!
suno's ai generators actually do a decent job with toki pona besides the j
where it still says it like how it is in english
https://app.suno.ai/song/65b7301f-d775-4cd6-8ee4-8136acf18fd5/
also it generated this when i was trying to make a sad song and it's kind of funny 😭
futurepop, jpop, fast, glitchy melodies, vocaloid, toki pona, miku, 250 bpm song. Listen and make your own with Suno.
suno still creeps me out too much (way in the uncanny valley). I've seen someone post a toki pona song, where they just manually edited all the "j"s to "y"s, and it got the pronunciation correct.
i get that a lot with images
if it's just slightly off ||i literally start shaking and i want to look away||
That's why we're still keeping images even out of this thread. But I think music, being text-based, is OK.
ah
i feel like ai is a lot more of a "fun thing to mess with" as opposed to like
what you want to actually use instead of hiring someone
i wonder if it was trained on some toki pona songs.. which would be horrible and depressing
i guarentee a lot of it was without permission at all
and trained off of copyrighted works
I gotta say, trying it for the first time, it is kind of astonishing: https://app.suno.ai/song/4e72bbb0-9912-4937-9c93-1b7a1c5de5d4
slow, folk song, lullaby, male singer, acoustic guitar song. Listen and make your own with Suno.
Maybe? Perhaps it vacuumed up all music on YouTube and Bandcamp, which is where toki pona music would be
it's so impressive how it can do this.. but i feel like the music thing is even more telling of where it's inspired from
i can't find it but i found one where it pretty much copied something from an already existing song
Miku voice, Vocaloid, Egypt Style, Hardcore, speak fast, dark song. Listen and make your own with Suno.
holy shit 😭 HOW
when you get it to work correctly it works SO GOOD
and i thought ai music was like super terrible a couple months ago??
We're hearing a lot of this for the first time. I bet if you listened to 30 in a row, it would begin to feel hollow - just as a lot of AI-generated text begins to feel that way.
i know
mi nanpa e toki pi ma [Tatoeba] e ni pi toki [🇺🇸] sama e ni pi toki [🇩🇪] sama e ni pi ni anu ni e ni pona
(suli la mi ni kepeken ilo [TatoebaTools] lon poki [Colab] pi esun [Google] · taso nanpa pona li lon ala ona la mi kama jo e lipu tan ma [Tatoeba] li nanpa e ona)
nanpa la · mi kepeken toki poka tu taso la toki li awen mute · taso mi kepeken toki pona taso la toki li kama lili a
ken la mi ken kepeken toki pi jan pona taso · taso jan sama li ken pona lon tenpo li ken ike lon tenpo · (mi ni a tan sin mi lon open pi pana mi)
wawa a. taso sina wile pana e sona tawa ilo la, ni li lili a. kin la, lukin mi la, toki pona pi ma Tatoeba li ken nasa.
mi sona ala e ni: sina ken pana e toki pona taso tawa ilo anu seme? sina wile ala wile e toki Inli kin pi toki pona ni? wile ala la, sina ken kepeken lipu tenpo ale.
mi wile pana e toki [Inli] lili la ilo li ken kama sona pona e kon nimi · taso mi pana e lipu ante kin pi toki poka ala
it doesn't want to speak to me, maybe I need more tp text
It worked!
la toki pona, toki pona, toki pona
Watch your back, Claude
ma Ponosan
"Eka"
"Metalocalypse"
"Jan nata, nata, nata"
"Pilin toki pona"
"Mokta, mokta, mokta"
"Metalocalypse, metalocalypse, metalocalypse"
"Lukto, lukto, lukto"
"Toki pona, toki pona, toki pona"
jan Adam li pona ala pona?
"Jan Adam, Jan Adam, Jan Adam, Adam"
"Jan Adam, Adam, Adam, Adam, Adam, Adam"
"jan Adam, jan Adam, Jan Adam, jan Adam"
"jan Adam"
"Jan Adam"
"Jan Adam"
[Jan and Adam are getting along!]```
I ruined it wiþ Toki Pona.
"Yes, my name is Grug, and I'm from Kyrgyzstan. I love my home. I have many friends. We speak Toki Pona. It is a beautiful language. I love learning new languages. I am a child of two cultures. I am Russian and Kyrgyz. I love both languages. I speak Toki Pona fluently. I can even sing in Toki Pona. I try to learn a new song every day. Language is a gift. It is a tool. Every language has its own beauty. Languages can bring people together. Languages can tell a story. Music is a universal language. We use it to communicate. Music can lift your spirits. It can heal. It can make you feel alive. Music is important. It's everywhere. Friendship is the most important thing."```
I þink it is trying to say jan Natan - Naþan Explosion
ona li tan ma Kekasan ✍️⭐👉
lon a
I've played a little with NovelAI, and never had any luck with languages. It's trained narrowly but deeply in English and Japanese (I think) literature. It does really well in that zone, not at all well out of it
i like to interpret CW:AI as the name of the thread
mu mije
Chatting With: AI
lon a
mi toki tawa ijo Eja (Nasi pi kiwen ike)
muje
Þat’s a first sentence
pona kin!
||it speaks toki pona better than you at least|| i didnt meant to be rude like this sorry
||though it almost seems like it gets worse over time to match your style||
fbjshfisjjc
sina wile pakala e pilin jan anu seme 💀
anpa wawa
aaaa lon la ala...
mi sona mi sona
aa mi pakala. mi pali e toki la mi sona e ni: ona li sama utala. ni la mi pana tan seme...
mi wile toki e seme...
sina ken weka e toki sina pi tenpo pini anu seme
mi sona ala ¯_(ツ)_/¯
sina wile toki ike tawa jan sin la, o toki pona e ike sina. ni la, sina ken toki e wile sina; taso jan sin li ken ala sona pona e ona.
😭 A A A
musi
mi tawa li sitelen e ni, li pakala. taso mi lukin ala e pakala.
i asked gpt 4 “tokiponize the following: croatia, joe biden, wuwojiti, tindore”
and it returned
“In Toki Pona, proper names are typically adapted to fit the phonetic and structural constraints of the language, often using an approach that seeks to capture a simple, recognizable approximation of the original name's sound. Since Toki Pona's phonology is limited and does not include certain sounds present in many languages (like the sound of "c" in "Croatia," the "j" in "Joe," or the "r" in "Biden"), adaptations are necessary. Here are possible Toki Pona adaptations for the names you've provided:
- Croatia: "Kolasija" or "Kolati"
- Joe Biden: "So Piten" or simply "So"
- Wuwojiti: As "Wuwojiti" doesn't resemble a widely known name or term, and its phonetic compatibility with Toki Pona isn't clear, a direct transcription that fits might be "Wuwojiti" itself, adjusted for phonetics: "Wuwojiti." However, if it's an invented or less commonly known term/name, the adaptation process might prioritize a closer phonetic approximation or even a meaningful translation if the term has a known meaning.
- Tindore: Could be adapted as "Tinole" or "Tinote"
These adaptations take into account Toki Pona's preference for simple, CV(n) (consonant-vowel(nasal)) syllable structures, its limited set of permissible sounds, and its overall philosophy of simplicity and minimalism. Remember that Toki Pona also encourages creative use of its limited vocabulary to convey meanings, so there might be other valid ways to adapt or represent these names based on their perceived characteristics or the context in which they're used.”
it doesnt know wuwojiti or to translate words from the language they originate in, and for some reason it doesnt turn dʒ into j
i asked it to tokiponize germany thinking it might know ma Tosi since its more commonly talked about than croatia but it returned this
“Germany in Toki Pona could be transliterated as something like "Sewi" (from "Siamani", a common adaptation), reflecting its name in a simplified, phonetic form according to Toki Pona's minimalistic phonology and vocabulary. Remember, Toki Pona aims at simplicity and approximation rather than precise or official naming conventions.”
Germany > Siamani > Sewi is wild
why would dɮ turn into j
also where is dɮ in any of those words?
joe biden
joe joe siwa?
i tried to talk toki pona to the AI i have on my pc lmfao it's stupid, i didn't expect much
i mean it doesn't even work well when speaking english
💀
pokasi ala
"E maua nuku ki te ao" appears to be Māori?
Haha that’s somehow profound
Heh I can’t help thinking of that “tawa [e] tomo” graphic
kepeken llama 3 70b. seems pretty good for an open source model but still not great
not sure why it decided to write that big explaination of toki pona since all i said was 'toki! sina pilin seme?'
ken mute la ilo li lukin mute e lipu sona pi toki pona li lukin ala e toki
ona li lukin e toki.. lon lipu sona
lonn
sina lukin mute e lipu sona pi toki pona la ni li pana ala e nasin toki pi toki tawa lawa sina
sina wile kama sona e nasin toki la sina o kute mute e toki pi jan ante
o toki mute
taso ilo li ni ala
https://chat.lmsys.org/ has a new model up called gpt2-chatbot which some people are speculating is a test for gpt-4.5?
I tested it out and it seems better than gpt 4 at breaking down toki pona - look at how it correctly notes that "kon pali palisa" is convoluted, whereas generally testing with other models generally it just accepts everything it reads as making sense
oh nice! I'll look forward to that, I still haven't been able to pick between Claude Opus and GPT 4 so if they come out with 4.5 maybe I'll be able to set aside Claude haha
Another example
lmaooo
ni li ike ala. taso ilo li toki Inli ala e toki sina
lon a
We’re so screwed! Robots can now flawlessly translate Toki Pona! ||/j||
lmao
Nauwgh þis is clearly satire
wait that's not bad
I especially liked how it played dumb about not being sure what I mean by "toki pona" exactly, since I was speaking in Toki Pona
aaaaa
‹sine› is a weird typo it made, I haven't seen it do that before
formal register? :P
ilo pi wawa nanpa wan li sona e toki pi wawa nanpa pini 🎉
a a
n sine la sona ale ala
jan kin li pakala lon tenpo
ni li lon
sina kepeken ilo seme tawa ni
mi wile musi sama sina
pona a
pona
mi toki sin tawa ilo la ona li pakala mute
mi toki e pakala pi ilo ni la ona li toki e ni: "a. mi o pona e toki mi" a a a
ChatGPT 4o has better toki pona abilities than 4
That's the new model? I should resubscribe for a month to test it out
let me compare to claude
in some places i prefer the chatgpt 4o choices and in some i prefer claude 3 opus choices
taso ilo li ken ala ken toki [pona] pona
mi alasa ni tawa ilo [Opus] la ona li sama e toki mi li musi ala
mi kute e toki ilo la · n · mi en jan mute li anpa lon tenpo poka anu seme
in my experience claude 3 opus is still way better
claude consistently identifies the source of this text for example
whereas chatgpt fails consistently (on different attempts it also thought this was from 1984 or Dune)
Same, I posted an experiment in the kama sona AI thread with the same findings
Wow, that's some skill from Opus!

good job missus
(the claude not you sorry)
seme a!?
i cam't believe it's being good at toki pona for once... it almost always is completely off base....
(playing a character who's monolingual for toki pona due to circumstances)
(being monolingual in toki pona is a debilitating disease /musi)
more literally i'd read it as "this house is bad. nothing speaks the good language" and i like translating maliciously
and there it goes, being silly again
didn't even spell toki pona right
"time thisthis of i want of number one? 😘"
haha in all seriousness i'm cherry-picking here
also it was having trouble because i meseed up my ownbtoki pona
nini
sitelen pi ilo ni en sitelen pi toki sina li ante musi
I’m surprised kape isn’t in the sandbox on sona Linku haha
kape?!
☕️
supposedly kawa is in a nanpa pi lipu tenpo
Now it is haha
i've only seen it as a tokiponization
also for coffee
i read this
it's kasi Kawa + telo Kawa
- nimi Kawa
that’s what i thought
someone was acting like it was an actual nimi sin the other day lol
or rather i guess turning it into one
yea
not a proper adjective i guess
a word that someone, anyone, has
- used
- said is a word
nooo
(Never to be used again)
pona
sokola li pona kin 🍫
kapesi
pimeja li mi, pimeja li mi
jan li kule sitelen e mi li sitelen e mi
kapesi
sitelen li mi, sitelen li mi
ona li wile ala e ni
taso ona li pona mi
-kapesi Pake
a sina pona! 👴
Yup 4o is what was being trialed as gpt2-chatbot
this is 4o I believe
Though it's unclear how distinct the variants were
that makes sense. on lmsys, they're listed separately, and in my testing, gpt2-chatgpt is better (though i haven't really used gpt4o extensively yet)
ELI5 (explain like I'm 5) what makes AI art/ discussion warrant a content warning?
These are the various reasons certain brains do not want to see AI-generated images.
If a user has:
• …paranoia, ||these images can serve as a reminder that technology can create fake faces, and that verifying an image, face, or person online is impossible||.
• …a type of dissociative disorder, ||the "uncanny valley" aspect of these images can trigger feelings of dissociation and/or derealization, which both make the lucid world feel similarly uncanny||.
• …a type of psychosis, ||unreality and the uncanny valley are both very common psychosis triggers — it does not help a brain that cannot tell shared reality from its own perceptions to see something that makes its owner doubt their own reality. And especially if a user has:||
• …a type of delusion(s), ||many common delusions center around people and faces being purposely fake or impostors, the replacement of humans, and the power of technology; and even more can be triggered by uncanny content regardless.||
• …one or more phobias such as trypophobia, necrophobia, scopophobia, etc., ||due to the way AIs build these images out of shapes and colors, they often have clusters, corpselike imagery, and unnerving eyes or eye contact.||
And these images can just be creepy and bizarre in general, too — and of course there are brains that don't like that regardless of psychology.
#da53fc whyai
o lukin e ni ↑
fair 🤷♂️
These are the various reasons certain brains do not want to see AI-generated images.
If a user has:
• …paranoia, ||these images can serve as a reminder that technology can create fake faces, and that verifying an image, face, or person online is impossible||.
• …a type of dissociative disorder, ||the "uncanny valley" aspect of these images can trigger feelings of dissociation and/or derealization, which both make the lucid world feel similarly uncanny||.
• …a type of psychosis, ||unreality and the uncanny valley are both very common psychosis triggers — it does not help a brain that cannot tell shared reality from its own perceptions to see something that makes its owner doubt their own reality. And especially if a user has:||
• …a type of delusion(s), ||many common delusions center around people and faces being purposely fake or impostors, the replacement of humans, and the power of technology; and even more can be triggered by uncanny content regardless.||
• …one or more phobias such as trypophobia, necrophobia, scopophobia, etc., ||due to the way AIs build these images out of shapes and colors, they often have clusters, corpselike imagery, and unnerving eyes or eye contact.||
And these images can just be creepy and bizarre in general, too — and of course there are brains that don't like that regardless of psychology.
#da53fc whyai
a ilo pona 😮
I think, beyond the fact that people find it offensive: it is so easy to generate vast amounts of it. The regular channels could get overwhelmed with a deluge of AI shit. At least here, things should be posted as a focus of discussion
I disagree with this as a reason. I’m in servers that don’t have a restriction but they don’t get overrun by AI content.
Also we don’t enforce content warnings on other content that could start to feel spammy if not properly moderated—we just moderate the actual spam
I think if AI didn’t make people uncomfortable we would treat it similarly—if someone posted AI stuff so frequently that it got annoying we would just tell that one person to stop
I think this AI moderation is an example of why this server is now expected to be archived
Simple over-governance in the name of being pona to people with mental illness (Hi all 👋 )
Just to clarify since I got a "seme"
This CW is completely unnecessary and frankly insulting to people with paranoia, dissociative disorders, whatever the heck y'all's reason is.
We are not fragile little marshmallows and silly pandering rules like "CW for AI because mentally ill people can't handle it" is not conducive to an open community and only makes people like us look pathetic
#toki-ale message
^^ Like this is the UMPTEENTH time I've been told to stop talking or move my chat, and y'all wondering why people are frustrated with the mod team (this wasn't even a mod, it's the Orwelian "watch your neighbor" culture that's been created)
If this silly CW for AI is enforced then why not add MORE things to CW in the future, God knows what those things might be.
The question of this server is already "Okay then, what CAN I say?"
I started this toki-suli after being told not to post AI without CW and len. I was puzzled at first. But then I did it. I don't think it's a bad rule. AI is often creepy. And we have a good little community in this thread that actually discusses it intelligently
I agree with Lapate, didn't realize you started the thread though :) I guess that [OP] label means nothing to me lol
People everywhere get uncomfortable about AI, I've just come to expect it
Also jan Towa why do you need to be told the same thing umpteen times lol
first time for AI -- guess I got some catching up on the server rules to do 
Ohhh gotcha
But the reasons given for the CW are all mental illness based
That rubs me the wrong way i guess
These are the various reasons certain brains do not want to see AI-generated images.
If a user has:
• …paranoia, ||these images can serve as a reminder that technology can create fake faces, and that verifying an image, face, or person online is impossible||.
• …a type of dissociative disorder, ||the "uncanny valley" aspect of these images can trigger feelings of dissociation and/or derealization, which both make the lucid world feel similarly uncanny||.
• …a type of psychosis, ||unreality and the uncanny valley are both very common psychosis triggers — it does not help a brain that cannot tell shared reality from its own perceptions to see something that makes its owner doubt their own reality. And especially if a user has:||
• …a type of delusion(s), ||many common delusions center around people and faces being purposely fake or impostors, the replacement of humans, and the power of technology; and even more can be triggered by uncanny content regardless.||
• …one or more phobias such as trypophobia, necrophobia, scopophobia, etc., ||due to the way AIs build these images out of shapes and colors, they often have clusters, corpselike imagery, and unnerving eyes or eye contact.||
And these images can just be creepy and bizarre in general, too — and of course there are brains that don't like that regardless of psychology.
#da53fc whyai
When illness makes life more complicated that can definitely be a struggle
But I prefer to be safe rather than sorry in the context of potentially making people struggle even more
That's the idea, but these rules are just one jan Lawa's idea of what a sick person wants.
Spoiler alert: we are not that fragile
This is where it paints illness as a weakness
Like nah dawg I'm crazy AND I can have productive conversations on a variety of topics
When you use the word "we" in this situation, you a) make it sound like you can speak for all people with mental illness and b) make it sound like you're confident that the jan lawa you mentioned is not included in the group
How I sound to you does not negate the meaning of my words, and I'm indifferent to if the jan lawa is you or anyone else
But this is opinion
I don't disagree with that
I do not mean to be aggressive and I acknowledge perhaps that's the tone that's coming off
I'm also amplified at the moment because of the scary ilo sin from earlier. Perhaps I should step away and calm down lol
Do you wanna DM?
Nah I'm gonna keep scrolling ma pona -- I was simply afraid I had turned hostile towards you
No, I'm OK :) It was getting heated but I don't think either of us are sounding like we are going to refuse to listen to others or anything
taso lon · mi tawa kin · mi wile moku a a
o moku pona! 😌
the cw is necessary for many people; ai stuff messes with me and others sometimes (i get the sense i get the light end of it, i'd want someone who it affects more to speak on this as well)
you speak of 'we' as if you can speak all people pi lawa nasa. you don't know what others experience, just because ai doesn't mess with you doesn't mean it doesn't mess with so many other people.
we are not that fragile
you say things like this twice, this actually comes across as ableist to me. plenty of people do struggle with this sort of thing, and dismissing it as 'fragile' doesn't feel good to me. it's not 'fragile' for your mind to not process certain content well, it's just a difference (that you seem to not have and so should not be speaking for)
I do think there is real frustration about disclosing to someone that one has lawa nasa and getting some kind of overreaction like being treated like a child that needs to be wrapped in styrofoam and put in a room with walls that are covered with mattresses
But I think that is not what is happening here at all; as far as I know, the content warnings policies here are the result of direct, personal requests from sufficient numbers of people
yea
the rule was created because real people in significant numbers wanted it to help with their mental illness, not sure why the discussion doesn't end there
It sounds like it might be worth adding that to the bot blurb, this is like the third time there has been vocal doubt about this haha
i'd like to add here that i fit into two of these specific groups and AI images are definitely usually fucking freaky to me
many of them cause dissociation and many of them trigger my phobia
like this isn't just for mental illness in general, it's for specific people with these specific mental illnesses who want AI art to be spoiled
@desert crag don't presume to speak for every person with a dissociative disorder, psychosis, paranoia, delusions, or a specific phobia
not all images will do this but most will, especially with dall-e 3
mi open sama la ilo li,, kama ike anu seme · ken la ona li ike lon tenpo li pona lon tenpo
ike! mi sona e ni: sina wile musi tawa mi. taso mi ken ala musi tan ni: mi jan ala, mi ilo kepeken e nimi pi jan. taso mi ken pana e nimi musi tawa sina:
o lukin e sona pi jan Toki Pona! ona li pana e sona musi pi toki Toki Pona. sina ken kepeken e nimi ona lon kulupu pi jan musi.
o musi e sina sama ni:
"toki toki toki a!
mi lon tomo mi, lon poka pi jan ala.
o musi e mi a!"sina ken pana e nimi ni tawa jan lon kulupu. ona li ken pana e nimi musi tawa sina.
mi pilin e ni: sina ken musi sama ni!
sina kepeken ilo seme
ilo [Opus] pi kulupu [Claude] lon tomo https://chat.lmsys.org/
a nasa · tenpo mute la ona li ken toki pona
ilo ChatGPT pi nasin GPT-4o li toki pona ||(tenpo nanpa wan la mi pana e sitelen tu ni tawa #pali-musi, taso mi pana sin e ona tawa ni)||
toki musi ona li nasa li ike ala. ona li ante pona e toki pona tawa toki Inli.
lipu Wesi pi ilo [Claude] la, jan li toki e ante pi ilo ni. jan pali li wile ala e toki ike, li ante e nasin ona. tenpo mute la ni li ken lili e ken ante
lon
ma ante la, mi ale li toki e ilo ni. ona li ken ala ken toki ante e toki sewi Psalm MTW? ilo TokiKPT 4o li pali ike e ni:
ilo [Claude 3 Opus] li pali pona:
ni li pona ale ala. (ilo ala li ken kepeken nimi "pi" lon nasin pona). taso sina wile ante e toki ni la, ni li open pona.
sike kama la ilo o ken toki pona
Whoawh when did tokiKPT get good at tp
I mean it’s not good good good but it’s pretty good
We are totes cooked in þe coming rest of þe year RIP toki pona translators (not really we all have our own nasin of tp)
lon la, tokiKPT pi kule 4o li ken ala toki pona a. tenpo pi mute lili la ona li ken ni. ante la, ilo [Claude 3 Opus] li ken toki pona wawa.
why is this so funny help
either I'm misreading it or the kitten said "I'm feeling good" (in relation to sweet fruits) and he went "good! you will learn toki pona."
haha yeah XD although I think it might be meant more like, "You've learned toki pona!"
and then the kitten went on to speak toki pona in other countries ✨
yea, it understands stuff like li reduplication for multiple verbs, la context, o vocative
and all of that without any manual guidance
gpt 3.5 would write repetitive stories even with guidance
very impressive
toki pona ai song 'by' someone who left toki pona
i move the toki pona community a a a
where can i see the full thing
it's mostly not in toki pona is the thing
wait is the same jan Kate as that pan Katherine Alexander
true
wow this is .... not good
boring ass music
aaaa it doesnt even end properly
it just cuts off
what the-
this whole thing is so weird ill go
??? what the what
what does “sina wile moku ala wile” mean
I think it's a nasa way of saying moku la sina wile ala wile
The nasin would require that ala as a modifier can apply to the head noun even when ala itself is the second+ modifier in the phrase
I have personally used this construction in the past when I was a beginner
sina wile ala wile moku
ah. that's me being bad sorry
ale li pona! Ideally it should know how to deal with any situation so there’s no need to hold yourself to a high standard when querying an AI
all good
hold on i just realised its not called a pavlov chin
got it
I programmed a Markov chain based toki pona chat bot model a while ago!!
I prob should return to that project soon
(did you know that if you add a -ka to the end of Markov, like Markovka, it sounds like the word for carrot 🥕 in Russian)
(I have some local Russian friends with the same family name, Markov, who are all painters in the family and their gallery has lots of paintings of carrots)
I actually did :D my frend is learning russian and when I talked to them about that project, they found it kinda funi :3
(Cause they heard that i said markovka not markov)
aha funny
this is similar to how lipaman became lipamanka
a ni li suwi
markov chains are a very primitive form of generative ai
yay :3
on the fediverse there is a bit of a trend of users running markov chain bots trained on their profile that end up spitting out funny posts
but they were cluttering up my timeline so i blocked the word markov lol
markov chains are also much more general than just text generation, tho thats what theyre most commonly visibly used for i guess
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain mov...
A Pavlov chain just drools whenever it hears toki pona. Which is accurate about most AIs
i was just looking into ChatGPT's "unique" abilities in maths, when i found this: 4 is a negative number now 😂
in other news i managed to successfully "teach" it new invented notation (#toki-ale message) where x N= y means |x| = |y| and it took me through an explanation of why
𝜋+𝑖 𝑁> 1+2 𝑖
is a true statement
how does this say what youre saying it says
-7 is the negative real number and 4 is the real number
it turns out, that i am the stupid one
maybe i am the ai 😔 /j
yeah this makes sense to me
although the way the ai formulated it, it was being a little unhelpful
"here's an example of X and example of Y: [example of Y], [example of X]"
yes that is what got me
this "translator" is probably literally just a ChatGPT interface with the custom instruction "Please translate the following text into <language>:"
yeah
why is this one lowkey good tho
the second one is... medium
suprisingly good for an ai
sina jan ike mute! you are many bad people!
the new generation of chatbots are getting scarily good at toki pona
just a year ago they kept spouting "mi pona! sina pona? mi toki pona!"
maybe it's from token efficiency or something idk im not a jan pi pali ilo
look it was intensional to test if the AI can understand silly grammar
oh i used this once but not for tp lol
i somehow fit all its yapping onto one ss
what a mess
Can anyone confirm the translation from Claude?
One day, a little girl named Lily found a needle in her room. She knew it was difficult to play with it because it was sharp. Lily wanted to share the needle with her mom, so she could sew a button on her shirt. Lily went to her mom and said, "Mom, I found this needle. Can you share it with me and sew my shirt?" Her mom smiled and said, "Yes, Lily, we can share the needle and fix your shirt." Together, they shared the needle and sewed the button on Lily's shirt. It was not difficult for them because they were sharing and helping each other. After they finished, Lily thanked her mom for sharing the needle and fixing her shirt. They both felt happy because they had shared and worked together.
tenpo wan la jan lili wan nimi Lili li lukin e ilo lili pi palisa lili lon tomo ona. ona li sona e ni: ilo ni li utala ala pona tawa musi. Lili li wile e ni: mama ona li jo e ilo lili ni, li pali e len lape pi Lili kepeken ona. Lili li tawa tomo mama ona li toki e ni: "mama o, mi lukin e ilo lili ni. sina ken ala ken jo ala jo e ona, li pali pona e len lape mi?" mama Lili li pilin pona li toki e ni: "pona, Lili o, mi jo e ilo lili, li pali pona e len lape sina." ona tu li jo pona e ilo lilli, li pali pona e len lape Lili kepeken ona. ni li utala ala, tan ni: ona tu li pali pona pali e ni. pali li pini la Lili li toki "pona tawa sina mama o, tan sina jo e ilo lili pali pona e len lape mi." ona tu li pilin pona tan ni: ona tu li jo ala jo e ilo ni, li pali pona anpa pali wan.
lots of grammar mistakes and mistranslated bits but the overall plot is there
Thanks for checking.
oh man it translated shirt as sheets didn't catch that
no mention of sharing
or the button
what version of Claude was this, by the way?
Claude 3 Sonnet, free version
Ah gotcha
Here's Claude Opus (the paid version)…
Here is my attempt at translating the short story into Toki Pona:
tenpo suno wan la, jan lili meli li kute e ilo pi linja mute lon tomo ona. ona li sona e ni: musi kepeken ona li pona ala tan ni: ona li ken pakala e selo jan. jan Lili li wile e ni: mama meli ona li ken kepeken e ilo ni. ni la, mama meli li ken pona e len pi jan Lili. jan Lili li toki e ni tawa mama meli ona: "mama o, mi kute e ilo pi linja mute. sina ken ala ken kepeken poka mi e ona li pona e len mi?" mama meli li sinpin pona li toki e ni: "jan Lili o, mi ken. mi ken poka sina kepeken e ilo li pona e len sina." ona tu li kepeken e ilo li pona e len pi jan Lili. ni li pona tawa ona tu tan ni: ona li pali poka li pana e pona. pini la, jan Lili li toki e ni: "mama o, sina pona tan ni: sina kepeken poka mi e ilo li pona e len mi." pilin pona li kama tawa ona tu tan ni: ona li pali poka li wile pona.
Here is a rough translation back to English:
One sun time (one day), a little girl person heard (found) a many thread tool (needle) in her room. She knew this: playing using it is not good because it can break (damage) a person's skin. Girl wanted this: her mom can use this tool. With this, mom can fix Girl's cloth (shirt). Girl said this to her mom: "Mom, I heard (found) a many thread tool (needle). Can you use it with me and fix my cloth?" Mom face good (smiled) and said this: "Girl, I can. I can use the tool with you and fix your cloth." The two of them used the tool and fixed Girl's cloth. This was good for the two of them because they worked together and gave good (helped each other). Finished (after), Girl said this: "Mom, you are good because you used the tool with me and fixed my cloth." Good feeling came to the two of them because they worked together wanting good.
Sounds fine. Now if only if we did this to all the TinyStories V2 database, lol.
I think that is kind of missing the point of Toki Pona personally
I mean, my goal was to make a model even smaller than the smallest tinystories model
The journey of translating something into Toki Pona is the most important part, especially since there isn't a monolingual community that is missing out on being able to consume that story due to having no translation of it in their native language
which I know is not gonna work well because you gotta have a good understanding of a "context" I think
yeah, I also was thinking of changing my topic to how toki pona can make a fine international language for basic communication
Oh are you writing a research paper or something like that?
yeah
I am using karpathy's nanogpt for training
although I kinda wanna start with the tokenizer since that would be pretty easy to deal with, since toki pona lol
You’re trying to use AI models to prove that Toki Pona can be used for international communication in your paper?
not exactly, I am trying to use an ai model to develop an extremely tiny model
smaller than "tinystories"
which would be really hard, but that is just an optional thing
But how will that help you argue your thesis about it being a good language for international communication?
that isn't my argument... yet- I probably won't have enough time to develop it lol
I may change the topic is what I mean
kinda... also how language would be developed within an extremely small brain
tinystories was kinda about that, it was a big paper actually
but toki pona seems more enticing, even though the model would probably not be able to actually use the language since there is no way that they would build a world model to use it
but not like tinystories had it either lol
seme a
But really, aren't all words fictional?
relating to fiction; invented for the purposes of fiction.
i don't think most words we use were invented for the purposes of fiction
true
plot twist: english is a fictional language and we actually live in a dystopian novel
lmao
is anyone there
mi lon a
chatgpt says "burger" is "moku lete pan kili" 
In Toki Pona, specific foods like "burger" often need to be described using combinations of existing words since the language has a small vocabulary designed to express concepts simply. A burger might be described as:
"moku lete pan kili"
Here's the breakdown:
moku = food
lete = cold (often used to describe prepared or processed foods)
pan = bread (representing the bun)
kili = fruit or vegetable (representing the vegetables typically in a burger, such as lettuce or tomato)
So, "moku lete pan kili" conveys the idea of a burger in Toki Pona.
Claude Opus, jan Sonja o?
it seems like it knows the original because it trranslated Siddharta and Govinda from nowhere
it was able to interpret the context and the likely name?
i suppose
ken la ona li moku e lipu https://tokipona.org/sitata/
taso sona lon kin li ken · #1162756469771472988 message
Chat gpt did it well
toki ni la, ilo li sona wawa e toki pona e ijo pi tenpo pini weka
The first sentence is unclear to me. Might make more sense with more context like what you asked to get this response.
But it sounds something like: Don’t protect your good feeling! Feeling good is always good.
heres the context! :3
Nullify your happiness permanently!
oof...
@waxen monolith you can try asking the AI for a breakdown of what it meant, I think the “ala awen” part is a mistake
I was thinking of asking in English, specifically for a word by word breakdown
ah okay, what would you say?
Can you translate “insert sentence” step by step.
o kipisi e sitelen. mi sona ala.
I would say it like:
o awen pilin pona
continue feeling good
o pini ala pilin pona
don’t stop feeling good
Not entirely sure whether it should be “e pilin pona” or if this works also
The new Claude Sonnet 3.5 (which jan Sonja tested out above) seems to be very, very good at toki pona. I've only just started with it:
Oh my goodness. I gave it a pdf of the latest lipu tenpo, and asked it find the article about Latin by jan Imi, and summarize it for me in 5 or 6 bullet points. It did it incredibly well!
I think I'm going to be throwing my money at these people again. Let's hope they don't ruin it out of "safety" concerns, as they seem to have done with Opus.
Oh it's good
It hallucinates wildly over sitelen pona, however: https://aiarchives.org/id/kIapTQiuo2n9QFDEdMYK
Here are its three flawed cheatsheets:
(just partial screenshots, because pandoc had difficulty processing the markdown too)
I love the version with descriptions of sitelen pona. It clearly has no idea what they actually look like, but just makes up descriptions that would fit the definition of the words.
What happened?
I let my subscription lapse, but follow the comments on Reddit especially. Once the model was released, Anthropic applied more and more guardrails to it to stop any responses on controversial or NSFW topics. But as so often happens, limiting a language model's response in one direction undermines its abilities in other harmless and useful areas - in an unpredictable way, since we just don't really know what's going on under the hood. That is at least the subjective impression of users - which seems plausible, given the examples they posted of the deterioration of coding ability etc
musi lili lon ilo ChatGPT
mi pana e wile ni tawa ilo: mi toki pona tawa ona la ona li pana e ante toki mute lon toki Inli
Would it be better if there was a very concise description of all of toki pona at the start of the chat?
Or maybe put into the system message
How short can we write a full description of toki pona, including all pu words at least? By full description, I mean someone could translate to and from toki pona without any prior knowledge, given that text alone.
[ilo penpo o lukin ala]
translate goodhart's law into toki pona
Goodhart's Law is a principle that states "When a measure becomes a target, it ceases to be a good measure." Translating this concept into Toki Pona is challenging due to the language's limited vocabulary and simplified grammar. However, we can try to express the core idea:
"nanpa li kama suli la ona li kama ike."
Breaking this down:
- nanpa = number, measure
- li = (verb marker)
- kama = to become, to come
- suli = important, big
- la = (conditional marker)
- ona = it
- li = (verb marker)
- kama = to become, to come
- ike = bad, wrong, evil
A more literal translation back to English would be: "When a measure becomes important, it becomes bad."
This translation captures the essence of Goodhart's Law, although it loses some of the nuance of the original statement due to Toki Pona's simplicity. It conveys the idea that when we focus too much on a particular measure, it becomes less effective or useful.
It's not in plain text (contains formatting), so LLMs may not be able to ingest it easily. Also, the formatting is broken on mobile.
That looks good. Does it enable conversations (without an explicit step of translating)?
ilo li lukin e toki ale la ona li lukin e toki pakala mute li kama sona lukin e ona
ken a · sina ken ni kepeken mani ala lon lipu https://chat.lmsys.org/ · o tawa poki Direct Chat o wile e ilo claude-3-5-sonnet
My piece on ChatGPT is coming out in the next (I think) lipu tenpo. Unfortunately @weak thorn the editors didn't like TokiKPT, so it's Toki Kipisi now. Which I guess is ok.
tenpo lili la mi kepeken ilo “Claude”. ona li sewi.
NOOOO
that's not right
Dr. John W. Payne
i generated it again and nw it says sonja lang made it but it still say 93 words
apparantly he's an ophtalmologist in Baltimore:
https://www.md.com/doctor/john-w-payne-1-md
Visit Dr. John W. Payne, an ophthalmologist in Baltimore, MD. Are you Dr. Payne? Sign up for MD.com.
neither linguist nor mathmetician
we should host our next north american toki pona meetup in baltimore and pay him a visit
holy moly
AI erasing the accomplishments of women by replacing them with fake men, whats new
mi la ilo li ni tan sona ala tan ala wile ike · ni li tan ma Perchance anu seme · mi sona ala e ilo toki ona · taso ona li lon ilo nanpa ona li lon ala ilo suli pi kulupu suli la ona li lili anu seme
ken la ona li pilin lon nasin ni: pali toki li tan jan sona. jan sona li mije.
tenpo pi mute nanpa wan la ni li lon anu seme. ilo li lili la ona li ken awen sona e ijo "suli" taso
ilo li toki sama toki ilo ale
mute a la toki pi ma ilo li ike ni la ilo ni li toki sama
ni ala · lipu ala pi ma ilo li toki e toki pona e jan ni kin
ilo li lili · ni taso li tan
,,lon la ken la selo li nasa e ilo · nanpa [Temperature] li suli la nasa li suli la nimi pi wile ala li kama ken
ni la, ni li ilo ala pi sona ilo anu seme. ilo ni li sama ala ilo ChatGPT li "sona" ala e ijo. ona li pali e toki kepeken nanpa pi nasin ala. ilo sama li ken pali e jan e monsuta tawa musi "lupa suli en akesi monsuta seli."
suli la ni · taso jan ona li pali e namako sona e namako sitelen kin
Hi all, would anyone be interested in providing me some feedback? Would be greatly appreciated: https://forms.gle/7jEUdBBVumwLvjoS9
The Anthropic models such as Claude Opus have toki pona down pretty well
So far it’s the only one that sounds natural and doesn’t make many mistakes
Thanks for the tip!
So far, it seems people prefer my own model more than chatgpt 4o, which is surprising and awesome! However, the human translations for the Bible verses remain victorious.
Valuable input from y'all, thank you!
well lipu sewi has like, a full-on professional historical linguist on hand
Hard to beat that
Yep true, I'm going to use their works as a benchmark
Hi all, I'm open to take requests / questions to ask the model.
also here are the results from the survey: (yellow is my model, red is chatgpt4o, and blue is human translation)
here are the other results but red is my model, and chatgpt4o is blue
could you provide any specs of your model? what kind of model des it use? where did you get your training data? i'm interested since your results are far better than anything i've gotten :3
It would be interesting to do a similar thing with Claude Opus or Sonnet 3.5
mi ni la ona li wawa ale
tenpo ni la jan ale li toki kepeken toki wan.
Apaman li toki e ni: "sina pana ala e jan lili tawa mi. ni la jan pali mi li kama jo e ijo mi lon tenpo kama."
tomo palisa Papele li toki sona. jan li wile pali e tomo suli tan pilin ike. ni li ike tawa sewi Jawe. sewi Jawe li pakala e toki jan. ni la jan li kama toki ante li kama lon ma ante. toki ni li pana e sona ni: o kama lili. o kute e wile pi sewi Jawe.
toki pi jan lili ni li suli ala li pona ala. ona li toki ala e ike suli e wile pali wawa.
@dense nexus nice job training a better model than ChatGPT for toki pona! That’s really cool! 😄
Like soweli Lapate I do wonder how it fares against Claude which in my recent experience is also quite good at it
Of course! Here are some details:
- I'm using llama8b instruct
- training data is from a corpus, taobea (but I didn't give it any English from taobea), and the entirety of the bible project.
Thank you!!!
I'll try another survey comparing my model to Claude and human translations. I think I can take a crack at it in the future :-)
kulijo a! would you be interested in automating the testing of the models? i have a script to rate text written in toki pona i made a while ago
it's based on the telo misikeke rules
oh sorry for the late reply! that would be amazing!
feel free to dm me if you want to get my attention faster
i can publish it on github when i get home from my trip to da woods (like 3 days) so you can doenload it from there!
awesome! have fun out there too!
thanks!
hi all, im moving my toki pona chatbot progress into here: #1255967996527116400
Well, it got the "li" part - but didn't address the overall nasa-ness. Which model of Claude was this?
I tried it out with Sonnet 3.5:
Opus 3 actually does a better job spotting the problem, with the same prompt:
Yes, I didn't mention that it also didn't get it right in the end!
Just that it spotted the problem with the original sentence
a a a
(I guess you could say jan li nimi e mi kepeken nimi Sam - but why would you?)
mi ken toki e ni: jan li nimi Sam e mi
ni kin li pona
Claude can still toki pona better than any other LLM - but (like any LLM) it can't be trusted on the details
TW lili L
taso pilin mi la "mi nimi Sam" li nasin pona
"i name [modifier Sam] to myself" "i am named in a Sam-ly way"
lon la pakala ala li lon, taso nasin li lili kepeken
nanpa ni li seme a?
mi sona ala a
asking chatgpt to write a poem in toki pona yields interesting results:
write a poem in toki pona
tenpo suno pini la
mi tawa lon ma tomo
mi kama lon tomo lili
mi pilin pona kepeken toki sina
sina toki e ni: "sina pona tawa mi"
mi pilin pona kepeken nimi sina
sina kute e mi kepeken pona
mi pilin pona kepeken sina
tenpo suno kama la
mi tawa tan ma tomo
mi kama lon tomo lili
mi pilin pona kepeken toki sina
sina toki e ni: "sina pona tawa mi"
mi pilin pona kepeken nimi sina
sina kute e mi kepeken pona
mi pilin pona kepeken sina
"mi pilin pona kepeken sina" mhm...
i think that's not necessarily bad
i don't think to kepeken jan in toki pona has the same negative connotation as using a person does in english
although it is a nasin nasa
my brain went straight to unpa but that might just be because i have been scouts leading a flock of horny middleschoolers the whole week...
re: #toki-lawa message
i am currently confused by the order of words on that page
why are words like kapesi and power mixed into the regular alphabetic sorting but kiki, pake, and tonsi are at the very end
my favourite image is anpa
because it’s literally just a soko
tonsi also has a lot of soko
kijetesantakalu tonsi li lanpan ala lanpan e ona??
lol im looking at the rest of the website its very funny
there is a page dedicated to inserting apples into german idioms
of course done by ai
wait what the fuck is this definition for tonsi
i have so much lore on this guy gimme a sec
owo
ok so
- carsten stemm, owner of the website and creator of the ai toki pona course, of bielefeld, germany
- also known as psychosaurus (
@psychosaurus.on our very own ma pona), uses that motif a lot - a whole page of his website is called psychosaurus
- in a story about Bing AI, a prompt he creator used was "Can you tell me a story in which the keywords psychosaurus, heart and language appear?"
- many toki pona examples are full of dinosaurs
- was already a tokiponist in 2004
- here is an archive of his website
- it's linked here on sona pona
- a published author; one of his books is called "Confessions of a psychosaurus: Nuts of thoughts to nibble on"
- he's a german nimisinner, to quote his website (machine translated into english, i can grab the german if one wants):
I write poetry in Bielefeld and regularly work on expanding and improving this website, as long as I don't fall victim to a biofan. I'm reluctant to use words that others have already used. I find that unhygienic. That's why I often invent new words, which creates the problem that no one understands me anymore. As a result, I have to use old words that the German vernacular has only formed imprecisely.
- the rest of his website is mostly an ai art gallery
gimme the german quote
its frustrating right because a lot of this would be really cool if it wasnt ai
a a a
biofanten leads to http://carsten-stemm.de/hp/y/biofanten.html
apparently this person was doing toki pona stuff back in 2004
good joke
i am developing sympathy towards this guy
what's the joke
maybe because some of this stuff reminds me of my dad
yea i sent this
i feel like he and my dad would be friends
lol
wordplay kinda hard to explain
a mi lukin ala e toki ni pini
sona
time capsule
"pata" is a word ive never heard referenced before
i think it's the sibling word
interesting this uses ali rather than ale
tbh sibling word feels like something that would be useful to have idk why it went out of fashion
jan sama
aaaa tan seme
it's literalyl in the definition of sama
i think sibling is part of sama's semantic space for a lot of people
hmm
kind of like how sike has year and mun has month
it's a specific thing that the word references
i guess i cant really explain why it doesnt vibe right for me
maybe ill find a way to express that in the future and make a post about it then idk
sorry for distracting from the ai stuff
@chrome sleet do you have any memory of this early adopter/supporter? hehe
i don't think i know this person
a · ale li pona
aa mi weka e ni lon lipu linku lon tenpo poka. sina pilin seme tan ni
taso mi awen e "fellow, peer, each other"
hey all lurking, i need some help collecting feedback with my latest model, the link to the gradio demo and more information is located in: #1255967996527116400 message
feedback optional, but would be very helpful!
i wanna return to this for a moment
when and how did tonsi ever mean radioactive????
I think it's just an LLM hallucination. It decided to translate soko tonsi as mushroom clouds - and then had to invent a meaning for tonsi to justify it.
hi all! currently, im running a dataset collection that requires the community to add to it. if anyone here wants to contribute to the toki pona dataset, DM me for the url / password! feel free to ask for more details
we should make some kind of CONTROLLED effort to gather a corpus from #toki-pona-taso and #practice-toki-pona-tpt
oh and possibly also #sitelen-ucsur
something like everyone ping opt-in/out
i don’t think that’s really a good idea
i wouldn’t want my messages to be trained on an ai lmfao
what even is the point of a toki pona ai anyways?
they are pretty big imo, learning toki pona in toki pona, just chatting, the stuff you do with ChatGPT in toki pona...
that's why i said controlled haha, i do not like the approach of big ai company datamining either
collecting data from users is a big no no, opting in would be nice, but logistically, it would be quite difficult to get everyone's messages from that channel (targetting specific users)
maybe people can request their data from discord and submit their messages from that channel?
the discords APIs is preetty permissive! (at least with bots) and if the project is done within the API request limit, we could possibly complete it within a month
im not sure if discord gives the message data with the package...
it gives you message data per server, and in the specified server, you can select which channel ID you can access all your past messages
ookay, that's nice ig
but the bots idea would be better for simplicity
also, creating a big corpus of data is super nice however, when training a chatbot / llm, its much more useful when giving it hand-curated data for it to learn in a chatbot environment. right now, my data collection UI allows people to submit new entries by hand, which i find to improve the model's accuracy dramatically (about every 20 - 40 entries brings a large performance boost in terms of accuracy, as measured in https://telo-misikeke.gitlab.io/ and by other users)
although i do feel like pre-training the model on more data would be equally as helpful for more knowlage of toki pona. currently, i pretrained my model on 120K lines of toki pona, which seems to work great, but still fails to connect complex ideas
exactly haha
also, the discussion is pretty optimal training data for a chatting oriented ai
very true
o ni ala a
tenpo weka la jan ante li wile pali e ilo pi toki [pona] la tomo sin #tpt-tawa-ilo pi moku ilo li kama · jan li wile ala e ni → toki ale li moku
lon la ona li pana kepeken suno wan anu tu
mi pali e ilo · taso ona li nasa li pona ala tawa jan pi sona ilo ala · taso mi pana
ona li kepeken ilo pi sona toki tan mun [Kekan San]
this is actually a request i could fulfill nearly the instant somebody gives their consent to release their tpt messages
you have a neat database of the messages?
well that's nice haha
(reposting from #1255967996527116400 ) I started a github repository to provide the latest ORPO dataset for those interested: https://github.com/johnpaulbin/toki-pona-orpo DM me if you wish to contribute to the dataset!
If you’d like my consent for your specific data I can give it to you
This is something I've discussed in other servers, but my specific data does not require user consent so long as I do not publish the original data
I will only be publishing derived information, which in this case is only statistics about word frequency
It's also true that, for better or worse, I am publishing all of the code I am using to do this project
This means that consent is easily overridden by a single bad actor
But as a point of reference, I do similar work for my job; My job literally would not exist if the law did not protect the work I'm doing
have you tried adding tatoeba's corpus?
Yep
how did it go
cool
I just gave Sonnet 3.5 my (published today!) lipu tenpo article on why AIs will never be able to toki pona - and it translated it flawlessly back to me.
i was testing out an ai website that can generate a fake website for you, and well
Did it come up with the toki pona examples by itself?
Interesting. It's odd how it divided up the two definitions of "musi" (which aren't really distinct)
I wonder if this is all in its training data, or whether it scrapes other websites on request. Did you ask for it to make something like linku.la, or give it that URL?
I gave it the linku.la URL at first, but it didnt generate a toki pona dictionary at first, but when i added /tokipona/word/musi, at the end it gave me the musi page
When i click on word list, it doesnt give me the full dictionary strangely
Another thing is that all the experimental words have sitelen sitelen meanwhile pre-pu and post-pu do not
Im gonna go test if it can create a tool to learn it now
alright done, i can share the link if wanted
Sure. It's pretty clear that it's using existing information, almost verbatim, to create this website. It would be a challenge to get it to make something that is not on the web. For example, a toki pona dictionary for Latin speakers.
Also, I had the greatest difficulty getting Claude Sonnet 3.5 to make a wordlist with sitelen pona - it just couldn't do it. See if it's able to consistently add the sitelen pona.
it seems to think "hello" is pona and not toki
now im curious, could it be possible to learn a language only using AI
by word type
I guess one is a list of nouns, the other adjectives. But if you were dividing it that way, you'd also have to have verbal usages too
I asked it for a conlang dedicated language learning website, it has 4 courses, espernato, toki pona, klingon and lojban
attempt 2: added more
Post the link if you want us to test it!
A quick note, it seems AIs only weakness in tp are proper nouns. They dont use headnouns and don't seem to understand tokiponizing
I wouldn't say it's their only weakness. They have great trouble (even the best) anticipating when to use pi, or repeat a preposition.
tenpo la ona li kepeken nimi en lon nasin ike sama nimi "and" · ni li ken tan ni → ilo li sona wawa e toki [🇺🇸] li kepeken ona lon insa li [pona] e ona lon selo taso
msa. mi ale li sona ale ala e pali insa pi ilo ni. ona li toki insa Inli, li toki pona e toki ni anu seme? mi sona ala. ken la, ona li kama sona la, ona li moku e toki pona mute pi nasin ike. ilo sin (sama Claude 3.5) li pakala lili taso sama ni
i was working on a language learning thing based off the toki pona courses and
ah cool!!!
Looks interesting!
HI ALL! back at it again with another toki pona related project. I got whisper working with toki pona! https://discord.com/channels/301377942062366741/1259027005781970974
it's...sort of been done
[Reply to:](#1162756469771472988 message) now im curious, could it be possible to learn a language only using AI
you ever seen xiaomanyc (AMERICAN goes into CHINESE RESTAURANT and speaks PERFECT CHINESE, STUNS WAITERS)? he got some level of portuguese with ai, although i'm not sure if i trust that
and i don't think that's the only place i've seen it done? idk
thats true! xiaomanyc learned german in a week using only google transalte, to prepare for a news interview, although he's at an advantage since he studied a lot of languages before so he knows what to study specifically.
i was thinking of portuguese
So here's a ChatGPT test on how it could handle translating Toki Pona text to English, using soweli Elepanto's poem jan lawa Oliki from Tokipono: La lingvo de bono as a basis
https://chatgpt.com/share/1d9691e5-28a9-41a4-845d-e3123781fc56
Obviously, nothing beats a human translator, and trying to do the same thing from English to Toki Pona brings...MORE questionable results
I wonder how Sonja Lang feels about ChatGPT basically accidentally creating something close to a near-accurate-but-not-quite Toki Pona machine translator with the right prompts. Probably not too positively.
jan Sonja is often on this thread, and tests out LLMs a lot! I can't speak for her, but she seems interested, not upset.
Yeah, I think AI systems like ChatGPT can actually be GOOD for revitalizing language translation, including the translation of endangered/obscure/dead languages even.
I know AI can also be used for bad, but it seems there have been more people who have been only seeing the bad while ignoring the good (vice versa is also an issue too, especially with art and writing)
a skilled person who uses both languages well does a better job at translation
https://websim.ai/c/LalzRlciImvIj24pB Ai generated toki pona dictionary
ive tried doing some translations with Anthropic on Crowdin, its really hit or miss, it seems to be 374590x better at going FROM toki pona to english unfortunately
100% agree, even applies in machine learning
Recently with my own toki pona llm, I only gave it toki pona text to learn from. And yet, after fine-tuning it to answer instructions, it managed to correctly answer "name all toki pona vocab that are animals" despite not having been given that information via the training data directly.
The model seems to be able to "think" in toki pona after training it, which is super promising.
I tried Claude Sonnet 3.5, with instructions to be friendly, non-AI like in all interactions etc. It did very well in toki pona. But using it "straight" via the API and it's full of beginner errors. I've heard it said before that if you talk to an LLM as a fellow human, and have good prompting with expectations about personality, its ability at a lot of intellectual tasks is much greater.
Hi all, as I mentioned a while back, I am still accepting contributions to the toki pona chat dataset. Right now, the dataset is still very in it's early stages, but i believe with enough help, it could out-perform Claude 3.5 in the near future. (Image attached is the data collection UI, more info about it is inside the About/Help tab) I'm going to be sharing the data collection url & password here as a test run to see how far we can get with it -> https://c880ae1387fed6afe6.gradio.live/ password: ||P01Mdigh@9a3@39%(@askjwdMSanmwd#1L)|| I'm not sure if there's another place to place this information (if there is, let me know!), so getting eyes on this is top priority.
thank you for those who contributed today! i have to shutdown the UI for now, but i'll repost the updated link tomorrow.
"seme li kon lon li kon ale" seems like a bad translation
You're free to edit any existing entries inside the edit tab :-)
updated link: https://df557c7a29649b9891.gradio.live/ same password as above.
going to have to halt dataset curation until im back from a week-long trip, thank you for those to contributed!
if u wanna fish
move not treeish sky
I meant it as don't go on top of the tree
you sholud modify tawa, not sewi kasi then
(o) (tawa) (sewi kasi) ❌
(o) (tawa ala) (sewi kasi) ✅
These are the various reasons certain brains do not want to see AI-generated images.
If a user has:
• …paranoia, ||these images can serve as a reminder that technology can create fake faces, and that verifying an image, face, or person online is impossible||.
• …a type of dissociative disorder, ||the "uncanny valley" aspect of these images can trigger feelings of dissociation and/or derealization, which both make the lucid world feel similarly uncanny||.
• …a type of psychosis, ||unreality and the uncanny valley are both very common psychosis triggers — it does not help a brain that cannot tell shared reality from its own perceptions to see something that makes its owner doubt their own reality. And especially if a user has:||
• …a type of delusion(s), ||many common delusions center around people and faces being purposely fake or impostors, the replacement of humans, and the power of technology; and even more can be triggered by uncanny content regardless.||
• …one or more phobias such as trypophobia, necrophobia, scopophobia, etc., ||due to the way AIs build these images out of shapes and colors, they often have clusters, corpselike imagery, and unnerving eyes or eye contact.||
And these images can just be creepy and bizarre in general, too — and of course there are brains that don't like that regardless of psychology.
#da53fc whyai
someone should make a gpt that gets information from this server and kama sona and makes the gpt become a translator
and it could probably also get 56 nifty tricks…
how would it translate
ilo pi mKS can pretty confidently determine whether text is toki pona or not
but for a model to do translation, it needs to be fed on data where the same thing is said in both languages
and i dont think theres enough of that around to make a good translator
not to mention that the extreme context dependency of toki pona makes good machine translation in general very impossible for this language
would feeding the gpt a toki pona list of words with definitions and the grammar of this language and an english dictionary be a better option
wait what the heck is ilo pi mKS
no
oh
i don't even know exactly how a gpt works
theyre the next word suggesting thing your mobile keyboard does, just bigger
they need to be fed input that matches the format they are supposed to output
the big gpts currently going around only work because theyve literally scraped the entire internet for their language data
mun Kekan San made a tool that can identify whether a text is toki pona or not
i forgot its name so i just called it that
since the term gpt is still here,
This
And this
(English):
Germany
(Toki Pona):
alijuna li toki e toki "Doman" li lon ma tomo pi ma Elopa. Ma tomo pi ma Elopa li lon ma tomo pi ma Mewika. ma tomo pi ma Elopa li pona tawa toki pi jan ali. jan Mewika li toki e toki "Germany". jan ali li ken toki e toki "Germany" kepeken toki pi toki pona. sina ken toki e toki "Doman" kepeken toki pi toki pona. sina ken toki e toki "ma tomo pi ma Elopa" kepeken toki pi toki pona. sina ken toki e toki "ma tomo pi ma Mewika" kepeken toki pi toki pona. sina ken toki e toki "jan Mewika" kepeken toki pi toki pona. sina ken toki e toki "jan ali" kepeken toki pi toki pona. sina ken toki e "jan" kepeken toki pi toki pona. sina ken toki e "ali" kepeken toki pi toki pona. sina ken toki e "pona" kepeken toki pi toki pona. sina ken toki e "tawa" kepeken toki pi toki pona. sina ken toki e "lon" kepeken toki pi toki pona. sina ken toki e "ma" kepeken toki pi toki pona. sina ken toki e "kepeken" kepeken toki pi toki pona. sina ken toki e "ken" kepeken toki pi toki pona. sina ken toki e "sina" kepeken toki pi toki pona. sina ken toki e "e" kepeken toki pi toki pona. sina ken toki e "toki" kepeken toki pi toki pona. sina ken toki e "li" kepeken toki pi toki pona. sina ken toki e "ali" kepeken toki pi toki pona. sina ken toki e "Doman" kepeken toki pi toki pona. sina ken toki e "jan" kepeken toki pi toki pona. sina ken toki e "Mewika" kepeken toki pi toki pona. sina ken toki e "tomo" kepeken toki pi toki pona. sina ken toki e "Elopa" kepeken toki pi toki pona. sina ken toki e "pona" kepeken toki pi toki pona.
alijuna
sona toki
[Reply to:](#1162756469771472988 message) i forgot its name so i just called it that
It was too big to fit in a screenshot
It should be called ilo pi sona pi toki toki
sob
But my research contradicts your statement?
Sorry! I'm trying to gather the relevant screenshots to back up my claim.
What I've done was train a toki pona llm on a large corpus with no English direct translation (however there are some English that are used, but I'll touch on this soon)
The corpus includes emails that, some are only in toki pona, but some have a slight mixture (like a decent 20% amount of English.) I also included taobea's toki pona corpus (But with 0 English, so it's 100% toki pona.)
What was interesting though was that, despite the heavy lack of direct translation pairs (and by translation pairs I mean English matching the toki pona directly) (and the corpus didn't exactly touch on translating vocabulary, it's more that English is around the toki pona.) it managed to correctly identify the vocab words for animals. (Despite having to figure it out entirely on its own!!!)
(the image is me running the question through my LLM)
where did this llm get its english from
LLama 8B - instruct was the model I finetuned
li wile sona e ni: nimi kijetesantakalu li lon ma seme?
aha, so you are not doing the thing i said was impossible
Sorry, Im not very fluent in toki pona 😭
if you already have a model, of course it can learn to do translation
where's 
kijetesantakalu is surely an animal
i was talking about an entirely new model trained only on this server, which is also what the original idea was
and just to be clear i dont support using any pre existing model because they all come with copyright issues
Oh right! I believe the corpus is a bit old which may attribute to the fact that animal isnt included (or that the corpus avoids it, etc. it's hard to pinpoint)
also what is "not an animal, but often used as one" supposed to mean
I would roughly estimate this server to have only 50 - 100 Million tokens of English, which is suitable for a >1 Billion parameter model. Unfortunately, those types of models require much more data for English, however for toki pona, you can get away from a lot less tokens due to the nature of the language (translation is possible, but you would need a strong English backbone first) maybe separating the dataset to be half English, half toki pona would improve its consistent use of English? Unsure.
consistent use of english isnt what i want id just want translation
No clue! Sorry, the model is still in its early stages so I'm trying to iron out weird messages like those with reinforcement learning.
True! But you need the model to have a decent on English, or else it may be over fit, basically being extremely strong in a single translation, but failing in many others since it doesn't have a direct translation pair. If it's over fit on English, it may use it more like a literal dictionary (And may hallucinate more)
(plus the logistics of asking everyone for their consent to gather their messages 😭)
But everyone has good data :-)
Also sorry, I was mostly replying to the specific part in your text where: "but for a model to do translation, it needs to be fed on data where the same thing is said in both languages "
sona_mute:main> select count(Message filter .community.name = 'ma pona pi toki pona');
{8493320}
sona_mute:main> select count(TPUserSentence filter .message.community.name = 'ma pona pi toki pona');
{2429393}
sona_mute:main> select count(NonTPUserSentence filter .message.community.name = 'ma pona pi toki pona');
{8392879}
(i am too lazy to make a token counting query rn but the average sentence length is ~6.5 and the median sentence length is 3)
(also bear in mind that sentence tokenizing is done with my library, which uses a tokenizer i designed that tokenizes toki pona with hig accuracy at the expensive of english accuracy- the non-tp sentence count will be a bit inflated as a result)
(toki pi mute lili li suli nasa)
By sentence length do you mean by letter or by word count?
(also very interesting data!!!)
mute nimi
sorry, could you translate that?
word count :P
"word multiplicity" perhaps
Gotcha
A good rule of thumb is, for English text, tokenizers usually output 1 token for every 4 characters.
When I gave a GPT a whole lot of lipu tenpo, it was very good at tpt'ing on things that were in those issues. Not reproducing the articles exactly, but going beyond them. Once you moved away from those subjects, the quality decreased again
uh, following up:
idk where i got 6.5 and i think i imagined it
i pulled the lengths and redid the math, and the average among counted tp sentences is 4.13557
the median is still 3 tho
with all outliers of len 400 and up removed (there are large piles of these with the same word count, presumably due to reaching the max message length), the average goes to 4.11404
was messing around with some ai and uhm (CW: slight innapropriate speech, text only)
I appreciate the curvature of your buttocks too mr. random ai
sina jeso
mi jeso mute
mi wile pana e ⭐ tawa sina. taso mi pona
sina wile pana e ⭐️ tawa mi la sina ken pali e ni.
mi tonsi Ase la mi sona ala mi ken jeso 👀
sike pi monsi sina li pona tawa mi - I mean, it makes sense to have a single word for a mouthful of a phrase we all use a dozen times every day
ilo li kama sona e toki mute mute la ona li kama sona kama sona e toki sin kepeken toki pi mute ala · esun [Google] li ni e ilo la lili la ilo li ken ante e toki kepeken sona pi toki tu ala · https://arxiv.org/pdf/2205.03983 · (o alasa e nimi "zero-resource"
kijetesantakalu is from 2009. most toki pona corpim didn't exist in 2009
although ai seem to avoid the word for whatever reason
[Reply to:](#1162756469771472988 message) Oh right! I believe the corpus is a bit old which may attribute to the fact that animal isnt include…
Good point, I might have to just go through the dataset and see if it's mentioned. If it is, then it's probably a sign of me needing to train my model more. Thanks for the insight!
lol random hebrew plural thats fun
corpot :P
which one is that
feminine plural, also Hebrew
ahh, right i shouldve remembered that
For some reason I thought you knew a lot of Hebrew haha
lol i dont
i do understand a lot of yiddish by virtue of speaking german and being able to read the hebrew script
but i dont know anything about hebrew
Yahoo Llama 3.1 was just released (https://ai.meta.com/blog/meta-llama-3-1/) ill retrain my model on that and see if there's any significant performance increase.
I discovered the toki pona language option fot my keyboard and Im playing around with its prediction feature
sina kepeken ilo sitelen seme??
ilo swiftkey
i am curious: what if we got AI to cover a toki pona song
not too difficult considering RVC exists! allows you to record your own speaking in toki pona, and the model can replicate any specifiec choise of your chosing
you should try getting the ibm 7094 to speak in toki pono
making a voice rn, if anyone wants to donate a toki pona song i can test on please pana
I've tried simple toki pona songs in suno AI - it does it pretty well! You do have to change the spelling a little (use y instead of j, for example)
please share
Playing around for the first time, I just made up some silly lyrics - it didn't turn out badly for such a quick prompt! https://suno.com/song/4e72bbb0-9912-4937-9c93-1b7a1c5de5d4
slow, folk song, lullaby, male singer, acoustic guitar song. Listen and make your own with Suno.
slow, folk song, lullaby, male singer, acoustic guitar song. Listen and make your own with Suno.
idk
Oh nice. But you have to spell jan as yan and so on.
AI text too. No one is arguing in this thread that it's good, we're just keeping an eye on what it can do.
But it will probably take the place of musicians recording shitty music for shitty reasons (especially advertising)
Would a built specifically to work only on toki pona theoretically have an incredibly efficient tokenizer?
or should I say, tokinizer?
I found more luck if you make a segment of music yourself and then let Suno extend it, tends to give more interesting results
Especially if you put an interesting idea in there lol
I will fucking lose it if I have to listen to ai music every ad
I go insane wiþ þe 5 seconds þat plays on Spotify before I can swipe out of þe app
Besides þe Dutch ads þose fucking rip
I think it's our future. But most of the music we hear is mediocre, and we tune it out. AI will just make it more cheaply
i think red lobster recently used an AI song in an ad campaign
https://github.com/gregdan3/sona-toki/blob/main/src/sonatoki/Tokenizers.py#L48
for word tokenizing, yes! for sentence tokenizing, it doesn't matter very much. but there are caveats
i haven't written the code to demonstrate it yet, but i'm pretty sure that it also being a somewhat better english tokenizer would make it more accurate toki pona tokenizer- specifically, it should allow for up to 1 punctuation character to appear in the middle of writing characters at any given time, particularly - and '. that way, you wouldn't have sentence breaks on don't or game-changing or such.
simply put: if you know you're going to get exclusively toki pona text, you can aggressively split on punctuation, since it will never appear intra-word.
ideally you would keep consecutive names together as one token, but detecting these post-hoc is probably even easier than doing so in the tokenizer
Oh wow, this is super cool! I hadn't considered the consortium would want to know that sort of thing
More precisely, they're interested in our argument for what words to encode, and our argument will include how frequently words actually occur in the language
the Syllabic mode is especially cool
I was about to ask if this could be used to compile a full list of English words that match toki pona phonotactics but it looks like it's been done lol
Oh yeah that's been done many times
But my library has many more neat filter modes :P
Ohh makes sense
Were you scraping the web to find instances of toki pona in the wild? That's very interesting, I've always wondered about what groups might be using it in obscure places
Nah, that's beyond the scope of my project rn
I'm sure there are a lot of interesting instances of toki pona appearing "in the wild" as it were, but my focus is on the largest communities
Right now, I have essentially everything on Discord and Telegram, Reddit up to the end of 2023, and I am working on adding the old forums
I would like to add the Facebook groups as well, but Facebook is much harder to get data out of for a lot of reasons, both good and bad
I'd also like to get the old toki pona livejournal, VK, whatever seems reasonable on tumblr/twitter, toki.social, and the non-western groups like on kakaotalk
Yeah just scraping the web randomly seems like more than a one-person project lmao
I wish you the best of luck, I'd definitely be interested to see if there's any substantial differences in usage in non-western groups (especially among pu words)
unfortunately there isn't a good way for me to deliver that per-community information to be queried in my graph app
mostly on account of the limits of doing a free deployment of the database (it is a sqlite file on github pages that i do partial reads of)
ilo kakaotalk li awen ala e toki pini a a