#ilo.muni.la: graph how toki pona is used!

1254 messages · Page 2 of 2 (latest)

devout jetty BOT
#

pff

#

what I'm saying is if there were only like ~1000 messages of toki pona recorded that year, then someone would only have to send suwi like 30 times to get that spike

serene hollow
#

that's true

quaint wagon
#

can confirm; check the "absolute" graph for that day. almost nothing!

#

more baffling case though: check meli in 2016-17

serene hollow
#

omg 😭

#

that's a big meli spike over there from march 1, 2017

#

womens history month

#

NO WONDER

#

september 6th 2023 was the highest

fresh sentinel
#

what happened in September...

#

meli sonko discourse?

brittle mulch
#

guys

#

can someone give me the link to ilo muni

quaint wagon
brittle mulch
#

thanks

quaint wagon
serene hollow
#

ohh

#

okay that was clever (kekan santa)

quaint wagon
plucky gazelle
#

all the colored words are equally frequent

quaint wagon
novel lintel
#

why did you combine toki pona with toki pona?

quaint wagon
# novel lintel why did you combine toki pona with toki pona?

i wanted to know this: counting the words inside this phrase, how big a share of all of toki pona do they make up?
in ilo Muni, a phrase's count comes only from how often it occurs. the number of words in the phrase doesn't matter; only how often it occurs matters.
if you want to count all of its words, enter the phrase as many times as it has words.

#

consider this:

  • there are 10,000 words
  • there are 9,000 two-word phrases
  • 'toki' appears 100 times
  • 'pona' appears 100 times
  • 'toki pona' appears 90 times

if you want to express this as a proportion, there are several ways to do it:

  • what fraction is the phrase of all phrases of the same length?
    • for 'toki pona': 90 / 9000 (1%). but if you want to subtract this from 'toki', it breaks: 'toki' appears 100 times in 10,000 words, so 100 / 10,000 (1%). you subtract 1% from 1% and get 0%. broken.
  • counting every word of the phrase, what fraction of all words are they?
    • 'toki pona' appears 90 times, and you scale that number up because the phrase has two words: (90 * 2) / 10000 (1.8%). the word 'toki' is still at 1%. so you subtract 1.8% from 1% and get -0.8%. broken.
  • what fraction is the phrase of all words?
    • at first this may feel wrong. but it's really useful, because with phrases you can subtract exactly the word you want. try subtracting 'toki pona' from 'toki': (100 / 10000) - (90 / 10000) = 1% - 0.9% = 0.1%. this tells you: when 'toki' isn't part of the phrase 'toki pona', it makes up 0.1% of all words. you can do the same for 'pona' too.
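
The three options can be compared directly on the toy numbers above. The sketch below is illustrative Python, not ilo Muni's code. The function names are made up. It only handles one- and two-word phrases, because the example only defines totals for those.

```python
# Toy counts from the worked example above; these are not real ilo Muni numbers.
TOTAL_WORDS = 10_000     # every word in the corpus
TOTAL_BIGRAMS = 9_000    # every two-word phrase in the corpus
COUNTS = {"toki": 100, "pona": 100, "toki pona": 90}


def n_words(phrase: str) -> int:
    return len(phrase.split())


def share_same_length(phrase: str) -> float:
    """Option 1: occurrences / number of phrases with the same word count."""
    denominator = TOTAL_WORDS if n_words(phrase) == 1 else TOTAL_BIGRAMS
    return COUNTS[phrase] / denominator


def share_word_weighted(phrase: str) -> float:
    """Option 2: occurrences * word count / all words."""
    return COUNTS[phrase] * n_words(phrase) / TOTAL_WORDS


def share_all_words(phrase: str) -> float:
    """Option 3: occurrences / all words, regardless of phrase length."""
    return COUNTS[phrase] / TOTAL_WORDS


for option in (share_same_length, share_word_weighted, share_all_words):
    # share of "toki" that is NOT part of "toki pona"
    leftover = option("toki") - option("toki pona")
    print(f"{option.__name__}: {leftover:.1%}")

# share_same_length: 0.0%
# share_word_weighted: -0.8%
# share_all_words: 0.1%
```

Only the last option gives a leftover that can be read as "toki outside of toki pona". This matches the arithmetic in the list above.
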
#

@novel lintel

#

ilo muni does the third one.
but here, i wanted to match the second one, so i added the same phrase twice.

novel lintel
#

a

icy turret
#

@quaint wagon weird

thick gust
#

so weird!

#

this is surprising

quaint wagon
# icy turret @quaint wagon weird

it is an error!
if you open the advanced menu and set minimum sentence length to 3+ words, e falls behind as expected
this is caused by i.e. and e.g.
it will be fixed in a future db revision lmao

icy turret
#

good to know

quaint wagon
#

to be precise, the issue is as follows

  • period is a sentence delimiter in my parsing library
  • i only just added it as an intra-word punctuation mark (which means i try to skip over a single one of them when it sits between writing characters)
  • but the sentence delimiter is not at all aware of intra-word punctuation, so it splits on them anyway
  • this was fine for the other intra-word punctuation marks, because i wasn't counting them as sentence delimiters
  • and that's how I missed this in my tests!
#

so i.e. splits this sentence into
```
so i
e
splits this sentence into
```
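
Here is a minimal Python sketch of a splitter that knows about intra-word punctuation. The three sets used are assumptions for illustration: the delimiters, the intra-word punctuation marks, and the abbreviation list. The parsing library in question is not named, so this is not its API.

Checking only whether a period sits between two letters still splits on the final period of "i.e.". That is why the sketch also consults a small abbreviation list.

```python
# Hypothetical settings: the real library's lists are not shown in the thread.
SENTENCE_DELIMITERS = {".", "!", "?"}
INTRA_WORD_PUNCT = {".", "'", "-"}
ABBREVIATIONS = {"i.e", "e.g"}  # stored without their closing period


def _last_word(chars: list[str]) -> str:
    words = "".join(chars).split()
    return words[-1].lower() if words else ""


def split_sentences(text: str) -> list[str]:
    sentences: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        if ch in SENTENCE_DELIMITERS:
            # A delimiter that is also intra-word punctuation and sits between
            # two letters (the first "." in "i.e.") belongs to the word.
            between_letters = (
                ch in INTRA_WORD_PUNCT
                and 0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()
            )
            # The closing "." of a known abbreviation does not end the sentence.
            closes_abbreviation = ch == "." and _last_word(current) in ABBREVIATIONS
            if not (between_letters or closes_abbreviation):
                sentence = "".join(current).strip()
                if sentence:
                    sentences.append(sentence)
                current = []
                continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("so i.e. splits this sentence into"))
```

With both checks in place, the example stays one sentence, and ordinary periods still split.
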

timid zenith
#

kijetesantakalu usage over the years (thought this was cool)

devout jetty BOT
#

yeah I think ku did a lot for it

icy turret
#

@quaint wagon data request:
let's define an "active author" as someone who has said at least 1 sentence in each of at least 6 of the past 12 months, and who's said a total of N or more sentences in that 12-month period
calculate num active authors for each month we have data on

#

N ~ 100 is probably fine
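
One way to compute that series is sketched below in Python, under these assumptions:

- sentence counts arrive as a mapping from (author, month index) to a count;
- the 12-month window ends at, and includes, the month being evaluated;
- N defaults to 100, as suggested.

The function also returns the "bonus points" breakdown, grouping active authors by the month they first qualified. All names here are hypothetical.

```python
from collections import Counter, defaultdict


def active_authors_by_cohort(
    counts: dict[tuple[str, int], int],
    n_min: int = 100,      # N: minimum sentences within the window
    months_min: int = 6,   # minimum months with at least 1 sentence
    window: int = 12,      # window length in months
) -> dict[int, Counter]:
    """counts maps (author, month_index) -> sentences that author wrote that month.

    Returns month_index -> Counter({month the author first qualified: n_authors}).
    The window is assumed to end at, and include, the month being evaluated.
    """
    per_author: dict[str, dict[int, int]] = defaultdict(dict)
    for (author, month), n in counts.items():
        if n > 0:
            per_author[author][month] = n
    if not per_author:
        return {}

    months = [m for by_month in per_author.values() for m in by_month]
    first_qualified: dict[str, int] = {}
    result: dict[int, Counter] = {}
    # Months are visited in ascending order, so the first month recorded
    # for each author really is the earliest one they qualified in.
    for month in range(min(months), max(months) + 1):
        start = month - window + 1
        cohorts: Counter = Counter()
        for author, by_month in per_author.items():
            in_window = [n for m, n in by_month.items() if start <= m <= month]
            if len(in_window) >= months_min and sum(in_window) >= n_min:
                first_qualified.setdefault(author, month)
                cohorts[first_qualified[author]] += 1
        result[month] = cohorts
    return result
```

A month index such as `year * 12 + (month - 1)` keeps the window arithmetic simple. The total number of active authors in a month is `sum(result[month].values())`.
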

icy turret
#

for bonus points, define separate series depending on which month any given author first qualified

icy turret
#

or alternatively we could have a data dump that excludes ngrams and only has authors, months, and (the existence of) sentences, and leave it up to others to process

timid zenith
#

This graph shows the absolute usage of toki, a, li, ni, mi, sina and ona (some of the most used words in online discussions).

We know that toki pona boomed around 2020, so the increase in hits towards the end of the chart doesn't raise many questions.

But what about the 2 peaks in 2007 and 2010?
Was toki pona "famous" during these two brief periods, only to come back to its pre-2007 stage afterwards?
I wasn't in the community at the time, so I really don't know what happened.
Does anyone have info?

PS: I just realised these two peaks I see could also be statistically insignificant?

whole epoch
#

i think that's the time of the yahoo group and toki pona forum, so there's more data from that period specifically
i highly doubt toki pona was in a lull period from 2011-2015
-# /i'm mostly guessing

short narwhal
#

i got bored again so i kalamaed e sitelen nanpa akesi, this time using Authors. (audio file has normal, then reversed, b/c iseka could probably work as a percussion instrument if you wanted to sample ilo Muni data of all things for your music)

novel lintel
#

@dusk tree check this out ↑

timid zenith
short narwhal
#

i'm taking a stats class this semester and every graph i see reminds me of ilo Muni

i see a new kind of graph in class and my brain says "omg ilo Muni reference"

quaint wagon
devout jetty BOT
#

this is kind of nice actually

akesi Kilo ↩️

[Reply to:](#1272180068721889290 message) i got bored again so i kalamaed e sitelen nanpa akesi, this time using Authors. (audio file has norm… 📎

wicked stratus
#

hi

#

what does it mean when the stat is negative

icy turret
stone yoke BOT
#

i can't see it anymore

quaint wagon
#

mu

wicked stratus
#

mu

icy turret
#

wooo boi its time to go through tons of queries

#

🤔

#

@quaint wagon so remind me what was the deal with july?

#

i don't think this is a month you only partially archived right

quaint wagon
icy turret
#

i remember suno pi toki pona already appears in the default queries but mun pi ante toki might be a good addition

topaz spear
#

cool! you can see how pi appears less at the end of sentences than tawa, even though in general they're both neck and neck

icy turret
#

tenpo pana no longer fulfils this role, thanks to majeka and siliwa

quaint wagon
icy turret
quaint wagon
#

yeah

icy turret
#

not really in complete decline, seems to have stabilised

#

first time ilo muni can search itself

#

we already know about the gap but holy heck this goes to near zero

quaint wagon
quaint wagon
quaint wagon
#

interesting

icy turret
#

no up or down trend

#

a continued trend away from dialogues?

quaint wagon
#

or increased focus on third party subjects, which are most of them
dialogues between two people where they discuss a thing they're mutually interested in would not include as many mi as li for sure

#

oh my god i forgot to include that i added poki lapo

#

i added poki lapo, that's in there too

icy turret
#

laughing is inversely proportional to having fun

quaint wagon
#

LMAO

icy turret
#

santa is still seasonal

quaint wagon
#

consider:

#

genuine shift going on there

icy turret
#

remind me what's _full

placid crow
quaint wagon
#

the phrase is the entire sentence

icy turret
#

right

#

a chart to post elsewhere with no context

#

any ideas how to interpret this

quaint wagon
#

now this is why i added all these attributes.

#

also worth mentioning that i improved the parser significantly, and i'm thinking that'll let me do google ngrams' trick in the future
so you'd be able to make queries like (taso_start / taso), showing what % of taso uses are taso at the start of the sentence
that's not in yet but, well, it would be nice lol
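
Here is a sketch of the proposed ratio in Python. It assumes the per-period hit counts for `taso_start` and `taso` have already been fetched into dicts keyed by period. The helper name is hypothetical, and this is not an existing ilo Muni feature.

```python
def share_series(part: dict[str, int], whole: dict[str, int]) -> dict[str, float]:
    """Per-period ratio part / whole, e.g. hits of 'taso_start' over hits of 'taso'.

    Periods where the whole is zero are skipped instead of dividing by zero.
    Since the attribute series is a subset of the plain series, a part larger
    than its whole points at a data problem, so it raises instead of
    returning a share above 100%.
    """
    shares = {}
    for period in sorted(whole):
        total = whole[period]
        if total == 0:
            continue
        hits = part.get(period, 0)
        if hits > total:
            raise ValueError(f"{period}: part ({hits}) exceeds whole ({total})")
        shares[period] = hits / total
    return shares
```

Multiplying each value by 100 gives the "% of taso uses at the start of the sentence" series.
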

icy turret
#

i'm gonna just scroll through the thread and repeat queries that aren't too much of a pain

#

this one's just a continued trend

#

this one's fun because nanpa open is up, more than ever

#

penpo has straight up fully eclipsed kokosila

quaint wagon
#

tbf we knew that last year, but it is fascinating to see that trend continue

icy turret
#

a proxy for influxes of learners, perhaps?

#

something cool is that the mentions never really bottom out the way they did in the 2024 dataset

quaint wagon
#

i'm not sure what you mean by bottom out

icy turret
#

like this is readable, even for low frequency months

quaint wagon
#

oh i see

icy turret
#

@quaint wagon this might help you

#

never seen that before

quaint wagon
#

the library i'm using says that every time

icy turret
#

oh.

quaint wagon
#

that happens on chromium too, but that's just a funny webserver behavior

icy turret
#

my powers of observation are unparalleled

quaint wagon
#

ehehe

#

np

icy turret
#

this one is interesting 'cause that's despite there not being a lipu monsuta release

#

we have successfully unlonsied

quaint wagon
#

lmao

quaint wagon
plush gust
#

why did "musi nimi" suddenly take off? entirely possible this is just an event i missed by not being active here

quaint wagon
#

that's a good question! i have no idea!

#

oh my god it's a thread

#

oh my god it's people playing toki pona wordle

plush gust
#

LOL

quaint wagon
#

this falls outside of my criteria for inclusion........................

plush gust
#

whoopsie!

quaint wagon
#

thanks for spotting it ehe

#

see here's the fun thing

plush gust
#

seems like a pretty sizeable impact on "musi" as a whole, lol

topaz spear
#

nooo not musi nimi

quaint wagon
#

after the last time i had to regen the db, i implemented a suspicious datapoint check, trying to find months which are more than 3x at least one of their neighbors
but this still doesn't match it
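
The check as described (a month more than 3x at least one of its neighbors) could look like this in Python. `suspicious_months` and the plain list of monthly counts are assumptions for illustration, not the actual check.

Run on toy series, it also shows a limitation of this shape of check:

- a sustained block of inflated months is only flagged at its edges;
- a gradual ramp is not flagged at all.

```python
def suspicious_months(series: list[int], factor: float = 3.0) -> list[int]:
    """Return indices whose value is more than `factor` times at least one neighbor.

    The first and last months only have one neighbor to compare against.
    """
    flagged = []
    for i, value in enumerate(series):
        neighbors = [series[j] for j in (i - 1, i + 1) if 0 <= j < len(series)]
        if any(value > factor * n for n in neighbors):
            flagged.append(i)
    return flagged


print(suspicious_months([10, 12, 50, 11, 10]))   # single spike: flagged
print(suspicious_months([10, 40, 40, 40, 10]))   # plateau: only its edges
print(suspicious_months([10, 25, 40, 60, 60]))   # ramp: nothing flagged
```
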

wicked stratus
#

did someone say musi nimi /silly

icy turret
#

i love how you can see the gravitational pull

quaint wagon
#

hehehe

icy turret
#

for some reason people are anu semeing more than ever

quaint wagon
#

i discussed this with jan Kepijona recently

devout jetty BOT
#

Oooo what does mun kekan mean?

kala Asi ↩️

[Reply to:](#1272180068721889290 message) i love how you can see the gravitational pull 📎

quaint wagon
#

imo, anu seme is just a way easier question format to work with
you can turn any statement into a question by appending it, no need to think about grammar

quaint wagon
devout jetty BOT
#

Oh

#

LOL

pseudo smelt
#

there it is

icy turret
devout jetty BOT
#

I could've figured it out if i'd looked around for a second huh

quaint wagon
#

hehhhehe

quaint wagon
# icy turret

oh! you can mix wildcards with attributes
kin *_start

icy turret
#

in this case i wanted to group all possible followups to kin as one series

#

except for la ofc

icy turret
#

since they're used frequently they would retain the feature more readily

plush gust
#

the continued decline of "sina pali e seme" is fascinating

quaint wagon
#

bc i can't be bothered to say all that shit lmao

icy turret
#

lanpan decline!

topaz spear
#

what is this ginormous spike

quaint wagon
#

no idea!

wicked stratus
#

maybe a ton of people started getting into toki pona at that time

icy turret
#

that was my first impression looking at the graph but actually that'd be 2020

#

so why the first months of 2021, no idea

icy turret
plush gust
#

toki pona 2030.. standard goodbye becomes weka pona?

icy turret
#

3/4rds of all words in july 2007 were nijon

#

wait no

plush gust
#

wouldn't that be 3/4ths of a percent of all words?

wicked stratus
#

isn't that 3/400
but still

icy turret
#

3/4rds of a percent

#

i can read

#

totally

wicked stratus
#

could you imagine though

plush gust
#

what was going on in Japan in July 2007?

wicked stratus
#

-# (i still don't know what could've caused that ?)

topaz spear
quaint wagon
plush gust
icy turret
#

@ruby whale

plush gust
#

ah. got it

topaz spear
#

no no that's not what it stands for

#

kulupu Mi Sewi e Nimi Weka

#

it's supposed to support weka meaning "leave" instead of tawa meaning it

topaz spear
#

holy

whole epoch
quaint wagon
# whole epoch ?

"musi nimi" turned out to have a data error, and he's showing how large that error is starting in like March this year

whole epoch
#

oh, how is it an error?

river shard
#

here it is: #1352699363100594338

whole epoch
#

ah! Lmao

wicked stratus
#

procrastinating by checking random searches

#

were they like, coining those new reserved words in 2002 and 2012

whole epoch
wicked stratus
#

oh makes sense

quaint wagon
#

yeah, two-letter words are my eternal enemy; they're extremely difficult to parse correctly and appear all over the place. the best strategy I've come up with is to only count them if they appear in my dictionary, but even that means getting false positives for short "sentences" like somebody saying "ku ku ku" or something
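
Here is the dictionary rule for short tokens, sketched in Python. The word list is a partial, hand-picked set of one- and two-letter toki pona words used only for illustration. It is not the dictionary ilo Muni uses.

The last test case reproduces the false positive mentioned above: "ku ku ku" passes because every token is a real word.

```python
# Partial, illustrative list of toki pona words with one or two letters.
SHORT_WORDS = {"a", "e", "o", "en", "jo", "ko", "ku", "la", "li", "ma",
               "mi", "mu", "ni", "pi", "pu", "tu"}


def keep_token(token: str) -> bool:
    """Tokens of up to two letters only count if they are known words.

    Longer tokens pass through; judging them is outside this sketch.
    """
    token = token.lower()
    if len(token) <= 2:
        return token in SHORT_WORDS
    return True


def filter_tokens(tokens: list[str]) -> list[str]:
    return [t for t in tokens if keep_token(t)]


print(filter_tokens(["no", "la", "so", "ku", "kijetesantakalu"]))
print(filter_tokens("ku ku ku".split()))  # still accepted: a false positive
```
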

low timber
#

interesting

#

kije is a rather recent development

plush gust
#

what is the cause of the dip at the end of the data? were numbers actually that low in July? if so, that's quite a notable fluctuation

#

pretty sure most of the spikes are in August from suno pi toki pona but what are these other spikes in march and then december from?

plush gust
#

i saw that live! probably the best sptp presentation of all time imo. i should've been more clear, i was referring to July 2025 not "the dip" lol

icy turret
#

oh that one. @quaint wagon i think we've talked about it already but i forget

plush gust
quaint wagon
#

i actually don't have an answer for that one yet
it comes before the changes to starboard and the reorg of the server iirc
it also comes before the observed general slump in august-september

#

it also remains to be seen whether that month was coincidentally quiet or the start of a trend

#

well, at least as far as the graph of toki pona specifically

plush gust
#

i mean... disregarding the winter spike from non-sptp sources, which seems to have propped up the numbers in 2024, the July 2025 low doesn't seem as incongruent

quaint wagon
#

hmm, i should add a toggle for a draggable horizontal bar...

#

it would make this exact comparison much easier

#

but this is a super handy visual

icy turret
wide pawn
#

actually the hits show more increase for mu than authors

icy turret
prime sundial
#

mun Kekan San, a while ago you said this:

how many speakers are there now? is it still under 1000?

quaint wagon
prime sundial
quaint wagon
#

The cutoff point is July 31st; I did all the collection in November IIRC

quaint wagon
#

(mu, also for my sake)

torn citrus
#

uhh idk if this is the right tomo but this is maybe a typo but the group could just be named that

quaint wagon
torn citrus
#

ahh okay

whole epoch
#

this might be a big ask, but i'm going a bit silly-mode about how pilin is a weirdo (copula-esque)
like you always see "pilin pona mute", "pilin ike nasa", "pilin pona lili", but very rarely "pilin pi pona mute", "pilin pi ike nasa", "pilin pi pona lili"
sure, you could argue that when people say "pilin pona mute", they are "pilin mute" and "pilin pona" but i personally doubt that, since in my experience, when people say "pilin pona mute" or something similar, they're almost always trying to explain that the pilin is "pona mute", and not that the pilin is mute and is pona
anyways, unsure of how doable this is, but would it be possible to compare the ratio of "pilin X Y" to "pilin pi X Y" and compare that to other words? discounting ala
cause my hypothesis is that pilin is very rarely together with pi to express an emotion, and only with pi for non-emotion stuff ("pilin pi jan ante", "pilin pi toki pona" etc)

#

like i'm willing to concede that sometimes when you say "pilin pona mute" you do mean that the pilin is mute
but i very much doubt that people would say that like 20x more than that they are feeling very good

#

oh, and in that case you'd also expect to see "pilin mute X" or just "pilin mute" (as a verb with no object), but both of them are basically nonexistent

#

idk it's 1am and my brain is silly cause pilin is weird and people barely talk about it ehe

river shard
#

it's being talked about less nowadays
the copula thing has been talked about a couple of years ago regularly
but it was at the same time also being talked about regarding "X pilin" vs "pilin X"

quaint wagon
#
```sql
select text, hits, authors
from term t
join yearly y on t.id = y.term_id
where y.attr = 0
  and y.day = 0
  and t.len > 1

  -- padding with spaces makes the pattern match pilin only as a whole word,
  -- including at the very start or end of the phrase
  and (' ' || t.text || ' ') like '% pilin %'

  -- leave out phrases containing these particles (pi included)
  and (' ' || t.text || ' ') not like '% la %'
  and (' ' || t.text || ' ') not like '% li %'
  and (' ' || t.text || ' ') not like '% e %'
  and (' ' || t.text || ' ') not like '% o %'
  and (' ' || t.text || ' ') not like '% en %'
  and (' ' || t.text || ' ') not like '% pi %'

order by hits desc;
```

so, query that gets all the phrases with pilin, except those with some particles. also excluding pi, because here's the version that gets phrases containing pilin and pi:

```sql
select text, hits, authors
from term t
join yearly y on t.id = y.term_id
where y.attr = 0
  and y.day = 0
  and t.len > 1

  -- phrases containing both pilin and pi as whole words
  and (' ' || t.text || ' ') like '% pilin %'
  and (' ' || t.text || ' ') like '% pi %'

  -- but none of the other particles
  and (' ' || t.text || ' ') not like '% la %'
  and (' ' || t.text || ' ') not like '% li %'
  and (' ' || t.text || ' ') not like '% e %'
  and (' ' || t.text || ' ') not like '% o %'
  and (' ' || t.text || ' ') not like '% en %'

order by hits desc;
```

problem: there's like, an order of magnitude fewer results in the second query, let alone examples of usage

but you might have some luck answering this question with poka muni: https://acipensersturio.github.io/poka-muni/
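
The comparison of "pilin X Y" versus "pilin pi X Y" can be sketched in Python against an in-memory copy of the tables the SQL queries use. Two things are assumptions:

- the column layout, in particular that `hits` lives in `yearly`;
- every row of toy data.

The filters `attr = 0` and `day = 0` are copied from the queries without assuming what they mean.

```python
import sqlite3

PARTICLES = {"la", "li", "e", "o", "en", "pi"}


def build_toy_db() -> sqlite3.Connection:
    """In-memory stand-in for the term/yearly tables; columns and rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table term (id integer primary key, text text, len integer);
        create table yearly (term_id integer, attr integer, day integer,
                             hits integer, authors integer);
        insert into term values
            (1, 'pilin pona mute', 3), (2, 'pilin pi pona mute', 4),
            (3, 'pilin pi jan ante', 4), (4, 'pilin li pona', 3);
        insert into yearly values
            (1, 0, 0, 40, 20), (2, 0, 0, 2, 2), (3, 0, 0, 5, 4), (4, 0, 0, 9, 6);
    """)
    return conn


def modifier_counts(conn: sqlite3.Connection, head: str) -> tuple[int, int]:
    """Hits for 'head X Y' vs 'head pi X Y', where X and Y are not particles."""
    rows = conn.execute(
        """
        select t.text, y.hits
        from term t
        join yearly y on t.id = y.term_id
        where y.attr = 0 and y.day = 0
          and t.text like ? || ' %'
        """,
        (head,),
    ).fetchall()
    plain = with_pi = 0
    for text, hits in rows:
        words = text.split()
        if words[0] != head:
            continue
        rest = words[1:]
        if len(rest) == 2 and not PARTICLES & set(rest):
            plain += hits
        elif len(rest) == 3 and rest[0] == "pi" and not PARTICLES & set(rest[1:]):
            with_pi += hits
    return plain, with_pi


conn = build_toy_db()
print(modifier_counts(conn, "pilin"))  # (40, 7) on the toy rows
```

Running the same function for other head words would give the cross-word comparison. Note that "pilin pi jan ante" lands in the pi bucket even though it is not an emotion. Testing the hypothesis would still need those phrases separated by hand.
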

#

though i would point out
can you even explain the difference between pilin pona and pona pilin
ok, now can you do it at the speed of a conversation

whole epoch
whole epoch
#

i've also messed around a lil with poka muni, but since it only accepts single words it's hard to check for patterns with 2 modifiers or with pi

whole epoch
quaint wagon
#

also, on the note of "pilin is weird"
yesterday, i was giving an example of asking and answering questions, with the example "noka sina li pilin ala pilin ike?"
i was going to say that the correct responses are "pilin" or "ala"
but i realized partway that i wouldn't really do that myself; it's valid, and would probably be understood, but it isn't really the operative information to provide in response to that question, even if it follows grammatically

whole epoch
quaint wagon
#

lmao

whole epoch
quaint wagon
#

thonk

icy turret
quaint wagon
# whole epoch ah i see i'll maybe do that if i'm very bored one day 😅

i was considering adding an open query page to ilo muni
but there would be a bunch of problems to solve, and sqlite doesn't really give me the ability to solve them
i can give you a plain table of the data you requested, that's not too bad
but if somebody makes a super basic query like select * from monthly, it will attempt to download the entire table, and then github will nuke me from orbit for using all their gigabytes
databases like postgres can tell you approximately how expensive a given query will be, and approximately how large its results will be too
sqlite doesn't have this capability; i can infer certain basic things from the explain query plan feature, but it would be essentially impossible to plug all the holes that let somebody make fuckhuge queries and attempt to download the entire db over github pages
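
Here is a sketch of the kind of inference `EXPLAIN QUERY PLAN` allows, using Python's built-in `sqlite3` module. It flags any plan step that scans a whole table instead of searching an index. The `monthly` schema in the usage lines is a made-up stand-in.

As noted above, this only catches one class of expensive query. An indexed search can still return an enormous result.

```python
import sqlite3


def full_scans(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return EXPLAIN QUERY PLAN detail lines that describe a full scan.

    The last column of each plan row is a detail string such as 'SCAN monthly'
    (older SQLite versions print 'SCAN TABLE monthly') or
    'SEARCH monthly USING INDEX ...'. 'SCAN CONSTANT ROW' (a query with no
    table, such as SELECT 1) is not a table scan, so it is skipped.
    The statement is only planned here, never executed.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    details = [row[-1] for row in plan]
    return [d for d in details if d.startswith("SCAN") and "CONSTANT ROW" not in d]


conn = sqlite3.connect(":memory:")
conn.execute("create table monthly (term_id integer, month integer, hits integer)")
conn.execute("create index monthly_term on monthly (term_id)")
print(full_scans(conn, "select * from monthly"))                   # flagged
print(full_scans(conn, "select * from monthly where term_id = 1"))  # uses the index
```
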

icy turret
#

and just not promise it'll survive if cuts are needed

quaint wagon
#

i have considered that but then i would need to host it; that might've been kinda reasonable around when i created ilo muni, but rn my hosting situation is going to be precarious starting in august and continuing through fuck knows

icy turret
#

is perfectly valid in russian imo

#

even though "it throws itself" is, in theory, not interpretable without eyes

quaint wagon
icy turret
#

for russian at least i would syntactically interpret the entire predicate to be included in the response, but only the head is needed. the rest is syntactically present, but can simply be a bunch of empty nodes, filled in from context that it literally was just said