#ilo.muni.la: graph how toki pona is used!
1254 messages · Page 2 of 2 (latest)
what I'm saying is if there were only like ~1000 messages of toki pona recorded that year then someone would only have to send suwi like 30 times to get that spike
thats true
can confirm; check the "absolute" graph for that day. almost nothing!
more baffling case though: check meli in 2016-17
omg 😭
thats a big meli spike over there from march01 2017
womens history month
NO WONDER
september 6th 2023 was the highest
nice
fyi this represents sept6 to october 4th
(i will be going back to monthly units once i release authorship data)
http://gregdan3.github.io/ilo-muni/?query=laso%2C+loje%2C+jelo%2C+walo%2C+pimeja+-+tenpo+pimeja&minSentLen=1&scale=rel&field=hits&start=1470009600&end=1722470400&smoothing=2&smoother=cwin
i learned something fun:
if you subtract the time phrase from the word pimeja, its frequency matches the other color words.
all the color words are about equally frequent
https://gregdan3.github.io/ilo-muni/?query=toki+pona+%2B+toki+pona%2C+toki+-+toki+pona%2C+pona+-+toki+pona&minSentLen=1&scale=rel&start=1470009600&end=1722470400&smoothing=2&smoother=cwin
this is fun!
how frequent is the phrase "toki pona" across all text? how frequent is the word "toki" when it isn't in that phrase? same for the word pona
why did you add toki pona to toki pona
i want to know this: taken together, how many are all the words inside this phrase, relative to all of toki pona?
in ilo Muni, a phrase's count comes only from how often the phrase occurs. the number of words in it doesn't matter; only its own count matters.
if you want to count every word in it, add the phrase as many times as it has words.
consider this:
- there are 10,000 words
- two-word phrases occur 9,000 times
- 'toki' occurs 100 times
- 'pona' occurs 100 times
- 'toki pona' occurs 90 times
if you want to report a fraction, several methods are possible:
- what fraction is the phrase of all phrases of the same length?
- for 'toki pona', that's 90 / 9,000 (1%). but if you want to subtract this from 'toki', there's a problem: the word toki occurs 100 times in 10,000 words, so that's 100 / 10,000 (1%). so you subtract 1% from 1% and get 0%. broken.
- counting every word of each phrase, what fraction are those words of all words?
- 'toki pona' occurs 90 times, so you scale that number up because the phrase has two words: (90 * 2) / 10,000 (1.8%). the word 'toki' stays at 1%. so you subtract 1.8% from 1% and get -0.8%. broken.
- what fraction is the phrase of all words?
- at first this can feel wrong. but it makes the data much more useful, because you can subtract a phrase from any single word you like. try subtracting 'toki pona' from 'toki': (100 / 10,000) - (90 / 10,000) = 1% - 0.9% = 0.1%. this tells you that 'toki' outside of the phrase 'toki pona' is 0.1% of all words. you can do the same for 'pona' too
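A small Python sketch of the three normalizations listed above, using the made-up counts from the example (10,000 words, 9,000 two-word phrases, 100 hits each for 'toki' and 'pona', 90 for 'toki pona'). The function names are hypothetical and this is not ilo Muni's actual code; it only reproduces the arithmetic.

```python
# Toy counts from the example above; not real ilo Muni data.
TOTAL_WORDS = 10_000    # every word in the corpus
TOTAL_BIGRAMS = 9_000   # every two-word phrase in the corpus
HITS = {"toki": 100, "pona": 100, "toki pona": 90}


def by_same_length(term: str) -> float:
    """Method 1: divide by the number of terms of the same length."""
    total = TOTAL_BIGRAMS if len(term.split()) == 2 else TOTAL_WORDS
    return HITS[term] / total


def by_word_weight(term: str) -> float:
    """Method 2: weight a phrase by its word count, divide by all words."""
    return HITS[term] * len(term.split()) / TOTAL_WORDS


def by_all_words(term: str) -> float:
    """Method 3: divide the raw hit count by all words."""
    return HITS[term] / TOTAL_WORDS


def toki_outside_phrase(method) -> float:
    """Share of 'toki' left after subtracting 'toki pona' under one method."""
    return method("toki") - method("toki pona")


if __name__ == "__main__":
    for method in (by_same_length, by_word_weight, by_all_words):
        print(f"{method.__name__}: {toki_outside_phrase(method):.1%}")
```

Only the third method leaves a remainder that can be read as "'toki' outside of 'toki pona'"; the first gives zero and the second goes negative.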
@novel lintel
ilo muni does the third one.
but for this search i wanted something like the second one, so i added the same phrase twice.
ah
@quaint wagon weird
it is an error!
if you open the advanced menu and set minimum sentence length to 3+ words, e falls behind as expected
this is caused by i.e. and e.g.
it will be fixed in a future db revision lmao
good to know
to be precise, the issue is as follows
- period is a sentence delimiter in my parsing library
- i only just added it as an intra word punctuation mark (which means i attempt to skip over exactly one of it, if it is between writing characters)
- but the sentence delimiter is not at all aware of intra word punctuation, so it splits on them anyway
- this was fine for the other intra word punctuation marks, because i wasn't counting them as sentence delimiters
- and that's how I missed this in my tests!
so i.e. splits this sentence into
so i
e
splits this sentence into
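A rough Python sketch of the bug described above. It is not the actual parsing library: `naive_split` and `aware_split` are hypothetical names, and the regex only handles ASCII letters for brevity. The first function splits on every period; the second skips a period that sits directly between two letters.

```python
import re


def naive_split(text: str) -> list[str]:
    """Split on every period, with no idea of intra-word punctuation."""
    return [part.strip() for part in text.split(".") if part.strip()]


# A period counts as a delimiter unless it has a letter on both sides.
_DELIMITER = re.compile(r"(?<![A-Za-z])\.|\.(?![A-Za-z])")


def aware_split(text: str) -> list[str]:
    """Split on periods, except one that sits between two letters."""
    return [part.strip() for part in _DELIMITER.split(text) if part.strip()]


if __name__ == "__main__":
    sentence = "so i.e. splits this sentence into"
    print(naive_split(sentence))  # ['so i', 'e', 'splits this sentence into']
    print(aware_split(sentence))  # ['so i.e', 'splits this sentence into']
```

With the naive version the stray "e" becomes its own one-word sentence, which would fit with raising the minimum sentence length hiding the effect.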
kijetesantakalu usage over the years (thought this was cool)
yeah I think ku did a lot for it
@quaint wagon data request:
lets define an "active author" as someone who has said at least 1 sentence in 6+/12 past months, and who's said a total of N or more sentences in the 12 month period
calculate num active authors for each month we have data on
N ~ 100 is probably fine
for bonus points, define separate series depending on which month any given author first qualified
or alternatively we could have a data dump that excludes ngrams and only has authors, months, and (the existence of) sentences, and leave it up to others to process
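One way to read the "active author" definition above, as a Python sketch. The input shape (`{author: {month_index: sentence_count}}`), the month indexing (`year * 12 + month - 1`), and counting the current month as part of the 12-month window are all assumptions; none of this reflects how ilo Muni stores authorship data.

```python
def active_authors(counts, month, n=100, window=12, min_months=6):
    """Return the authors who count as active in `month`.

    counts: {author: {month_index: sentences}} (assumed input shape)
    An author is active if they spoke in at least `min_months` of the
    last `window` months (current month included) and said at least
    `n` sentences in total over that window.
    """
    active = set()
    for author, by_month in counts.items():
        in_window = [by_month.get(m, 0) for m in range(month - window + 1, month + 1)]
        months_present = sum(1 for c in in_window if c > 0)
        if months_present >= min_months and sum(in_window) >= n:
            active.add(author)
    return active


if __name__ == "__main__":
    # Made-up data: a steady author, a one-month burst, a steady but quiet author.
    counts = {
        "steady": {m: 10 for m in range(12)},
        "burst": {11: 500},
        "quiet": {m: 1 for m in range(12)},
    }
    print(active_authors(counts, month=11))  # {'steady'}
```

Calling this once per month gives the requested series of active-author counts.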
This graph shows the absolute usage of toki, a, li, ni, mi, sina and ona (some of the most used words in online discussions).
We know that toki pona boomed around 2020, so the increase in hits towards the end of the chart doesn't raise many questions.
But what about the 2 peaks in 2007 and 2010?
Was toki pona "famous" during these two brief periods, only to come back to its pre-2007 stage afterwards?
I wasn't in the community at the time, so I really don't know what happened.
Does anyone have info?
PS: I just realised these two peaks I see could also be statistically insignificant?
i think that's the time of the yahoo group and toki pona forum, so there's more data from that period specifically
i highly doubt toki pona was in a lull period from 2011-2015
-# /i'm mostly guessing
i got bored again so i kalamaed e sitelen nanpa akesi, this time using Authors. (audio file has normal, then reversed, b/c iseka could probably work as a percussion instrument if you wanted to sample ilo Muni data of all things for your music)
@dusk tree do this
could probably work as a percussion instrument if you wanted to sample ilo Muni data of all things for your music
im gonna do it
im taking a stats class this semester and every graph i see reminds me of ilo Muni
i see a new kind of graph in class and my brain says "omg ilo Muni reference"
real
this reminds me of that one post by ?jan Misali maybe?
[guy who has broadly taken in media from many forms and time periods] hmm... getting a lot of human experience vibes from this
https://ilo.muni.la/?query=ukawina%2C+kolona&minSentLen=1&scale=rel&field=hits&start=1470009600&end=1722470400&smoothing=2&smoother=gauss
this is striking: you can see the start of the pandemic and the start of the war in the word counts
jan Kasape shared something related, so i searched for this
this is the original meme btw
this is kind of nice actually
you have subtracted the "a" series from the "toki" series. during this month, the former is larger than the latter
i didn't notice
mu
mu
wooo boi its time to go through tons of queries
@quaint wagon so remind me what was the deal with july?
i don't think this is a month you only partially archived right
we don't know! reddit is missing from all of 2025, but everything else is there. it is not partially archived either
i remember suno pi toki pona already appears in the default queries but mun pi ante toki might be a good addition
cool! you can see how pi appears less at the end of sentences than tawa, even though in general they're both neck and neck
tenpo pana no longer fulfils this role, thanks to majeka and siliwa
that's suspicious
no way, right?
up to you to investigate
yeah
not really in complete decline, seems to have stabilised
first time ilo muni can search itself
we already know about the gap but holy heck this goes to near zero
i can't reproduce this
was this the sample search?
the unicode effort revived it lol
yes
interesting
or increased focus on third party subjects, which are most of them
dialogues between two people where they discuss a thing they're mutually interested in would not include as many mi as li for sure
oh my god i forgot to include that i added poki lapo
i added poki lapo, that's in there too
laughing is inversely proportional to having fun
LMAO
santa is still seasonal
remind me what's _full
this might be the best thing ive seen all month
the phrase is the entire sentence
now this is why i added all these attributes.
also worth mentioning that i improved the parser significantly, and i'm thinking that'll let me do google ngrams's trick in the future
so you'd be able to make queries like (taso_start / taso), showing what % of taso uses are taso at the start of the sentence
that's not in yet but, well, it would be nice lol
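For illustration only, here is what a query like (taso_start / taso) would compute, as a Python sketch over made-up monthly hit counts. The `share` helper is hypothetical; the feature is described above as not implemented yet, so this says nothing about how ilo Muni would do it.

```python
def share(part, whole):
    """Element-wise part / whole; None where the denominator is zero."""
    return [p / w if w else None for p, w in zip(part, whole)]


if __name__ == "__main__":
    # Made-up monthly hits, not real ilo Muni data.
    taso_start = [40, 0, 75]
    taso = [80, 0, 100]
    print(share(taso_start, taso))  # [0.5, None, 0.75]
```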
im gonna just scroll through the thread and repeat queries that aren't too much of a pain
this one's just a continued trend
this ones fun because nanpa open is up, more than ever
penpo has straight up fully eclipsed kokosila
tbf we knew that last year, but it is fascinating to see that trend continue
a proxy for influxes of learners, perhaps?
something cool is that the mentions never really bottom out the way they did in the 2024 dataset
i'm not sure what you mean by bottom out
like this is readable, even for low frequency months
oh i see
the library i'm using says that every time
oh.
that happens on chromium too, but that's just a funny webserver behavior
my powers of observation are unparalleled
this one is interesting cause thats despite there not being a lipu monsuta release
we have successfully unlonsied
lmao
well yeah, tenpo monsuta almost certainly refers to halloween most often
"musi nimi" li kama wawa tan seme? entirely possible this is just an event i missed by not being active here
that's a good question! i have no idea!
oh my god it's a thread
oh my god it's people playing toki pona wordle
LOL
this falls outside of my criteria for inclusion........................
whoopsie!
seems like a pretty sizeable impact on "musi" as a whole, lol
nooo not musi nimi
after the last time i had to regen the db, i implemented a suspicious datapoint check, trying to find months which are more than 3x at least one of their neighbors
but this still doesn't match it
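A guess at what such a neighbor check could look like, as a Python sketch. The function name and the made-up numbers are assumptions, and this is not the check that actually runs when the db is generated. It also shows one way a check like this can stay silent: a series that climbs steadily never exceeds 3x a neighbor.

```python
def suspicious_months(values, factor=3.0):
    """Indexes of months that are more than `factor` times a neighbor."""
    flagged = []
    for i, value in enumerate(values):
        neighbors = []
        if i > 0:
            neighbors.append(values[i - 1])
        if i < len(values) - 1:
            neighbors.append(values[i + 1])
        if any(value > factor * n for n in neighbors):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    print(suspicious_months([100, 110, 500, 120, 130]))  # [2]
    print(suspicious_months([100, 250, 600, 1500]))      # []
```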
did someone say musi nimi /silly
i love how you can see the gravitational pull
hehehe
for some reason people are anu semeing more than ever
i discussed this with jan Kepijona recently
Oooo what does mun kekan mean?
imo, anu seme is just a way easier question format to work with
you can turn any statement into a question by appending it, no need to think about grammar
hi, that me
that's them!
I could've figured it out if i'd looked around for a second huh
hehhhehe
oh! you can mix wildcards with attributes
kin *_start
ive done that ye
in this case i wanted to group all possible followups to kin as one series
except for la ofc
if we extrapolate this, you could imagine a future where x_ala_x only applies to preverbs
since theyre used frequently they would retain the feature more readily
the continued decline of "sina pali e seme" is fascinating
for me, it's already down to just sentences which have at most two words in the verb as a statement
bc i can't be bothered to say all that shit lmao
lanpan decline!
what is this ginormous spike
no idea!
maybe a ton of people started getting into toki pona at that time
that was my first impression looking at the graph but actually thatd be 2020
so why the first months of 2021, no idea
toki pona 2030.. standard goodbye becomes weka pona?
wouldn't that be 3/4ths of a percent of all words?
isnt that 3/400
but still
could you imagine though
what happened in Japan in July 2007?
Events in the year 2007 in Japan.
-# (i still dont know what couldve caused that ?)
kulupu MSNW approved
This is fucking hilarious thank you
what is this group?
@ruby whale
ah. got it
no no that's not what it stands for
kulupu Mi Sewi e Nimi Weka
it's supposed to support weka meaning "leave" instead of tawa meaning it
?
"musi nimi" turned out to have a data error, and he's showing how large that error is starting in like March this year
oh, how is it an error?
it's this: #1352699363100594338
ah! Lmao
procrastinating by checking random searches
were they like, coining those new reserved words in 2002 and 2012
it's just noise in the data
oh makes sense
yeah, two letter words are my eternal enemy; they're extremely difficult to parse correctly and appear all over the place. the best strategy I've come up with is to only count them if they appear in my dictionary, but even that means getting false positives for short "sentences" like somebody saying "ku ku ku" or something
what is the cause of the dip at the end of the data? were numbers actually that low in July? if so, that's a quite notable fluctuation
pretty sure most of the spikes are in August from suno pi toki pona but what are these other spikes in march and then december from?
mun has already spoken about it in fact! https://www.youtube.com/watch?v=ZS53CuYNOn8&feature=youtu.be
i saw that live! probably the best sptp presentation of all time imo. i should've been more clear, i was referring to July 2025 not "the dip" lol
oh that one. @quaint wagon i think weve talked about it already but i forget
this seems like it should have an obvious answer, but it's eluding me right now for some reason. viral toki pona internet content maybe?
i actually don't have an answer for that one yet
it comes before the changes to starboard and the reorg of the server iirc
it also comes before the observed general slump in august-september
it also remains to be seen whether that month was coincidentally quiet or the start of a trend
well, at least as far as the graph of toki pona specifically
i mean.. disregarding the winter spike from non-sptp sources which seems to have propped up the numbers in 2024 the July 2025 low doesn't seem as incongruent
a bad rendering of my thinking
hmm, i should add a toggle for a draggable horizontal bar...
it would make this exact comparison much easier
but this is a super handy visual
actually the hits show more increase for mu than authors
mun Kekan San, a while ago you said this:
what's the number of speakers now? is it still below 1000?
hm! honestly i don't know much, because i haven't been watching the community's size the way i did when i first built ilo Muni.
but my feeling is that the community keeps growing; i often end up talking to someone who is new to me but who already knows toki pona
i feel like this has been happening more lately, haha
What's the cutoff point for the data? I noticed this slump the other day, and I assumed this was a partial month.
IE, the data set ends in "July 2025", but not "July 31st, 2025".
If data collection happened on say, July 15th, that month will look like it has half as much conversation as any other month.
The cutoff point is July 31st; I did all the collection in November IIRC
uhh idk if this is the right tomo but this is maybe a typo but the group could just be named that
it is actually named that! tok is the ISO code for the language
ahh okay
this might be a big ask, but i'm going a bit silly-mode about how pilin is a weirdo (copula-esque)
like you always see "pilin pona mute", "pilin ike nasa", "pilin pona lili", but very rarely "pilin pi pona mute", "pilin pi ike nasa", "pilin pi pona lili"
sure, you could argue that when people say "pilin pona mute", they are "pilin mute" and "pilin pona" but i personally doubt that, since in my experience, when people say "pilin pona mute" or something similar, they're almost always trying to explain that the pilin is "pona mute", and not that the pilin is mute and is pona
anyways, unsure of how doable this is, but would it be possible to compare the ratio of "pilin X Y" to "pilin pi X Y" and compare that to other words? discounting ala
cause my hypothesis is that pilin is very rarely together with pi to express an emotion, and only with pi for non-emotion stuff ("pilin pi jan ante", "pilin pi toki pona" etc)
like i'm willing to concede that sometimes when you say "pilin pona mute" you do mean that the pilin is mute
but i very much doubt that people would say that like 20x more than that they are feeling very good
oh, and in that case you'd also expect to see "pilin mute X" or just "pilin mute" (as a verb with no object), but both of them are basically non-existent
idk it's 1am and my brain is silly cause pilin is weird and people barely talk about it ehe
it's being talked about less nowadays
the copula thing was talked about regularly a couple of years ago
but it was at the same time also being talked about regarding "X pilin" vs "pilin X"
-- phrases of 2+ words containing "pilin" and none of: la, li, e, o, en, pi
select text, hits, authors
from term t
join yearly y on t.id = y.term_id
where y.attr = 0
  and y.day = 0
  and t.len > 1
  -- pad with spaces so only whole words match
  and (' ' || t.text || ' ') like '% pilin %'
  and (' ' || t.text || ' ') not like '% la %'
  and (' ' || t.text || ' ') not like '% li %'
  and (' ' || t.text || ' ') not like '% e %'
  and (' ' || t.text || ' ') not like '% o %'
  and (' ' || t.text || ' ') not like '% en %'
  and (' ' || t.text || ' ') not like '% pi %'
order by hits desc;
so, query that gets all the phrases with pilin, except those with some particles. also excluding pi, because here's the version that gets phrases containing pilin and pi:
-- phrases of 2+ words containing both "pilin" and "pi", and none of: la, li, e, o, en
select text, hits, authors
from term t
join yearly y on t.id = y.term_id
where y.attr = 0
  and y.day = 0
  and t.len > 1
  -- pad with spaces so only whole words match
  and (' ' || t.text || ' ') like '% pilin %'
  and (' ' || t.text || ' ') like '% pi %'
  and (' ' || t.text || ' ') not like '% la %'
  and (' ' || t.text || ' ') not like '% li %'
  and (' ' || t.text || ' ') not like '% e %'
  and (' ' || t.text || ' ') not like '% o %'
  and (' ' || t.text || ' ') not like '% en %'
order by hits desc;
problem: there's like, an order of magnitude fewer results in the second query, let alone examples of usage
but you might have some luck answering this question with poka muni: https://acipensersturio.github.io/poka-muni/
though i would point out
can you even explain the difference between pilin pona and pona pilin
ok, now can you do it at the speed of a conversation
The difference is that when people say pilin pona they often mean pona pilin
While if they say pona pilin they probably don't mean pilin pona :p
(No I couldn't explain the practical difference)
sorry, not a tech person, where would i put this query?
i've also messed around a lil with poka muni, but since it only accepts single words it's hard to check for patterns with 2 modifiers or with pi
yea i remember seeing like, one or two discussions about it way back, but the fact that people don't talk about it regularly is a bit wild to me
it feels like a pretty important part of how pilin is used but most don't even realise it
ah sorry! i skipped a few steps lmao
if you have and know how to use a terminal, you can open up the raw database file (downloadable on the site) in sqlite and then paste this query
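If a terminal feels like too much, Python's built-in `sqlite3` module can run the same kind of query. The sketch below is an illustration under assumptions: the table layout is only inferred from the query text above, the rows are made up, the query is shortened to the pilin and pi filters, and the database path in the last comment is hypothetical.

```python
import sqlite3

# Shortened version of the query above: phrases with "pilin" but without "pi".
PILIN_QUERY = """
select text, hits, authors
from term t
join yearly y on t.id = y.term_id
where y.attr = 0
  and y.day = 0
  and t.len > 1
  and (' ' || t.text || ' ') like '% pilin %'
  and (' ' || t.text || ' ') not like '% pi %'
order by hits desc;
"""


def toy_db() -> sqlite3.Connection:
    """In-memory stand-in for the real database (schema is a guess)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table term (id integer primary key, text text, len integer);
        create table yearly (term_id integer, attr integer, day integer,
                             hits integer, authors integer);
        """
    )
    conn.executemany(
        "insert into term values (?, ?, ?)",
        [(1, "pilin pona", 2), (2, "pilin pi pona mute", 4), (3, "pilin", 1)],
    )
    conn.executemany(
        "insert into yearly values (?, 0, 0, ?, ?)",
        [(1, 50, 20), (2, 3, 2), (3, 900, 300)],
    )
    return conn


def run_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run one query and return every row."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(run_query(toy_db(), PILIN_QUERY))  # [('pilin pona', 50, 20)]
    # Against the real file, connect to wherever the download was saved, e.g.
    # sqlite3.connect("path/to/downloaded.sqlite")  (hypothetical path)
```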
also, on the note of "pilin is weird"
yesterday, i was giving an example of asking and answering questions, with the example "noka sina li pilin ala pilin ike?"
i was going to say that the correct responses are "pilin" or "ala"
but i realized partway that i wouldn't really do that myself; it's valid, and would probably be understood, but it isn't really the operative information to provide in response to that question, even if it follows grammatically
ah i see
i'll maybe do that if i'm very bored one day
lmao
that's what i'm saying
we should stop teaching pilin to learners so it dies out.
thonk
i would say pilin without reservations
isipin users rejoice
i was considering adding an open query page to ilo muni
but there would be a bunch of problems to solve, and sqlite doesn't really give me the ability to solve them
i can give you a plain table of the data you requested, that's not too bad
but if somebody makes a super basic query like select * from monthly, it will attempt to download the entire table, and then github will nuke me from orbit for using all their gigabytes
databases like postgres can tell you approximately how expensive a given query will be, and approximately how large its results will be too
sqlite doesn't have this capability; i can infer certain basic things from the explain query plan feature, but it would be essentially impossible to plug all the holes that let somebody make fuckhuge queries and attempt to download the entire db over github pages
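To make the `explain query plan` point concrete, here is a Python sketch using the standard `sqlite3` module. The `monthly` table layout and index are made up for the demo, and `does_full_scan` is a hypothetical helper, not something ilo Muni has. It flags a plan step that scans a whole table, which catches `select * from monthly` but says nothing about how many rows an indexed query returns, so it cannot close the holes mentioned above.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable detail column of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


def does_full_scan(conn: sqlite3.Connection, sql: str) -> bool:
    """True if any plan step is a SCAN rather than an indexed SEARCH."""
    return any(detail.startswith("SCAN") for detail in plan_details(conn, sql))


def toy_monthly_db() -> sqlite3.Connection:
    """In-memory table with a made-up layout, just for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table monthly (term_id integer, day integer, hits integer);
        create index monthly_term on monthly (term_id);
        """
    )
    return conn


if __name__ == "__main__":
    conn = toy_monthly_db()
    for sql in ("select * from monthly",
                "select * from monthly where term_id = 1"):
        print(does_full_scan(conn, sql), plan_details(conn, sql))
```

A query with a range condition that happens to match every row may still be reported as an indexed search, which is the kind of gap this heuristic cannot see.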
consider having a second web app thats actually got a backend
and just not promise itll survive if cuts are needed
i have considered that but then i would need to host it; that might've been kinda reasonable around when i created ilo muni, but rn my hosting situation is going to be precarious starting in august and continuing through fuck knows
for comparison:
- brosajetsja v glaza? - brosajetsja.
brosaj-et-sja v glaz-a
throw-3SG-self in(to) eye-ACC.PL
Does it catch the eye? It does.
is perfectly valid in russian imo
even though "it throws itself" is, in theory, not interpretable without eyes
the primary db is also a fuck; edgedb was useful for this project, but it has proved to be more burden than benefit for anything beyond exactly what ilo muni needed
and also it is functionally dead due to vercel hiring edgedb's entire dev team to do not-edgedb
so i would need to migrate to postgres before making any significant backend changes, realistically
for russian at least i would syntactically interpret the entire predicate to be included in the response, but only the head is needed. the rest is syntactically present, but can simply be a bunch of empty nodes, filled in from context that it literally was just said