#ilo.muni.la: graph how toki pona is used!


quaint wagon
edgy cradle
#

wan

quaint wagon
#

tan seme

edgy cradle
#

here's something i noticed

#

the relative mode is less accurate further back in time

wicked stratus
#

san

wicked stratus
#

I actually have cool graphs I'm just not at home rn 😔

quaint wagon
#

this is why there are multiple scales offered tho

icy turret
#

nanpa wan is roughly twice as common as nanpa tu

whole epoch
icy turret
#

indirect evidence for unpopularity of sona-preverb

wicked stratus
edgy cradle
whole epoch
quaint wagon
quaint wagon
# whole epoch

psst, check the absolute mode
there's like 30 occurrences there :P

icy turret
#

damn, it's not often that you can attribute a word to one month and then essentially never again

wicked stratus
whole epoch
edgy cradle
quaint wagon
#

in general, take anything that has fewer than ~400 occurrences all-time (you can check with cumulative) with a grain of salt
that data can be relevant, but it can be much more easily swayed by errors, or affected by an individual speaker

#

if you wanna know how powerful exactly one speaker is, look up ilo o ken e toki ni

whole epoch
#

idk im thinking out loud

quaint wagon
whole epoch
#

gotcha

#

that might just be because the toki pona community is bigger and has gotten less... meta, for lack of a better term

quaint wagon
#

ehehhe, yep

whole epoch
#

using tp as the medium of conversation instead of as the topic

quaint wagon
#

i am surprised i haven't seen anyone look up ki yet

to be clear, ki's results are nonsense and unavoidably so lmao

#

there are at least 7 things i'm aware of which ki appears in that are not uses of the toki pona word ki

whole epoch
quaint wagon
#

eeeheheh

novel lintel
icy turret
#

@quaint wagon feature suggestion: create a list of commentary matching particular search requests
use it to comment on searches that could be misleading

#

e.g. if someone looks up san remind them the search is partially "spoiled" by your presence

#

the counterargument to adding this is that we can't possibly account for every misleading search, so it might be better done in conversation than in the gui

#

the countercounterargument is that we can account for the most potentially frequent of those

arctic yoke
#

a lot of people, i have noticed, seem to think my name is a toki pona question word /musi

drifting breach
round mural
tall rune
#

what was itomi

whole epoch
tall rune
#

ah

mint knoll
thin rose
mint knoll
icy turret
#

alasa is growing in relative terms, both as a "standalone" predicate and as a preverb

fresh sentinel
#

how does that compare to lukin preverb?

drifting breach
#

isn't there a special syntax for marking only preverbs or only verbs or only smth?

icy turret
#

there isn't

#

lukin is harder to estimate for this exact reason

verbal marten
meager steeple
#

#1187212477155528804

#

and so its name stays around for a long time

quaint wagon
#

^ in the future, i'll remove this server from ilo Muni's data

brittle mulch
quaint wagon
#

woah how's it transparent

brittle mulch
#

right click on the graph

#

and press copy image

quaint wagon
#

WUH????

#

never knew that was a thing

brittle mulch
#

YOU MADE THIS

icy turret
#

yep thats a thing

quaint wagon
#

THE IMAGE COMES FROM ChartJS

#

NOT FROM ME

brittle mulch
#

a

quaint wagon
#

I ONLY PROVIDE THE NUMBERS AND THE METHOD, NOT THE IMAGE

brittle mulch
#

peli pani

quaint wagon
#

what?

#

are you saying
very funny
perhaps

tall rune
#

you can see when i joined the community :3

#

also it strips double letters :c

quaint wagon
# tall rune also it strips double letters :c

gonna be real, this will probably never not be a feature (necessity) of ilo muni
there's just Way too many ways to write words if i don't collapse duplicate letters
which makes the database too massive to deliver as i'm currently doing it

tall rune
#

yeah thats fair

quaint wagon
#

there's a similar but less massive problem with capitals

tall rune
#

mhm

quaint wagon
#

fwiw, in processing it's more like

  • collapse duplicates (preserving the first cap)
  • score
  • lowercase
  • frequency count

that is to say, i score things with their caps, then remove caps later for space/counting reasons

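The order of operations above can be sketched in Python. This is a minimal illustration, not ilo muni's code: the regex used to collapse runs of letters is an assumption, and `looks_like_toki_pona` is a hypothetical stand-in for the real scorer, which isn't described here.

```python
import re
from collections import Counter

# A character followed by one or more copies of itself. With IGNORECASE the
# backreference also matches across case; substituting group 1 keeps the
# first occurrence, so "Tooki" -> "Toki" and "ponaaaa" -> "pona".
_RUNS = re.compile(r"(.)\1+", re.IGNORECASE)


def collapse_duplicates(word: str) -> str:
    """Collapse runs of a repeated letter, preserving the first letter's case."""
    return _RUNS.sub(r"\1", word)


def looks_like_toki_pona(word: str) -> bool:
    """Hypothetical stand-in scorer: only toki pona's 14 letters, optionally
    with one leading capital (as in a proper name). The real scoring is not shown."""
    return re.fullmatch(r"[A-Z]?[aeijklmnopstuw]+", word) is not None


def count_words(words):
    counts = Counter()
    for word in words:
        word = collapse_duplicates(word)      # 1. collapse duplicates
        if not looks_like_toki_pona(word):    # 2. score, caps still intact
            continue
        counts[word.lower()] += 1             # 3. lowercase, 4. frequency count
    return counts
```

Scoring before lowercasing means a capitalized name can still be judged as a name, while the stored counts only need one lowercase key per word.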
icy turret
quaint wagon
#

these are really all the same problem

#

this also assumes the english words are, themselves, rendered appropriately

icy turret
quaint wagon
#

i'm not honestly sure it would even do that

#

not without a pretty sizeable manual processing step, anyway

tall rune
#

apparently there was a little stella bump in june 2021

icy turret
#

my hypothesis is that 1 is essentially not affected, and 2 and 3 are fewer than the number of english words that are correctly preserved by this

#

but like yeah

#

it's so far removed from the point of the tool that it's just not worth your time

merry walrus
meager steeple
#

"mi li" and "sina li" can still be found in sentences like "soweli mi li pona"

icy turret
# merry walrus

unfortunately this can't be used to draw conclusions about the frequency of ungrammatical li because what ilo tani said yeah

meager steeple
#

as far as i know, Muni would need to support something like this: "^ mi li", so that it only searches for text at the start of a sentence

icy turret
#

@quaint wagon check this out

#

two bits of insight from this

#
  1. 90% of toki pona is those words (i think that's how you're meant to read it???)

quaint wagon
#

nope the scale is wrong

icy turret
#

fuc

quaint wagon
#

the shape is right but the numbers are wrong

#

my bad

icy turret
#

right, its the same scale problem

quaint wagon
#

check again in relative; you'll have exactly one line anyway

icy turret
#

anyway insight two is not affected by that issue

quaint wagon
#

20% of toki pona is those words, btw
roughly

icy turret
#
  2. assuming early tokiponists weren't using really fancy toki pona, which they weren't, the fact that these words add up to less probably means that early toki pona data has more non-toki pona noise in it

quaint wagon
#

oh yeah no doubt

#

to help a bit, move up to sentlen of 2

#

or even 3

icy turret
#

minor gui nuisance: you might not want to allow users to do this

quaint wagon
#

the math will be done against the total number of words, not sents of len 3+, but more words = more opportunities to score correctly
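A minimal sketch of that arithmetic, assuming sentences arrive as lists of words and the term is a single word; `relative_frequency` is a hypothetical name, not ilo muni's function.

```python
def relative_frequency(term, sentences, min_sent_len=1):
    """Occurrences of `term` in sentences with at least `min_sent_len` words,
    divided by the total word count of *every* sentence, whatever its length.

    Raising min_sent_len can only shrink the numerator; the denominator is fixed.
    """
    total_words = sum(len(words) for words in sentences)
    hits = sum(words.count(term) for words in sentences if len(words) >= min_sent_len)
    return hits / total_words if total_words else 0.0


sentences = [["toki"], ["toki", "pona"], ["mi", "toki", "e", "ni"]]
print(relative_frequency("toki", sentences))                   # 3 hits / 7 words
print(relative_frequency("toki", sentences, min_sent_len=3))   # 1 hit / 7 words
```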

icy turret
#

ye

#

@quaint wagon the vertical grid lines in the graph background: are they hardcoded?

quaint wagon
#

nah they're a default of chart js that I did not investigate

#

i could do a ton more presentation wise

icy turret
#

noted

#

@quaint wagon what's the correct way to compare the frequency of "a" vs "a a" vs "a a a"?

#

i tried "a - a a", "a a - a a a", "a a a - a a a a" but from the vibe of the chart i feel like i've not considered something
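A sketch of what's easy to miss, assuming the n-gram table counts overlapping windows inside each sentence: a run of three "a" adds three to "a", two to "a a" and one to "a a a". Under that assumption, with c(k) the count of a k-word run, c(k) - c(k+1) counts runs of *at least* k, and runs of *exactly* k need one more difference: c(k) - 2·c(k+1) + c(k+2). Since ilo muni drops rare n-grams, counts for long runs may be missing, which would make this slightly off.

```python
from collections import Counter


def ngram_counts(sentences, max_n):
    """Overlapping n-gram counts within each sentence, up to max_n words."""
    counts = Counter()
    for words in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def runs_exactly(counts, word, k):
    """Number of runs of exactly k consecutive `word`s, from n-gram counts alone.

    A run of length L contains L - k + 1 k-grams, so c(k) - c(k + 1) counts
    runs of length >= k; differencing once more isolates length == k.
    Needs counts up to k + 2 words.
    """
    def c(n):
        return counts[" ".join([word] * n)]

    return c(k) - 2 * c(k + 1) + c(k + 2)


# Runs of "a" here: 1, 2, 3, then 2 and 1 in the last sentence.
sentences = [["a"], ["a", "a"], ["a", "a", "a"], ["mi", "a", "a", "li", "a"]]
counts = ngram_counts(sentences, max_n=5)
print(counts["a"] - counts["a a"])  # 5: every run, not just single "a"
print([runs_exactly(counts, "a", k) for k in (1, 2, 3)])  # [2, 2, 1]
```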

#

misali has been declining in mentions

fresh sentinel
#

oh, the

icy turret
#

apeja li mi peaks during every sptp because we're a crowd and we shout it

#

pretty expected. had to use log scale to see toki Inli taso

drifting breach
#

the unpa season

merry walrus
#

I am going to listen and see what you are trying to say

#

Oh I've listened to þis before

merry walrus
#

I listened to þe whole song really trying to interpret þe lyrics in a sexual way and it feels so much like a stretch þat I don't þink þis is what you're talking about, or maybe I am just really misunderstanding what she is saying

#

Oh þere are lyrics

#

I'll come back

icy turret
quaint wagon
#

this is a mistake google makes too, if their docs are to be trusted

icy turret
#

good to know

merry walrus
#

I really misunderstood her

#

Like misheard her words a lot

#

I loved þis song because it sounded like

#

two lovers who are apart woefully missing each oþer

#

But now þat I'm reading þe lyrics

#

It's just like

#

not þat

#

ugh

#

Anyways I want to find out what Majeka was being referred to here

#

Oh I found it

#

No?

#

Þere are instances of majeka as a magic nimisin in 2021

#

Like 3 times which ig aren't here

#

or maybe I'm dumb because I don't know how þis þing works

drifting breach
#

do these spikes really exist?

merry walrus
#

mi ni e mama sina li is powe despair

novel lintel
short narwhal
#

ah, this server is real

novel lintel
short narwhal
#

i turned the akesi graph into sound
(there are slight rounding errors due to how i made the wave in audacity: 1, normalizing to be within +/-1; 2, rounding to 1 decimal point; 3, each month was 1 sample point of a wave, so i slowed it down in audacity to auto smooth it)

brittle mulch
#

all spiked oct 2021 hmm

edgy cradle
merry walrus
#

It seems þat (ik pi li is an old grammatical þing) people's grammar's getting better

uneven prairie
fresh sentinel
#

i'm also curious about

  • pi * e
  • li pi
icy turret
#

but for li pi we can at least say it was probably more used pre pandemic

#

and pre pandemic you legitimately had a lot of people learn from jan Pije so

#

then again jan Pije's site already purged weird usages after pu came out

fresh sentinel
#

pandemic probably meant that lots of people were learning for the first time with good resources

icy turret
#

not necessarily

#

jan lentan's came out in like what 2021? i forget

#

someone should check tbh

fresh sentinel
#

good = avoiding li pi

#

what about pi * e? like "soweli pi moku e kala"

#

(i can't run Muni because it doesn't work in my browser at this time, i've filed a GitHub issue)

icy turret
#

and there are few trigrams that fit pi * e

#

most are too infrequent to be included

#

so the result is unrepresentative

#

and you can't sum them up, it shows as different lines

fresh sentinel
#

ahhh

icy turret
#

so for example

#

if there were 2 results for pi walo e

#

in all texts

#

ilo Muni doesn't store that trigram because not enough data

#

and won't display it here at all

#

it also only displays the top 10 ish(?) when doing a wildcard search
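A rough sketch of the behaviour described above, assuming n-grams below some cutoff are dropped when the database is built and a wildcard returns only the most frequent matches. The cutoff of 40 and `top_k=10` are illustrative guesses; the source only says "not enough data" and "top 10 ish".

```python
import re
from collections import Counter


def wildcard_search(pattern, ngram_counts, min_count=40, top_k=10):
    """Match a pattern like "pi * e" against stored n-gram counts.

    `*` stands in for exactly one word. N-grams under `min_count` are treated
    as never stored, so they cannot appear; of the survivors, only the
    `top_k` most frequent are returned.
    """
    parts = [r"\S+" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    regex = re.compile(" ".join(parts))
    stored = {ng: n for ng, n in ngram_counts.items() if n >= min_count}
    matches = Counter({ng: n for ng, n in stored.items() if regex.fullmatch(ng)})
    return matches.most_common(top_k)


counts = {"pi walo e": 2, "pi moku e": 55, "pi pona e": 120, "pi jan li": 300}
print(wildcard_search("pi * e", counts))  # "pi walo e" never shows up
```

Summing over a wildcard is then only as good as what survived the cutoff, which is why the total comes out unrepresentative.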

icy turret
#

@quaint wagon smoothing suggestion: try a kernel with soft edges

#

google ngrams doesn't do that but it feels like it'd be better at literal smoothing of the graph

#

so instead of peaks becoming plateaus, they would become more spread out peaks
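To make the suggestion concrete, here is a sketch comparing a flat (box) window with a triangular one, a simple kernel with soft edges. Picking a triangle is an assumption; a Gaussian would behave similarly. Truncating and renormalizing the kernel at the edges is also an assumption.

```python
def convolve_centered(values, weights):
    """Centered weighted moving average. `weights` has odd length; near the
    edges the kernel is truncated and renormalized over the points that exist."""
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        total = norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                total += w * values[j]
                norm += w
        out.append(total / norm)
    return out


def box_kernel(radius):
    """Every neighbour counts equally: a peak becomes a plateau."""
    return [1.0] * (2 * radius + 1)


def triangle_kernel(radius):
    """Weights 1, 2, ..., radius + 1, ..., 2, 1: influence fades with distance."""
    return [float(radius + 1 - abs(k)) for k in range(-radius, radius + 1)]


spike = [0, 0, 0, 0, 9, 0, 0, 0, 0]
print(convolve_centered(spike, box_kernel(2)))       # flat run of 1.8
print(convolve_centered(spike, triangle_kernel(2)))  # 1, 2, 3, 2, 1
```

With a single spike, the box window flattens it into a plateau while the triangle keeps a smaller, wider peak in the same place.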

#

-# ofc this comes with the disclaimer that this feature request is for whenever you feel like working on muni again

quaint wagon
pseudo smelt
mint knoll
#

pu ku su

quaint wagon
#

I wonder if I could deliver the given graph as the open graph image for the site, if the URL params are filled in

#

That's... Probably irresponsible with my database? I actually don't know what it looks like networking wise when you fetch the metadata of a link
Is discord doing that for you and then sending you the result? Or is it each individual who sees the link?

tall rune
#

it does it for everyone individually

#

there's this one website whose embed is a simple math problem, except it's randomly generated every time, so when you post it, everyone sees something different

tall rune
short narwhal
#

pi * e = pie 🥧 👍

fresh sentinel
quaint wagon
# tall rune theres this one website whose embed is a simple math problem, except its randoml...

Ooo that's super neat
And actually brings up a different question:
I probably need to have an actual server to do this trick, huh?
All my JS is client side lmao, so that idea is not happening
Although it does occur to me that an alternate way for me to deliver the app would be to have the entire thing be hosted on like, Vercel? And keep the entire DB in memory, using just sql.js on the server side
Something to investigate for later

tall rune
#

this is the thing

quaint wagon
#

LMAO

mint knoll
tall rune
#

nah it's 19 trust

quaint wagon
# fresh sentinel what about `pi * e`? like "soweli pi moku e kala"

actually, this is a search i could allow
the reason I limit it to standing in for a single word is because if you search for the top 10 matches of a given form like this, you'll only get back 10 phrases of the minimum matchable length for the search; shorter phrases are also more numerous.

#

granted there are some places around funky grammatical features that could be exceptions, but there won't be many of them

fresh sentinel
#

ooh

quaint wagon
#

what i could do is performance testing on allowing multiple wildcards; there are few enough terms available that it could work

whole epoch
#

it's interesting how, despite interjection lon becoming more unpopular, interjection ni has stayed at a steady level

#

also cool to see how "mi mute" for "we" is on a steady decline
back when i learnt tp originally way back when, it was almost universal to use it (as i remember it anyways)

icy turret
#

it's gotten backlash

whole epoch
#

yeye

#

i just expected ni to have gotten more popular

quaint wagon
#

@icy turret i think it was you who posted it but i can't find it; you posted the comparison between nimisin and a few of the other "standard low use but around" type words like linluwi and majuna? and i have thoughts about that as well, actually
well, one thought really:
a user of the word nimisin is much more likely to talk about lower use words; conversely, those who don't use the word nimisin talk about newer words less, resulting in the phrase nimi sin being about as used as the word nimisin.
in fact, this is something i could probably demonstrate in my primary db? by counting the number of distinct authors who have sentences containing nimisin, versus the number of distinct authors who have sentences containing the phrase nimi sin
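A sketch of that comparison, assuming sentences are available as `(author_id, text)` pairs; the function and sample data are hypothetical and not taken from the primary db.

```python
def distinct_authors(term, sentences):
    """Count distinct authors with at least one sentence containing `term`
    as a whole-word phrase. `sentences` is an iterable of (author_id, text)."""
    needle = term.lower().split()
    n = len(needle)
    authors = set()
    for author_id, text in sentences:
        words = text.lower().split()
        if any(words[i:i + n] == needle for i in range(len(words) - n + 1)):
            authors.add(author_id)
    return len(authors)


sentences = [
    ("a", "mi wile e nimisin ni"),
    ("a", "nimisin li pona"),
    ("a", "nimisin sin"),
    ("b", "nimi sin li ike"),
    ("c", "mi sona ala e nimi sin"),
    ("d", "toki"),
]
print(distinct_authors("nimisin", sentences))   # 3 uses, but 1 author
print(distinct_authors("nimi sin", sentences))  # 2 uses, 2 authors
```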

fresh sentinel
#

i am generally curious about like
if Muni describes a word as popular
is it because many people are using the word, or because spiders Georg is using the word a lot

quaint wagon
#

i know for a fact this is happening to lipamanka and nano, because their names are primarily said by themselves
but determining that currently requires checking manually

#

you can be extremely confident that words in the top 150 are actually in use by a variety of speakers, but the range of confidence drops significantly after that since you go from several thousand uses at rank 150 to several hundred uses at rank 200; that's an amount which you could swing by in a single day of being silly

fresh sentinel
#

idk the privacy implications of this but
if Muni could say whether a word is being used by under 10, 100, 1000, 10000 people
and then lines could have different thickness or opacity depending on the category
that would maybe make it easy to visualize if a spike is a silly spike
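A sketch of the categorisation, assuming an author count is already available per term; the opacity values are placeholders, since how to draw the categories is still an open question.

```python
from bisect import bisect_right

# Category edges from the proposal: under 10, 100, 1000, 10000 people.
AUTHOR_THRESHOLDS = (10, 100, 1000, 10000)
# Hypothetical styling: one opacity per category, faintest for fewest people.
OPACITIES = (0.2, 0.4, 0.6, 0.8, 1.0)


def author_bucket(num_authors: int) -> int:
    """0 for under 10 authors, 1 for under 100, ..., 4 for 10000 or more."""
    return bisect_right(AUTHOR_THRESHOLDS, num_authors)


def line_opacity(num_authors: int) -> float:
    return OPACITIES[author_bucket(num_authors)]
```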

#

this wouldn't catch mu mu mu

quaint wagon
#

ooooo wait that actually is interesting and i don't think it would be hard to check

#

uagh it would be a lot of queries tho lmao

#

well, not on ilo muni's side
db generation side

#

i'm unsure how to represent it on the graph tho; line thickness gets difficult to judge if there are more than 3 distinct thicknesses

fresh sentinel
#

3 might be enough? for 10, 100, 1000

quaint wagon
#

also, for reference, distinct author count is imprecise bc of pluralkit and more generally bc i can't combine authors across platforms

fresh sentinel
#

idk how often a word will be used by more than 1000 people

short narwhal
fresh sentinel
#

in one month

quaint wagon
#

oh i see, you want that info on a monthly basis

#

that could be harder

#

that makes sense tho

fresh sentinel
#

not necessarily! this is just the first idea that came to me

#

and it wouldn't change the UI too much, right?

tall rune
#

we found a spiders moment yesterday

icy turret
#

@quaint wagon you could create some kind of measure for how concentrated a word's use is in one person vs many people

tall rune
icy turret
#

this sounds vaguely similar to the gini coefficient
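For reference, the Gini coefficient of per-author usage counts for one word can be computed with the standard formula over sorted values; applying it to ilo muni's data is hypothetical.

```python
def gini(counts):
    """Gini coefficient of per-author usage counts for one word.

    0 means every author used the word equally often; values near 1 mean
    one author accounts for almost all uses. Uses the sorted-values form
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

One caveat: computed only over the people who used the word, a word said by exactly one person scores 0, which is the opposite of what you'd want. The pool should include every author active in that period, zeros and all, e.g. `gini([0, 0, 0, 12])` gives 0.75 where `gini([12])` gives 0.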

tall rune
quaint wagon
#

my god

#

i have no way to fix something like this

#

like, not reasonably

tall rune
#

o tonsi tawa ale owe

quaint wagon
#

idk at least it's clear it's a spike of silliness

icy turret
#

o tonsi tawa ale

whole epoch
short narwhal
#

you're hunting for spider Georg

fresh sentinel
#

this would show up as an under-10-people line in my proposal, which may help

quaint wagon
#

right

tall rune
icy turret
#

you know, i can ruin your data by posting procedurally generated text at least once a month in large quantities

quaint wagon
#

i would omit you, personally, from the data in that case

icy turret
#

the correct solution is ofc to ban me from ilo Muni

#

ye

quaint wagon
#

actually i think i neglected to mention this entirely but i do omit #jaki from the data

icy turret
#

yeye

fresh sentinel
#

reasonable

quaint wagon
#

which i think is completely understandable ye

#

uagh i've been fiddling with postgres all morning at work and now i'm looking at the sqlite db

#

explodes due to slightly different keywords

fresh sentinel
#

starting a public server and calling the general chat #jaki /utala

tall rune
#

other stuff i did yesterday was making these sorts of graphs to compare the relative popularity of two words, pretty fun to look at

quaint wagon
icy turret
quaint wagon
#

that channel, in text, is 4gb of the 56gb of the entire raw discord dataset

fresh sentinel
#

holy shit lmao

tall rune
quaint wagon
#

i'll grant that text is a lot of spare discord fluff i.e. authors and their roles and metadata on a per message basis
but.

tall rune
#

i also learned that nobody said mije in november 2016

quaint wagon
#

why do you have a-a on the graph?

tall rune
#

on the previous ones it was to highlight the 0 line

tall rune
icy turret
tall rune
#

makes sense

quaint wagon
#

yeah, until i think march 2017? the data is all telegram and reddit

fresh sentinel
#

a time when men didn't exist

quaint wagon
#

mije is reliably less used than meli which is very amusing to me

short narwhal
#

the era of no men

tall rune
icy turret
tall rune
#

there was a time in early 2023 tho where mije was more popular

icy turret
#

i just saw someone say minisin and this is making gears turn in my head

short narwhal
#

misinin

quaint wagon
#

gears grinding violently, making a terrible screeching noise, but just barely turning

icy turret
#

what would a minisin look like

short narwhal
#

me when i do this again

#

mi ni sin

quaint wagon
#

ok i think i can do author counts on a per ngram basis

icy turret
#

it might be better

quaint wagon
#

would i not need author count in order to get that info

#

the sqlite db side of it would be easier than i thought initially, since i can just add another column, "num_authors", to the frequency table; the number of authors of a given word or phrase always fits in the same dimensions as a frequency entry, i.e. some phrase, some min sent len, and some date (representing a range)

#

in order to count authorship, i need a way to know what distinct authors have said a given term, for which i can use their edgedb-generated uuid (note: authors are joined by their combination of platform and platform id+name, and may only be entered when i enter a new message) and probably stuff that into a per-entry set for the lifetime of the ngram counter, which is the time it takes me to count a month of data
when i query sentences, i just tack on the additional info of sentence.message.author.id and use that to update the authorship sets corresponding to the phrase i'm updating
and at the end of a month, write back the length of each set side by side with the frequency data
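A minimal sketch of that plan, with a simplified, hypothetical schema (`frequency(phrase, min_sent_len, day, occurrences, num_authors)`) and plain Python sets standing in for the edgedb query results; only the overall shape follows the description above.

```python
import sqlite3
from collections import Counter, defaultdict

# Hypothetical, simplified schema: one row per (phrase, min_sent_len, day),
# with the author count stored next to the frequency it belongs to.
SCHEMA = """
CREATE TABLE IF NOT EXISTS frequency (
    phrase TEXT,
    min_sent_len INTEGER,
    day INTEGER,
    occurrences INTEGER,
    num_authors INTEGER,
    PRIMARY KEY (phrase, min_sent_len, day)
)
"""


def count_month(rows, min_sent_len=1, max_n=3):
    """Count n-grams for one month of sentences.

    rows: (author_id, words) pairs, one per sentence.
    Returns {phrase: (occurrences, num_authors)}. The author sets exist
    only for the duration of this call, i.e. one month of data.
    """
    freq = Counter()
    authors = defaultdict(set)
    for author_id, words in rows:
        if len(words) < min_sent_len:
            continue
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                freq[phrase] += 1
                authors[phrase].add(author_id)
    return {p: (freq[p], len(authors[p])) for p in freq}


def write_month(conn, day, min_sent_len, counts):
    """Write each author set's size side by side with its frequency."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO frequency VALUES (?, ?, ?, ?, ?)",
        [(p, min_sent_len, day, occ, n_auth) for p, (occ, n_auth) in counts.items()],
    )
    conn.commit()


rows = [("u1", ["toki", "pona"]), ("u1", ["toki"]), ("u2", ["toki", "a"])]
counts = count_month(rows)
print(counts["toki"])  # 3 occurrences from 2 distinct authors
```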

icy turret
#

i think it might just be num authors of all ngrams per month? if you count authors for every ngram individually you lose the 0 case

#

not sure tho, maybe that's stupid

quaint wagon
#

i am intent on getting num authors of all ngrams per month and num authors for each ngram

#

the frequency table and ranks table are nearly identical for a reason

#

that reason is, while you can count the ranks info from the frequency table, it takes over 20x the reads

#

i'm essentially packing a more specific type of query into that table
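A sketch of precomputing ranks once with a SQL window function, assuming a simplified `frequency` table like the one described; table and column names are hypothetical. SQLite supports `RANK() OVER (...)` from version 3.25.

```python
import sqlite3

# Hypothetical names; assumes frequency(phrase, min_sent_len, day, occurrences).
RANKS_SQL = """
INSERT INTO ranks (phrase, min_sent_len, day, occurrences, rank_in_month)
SELECT phrase, min_sent_len, day, occurrences,
       RANK() OVER (PARTITION BY min_sent_len, day ORDER BY occurrences DESC)
FROM frequency
"""


def build_ranks(conn):
    """Rank every phrase within its month once, at build time, so a lookup
    reads one row instead of re-ranking all of that month's frequency rows."""
    conn.execute("DROP TABLE IF EXISTS ranks")
    conn.execute(
        "CREATE TABLE ranks (phrase TEXT, min_sent_len INTEGER, day INTEGER, "
        "occurrences INTEGER, rank_in_month INTEGER)"
    )
    conn.execute(RANKS_SQL)
    conn.commit()


def rank_of(conn, phrase, day, min_sent_len=1):
    row = conn.execute(
        "SELECT rank_in_month FROM ranks WHERE phrase = ? AND day = ? AND min_sent_len = ?",
        (phrase, day, min_sent_len),
    ).fetchone()
    return row[0] if row else None
```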

quaint wagon
#

question:
if i implement a selector for smoothing method, would that be overwhelming
there's already. a lot of options lmao

#

@icy turret @fresh sentinel @whole epoch @tall rune

fresh sentinel
#

i haven't actually used the tool yet ehehe

quaint wagon
#

ah fair

#

did you see my suggested search btw
it really shouldn't take more than 3s to resolve a simple text search even with the default settings
and since those other tools all work, it must be something I'm doing that's breaking on your browser
idk what tho, and you can't pull up dev tools on mobile, aaaaaaa

icy turret
quaint wagon
#

uh, my naming method would have been to use the name of the smoothing method as i discover it on Wikipedia pages about stats w.r.t. timeseries data

whole epoch
#

and if it's too much you can just not use it (assuming the default is good)

quaint wagon
#

@wanton shoal (hi) suggested exponential smoothing and it's pretty good bc it mostly preserves changes in direction while moving the peaks and troughs nearer to one another; it also passes the tonsi and misikeke tests, correctly showing 0 for any point prior to the initial non-zero point of those graphs
i have also found gaussian and median, which are respectively extra curvy and extra blocky; gaussian fails the tonsi/misikeke tests but is good for trend analysis; median does decently at those tests, passing them up to 30 smoothing

#

actually median technically fails the tonsi test bc of the 3 occurrences in July 2019, 0 in Aug/Sep, and 21 in Oct; it omits the July data for a while
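A sketch of the three behaviours and of the "tonsi test" as described (no smoothed value may be non-zero before the first real use). Window sizes and `alpha` are arbitrary, and the centered mean stands in for the averaging family that Gaussian smoothing belongs to.

```python
from statistics import median


def exponential(values, alpha):
    """One-sided exponential smoothing: each point only sees itself and the past."""
    out, level = [], None
    for x in values:
        level = x if level is None else alpha * x + (1 - alpha) * level
        out.append(level)
    return out


def centered_median(values, radius):
    return [median(values[max(0, i - radius): i + radius + 1]) for i in range(len(values))]


def centered_mean(values, radius):
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def passes_onset_test(raw, smoothed):
    """The "tonsi test": smoothing must not invent usage before the first real use."""
    first = next((i for i, x in enumerate(raw) if x), len(raw))
    return all(y == 0 for y in smoothed[:first])


# Shaped like the tonsi example: a small early month, a gap, then steady use.
raw = [0, 0, 0, 3, 0, 0, 21, 25, 30, 28]
```

On data shaped like that, the exponential and median results stay at zero before the first use while the centered mean leaks backwards, and the median erases the isolated early month entirely.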

wanton shoal
#

might be good to add a dropdown where u can choose from multiple methods? and maybe do a little explainer on what they do and don’t do, for the uninitiated?

quaint wagon
whole epoch
#

in the help page there are explanations

wanton shoal
#

oh am on phone and didn’t scroll up that far soz xd

quaint wagon
#

np!

quaint wagon
wanton shoal
#

i played around yesterday with only including data up to time T, but not beyond it, so data wouldn’t get extended into the past

#

however it didn’t seem to work all that well tbh
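A sketch of the trailing-window idea, assuming "up to time T" means each smoothed point uses only the current and earlier months. Nothing leaks into the past, but every feature shifts later, which may be part of why the output only loosely resembles the input.

```python
def trailing_mean(values, window):
    """Average of the current point and up to `window - 1` points before it.

    Nothing after time T influences the value at T, so nothing leaks
    backwards, but features are shifted later by about (window - 1) / 2 steps.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


print(trailing_mean([0, 0, 9, 0, 0, 0], 3))  # the spike at index 2 spreads to 2..4
```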

quaint wagon
#

exponential?

#

or median?

novel lintel
wanton shoal
#

the thing is, it works to some extent, but imo the data only barely resembles the input haha
here's the input:

#

and 5 smoothing applied

#

i can see the resemblance, but in the original the peaks aren't nearly the same height (i suppose the second peak is flattened a lot)

#

(also, "the tonsi test" is a very funny phrase to me for some reason)

wanton shoal
#

median smoothing also sucks

#

destroys the data, yet suffers from the same issues as simple avg

#

i don't know. my feeling is this: there are two possibilities.

  1. this good tool stays as it is. it isn't perfect, but it's the same as the Ngram tool
  2. a one-sided exponential tool comes into use

  3. some data analyst helps out :D

#

lots of research into things that deal well with cyclic data, passing only certain bands for signal processing, whatever
but i feel like just stat vis isn't really a focus of any research. very thin info online

#

but i'm just talking. jan Kekan San, you decide what to do :D

quaint wagon
wanton shoal
#

true

quaint wagon
#

most likely, its approach is the best one for looking at the past

#

but i'm still looking for answers and want to talk to a statistician, haha

wanton shoal
#

true. my feeling is: tool 1 is good at thing #1. tool 2 isn't good at that, but it's good at thing #2.

quaint wagon
#

aaaaa maybe

wanton shoal
#

seemingly, a tool that can do everything doesn't exist :D

#

like everything else hahaha

quaint wagon
#

hahaha, true

merry walrus
#

Out here doing þe Lord's work 🙏

#

Ignore þe Belarusians

#

hijksafkjajg

quaint wagon
#

.... belarusians?

merry walrus
#

Þe only oþer example of pokasi I've seen is it being used for Belarus

pseudo smelt
#

… pokasi?

quaint wagon
#

anyway this increases the necessity of unique author tracking ehehhehe

merry walrus
#

Yes

#

How much have I contributed to þe Pokasi population

#

Probably most because I made it up

#

Þe greatest þing I ever did was make a nimisin in my first SP sentence

#

I didn't even know what a nimisin was

wanton shoal
#

power move

pseudo smelt
#

wawa

wanton shoal
quaint wagon
#

i am halfway through writing this as we speak LMAO

wanton shoal
#

also re: issue #13 regex having look-behind and look-ahead is cursed anyways, makes it a CFG

#

lmao

quaint wagon
#

fair enough tho, this is easy to slot into my work!

wanton shoal
#

i was faster >:)

quaint wagon
#

ehehe

wanton shoal
#

would have been a good 20 mins faster still if i didn't accidentally work on the unforked branch lmao

quaint wagon
#

lmao

#

i should probably do my work on branches instead of directly on main now huh

wanton shoal
#

👏

#

preach

#

even when working alone, i've noticed that as soon as a project grows in size and i wanna do some quick fixes, PRs are still a godsend

#

cuz manually selecting what to include in the diff is annoying, when you've refactored half a file :D

quaint wagon
#

true, ehe

#

lazygit makes it smooth when necessary tho

#

i very rarely have to dip into my git plugin's diff view to fix things

wanton shoal
#

i guess it's also a result of me refusing to use the command line for git anymore, since i only work in IDEs nowadays, and it's just so much easier for me xd

#

so i'm kinda accustomed to select all, commit, push

#

but that does hide the fact i forgor to checkout another branch haha

quaint wagon
#

ehehe, np

#

i've merged your branch to a separate branch merge-smoother which i wish i just called smoother; gonna add a smoother url param and some logic to disable the button as necessary!

#

aside: oh god i have no idea if my js is any good, i am not good at computer

#

ah jk i just missed that you did so in reading the github diff!

wanton shoal
#

...but yes, some refactoring might be good haha

quaint wagon
#

yeahhhhh

#

ehehhehe

#

see the thing is
sona toki is the best code i've ever written
aaaand this whole project was downhill from there :D

#

i do know at a minimum that the sqlite file is a mess bc i've mixed the responsibilities of querying and mutating the data there
and that the input file sucks bc i'm doing a bunch of crappy splitting instead of like, actually parsing user input

meager steeple
#

btw, are you analyzing each message or sentences within a message?

#

if i put a full stop or semicolon would that turn the message into two sentences

quaint wagon
#

my sentence tokenizer isn't perfect
quotes count for it too
altho i am considering removing them honestly

meager steeple
#

sona

quaint wagon
#

i wanna sit down and do some analysis to that end
on one hand,

  • quotes, double and single, are definitely used to distinguish sentences

on the other hand,

  • in toki pona, quoted sentences are much more often already marked as sentences by the other punctuation

ofc, both of these are anecdotal statements; i wanna know which is more true and when

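A sketch of the trade-off, assuming a regex splitter whose boundary characters (including `:` and `;`) are guesses, not the actual tokenizer. Single quotes are left out because they collide with apostrophes.

```python
import re

BASE_BOUNDARY = r"[.!?;:]+"
QUOTE_BOUNDARY = r"[.!?;:]+|[\"“”]"


def split_sentences(text, quotes_split=True):
    """Split on sentence punctuation, optionally treating double quotes as boundaries."""
    pattern = QUOTE_BOUNDARY if quotes_split else BASE_BOUNDARY
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


# Quoted speech: quotes help, since they separate the quoted sentence cleanly.
print(split_sentences('jan li toki e ni: "mi pona." ona li tawa.'))
# Quoted word: quotes hurt, since one sentence gets chopped into three.
print(split_sentences('ona li toki e "toki" tawa mi.'))
print(split_sentences('ona li toki e "toki" tawa mi.', quotes_split=False))
```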
#

also, i wanna handle intra-word punctuation in the word tokenizer, for words like "isn't" and "don't"
not because i need that to tokenize toki pona, but because doing so would slightly increase my accuracy of detecting toki pona
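A sketch of a word tokenizer that keeps apostrophes inside words, assuming a Unicode-letter regex rather than the project's actual tokenizer. One concrete way it can help detection: a naive `\w+` split turns "o'clock" into "o" and "clock", and "o" is a toki pona word.

```python
import re

# Letters (any script), optionally joined by straight or curly apostrophes
# *inside* the word, so "isn't" stays one token instead of "isn" + "t".
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def tokenize_words(text):
    return WORD.findall(text)


print(tokenize_words("I don't know, isn't it"))
print(tokenize_words("at five o'clock"))
print(re.findall(r"\w+", "at five o'clock"))  # naive split leaks a stray "o"
```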

quaint wagon
#

@wanton shoal turns out that exponential is misleading for peak-y data! Fook

merry walrus
#

I don't get what þe smooþing does so I'll assume it's doing good here

quaint wagon
# merry walrus I don't get what þe smooþing does so I'll assume it's doing good here

the data is really noisy to begin with, so smoothing attempts to reduce that noise while still representing the data faithfully

but the problem is, under some circumstances the smoothing is inaccurate

  • if you have data with peaks at specific times, the peaks can be spread out in a misleading way (center averaging turns them into plateaus, and exponential smoothing turns them into slopes)
  • if you have data that is zero for a while then starts, smoothing can spread the data to before there was a real data point
#

ok having the realization that i can't really produce an appropriate smoothing algorithm without knowing what it means for there to be a "signal"

#

really, the smoothing necessary just depends on the input
and different terms will have different signals

#

median works extremely well on continuous data, but demolishes seasonal data instantly

#

extremely funny

#

no smoothing for comparison

#

exponential is reliably the least silly of all the smoothing methods

#

relative minmax is nice for comparing multiple test cases at once
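A sketch of min-max normalisation, assuming "relative minmax" rescales each term's relative-frequency series to the 0 to 1 range (the `scale=normrel` URL value hints at this, but the source doesn't define it). Series of very different sizes end up with directly comparable shapes.

```python
def minmax(series):
    """Rescale a series so its minimum maps to 0 and its maximum to 1.
    A constant series has no shape to show, so it maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


print(minmax([1, 2, 3]))        # a rare term
print(minmax([100, 200, 300]))  # a common term with the same shape
```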

merry walrus
#

damn misikeke barely edges it out

#

RIP tonsi

quaint wagon
#

lmao

tall rune
#

i'm skeptical of smoothing and usually just turn it off when i use the tool

icy turret
hidden tide
#

tenpo Mopiju li lon

#

as you can observe, troughs form during tenpo pi kama sona (times of study)

#

here's the usage trend of various video games that were played on ma pona, i ought to add gartic phone but i don't know what the common tokiponization of it is

mild glacier
#

hey

#

hows this?:

hidden tide
#

each instance of a larger string of letters adds to the count of the smaller strings here

mild glacier
#

ah okay, cool!

#

hows this?

hidden tide
#

snazzy

meager steeple
pseudo smelt
#

a!

#

muti

hidden tide
#

muti mute

#

wait no i'm blending languages

#

muti a

pseudo smelt
#

muti muta
(as far as i know, “muta” is an old word from toki sike)

meager steeple
#

muti tu

#

or something?

hidden tide
#

i mean i guess

#

but i think just "a" afterwards sounds more natural

#

anyways, here's jan Wano's graph that looks at each string individually without accumulation

#

take away the first line and we can see the trend of how many a's are used for laughter

meager steeple
quaint wagon
#

A AA A A A A

merry walrus
#

gyuhjkeafafd

#

Damn þe rest of tp þrough L just barely beat out þe particles

quaint wagon
#

LMAO YOU DID ALL OF THEM?
send link? @merry walrus

meager steeple
#

you alone made "ijo sama" big!

quaint wagon
#

I'M STRONG

meager steeple
#

wawa...

#

not all the words are in there haha

quaint wagon
#

i know this for a fact: the pure particles are only about 20% of toki pona

merry walrus
#

I did it from memory so I might be missing a couple words in L

quaint wagon
#

you haven't included "toki" or "pona" or "sona", or any of the pronouns for that matter which will ofc be a huge portion of all toki pona

#

i get that was the point to be clear :P

merry walrus
#

Oh I missed kule and laso damn

quaint wagon
#

those won't make much of a difference ehehe

merry walrus
#

Critical words

quaint wagon
#

i mean number wise

quaint wagon
#

i found the source of the wan/tu/mute/luka/ale spike in oct 2021

#

there is a channel in one server where people genuinely counted to the high thousands in the pu numbering system

#

well, i say people, but it seems to have been almost entirely one person
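Why one counting channel can move several words at once: in the pu system numbers are written additively with ale (100), mute (20), luka (5), tu (2) and wan (1), so numbers in the thousands take dozens of words each. A quick sketch:

```python
PU_NUMBERS = [("ale", 100), ("mute", 20), ("luka", 5), ("tu", 2), ("wan", 1)]


def to_pu(n):
    """Write a non-negative integer additively with pu's number words, largest first."""
    words = []
    for word, value in PU_NUMBERS:
        count, n = divmod(n, value)
        words.extend([word] * count)
    return " ".join(words)


print(to_pu(128))  # ale mute luka tu wan
# Counting from 1 to 3000 alone produces tens of thousands of "ale".
print(sum(to_pu(i).split().count("ale") for i in range(1, 3001)))
```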

meager steeple
merry walrus
#

based

quaint wagon
#

this brings up an entirely new question
like, do i exclude that

#

how much is that reflective of "using the language"?

merry walrus
#

Anoþer reason for a unique auþor feature

quaint wagon
#

you know, that's true tbh

#

i'll leave it be for now

placid crow
#

@quaint wagon how did you get the data for Muni? Ik it was through online tp communities, but how did you download the data itself?

merry walrus
#

Illegal meþods /idk

quaint wagon
low timber
#

toki pono vs pono

verbal marten
mild glacier
#

what are the little numbers that people add?

#

like "a a a_6"

whole epoch
#

that's the minimum length needed

mild glacier
#

ah okay

whole epoch
#

so like
toki_4 only shows toki from phrases with 4 or more words
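A sketch of how a term like "toki_4" might be read, assuming the number after the last underscore is the minimum sentence length; `parse_term` and `count_with_min_len` are hypothetical helpers, not ilo muni's parser.

```python
def parse_term(term, default_min_len=1):
    """Split a search term like "a a a_6" into ("a a a", 6)."""
    phrase, sep, suffix = term.strip().rpartition("_")
    if sep and suffix.isdigit():
        return phrase.strip(), int(suffix)
    return term.strip(), default_min_len


def count_with_min_len(phrase, min_len, sentences):
    """Occurrences of `phrase` in sentences (lists of words) with >= min_len words."""
    target = phrase.split()
    n = len(target)
    return sum(
        1
        for words in sentences
        if len(words) >= min_len
        for i in range(len(words) - n + 1)
        if words[i:i + n] == target
    )


sentences = [["toki"], ["toki", "pona"], ["mi", "toki", "e", "ni"]]
print(count_with_min_len(*parse_term("toki"), sentences))    # every "toki"
print(count_with_min_len(*parse_term("toki_4"), sentences))  # only the 4-word sentence
```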

mild glacier
#

ahhhh... okay!

#

i get it!

#

gants :3

whole epoch
#

it's very useful!

tall rune
#

i found another interesting spike

whole epoch
#

lmao??

tall rune
#

(this is a relative graph, the absolute graph is less extreme)

#

still there tho

mild glacier
#

hows this?

whole epoch
#

too little to say anything tbh, only a handful of uses

mild glacier
#

yeah

#

lol :3

#

(i have a weird urge to say sorry after every problem i kinda make)

tall rune
#

for some reason the tool isnt letting me search "unpa li ken *"

#

i wanna find out what the next word isssss

whole epoch
#

just unpa li ken has no results found so that's probably why

tall rune
#

thats incorrect

#

maybe youre in too specific a timeframe

#

"unpa li ken" is the big spike in march 2016

mild glacier
#

musi la:

whole epoch
#

nasa a

tall rune
#

the spike is some 6 word phrase that starts with unpa li ken

mild glacier
#

what is it?

tall rune
#

idk

#

its not letting me do the *

mild glacier
#

unpa li ken pakala e sina e sina kin

#

:D

tall rune
#

its not pakala

mild glacier
#

ah

tall rune
#

i cant search for it because its not from this server

whole epoch
#

since it's 2016 i assume it must've been like, a meme or running gag on telegram?

#

or maybe reddit

tall rune
#

i guess

quaint wagon
tall rune
#

ah

#

it just sticks out funkily

quaint wagon
#

granted it's odd it happened all at once, but probably makes sense to whatever discussion happened there

#

yeah, there was just a lot less conversation going on at the time

hidden tide
hidden tide
#

big spaghetti time:
here's my relative minmax graph attempting to convey the trends in focus over time of various meme phrases throughout ma pona's history. by "meme" phrases, i mean any word or phrase that is often repeated among speakers or has some sort of server addon like an emoji or a sticker. obviously incomplete. o lukin pona a!
https://gregdan3.github.io/ilo-muni/?query=a+-+a_2%2C+omekapo%2C+omekalike%2C+ale+li+pona%2C+mu%2C+kijetesantakalu%2C+ikea%2C+ilo+nanpa%2C+kekan+san%2C+jan+telakoman+li%2C+misali%2C+lonsi%2C+mi+tawa+tomo%2C+mi+tawa+e+tomo%2C+lon+-+lon+a+-+lon+ala%2C+lon+ala%2C+mi+sona+ala+a%2C+pingo%2C+kasi+ike+mute%2C+usawi%2C+nimi+sin%2C+nimisin%2C+tonsi%2C+su%2C+owe%2C+kamalawala%2C+wawa%2C+waken%2C+akesi+kule%2C+lon+a&minSentLen=1&scale=normrel&start=1470009600&end=1722470400&smoothing=0&smoother=cwin
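Links like this one can also be generated. A Python sketch that rebuilds the same parameter layout; the parameter names are copied from the link above, while their meanings (for instance that `start` and `end` are Unix timestamps, here 2016-08-01 and 2024-08-01 UTC) are inferred and could change with the site. `muni_url` is a hypothetical helper:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://gregdan3.github.io/ilo-muni/"


def muni_url(terms, start, end, scale="normrel", min_sent_len=1,
             smoothing=0, smoother="cwin"):
    """Build an ilo Muni link from a list of search terms.

    Parameter names are copied from a shared link; their meanings are
    inferred, not documented.
    """
    params = {
        "query": ", ".join(terms),        # terms are comma-separated
        "minSentLen": min_sent_len,
        "scale": scale,
        "start": int(start.timestamp()),  # Unix seconds
        "end": int(end.timestamp()),
        "smoothing": smoothing,
        "smoother": smoother,
    }
    # urlencode quotes spaces as "+" and commas as "%2C", like the shared link
    return BASE_URL + "?" + urlencode(params)


link = muni_url(
    ["a - a_2", "mu", "wawa"],
    start=datetime(2016, 8, 1, tzinfo=timezone.utc),
    end=datetime(2024, 8, 1, tzinfo=timezone.utc),
)
print(link)
```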

#

something interesting that shows up when you max out the smoothing: depending on which shape is shown, "wawa" is on a steady upward trend and has either already overtaken "ale li pona" or is about to

meager steeple
#

for standalone sentences it's been surpassed

hidden tide
#

fun

fresh sentinel
#

i feel like wawa interjection is maybe the closest pu equivalent to epiku interjection

#

that's how i use it anyway

fresh sentinel
#

oh good point

#

is it possible to search for a two-word interjection?

icy turret
#

pona a - pona a_3

river shard
#

uh
wawa_2 ?

meager steeple
#

wawa_2 is all messages that contain wawa with length 2 or more

meager steeple
#

so if i do wawa - wawa_2, it's the number of sentences containing wawa minus the number of sentences of length 2 or more containing wawa; aka how many times wawa was a standalone sentence/message
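The subtraction trick in a few lines of Python, on made-up tokenized sentences. `occurrences` is a hypothetical helper that mimics the `word_n` filter (count the word only in sentences with at least n words):

```python
def occurrences(sentences, word, min_len=1):
    """Times `word` occurs in sentences that have at least `min_len` words."""
    return sum(s.count(word) for s in sentences if len(s) >= min_len)


# Hypothetical tokenized sentences, one list of words per sentence.
sentences = [
    ["wawa"],
    ["wawa", "a"],
    ["sina", "wawa"],
    ["wawa"],
    ["mi", "pona"],
]

# wawa - wawa_2: all occurrences, minus those in sentences with 2+ words,
# leaves only the sentences that are exactly "wawa"
standalone = occurrences(sentences, "wawa") - occurrences(sentences, "wawa", min_len=2)
print(standalone)  # 2
```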

winter thicket
#

jsyk a link in þe help article is missing a slash

hidden tide
#

for whatever reason, August 2022 was the month of mu

#

absolute chart tells it better

quaint wagon
quaint wagon
#

well, fix is publishing in like 1 minute

tidal knoll
quaint wagon
tidal knoll
#

nasin pi te kepeken pi to en
nasin pi te kepeken lon to en
nasin pi te lon kepeken to li awen nasa

#

^ tan fucking seme la ilo pi toki pona taso li pipi e mi?!

quaint wagon
#

search for this: kepeken *, kepeken ala *
then you can see that this usage does exist, and it's bigger than many of the other individual usages. but it's only one item next to lots of other usages, and it's very small compared to all of them together.

quaint wagon
quaint wagon
#

lon! ona li suli nanpa tu, taso ona li open e poki. ona li suli nanpa tu lon poka pi nimi ale ni: open poki en ijo poki li lon.

#
#

so from this you can tell that:

  • the pattern with e is becoming less common
  • the pattern without e is becoming more common
  • the pattern without e is much more common than the pattern with e

tidal knoll
#

wawa

fresh sentinel
#

ilo penpo li pipi tu e sina

quaint wagon
#

seme a?

fresh sentinel
#

pipi utala tu li lon li sama

#

pipi kala wan li lon li wile

wicked stratus
#

what did that one person say about forbidden spaghetti

river shard
#

Hmmm something about HAVING HIGH STANDARDS?

#

but I try not to be upsetti

quaint wagon
#

i should Probably prevent people from doing this for exactly the reason you're seeing
but I also don't think it matters enough lmao
maybe it does to avoid bandwidth problems

hidden tide
#

so why did mun Kekan San choose to say walo loje rather than loje walo?

low timber
#

nimi suli li suli
nimi sewi li sewi lili
taso, nimi suwi li suwi e sewi ona
nimi seli li awen lili sama seli moli

#

pakala ni li seme a

#

ni kin li nasa

tall rune
#

theres a lot of these

#

both in #sitelen-pona and in the now archived #spt-jaki

#

@quaint wagon would it be worth excluding that channel?

#

i remember we talked about how identifying webhooks (which is what these are) didnt work out for you

#

but excluding spt-jaki from the data seems like a good idea

edgy cradle
tall rune
#

cumulativeeeee

quaint wagon
# tall rune @quaint wagon would it be worth excluding that channel?

hmm!
this is a case where i think the following two improvements would be better:

  1. scanning through all my webhook messages for pluralkit messages specifically, so i don't count other webhooks
  2. scoring sentences with consideration for the message they're a part of

i would like to avoid poking more channels out of the list, preferring a code based solution that more correctly separates good and bad sentences, so that i can keep the intended fairness of the system

in this case, a document level score would ideally mark that first sentence low because it's among a bunch of non-tp words
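A rough sketch of what document-level scoring could look like, in Python: each sentence's own score is blended with the score of the whole message it came from, so a toki pona sentence buried in English gets pulled down. The blend, the 0.5 weight, the tiny word list and any cutoff are all hypothetical; this is not how ilo Muni currently scores sentences:

```python
def fraction_recognized(words, is_toki_pona):
    """Share of words the checker accepts; 0.0 for an empty list."""
    if not words:
        return 0.0
    return sum(1 for w in words if is_toki_pona(w)) / len(words)


def document_aware_scores(message, is_toki_pona, weight=0.5):
    """Blend each sentence's own score with the score of its whole message.

    message: list of sentences, each a list of words.
    weight:  hypothetical; how much the surrounding message counts.
    """
    all_words = [w for sentence in message for w in sentence]
    doc_score = fraction_recognized(all_words, is_toki_pona)
    return [
        (1 - weight) * fraction_recognized(sentence, is_toki_pona) + weight * doc_score
        for sentence in message
    ]


WORDS = {"toki", "pona", "mi", "sina", "li", "wawa"}  # tiny stand-in word list
message = [["toki", "pona"], ["this", "is", "mostly", "english"], ["see", "above"]]
scores = document_aware_scores(message, lambda w: w in WORDS)
print(scores)
```

On its own, "toki pona" would score a perfect 1.0; next to six English words it drops to 0.625, so a cutoff of, say, 0.8 would leave it out.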

#

also, there are other weird spikes that would be fixed by document level scoring! like pipi Kewapi saying "ilo o ken e toki ni a" genuinely some thousand times which i think is absolutely hilarious

tall rune
#

if you can do that, great!

#

i just thought cutting out that channel would be an easy solution

#

it feels like a channel where not much would be lost anyway

quaint wagon
#

that's probably true, yes
but i need an excuse to do the webhook check and the document scorer anyway :P

tall rune
#

fair

royal orchid
mild glacier
#

sitelen pi ilo Muni /musi /literal

pseudo smelt
#

musi a
seme a la ona li ilo tawa

mild glacier
#

mi sona ala /shrug

#

¯\_(ツ)_/¯

icy turret
#

@quaint wagon sona pona wants the ilo muni logo cc-licensed btw

#

talk to em for details idk shit

mild glacier
#

couldnt they just ask for permission to use the picture?

#

sorry for asking

#

i dont know a lot of things

pseudo smelt
#

mi kama lukin e sitelen ilo… la

meager steeple
#

pakala...

mild glacier
#

among ku!!!

novel lintel
fresh sentinel
#

they want to upload all the images to Wikimedia

hidden tide
#

i believe in waso supremacy (not accounting for soweli which is giant in comparison)

meager steeple
#

aa seme la kijetesantakalu li suli sama akesi sama pipi

quaint wagon
#

the shape of the relative log graph is ultimately correct, but the numbers don't mean much

hidden tide
#

(wink wink)

plucky gazelle
#

pretty linjas

royal orchid
royal orchid
#

over the past year, anu seme has been, on average, 65% of all instances of anu

quaint wagon
quaint wagon
royal orchid
#

sona a

neat bramble
short narwhal
#

lmao i got 24/4+13=19

#

wait fym 32/4 + 15 = 19 thats 23

neat bramble
#

oh i'm silly

#

that's what they mean when they say 99% fail

#

hey look it's txnor the home of the unpa hack

edgy cradle
royal orchid
#

like wawa, sona is also getting more common:

royal orchid
#

wan la, ni

#

musi la, ni

pseudo smelt
#

seme a

quaint wagon
#

ilo li toki e ni:
"tenpo wan la... toki ni li lon."

drifting breach
#

what are + and -?

#

@quaint wagon

quaint wagon
# drifting breach what is "XXX_n" ("a_3", "sona_2", etc.)?

the number after the underscore means to only count the times that word occurred in sentences with that many words or more! e.g. toki_2 counts only times where toki was in a sentence with at least 2 words.
and the plus and minus are literal: they add or subtract two or more things. the math being done is: for every period of time that's shown, combine the counts of all the terms, adding or subtracting from left to right.
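A minimal Python sketch of that left-to-right math, assuming terms are separated by " + " or " - " with spaces around the operator and that the counts per period are already available; `parse_term`, `evaluate` and `evaluate_series` are hypothetical names, not ilo Muni's code:

```python
import re

# " + " or " - " between terms; the capture group keeps the operator in the split
OPERATOR = re.compile(r"\s+([+-])\s+")


def parse_term(term):
    """'toki_2' -> ('toki', 2); 'pona a' -> ('pona a', 1)."""
    term = term.strip()
    phrase, sep, suffix = term.rpartition("_")
    if sep and suffix.isdigit():
        return phrase, int(suffix)
    return term, 1


def evaluate(expr, count):
    """Combine term counts strictly from left to right."""
    parts = OPERATOR.split(expr.strip())
    total = count(*parse_term(parts[0]))
    for op, term in zip(parts[1::2], parts[2::2]):
        value = count(*parse_term(term))
        total = total + value if op == "+" else total - value
    return total


def evaluate_series(expr, table):
    """table: {period: {(phrase, min_len): occurrences}} -> {period: result}."""
    return {
        period: evaluate(expr, lambda phrase, n, c=counts: c.get((phrase, n), 0))
        for period, counts in table.items()
    }


table = {
    "2024-07": {("wawa", 1): 10, ("wawa", 2): 7},
    "2024-08": {("wawa", 1): 4, ("wawa", 2): 4},
}
print(evaluate_series("wawa - wawa_2", table))  # {'2024-07': 3, '2024-08': 0}
```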

merry walrus
#

Kokosila isn't ÞAT far ahead

#

Huh

#

Stupid þought

#

but I wonder how much effect I had on þat huge isipin spike in Oct and Dec

#

Oh it's a bot reposting some nimisin

#

And maybe Giggity Mantis yapping about using isipin

fresh sentinel
#

bots aren't counted in ilo Muni

pseudo drift
#

is there any info on how the data is handled here, by the way? I couldn't see any info about how or if it was anonymised

royal orchid
#

ni li tan seme a?!

#

kalama lon sptp???? toki jaki ala la, mi sona ala e tan ken ante

whole epoch
#

didnt notice this until now but you should probably add somewhere that wildcard ignores e and li

whole epoch
#

erm seme a

royal orchid
#

pakala nasa a

#

ona li pana e nimi pi mute nanpa 11-19??????? tan seme a??????? 😵‍💫

quaint wagon
whole epoch
#

nasa a,,

#

ok now it doesnt but it did before

quaint wagon
#

That's a lil cursed and given the query I'm doing is incredibly simple and the processing I'm doing after that is also simple, I think I can only blame the library I'm using to query the DB?

#

And it's a funny one so fair

whole epoch
quaint wagon
#

Uh, not really, no
The query as written should fetch the top 10 items, because it searches the ranks table and orders that by occurrences
The only way that could return something other than the actual top 10 would be if the query were convinced of having the top 10 items early/incorrectly
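For reference, the kind of query described here, as a Python + SQLite sketch. The `ranks` table layout (a `term` and an `occurrences` column) is a hypothetical stand-in; the real database presumably also filters by time period and sentence length, which this leaves out. The secondary `term ASC` sort is an addition so that equal counts come back in a stable order:

```python
import sqlite3

# Hypothetical schema: the real ilo Muni database layout isn't shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranks (term TEXT PRIMARY KEY, occurrences INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO ranks (term, occurrences) VALUES (?, ?)",
    [(f"word{i}", i) for i in range(1, 16)],
)

TOP_N_SQL = """
    SELECT term, occurrences
    FROM ranks
    ORDER BY occurrences DESC, term ASC  -- tie-break so equal counts keep a stable order
    LIMIT ?
"""


def top_terms(conn, n=10):
    """Return the n most frequent terms, most frequent first."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()


rows = top_terms(conn)
print(rows[0], rows[-1])  # ('word15', 15) ('word6', 6)
```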

low timber
#

musi Mapo la ilo li toki e ni

quaint wagon
#

ILO

quaint wagon
# royal orchid ni li tan seme a?!

I KNOW WHY THIS HAPPENED

there's a funny but unavoidable oversight in how i detect bots vs webhooks
i don't actually have specific info about whether a given author is a webhook
all i know is whether they're a bot, and what roles they have
webhooks never have roles, while bots do if they're in the server
this is all i have to distinguish them as far as i'm aware?
so if i encounter a bot user with no roles, i mark them as a webhook

anyway mappo is not in the server!
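The heuristic described above, as a Python sketch. `Author` and `classify` are hypothetical names; the only inputs are the two facts mentioned (the bot flag and the role list), which is exactly why a bot that isn't in the server, like Mappo, falls through to "webhook":

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str
    is_bot: bool
    roles: list[str] = field(default_factory=list)


def classify(author: Author) -> str:
    """Guess human / bot / webhook from only the bot flag and the role list."""
    if not author.is_bot:
        return "human"
    # Bots that are members of the server have at least one role...
    if author.roles:
        return "bot"
    # ...webhooks never do. But neither does a bot that isn't in the server,
    # so such a bot is wrongly treated as a webhook here.
    return "webhook"


print(classify(Author("Mappo", is_bot=True)))  # webhook
```

Checking webhook messages against PluralKit specifically would replace this guess for the common case, since PluralKit proxies are the webhooks that should be counted.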

#

if i just do the pluralkit message check thing i can fix this but i have not yet

royal orchid
#

a, sona!

quaint wagon
merry walrus
novel lintel
#

mute la nasin [UCSUR] en nanpa suli li lon · taso ni kin · seme a

#

suno wan la toki tu wan ni li lon · ma ni ala · ma pi kama sona ala · la ma seme

#

ni li musi wile mi

#

wile mi la mi ni ala https://xkcd.com/1138/

#

mi kepeken nanpa pi mute kepeken la linja ale li sama mute tan ante tenpo pi suli kulupu · taso mi alasa e linja pi sama mute mute · la ni ala anu seme

quaint wagon
quaint wagon
quaint wagon
#

mi sin e sona lon tenpo kama la nasin nanpa tu wan o lon anu seme...

fresh sentinel
#

sona mi la, ilo Pulaki li weka e sona mama pi toki majuna

#

mi sona ala e suli tenpo

#

taso mi alasa e sona pi toki majuna la ilo Pulaki li toki ala

quaint wagon
fresh sentinel
#

n, pakala mi li ken tan ijo ante

novel lintel
#

mi kama lukin e ijo pi tuki tiki e ijo musi mute ala · ni li musi lili

novel lintel
low timber
#

sona kiwen

quaint wagon
quaint wagon
novel lintel
#

sina ken jo ala a e sitelen

#

sina o sona e sitelen wile · taso poki pi kalama musi li lon

#

i did this:

```
yt-dlp https://www.youtube.com/playlist?list=PL3meDZ0v1E3e5hwSyfz9Os9ZUsMw4ecwz --get-comments --exec pre_process:"del %(id)s.comments.json" --print-to-file "%(comments)j" "%(id)s.comments.json" --exec "type %(id)s.comments.json" --skip-download
```

and it all worked
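Once the command above has written one `.comments.json` file per video (its `del`/`type` parts are Windows cmd builtins), something like this could turn those files into text for a corpus. A Python sketch, assuming each file holds the JSON list of comment objects that `%(comments)j` prints, with `timestamp` and `text` keys; files that aren't valid JSON (for example if a video had no comments field) are skipped. `parse_comments` and `load_folder` are hypothetical helpers:

```python
import json
from pathlib import Path


def parse_comments(raw: str):
    """Turn the contents of one comments file into (timestamp, text) pairs.

    Assumes a JSON list of comment objects with "timestamp" and "text" keys;
    anything that isn't a valid JSON list yields no pairs.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [(c.get("timestamp"), c.get("text", "")) for c in data if isinstance(c, dict)]


def load_folder(folder: str):
    """Collect comments from every *.comments.json file in `folder`."""
    pairs = []
    for path in sorted(Path(folder).glob("*.comments.json")):
        pairs.extend(parse_comments(path.read_text(encoding="utf-8")))
    return pairs
```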

#

a taso kalama pi mute ala li lon poki · la n

#

"ytsearchall:toki pona" would work too

quaint wagon
#

mi toki ala e sitelen suli li toki e sitelen wile
mi o jo e ona ale
alasa mute a

novel lintel
#

sina jo e ale pi ma mute anu seme · la o alasa e nimi linjuwi pi ma [YouTube] lon ona

quaint wagon
#

aaaaaAAAAA

#

wawa

icy turret
novel lintel
#

-a mi toki ike · mi wile toki e toki anpa pi jan ante e toki pilin · e ni ala → jan li sitelen nimi e toki pi sitelen ona

#

ni nanpa tu li pona kin · taso lili a

plucky gazelle
quaint wagon
# icy turret i still don't know the sona kiwen joke

there isn't even a joke anymore, it's basically a way to irritate lipamanka
on two separate occasions, I defended the idea that somebody could use the phrase sona kiwen to refer to something which is difficult to understand
once in toki pona, where the response was middling
the second time in English, the true place where nasin discussion lives, and my defense bugged lipamanka so it argued against my point for a few Hours
mind you, after like 20 minutes i was barely involved anymore, but as these things go the discussion went on for a while
eventually most of the participants were so annoyed that bringing up sona kiwen genuinely irritated them
naturally pipi Kewapi and I think this is hilarious

quaint wagon
quaint wagon
plucky gazelle
#

#toki-ale message kepeken ni pi nimi Juwi li lon ala tan seme

quaint wagon
#

i removed all words that don't reach this count, because:
there are so many of them that they'd exceed what the tool can handle

plucky gazelle
#

a

drifting breach
tall rune
#

site's

plucky gazelle
#

x/sites/

charred patrolBOT
quaint wagon
novel lintel
#

mi,, sona wawa ala

#

mi kama jo tan lipu pana mi

#

maybe just --write-comments --skip-download would work the same

quaint wagon
#

sona, lukin la ni

#

sitelen ale la ona li kama jo e toki

#

wawa

#

lukin la pali ale ante li wile poki e toki taso

neat bramble
#

wawa

quaint wagon
#

mi o kepeken nasin nanpa ni anu seme:
toki wan li lon kulupu lili la toki luka tu li lon kulupu suli.
mi open e alasa lon tenpo luka pini.

devout jettyBOT
#

poka pi ma [sanpansiko] la ilo tawa li lon • nimi ona li ilo [muni] a

sike Kapo li lukin li ton(Sí?) ↩️

Reply to: musi a
seme a la ona li ilo tawa

novel lintel
quaint wagon
novel lintel
#

mi awen sona ala :p

quaint wagon
#

pakala

plucky gazelle
#

ni li ike nanpa

quaint wagon
#

tu tu mute li mute mute mute mute
luka luka li lili tawa ni la o suli e ona kepeken mute mute mute mute
ni o suli lon tenpo ale. ale li nanpa.

#

(5+5) * ((2+2)*20) * (100)

novel lintel
#

taso seme li pana e sona ni → tu en luka ala li lon poki mute

quaint wagon
novel lintel
#

luka tu wan mute li seme

quaint wagon
#

a. mi kama sona e pakala.

wanton shoal
#

tHONK mi pilin nasa tan ni

#

taso mute pi toki "anu seme" li 1820 taso la...

#

ken

quaint wagon
#

ni li nasa seme?

wanton shoal
#

mi pilin ni: toki "kxk" li lon mute

quaint wagon
#

aaa sona

wanton shoal
#

taso toki "ken ala ken" li mute ala kin lon lipu nanpa la mi ken sona

quaint wagon
#

toki mute pi nimi tu wan li mute lili taso

wanton shoal
#

lon, taso mi pana e ona tawa poki la ona li lukin lon nimi ale toki, lon ala lon?
taso.. ona li suli ala

quaint wagon
#

kin la toki lili "kxk" li lon lipu nimi pi ilo alasa

wanton shoal
#

lipu seme?

quaint wagon
#

ah, the tool doing the checking is ilo Sona Toki. it takes in text and tries to figure out one thing: is this toki pona or not?
the word list is the first thing the tool counts as toki pona
words with valid toki pona phonotactics are second
names (words capitalized at the start) are third
words made only of toki pona's letters are fourth
if lots of the words fit none of these, the text doesn't count
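The tiers above can be sketched in Python. This is a hypothetical, simplified stand-in, not ilo Sona Toki's actual code or API: the word list is a tiny placeholder, the phonotactic check only covers the basic (C)V(n) syllable shape plus the forbidden ji/ti/wo/wu and nn/nm sequences, and the 25% failure cutoff is made up:

```python
import re

# Tiny placeholder for the real word list.
DICTIONARY = {"toki", "pona", "mi", "sina", "li", "e", "a", "wawa", "ni", "jan", "lon", "ala"}

# Optional vowel-initial first syllable, then consonant-vowel(-n) syllables.
SYLLABIC = re.compile(r"^[ptksmnljw]?[aeiou]n?(?:[ptksmnljw][aeiou]n?)*$")
FORBIDDEN = re.compile(r"ji|ti|wo|wu|nn|nm")
ALPHABETIC = re.compile(r"^[ptksmnljwaeiou]+$")


def tier(word: str) -> int:
    """Return 1-4 for the first test a word passes, or 0 if it passes none."""
    lower = word.lower()
    if lower in DICTIONARY:
        return 1
    if SYLLABIC.match(lower) and not FORBIDDEN.search(lower):
        return 2
    if word[:1].isupper():
        return 3
    if ALPHABETIC.match(lower):
        return 4
    return 0


def looks_like_toki_pona(sentence: str, max_failures: float = 0.25) -> bool:
    """Accept the sentence unless too many words fail every tier."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return False
    failures = sum(1 for w in words if tier(w) == 0)
    return failures / len(words) <= max_failures


print(looks_like_toki_pona("toki! mi jan Sam."))  # True
```

Tier 4 is the weakest test: short English words like "is" pass it, which is why it comes last.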

merry walrus
#

Damn a completely fucking blows it out of þe water (I wonder how much oþer languages have affected it)

quaint wagon
merry walrus
#

pog pog

icy turret
tidal knoll
icy turret
#

sadness

meager steeple
#

ala la o e soweli

#

mi toki e ni la ona o kama lon ilo pi tenpo kama

icy turret
#

-# ilo pi penpo kama

meager steeple
#

pakala nimi wawa

novel lintel
fresh sentinel
#

khdhkdsgg

icy turret
#

ona li kon kekan san

icy turret
#

....apparently russia(n(s)) died in 2019

#

also oof @ that peak around early 2022

#

apeja consistently peaks every august and i love that

tall rune
#

why is that

mild glacier
thick gust
#

sona musi

mild glacier
#

lon

#

seme li nena ni?

fresh sentinel
# icy turret

which one is red and which one is blue? for both of these

icy turret
icy turret
tall rune
#

a

fresh sentinel
icy turret
#

this one wasnt actually a comparison, rather the same trend for both words

#

but anyway the first entry is blue, the second is red

fresh sentinel
#

ahh

icy turret
#

the ✨ hierarchy ✨

fresh sentinel
#

it would really help if you include the labels on these, most of us don't have the color order memorized

icy turret
#

improved the hierarchy

fresh sentinel
#

pona

icy turret
#

i expected namako below core words but nope, right there with noka and nena

fresh sentinel
#

noka and nena probably experience the monsi effect

quaint wagon
#

they do, as far as the charts show, yes

plucky gazelle
icy turret
plucky gazelle
#

ni

icy turret
#

nobody uses it ->
log(0) = undefined ->
nothing gets drawn
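Why an unused word just vanishes from the log graph: log(0) has no value, so there's no point to draw. A small Python sketch, assuming the chart leaves a gap instead of drawing at some floor value (an assumption about ilo Muni's plotting, not confirmed here); `log_points` and `segments` are hypothetical names:

```python
import math


def log_points(counts):
    """log10 of each monthly count; None where the count is 0, since log(0) is undefined."""
    return [math.log10(c) if c > 0 else None for c in counts]


def segments(values):
    """Split a series at its None gaps into runs of (index, value) to draw as lines."""
    runs, current = [], []
    for i, v in enumerate(values):
        if v is None:
            if current:
                runs.append(current)
                current = []
        else:
            current.append((i, v))
    if current:
        runs.append(current)
    return runs


monthly = [100, 0, 10, 1]
print(segments(log_points(monthly)))  # [[(0, 2.0)], [(2, 1.0), (3, 0.0)]]
```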

plucky gazelle
#

a

#

tan li nanpa.

icy turret
#

tan li nanpa.

merry walrus
#

Oh it is explained later down

#

sorry

icy turret
#

no worries

#

🐸 🎮

plucky gazelle
#

why does lija even say gaming in that song

#

like what does it mean

hidden tide
#

she's a gamer

verbal marten
fresh sentinel
#

direction words and body part words are used less frequently in text and VC, and more frequently IRL and in VR

#

so they show up less frequently in ilo Muni

#

which only looks at text

edgy cradle
thick gust
wicked stratus
#

mijun

verbal marten
wicked stratus
#

wait thats cool

#

what happened ,,, what happened in august 2023,,,,

hidden tide
#

me

quaint wagon
low timber
verbal marten
#

ni li tan seme?

quaint wagon
verbal marten
#

a sona

quaint wagon
#

taso, ni li ale ala

#

mi alasa

quaint wagon
merry walrus
#

lmao

short narwhal
#

o tonsi tawa ali (:<

pseudo smelt
#

Be all moving nonbinary-ers

merry walrus
#

📠 but wiþ a t before þe x

#

or

#

in þe middle

#

fa>t<

thick gust
#

what's this?

#

ah wait, this is without smoothing

quaint wagon
#

will check db and report back when able

#

-remindme 7h30m o alasa e lon lon

latent zodiacBOT
#

Set a reminder in 7 hours and 30 minutes from now (August 31, 2024 7:00 PM UTC)
View reminders with the reminders command

thick gust
#

ooh, mysterious

icy turret
#

im guessing its someone playing around with "bubble wrap spoiler shields"

mild glacier
plucky gazelle
mild glacier
#

aa mi sona

quaint wagon
plucky gazelle
#

lon ala

quaint wagon
#

pakalaaaa

rancid quarry
#

i cant find the link a

quaint wagon
rancid quarry
quaint wagon
#

niiiiii

pseudo smelt
#

nasa

latent zodiacBOT
#

Reminder for @quaint wagon

Reminder from YAGPDB

o alasa e lon lon

pseudo smelt
#

alasa li pini (anu seme)

quaint wagon
#

a, pini

verbal marten
#

taso kulupu ni pi nimi "lon" li tan seme a‽

quaint wagon
verbal marten
#

a sona nasa

pseudo smelt
#

a

thick gust
#

turns out they're playing some sort of chess

#

sona musi a

merry walrus
#

lmao

mild glacier
#

i have an idea now

#

imma make chess :D

plucky gazelle
verbal marten
#

toki. mi Sam. mi li pana wile en sona ijo toki pona.

mild glacier
#

sina sona ala sona?

icy turret
#

have we ever talked about the tenpo ni la dropoff

quaint wagon
# icy turret

a lot of words and phrases exhibit a significant change in usage starting in 2020, and while it's pretty easy to see the correlation with the pandemic, there's no telling what exactly that implies about the language
i would say something like: a sudden increase in skillful conversation? tenpo ni is a pretty simple, even first-day-of-learning sort of construction used for basic conversation
try contrasting with "tenpo ni la sina pali"?

icy turret
#

but yeah

#

this is a similar dropoff but slower imo

#

oh fuck nvm, wrong smoothing

#

similar yeah

meager steeple
icy turret
#

a a mi kala.

wicked stratus
#

a a mi kala

wicked stratus
icy turret
placid crow
verbal marten
#

ma siko pi toki siko

low timber
#

so it’s lipu Wesi, but ilo Siko

quaint wagon
#

@icy turret

#

10 smoothing, because it's much more apt here
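For reference, a centered moving average is one common way to smooth a monthly series. This sketch assumes `smoother=cwin` in ilo Muni links means a centered window and that the smoothing number is the count of neighbouring months on each side; both are guesses from the parameter names, not documented behaviour:

```python
def centered_moving_average(values, radius):
    """Average each point with up to `radius` neighbours on each side.

    Near the ends the window is cut short instead of padded, so the first
    and last points are averaged over fewer months.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


spiky = [0, 0, 9, 0, 0]
print(centered_moving_average(spiky, 1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

A one-month spike gets spread over its neighbours and flattened, which is why the choice of smoothing changes how sharp a dropoff or spike looks.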

icy turret
quaint wagon
icy turret
#

awesome

quaint wagon
#

once again asking for any archived irc conversations lmao

#

they certainly don't exist beyond the few well known examples, sadly

icy turret
quaint wagon
#

i don't know tbh
i think it was though

#

I saw a 2017 fb post where kipo referred to that corpus with a lot more detailed information than anyone but its author would have

icy turret
#

whats responsible for higher traffic in 2007 and 2010 btw

quaint wagon
#

no clue!

#

this is another instance where i spent a long time running tests and manual queries to be sure i wasn't making some error

#

for 2010, perhaps the newness of the forum itself?

#

it came out in Oct 2009

wicked stratus
low timber
#

a

meager steeple
#

seme la ona li jan ala

quaint wagon
meager steeple
#

aaaaa sona

low timber
#

kapesi..

#

nasa a

#

kapilu

#

nimi kapesi en nimi kapilu li suli lon tenpo sama

#

nimi kap- li wawa lon sike 2021

#

n, nimi kapa ala

plush gust
#

definitely been done before, but i find it very interesting that people are saying "o kama pona" more now vs. just "kama pona"

why did so many people in the past say "kama pona"? to me "o kama pona" makes more sense. maybe it's a difference between announcing somebody has "come well" vs. telling somebody to "well come", but i don't know

low timber
#

my uneducated guess is that people are trying to steer clear of calques, and kama pona looks like an english calque for "well come"
perhaps "o kama pona" is more tokiponalike. though i don't see a problem with "kama pona" being an exclamation. secondly, it could be considered a lexicalization

i don't know

plush gust
thick gust
#

interesting

low timber
#

something happened mid 2020

plush gust
low timber
#

but this is relative data, right?

plush gust
#

lon

low timber
#

so that shouldn't affect the words relative to each other

#

maybe the absolute data

plush gust
#

but toki pona changed a lot during that time

low timber
#

ken

plush gust
#

with the influx of people, nasins ante'd mute

low timber
#

i arrived some time after that, so i don't know how different the community was before then

plush gust
#

and specifically phrases like "kama pona" were popular because new people were coming lots

low timber
#

hmm but i'm not sure why that would have caused people to prefer one or the other

plush gust
#

mi kin li sona ala

icy turret
#

overall "monthly volume of toki pona" seen by ilo Muni:

#

there are high plateaus in 2007 and 2010 and we don't know what they correspond to, as of now

#

so 2017 represents the first time toki pona escapes this average level of activity, by virtue of reddit + discord

#

unless, ofc, the communities not yet in muni (like obvs all the irc chat logs which may or may not exist) reshape this graph starkly

#

@quaint wagon any reason it goes back to march 2002 instead of august 2001 btw?

#

in the sparse data pre-2017, mi + sina vs li show some amount of inverse relation, which to me suggests alternating between mostly chatting (hence 1st/2nd person) and prose (hence 3rd person)

#

whereas by now the community is large enough that the movement cancels out
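One way to put a number on that inverse relation is a Pearson correlation between the two monthly series. A minimal sketch, assuming the monthly relative frequencies for mi + sina and for li have been copied out of ilo Muni by hand (the series below are made up); a value near -1 would support the alternation idea, but with sparse pre-2017 months treat it as a hint only:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 points each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Made-up monthly relative frequencies, NOT real ilo Muni numbers.
mi_sina = [0.10, 0.14, 0.09, 0.15]
li = [0.08, 0.05, 0.09, 0.04]
print(round(pearson(mi_sina, li), 3))
```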

plucky gazelle
#

🔵 mi + sina 🔴 li

icy turret
#

..right fair

#

do we have any known posts from even earlier?

#

regardless of ilomuniability

quaint wagon
#

idk its exact active years but it may make a difference in the 2014-2019 range

river shard
#

facebook groups are a pain
the API didn't work even back when facebook still offered it for groups, and now they've restricted it further

icy turret
#

i look forward to mKS singlehandedly creating a working facebook scraper

quaint wagon
#

unfortunately, the first is proprietary

#

altho granted my work wasn't singlehanded on that first one but it was A Lot

icy turret
#

interesting

#

btw honestly

#

like

#

is the reason for not using the proprietary one legal or academic (repeatability)

#

cause the contemporary fb group probably generates a negligible amount of traffic and having just a dump that never gets updated is sufficient for muni

quaint wagon
#

I can't use it for private purposes, and nobody else would have access to it

icy turret
#

alright fair

fresh sentinel
#

it's repeatable because someone else can just write another facebook scraper /musi

icy turret
#

anu-predicate

quaint wagon
#

nimi o li lon poka la seme?

icy turret
#

a pakala mi

wide pawn
#

ken la o lukin e "anu ala e" kin?

icy turret
quaint wagon
#

tan.... seme

#

mi ken LUKIN A e ni: ILO LI PILIN IKE TAN MUTE NIMI

hidden tide
#

i'll just post screenshots

#

first off, the absolute scale! you can see how toki pona has grown over time

#

(i am interested to know what occurred in the first half of 2023, where there is a notable trough)

icy turret
hidden tide
#

it's more visible in the absolute entropy scale

quaint wagon
#

first recorded usage of the entropy scale

hidden tide
#

the entropy scale is very fun

quaint wagon
# hidden tide (i am interested to know what occurred the first half of 2023 where there is a n...

the server's moderation team (myself included) closed all of these channels:
#toki-moku #toki-nanpa #kalama-tpt #nasin-tpt #tpt-ale-kin

we closed them because we kept seeing conversations like this:
1: "toki!" (hi!)
2: "toki, sina pona ala pona?" (hi, are you doing well?)
1: "mi pona. sina seme" (i'm good. you?)
and then the conversation would just end. when people don't talk actively and post far apart in time, they never get into a real conversation.
when we closed the channels, i expected replies to come faster, since people would see each other's messages in time!

but! i learned that people actually want topics to talk about. that's what creates lots of lively conversation. if i remove the topic channels, where are people supposed to talk? nowhere. so they stopped talking.

#

(if you can't see the channels, they've all been closed. get the archive role to view them.)

#

a. mi pana ike e ona tu. taso, tomo pi nasin sama li lon tenpo pi weka tomo li pini.

hidden tide
#

sona!

quaint wagon
#

if you want to build a chat space or a community, give people things to talk about!

hidden tide
#

ijo toki li pona

hidden tide
quaint wagon
#

wan. tu. tu wan. tu tu. luka. luka wan. luka tu.
ona li ni li pini lon poka pi mute ale. (ni li nanpa).

hidden tide
#

nanpa mute

#

sitelen ni la, sina ken lukin e suli pi kama suli nimi lon tenpo. linja walo sewi li nimi "pu." jan li toki mute ala e nimi ni la ona li sewi tawa nanpa wan.

#

mi supa ale e sitelen nasa la sina ken lukin e kama suli pi toki pona lon tenpo

meager steeple
#

nasin a

quaint wagon
#

nasa, toki pini pi wile sona li kama suli a lon tenpo

devout jettyBOT
#

tenpo wan la mi lukin e toki pi tenpo weka li kama sona e ni: sike pini la toki "anu seme" li nasa lili tawa ijo Osuka. ni li nasa tawa mi! toki ni li suli a tawa nasin mi toki

#

toki ona la ona li kepeken toki "anu seme" lon tenpo pi mute lili taso li pilin e ni: toki "X ala X" li pona nanpa wan

serene hollow
#

oh my god 😭

fresh sentinel
#

is this when kijetesantakalu was coined?

serene hollow
#

thats the record for the usage of "suwi" from august of 2001 to aug. 2024

fresh sentinel
#

the hovered spike I mean

icy turret
devout jettyBOT
#

what happened 😭

ilo musi Anjelita ↩️

Reply to: oh my god 😭 📎

#

I don't think there's much data for that time period, so maybe someone used suwi a bunch and it became a significant percentage of all toki pona recorded that month

serene hollow
serene hollow