FSRS Megathread | Anki | Page 9

clever cargo Apr 8, 2025, 3:42 PM

#

you can give him write access to your fork

unique salmon Apr 8, 2025, 3:42 PM

#

how

cosmic hedge Apr 8, 2025, 3:43 PM

#

I'd rather just pr it myself and deal with it that way 😭

unique salmon Apr 8, 2025, 3:43 PM

#

cosmic hedge I'd rather just pr it myself and deal with it that way 😭

I'll close mine then

clever cargo Apr 8, 2025, 3:43 PM

#

svelte not being easily modified by addons is why this needs to be discussed more than just one person being fine with it

sick moth Apr 8, 2025, 3:43 PM

#

Learn to git patch

cosmic hedge Apr 8, 2025, 3:43 PM

#

Tbf i do see a fair number of people ask about it.

clever cargo Apr 8, 2025, 3:44 PM

#

a note in the help modal then?

#

my gripe is "the first review ever made" is too broad

cosmic hedge Apr 8, 2025, 3:44 PM

#

clever cargo a note in the help modal then?

Its there already i think

sick moth Apr 8, 2025, 3:44 PM

#

Can't you have it show "N/10000 cards included"

unique salmon Apr 8, 2025, 3:45 PM

#

clever cargo a note in the help modal then?

Not "BRIGHT NEON COLORS AND BIG TEXT FLYING STRAIGHT INTO THE USER'S FACE" enough

clever cargo Apr 8, 2025, 3:45 PM

#

sick moth Can't you have it show "N/10000 cards included"

that would be nice

sick moth Apr 8, 2025, 3:45 PM

#

"-is:suspended" caught me out once

cosmic hedge Apr 8, 2025, 3:45 PM

#

it would be more complicated though

bold terrace Apr 8, 2025, 3:45 PM

#

"Ignore cards reviewed before"
... And all the subsequent reviews, their offspring, all their heritage, burn them to the ground as they never existed in the first place

#

🔥

cosmic hedge Apr 8, 2025, 3:46 PM

#

bold terrace "Ignore cards reviewed before" ... And all the subsequent reviews, their offspri...

thats one way to get the point across 😂

bold terrace Apr 8, 2025, 3:46 PM

#

Ignore cards (and their future reviews) for cards reviewed before could also fit, but maybe too long

unique salmon Apr 8, 2025, 3:46 PM

#

A lot of people don't even realize that you can click on settings to see a help text thingy

clever cargo Apr 8, 2025, 3:46 PM

#

in that case, the naming itself is misleading

unique salmon Apr 8, 2025, 3:47 PM

#

clever cargo Apr 8, 2025, 3:47 PM

#

oh its currently alr "ignore cards reviewed before"

unique salmon Apr 8, 2025, 3:47 PM

#

Ok, maybe not that many

bold terrace Apr 8, 2025, 3:48 PM

#

clever cargo oh its currently alr "ignore cards reviewed before"

yeah technically it's correct, but you have a false sense of "it will only ignore the reviews"

clever cargo Apr 8, 2025, 3:48 PM

#

unique salmon

they're not very discoverable at the moment

#

maybe that would be better addressed than going for the nuclear neon option all the time

clever cargo Apr 8, 2025, 3:49 PM

#

clever cargo they're not very discoverable at the moment

i say that, but as im looking at deck options, there's a ❓ in the corner of each section, and hovering changes the cursor

bold terrace Apr 8, 2025, 3:51 PM

#

Yeah but Help menu should be there to go "deeper" in knowledge, not to get the knowledge right

unique salmon Apr 8, 2025, 3:51 PM

#

sick moth Can't you have it show "N/10000 cards included"

Would that be easier to implement than "first review ever" or "first review of a card in this preset"?

#

The warning being displayed only if the selected date > date of the first review, I mean

clever cargo Apr 8, 2025, 3:51 PM

#

bold terrace Yeah but Help menu should be there to go "deeper" in knowledge, not to get the k...

depends on how easily understandable and accurate the setting's title is

cosmic hedge Apr 8, 2025, 3:52 PM

#

unique salmon Would that be easier to implement than "first review ever" or "first review of a...

everything involving reviews would mean that you would have to go through every review in a preset every time you changed a setting just for a tooltip

#

i can see the red pr closed symbol now 😔

bold terrace Apr 8, 2025, 3:53 PM

#

"Ignore Cards introduced before" ?

cosmic hedge Apr 8, 2025, 3:53 PM

#

i mean i guess you could cache the first review of every card every time you open the window

unique salmon Apr 8, 2025, 3:53 PM

#

cosmic hedge everything involving reviews would mean that you would have to go through every ...

Then just make the warning appear if the date is non-default

bold terrace Apr 8, 2025, 3:53 PM

#

With "Ignore Cards introduced before", you would not risk deviating the focus from the "card" to the "review"

unique salmon Apr 8, 2025, 3:53 PM

#

bold terrace With "Ignore Cards introduced before", you would not risk deviating the focus f...

"Introduced" isn't obvious

clever cargo Apr 8, 2025, 3:54 PM

#

unique salmon Then just make the warning appear if the date is non-default

again, the default is 1970.

cosmic hedge Apr 8, 2025, 3:54 PM

#

i mean its an improvement

unique salmon Apr 8, 2025, 3:54 PM

#

I imagine some people will think it means "Created before that date" and some people will think "u wot m8"

bold terrace Apr 8, 2025, 3:54 PM

#

"Ignore Cards learned before" ?

cosmic hedge Apr 8, 2025, 3:54 PM

#

clever cargo again, the default is 1970.

why is that a problem?

clever cargo Apr 8, 2025, 3:54 PM

#

so having anything in that field would differ from the default

bold terrace Apr 8, 2025, 3:54 PM

#

"Learning" is a "core" word, should do the trick

cosmic hedge Apr 8, 2025, 3:55 PM

#

yeah but if they dont change it from the default it wont match any cards right?

clever cargo Apr 8, 2025, 3:55 PM

#

every review ever was done after 1970

cosmic hedge Apr 8, 2025, 3:55 PM

#

yeah so if its not changed from 1970 its not going to match any cards?

#

so no one needs warning

bold terrace Apr 8, 2025, 3:55 PM

#

@clever cargo you start to reach the point of : "Finding problems for the sake of finding ones" 😛

clever cargo Apr 8, 2025, 3:55 PM

#

bold terrace @clever cargo you start to reach the point of : "Finding problems for th...

what?

bold terrace Apr 8, 2025, 3:56 PM

#

I mean, the fact that the default is 1970 doesn't really cause an issue for showing the highlight when the option changes, no? 🙂

robust hill Apr 8, 2025, 3:57 PM

#

what if we just execute the user whenever theres an error on their behalf

clever cargo Apr 8, 2025, 3:57 PM

#

if its filled with any date past the first review, then its going to show a warning

bold terrace Apr 8, 2025, 3:57 PM

#

bold terrace "Ignore Cards learned before" ?

or maybe "Learnt" for UK-blokes

clever cargo Apr 8, 2025, 3:57 PM

#

which imo is too broad, given how many decks a user can have

cosmic hedge Apr 8, 2025, 4:02 PM

#

@unique salmon you need to fill it out with text first but yeah it does appear

#

above where it should but it's still there

cursive badge Apr 8, 2025, 4:03 PM

#

I'm still spooked by the Anki build system. I saw that there was a custom rust program that generated ninja files and went: "Ok, I'm not touching that Gordian knot until I really have to."

clever cargo Apr 8, 2025, 4:08 PM

#

bold terrace @clever cargo you start to reach the point of : "Finding problems for th...

well, i can always open a pr in that case

quiet saddle Apr 8, 2025, 4:19 PM

#

cursive badge I'm still spooked by the Anki build system. I saw that there was a custom rust p...

for me it's the ping-pong game between the python code and the rust backend that throws me off. I may have misunderstood, but it seems the rust code instantiates the rust backend through the python bindings, which sounds weird. I plan on digging a little bit more.

cursive badge Apr 8, 2025, 4:21 PM

#

quiet saddle for me it's the ping-pong game between the python code and the rust backend that...

I just start at the .proto files then work my way to python/rust from that.

quasi shadow Apr 8, 2025, 4:21 PM

#

just get used to it😅

#

I spent nearly one month to understand the framework of Anki codebase.

clever cargo Apr 8, 2025, 4:22 PM

#

still have no idea how the scheduler works

quasi shadow Apr 8, 2025, 4:22 PM

#

which part?

clever cargo Apr 8, 2025, 4:22 PM

#

arthur's fork has a good explanation on queues

clever cargo Apr 8, 2025, 4:22 PM

#

quasi shadow which part?

everything but queues 😅

#

will read up on it more

quasi shadow Apr 8, 2025, 4:23 PM

#

https://github.com/ankitects/anki/blob/ccab18b7ba624d888f3d881e14f04c830e3eaa44/rslib/src/scheduler/answering/mod.rs#L310

#

it is the rabbit hole.

#

you will be surprised how deep it is

quiet saddle Apr 8, 2025, 4:45 PM

#

quasi shadow I spent nearly one month to understand the framework of Anki codebase.

I'm not surprised, from my first impression I'll have to spend at least that much too before coding anything there.

polar maple Apr 8, 2025, 4:50 PM

#

@quasi shadow how about modifying FSRS so that the first rating determines a resulting fixed decay (1 -> 0.5, 3 -> 0.2)? A problem would be that the parameters would be mixed up having to support multiple forgetting curves so maybe an alternative evaluation could be: you train FSRS with a decay 0.2 and evaluate only on cards with first rating = 1, and then do the same thing with decay 0.5 and evaluate on only cards with first rating = 1

#

also good news, weighted exponential curves perform similarly to power curves. I'll work on some plots to see what the curves look like later

unique salmon Apr 8, 2025, 4:55 PM

#

polar maple @quasi shadow how about modifying FSRS so that the first rating determin...

I think that's too harsh if decay can then never change. It places too much weight on the first review

#

And considering that people have all kinds of weird habits and make entire threads about "What button do you press for the first review?", this really doesn't seem like a good idea

#

Making decay depend on D would be interesting, but that didn't work

polar maple Apr 8, 2025, 4:57 PM

#

unique salmon I think that's too harsh if decay can then never change. It places too much weig...

this is more to test jarrett's observation that maybe first rating = 1 would benefit from a higher decay

tepid spoke Apr 8, 2025, 4:58 PM

#

this graph is quite concerning. Why does it just go up and up oO

#

if I make it 1000 days, it does this. Which makes no sense to me

cursive badge Apr 8, 2025, 4:59 PM

#

You run out of new cards around Nov/Dec 2026?

tepid spoke Apr 8, 2025, 4:59 PM

#

I run out of new cards in two weeks

#

The graph indeed looks like all cards are new

unique salmon Apr 8, 2025, 5:00 PM

#

How many cards do you have in that preset?

tepid spoke Apr 8, 2025, 5:00 PM

#

~18000

unique salmon Apr 8, 2025, 5:01 PM

#

That's 720 days until you run out of new cards, so around 2 years

#

@quasi shadow please investigate this, it seems that the simulator treats all cards as new

tepid spoke Apr 8, 2025, 5:01 PM

#

It's also just plain wrong even for tomorrow, I'll have ~350 reviews, not 150-200

#

This must be some artifact from me splitting my one big deck into a huge number of sub and sub-sub decks

#

but all decks use the same preset, so it shouldn't matter for the Simulator. Or so I thought.

cosmic hedge Apr 8, 2025, 5:04 PM

#

tepid spoke The graph indeed looks like all cards are new

would you mind sending me your collection?

tepid spoke Apr 8, 2025, 5:05 PM

#

I can export that deck with scheduling info.

#

The whole collection is rather big

cosmic hedge Apr 8, 2025, 5:05 PM

#

yeah ok

ashen light Apr 8, 2025, 5:07 PM

#

cursive badge I'm still spooked by the Anki build system. I saw that there was a custom rust p...

the joys of multi-language programs

tepid spoke Apr 8, 2025, 5:08 PM

#

@cosmic hedge

📎 WK3_FSRS_TEST_EXPORT_NO_MEDIA.apkg

cosmic hedge Apr 8, 2025, 5:09 PM

#

tepid spoke @cosmic hedge

i was expecting you to DM me but so long as you're comfortable thats fine XD

#

idk i guess im guarded with my own decks for some reason XD

ashen light Apr 8, 2025, 5:10 PM

#

also re: leeches and tagging, I was fully thinking leech was gonna be its own card prop rather than a tag or whatev, is:leech type stuff

ashen light Apr 8, 2025, 5:11 PM

#

cosmic hedge idk i guess im guarded with my own decks for some reason XD

we will judge you as a person based on how you do anki

tepid spoke Apr 8, 2025, 5:12 PM

#

I hope that exported fine. Didn't test exporting since the Subdeck-Inflation

unique salmon Apr 8, 2025, 5:14 PM

#

ashen light also re: leeches and tagging, I was fully thinking leech was gonna be its own...

If it's less problematic to implement that way, sure

ashen light Apr 8, 2025, 5:16 PM

#

I just think its better in general, not even about problematicness or not

#

(it also allows both leech types to exist at the same time, assuming addons or whatev care about leech tags)

unique salmon Apr 8, 2025, 5:17 PM

#

ashen light (it also allows both leech types to exist at the same time, assuming addons or ...

Do you plan to make the old leech detector keep using tags?

#

I'd rather both the new and the old detector do the same thing, for consistency

ashen light Apr 8, 2025, 5:19 PM

#

I figured old method would be removed

#

"leech after N fails" is a shitty metric by every metric 🍃

tepid spoke Apr 8, 2025, 5:21 PM

#

Do leeches un-leech after a while at the moment?

ashen light Apr 8, 2025, 5:21 PM

#

nope!

#

theres currently no unleeching mechanic

tepid spoke Apr 8, 2025, 5:21 PM

#

Then I have a surprisingly low amount of them

ashen light Apr 8, 2025, 5:21 PM

#

I had leeches but then I turned up the leech count to like 1000 so of course I never have any

#

it just felt not-useful

#

¯_(ツ)_/¯

cosmic hedge Apr 8, 2025, 5:22 PM

#

tepid spoke this graph is quite concerning. Why does it just go up and up oO

A lot of your cards are missing memory states. This is what the simulator looks like after slightly changing one of your parameters, saving and trying it again.

tepid spoke Apr 8, 2025, 5:22 PM

#

But why? The simulator worked fine not too long ago

cosmic hedge Apr 8, 2025, 5:22 PM

#

🤷‍♂️

tepid spoke Apr 8, 2025, 5:22 PM

#

And what does "missing memory states" actually mean?

cursive badge Apr 8, 2025, 5:23 PM

#

It is very annoying how they only apply to notes. I have considered duplicating my notes so there is only 1 card per note just so the leech tag is useful.

ashen light Apr 8, 2025, 5:23 PM

#

cursive badge It is very annoying how they only apply to notes. I have considered duplicating ...

oh this is probably why I basically turned the feature off

#

I knew there was a reason I just forgot

#

yeah note-level tagging of leeches is actually useless

cosmic hedge Apr 8, 2025, 5:24 PM

#

tepid spoke And what does "missing memory states" actually mean?

when you rate a card for the first time it calculates its memory state (stability, difficulty etc.) from the cards history. It then saves that and uses it for future reviews.

#

a lot of your cards are missing that save

tepid spoke Apr 8, 2025, 5:24 PM

#

That's so odd

#

how would that happen

#

And nudging the parameters causes a global re-calculation?

cosmic hedge Apr 8, 2025, 5:25 PM

#

yeah thats what the progress bar that appears after you hit "save" is showing you the progress of

tepid spoke Apr 8, 2025, 5:25 PM

#

never saw that, guess my PC is too fast or something :D

cosmic hedge Apr 8, 2025, 5:25 PM

#

suffering from success 😔

cursive badge Apr 8, 2025, 5:26 PM

#

Could an addon have clobbered the custom card data maybe? If I remember correctly all the FSRS state is stored in there.

cosmic hedge Apr 8, 2025, 5:27 PM

#

cursive badge Could an addon have clobbered the custom card data maybe? If I remember correctl...

used to be back in the js scheduler days

cursive badge Apr 8, 2025, 5:27 PM

#

Maybe I'm thinking of revlogs 😕

clever cargo Apr 8, 2025, 5:27 PM

#

its got its own memory state field now

tepid spoke Apr 8, 2025, 5:27 PM

#

I wrote a helper-addon to split the deck into subdecks

#

but all that does is find the cards, and call mw.col.set_deck on them

cosmic hedge Apr 8, 2025, 5:27 PM

#

tepid spoke I wrote a helper-addon to split the deck into subdecks

thats it

#

every time you move a card it gets its memory state erased

clever cargo Apr 8, 2025, 5:28 PM

#

cosmic hedge used to be back in the js scheduler days

but is there any point to the addon's "v=reschedule"?

tepid spoke Apr 8, 2025, 5:28 PM

#

How do I re-generate it from within the addon? :D

clever cargo Apr 8, 2025, 5:28 PM

#

or just holdover from the old days?

cosmic hedge Apr 8, 2025, 5:28 PM

#

clever cargo but is there any point to the addon's "v=reschedule"?

no idea XD

unique salmon Apr 8, 2025, 5:30 PM

#

ashen light I figured old method would be removed

Not immediately. For now it will be a toggle (read my word document)

ashen light Apr 8, 2025, 5:31 PM

#

eventually, obviously

#

but maybe this method is so much better dae would just be ok with it being removed

#

🍃

unique salmon Apr 8, 2025, 5:33 PM

#

Btw jake, read the comments below this: https://forums.ankiweb.net/t/automated-leech-detection/56887/16?u=expertium
Another user also proposed not using tags/flags, but I don't think it will work

Anki Forums

Automated leech detection

If a card is a leech, it will be failed more often than FSRS predicts. That’s how we define leeches with the new detector. So yes, the probability of recall will be higher at higher DR, but since leeches have a lower p(recall) than FSRS predicts, they will be failed more often. So depending on how much lower it is exactly, it’s possible that...

#

I explained the issues here
https://forums.ankiweb.net/t/automated-leech-detection/56887/26?u=expertium

Anki Forums

Automated leech detection

So the UI would display a pop-up based on the p-value stored in card info, and when searching prop:is_leech, Anki would convert it to prop:p<0.01, correct? Actually, no, that still won’t work. In order to reduce the amount of time the leech status changes, in my specification of the detector I wrote that we should use two thresholds. If pthre...

ashen light Apr 8, 2025, 5:35 PM

#

is it bad if a handful of on-the-edge cards bounce back and forth between being a leech and not?

cursive badge Apr 8, 2025, 5:36 PM

#

ashen light is it bad if a handful of on-the-edge cards bounce back and forth between being ...

I think that actually happened quite often in my prototype.

ashen light Apr 8, 2025, 5:37 PM

#

but is it bad?

#

why is it a problem

tepid spoke Apr 8, 2025, 5:37 PM

#

it seems unavoidable

#

I have a bunch of cards that are part-time leeches like that

cursive badge Apr 8, 2025, 5:37 PM

#

🤷‍♂️

tepid spoke Apr 8, 2025, 5:38 PM

#

I'm honestly not sure how I should rate some cards. Like, how much "off-ness" I should tolerate

cursive badge Apr 8, 2025, 5:39 PM

#

cosmic hedge used to be back in the js scheduler days

Ok, I see why I was confused. It's all packed in the data column, but custom data is its own thing (presumably nested in the data column).

tepid spoke Apr 8, 2025, 5:40 PM

#

this looks indeed much more reasonable

cursive badge Apr 8, 2025, 5:41 PM

#

Interestingly DR is stored per-card which suggests you could get weird and do different DRs per-card instead of per-preset if you wanted.

cosmic hedge Apr 8, 2025, 5:43 PM

#

cursive badge Interestingly `DR` is stored per-card which suggests you could get weird and do ...

if i had to guess i'd say its probably so it doesn't have to check the preset every review?

unique salmon Apr 8, 2025, 5:43 PM

#

ashen light why is it a problem

User experience, man

#

People will be like "Why is my card going from 'leech' to 'not a leech' so often?"

cursive badge Apr 8, 2025, 5:43 PM

#

cosmic hedge if i had to guess i'd say its probably so it doesn't have to check the preset ev...

I assume so too, but it opens the door to tomfoolery ;p

ashen light Apr 8, 2025, 5:44 PM

#

"because you keep passing then failing it" ez

unique salmon Apr 8, 2025, 5:44 PM

#

So we have to do the bullshittery with two thresholds or with updating the status only after every N reviews

ashen light Apr 8, 2025, 5:45 PM

#

"my leech-p is above threshold_1 but it's still marked as a leech, what gives?" - equally nonsensical complaint the other direction

cosmic hedge Apr 8, 2025, 5:45 PM

#

cursive badge I assume so too, but it opens the door to tomfoolery ;p

I always thought an "expected retention" row to the true retention table might be handy. You know like the average DR for every card in the search.

unique salmon Apr 8, 2025, 5:46 PM

#

ashen light "my leech-p is above `threshold_1` but it's still marked as a leech, what gives?" -...

We won't expose thresholds or any other stuff to users

#

The detector will be a black box

ashen light Apr 8, 2025, 5:46 PM

#

that is super lame

unique salmon Apr 8, 2025, 5:46 PM

#

The only thing we will show is p(leech)

#

Well, and the leech status as a binary variable

ashen light Apr 8, 2025, 5:46 PM

#

why is it p(leech) if its a bool

unique salmon Apr 8, 2025, 5:47 PM

#

I mean that we will show both the probability and the binary leech/not a leech label

ashen light Apr 8, 2025, 5:47 PM

#

this feature is boring now man

unique salmon Apr 8, 2025, 5:47 PM

#

Power users can search for p_leech in the browse window

ashen light Apr 8, 2025, 5:47 PM

#

if we show the probability then my complaint will be a thing

#

"my leech-p is above threshold_1 but it's still marked as a leech, what gives?" - equally nonsensical complaint the other direction
exists in any situation p(leech) is shown

lapis hearth Apr 8, 2025, 5:50 PM

#

And this

ashen light Apr 8, 2025, 5:52 PM

#

trends can be an addon, 0% chance it'll be in anki proper

cursive badge Apr 8, 2025, 5:53 PM

#

cosmic hedge I always thought an "expected retention" row to the true retention table might b...

It's a cool idea, but it has problems:

- We are very limited on space. It could be different for Young & Mature which means 1-3 extra columns
- We do not have a history of DR so it could be wildly wrong for "Last Week" etc. if you changed DR at any point.

cosmic hedge Apr 8, 2025, 5:53 PM

#

cursive badge It's a cool idea, but it has problems: - We are very limited on space. It could ...

ahh i didnt do it myself because of the space issue but i think the history of dr thing kinda ruins it

sick moth Apr 8, 2025, 6:02 PM

#

unique salmon The warning being displayed only if the selected date > date of the first review...

A search is pretty much instant, unless I'm forgetting something

unique salmon Apr 8, 2025, 6:05 PM

#

sick moth A search is pretty much instant, unless I'm forgetting something

@cosmic hedge

cosmic hedge Apr 8, 2025, 7:12 PM

#

sick moth A search is pretty much instant, unless I'm forgetting something

~~It's not as simple as "introduced:x". It also counts cards which were forgotten and then re-introduced after the date.~~
I think we can write some simple sql for this?

```sql
SELECT count(DISTINCT cid) FROM revlog
WHERE id > ignore_before AND type == 0
```

#

https://github.com/ankitects/anki/blob/ccab18b7ba624d888f3d881e14f04c830e3eaa44/rslib/src/scheduler/fsrs/params.rs#L323 it's: if first_of_last_learn_entries is before the cutoff then the card is ignored
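
For anyone who wants to poke at that query outside Anki, here is a minimal sketch that runs it against a toy in-memory table. The three-column `revlog` schema is a simplified assumption (only the columns the query touches, with `id` as the review time in epoch milliseconds and `type` 0 meaning a learning entry), and `count_cards_learned_after` is a hypothetical helper name, not something from the Anki codebase. It only demonstrates the query itself, not the `first_of_last_learn_entries` rule.

```python
import sqlite3

# Toy stand-in for the revlog table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revlog (id INTEGER PRIMARY KEY, cid INTEGER, type INTEGER)")
conn.executemany(
    "INSERT INTO revlog (id, cid, type) VALUES (?, ?, ?)",
    [
        (1_000, 1, 0),  # card 1: learning entry before the cutoff
        (5_000, 1, 1),  # card 1: review entry after the cutoff
        (6_000, 2, 0),  # card 2: learning entry after the cutoff
        (7_000, 2, 0),  # card 2: second learning step, must not be double counted
        (8_000, 3, 1),  # card 3: only a review entry
    ],
)


def count_cards_learned_after(conn, ignore_before_ms):
    """Count distinct cards with a learning entry (type 0) after the cutoff."""
    row = conn.execute(
        "SELECT count(DISTINCT cid) FROM revlog WHERE id > ? AND type == 0",
        (ignore_before_ms,),
    ).fetchone()
    return row[0]


learned_after = count_cards_learned_after(conn, 4_000)
print(learned_after)
```

With the cutoff at 4000 only card 2 has a learning entry in range, and `DISTINCT` keeps its two learning steps from being counted twice.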

lapis hearth Apr 8, 2025, 10:05 PM

#

ashen light trends can be an addon, 0% chance it'll be in anki proper

No way you thought all this would be in anki proper😭 😭 😭 dae would set himself on fire before he would let that happen

#

unique salmon Apr 8, 2025, 10:38 PM

#

Right now we can't even decide on the specifics of the leech detector itself 😅

#

Oh, btw, I feel like I should clarify precisely what p(leech) means, statistically. I don't think I've explained this clearly before
With this detector, p(normal) aka 1-p(leech) can be interpreted as "Probability of observing this many or fewer successful reviews, assuming that probabilities given by FSRS are the true probabilities of recall", in other words, assuming that FSRS can predict the probability of recall perfectly accurately

#

It's a p-value for a one-sided statistical significance test where the null hypothesis is "The true probabilities are [whatever numbers FSRS predicted]"

#

So if the p-value is low, it means that it's very unlikely that we would see these outcomes if the probabilities predicted by FSRS were the true probabilities of recall (for this card)
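
A minimal sketch of that p-value, assuming reviews are independent and that each review comes with the recall probability FSRS predicted for it. The function names are hypothetical and this is not the detector's actual code. It builds the distribution of the number of successes one review at a time, then sums the probability of seeing the observed number of successes or fewer.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent trials.

    probs[i] is the success probability of trial i; the result has
    len(probs) + 1 entries, one for each possible number of successes.
    """
    pmf = [1.0] + [0.0] * len(probs)
    for n, p in enumerate(probs, start=1):
        # Walk from high to low so pmf[j - 1] still holds the value
        # from before this trial was added.
        for j in range(n, 0, -1):
            pmf[j] = pmf[j] * (1.0 - p) + pmf[j - 1] * p
        pmf[0] *= 1.0 - p
    return pmf


def p_normal(predicted, outcomes):
    """P(this many or fewer successes), if `predicted` were the true probabilities."""
    successes = sum(outcomes)
    return sum(poisson_binomial_pmf(predicted)[: successes + 1])


# Five reviews all predicted at 90%, of which only two were passed.
p = p_normal([0.9] * 5, [1, 0, 0, 1, 0])
print(p, 1.0 - p)  # p(normal) and p(leech)
```

When every predicted probability is the same, this reduces to the ordinary binomial distribution, which makes small cases easy to check by hand.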

lapis hearth Apr 8, 2025, 10:51 PM

#

You have made it more confusing for us simpletons

#

Okay couple of brain strokes later, I am beginning to understand it

unique salmon Apr 8, 2025, 10:52 PM

#

Basically, low p(normal) aka high p(leech) means that FSRS sucks at predicting probabilities of recall

lapis hearth Apr 8, 2025, 10:53 PM

#

So high p(leech) means card is so difficult that FSRS scheduling is useless and you need to find something else to help you recall it

#

So basically in other words, a leech

#

No amount of scheduling would help make this leech unleech

#

That makes sense

unique salmon Apr 8, 2025, 10:56 PM

#

lapis hearth No amount of scheduling would help make this leech unleech

Mmm, not exactly. More like "No amount of reviews will help FSRS accurately predict probabilities for this card"

#

Alright, with that out of the way, we need to decide on 2 things:

1) Tags/flags/custom data in card info?
2) Do we do it the simple way with only one threshold and checks after every review, and if the card keeps bouncing back and forth between being a leech and not being a leech - we say "it's not a bug, it's a feature"; or, alternatively, do we do it the complicated way with two thresholds or checks only every 2/3/4 reviews so that cards don't change their status too often?
@ashen light @cursive badge @cosmic hedge

ashen light Apr 8, 2025, 10:59 PM

#

tags are note-level (as opposed to card-level), making them a non-option. cards can only have 1 flag at a time, also making it not an option

#

a custom leech attr on card is the only reasonable thing

#

very much against checking every N reviews, means we need to keep track of an extra attr

unique salmon Apr 8, 2025, 11:02 PM

#

We can do two thresholds, though that creates another problem: if we show p(leech) or p(normal), some cards can end up counted as leeches and some not, despite at present having the same p(leech)

#

For example, if the first threshold is 5% and the second one is 25%, and p(normal) is 10%, whether it’s a leech or not depends on whether it has crossed the first threshold before or not. If it has crossed it before, it’s a leech, otherwise it’s not a leech.
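
The two-threshold behaviour in that example can be sketched like this. The constant and function names are made up for illustration, and only the 5% and 25% values come from the example; it is a sketch of the idea, not the detector's specification.

```python
LEECH_BELOW = 0.05    # a card becomes a leech once p(normal) drops below this
RECOVER_ABOVE = 0.25  # a leech is cleared only once p(normal) rises above this


def update_leech_status(is_leech, p_normal):
    """Return the new leech status after a review, given the old status."""
    if not is_leech and p_normal < LEECH_BELOW:
        return True
    if is_leech and p_normal > RECOVER_ABOVE:
        return False
    return is_leech  # between the thresholds: keep whatever it was


def final_status(p_normal_history):
    """Replay a sequence of p(normal) values, starting from 'not a leech'."""
    status = False
    for p in p_normal_history:
        status = update_leech_status(status, p)
    return status


print(final_status([0.10]))        # never crossed 5%: not a leech
print(final_status([0.03, 0.10]))  # crossed 5% earlier: still a leech at 10%
```

Both cards end at p(normal) = 10% but carry different labels, which is exactly the inconsistency described above.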

ashen light Apr 8, 2025, 11:02 PM

#

people are gonna invent problems to complain about no matter which option is done

#

the bounce back and forth strat has an easier implementation

unique salmon Apr 8, 2025, 11:05 PM

#

Well, guess it's new data in card info + simple method then

#

Now the real question - who's gonna implement it?

ashen light Apr 8, 2025, 11:06 PM

#

I'm just annoyed I gotta do the math thing

#

is there an easy off the shelf equation I can grab from a standard stats library

unique salmon Apr 8, 2025, 11:06 PM

#

Oh come on, math is the easy part
#1282005522513530952 message

ashen light Apr 8, 2025, 11:06 PM

#

why you gotta do this fuckin tryhard poisson binomial thing literally nothing implements

#

ok port it to rust for me

#

protip: that ai version wasn't going to work

unique salmon Apr 8, 2025, 11:07 PM

#

ashen light protip: that ai version wasn't going to work

Why?

#

Like, "it bugs out and spits nonsense" doesn't work?

ashen light Apr 8, 2025, 11:07 PM

#

because it assumed rust vec's behave like numpy arrays

#

I didn't run it but at a glance it wasn't gonna do what you wanted

#

for example `pmf[j] = pmf[j] * (1.0 - prob) + pmf[j-1] * prob;` the way its implemented in `pmf[j] * (1.0 - prob)` `pmf[j]` will always be zero so the first half of that equation is always 0

#

and so....is not gonna do what you want

#

unless we want to do that calculation for fun for some reason

#

I looked over it enough to see an obvious problem then just didn't give it any more thought

unique salmon Apr 8, 2025, 11:09 PM

#

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024
Looks good to me

#

ashen light Apr 8, 2025, 11:10 PM

#

you didn't actually link anything

unique salmon Apr 8, 2025, 11:10 PM

#

Unless my Python implementation is also somehow bugged and I didn't realize it

#

Just copy-paste it

#

Works fine for 90%, 90%. This is indeed what you get if you do the math by hand
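
For reference, the hand calculation, assuming "90%, 90%" means two reviews that were each predicted at 0.9, with K the number of successful reviews:

```latex
\begin{aligned}
P(K=0) &= (1-0.9)(1-0.9) = 0.01 \\
P(K=1) &= 0.9\,(1-0.9) + (1-0.9)\,0.9 = 0.18 \\
P(K=2) &= 0.9 \cdot 0.9 = 0.81
\end{aligned}
```

In the in-place recurrence `pmf[j] = pmf[j] * (1.0 - prob) + pmf[j-1] * prob`, the `pmf[j] * (1.0 - prob)` term is zero only while `pmf[j]` has not been filled yet. After the first review `pmf[1]` is 0.9, so on the second review that term contributes 0.9 * 0.1 = 0.09 to P(K=1).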

ashen light Apr 8, 2025, 11:11 PM

#

hm

#

guess I'll just use it

#

¯_(ツ)_/¯

#

any problems I can just blame on you

unique salmon Apr 8, 2025, 11:12 PM

#

Apparently not

ashen light Apr 8, 2025, 11:12 PM

#

my point still stands though, why you gotta use some tryhard stats thing

unique salmon Apr 8, 2025, 11:13 PM

#

Because I don't see any other way

#

We can't just assume that FSRS always predicts the same probability of recall for obvious reasons

#

If FSRS always predicted the same probability of recall, we could use the good ol' binomial distribution
(except that in that case there would be no reason to use FSRS in the first place cause it would be fucking useless)

#

Poisson binomial is a generalization of the binomial distribution for when the probabilities of success aren't always the same, unlike in a coin toss, where every toss has the same probability

#

I mean, I guess we could try to come up with something COMPLETELY different that isn't based on fancy probability distributions, but nah

ashen light Apr 8, 2025, 11:16 PM

#

my point more is I just wanted to pull in a library that I could hand an array and have it do the math for me 🍃

unique salmon Apr 8, 2025, 11:17 PM

#

Think of it as Claude making a library for you

#

And now you're happy!

ashen light Apr 8, 2025, 11:22 PM

#

nah

unique salmon Apr 8, 2025, 11:22 PM

#

Save complaining for later, for the actually annoying parts, such as:

1) ~~Unleeching~~
2) Pop-ups for both leeching and unleeching
3) Recalculating leechiness every time FSRS parameters change

ashen light Apr 8, 2025, 11:23 PM

#

unleeching isn't even hard

#

3 is the only actually annoying thing here

#

(but it already does other stuff anyway, just gotta hijack that process)

tepid spoke Apr 8, 2025, 11:43 PM

#

I just realized why I have so few leeches lol

#

every time I sync the deck with WaniKani, it overwrites the tags

#

Not like it matters. Nothing I can do with the info of it being a leech anyway.

cosmic hedge Apr 9, 2025, 12:25 AM

#

sick moth A search is pretty much instant, unless I'm forgetting something

https://github.com/ankitects/anki/pull/3910 you're right XD

GitHub

Feat/Ignored before card count by Luc-Mcgrady · Pull Request #3910...

Continuation of #3907

Peek.2025-04-09.01-07.mp4

Displays the number of cards that will be ignored by ignore cards reviewed before to hopefully abate some confus...

polar maple Apr 9, 2025, 12:42 AM

#

unique salmon Mmm, not exactly. More like "No amount of reviews will help FSRS accurately pred...

i disagree with this interpretation, we cannot distinguish if we were just unlucky or if FSRS was wrong

quasi shadow Apr 9, 2025, 2:48 AM

#

How about that?

quasi shadow Apr 9, 2025, 2:48 AM

#

polar maple @quasi shadow how about modifying FSRS so that the first rating determin...

#

It's the notebook optimizer.

#

It has a detailed evaluation which groups the reviews based on the last rating.

#

I guess the forgetting curve is sharper when the last rating is again.

#

#

Maybe it's better to save the raw data for further analyses.

polar maple Apr 9, 2025, 3:44 AM

#

#

#

@quasi shadow @unique salmon this version of RWKV uses a weighted sum of 128 exponential forgetting curves. Maybe we should make FSRS decay scale with S?

#

also the 1-day stability plot might be a bit inaccurate since RWKV uses elapsed seconds

quasi shadow Apr 9, 2025, 3:47 AM

#

Seems like the forgetting curve is flat when S is small and becomes sharper as the S increases?

polar maple Apr 9, 2025, 3:49 AM

#

perhaps even decay = 0.1 could be beneficial for small S

#

S = 1 -> 0.1
S = 30 -> 0.2
S > 100 -> 0.5
and maybe interpolate this in log space
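
A minimal sketch of the interpolation suggested above, assuming the three anchor points are joined by straight lines in log(S) and clamped outside the range. The function name and the clamping behavior are illustrative assumptions, and "S > 100" is treated here as "S >= 100".

```python
import math

# Anchor points from the message above: (stability in days, decay).
ANCHORS = [(1.0, 0.1), (30.0, 0.2), (100.0, 0.5)]


def decay_for_stability(s: float) -> float:
    """Piecewise-linear interpolation of decay in log(S), clamped at both ends.

    Hypothetical helper: the clamping below S = 1 and above S = 100 is an assumption.
    """
    if s <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if s >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (s_lo, d_lo), (s_hi, d_hi) in zip(ANCHORS, ANCHORS[1:]):
        if s_lo <= s <= s_hi:
            # Fraction of the way from s_lo to s_hi, measured in log space.
            frac = (math.log(s) - math.log(s_lo)) / (math.log(s_hi) - math.log(s_lo))
            return d_lo + frac * (d_hi - d_lo)
    raise AssertionError("unreachable: s lies inside the anchor range")
```

Halfway between S = 1 and S = 30 in log space (S = sqrt(30), about 5.5 days) this gives a decay halfway between 0.1 and 0.2.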

#

this could also be what we are seeing with first rating = 1 since it tends to result in lower stability after all

quasi shadow Apr 9, 2025, 3:56 AM

#

🤔 wait

#

my observation is the forgetting curve is sharper with first rating=1.

polar maple Apr 9, 2025, 3:58 AM

#

whoops i mixed it up

#

there might be some weird behavior when changing decay, let R(t, S, decay) be the function that gets the retention given a certain time, stability, and decay.
Then we would ideally want S1 > S2 => R(t, S1, decay1) > R(t, S2, decay2), but if decay1 != decay2 then this can be broken

quasi shadow Apr 9, 2025, 4:00 AM

#

Yeah, I know.

#

It means forgetting curves with different S will intersect at some T (T > 0).
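
A small sketch of the crossing described above. It assumes an FSRS-style power forgetting curve normalized so that R = 0.9 at t = S for any decay; the function name and the example values are illustrative, not taken from the thread.

```python
def retrievability(t: float, s: float, decay: float) -> float:
    """Power forgetting curve with R(t = S) = 0.9 for any positive decay.

    Assumed form: R = (1 + factor * t / S) ** (-decay),
    with factor chosen so that the curve passes through 0.9 at t = S.
    """
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return (1.0 + factor * t / s) ** (-decay)


# Card 1 has the higher stability but the steeper decay (illustrative values).
S1, DECAY1 = 10.0, 0.5
S2, DECAY2 = 5.0, 0.1

# Early on, the higher-stability card has the higher R ...
early_gap = retrievability(5.0, S1, DECAY1) - retrievability(5.0, S2, DECAY2)
# ... but far out the ordering flips, so the two curves must cross in between.
late_gap = retrievability(1000.0, S1, DECAY1) - retrievability(1000.0, S2, DECAY2)
```

With these values `early_gap` is positive and `late_gap` is negative, which is exactly the broken ordering S1 > S2 but R1 < R2.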

#

#

be like

quasi shadow Apr 9, 2025, 6:18 AM

#

It's the distribution of trainable decay.

cursive badge Apr 9, 2025, 6:30 AM

#

unique salmon Alright, with that out of the way, we need to decide on 2 things: 1) Tags/flags/...

1. I think if a leech revamp does happen it would ideally involve new prop(s). Tagging notes has always been a half-baked solution and we cannot use flags in native Anki because it could clash with user flags. EDIT: I guess another solution would be letting cards have tags, but that is another feature in itself.
2. I don't know. I haven't touched it since my prototype a few weeks ago but I don't think that we are ready for a "black box" with no knobs for the user to twiddle. I know it would be terrible UX but I never felt "I would be happy to just run this on any dataset" when I was playing with my prototype.

I would take some more convincing before trying to implement it natively in Anki but if @ashen light is interested enough to call off his strike I'm not going to be too negative and interfere 😂 .

lapis hearth Apr 9, 2025, 6:30 AM

#

unique salmon Alright, with that out of the way, we need to decide on 2 things: 1) Tags/flags/...

I am still sticking to my idea of trends

#

I think knowing whether the card is a leech or not is very helpful, but even more helpful is knowing whether what you are doing is helping you learn the card or not (whether you are on the right track)

quasi shadow Apr 9, 2025, 7:30 AM

#

#

OK, now I know how to improve the optimal retention feature.

#

we need to increase the cost per review when the desired retention decreases.

polar maple Apr 9, 2025, 7:35 AM

#

there seems to be a lot of 60k+ as well which could represent way more than 60 seconds in reality

#

time where the user either gives up for a while or has to purposefully spend re-encoding the card into memory

cosmic hedge Apr 9, 2025, 7:39 AM

#

quasi shadow

This is weird to me because I thought the problem with CMRR was it went to 0.7 too often. Increasing the costs with higher DRs to offset this would make it even more likely to result in 0.7 right?

quasi shadow Apr 9, 2025, 7:40 AM

#

quasi shadow we need to increase the cost per review when the desired retention decreases.

increase the cost per review when the desired retention decreases

#

it means the cost is larger when the DR is lower.

#

so it will increase the CMRR

cosmic hedge Apr 9, 2025, 7:42 AM

#

quasi shadow increase the cost per review when the desired retention decreases

Oh right 😅
I thought you were looking to counteract the effect in the simulator, but I forgot it doesn't exist in the simulator.

cosmic hedge Apr 9, 2025, 7:54 AM

#

quasi shadow increase the cost per review when the desired retention decreases

Would this effect be worth doing with retrievability instead of DR?

#

I know in the simulator they both end up being pretty much the same thing

quasi shadow Apr 9, 2025, 7:55 AM

#

cosmic hedge Would this effect be worth doing with retrievability instead of DR?

it's more accurate if there is a backlog.

quasi shadow Apr 9, 2025, 8:44 AM

#

optimal retention
before: 0.7143667819857166
after: 0.8377484029026208

#

#

😎

#

Just add this line

unique salmon Apr 9, 2025, 8:46 AM

#

cursive badge 1. I think if a leech revamp does happen it would ideally involve new prop(s). T...

I was happy with a double threshold + dividing thresholds by 1.4

quasi shadow Apr 9, 2025, 8:46 AM

#

with this, you will spend 20% more time per review when your desired retention is 70% instead of 90%.
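
The line that was actually added is not preserved in this log. As a hedged illustration only, here is one linear correction that reproduces the quoted figure (20% more time per review at 70% than at 90%). The function name, the `slope` parameter, and the 0.9 reference point are all assumptions.

```python
def review_cost(base_cost: float, desired_retention: float,
                slope: float = 1.0, reference: float = 0.9) -> float:
    """Scale a per-review cost up as desired retention drops below `reference`.

    Hypothetical form: cost * (1 + slope * (reference - DR)).
    With slope = 1.0 and reference = 0.9, DR = 0.7 gives a 1.2x multiplier.
    """
    return base_cost * (1.0 + slope * (reference - desired_retention))
```

For example, a 10-second review at DR = 90% would be costed at about 12 seconds at DR = 70% under this form.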

unique salmon Apr 9, 2025, 8:46 AM

#

quasi shadow

Can we estimate this from the user's history?

quasi shadow Apr 9, 2025, 8:47 AM

#

unique salmon Can we estimate this from the user's history?

yes, but it needs to calculate the R for the history

unique salmon Apr 9, 2025, 8:47 AM

#

I'll take that as a "yes"

quasi shadow Apr 9, 2025, 8:48 AM

#

😂 Nope

unique salmon Apr 9, 2025, 8:48 AM

#

FeelsBadAnki

bold terrace Apr 9, 2025, 9:12 AM

#

IMO with the leech detection stuff, in practice I see a few elements that make it not as useful as I'd hoped initially:

A lot of cards flagged by it are cards with a few "bad streaks". While that is indeed very low probability according to the FSRS model, in practice it's not that uncommon, especially for recently introduced cards.
Once the repetitions are higher, it starts to make more sense, but sometimes you still get cards with a moderate number of reviews being flagged because they had a very, very bad start.
For cards with a high number of repetitions, it doesn't really bring much more information than checking the number of lapses, since contrary to what I would have expected a few months ago, in my case at least, the more reps a card has, the less stability it has on average compared to lower-rep cards. So discriminating "harder cards" based on # reps or # lapses is still... very valid for FSRS

#

I don't know if someone has practical experience with it and sees different cases?

#

This is a typical example. It got flagged for a bad start, but will only be considered "unleeched" when the history count is big enough... while the bad streak was 1 year ago. Since easy cards don't grow in terms of reps that quickly, it might still be one of the most leechy cards of my deck even though it's quite an easy one (1 failed rep in 1 year, and the last fail was ~11 months ago)

#

When I tried the leechkit with my --last-review N, it felt better, but mostly because it would now flag only the ones with a recent bad streak. The number of results would of course be way lower, something like 2-5 cards out of 4000 active ones

unique salmon Apr 9, 2025, 9:19 AM

#

quasi shadow

So what's the plan?
If we aren't going to estimate it for every user, will you estimate one average value (or two, like a - b*R) and just hard-code those?

#

It would be better if it was estimated for each user individually

#

But I guess a - b*R with a and b estimated from the 10k dataset would still be ok on average

quasi shadow Apr 9, 2025, 9:26 AM

#

unique salmon So what's the plan? If we aren't going to estimate it for every user, will you e...

hard code

bold terrace Apr 9, 2025, 9:31 AM

#

quasi shadow optimal retention before: 0.7143667819857166 after: 0.8377484029026208

And now imagine if instead of sum(R) we had a sum(R*f(S)) with f a function that would converge to 1 when S is big enough (360d) 😄

#Team90DR

#

Comparison of SUM(R) and SUM(R*f(S)) when new cards/day changes from 8 to 40 to 8 again.
You can see that for SUM(R), the more you add, the better.

For SUM(R*f(S))

The more "active cards" you have, the better, since S can grow for all of those
New cards/cards that stay at low S are discounted (I mean, should a .9 R on a 1h stability be worth the same Memorized Value as a 365d stability one?)
Since R is included in [DR,100] if you're a good boy, for people with high DR, R is a proxy to measure SUM(active cards) 🤷‍♂️

#

For f, sqrt, ln, or something fancier like 1 - Math.exp(-((8 / 365) * s)) (early rise, converges to 1, and at 365d is already close to 1) doesn't really change the trend much, considering S is already good enough
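
A minimal sketch of the two totals being compared, using the exponential f quoted above. The function names and the `(R, S)` tuple layout are illustrative assumptions.

```python
import math


def stability_weight(s_days: float) -> float:
    """f(S) = 1 - exp(-(8 / 365) * S): rises early and is close to 1 by 365 days."""
    return 1.0 - math.exp(-(8.0 / 365.0) * s_days)


def memorized_plain(cards) -> float:
    """SUM(R) over an iterable of (R, S) pairs."""
    return sum(r for r, _ in cards)


def memorized_weighted(cards) -> float:
    """SUM(R * f(S)) over an iterable of (R, S) pairs."""
    return sum(r * stability_weight(s) for r, s in cards)


# Two cards at the same R = 0.9: one with 1 hour of stability, one with 365 days.
example_cards = [(0.9, 1.0 / 24.0), (0.9, 365.0)]
plain_total = memorized_plain(example_cards)
weighted_total = memorized_weighted(example_cards)
```

Under SUM(R) both cards count equally, while under SUM(R*f(S)) the 1-hour card contributes almost nothing.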

cosmic hedge Apr 9, 2025, 9:40 AM

#

bold terrace And now imagine if instead of sum(R) we had a sum(R*f(S)) with f a function that...

CMRR is sum(R) / cost so the stability of the cards will already be factored into the cost right?

unique salmon Apr 9, 2025, 10:10 AM

#

cosmic hedge CMRR is `sum(R) / cost` so the stability of the cards will already be factored i...

Cost is time

#

So no

cosmic hedge Apr 9, 2025, 10:15 AM

#

unique salmon Cost is time

time spent on cards, cards which are scheduled according to stability?

unique salmon Apr 9, 2025, 10:23 AM

#

cosmic hedge time spent on cards, cards which are scheduled according to stability?

Yeah, but (I think) Sound's point is that CMRR doesn't take into account how quickly R decays

#

Aka how well you know the card

unique salmon Apr 9, 2025, 10:23 AM

#

quasi shadow Seems like the forgetting curve is flat when S is small and becomes sharper as t...

I can test making the decay depend on log(S)

#

Well, later, once I'm done with neural D

unique salmon Apr 9, 2025, 10:31 AM

#

lapis hearth I think knowing whether the card is a leech or not is very helpful, but even more...

Just from eyeballing it, if I pretend that the red line doesn't exist, I can't even tell if there is any correlation at all 🤣

#

Try it, everyone and anyone
Try telling whether probability of recall is positively or negatively correlated with answer time based on this graph

#

#

But in case @quasi shadow still wants to do it, I recommend estimating a - b*R for all 4 grades, 8 parameters in total
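
A hedged sketch of what estimating a - b*R separately for each grade could look like, using ordinary least squares in plain Python. The function names and the `(grade, R, seconds)` record layout are assumptions; a real implementation would also have to handle learning vs. review states and outlier answer times.

```python
from collections import defaultdict


def fit_cost_line(points):
    """Least-squares fit of seconds = a - b * R for a list of (R, seconds) pairs."""
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in points)
    if var_r == 0:
        raise ValueError("need at least two distinct R values to fit a line")
    cov = sum((r - mean_r) * (t - mean_t) for r, t in points)
    slope = cov / var_r
    a = mean_t - slope * mean_r
    return a, -slope  # b is the negated slope, so that seconds = a - b * R


def fit_per_grade(reviews):
    """Group (grade, R, seconds) records by grade and fit (a, b) for each grade."""
    by_grade = defaultdict(list)
    for grade, r, seconds in reviews:
        by_grade[grade].append((r, seconds))
    return {grade: fit_cost_line(points) for grade, points in by_grade.items()}
```

With four grades this yields the 8 parameters mentioned above, one (a, b) pair per grade.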

unique salmon Apr 9, 2025, 10:41 AM

#

quasi shadow

If these graphs are to be believed, a and b can be different for different grades

quasi shadow Apr 9, 2025, 10:43 AM

#

unique salmon Just from eyeballing it, if I pretend that the red line doesn't exist, I can't e...

#

It's more clear if I show you the box graph.

unique salmon Apr 9, 2025, 10:44 AM

#

unique salmon But in case @quasi shadow still wants to do it, I recommend estimating a...

Actually, wait
We calculate costs separately for learning and reviewing, so 16 new parameters 🤣

quasi shadow Apr 9, 2025, 10:44 AM

#

Just hard code it

#

let's all

unique salmon Apr 9, 2025, 10:44 AM

#

I really think we need a benchmark where the goal is to accurately predict costs

#

Rather than R

#

We could just take FSRS-5 and use all this stuff we use for CMRR and the simulator, and run it on the 10k dataset, and compare predicted costs to real answer times

#

Then we can finally have a way to tell if we're making the CMRR/simulator better or worse with our changes

#

#

Please make a repo for benchmarking the accuracy of predicting costs

#

You already have FSRS parameters per user, so it's not like you will need to estimate them again

#

Just copy them from the 10k repo

#

And in the new repo you will run FSRS with parameters for each user and with cost estimations for each review exactly as in CMRR/simulator

#

No optimization

#

And at the end we will get average(|predicted cost - actual cost|) and sqrt(average((predicted cost - actual cost)^2))

#

MAE and RMSE
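
The two metrics named above, written out as a short sketch. Both functions take predicted and actual answer times as equal-length sequences; the function names are illustrative.

```python
import math


def mae(predicted, actual) -> float:
    """Mean absolute error: average(|predicted cost - actual cost|)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual) -> float:
    """Root mean squared error: sqrt(average((predicted cost - actual cost)^2))."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

RMSE penalizes large misses more heavily than MAE, which matters here because answer times have a long tail.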

#

And then we can enter the new era of tweaking our cost prediction stuff 🤣

#

Just think about ALL THE TWEAKS
I'M TWEAKING SO HARD

quasi shadow Apr 9, 2025, 10:52 AM

#

My solution

cosmic hedge Apr 9, 2025, 11:09 AM

#

unique salmon Yeah, but (I think) Sound's point is that CMRR doesn't take into account how qui...

isn't "how quickly R decays" "stability"?

sick moth Apr 9, 2025, 11:19 AM

#

cosmic hedge https://github.com/ankitects/anki/pull/3910 you're right XD

Fantastic stuff!

lapis hearth Apr 9, 2025, 11:32 AM

#

unique salmon Try it, everyone and anyone Try telling whether probability of recall is positiv...

but this is not p(leech) against time is it

#

or would it have a similar shape

#

.

#

Whatever @cursive badge did here

unique salmon Apr 9, 2025, 11:38 AM

#

cosmic hedge isn't "how quickly R decays" "stability"?

Yes

unique salmon Apr 9, 2025, 11:39 AM

#

quasi shadow My solution

I'm serious. At this level of complexity we NEED a proper benchmark
Please do it. Just reuse parameters for each user that you already have in the srs-benchmark repo, run FSRS on each user, make it predict answer time using the same formulas as in CMRR/simulator
And then calculate the mean absolute error and RMSE of predicted answer times and real answer times

#

We are past the point where we can just say "Oh, but this change obviously improves how accurately costs are calculated", we need proper tools to assess further changes

unique salmon Apr 9, 2025, 11:40 AM

#

lapis hearth but this is not p(leech) against time is it

This is just probability of recall against answer time

bold terrace Apr 9, 2025, 11:42 AM

#

cosmic hedge isn't "how quickly R decays" "stability"?

Hmmm, the thing is that the workload depends more on the interval than on the stability. What I mean is that the interval can be reduced by reducing the DR. But the intrinsic quality of your memory is stability, not interval

#

Stability is agnostic of DR
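
To make the interval vs. stability distinction concrete, here is a sketch that inverts an FSRS-style power forgetting curve to get the interval for a given desired retention. It assumes the curve is normalized so that R = 0.9 at t = S, uses decay = 0.5 as the default, and ignores fuzz and rounding; the function name is illustrative.

```python
def interval_for_retention(s: float, desired_retention: float, decay: float = 0.5) -> float:
    """Interval (days) at which R falls to `desired_retention` for stability `s`.

    Assumed curve: R = (1 + factor * t / S) ** (-decay), with R(t = S) = 0.9.
    Solving for t gives t = S / factor * (DR ** (-1 / decay) - 1).
    """
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return s / factor * (desired_retention ** (-1.0 / decay) - 1.0)


# Same stability, different DR: only the interval changes.
interval_at_90 = interval_for_retention(10.0, 0.9)
interval_at_70 = interval_for_retention(10.0, 0.7)
```

At DR = 90% the interval equals S; lowering DR stretches the interval while S itself is unchanged.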

quasi shadow Apr 9, 2025, 11:43 AM

#

unique salmon I'm serious. At this level of complexity we NEED a proper benchmark Please do it...

If I have a full-time job in Ankitects, I will consider it.

unique salmon Apr 9, 2025, 11:43 AM

#

quasi shadow If I have a full-time job in Ankitects, I will consider it.

FeelsBadAnki
Fine. But then don't implement this change yet

#

This is the kind of change that isn't obviously an improvement, and needs benchmarking

quasi shadow Apr 9, 2025, 11:45 AM

#

unique salmon This is the kind of change that isn't obviously an improvement, and needs benchm...

It obviously makes the optimal retention with FSRS-6 similar to FSRS-5.

#

Without it, CMRR will give you an extremely low retention.

unique salmon Apr 9, 2025, 11:48 AM

#

quasi shadow It obviously makes the optimal retention with FSRS-6 similar to FSRS-5.

Man, I don't want to argue. I'll just be brief: from this point on any extra complexity added to the simulations, specifically to the part related to estimating answer times, has to be justified via benchmarking as described here (#1282005522513530952 message), otherwise I will not be happy

I mean, you are obviously free to disregard my opinion, but I genuinely hope you will understand that past a certain point of complexity you need proper tools and not just "It works. Source: it was revealed to me in a dream"

bold terrace Apr 9, 2025, 11:48 AM

#

@cosmic hedge: For example, let's say your workload right now is 100 reviews/day for DR=90% with a total deck size of 1000. Your score would be 900/100 = 9.
If you now set DR=70%, and let's say that divides the workload by 2, you now get a score of 700/50 = 14.

So basically, the optimizer will just make the most gain by making you drop the workload as much as possible -> dropping the DR

If you include S in the numerator, you now have something like f(S)/workload(I) that compensates for that, and it also pushes the goal function to not sacrifice S just for the sake of reducing the workload a bit
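
The arithmetic in the example above, as a short sketch. It follows the example's simplification that average R equals the DR; the function name is illustrative.

```python
def knowledge_per_review(deck_size: int, average_r: float, reviews_per_day: float) -> float:
    """Score used in the example: SUM(R) / workload, with SUM(R) = deck_size * average R."""
    return deck_size * average_r / reviews_per_day


score_at_90 = knowledge_per_review(1000, 0.9, 100)  # 900 / 100
score_at_70 = knowledge_per_review(1000, 0.7, 50)   # 700 / 50
```

Because the workload halves while SUM(R) only drops from 900 to 700, the lower DR scores higher under this objective.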

#

Right now when you look at the graph that CMRR is trying to optimize, it's not even a U curve, it's almost a purely increasing curve... so basically yeah, you always get the minimum threshold of 70%, it's worthless

quasi shadow Apr 9, 2025, 11:49 AM

#

unique salmon Man, I don't want to argue. I'll just be brief: from this point on any extra com...

I don’t know why you didn’t say it when I and @cosmic hedge implemented the learning steps in the simulator.

unique salmon Apr 9, 2025, 11:49 AM

#

bold terrace Right now when you look at the graph that CMRR is trying to optimize, it's not e...

If you mean the one in the manual, it's workload, not workload/total knowledge, so the shape is different

quasi shadow Apr 9, 2025, 11:49 AM

#

It’s more complex.

unique salmon Apr 9, 2025, 11:50 AM

#

quasi shadow I don’t know why you didn’t say it when I and @cosmic hedge implemented ...

You mean the Markov chain thingy, with 12 costs? That one is "obvious enough" IMO 🤣
The chances of that one somehow making estimations of answer times less accurate are very low

#

Though, on second thought, I want to see that one benchmarked as well

#

We can benchmark

1) Current implementation
2) Current implementation + Markov chain for learning steps
3) Current implementation + Markov chain for learning steps + your R correction

bold terrace Apr 9, 2025, 11:52 AM

#

unique salmon If you mean the one in the manual, it's workload, not workload/total knowledge, ...

Ah yeah my bad https://github.com/open-spaced-repetition/fsrs4anki/blob/main/fsrs4anki_optimizer.ipynb, I checked this one.

But yeah, it feels to me that CMRR is always "return 0.70" right now. And I think it's because the goal function (being only SUM(R) or SUM(R)/workload) is not taxing errors enough. Again, a question of setting the right tariff

unique salmon Apr 9, 2025, 11:53 AM

#

quasi shadow I don’t know why you didn’t say it when I and @cosmic hedge implemented ...

Slightly unrelated, but you apply the same smoothing to those 12 costs, right? So that the final value is a weighted average of the default cost and user-specific cost, weighted by n reviews
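
A sketch of the kind of smoothing being asked about: a weighted average of the default cost and the user-specific cost, where the weight on the user's own value grows with the number of reviews. The weight formula n / (n + prior_weight) and the value of `prior_weight` are assumptions for illustration, not the formula used in fsrs-rs.

```python
def smoothed_cost(user_median: float, default_cost: float,
                  n_reviews: int, prior_weight: float = 100.0) -> float:
    """Blend a user-specific cost with a default, trusting the user more as n grows.

    Hypothetical weighting: w = n / (n + prior_weight).
    n = 0 returns the default; very large n approaches the user's own median.
    """
    w = n_reviews / (n_reviews + prior_weight)
    return w * user_median + (1.0 - w) * default_cost
```

With few reviews in a given state the estimate stays near the default, which avoids trusting a median computed from a handful of samples.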

quasi shadow Apr 9, 2025, 11:53 AM

#

unique salmon We can benchmark 1) Current implementation 2) Current implementation + Markov ch...

It’s tedious for me to benchmark it when there are only a few people who are concerned with it.

quasi shadow Apr 9, 2025, 11:53 AM

#

unique salmon Slightly unrelated, but you apply the same smoothing to those 12 costs, right? S...

Just check the code.

unique salmon Apr 9, 2025, 11:54 AM

#

quasi shadow It’s tedious for me to benchmark it when there are only a few people who are con...

I'm not asking you to benchmark it because I am concerned about it, though I certainly am. I'm asking you to benchmark it for the sake of making a better algorithm and making things better for anyone who will be using CMRR/simulator

#

If your correction makes the predicted answer times less accurate, it will affect everyone who uses CMRR/simulator

unique salmon Apr 9, 2025, 11:57 AM

#

quasi shadow Just check the code.

Idk where to look

#

Clearly not the python version, since that's not what is used in Anki

#

And the Rust version doesn't seem to have the new learning step simulation

#

https://github.com/orgs/open-spaced-repetition/discussions/36
I mentioned it here

GitHub

My ideas/recommendations to Jarrett, gathered in one place · open-...

The purpose of this discussion is for me to link to issues/PRs that are related to FSRS and where I have something to say and don't want Jarrett to forget about it/miss my comment. Leech detect...

quasi shadow Apr 9, 2025, 12:10 PM

#

unique salmon Idk where to look

https://github.com/open-spaced-repetition/fsrs-rs/pull/313

GitHub

Feat/FSRS-6 by L-M-Sherlock · Pull Request #313 · open-spaced-rep...

I'm trying to keep sync with:

Feat/FSRS-6 fsrs-optimizer#169
Feat/Short term iterates through learning steps. fsrs-optimizer#170

The major changes:

add an extra parameter to the formula ...

unique salmon Apr 9, 2025, 12:12 PM

#

quasi shadow https://github.com/open-spaced-repetition/fsrs-rs/pull/313

I don't think smoothing is applied to new costs?

#

Or to learning_step_transitions and relearning_step_transitions

#

All of them need to be smoothed

#

Btw, why are there so many 0.25?

lapis hearth Apr 9, 2025, 1:48 PM

#

unique salmon This is just probability of recall against answer time

Yes, but you were talking about how we could use p(leech) to determine whether a card is a leech or not.

So what I am suggesting is to plot p(leech) against time and see if there are trends to be seen.

unique salmon Apr 9, 2025, 1:57 PM

#

Oh wait, I replied to the wrong comment

#

That was meant to be a reply to Jarrett

#

To this comment

lapis hearth Apr 9, 2025, 2:30 PM

#

unique salmon Oh wait, I replied to the wrong comment

https://tenor.com/view/walter-white-breaking-bad-walter-white-screaming-breaking-bad-walter-walter-gif-27285909

Tenor

unique salmon Apr 9, 2025, 3:18 PM

#

Man, I want a benchmark for predicting answer time FeelsBadAnki

#

https://tenor.com/view/pepe-gif-8493382194019209370

Tenor

cursive badge Apr 9, 2025, 3:20 PM

#

I wish Anki recorded time-to-answer as well as total study time. I feel the time spent looking at the back kind of poisons the data for other uses.

unique salmon Apr 9, 2025, 3:21 PM

#

quasi shadow If I have a full-time job in Ankitects, I will consider it.

How much Dae would have to pay you? 🤣

bold terrace Apr 9, 2025, 3:33 PM

#

Also look how sum(R*f(S)) is a better representation of the "gainz" you made, whether they are due to more new cards/day or just to reviewing them more

#

On the other hand, sum(R) makes it feel like reviewing a lot of cards per day was less useful than just introducing a shit ton of cards I was able to recall 2d later

unique salmon Apr 9, 2025, 3:45 PM

#

quasi shadow My solution

If you want to calculate answer time as a function of R, sure. But not like this. Here - assuming I understand your code correctly - you just apply a correction to the already existing average (or median, whatever, that's not the point) time. That's not the same as answer time = a - b*R, that's "I took answer time that is not related to R at all and added some sort of correction to it"

#

I have no objections to answer time = a - b*R if and only if a and b are estimated for each user and for each grade separately. Otherwise we will lose accuracy instead of gaining it. Right now the median answer times are estimated for each user individually. If we use answer time = a - bR with fixed a and b, I'm 100% sure it will be worse than our current approach, since this function will be the same for each user instead of being based on user-specific data
As for the approach in your screenshot, where you just add a correction based on R to answer time that is not related to R - no, absolutely not, please don't

#

Man, I'm telling you, past the current level of complexity WE REALLY NEED A BENCHMARK

sick moth Apr 9, 2025, 3:56 PM

#

unique salmon Man, I'm telling you, past the current level of complexity WE REALLY NEED A BENC...

Pull requests welcome?

unique salmon Apr 9, 2025, 3:57 PM

#

sick moth Pull requests welcome?

sigh
I could maaaaaybe try to do it myself and then make a repo, but god it would be a nightmare

sick moth Apr 9, 2025, 3:57 PM

#

You should, you're asking for a nightmare

First job is to get a shovel and start digging 🙂

unique salmon Apr 9, 2025, 3:58 PM

#

Read the user review data from the .parquet file, get FSRS params corresponding to that user, use the code that estimates answer times to estimate answer times, record the difference between "predicted" and real answer time after every review, average it to get the average error
Repeat for ten thousand users
Average the average errors to get the average average error

#

Oh joy...

#

I'll have to stitch together the simulator code and the code that reads data from .parquet files

#

😭😭😭

#

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

#

Actually, no, more like I'll have to repurpose the already existing benchmarking code, but load the FSRS parameters instead of calculating them
And somehow make it output answer times

#

The more I think about how I'm going to do it, the more I'm making noises of a dying seal

#

It's not even that bad if you are Jarrett since he's the only one who actually understands the monstrosity that is the benchmarking code

#

But to everyone else the benchmarking code is barely comprehensible

quasi shadow Apr 9, 2025, 4:07 PM

#

unique salmon I don't think smoothing is applied to new costs?

It has:

#

unique salmon Apr 9, 2025, 4:08 PM

#

quasi shadow It has:

Uhhhh...ok, I'm just going to trust you 😅

quasi shadow Apr 9, 2025, 4:08 PM

#

unique salmon It's not even that bad if you are Jarrett since he's the only one who actually u...

I think Alex also understands it.

unique salmon Apr 9, 2025, 4:08 PM

#

unique salmon Apr 9, 2025, 4:12 PM

#

quasi shadow I think Alex also understands it.

@polar maple want to make a benchmark of answer time predictions?

1) For each of the 10k users, get their FSRS params from the srs-benchmark repo
2) For every review predict answer time, which currently is just a weighted average of user-specific median answer time and default answer time. It's currently "static" - we just estimate a bunch of numbers from the user's review history, no FSRS needed. So even the word "predict" isn't really correct here. But we could estimate answer time as a function of R or something, then we would need to actually run FSRS
3) Calculate the difference between real answer time and predicted answer time
4) Calculate the final error across all users and all of their reviews
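
A rough sketch of the "static" version of the steps above, working on in-memory data rather than the .parquet files. It predicts each review's time as the user's median for that grade, with no blending with default costs and no FSRS, and it scores predictions on the same reviews the medians were estimated from. All names and the data layout are assumptions.

```python
from statistics import median


def static_prediction_error(reviews) -> float:
    """Per-user MAE when each review's time is predicted by the median for its grade.

    `reviews` is a non-empty list of (grade, seconds) pairs for one user.
    """
    by_grade = {}
    for grade, seconds in reviews:
        by_grade.setdefault(grade, []).append(seconds)
    medians = {grade: median(times) for grade, times in by_grade.items()}
    return sum(abs(medians[grade] - seconds) for grade, seconds in reviews) / len(reviews)


def benchmark(users) -> float:
    """Average the per-user errors over all users with at least one review.

    `users` maps a user id to that user's list of (grade, seconds) pairs.
    """
    errors = [static_prediction_error(reviews) for reviews in users.values() if reviews]
    return sum(errors) / len(errors)
```

Swapping `static_prediction_error` for a predictor that depends on R would be the point where FSRS parameters and per-review retrievability have to be brought in.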

#

(I assume the answer is "no")

#

Neither do I 😅

cursive badge Apr 9, 2025, 4:14 PM

#

I admit I looked at the benchmark code once and then decided it would be easier to just do my own thing because I found it hard to follow which columns were where in the code.

unique salmon Apr 9, 2025, 4:15 PM

#

Josh (joshuahamilton on Discord) also said it's hard to understand

unique salmon Apr 9, 2025, 4:15 PM

#

unique salmon @polar maple want to make a benchmark of answer time predictions? 1) Fo...

Honestly, I think I could do this if it's "static". But answer time as a function of R...no

cursive badge Apr 9, 2025, 4:19 PM

#

To be fair to Jarrett I think it's inevitable that this kind of thing would become a bit confusing. He's just doing it as his own side project and is under no obligation to spend extra time trying to make it easier for others to digest.

unique salmon Apr 9, 2025, 4:21 PM

#

unique salmon Btw, why are there so many 0.25?

@quasi shadow

quasi shadow Apr 9, 2025, 4:21 PM

#

unique salmon Btw, why are there so many 0.25?

because the user doesn't use Hard

#

so we don't know the probs of the next rating after hard
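
One way the 0.25 values could arise, sketched under an assumption: if a rating never occurs in the user's history, its row of the transition matrix has no data, and a uniform fallback over the four ratings gives 0.25 everywhere. The function name and the fallback rule are illustrative and not taken from the fsrs-rs code.

```python
def transition_probs(counts):
    """Turn rating-to-rating transition counts into row-wise probabilities.

    counts[i][j] = how often rating j followed rating i (Again, Hard, Good, Easy).
    Rows with no observations fall back to a uniform distribution (assumed rule).
    """
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            probs.append([1.0 / len(row)] * len(row))
        else:
            probs.append([count / total for count in row])
    return probs
```

For a user who never presses Hard, the Hard row has zero counts and comes out as [0.25, 0.25, 0.25, 0.25].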

unique salmon Apr 9, 2025, 4:23 PM

#

The thing is, there isn't really any code that I can just steal and repurpose with minimal effort
The benchmarking code - I will break it if I try to disable optimization. Plus, I have no idea how to add the estimation of median review time to it in a way that is compatible with the rest of the code
The simulator code - I won't be able to read .parquet data and pass it into the simulator instead of randomly generated data

#

Plus a whole lot of little details that only Jarrett knows how to get right

quasi shadow Apr 9, 2025, 4:27 PM

#

unique salmon Apr 9, 2025, 4:27 PM

#

And considering that historically I have never, NOT EVEN ONCE managed to run the benchmarking code on the first try and always had to consult Jarrett, EVERY SINGLE TIME HE CHANGED ANYTHING about the benchmarking, I feel like it would be easier for me to get a job and pay Jarrett

cursive badge Apr 9, 2025, 4:27 PM

#

unique salmon The thing is, there isn't really any code that I can just steal and repurpose wi...

Reading parquet was pretty easy when I tried. I've never tried touching the simulator code so I cannot comment on that.

quasi shadow Apr 9, 2025, 4:27 PM

#

If you spent 3~4 hours per day over a year, you would understand it.

unique salmon Apr 9, 2025, 4:28 PM

#

cursive badge Reading parquet was pretty easy when I tried. I've never tried touching the simu...

Reading is easy, passing it into the simulator code (or whatever relevant parts of it, minus random number generation) is hard

#

God, this is so over

quasi shadow Apr 9, 2025, 4:29 PM

#

😂 That's what I did to contribute to Anki

unique salmon Apr 9, 2025, 4:30 PM

#

Unless Jarrett decides that he nobly wishes to never make any changes to calculating answer times without benchmarking them first, for the sake of Anki users

polar maple Apr 9, 2025, 4:31 PM

#

unique salmon @polar maple want to make a benchmark of answer time predictions? 1) Fo...

no

unique salmon Apr 9, 2025, 4:31 PM

#

figures

unique salmon Apr 9, 2025, 4:32 PM

#

quasi shadow My solution

At least promise me to not do this

polar maple Apr 9, 2025, 4:32 PM

#

polar maple S = 1 -> 0.1 S = 30 -> 0.2 S > 100 -> 0.5 and maybe interpolate this in log spac...

i tried scaling decay with this scaling, it does not work well

quasi shadow Apr 9, 2025, 4:33 PM

#

unique salmon At least promise me to not do this

I do it because I introduce the flat forgetting curve.

cursive badge Apr 9, 2025, 4:34 PM

#

Expertium really needs AGI to happen so they can command an army of AI agents to go off and program all their ideas 😂

quasi shadow Apr 9, 2025, 4:34 PM

#

It's better than doing nothing.

unique salmon Apr 9, 2025, 4:34 PM

#

cursive badge Expertium really needs AGI to happen so they can command an army of AI agents to go ...

This but unironically

unique salmon Apr 9, 2025, 4:34 PM

#

quasi shadow It's better than doing nothing.

No, that's the point - you don't know that it's better

hasty fractal Apr 9, 2025, 4:34 PM

#

"Create an Anki better than the one Expertium created". Checkmate.

clever cargo Apr 9, 2025, 4:34 PM

#

cursive badge Expertium really needs AGI to happen so they can command an army of AI agents to go ...

at that point we'd just ask the agi to emulate a perfect version of anki

quasi shadow Apr 9, 2025, 4:34 PM

#

I know it's bad without it.

#

If you really run the unit test, you will know how bad.

unique salmon Apr 9, 2025, 4:35 PM

#

Your formula works only if cost is defined as "answer time at R=90%", but it's not

quasi shadow Apr 9, 2025, 4:36 PM

#

unique salmon Your formula works only if cost is defined as "answer time at R=90%", but it's n...

OK, I will modify the code to enable it only for CMRR.

unique salmon Apr 9, 2025, 4:36 PM

#

So the simulator will do it differently compared to CMRR? Please no...

#

I'd rather ditch CMRR entirely

quasi shadow Apr 9, 2025, 4:37 PM

#

Fine. Remove CMRR.

#

Forget it.

cursive badge Apr 9, 2025, 4:37 PM

#

clever cargo at that point we'd just ask the agi to emulate a perfect version of anki

Maybe the AI superintelligence will humour us and at least tell us our ideas are good. 😂

hasty fractal Apr 9, 2025, 4:38 PM

#

remove CMRR, improve simulator 👍

quasi shadow Apr 9, 2025, 4:39 PM

#

It's the worst feature I made.

unique salmon Apr 9, 2025, 4:40 PM

#

quasi shadow It's the worst feature I made.

CMRR? I think it's fine, at least right now

#

Maybe not if it always outputs 70% 🤣

clever cargo Apr 9, 2025, 4:40 PM

#

quasi shadow It's the worst feature I made.

fwiw the code was very cool, learnt a lot!

quasi shadow Apr 9, 2025, 4:40 PM

#

unique salmon CMRR? I think it's fine, at least right now

Nope. It has so many problems.

#

For example, the loss aversion.

#

It's introduced to increase the output.

unique salmon Apr 9, 2025, 4:41 PM

#

quasi shadow Nope. It has so many problems.

Can't we just run the simulations, with real deck sizes and real cards states and all that, and get CMRR 2.0?

quasi shadow Apr 9, 2025, 4:41 PM

#

Actually, the current simulator is incorrect.

unique salmon Apr 9, 2025, 4:41 PM

#

Instead of the current "spherical in vacuum" implementation

quasi shadow Apr 9, 2025, 4:41 PM

#

because of loss aversion

#

unique salmon Apr 9, 2025, 4:42 PM

#

quasi shadow because of loss aversion

I thought it's disabled for simulations?

quasi shadow Apr 9, 2025, 4:42 PM

#

unique salmon I thought it's disabled for simulations?

It's always 2.5.

unique salmon Apr 9, 2025, 4:42 PM

#

Like, I thought it's used only for CMRR

quasi shadow Apr 9, 2025, 4:42 PM

#

🤣

unique salmon Apr 9, 2025, 4:42 PM

#

God damn it man, please disable it

quasi shadow Apr 9, 2025, 4:42 PM

#

Nobody complained

unique salmon Apr 9, 2025, 4:42 PM

#

People want accurate workloads

quasi shadow Apr 9, 2025, 4:42 PM

#

You're the first one.

unique salmon Apr 9, 2025, 4:42 PM

#

For CMRR it's ok because people don't see the workload graph

#

"Time"

#

Because we show this graph, it'd better be accurate

#

CMRR doesn't show anything related to how much time is spent on reviews, so it's fine to cheat a little bit, users won't be able to see it

quasi shadow Apr 9, 2025, 4:45 PM

#

Forget CMRR

unique salmon Apr 9, 2025, 4:45 PM

#

Alright

#

I hope you will remove loss aversion from the simulator

quasi shadow Apr 9, 2025, 4:46 PM

#

I will

#

after merging the FSRS-6 PR

#

There are several benchmarks I need to complete

#

so maybe the next week

unique salmon Apr 9, 2025, 4:46 PM

#

Maybe make CMRR 2 with accurate deck sizes and card states? ankieyes

#

Aka just run the simulator with all of its configs

#

Easy Days, sort order, blah blah

quasi shadow Apr 9, 2025, 4:47 PM

#

CMRR is designed for average users

#

but it's too hard

#

so I give up

unique salmon Apr 9, 2025, 4:48 PM

#

Literally just

for R in range(70, 100):
    workload, knowledge = simulator(R, all_other_shit)

#

And there you go, CMRR 2.0!
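The loop above can be fleshed out as a sweep over candidate desired-retention values that keeps the one with the lowest workload per unit of knowledge. This is only a minimal sketch: `toy_simulate` is a made-up stand-in for the real simulator (not Anki's API), its formulas are invented purely so the example runs, and "workload / knowledge" is assumed to be the quantity to minimize.

```python
from typing import Callable, Iterable, Tuple


def sweep_desired_retention(
    simulate: Callable[[float], Tuple[float, float]],
    candidates: Iterable[float],
) -> Tuple[float, float]:
    """Return (best_dr, best_cost), where cost = workload / knowledge."""
    best_dr, best_cost = None, float("inf")
    for dr in candidates:
        workload, knowledge = simulate(dr)
        if knowledge <= 0:
            continue  # skip degenerate simulations
        cost = workload / knowledge
        if cost < best_cost:
            best_dr, best_cost = dr, cost
    if best_dr is None:
        raise ValueError("no candidate produced positive knowledge")
    return best_dr, best_cost


def toy_simulate(dr: float) -> Tuple[float, float]:
    """Hypothetical stand-in for the simulator: NOT real FSRS behaviour."""
    workload = 1.0 / (1.0 - dr) + 40.0 * (1.0 - dr)  # invented cost curve
    knowledge = dr                                   # invented knowledge
    return workload, knowledge


candidates = [r / 100 for r in range(70, 100)]  # 0.70 .. 0.99
best_dr, best_cost = sweep_desired_retention(toy_simulate, candidates)
print(f"best DR = {best_dr:.2f}, workload per knowledge = {best_cost:.2f}")
```

In a real version, `simulate` would wrap the simulator with the user's actual deck size, card states, limits, Easy Days and sort order, which is the whole point of the proposal.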

quasi shadow Apr 9, 2025, 4:49 PM

#

ask A bloke or someone else

#

😅 I need to focus on FSRS-6

unique salmon Apr 9, 2025, 4:50 PM

#

unique salmon Literally just ```python for R in range(70, 100): workload, knowledge = simu...

@cosmic hedge wanna replace the current CMRR that assumes a specific deck size, a specific number of new cards/day, no already learned cards, etc. with CMRR Turbo Plus Ultra?

#

Instead of having CMRR as a separate entity, just make it a part of the simulator

polar maple Apr 9, 2025, 4:54 PM

#

i guess that especially without loss_aversion, CMRR would output 0.7 in most cases?

unique salmon Apr 9, 2025, 4:54 PM

#

According to Jarrett, with FSRS-6 - yes

#

Maybe we need to listen to Sound after all and use sum(R*f(S)) instead of sum(R)

#

But then the choice of f(S) is completely arbitrary

polar maple Apr 9, 2025, 4:56 PM

#

i vote for one of these that i described here

#

has a better meaning than something like R*sqrt(S)

ashen light Apr 9, 2025, 4:59 PM

#

cursive badge 1. I think if a leech revamp does happen it would ideally involve new prop(s). T...

you can do it

#

I believe in you

cursive badge Apr 9, 2025, 5:04 PM

#

ashen light I believe in you

Not that kind of convincing 😂

ashen light Apr 9, 2025, 5:10 PM

#

oh, so you can't do it

cursive badge Apr 9, 2025, 5:10 PM

#

You're a tricksy one Jake ;p

#

I mean: I'm not sure it is fully baked yet, and don't want to put a lot of effort into something that might be thrown away if it doesn't work well for most users.

unique salmon Apr 9, 2025, 5:25 PM

#

polar maple has a better meaning than something like `R*sqrt(S)`

Yeah, I really like the integral idea
For anyone who wants to mess around with it and implement it in the advanced stats add-on:

The only issue is that the choice of the range of time (t1, t2) is arbitrary
@cosmic hedge @bold terrace

#

📎 Forgetting_curve_integral.py

bold terrace Apr 9, 2025, 5:37 PM

#

unique salmon Yeah, I really like the integral idea For anyone who wants to mess around with i...

Hmmm f(S) = ?

unique salmon Apr 9, 2025, 5:38 PM

#

bold terrace Hmmm f(S) = ?

There is no f(S). Instead, we use R averaged over some period of time

#

Which implicitly takes into account S, since with higher S average over [t1, t2] will be greater

#

I wouldn't use it for graphs, but we can use it for CMRR 2, if a bloke implements it

#

Instead of workload/knowledge(at the end of the simulation), it will be workload/knowledge(average over some time)

bold terrace Apr 9, 2025, 5:40 PM

#

Maybe I'm wrong, but won't it be roughly linearly proportional to S?

#

Since S is already kinda describing how R declines with time

unique salmon Apr 9, 2025, 5:41 PM

#

Not really

polar maple Apr 9, 2025, 5:42 PM

#

bold terrace Maybe I'm wrong, but won't it be roughly linearly proportional to S?

it's the average R over a certain time period so it will be divided out by the amount of time

unique salmon Apr 9, 2025, 5:42 PM

#

polar maple it's the average R over a certain time period so it will be divided out by the a...

Average R

bold terrace Apr 9, 2025, 5:43 PM

#

unique salmon Not really

For S=180 and S=360 you would have how much ?

unique salmon Apr 9, 2025, 5:43 PM

#

bold terrace Apr 9, 2025, 5:43 PM

#

I see

unique salmon Apr 9, 2025, 5:43 PM

#

#

Alright, I'm gonna be away from my PC for an hour or two, so feel free to use the file I provided

bold terrace Apr 9, 2025, 5:50 PM

#

To be fair I think it would reward a bit too much very low Stability compared to very high one (since, let's be honest, at S=1 or S=2 the knowledge is not acquired at all), but I don't mind testing what it would look like. I just don't know how performant it will be to run that loop from t1 to t2 for every revlog entry (but I guess a few dozen should not hurt)

polar maple Apr 9, 2025, 5:55 PM

#

bold terrace To be fair I think it would reward a bit too much very low Stability compared to...

in this case probably t1 = 1 is fixed and t2 will be something we experiment with

#

in Expertium's code rn it is t2 = 10 which is far too low imo

#

we don't need to iterate between t1 to t2 actually, with the integral the computation is quick

bold terrace Apr 9, 2025, 6:00 PM

#

polar maple we don't need to iterate between t1 to t2 actually, with the integral the comput...

Jonathans-Laptop:tmp jschoreels$ python3 fs.py 
R at t1=1: 0.900000
R at t2=360: 0.331270
Average R within the [t1, t2] range: 0.409252
Brute force calculation of average R within the [t1, t2] range: 0.409252
Brute force calculation agrees with integral calculation: False
Jonathans-Laptop:tmp jschoreels$ python3 fs.py 
R at t1=1: 0.999615
R at t2=360: 0.900000
Average R within the [t1, t2] range: 0.944604
Brute force calculation of average R within the [t1, t2] range: 0.944604
Brute force calculation agrees with integral calculation: True

#

I see

#

import numpy as np

def power_forgetting_curve(t, s, decay):
    # factor is chosen so that R = 0.9 when t == s
    factor = 0.9 ** (1 / decay) - 1
    return np.power((1 + factor * t / s), decay)

This function doesn't depend on anything else ? D ? FSRS parameters ?

polar maple Apr 9, 2025, 6:04 PM

#

nah it doesn't. if you wanted something that depends on everything then you could take the result of SSP-MMC as a score in itself, the "average cost to reach target stability given S,D,R, and FSRS params"

bold terrace Apr 9, 2025, 6:07 PM

#

S=1
Integral at 10 : 9.451472
Integral at 360 : 149.668518

S=360
Integral at 10 : 658.855147
Integral at 360 : 988.986838

#

I removed the avg

#

Maybe I shouldn't have lol

polar maple Apr 9, 2025, 6:16 PM

#

bold terrace ``` S=1 Integral at 10 : 9.451472 Integral at 360 : 149.668518 S=360 Integral a...

Integral at 10 : 658.855147 Does this mean t1=1 to t2=10? why is it a number larger than 10? if R = 1 for the whole duration theres no way it can sum up to such a large number

bold terrace Apr 9, 2025, 6:16 PM

#

t2=10

#

stabilities = [i for i in range(1,360)]
print(stabilities)
integrals = [integral_power_forgetting_curve(t2, s, decay) for s in stabilities]
print(integrals)
plt.plot(stabilities, integrals)
plt.show()

#

I just call this

#

def integral_power_forgetting_curve(t, s, decay):
    factor = 0.9 ** (1 / decay) - 1

    # Check that parameters are in valid ranges
    if not (0 > decay >= -1):
        raise ValueError("Decay must be in the range [-1, 0)")
    if t <= 0 or s <= 0:
        raise ValueError("t and s must be positive")

    # Special case for decay ≈ -1
    if abs(decay + 1) < 1e-10:  # Using a small threshold to check if decay ≈ -1
        return (s / factor) * np.log1p(factor * t / s)

    # General case for decay ≠ -1
    return (s / (factor * (decay + 1))) * ((1 + factor * t / s) ** (decay + 1))

#

But yeah S=1 integral at 9.4 seems off

#

t_array = [i for i in range(1,100)]
integrals = [integral_power_forgetting_curve(t, 1, decay) for t in t_array]
plt.plot(t_array, integrals)
plt.show()

Feels also a bit off for stability=1

unique salmon Apr 9, 2025, 6:35 PM

#

If you want to get something that can be interpreted as average R over time, you need this

def average_f_power_forgetting_curve(t1, t2, s, decay):
    if not t2 > t1:
        raise ValueError("t2 must be greater than t1")

    # Calculate F(t2) - F(t1) where F is the antiderivative
    integral = integral_power_forgetting_curve(t2, s, decay) - integral_power_forgetting_curve(t1, s, decay)

    # Divide it by the difference in time to get the average
    return integral / (t2 - t1)

The integrals themselves cannot be interpreted as average R, and their difference cannot be interpreted as average R, but rather, as area under the curve

#

You need the difference between integrals divided by the difference between times

#

If you want the area under the forgetting curve, remove division by (t2 - t1)
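A self-contained sketch of the point above: the antiderivative value on its own is not an area, because it is not zero at t = 0, so only the difference F(t2) - F(t1) means anything. The function names here are mine, this version deliberately allows t = 0, and decay = -0.2 is an assumption (it is the value that reproduces the numbers posted earlier in the thread).

```python
import math


def antiderivative(t, s, decay):
    """F(t) with F'(t) = R(t) = (1 + factor * t / s) ** decay, for -1 <= decay < 0."""
    factor = 0.9 ** (1 / decay) - 1  # makes R(s) == 0.9
    if abs(decay + 1) < 1e-10:
        # decay == -1: integrating 1 / (1 + factor * t / s) gives a logarithm
        return (s / factor) * math.log1p(factor * t / s)
    return (s / (factor * (decay + 1))) * (1 + factor * t / s) ** (decay + 1)


def area_under_curve(t1, t2, s, decay):
    """Definite integral of R(t) over [t1, t2]."""
    return antiderivative(t2, s, decay) - antiderivative(t1, s, decay)


def average_r(t1, t2, s, decay):
    """Mean retrievability over [t1, t2]; always between 0 and 1."""
    if not t2 > t1:
        raise ValueError("t2 must be greater than t1")
    return area_under_curve(t1, t2, s, decay) / (t2 - t1)


DECAY = -0.2  # assumption: this value reproduces the numbers posted above

print(f"F(10) alone, S=360:      {antiderivative(10, 360, DECAY):.6f}")
print(f"area over [0, 10]:       {area_under_curve(0, 10, 360, DECAY):.6f}")
print(f"average R over [1, 360]: {average_r(1, 360, 1, DECAY):.6f}")
```

This is why "Integral at 10" for S=360 came out far larger than 10: that figure is F(10) including the constant F(0), while the actual area over [0, 10] stays below 10 because R never exceeds 1.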

unique salmon Apr 9, 2025, 6:40 PM

#

bold terrace ``` t_array = [i for i in range(1,100)] integrals = [integral_power_forgetting_c...

This is meaningless, or at least I can't think of any useful interpretation

#

My idea is to use average R over the next year instead of average R at the end of the simulation for CMRR (again, if a bloke wants to make new CMRR)

bold terrace Apr 9, 2025, 6:45 PM

#

#

stabilities = [i for i in range(2,360)]
print(stabilities)
for t2 in range(10, 100, 10):
    integrals = [average_f_power_forgetting_curve(t1, t2, s, decay) for s in stabilities]   
    plt.plot(stabilities, integrals, label=f't2:{t2}')

unique salmon Apr 9, 2025, 6:47 PM

#

bold terrace

This is the case where you really should name your axes...axises...you know 😅

#

Anyway, with default FSRS-5 params I get MRR=84%, which is weird because in Anki I get 87%, but oh well, I ain't gonna look for discrepancies. Maybe default params changed, or maybe the default answer times changed
I'll see what I get if I use sum(avg_R(0, 365)) instead of sum(R). MRR should become higher, I think. Maybe. Actually, idk, we'll see

bold terrace Apr 9, 2025, 6:50 PM

#

#

I prefer mine 😄

polar maple Apr 9, 2025, 6:50 PM

#

bold terrace

add t2 = 365 and higher

bold terrace Apr 9, 2025, 6:51 PM

#

#

📎 fs.py

tepid spoke Apr 9, 2025, 6:55 PM

#

I thought about maybe trying out what normal optimized parameters but 95% retention would lead to.
Result: no, I'd rather not

bold terrace Apr 9, 2025, 7:00 PM

#

#

I prefer my 1-exp(...) 😛

#

Could also be more easily customizable : "What's your goal stability ?", it's just a factor to make it reach "1" sooner or later

unique salmon Apr 9, 2025, 7:19 PM

#

unique salmon Anyway, with default FSRS-5 params I get MRR=84%, which is weird because in Anki...

With R averaged over one year, I get slightly higher MRR (specifically, averaged over delta_t, delta_t+365, where delta_t = today - last_review_date, so it's R over "from today and one year into the future")
0.84 -> 0.86
With R averaged over 5 years, I get
0.84 -> 0.89
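A rough sketch of the two knowledge measures being compared here: sum(R) evaluated today versus the sum of each card's R averaged over [delta_t, delta_t + horizon]. The card list is made up, the function names are mine, and decay = -0.5 is assumed to be the fixed FSRS-5 decay; this is not the code that produced the MRR numbers above.

```python
DECAY = -0.5                     # assumption: the fixed FSRS-5 decay
FACTOR = 0.9 ** (1 / DECAY) - 1  # so that R(t = S) = 0.9


def retrievability(t, s):
    """Power forgetting curve: probability of recall t days after a review."""
    return (1 + FACTOR * t / s) ** DECAY


def average_r(t1, t2, s):
    """Mean R over [t1, t2] from the closed-form antiderivative (DECAY != -1)."""
    def antiderivative(t):
        return (s / (FACTOR * (DECAY + 1))) * (1 + FACTOR * t / s) ** (DECAY + 1)
    return (antiderivative(t2) - antiderivative(t1)) / (t2 - t1)


def knowledge_now(cards):
    """sum(R): retrievability of every card today."""
    return sum(retrievability(delta_t, s) for delta_t, s in cards)


def knowledge_averaged(cards, horizon_days):
    """sum(avg_R): each card's R averaged from today to today + horizon."""
    return sum(average_r(delta_t, delta_t + horizon_days, s) for delta_t, s in cards)


# Made-up cards: (days since last review, stability in days)
cards = [(0, 1.0), (3, 10.0), (20, 100.0), (50, 365.0)]

print(f"sum(R) today:          {knowledge_now(cards):.3f}")
print(f"sum(avg R), next year: {knowledge_averaged(cards, 365):.3f}")
```

Because R only decreases between reviews, the averaged measure is never larger than sum(R) today, and cards with low stability lose the most, which is how it implicitly rewards higher S.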

bold terrace Apr 9, 2025, 7:19 PM

#

#Team90DR Intensify lol

#

And with that R, you still get plenty of points for very low S

#

imagine with a nice and smooth 1-exp

#

💦

hasty fractal Apr 9, 2025, 7:20 PM

#

isn't the water emoji lewd?

#

or it's like "this is hot"

unique salmon Apr 9, 2025, 7:21 PM

#

With R averaged over the next 100 years, I get 0.92

bold terrace Apr 9, 2025, 7:22 PM

#

Future FSRS parameter : What is your expected remaining lifespan

unique salmon Apr 9, 2025, 7:22 PM

#

Lol

bold terrace Apr 9, 2025, 7:22 PM

#

I like the idea that S=365 represents "the max"

hasty fractal Apr 9, 2025, 7:24 PM

#

let's be real: FSRS doesn't work at really high intervals, speaking from experience. at that point, there just isn't enough data.

bold terrace Apr 9, 2025, 7:25 PM

#

Do you mean the intervals are too long or too short?

#

What would be for you the max S

#

that is relevant

#

It's interesting to remember that if all your cards had a 365d stability, you could maintain 50K words by doing 137 reviews/day at 90% DR

#

~5min at 2s/review

#

so clearly, 365d might already be "too high" for realistic "anki endgoal"

#

If we accept 30min of daily anki to maintain 50K words, a stability of 60d would already be enough
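The back-of-the-envelope numbers above can be reproduced with a few lines. This is a simplified sketch under stated assumptions: every card has the same interval, the interval equals the stability (true at DR = 90%), and lapses, fuzz and new cards are ignored. The function names are mine.

```python
def reviews_per_day(deck_size, interval_days):
    """Steady-state reviews per day if every card comes up once per interval."""
    return deck_size / interval_days


def minutes_per_day(deck_size, interval_days, seconds_per_review):
    """Daily review time in minutes for a uniform interval."""
    return reviews_per_day(deck_size, interval_days) * seconds_per_review / 60


def interval_for_time_budget(deck_size, minutes_budget, seconds_per_review):
    """Shortest uniform interval that keeps daily review time within the budget."""
    return deck_size * seconds_per_review / (minutes_budget * 60)


print(round(reviews_per_day(50_000, 365)))              # reviews/day at 365d
print(round(minutes_per_day(50_000, 365, 2), 1))        # minutes/day at 2 s/review
print(round(interval_for_time_budget(50_000, 30, 2)))   # days, for a 30 min budget
```

Under these assumptions a 60d stability keeps 50K cards within a 30-minute daily budget, consistent with the estimate above.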

hasty fractal Apr 9, 2025, 7:29 PM

#

bold terrace Do you mean the intervals are too long or too short?

I can't say that tbh. It's just that the retention is usually very different for these cards compared to others.

bold terrace Apr 9, 2025, 7:30 PM

#

hasty fractal I can't say that tbh. It's just that the retention is usually very different for...

Ok, but at what interval would you say cards start to feel like they have a "long" interval?

#

DR ?

unique salmon Apr 9, 2025, 7:31 PM

#

bold terrace To be fair I think it would reward a bit too much very low Stability compared to...

Which loop? The brute force approach will not be used

#

Brute force is just a sanity check

#

To make sure the integral math is mathing

bold terrace Apr 9, 2025, 7:31 PM

#

yeah I misunderstood

hasty fractal Apr 9, 2025, 7:31 PM

#

I would say around a year, but depends on content. If it's the general knowledge deck I internalise the cards very quickly and at that point, Anki feels quite unnecessary.

#

If it's a JP word, it'll take longer.

#

btw, folks, do we have a roadmap for what's coming next in algorithmic/fsrs improvement?

#

y'all post hundreds of messages every day in this channel, a mere human can't possibly read all that

unique salmon Apr 9, 2025, 7:35 PM

#

hasty fractal btw, folks, do we have a roadmap for what's coming next in algorithmic/fsrs impr...

FSRS-6 with a new parameter for same-day reviews and with a flatter curve
Simulator now takes load balancing and Easy Days into account
Simulator now simulates same-day reviews way better
Load balancing is tweaked, so hopefully maybe potentially possibly Sound will finally stop complaining about LB decreasing retention, but I wouldn't bet on that
Maybe remove CMRR as it's kinda shit according to Jarrett
Maybe make CMRR+ Mega Ultra Giga Chad Sigma Edition if Luc (A bloke) wants to. Instead of CMRR being separate from the simulator, it will use all of the simulator settings, including sort order, Easy Days, etc.

#

This

bold terrace Apr 9, 2025, 7:38 PM

#

Expertium continuing to think he's the boss of everyone doing something 😄

unique salmon Apr 9, 2025, 7:39 PM

#

More like I'm the guy whose job is to remind everyone about that one really cool feature that I suggested a year ago and everyone forgot about

hasty fractal Apr 9, 2025, 7:40 PM

#

bold terrace 8) Expertium continuing to think he's the boss of everyone doing something 😄

lol this is true

#

im reminded of a character who loves packing but not really: what he actually likes is lolling on the sofa and telling others how to properly pack

unique salmon Apr 9, 2025, 7:42 PM

#

hasty fractal im reminded of a character who loves packing but not really: what he actually li...

literally me

#

Also, I tried CMRR with FSRS-6 parameters and decay, and yeah, it's just forever 0.7 😅

ashen light Apr 9, 2025, 7:43 PM

#

bold terrace 8) Expertium continuing to think he's the boss of everyone doing something 😄

one day he'll boss himself around

unique salmon Apr 9, 2025, 7:44 PM

#

unique salmon Also, I tried CMRR with FSRS-6 parameters and decay, and yeah, it's just forever...

So CMRR will probably be removed because it's lobotomized now

#

Then again, current CMRR isn't realistic in the first place since it doesn't take into account LB, Easy Days, sort order, real new cards/day limit, real review/day limit, real deck size, real card states, etc.

#

And I really hope Luc will just use the simulator code with all of its settings for the next-gen CMRR

#

...or maybe he won't, and then users will forever keep asking "What's the best value of desired retention?" until the end of the universe (or Anki), everyone will be coming up with their own rule of thumb, twenty bloggers will write twenty articles on the best value of desired retention, and then 10 years later somebody will ask "Why not just run the simulator for every allowed value of DR and check the workload?", and I will answer "Because nobody wanted to implement it 10 years ago"

#

And after that we will be back to asking "What's the best value of desired retention?" until the end of the universe

bold terrace Apr 9, 2025, 8:07 PM

#

unique salmon And after that we will be back to asking "What's the best value of desired reten...

It was the default value all along

#

But some otherthinker came up with some "next gen optimizer"

#

And screw naive people

#

(I'm looking at you)

#

"Crunching Numbers" lol

polar maple Apr 9, 2025, 8:13 PM

#

to be fair aren't we just panic-finding new methods/metrics in order to purposefully increase MRR

#

and the moment we get a high number we declare victory

unique salmon Apr 9, 2025, 8:13 PM

#

polar maple to be fair aren't we just panic finding new methods/metrics in order to purposef...

kind of

bold terrace Apr 9, 2025, 8:16 PM

#

To be fair, I don't think there is a huge huge hurry, mine is blocked to 0.70 since it has been introduced

#

I made the mistake of changing my DR to that .70, once

#

Then my effective R was around 50-55%

#

and it took me 2-3 weeks to recover from that week of pain

#

But CMRR was right! To increase my knowledge, I had to drop DR very low, add a lot of words ...
... and end up with a shitton of cards with stability <1d that would all contribute to my marvelous "total knowledge", which was just that sum(R)

#

(I'm exaggerating, since yes, the interval/stability is somewhat accounted for in the workload, so it's not like it was completely ignored, but still)

#

The problem is that CMRR estimated that with DR set to 70% I would fail 30% of my cards, when in fact I failed 45% of them 🥲

unique salmon Apr 9, 2025, 8:22 PM

#

More accurate FSRS-6 + sum(avg_r(delta_t, delta_t+1095)) instead of sum(R) + using the actual deck size and the actual card states could alleviate a lot of that

#

IMO, the biggest problem with CMRR is not the choice of the function to minimize/maximize, but the fact that the settings have barely anything to do with reality

bold terrace Apr 9, 2025, 8:23 PM

#

IMO a small warning : "If you plan to change your DR, please do it incrementally"

#

Would save many lives

#

What about ditching the forgetting curve, and just training a different set of parameters for each DRxD range?

#

We might even have more params than Alex doing so

#

😄

robust hill Apr 9, 2025, 8:31 PM

#

execute the complainers

cursive badge Apr 9, 2025, 8:33 PM

#

bold terrace IMO a small warning : "If you plan to change your DR, please do it incrementally...

Or be sneaky and make it act like a PID controller. Don't just immediately schedule based on the user DR, slowly adjust things internally over time based on the difference between DR and true retention.
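One way to picture the controller idea is a tiny proportional-integral loop that nudges an internal DR up when measured retention lags behind the user's target. This is purely a hypothetical sketch: the class, the gains and the clamp range are invented, the derivative term of a full PID is left out, there is no anti-windup, and nothing like this exists in Anki or FSRS as far as this thread indicates.

```python
from dataclasses import dataclass


@dataclass
class RetentionController:
    target: float                   # desired retention chosen by the user
    kp: float = 0.5                 # proportional gain (made-up value)
    ki: float = 0.1                 # integral gain (made-up value)
    lower: float = 0.70             # clamp range for the internal DR
    upper: float = 0.99
    accumulated_error: float = 0.0  # running sum of past errors

    def update(self, true_retention: float) -> float:
        """Return the DR to schedule with, given the latest measured retention."""
        error = self.target - true_retention
        self.accumulated_error += error
        internal_dr = self.target + self.kp * error + self.ki * self.accumulated_error
        return min(self.upper, max(self.lower, internal_dr))


controller = RetentionController(target=0.90)
for measured in (0.80, 0.84, 0.88, 0.90):
    internal = controller.update(measured)
    print(f"true retention {measured:.2f} -> internal DR {internal:.3f}")
```

Whether the extra scheduling pressure would feel better than an explicit "change DR incrementally" warning is an open question; the sketch only shows the mechanics.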

bold terrace Apr 9, 2025, 8:35 PM

#

cursive badge Or be sneaky and make it act like a PID controller. Don't just immediately sched...

Yeah I thought about that and I was also thinking it could be nice coupled with the fact FSRS has some recency weight

#

Could make the recency weight a bit more aggressive for phases where DR changes, to let it adapt quicker

unique salmon Apr 10, 2025, 8:54 AM

#

@cosmic hedge sorry for frequent pings guys, but I want to ask - do you want to implement next gen CMRR? And by that I mean just use the simulator with all of its settings to make CMRR as realistic as possible.
Currently, CMRR assumes fixed deck size, no learned cards, doesn't take into account sort order, Easy Days, etc. All of that can be fixed by reusing the simulator.
@quasi shadow wants to remove CMRR because with FSRS-6 it outputs 70% too frequently, and also because it's kinda crap overall, and while that's understandable, I think we should instead improve CMRR and make it more realistic by using real deck sizes, real card states, real new and review limits, etc.

#

Removing CMRR completely would be a net loss of functionality, and since there are obvious ways to make it more realistic, I think we should do that instead

#

Though, there is also the problem that Alex pointed out - we are in a situation where we want CMRR to output higher numbers, so we will declare any tweaks that make the output bigger good

bold terrace Apr 10, 2025, 9:42 AM

#

Perfection, 90% 😄

#

Look how much more predictable the daily outcome becomes 🙂 And it's even a 5-day average there.

#

Without 5-day average, it would give this for Anki Scheduling daily R

#

Compare to this with Filtered Decks

quasi shadow Apr 10, 2025, 10:45 AM

#

unique salmon @cosmic hedge sorry for frequent pings guys, but I want to ask - do you ...

It's better to remove it than keep it as is. I agree that there may be a good design. But I'm not the one who will implement it.

quasi shadow Apr 10, 2025, 10:46 AM

#

unique salmon Removing CMRR completely would be a net loss of functionality, and since there a...

It is a net gain because it won't cause more confusion.

unique salmon Apr 10, 2025, 10:46 AM

#

quasi shadow It is a net gain because it won't cause more confusion.

People will ask "What's the best value of DR?", so it makes sense to have a tool to answer that question

#

I said it yesterday: the biggest problem with CMRR is the unrealistic settings it uses

#

It should just use the same settings and the same deck and card info as the simulator, for maximum realism

#

So fixing (or at least improving) CMRR is just a matter of reusing the simulator config

quasi shadow Apr 10, 2025, 10:48 AM

#

unique salmon People will ask "What's the best value of DR?", so it makes sense to have a tool...

A calculator that always outputs zero is useless.

#

Or, it's harmful.

unique salmon Apr 10, 2025, 10:49 AM

#

quasi shadow A calculator that always outputs zero is useless.

Using more realistic settings will likely change it
I also have my idea with using the average R over the next year or two instead of R at the end of the simulation, to bump up the output (#1282005522513530952 message)

cosmic hedge Apr 10, 2025, 10:55 AM

#

unique salmon @cosmic hedge sorry for frequent pings guys, but I want to ask - do you ...

Would the next gen CMRR fix it at all?

#

#1282005522513530952 message does this not help enough btw?

unique salmon Apr 10, 2025, 10:55 AM

#

cosmic hedge https://discord.com/channels/368267295601983490/1282005522513530952/135944899943...

It's not that it doesn't help, it's that it doesn't make sense

#

Next gen CMRR would certainly be more realistic, though that doesn't automatically guarantee that it won't always output 70%

#

But it's definitely more realistic than assuming that deck_size = 10*days_to_simulate and an infinite number of new cards that can be learned per day

cosmic hedge Apr 10, 2025, 10:57 AM

#

unique salmon But it's definitely more realistic than assuming that `deck_size = 10*days_to_si...

yeah always thought that was odd but figured it was just magic or something XD

cosmic hedge Apr 10, 2025, 10:57 AM

#

unique salmon It's not that it doesn't help, it's that it doesn't make sense

why doesn't it make sense? catch me up pls

unique salmon Apr 10, 2025, 10:57 AM

#

cosmic hedge why doesn't it make sense? catch me up pls

It only works if cost is defined as "time per review at R=90%", but it's not

#

It's adding apples to oranges

cosmic hedge Apr 10, 2025, 10:58 AM

#

ahh I suppose so

unique salmon Apr 10, 2025, 10:59 AM

#

So, are you up to the task?

cosmic hedge Apr 10, 2025, 10:59 AM

#

what, the next gen CMRR?

#

sure why not

#

i hope XD

unique salmon Apr 10, 2025, 11:00 AM

#

https://tenor.com/view/yes-yes-sir-yayy-kataman-gif-12260883688244422951

#

90% of the work is just reusing the simulator code, literally

cosmic hedge Apr 10, 2025, 11:00 AM

#

yep

unique salmon Apr 10, 2025, 11:00 AM

#

I'll write you a detailed spec later

cosmic hedge Apr 10, 2025, 11:00 AM

#

i hope XD

#

you dont really need to

unique salmon Apr 10, 2025, 11:01 AM

#

Btw, why did you close the PR with the "smooth" button?

cosmic hedge Apr 10, 2025, 11:01 AM

#

dae said he didnt want it

unique salmon Apr 10, 2025, 11:01 AM

#

FeelsBadAnki

#

Will it remain like this?

cosmic hedge Apr 10, 2025, 11:02 AM

#

yep

unique salmon Apr 10, 2025, 11:02 AM

#

God damn it man

#

This is so ass

cosmic hedge Apr 10, 2025, 11:02 AM

#

its fine its not a huge issue XD

unique salmon Apr 10, 2025, 11:02 AM

#

All these settings except for Smooth Graph affect scheduling and are real settings that you can find in deck options. So grouping Smooth Graph - which only affects the plotting - with real settings seems like a bad UI to me.

#

Users shouldn't have to play "pick the odd one out"

cosmic hedge Apr 10, 2025, 11:04 AM

#

I think the hint would be it affecting the graphs which already exist

#

if someone did assume the button affected the actual results what would be the problem?
i suppose it belongs in advanced settings because it's a setting that very few people will need to touch.

unique salmon Apr 10, 2025, 11:11 AM

#

cosmic hedge if someone did assume the button affected the actual results what would be the p...

> if someone did assume the button affected the actual results what would be the problem?
That they would look for it in deck options and never find it

cosmic hedge Apr 10, 2025, 11:12 AM

#

unique salmon > if someone did assume the button affected the actual results what would be the...

feel free to @ me when someone's looking for the "smooth graph" deck option 🤣

unique salmon Apr 10, 2025, 11:12 AM

#

Lol, alright

#

I mean, even if it creates confusion that doesn't last long, it still creates >0 confusion

cosmic hedge Apr 10, 2025, 11:15 AM

#

I don't think people expect 0 confusion when they open the "advanced settings"

unique salmon Apr 10, 2025, 11:16 AM

#

Another matter: https://forums.ankiweb.net/t/desired-retention-ui-overhaul/57678/33?u=expertium
Will you add this to your ever-growing list of "suggestions that Expertium pings me about every day?" 🤣

Anki Forums

Desired Retention UI Overhaul

Ok, how about an idea suggested by Brayan: answer buttons that show interval lengths The interval lengths above answer buttons would change instantly when desired retention is changed More from Brayan: put the fsrs parameters at the bottom of the FSRS section and add some title to the “query input” (idk what is called the form below...

cosmic hedge Apr 10, 2025, 11:16 AM

#

unique salmon Another matter: https://forums.ankiweb.net/t/desired-retention-ui-overhaul/57678...

I had an idea with that but realised it was just your original one flipped sideways XD

unique salmon Apr 10, 2025, 11:17 AM

#

cosmic hedge I had an idea with that but realised it was just your original one flipped sidew...

According to David, some people don't understand graphs

#

Like, graphs short-circuit their brains

cosmic hedge Apr 10, 2025, 11:17 AM

#

the horror 😔

unique salmon Apr 10, 2025, 11:17 AM

#

And IMO, the idea with answer buttons is just very neat and clear

#

We show users what they have already seen before - answer buttons with interval lengths

#

Instead of something completely unfamiliar

#

#

Idk how hard it would be to implement

cosmic hedge Apr 10, 2025, 11:22 AM

#

unique salmon Idk how hard it would be to implement

it would just be weights 1-4 run through the forgetting curve right?

unique salmon Apr 10, 2025, 11:32 AM

#

cosmic hedge it would just be weights 1-4 run through the forgetting curve right?

Yes, just the first 4 params multiplied by a coefficient that depends on DR. Plus learning steps
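The "coefficient that depends on DR" can be sketched by inverting the forgetting curve: the interval is the time at which R falls to the desired retention. This is a minimal sketch assuming the FSRS-5 curve with fixed decay -0.5; the four stability values are illustrative placeholders rather than anyone's real parameters, and learning steps, fuzz and interval limits are left out.

```python
DECAY = -0.5                     # assumption: FSRS-5's fixed decay
FACTOR = 0.9 ** (1 / DECAY) - 1  # equals 19/81


def next_interval(stability, desired_retention):
    """Days until R drops to desired_retention; equals stability at DR = 0.9."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


def button_intervals(initial_stability, desired_retention):
    """Whole-day intervals per answer button (no learning steps, no fuzz)."""
    return {
        button: max(1, round(next_interval(s, desired_retention)))
        for button, s in initial_stability.items()
    }


# Illustrative first-four-parameter values, not anyone's real parameters
initial_stability = {"Again": 0.4, "Hard": 1.2, "Good": 3.2, "Easy": 15.7}

for dr in (0.80, 0.90, 0.95):
    print(dr, button_intervals(initial_stability, dr))
```

Recomputing this dictionary whenever the DR slider moves would be enough to update the labels above the preview buttons, with learning steps overriding the first buttons where they apply.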

#

Should we display fuzz, though? That's the issue

cosmic hedge Apr 10, 2025, 11:35 AM

#

unique salmon Should we display fuzz, though? That's the issue

Since i'd be the one implementing it apparently and it would be easier not to then no XD

unique salmon Apr 10, 2025, 12:10 PM

#

cosmic hedge you dont really need to

📎 Next_gen_CMRR.docx

quasi shadow Apr 10, 2025, 12:34 PM

#

unique salmon

15m for again? So it also considers the learning steps?

lapis hearth Apr 10, 2025, 12:40 PM

#

unique salmon

I agree this is a good idea

cosmic hedge Apr 10, 2025, 12:57 PM

#

quasi shadow 15m for again? So it also considers the learning steps?

https://discordapp.com/channels/368267295601983490/1282005522513530952/1359853548669636788 He thinks so, but I don't think it should?

bold terrace Apr 10, 2025, 1:01 PM

#

Eeeeereh with decay -0.2, the power_forgetting_curve still has a value of 20% around 1000d for a stability of 1d ...
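For anyone who wants to check that figure, here is a small sketch that evaluates the power forgetting curve far beyond the stability and inverts it to find when R reaches a given level. The decay values are assumptions: -0.5 as the fixed FSRS-5 decay and -0.2 as the flatter value discussed here. `days_until` is a helper name of mine.

```python
def power_forgetting_curve(t, s, decay):
    """Probability of recall t days after a review, for stability s."""
    factor = 0.9 ** (1 / decay) - 1  # chosen so that R(t = s) = 0.9
    return (1 + factor * t / s) ** decay


def days_until(r_target, s, decay):
    """Invert the curve: days after which R has dropped to r_target."""
    factor = 0.9 ** (1 / decay) - 1
    return s / factor * (r_target ** (1 / decay) - 1)


for decay in (-0.5, -0.2):
    r_1000 = power_forgetting_curve(1000, 1, decay)
    t_20 = days_until(0.2, 1, decay)
    print(f"decay={decay}: R(1000d)={r_1000:.3f}, R reaches 20% after {t_20:.0f} days")
```

The flatter the decay, the longer the tail, so any metric that integrates R far past the stability inherits whatever the curve claims out there.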

#

I'm sorry but the integral stuff sounds fishy

#

Writing it in a Word document doesn't make it less fishy

#

And randomly choosing the avg retention over 5y to compensate for a bad function also feels like you're just denying everything else that you came up with yourself @unique salmon

unique salmon Apr 10, 2025, 1:07 PM

#

quasi shadow 15m for again? So it also considers the learning steps?

Yes

unique salmon Apr 10, 2025, 1:07 PM

#

cosmic hedge https://discordapp.com/channels/368267295601983490/1282005522513530952/135985354...

No, it should display learning steps

#

Otherwise users will be like "WHERE ARE MAH LEARNIN' SHTEPSH?!?!?!"

unique salmon Apr 10, 2025, 1:09 PM

#

bold terrace And randomly chosing the avg retention over 5y to compensate for a bad function ...

It makes sense though. As I wrote, and as you yourself said many times, we care not only about how much we know at a specific point in time, but also about how slowly that knowledge is forgotten

#

It definitely makes way more sense than arbitrary f(S)

bold terrace Apr 10, 2025, 1:09 PM

#

unique salmon It makes sense though. As I wrote, and as you yourself said many times, we care ...

That's not what I'm arguing, I'm arguing about the integral usage 🙂

#

Well, it makes no sense if the forgetting curve can't be trusted for extreme values, which it can't, given that after 1000 days an S=1d card still translates into a 20% probability

#

So f(S) makes more sense if it goes from 0 to 1 over a span of time that we can interpret as "acquired"

unique salmon Apr 10, 2025, 1:11 PM

#

bold terrace Well, it make no sense if the forgetting curve can't be trusted for extreme valu...

Btw @polar maple that was also another reason why I was against the new decay

#

It forbids low R

#

Anyway, with the new decay we could either scrap CMRR (as Jarrett wants) and leave users forever wondering what is the best value of DR, or we could try to save CMRR somehow (as I want) to give users some answer

bold terrace Apr 10, 2025, 1:15 PM

#

unique salmon Anyway, with the new decay we could either scrap CMRR (as Jarrett wants) and lea...

Sure but it doesn't have to be "all your way" or "nothing"

unique salmon Apr 10, 2025, 1:15 PM

#

bold terrace Sure but it doesn't have to be "all your way" or "nothing

Anyone is free to suggest changes to CMRR

bold terrace Apr 10, 2025, 1:19 PM

#

Also, I think decoupling the CMRR evaluation function by using an f(S), rather than just reusing the same forgetting curve, would give us a discriminant that is not poisoned by artifacts from FSRS (for example, the fact that R won't drop below 1% even in extreme cases)

unique salmon Apr 10, 2025, 1:19 PM

#

bold terrace Also I think decoupling CMRR evaluation function with a f(S) than just reusing t...

I'm not sure what you mean

cosmic hedge Apr 10, 2025, 1:19 PM

#

unique salmon Otherwise users will be like "WHERE ARE MAH LEARNIN' SHTEPSH?!?!?!"

yeah but the user knows their learning steps so its not as helpful

unique salmon Apr 10, 2025, 1:20 PM

#

cosmic hedge yeah but the user knows their learning steps so its not as helpful

If the user sees the learning steps above the real buttons but not above the fake buttons, don't you think that's confusing?

#

Like, I immediately think "that's hella confusing"

bold terrace Apr 10, 2025, 1:20 PM

#

What I mean is that if the forgetting curve were more realistic (meaning a card with S=1d would have a really, really low R after a few weeks), then there would be no chance that a card with S=1d would already have a 0.40 score.

#

I still think FSRS's good predictions come from the R it was trained on, not from the quality of its forgetting curve

#

So using the curve as a way to find the evaluation function of "how good a S is" feels wrong
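
For reference, a minimal sketch of the curve being argued about. It assumes the FSRS-style power forgetting curve, normalised so that R = 0.9 when the elapsed time equals S, with decay written as a negative exponent (so "decay 0.5" and "decay -0.5" in this thread mean the same curve). The function name and the example values are illustrative, not taken from the optimizer's code.

```python
def forgetting_curve(t_days: float, stability: float, decay: float) -> float:
    """Power forgetting curve, normalised so that R = 0.9 when t == stability.

    `decay` is the negative exponent, e.g. -0.5 or -0.2 (assumed convention).
    """
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t_days / stability) ** decay


# A card with S = 1 day, checked at t = S and long after it is due.
for decay in (-0.5, -0.2):
    r_at_s = forgetting_curve(1, 1, decay)
    r_late = forgetting_curve(1000, 1, decay)
    print(f"decay={decay}: R(1d)={r_at_s:.3f}, R(1000d)={r_late:.3f}")
```

Under these assumptions the S=1d card is still at roughly 27% after 1000 days with decay -0.2, versus roughly 6.5% with decay -0.5, which is the "it forbids low R" behaviour in numbers.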

cosmic hedge Apr 10, 2025, 1:23 PM

#

unique salmon If the user sees the learning steps above the real buttons but not above the fak...

aww and it doesnt count intra-day reviews changing the stability either

#

i say just give up and stick a link to the visualiser there

unique salmon Apr 10, 2025, 1:23 PM

#

cosmic hedge i say just give up and stick a link to the visualiser there

This one?
https://open-spaced-repetition.github.io/anki_fsrs_visualizer/
Please tell me you're joking

unique salmon Apr 10, 2025, 1:23 PM

#

cosmic hedge aww and it doesnt count intra-day reviews changing the stability either

Screw it, I think it's fine

#

Just display what the user would normally see with these parameters and learning steps above his real buttons

#

As long as what the user sees above the fake buttons is the same as what he sees above the real buttons, we're good

cosmic hedge Apr 10, 2025, 1:25 PM

#

unique salmon This one? https://open-spaced-repetition.github.io/anki_fsrs_visualizer/ Please ...

yeah im joking XD

cosmic hedge Apr 10, 2025, 1:25 PM

#

unique salmon Just display what the user would normally see with these parameters and learning...

so then if they have more than 1 learning step the only button thats going to show anything fsrs related is easy?

unique salmon Apr 10, 2025, 1:26 PM

#

cosmic hedge so then if they have more than 1 learning step the only button thats going to sh...

Sadly, yes

#

Thank Dae for making learning steps a nightmare

#

And making the entire scheduling system janky

unique salmon Apr 10, 2025, 1:35 PM

#

bold terrace Eeeeereh with decay -0.2, the power_forgetting_curve still has a value of 20% ar...

Btw, I'm starting to wonder if maybe the benchmark is fundamentally flawed, in the sense that because there is less data at lower retentions than at higher retentions, FSRS just adapts to higher retentions, and there is no way to make it stop doing that without making the metrics worse
@quasi shadow @polar maple

#

And if we agree that some change that makes metrics worse is in some sense "better", then...what's the point of the benchmark?

#

I guess we could solve it by asking Dae to make another 10k dataset, but this time make it have a uniform distribution of retentions by cherry-picking users with all kinds of retentions
Jarrett, I hope you agree that this is a good idea

#

So that there is more or less the same amount of data for all retentions

#

Like my uniform dataset with 100 users, but 100 times larger

#

Otherwise I can't think of any way to prevent overfitting to higher retentions

lapis hearth Apr 10, 2025, 1:49 PM

#

unique salmon And if we agree that some change that makes metrics worse is in some sense "bett...

I was questioning the integrity of the benchmark not so long ago. You told me RMSE correlates heavily with Retention, if I am not mistaken

#

Though I still do (regarding having FSRS-sec)

unique salmon Apr 10, 2025, 1:49 PM

#

lapis hearth I was questioning the integrity of the benchmark not so long ago. You told me RM...

Log-loss correlates with retention, RMSE with the number of reviews

lapis hearth Apr 10, 2025, 1:49 PM

#

unique salmon Log-loss correlates with retention, RMSE with the number of reviews

Yes that

unique salmon Apr 10, 2025, 1:50 PM

#

But that's a different problem, not exactly what I'm talking about above

lapis hearth Apr 10, 2025, 1:50 PM

#

You want to benchmark the benchmark

unique salmon Apr 10, 2025, 1:51 PM

#

What I'm talking about is that regardless of which metrics we use, we will end up overfitting to high retentions because most of the data is in the >50% retention range, with very few users with <50% retention
So a change that makes the forgetting curve less realistic, like the whole "A card will never reach probability of recall of 10%" thing, might look good on paper (uh, on the monitor)

lapis hearth Apr 10, 2025, 1:51 PM

#

unique salmon But that's a different problem, not exactly what I'm talking about above

I know. I am just saying the benchmark might not be 100% perfect

#

But Jarrett surely knows his benchmark

lapis hearth Apr 10, 2025, 1:52 PM

#

unique salmon What I'm talking about is that regardless of which metrics we use, we will end u...

Do you want to change the benchmark for something else

unique salmon Apr 10, 2025, 1:52 PM

#

unique salmon I guess we could solve it by asking Dae to make another 10k dataset, but this ti...

👆

#

Analogy: think of it as making an artificial city where the number of millionaires is the same as the number of poor people

lapis hearth Apr 10, 2025, 1:53 PM

#

Use me

unique salmon Apr 10, 2025, 1:53 PM

#

Nah, it would be anonymous

#

Dae would open his secret vault where he keeps user data 🤣

lapis hearth Apr 10, 2025, 1:54 PM

#

Who would care. Make me an honorary specimen

unique salmon Apr 10, 2025, 2:13 PM

#

There is another way, which is a lot more arbitrary and dumb but doesn't require getting a new dataset.
After optimizing FSRS on all 10k users, we calculate the final metric as a weighted average where each user's weight depends on how common his retention is. Specifically, we put all users into 40 categories: retention between 100% and 97.5%, retention between 97.5% and 95%, retention between 95% and 92.5%, etc.
Then we count how many users fall into each category. Then, when calculating the final log-loss and RMSE over the entire dataset, the user is weighted inversely proportional to the number of users in his "retention class".
What this means is that if someone has a retention of 90%, his weight will be lower because that's common. If someone has a retention of 10%, his weight will be huge because that's uncommon. So we assign more weight to people with uncommon values of retention and less weight to people with common retentions.

#

This doesn't even require re-running the algorithms in the benchmark, just re-calculating the final average across all collections

#

So we could do it like...today

#

@polar maple it's inspired by the approach of giving more weight to rare classes in classification problems on imbalanced datasets

#

Except that we're just making the classes up 🤣

#

I'm fully expecting that with this change FSRS-6 will look worse than FSRS-5

#

Because FSRS-6 has a curve that doesn't fit people at low retentions

#

Getting an actual uniform dataset would be a lot better, though

quasi shadow Apr 10, 2025, 2:44 PM

#

unique salmon I'm fully expecting that with this change FSRS-6 will look worse than FSRS-5

I will evaluate FSRS-6 in this way tomorrow.

#

The only thing I need to do is calculate the retention and save it in the result with the metric, right?

#

Then we can compare algorithms at each retention level.

unique salmon Apr 10, 2025, 2:52 PM

#

quasi shadow The only thing I need to do is calculate the retention and save it in the result...

Split users based on their retention (exclude same-day reviews and the first review) into sufficiently many groups, like 20 or 40
Calculate 1/n(users in the group) for each group
Calculate weighted average log-loss, RMSE(bins) and AUC across all 10k users, where each user has a weight of 1/n(users in the group), depending on which group he belongs to

Example: suppose there are 1000 users in the 90%-92.5% group. So if a user's retention is 91%, his weight is 1/1000
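
The three steps above can be sketched roughly like this. It is only an illustration: the field names (`retention`, `log_loss`) and the toy users are made up, and the real benchmark results may be laid out differently.

```python
from collections import Counter


def retention_group(retention: float, n_groups: int = 40) -> int:
    """Index of the equal-width retention bucket (0 = lowest retention)."""
    return min(int(retention * n_groups), n_groups - 1)


def group_weighted_mean(users: list[dict], metric: str, n_groups: int = 40) -> float:
    """Average `metric` over users, weighting each user by 1 / (users in his group)."""
    groups = [retention_group(u["retention"], n_groups) for u in users]
    counts = Counter(groups)
    weights = [1.0 / counts[g] for g in groups]
    weighted_sum = sum(w * u[metric] for w, u in zip(weights, users))
    return weighted_sum / sum(weights)


# Hypothetical results: three users in the common 90%-92.5% group, one rare low-retention user.
users = [
    {"retention": 0.91, "log_loss": 0.30},
    {"retention": 0.91, "log_loss": 0.30},
    {"retention": 0.91, "log_loss": 0.30},
    {"retention": 0.31, "log_loss": 0.60},
]
plain_mean = sum(u["log_loss"] for u in users) / len(users)
print(plain_mean, group_weighted_mean(users, "log_loss"))
```

With this toy data the plain mean is 0.375 and the group-weighted mean is 0.45, because the single low-retention user counts as much as the three common ones combined.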

severe storm Apr 10, 2025, 3:27 PM

#

Is there a way to (correctly) guess how long it takes for true retention to move closer to desired retention?
My case: I have used anything between 72%-85% desired retention (mostly on the lower end) for at least 6 months (with change all cards on schedule), but last week I turned it up to 90% desired retention (without change all cards on schedule).

#

not really a problem just curious

bold terrace Apr 10, 2025, 3:28 PM

#

severe storm Is there a way to (correctly) guess how long it takes for true retention to move...

It can be as quick as "Reschedule all your cards and do your backlog right now" or as long as "You'll have to gradually review cards scheduled with your old DR and they will be rescheduled with the new one"

#

A compromise is to use the FSRS Helper add-on and only reschedule the cards that are furthest away

severe storm Apr 10, 2025, 3:30 PM

#

I got 1000 cards 🥴 "due" if I take the quick route haha. But I think I can "endure" true retention < desired retention for a while

bold terrace Apr 10, 2025, 3:30 PM

#

You can always do the compromise if you want to speed things up a bit 🙂

#

You can do a tiny batch per day

#

taking only the ones far, far in the future

severe storm Apr 10, 2025, 3:31 PM

#

That will mess with me mentally I think haha

#

But I think I have figured it out.
As you said, it will take as long as the amount of days that most of my cards have with old DR. I think looking at the review intervals tab on "stats" might help.
If I look at where the cumulative 50% & 80% (randomly taken) is, it might give me some sort of idea.
The running total of 50% is @ 56-60 days review interval, and the running total of 80% is @ 152-155 days review interval. So I would guess it's somewhere @ 120, because during this time I am also doing new cards etc
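
The "running total" reading described above can be sketched as a percentile lookup over card intervals. The interval list below is invented purely for illustration; real numbers would come from the review intervals graph or an export of the collection.

```python
import math


def interval_at_fraction(intervals_days: list[int], fraction: float) -> int:
    """Smallest interval such that at least `fraction` of cards have an interval <= it."""
    ordered = sorted(intervals_days)
    needed = max(math.ceil(fraction * len(ordered)), 1)
    return ordered[needed - 1]


# Hypothetical current intervals (in days) of ten review cards.
intervals = [5, 10, 20, 40, 60, 90, 120, 150, 200, 400]
print(interval_at_fraction(intervals, 0.5))  # half of the cards come due within this many days
print(interval_at_fraction(intervals, 0.8))
```

This only says when cards scheduled under the old DR will have come up for review at least once; it is a rough upper bound on how long the old schedule lingers, not a prediction of true retention.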

polar maple Apr 10, 2025, 4:13 PM

#

bold terrace Eeeeereh with decay -0.2, the power_forgetting_curve still has a value of 20% ar...

yeah a problem is that in the data that we have, we cannot distinguish users who study purely using anki and users who just happen to have some of their knowledge in anki and end up studying elsewhere like with language immersion or school. So, it could be true that the forgetting curve for the average user just looks something like that, never going to 0

polar maple Apr 10, 2025, 4:14 PM

#

unique salmon Btw, I'm starting to wonder if maybe the benchmark is fundamentally flawed, in t...

at least for evaluation theres still a decent amount of data at low DR, e.g. this calibration chart for the first 500 users

unique salmon Apr 10, 2025, 4:16 PM

#

polar maple at least for evaluation theres still a decent amount of data at low DR, e.g. thi...

I mean like, if most users have retention around 90%, it means that if we try to find decay that provides the best metrics, it will be whatever decay is best for 90% retention

#

So I think we REALLY should ask Dae to make a uniform dataset

#

Or do the thing I described above, which is worse

polar maple Apr 10, 2025, 4:17 PM

#

also was just showing that RWKV's flatter forgetting curves still achieve good calibration on low R, so it's not necessarily a big issue to have flat forgetting curves

unique salmon Apr 10, 2025, 4:20 PM

#

I'd like to see the calibration graph for FSRS-5 and 6

polar maple Apr 10, 2025, 4:20 PM

#

FSRS-5-recency, first 500 users, 0.5 decay

#

i don't have one for FSRS-6

unique salmon Apr 10, 2025, 4:21 PM

#

polar maple i don't have one for FSRS-6

Just change decay to -0.2 and run it
There is a new parameter for same-day reviews, but screw it

polar maple Apr 10, 2025, 4:21 PM

#

K

unique salmon Apr 10, 2025, 4:22 PM

#

Oh, and I mean "run the optimizer", not "run it with the same parameters as for the old curve"

#

So you'll have to optimize parameters for the new curve

unique salmon Apr 10, 2025, 4:24 PM

#

polar maple FSRS-5-recency, first 500 users, 0.5 decay

Also, this looks like a mess. Didn't we remove some of the curves?

polar maple Apr 10, 2025, 4:24 PM

#

yeah i still have the old code

#

at least that plot was generated way back then

unique salmon Apr 10, 2025, 4:25 PM

#

ok

polar maple Apr 10, 2025, 4:31 PM

#

quasi shadow It's the distribution of trainable decay.

was this done with the 5-way split on the 10k dataset? if so, does it improve test performance?

quasi shadow Apr 10, 2025, 5:11 PM

#

polar maple was this done with the 5-way split on the 10k dataset? if so, does it improve te...

yep

quasi shadow Apr 10, 2025, 5:12 PM

#

polar maple was this done with the 5-way split on the 10k dataset? if so, does it improve te...

https://github.com/open-spaced-repetition/srs-benchmark/blob/Expt/trainable-forgetting-decay/result/FSRS-5-dev.jsonl

unique salmon Apr 10, 2025, 5:15 PM

#

quasi shadow yep

Maybe we should do that instead of the fixed decay then?
The problem is estimation of S0. You need to know decay in advance to accurately estimate S0. Maybe do what you did, and then do a second optimization with fixed decay that was found during the first optimization?

#

It's 2x slower, but should work better

polar maple Apr 10, 2025, 5:16 PM

#

quasi shadow https://github.com/open-spaced-repetition/srs-benchmark/blob/Expt/trainable-forg...

how much better is it? idk which file to compare it to

unique salmon Apr 10, 2025, 5:30 PM

#

unique salmon It's 2x slower, but should work better

The more I think about it, the more I think that's the best course of action
If we make decay trainable, that alleviates the problem that different values of decay are better at different retentions, which is what we have been arguing about all the time. And it should be more accurate than any fixed value of decay. The problem is S0. Actually, even other parameters may (and likely will) still have different values depending on the choice of decay. I don't think FSRS params are "decay-agnostic", though I don't have a solid proof of that.
So the solution is to run optimization twice: once with variable decay, to find which value of decay is good, and the second time with fixed decay from the first run, to fine-tune the parameters

#

We could only run it once if parameters are "decay-agnostic", but again, I doubt that they are
By "decay-agnostic" I mean "parameters will converge to the same values regardless of the choice of decay"
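
A small illustration of why stability estimates are unlikely to be "decay-agnostic". It assumes the same normalised power curve discussed in this thread (R = 0.9 at t = S, decay as a negative exponent) and simply inverts it for a single observation. This is not how the optimizer estimates S0; the function name and numbers are hypothetical.

```python
def implied_stability(t_days: float, recall: float, decay: float) -> float:
    """Stability S for which the power curve passes through (t_days, recall).

    Inverts R = (1 + factor * t / S) ** decay, with factor = 0.9 ** (1 / decay) - 1.
    """
    factor = 0.9 ** (1.0 / decay) - 1.0
    return factor * t_days / (recall ** (1.0 / decay) - 1.0)


# Same observation (50% recall after 10 days), two different decay values.
for decay in (-0.5, -0.2):
    print(decay, round(implied_stability(10, 0.5, decay), 4))
```

Under these assumptions the same data point implies S of about 0.78 days with decay -0.5 and about 0.22 days with decay -0.2, so an S estimated under one decay is not the right value under another. Only observations at exactly 90% recall give the same S for every decay.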

polar maple Apr 10, 2025, 5:39 PM

#

@unique salmon same 500 users

unique salmon Apr 10, 2025, 5:42 PM

#

polar maple @unique salmon same 500 users

Well, color me green and call me a pickle

#

It actually looks good below 50%

#

Somehow

polar maple Apr 10, 2025, 5:43 PM

#

not too surprising given the RWKV curves

#

but where are my confidence intervals?

#

i updated fsrs-optimizer

#

i thought you added confidence intervals or something

unique salmon Apr 10, 2025, 5:44 PM

#

I thought so too 😅

polar maple Apr 10, 2025, 5:44 PM

#

the update must've failed or something, you mentioned you removed some lines but i think i can still see them all

polar maple Apr 10, 2025, 6:06 PM

#

@unique salmon the swap at p=0.45 is interesting

#

i think ill get another 500 users to see if this repeats

unique salmon Apr 10, 2025, 6:59 PM

#

polar maple @unique salmon the swap at p=0.45 is interesting

https://github.com/open-spaced-repetition/fsrs-optimizer/pull/169#issuecomment-2794715383
Mind voicing your thoughts? Or giving me a thumbs up, that works too 🤣

GitHub

Feat/FSRS-6 by L-M-Sherlock · Pull Request #169 · open-spaced-rep...

candidate for FSRS-6
Log Loss: 0.3273 -> 0.3257 (-0.0016)
RMSE(bins): 0.0518 -> 0.0510 (-1.5%)
Model: FSRS-5-dev
Total number of users: 9999
Total number of reviews: 349923850
Weighte...

#

I don't see any problems with my idea, aside from making optimization two times slower

polar maple Apr 10, 2025, 7:02 PM

#

unique salmon https://github.com/open-spaced-repetition/fsrs-optimizer/pull/169#issuecomment-2...

is S0 even a big issue anymore? isn't S0 already a learnable parameter after the initial estimation? so it doesn't really matter if we initially estimate it with a different decay, as long as the optimization process will still move S0 to a good value

unique salmon Apr 10, 2025, 7:02 PM

#

polar maple is S0 even a big issue anymore? isn't S0 already a learnable parameter after the...

The problem is that if we change decay, we have to re-estimate parameters

polar maple Apr 10, 2025, 7:03 PM

#

i don't see why we need to fix decay for a second optimization and why this would necessarily be better than just a joint optimization of all parameters at once

unique salmon Apr 10, 2025, 7:03 PM

#

I mean, I guess we could just double the number of epochs?

polar maple Apr 10, 2025, 7:03 PM

#

you could do that, idk, or just leave it as-is

#

i did check that increasing epochs does improve performance a bit but i think this is already a tradeoff that jarrett has decided on

polar maple Apr 10, 2025, 7:22 PM

#

@unique salmon on users 501-1000, looks like theres an actual pattern

unique salmon Apr 10, 2025, 7:23 PM

#

polar maple @unique salmon on users 501-1000, looks like theres an actual pattern

https://tenor.com/view/pepe-nervous-sweating-concerned-monkas-gif-15154684

unique salmon Apr 10, 2025, 7:29 PM

#

polar maple was this done with the 5-way split on the 10k dataset? if so, does it improve te...

Model: FSRS-5
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3273±0.1525
FSRS-5 RMSE(bins) (mean±std): 0.0518±0.0332

Model: FSRS-5-recency
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-recency LogLoss (mean±std): 0.3256±0.1519
FSRS-5-recency RMSE(bins) (mean±std): 0.0493±0.0321

Model: FSRS-5-dev (optimizable decay)
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3220±0.1488
FSRS-5-dev RMSE(bins) (mean±std): 0.0466±0.0290

bold terrace Apr 10, 2025, 7:29 PM

#

If it's part of the cost function, can't it be optimized at the same time ?

polar maple Apr 10, 2025, 7:30 PM

#

unique salmon Model: FSRS-5 Total number of users: 9999 Total number of reviews: 349923850 Wei...

i think FSRS-5 and FSRS-5-recency here uses decay=0.5, what i want is a comparison between optimizable decay and decay=0.2

unique salmon Apr 10, 2025, 7:30 PM

#

polar maple i think FSRS-5 and FSRS-5-recency here uses decay=0.5, what i want is a comparis...

Well, can't do that

polar maple Apr 10, 2025, 7:30 PM

#

just for completeness here is the combined chart for users 1-1000

unique salmon Apr 10, 2025, 7:30 PM

#

I mean, I can, but it will take a ton of time

unique salmon Apr 10, 2025, 7:31 PM

#

bold terrace If it's part of the cost function, can't it be optimized at the same time ?

What do you mean?

polar maple Apr 10, 2025, 7:31 PM

#

unique salmon I mean, I can, but it will take a ton of time

yeah, but jarrett probably has the results somewhere since he did benchmarking for FSRS-6

#

and i cant be sure that optimizable decay is with all other parameters equal with FSRS-6

#

so doesn't hurt to just ask directly

unique salmon Apr 10, 2025, 7:32 PM

#

polar maple and i cant be sure that optimizable decay is with all other parameters equal wit...

Optimizable is pretty much guaranteed to be better than fixed

polar maple Apr 10, 2025, 7:32 PM

#

unless it overfits too much

unique salmon Apr 10, 2025, 7:32 PM

#

polar maple and i cant be sure that optimizable decay is with all other parameters equal wit...

This is all Jarrett posted

bold terrace Apr 10, 2025, 7:48 PM

#

unique salmon What do you mean?

About the decay needing to be trained before training the params: if the decay is part of the forgetting curve, can't it be optimized at the same time?

#

with gradient descent, taking the derivative of the forgetting curve with respect to the decay

unique salmon Apr 10, 2025, 7:51 PM

#

bold terrace About the decay needing to be trained before training the params: if the decay is...

Uh, it's complicated.
It needs to be fixed for the first 4 parameters, since they are estimated separately. For other parameters, as I said, they likely depend on the value of decay, so if you change decay, optimal parameters will no longer be optimal
BUT
In FSRS-5 the first 4 parameters are also optimized via gradient descent after they are estimated initially. So now the only problem is that parameters that are optimal at one value of decay are not optimal at the other. But running the optimizer for more epochs will likely solve it. After each epoch the change of the decay parameter will be smaller and smaller

#

If you want to ask "If we can optimize decay, why are we even bothering with fixed decay?" - I have no idea 🤣

#

Jarrett just decided to use fixed decay for...reasons that I don't know

bold terrace Apr 10, 2025, 7:53 PM

#

Sure, if the decay get optimized it might/will change the value of those 4, but if I remember correctly the few lessons I did with gradient descent, you do the derivative of the cost function for every parameter, and you "glide the slope" of all those dimensions until you reach a minimum

#

so you would glide that bias and those 4 parameters, leading you to the point they would balance themselves out ?
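
A toy version of "gliding down the slope" for stability and decay together. Everything here is an assumption made for illustration: a single stability instead of the full FSRS parameter set, synthetic noise-free targets, numerical gradients instead of autograd, and decay clamped to the range -0.7 to -0.1 (the 0.1-0.7 range from this thread, written as a negative exponent). It is not the fsrs-optimizer training loop.

```python
import math


def curve(t: float, stability: float, decay: float) -> float:
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t / stability) ** decay


# Synthetic "observed" recall rates generated from a known curve (S=5, decay=-0.3).
INTERVALS = [1, 3, 7, 14, 30, 60, 120]
TARGETS = [curve(t, 5.0, -0.3) for t in INTERVALS]


def loss(params: list[float]) -> float:
    """Mean cross-entropy between target recall rates and predicted recall."""
    log_s, decay = params
    total = 0.0
    for t, p in zip(INTERVALS, TARGETS):
        q = curve(t, math.exp(log_s), decay)
        total -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return total / len(INTERVALS)


def numeric_grad(params: list[float], eps: float = 1e-6) -> list[float]:
    """Central-difference gradient of the loss with respect to every parameter."""
    grad = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi) - loss(lo)) / (2 * eps))
    return grad


def fit(params: list[float], steps: int = 3000, lr: float = 0.05) -> list[float]:
    """Plain gradient descent on (log S, decay) at the same time."""
    params = list(params)
    for _ in range(steps):
        grad = numeric_grad(params)
        params = [p - lr * g for p, g in zip(params, grad)]
        params[1] = min(-0.1, max(-0.7, params[1]))  # keep decay in the assumed range
    return params


start = [0.0, -0.5]  # S = 1 day, decay = -0.5
fitted = fit(start)
print("loss before:", loss(start), "loss after:", loss(fitted))
print("S:", math.exp(fitted[0]), "decay:", fitted[1])
```

Both parameters move on every step, so decay and stability trade off against each other as the loss goes down. Whether this reaches the same place as a two-pass scheme in a fixed number of epochs is exactly the open question in the discussion.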

unique salmon Apr 10, 2025, 7:53 PM

#

unique salmon Jarrett just decided to use fixed decay for...reasons that I don't know

And I somehow forgot about his experiments with optimizable decay

#

And now I'm like "Wait, why are we doing fixed decay again?"

#

Idk, maybe we all collectively had a brain fart

unique salmon Apr 10, 2025, 7:54 PM

#

bold terrace Sure, if the decay get optimized it might/will change the value of those 4, but ...

Pretty much

bold terrace Apr 10, 2025, 7:55 PM

#

But maybe there are reasons I don't see why it is better fixed

unique salmon Apr 10, 2025, 7:56 PM

#

I don't either

#

It improves metrics and is more adaptive than fixed (well, duh, obviously)

unique salmon Apr 10, 2025, 7:57 PM

#

polar maple unless it overfits too much

See #1282005522513530952 message
It doesn't, or at least not enough to become a problem

#

It wouldn't improve log-loss and RMSE if it overfitted a lot

#

Though I still think we should choose a reasonable range for it, not just (0.01, 1)

#

According to this graph, I'd say 0.1-0.7 is reasonable
Green is just me trying to eyeball the best fit

polar maple Apr 10, 2025, 7:59 PM

#

unique salmon It wouldn't improve log-loss and RMSE if it overfitted a lot

but we cannot compare learnable decay to decay = 0.5 when decay = 0.2 does way better on the metrics already

small crow Apr 10, 2025, 8:00 PM

#

Why did this card go from 48% to 100% after the manual reschedule? It's a card that I got right, so I feel like its difficulty is actually closer to the initial 50%, so I just reset the card as a knee-jerk reaction and rated it again.

polar maple Apr 10, 2025, 8:00 PM

#

unique salmon According to this graph, I'd say 0.1-0.7 is reasonable Green is just me trying t...

jarrett already posted a distribution of learnt decay values, isn't that one not bounded to (0.01, 2)?

bold terrace Apr 10, 2025, 8:03 PM

#

Thing that always bugs me a bit is, sure we get a good decay that should fit most people with this, but just like default params won't be ideal for an individual, I guess decay should also behave the same way, shouldn't it?

unique salmon Apr 10, 2025, 8:15 PM

#

polar maple jarrett already posted a distribution of learnt decay values, isn't that one not...

Yeah, my bad, it's (0.01, 1) in the code. I think we should change it to (0.1, 0.7)

unique salmon Apr 10, 2025, 8:15 PM

#

small crow Why did this card go from 48% to 100% after the manual reschedule? It's a card t...

Have you optimized parameters inbetween these two reviews?

#

Though, 100% D with no lapses is strange either way

unique salmon Apr 10, 2025, 8:17 PM

#

polar maple but we cannot compare learnable decay to decay = 0.5 when decay = 0.2 does way b...

Ah. Yeah, we're just waiting for Jarrett to post full benchmark results

unique salmon Apr 10, 2025, 8:19 PM

#

bold terrace Thing that always bugs me a bit is, sure we get a good decay that should fit mos...

Are you saying that optimizable decay is better than fixed? If so, then yes, I think so too. Though we'll have to see benchmark results to be sure

small crow Apr 10, 2025, 8:22 PM

#

unique salmon Have you optimized parameters inbetween these two reviews?

I had, but I've not been using the "reschedule cards on change" option in the deck options for that card's deck (that automagically puts those there I think), instead using FSRS helper to do so and then catching up on lapses. Is there a way to look for other cards that were rescheduled that day with that manual reschedule to see if they've also had that happen to them?

bold terrace Apr 10, 2025, 8:24 PM

#

small crow Why did this card go from 48% to 100% after the manual reschedule? It's a card t...

You can share your parameters, and also screenshot the graph of "Card Difficulty" in the stats view

#

But my gut feeling is that you're very new to Anki, and I guess the optimizer didn't really bother assigning different difficulties and just put everything in one big basket

small crow Apr 10, 2025, 8:26 PM

#

bold terrace You can share your parameters, and also screenshot the graph of "Card Difficulty...

how do you get to the card difficulty graph of a single card? I know how to look at it for decks, but didn't know it was possible to do so for a single card.

bold terrace Apr 10, 2025, 8:26 PM

#

for the deck I mean

#

here the info is pretty simple, the card never failed, was ~50% D before, it's 100% now

#

but then I would expect all your cards to be at 100%

small crow Apr 10, 2025, 8:29 PM

#

FSRS5 parameters with DR@92%:
0.9842, 8.0109, 41.2131, 100.0000, 7.3324, 0.5695, 1.7045, 0.0010, 1.3330, 0.3374, 0.8130, 1.9629, 0.1152, 0.3734, 2.2973, 0.1129, 3.0047, 0.4220, 0.7896

with 747 cards in the deck

bold terrace Apr 10, 2025, 8:31 PM

#

Hmm

small crow Apr 10, 2025, 8:31 PM

#

yeah, i don't get it

bold terrace Apr 10, 2025, 8:31 PM

#

Yeah no having 100% D doesn't make sense for that card

small crow Apr 10, 2025, 8:32 PM

#

bold terrace Yeah no having 100% D doesn't make sense for that card

I think it has to do with some shenanigans I did, described in this thread
https://discord.com/channels/368267295601983490/1350593441654116463

bold terrace Apr 10, 2025, 8:33 PM

#

Did you try to reschedule it with the FSRS plugin ?

small crow Apr 10, 2025, 8:33 PM

#

yeah, it didn't budge the difficulty, interval, or due date when trying to reschedule with the right-click context menu or the reschedule all cards option

bold terrace Apr 10, 2025, 8:34 PM

#

With your params, the 27d interval / 48% D seem to be the correct values

#

You can't right click -> Forget -> set Due Date 0 and re-review it ?

small crow Apr 10, 2025, 8:35 PM

#

which is what made me super curious about why it jumped to 100% difficulty. you don't know of a way to specifically search for manually rescheduled cards on that day to see if it happened to some others, do you?

bold terrace Apr 10, 2025, 8:35 PM

#

hmmmmm

#

no but I would maybe just search for all cards with high D and no failed reviews

#

-rated:180:1 prop:d>0.99

#

something like that

small crow Apr 10, 2025, 8:37 PM

#

bold terrace You can't right click -> Forget -> set Due Date 0 and re-review it ?

this was my quick solution, yeah. But then i got worried that there's 50 other cards with a higher than intended difficulty because of whatever caused this to be 100% difficulty

#

ah, not the only card

bold terrace Apr 10, 2025, 8:40 PM

#

did you try the right click -> recompute memory state ?

#

or did you just do the deck -> reschedule

small crow Apr 10, 2025, 8:41 PM

#

that's the same as "update memory state and reschedule" right?

bold terrace Apr 10, 2025, 8:41 PM

#

I'm not entirely sure

small crow Apr 10, 2025, 8:41 PM

#

yeah, no dice.

bold terrace Apr 10, 2025, 8:41 PM

#

I know in the past when some bugs happened, the memory state had to be refreshed

unique salmon Apr 10, 2025, 8:42 PM

#

Out of curiosity, try changing the last digit of any parameter, like from 1.2345 to 1.2346, just to recalculate memory states, and check that card again

small crow Apr 10, 2025, 8:43 PM

#

I'm like 99% sure it's because these cards had a review history before 3/15, then got reset through the Cards->Reset function in the card browser as the cards i found that have this issue are like that, lol.

small crow Apr 10, 2025, 8:43 PM

#

unique salmon Out of curiosity, try changing the last digit of any parameter, like from 1.234...

i'm not sure if this'll change anything because i've been optimizing parameters at least once a month since then, but lemme try it

#

also no dice :c

unique salmon Apr 10, 2025, 8:45 PM

#

small crow also no dice :c

https://forums.ankiweb.net/c/anki/fsrs/19
Make an issue on the forum

Anki Forums

FSRS

A place to discuss the Free Spaced Repetition Scheduler (FSRS), introduced in Anki 23.10.

small crow Apr 10, 2025, 8:45 PM

#

I did use the "reschedule on change" option, lemme try something else.

bold terrace Apr 10, 2025, 8:47 PM

#

Resetting repetitions/lapses could maybe help, but not sure

#

really sounds like a very tricky/specific issue

small crow Apr 10, 2025, 8:49 PM

#

unique salmon https://forums.ankiweb.net/c/anki/fsrs/19 Make an issue on the forum

should i export the deck and attach it to the post? What other stuff should i put in there, other than some screenshots, the parameters, maybe a link to the shenanigans I did that day

small crow Apr 10, 2025, 9:18 PM

#

oh LOL it won't let me include links in my post ㅠㅠ

unique salmon Apr 10, 2025, 9:19 PM

#

small crow should i export the deck and attach it to the post? What other stuff should i pu...

Screenshots, parameters (in text) and a short description

small crow Apr 10, 2025, 9:28 PM

#

Also, are the intervals on those cards becoming lower even while passing reviews...supposed to happen? I just noticed after making the post, lol.

unique salmon Apr 10, 2025, 9:33 PM

#

small crow Also, are the intervals on those cards becoming lower even while passing reviews...

If you have changed parameters inbetween reviews, yes, it could happen

unique salmon Apr 10, 2025, 10:00 PM

#

@polar maple I'm benchmarking decay=-0.5 vs decay=-0.2 vs optimizable decay within the (0.1, 0.7) range, and the optimizable one is like baaarely better. There is a clear difference between decay=-0.5 vs decay=-0.2, but not much difference between decay=-0.2 vs opt. decay.

Model: FSRS-6 (opt decay)
Total number of users: 102
Total number of reviews: 2123285
Weighted average by reviews:
FSRS-6 LogLoss (mean±std): 0.3731±0.1780
FSRS-6 RMSE(bins) (mean±std): 0.0665±0.0355

Model: FSRS-5 (decay=-0.2)
Total number of users: 102
Total number of reviews: 2123285
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3733±0.1780
FSRS-5 RMSE(bins) (mean±std): 0.0666±0.0355

Model: FSRS-5 (decay=-0.5)
Total number of users: 102
Total number of reviews: 2123285
Weighted average by reviews:
FSRS-5 d=-0.5 LogLoss (mean±std): 0.3780±0.1802
FSRS-5 d=-0.5 RMSE(bins) (mean±std): 0.0691±0.0346

I've only done 100 users so far, so I will report back tomorrow. But weirdly enough, it seems like the fixed one is just too good for some reason. Then again, maybe the optimizable one needs more epochs.

polar maple Apr 10, 2025, 10:05 PM

#

unique salmon @polar maple I'm benchmarking decay=-0.5 vs decay=-0.2 vs optimizable d...

i wonder if optimizable decay just needs some heavy regularization to reduce overfitting

#

it could be worth having a modified version of the script that also saves the training loss for each user, to see whether opt decay fits the training data much better than decay=-0.2 or not
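
A rough sketch of that overfitting check, assuming each user's data yields a (labels, predicted probabilities) pair for the training part and another for the test part. The helper names and the numbers are invented for illustration; the real benchmark script is organised differently.

```python
import math


def log_loss(labels, probs, eps=1e-7):
    """Mean binary cross-entropy; labels are 0/1, probs are predicted recall."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_test_gap(train, test):
    """Each argument is a (labels, probs) pair; returns (train, test, gap)."""
    train_loss = log_loss(*train)
    test_loss = log_loss(*test)
    return train_loss, test_loss, test_loss - train_loss


if __name__ == "__main__":
    # Invented numbers for one hypothetical user, purely for illustration.
    train = ([1, 1, 0, 1], [0.9, 0.8, 0.3, 0.95])
    test = ([1, 0, 1, 1], [0.9, 0.7, 0.6, 0.85])
    print(train_test_gap(train, test))
```

If the optimizable-decay model shows a clearly lower training loss than the fixed-decay model but about the same test loss, that would point towards overfitting.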

small crow Apr 10, 2025, 10:40 PM

#

unique salmon If you have changed parameters in between reviews, yes, it could happen

while true, i really think something is weird with the cards and how the math is mathing on them. using the parameters from 2025-03-14 that I have, a 333 should have an interval of 36 days, while my most current set of parameters from 2025-04-09 says it should be 62 days, not the 8 days shown on the first card in this message here:
#1282005522513530952 message

I kinda actually wanna see about setting my PC in the future to see how the next rating affects the difficulties and intervals. let's time travel~

but first: BACKUP TIME

edit: i was wrong, after two reviews into May, the intervals started decreasing. so it really just is that the cards now have a difficulty of 100% lol, I think it'd be faster if I were to just reset them and go from there

oh, I found all the cards with resched:27 -resched<27 lol, and a lot of them are exhibiting this behavior. Is there a way to remove the manual review/scheduling data that gives it a 1, to see if that's messing with it? thanks

quasi shadow Apr 11, 2025, 2:26 AM

#

polar maple but where are my confidence intervals?

😂 I forgot to release it.

quasi shadow Apr 11, 2025, 3:46 AM

#

polar maple Apr 11, 2025, 3:53 AM

#

nice, i think it needs more samples but i dont think there is an obvious trend

quasi shadow Apr 11, 2025, 4:22 AM

#

#

FSRS-5 LogLoss: 0.4939, FSRS-6 LogLoss: 0.5114

#

😅

#

@polar maple

quasi shadow Apr 11, 2025, 5:36 AM

#

The retention data has been added into https://github.com/open-spaced-repetition/Anki-button-usage/blob/main/button_usage.jsonl

quasi shadow Apr 11, 2025, 6:07 AM

#

I'm benchmarking optimizable decay.

#

polar maple Apr 11, 2025, 6:12 AM

#

quasi shadow FSRS-5 LogLoss: 0.4939, FSRS-6 LogLoss: 0.5114

is this with the 1/n scaling that expertium described? i guess the outlier affects it too much, the n=1 bucket has a huge effect on the outcome

quasi shadow Apr 11, 2025, 6:14 AM

#

polar maple is this with the 1/n scaling that expertium described? i guess the outlier affec...

yep

#

Btw, the optimizable decay reduces RMSE(bins) by ~5% compared with decay = -0.2

#

#

And it also performs well in the low-R region.

polar maple Apr 11, 2025, 6:16 AM

#

looks promising!

bold terrace Apr 11, 2025, 6:18 AM

#

Nice! Also, if I'm not wrong, since the fixed decay was computed on the same training set as the optimizable one, it's normal that there is not much difference in those numbers. The big benefit would be for users not matching the training set, right 🙂?

bold terrace Apr 11, 2025, 6:21 AM

#

unique salmon @polar maple I'm benchmarking decay=-0.5 vs decay=-0.2 vs optimizable d...

If I take those numbers for example, it's normal that the optimized decay doesn't outperform the fixed one on the same data the fixed one was optimized on, since the fixed one is an optimized one in the first place (it just won't change anymore).

quasi shadow Apr 11, 2025, 6:22 AM

#

bold terrace Apr 11, 2025, 6:23 AM

#

So to evaluate it, you should probably compare a fixed decay trained on the 10K users but applied to different users, see how much the performance decreases, and compare that with the score from their own optimized decay

quasi shadow Apr 11, 2025, 6:23 AM

#

FSRS-6-dev is FSRS-6 with optimizable decay.

#

😂 Fine. More work for me to implement it in Rust.

polar maple Apr 11, 2025, 6:29 AM

#

bold terrace So to evaluate it, you should probably compare a fixed decay trained on the 10K ...

if we only have this 10k dataset then what you want is something like, we find a fixed decay by only looking at the first 5k users, then evaluate this fixed decay on the remaining 5k users and also try learnable decay on the remaining 5k users as well? yeah it's true that we may be overfitting the dataset with certain hyperparameter and algorithm choices, but i think this is swept under the rug as being arguably not influential
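
The split described above could look roughly like this. It is only a sketch: `loss(user, decay)` is a hypothetical callable standing in for "train and evaluate FSRS for this user with this decay", the grid search replaces a real optimizer, and decay is written as a positive magnitude.

```python
from statistics import mean


def best_fixed_decay(users, grid, loss):
    """Grid value that minimises the mean loss over a group of users."""
    return min(grid, key=lambda d: mean(loss(u, d) for u in users))


def holdout_comparison(users, grid, loss):
    """Fit one fixed decay on the first half, compare on the second half."""
    half = len(users) // 2
    fit_users, eval_users = users[:half], users[half:]
    fixed = best_fixed_decay(fit_users, grid, loss)
    fixed_loss = mean(loss(u, fixed) for u in eval_users)
    # Per-user decay: each held-out user gets their own best grid value.
    per_user_loss = mean(min(loss(u, d) for d in grid) for u in eval_users)
    return fixed, fixed_loss, per_user_loss


if __name__ == "__main__":
    # Toy stand-in: each "user" is just their own ideal decay magnitude.
    users = [0.1, 0.2, 0.3, 0.5, 0.15, 0.25, 0.4, 0.6]
    grid = [i / 10 for i in range(1, 8)]
    toy_loss = lambda u, d: (d - u) ** 2
    print(holdout_comparison(users, grid, toy_loss))
```

In this toy setup the per-user decay can never lose, because it is selected on the same data it is scored on. A real comparison would also need a train/test split inside each user's review history.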

bold terrace Apr 11, 2025, 6:31 AM

#

Sorry 😦 But yeah, having a training/test set is yet another step @polar maple, but here I think it's even more unfair for the "Optimized vs Fixed" decay comparison, since here Fixed=Optimized(10k)... So optimizing the decay on the same set will of course just get you the same results

#

But I agree, ideally even the optimization for 1 user should be done on a training set, and then evaluated on a test set

unique salmon Apr 11, 2025, 8:49 AM

#

bold terrace Sorry 😦 But yeah, having training/test set is even another step @polar maple...

An optimizable decay should still perform better than fixed decay because it can adapt to each user individually rather than only being good on average

bold terrace Apr 11, 2025, 8:50 AM

#

unique salmon An optimizable decay should still perform better than fixed decay because it can...

But is that how it's evaluated in Jarrett's graph? I mean, isn't it more like "I optimize it once, and then I evaluate the full set on it"?

unique salmon Apr 11, 2025, 8:51 AM

#

Speaking of which
Model: FSRS-6
Total number of users: 774
Total number of reviews: 26313110
Weighted average by reviews:
FSRS-6 LogLoss (mean±std): 0.3383±0.1601
FSRS-6 RMSE(bins) (mean±std): 0.0511±0.0342

Model: FSRS-5
Total number of users: 774
Total number of reviews: 26313110
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3384±0.1599
FSRS-5 RMSE(bins) (mean±std): 0.0512±0.0341

According to my tests opt. decay is pretty much the same as decay=-0.2 🤔
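
For reference, a "weighted average by reviews" line like the ones above can be reproduced with a sketch along these lines. It assumes one metric value and one review count per user and a population-style weighted standard deviation; the benchmark's exact formula may differ.

```python
import math


def weighted_mean_std(values, weights):
    """Mean and standard deviation of values, weighted by weights."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Invented per-user log losses and review counts, for illustration only.
    log_losses = [0.30, 0.40]
    reviews = [3000, 1000]
    m, s = weighted_mean_std(log_losses, reviews)
    print(f"LogLoss (mean±std): {m:.4f}±{s:.4f}")
```

Users with many reviews dominate this average, so a small difference between two models can hide larger per-user differences in either direction.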

bold terrace Apr 11, 2025, 8:51 AM

#

So in your test, you optimized a different decay for every user?

#

Or maybe I just misunderstood what you said 🙂

unique salmon Apr 11, 2025, 8:52 AM

#

bold terrace So in your test, you optimized a different decay for every user?

For every user individually, yes

unique salmon Apr 11, 2025, 8:53 AM

#

quasi shadow

Let's combine the 5 leftmost bins into one, so that it has a larger sample size
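
Merging bins like that could be done as in the sketch below, assuming each bin is stored as a (sample count, mean value) pair ordered from left to right. The representation and the function name are assumptions, not the plotting code used in the thread.

```python
def merge_leftmost(bins, k):
    """Combine the first k bins into one, weighting each mean by its count.

    bins: list of (count, mean_value) tuples ordered left to right.
    """
    head, tail = bins[:k], bins[k:]
    count = sum(c for c, _ in head)
    value = sum(c * v for c, v in head) / count
    return [(count, value)] + tail


if __name__ == "__main__":
    # Invented bins: a tiny n=1 bin with an outlier value, then larger bins.
    bins = [(1, 0.5), (3, 0.1), (10, 0.2)]
    print(merge_leftmost(bins, 2))
```

Weighting by count keeps a single-sample bin from dominating the merged value.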

unique salmon Apr 11, 2025, 8:54 AM

#

unique salmon Speaking of which Model: FSRS-6 Total number of users: 774 Total number of revie...

@quasi shadow show me your code

#

Idk how you get better results

#

Oh wait, maybe my implementation is bugged

#

How do I do this properly? 😭

#

class FSRS(nn.Module)

quasi shadow Apr 11, 2025, 9:09 AM

#

unique salmon @quasi shadow show me your code

https://github.com/open-spaced-repetition/fsrs-optimizer/pull/176

GitHub

Expt/trainable forgetting decay by L-M-Sherlock · Pull Request #17...

unique salmon Apr 11, 2025, 9:10 AM

#

unique salmon How do I do this properly? 😭

How do I properly do this part?

#

In other.py

#

I guess it doesn't matter since you will benchmark opt. decay on your own anyway, but still

#

-self.model.w[19] doesn't work, 'FSRS5' object has no attribute 'model'

quasi shadow Apr 11, 2025, 9:12 AM

#

The self is model.

#

So you don't need to .model again, I guess.
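
A torch-free sketch of the attribute mix-up, with plain Python classes standing in for the real `nn.Module` code. The class names are hypothetical, and index 19 for the decay weight simply follows the message above.

```python
class FSRS5Sketch:
    """Stand-in for a model class that owns its own weights."""

    def __init__(self, w):
        self.w = list(w)  # in the real code this would be a trainable tensor

    def forgetting_curve(self, t, s):
        decay = -self.w[19]  # inside the model, `self` already is the model
        factor = 0.9 ** (1 / decay) - 1
        return (1 + factor * t / s) ** decay


class TrainerSketch:
    """Stand-in for an outer object that holds a reference to the model."""

    def __init__(self, model):
        self.model = model

    def current_decay(self):
        return -self.model.w[19]  # `.model` is only needed from the outside


if __name__ == "__main__":
    model = FSRS5Sketch([0.0] * 19 + [0.2])
    print(model.forgetting_curve(1.0, 1.0))
    print(TrainerSketch(model).current_decay())
    print(hasattr(model, "model"))  # the model has no `.model` of its own
```

Writing `self.model.w[19]` inside a method of the model class fails with the AttributeError quoted above, because only the outer object has a `model` attribute.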

unique salmon Apr 11, 2025, 9:13 AM

#

The thing is that optimization goes fine and different users have different decay, but then the results are pretty much identical to decay=-0.2. So I'm guessing that I screwed up outside of the optimization

quasi shadow Apr 11, 2025, 9:13 AM

#

I cannot review your code if you don't use GitHub and git (

quasi shadow Apr 11, 2025, 9:22 AM

#

unique salmon Let's combine the 5 leftmost bins into one, so that it has a larger sample size

#

Like this?

unique salmon Apr 11, 2025, 9:23 AM

#

Yes

#

Interesting. So FSRS-6 performs better across all retentions, except for super low ones

#

The graph is pretty awkward though

#

There are 3 graphs, but xticks are on only one. And without a horizontal line it's kinda hard to tell where 0 difference is

#

Having a line like this would be good, but the line is unrelated to the curve, it's related to the differences

#

Ah man, this is awkward 😅

#

@bold terrace ok, so this stuff is kinda hard to read, so TLDR: FSRS-6 with its "A card with S=1 day can never reach R=10%" super-duper-flat curve...somehow performs better than FSRS-5 even for people with retentions like 40% and 50%

#

It performs better almost universally, except for people with retention around 20%

#

So...flat curves are just better
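
To put a number on the "S=1 day can never reach R=10%" remark, the power-law curve can be inverted to get the time needed to fall to a target R. This is a sketch assuming the same form, R = (1 + factor * t / S) ^ decay with R = 90% at t = S; the function name is made up.

```python
def days_to_reach(r_target: float, s: float, decay: float) -> float:
    """Days until retrievability falls to r_target for stability s.

    Solves r_target = (1 + factor * t / s) ** decay for t, with decay < 0.
    """
    factor = 0.9 ** (1.0 / decay) - 1.0
    return s * (r_target ** (1.0 / decay) - 1.0) / factor


if __name__ == "__main__":
    for decay in (-0.5, -0.2):
        days = days_to_reach(0.10, 1.0, decay)
        print(f"decay={decay}: R=10% after {days:,.0f} days")
```

With decay=-0.5 a card with S=1 day reaches R=10% after roughly 422 days. With decay=-0.2 it takes well over 100,000 days, so "never" is true for practical purposes.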

bold terrace Apr 11, 2025, 9:28 AM

#

unique salmon Ah man, this is awkward 😅

Interesting

#

Probably some external influence on how people use Anki

unique salmon Apr 11, 2025, 9:30 AM

#

unique salmon Apr 11, 2025, 9:30 AM

#

bold terrace Probably some external influence on how people use Anki

What if this is FSRS's way of compensating for external reviews?

#

That's the only explanation that I see

bold terrace Apr 11, 2025, 9:32 AM

#

unique salmon What if this is FSRS's way of compensating for external reviews?

Yes that was also what I was thinking. Probably things never drop under 20% because the people still have a "baseline" knowledge coming from outside anki

unique salmon Apr 11, 2025, 9:32 AM

#

Yep

bold terrace Apr 11, 2025, 9:32 AM

#

and people with lower R might be people not really doing anything else outside anki

#

It would be interesting (but difficult, I guess?) to do some clustering on users, reviews, etc. to see if we can also "profile" things

#

For example, by splitting my deck into "Normal D" and "High D", I discovered the default FSRS parameters are actually quite good for my Normal D!

#

And only my High D collection has a worse logloss/RMSE if not optimized

quasi shadow Apr 11, 2025, 9:50 AM

#

#FSRS Megathread