#FSRS Megathread


cosmic hedge
#

both 1 and 50 years show 0.71 as well

unique salmon
#

Can't argue with that
Alright, so here's what I want you to do:

  1. Filter out first reviews and same-day reviews
  2. Calculate R for each review and record both R and duration of the review
  3. Plot duration as a function of R, obtaining 4 graphs like these
  4. Fit a linear function to the data and save parameters a and b (time = a·R + b)
  5. Instead of using fixed time per review, like "7 seconds per Good" for example, use time = a·R + b
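A minimal sketch of the five steps above, for one answer button. The record fields (`is_first`, `elapsed_days`, `R`, `seconds`) and the sample numbers are made up for illustration; they are not the simulator's actual data format.

```python
def keep_review(review):
    """Step 1: drop first reviews and same-day reviews."""
    return not review["is_first"] and review["elapsed_days"] >= 1


def fit_time_vs_r(reviews):
    """Steps 2-4: least-squares fit of duration = a * R + b."""
    points = [(r["R"], r["seconds"]) for r in reviews if keep_review(r)]
    n = len(points)
    mean_r = sum(p[0] for p in points) / n
    mean_t = sum(p[1] for p in points) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in points)
    var = sum((r - mean_r) ** 2 for r, _ in points)
    a = cov / var
    b = mean_t - a * mean_r
    return a, b


def review_seconds(r, a, b):
    """Step 5: simulated time for a review at retrievability r."""
    return a * r + b


# Made-up reviews for one answer button (values are illustrative only).
reviews = [
    {"is_first": True, "elapsed_days": 0, "R": 1.00, "seconds": 30.0},
    {"is_first": False, "elapsed_days": 0, "R": 0.99, "seconds": 5.0},
    {"is_first": False, "elapsed_days": 3, "R": 0.95, "seconds": 7.0},
    {"is_first": False, "elapsed_days": 10, "R": 0.90, "seconds": 8.0},
    {"is_first": False, "elapsed_days": 30, "R": 0.80, "seconds": 10.0},
    {"is_first": False, "elapsed_days": 60, "R": 0.70, "seconds": 12.0},
]
a, b = fit_time_vs_r(reviews)
```

The same fit would be repeated once per answer button to get the four (a, b) pairs.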
#

Jarrett didn't want to do this because a and b will either have to be calculated during optimization and then stored somewhere, or recalculated every time we run the simulator

#

So it's kind of a pain in the ass

#

But it could significantly improve the accuracy of the simulator, and potentially fix CMRR
(ok, well, maybe not significantly, but still)

unique salmon
#

🤔

#

Well, screw the integral

#

It's time=f(R) time!

bold terrace
#

I think for low decay the CMRR is doomed anyway

unique salmon
#

We can do time=f(R) AND use sum(R*S) as well

bold terrace
#

I mean, since the forgetting curve will "never" go to very low R, well, just adding more cards (and never reviewing them) is more time efficient than having a "healthy R"

unique salmon
bold terrace
#

Maybe a "Target R" ?

#

For example, if the user's "Target R" is 85%, maybe the optimal DR could be 70% or 80% based on his decay

unique salmon
#

?

bold terrace
#

(Since all cards will be [DR, 100], but without a homogeneous distribution between those 2 points)

#

For example if the forgetting curve was a pure linear line, DR=90% would give me an average R of 95% I guess

unique salmon
#

So you want a DR that is not DR...got it

#

(I didn't get it)

bold terrace
#

Hmmm let me think of a way to express it 😆

#

The user wants an average R of 90%, let's say.

#

But if you put DR=90%

#

the average R will be more like 95%

#

(let's assume that between 100% and 90% it just declines linearly)

#

The DR is more like the "minimum R"

#

It's the average R of "today's review"

#

But in the grand scheme of the whole collection, it's the minimum

#

Is it more clear 😆 ?

#

Look, my DR is 90%

#

but only the cards due today will have an average around 90%

#

most of them are just between 90 and 100

#

So now let's take a scenario where going from 100 to 90 takes 1 day, but going from 90 to 70 takes 365 days. and from 70 to 60, it takes 3650 days.
If the user wants an average retention of 70% .... Weeeell since most cards will be stuck at 70% R for years, you could ask for a DR of 68% for ex

#

So the DR for a card to be "on average at the target R" would have to take into account: Current R, S, D, decay 😆

#

The whole shebang

unique salmon
#

Sooo how is it related to finding the optimal workload/knowledge ratio?

quasi shadow
bold terrace
#

It's related in that, right now, it might be normal that the optimal workload/knowledge is always the lowest R possible 😆

unique salmon
bold terrace
#

Because you can't beat a non-zero R with zero reviews due 😆

#

So let's add billions of cards that you will never review again

#

if they have >0 R, you'll have a better time than trying to make them float above any "normal DR"

#

So you need to introduce something like : "Ok user, what do you want as a real average R for your whole collection" ?

#

If he says 90%, now you know that all those billions of cards with 10-20% R have a huge cost compared to his expectation

#

So automatically, you'll have to find a DR as low as possible, while keeping the average collection R as close as possible to his expectation

#

Typically, if 100%->90% is linear, a Target Collection R of 95% would translate into a DR of 90%, for ex

#

Target Collection of 70%, might lead to a DR of 68% (because most cards will drop from 100 to 75 very quickly compared to how long they will take around 68-75)
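A rough sketch of the Due-DR vs. average-R relationship described above. It assumes an FSRS-style power forgetting curve scaled so that R = 90% when t = S, with `decay` as a positive number (0.5 and 0.1 are example values), a card reviewed exactly when due, and no lapses. All function names are hypothetical.

```python
def retrievability(t, s, decay):
    """Assumed FSRS-style power curve, scaled so that R(s) = 0.9."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return (1.0 + factor * t / s) ** (-decay)


def interval_for(dr, s, decay):
    """Days until R falls to dr (inverse of the curve above)."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return s / factor * (dr ** (-1.0 / decay) - 1.0)


def average_r(dr, decay, s=1.0, steps=10_000):
    """Time-averaged R between a review and the next due date (midpoint rule)."""
    t_end = interval_for(dr, s, decay)
    h = t_end / steps
    return sum(retrievability((i + 0.5) * h, s, decay) for i in range(steps)) / steps


def linear_average_r(dr):
    """The chat's simplification: R falls linearly from 100% to dr."""
    return (1.0 + dr) / 2.0


for decay in (0.5, 0.1):
    print(decay, round(average_r(0.9, decay), 3), round(average_r(0.7, decay), 3))
```

Under these assumptions the average always sits between DR and the linear estimate, and it moves closer to DR as the decay gets lower, because the card spends most of the interval on the flat part of the curve.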

#

Problem right now is that a DR of 20% truly means "Never let me review it again" with a decay of 0.1
Soooooo of course the optimizer will want to have a shitton of cards with the lowest possible DR, there's no penalty for having it

#

Introduce a penalty from diverging from user expectation (The "DR" for the whole collection), and CMRR will compute for you what would be the most optimal DR for today's review

#

(the tl;dr in bold 😆 )

#

What is done with S right now is just a way to introduce some kind of penalty: "If you don't know it well, you won't have a good score". But when you try to optimize knowledge/workload, and there is a direct way to make workload ~= 0, well, you will need to put S on amphetamines to hope to have any kind of effect on that

#

You're fighting with a linear function against an exponential reduction of workload; that won't be sufficient if you don't add something that compensates for it. Having a "cost" that explodes when R gets very low can compensate for it

bold terrace
#

If it's still not clear, I'll find a way I promise 😆

unique salmon
#

Maybe it should be something like knowledge/(max(workload, minimum workload)), but idk how "minimum workload" would be calculated

bold terrace
#

I'm not entirely sure it would help because I think right now the workload might be extremely low

#

It would be interesting to have, for that curve, the value of the Knowledge, and the one of the Workload

#

I'm wondering if under 50%, basically the workload is like 0

#

"1 review every 10 years"

#

And truth is, CMRR would still be right to answer then : Just put DR=1%

#

Even if you'd recall 60% of everything, your workload would be almost 0

unique salmon
#

Like, idk, 100 cards/max(30 minutes, 1 minute)
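A tiny sketch of that floored ratio. The function name and the numbers are illustrative only; how a "minimum workload" would actually be chosen is left open in the discussion.

```python
def floored_efficiency(knowledge, workload_minutes, min_workload_minutes):
    """Knowledge per minute, with the workload never counted below a floor."""
    return knowledge / max(workload_minutes, min_workload_minutes)


# The example from the chat: 100 cards, 30 minutes of work, 1 minute floor.
normal_case = floored_efficiency(100, 30, 1)

# Near-zero workload: the floor takes over, so the ratio can no longer
# grow without bound just because the workload shrinks.
near_zero_case = floored_efficiency(60, 0.01, 1)
```

Once the workload is below the floor, the score depends only on knowledge, so the shape of the optimum depends heavily on where the floor is set.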

bold terrace
#

Well I think if it's true the user could have 0 workload but still manage to have his expected Collection-DR, why not recommend him a very very low DR, right

#

I think that's maybe the missing part : Not differentiating Collection-DR and Due-DR

#

Due-DR might be 80% and you'd get a Collection-DR of 90%

unique salmon
#

what

bold terrace
#

And the optimal way to set your Due-DR would be based on your expected Collection-DR

#

hehehehehe

#

Still not clear 😄 ?

bold terrace
#

The Due-DR=90% which leads to a Collection-R of 95%, is it clear 🙂 ?

unique salmon
#

Then you're just re-defining DR without achieving anything

#

We're trying to find optimal DR, not redefine DR for no reason

#

You don't achieve anything by redefining it like this

bold terrace
#

But optimal Due-DR based on what 🙂 ?

unique salmon
#

Knowledge/workload

bold terrace
#

CMRR is working already perfectly fine for this, no ?

#

If the guy has a decay of 0.1

#

And it will take 1y to go from R=100% to 70%, and 100y from 70% to 1%

#

CMRR is perfectly right to advise you to set your DR to 1%
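To put rough numbers on the low-decay argument, here is a sketch that assumes an FSRS-style power forgetting curve scaled so that R = 90% at t = S, with `decay` as a positive number. The stability of 10 days and the function names are arbitrary choices for illustration.

```python
def days_until(r, stability, decay):
    """Days for R to fall to r under the assumed power curve."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return stability / factor * (r ** (-1.0 / decay) - 1.0)


def slowdown(decay, stability=10.0):
    """How many times longer 70% -> 1% takes than 100% -> 70%."""
    t70 = days_until(0.70, stability, decay)
    t01 = days_until(0.01, stability, decay)
    return (t01 - t70) / t70


for decay in (0.5, 0.2, 0.1):
    print(decay, round(days_until(0.70, 10.0, decay), 1), f"{slowdown(decay):.3g}")
```

Under this curve the ratio grows very quickly as decay shrinks, which is the sense in which a very low DR amounts to "never review again".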

unique salmon
bold terrace
#

Because it reduces workload, since by nature, cards won't go very low in terms of R

#

So by defining a "Target R" for the collection, CMRR would compute the best DR to have for due review 🙂

#

If he wants 95%, a DR=90% might work. If he wants 60% but his decay is super super super low, then maybe even putting a DR of 40% might be better 🙂

#

You need to understand the problem well before trying to find answers 😄

#

Right now you're just trying to find a function that gives you something you think would be better without interpreting the numbers

#

@cursive badge / @polar maple already suggested it, but feel free you two to tell me if it was not your interpretation 😆

polar maple
#

if i understand correctly, the user sets something like "time spent above R > 0.9" and CMRR optimizes for this?

#

if it's this then @unique salmon should like it since it now ties back to sum(S) formulas

bold terrace
#

Typically : For this exam, I need to know 90% of all those cards

polar maple
#

ok so the real review time would be like 85% or something

bold terrace
#

Setting a DR of 90%, would lead him to a ~95% R on collection level

bold terrace
#

I say Due-DR=90% leads to Collection-DR=95% under the assumption that the forgetting curve is a linear function between 100% and 90%

unique salmon
#

Again, that's just redefining DR

#

It's not getting us closer to finding what DR is the best

bold terrace
#

No it's not at all

#

It's really 2 different values

#

that relate to each other based on the parameters of a user

#

And in fact it is very close to the integral interpretation

#

Because if a user wants to have 80% DR on a card on average, you have to account for the time he spent above and below that 80%

#

It's just that the integral right now is used to have some kind of Score where the minimum score would be having a "R=0"

#

But this doesn't change anything to the fact that if the guy will never go to 0 because of his decay, then CMRR will be able to maximize its Knowledge/Workload just by ... almost zero-ing the workload 😆

#

When you think about it, you realize that what is done right now is very dumb

#

You ask how to minimize a function that is purely increasing for low decay

#

And you try to find a way to change it, because the score is relative to ... R=0

#

Which would lead to a 0 workload 😆

#

I'm sure you can do better than that @unique salmon 😄

polar maple
#

but now we allow the threshold for S to be user-defined rather than be fixed at 90%, also the definition is now slightly different at a collection level rather than a card level

bold terrace
#

I mean, having R=10% is still a "positive score", so if the workload is ~=0 while having it, of course CMRR will take it

#

Would be dumb to not take it 🤷

#

I mean wouldn't it be a good case to use logloss ?

polar maple
#

i don't think that log loss has justification in this context

#

@bold terrace so what exactly would the metric be after a simulation is done? time spent such that the collection's average remains above the threshold R? i suppose this exact version could be cheated by only learning 1 card and learning it very well

bold terrace
#

Yeah collection average is bad

#

Could be computed on card level though

#

What is the average retention you want for a card ? Let's say 95%.
Then CMRR goes brr and computes what the DR should be, based on the forgetting curve, to reduce workload as much as possible while achieving that 95%

#

Maybe it could be 90%, maybe it could be 80% for more aggressive decay

#

Because let's face it

#

If you ask the user "What is the lowest DR you want" (not the average), well the minimum DR recommended would be the DR, so yeah in that case, we're not solving anything

#

But asking "What is the max K/Workload I can get ?" when Workload can go exponentially closer to 0 with every bit of R lost, of course the "Score Function" won't be able to compensate for it

#

Except if you tailor it specifically to make the graph convex again, but then you're just building a function to get what you want, which wouldn't be less biased than doing a "return .85"

#

But once again

#

First we need to know what the goal truly is. If you say "Maximize Knowledge/Workload", well, CMRR is doing it perfectly fine right now: set your DR at the lowest value possible if you have a low decay; NO card will ever go to 0%, so just never do any review, just add new cards 🤷

#

So either redefine the goal, or just remove CMRR as it is

#

(Well actually CMRR can work fine for short-deadline, no new items to add, and with high decays, so it's not completely useless)

unique salmon
bold terrace
#

If it's expressed more or less in the description of the calculator, why not

unique salmon
#

As long as for different users CMRR outputs all values between 70% and 95%, I think it's fine

unique salmon
#

We can add it to the manual, but not to Anki itself

polar maple
#

return randint(70, 95)

unique salmon
#

If someone really cares - go read the manual then

bold terrace
polar maple
#

btw i think that if you use a minimum workload formulation then only the minimum workload will be used

bold terrace
#

But personally I used it and trusted it once in my life, never anymore 😆

#

I think it's when I joined this discord to try to understand FSRS a bit better

#

gosh I got f***ed badly by it

#

"Drop your DR to 70%"

#

I did

#

My R for the following days : 55-60%

unique salmon
bold terrace
#

That's the day I learnt to not trust too quickly what FSRS gives 😆

unique salmon
#

If as DR decreases, knowledge decreases faster than workload, that will solve the problem

polar maple
#

tbh i always thought from the name that CMRR was a lower bound on what your desired retention should be rather than the optimal value for it

unique salmon
#

It was called "Optimal retention" initially

bold terrace
unique salmon
#

If we call it "Optimal minimum recommended retention", then it's confusing. And hard to remember

bold terrace
#

If we know it, for example if we know that point is 72% for decay=0.1, you could then just bound the output of CMRR to [72%

robust hill
#

just put a recommendation that 85% is the optimal retention

bold terrace
robust hill
#

if someone asks why then say cuz i chose that

unique salmon
#

I mean, that's already the case with 90%

bold terrace
#

Yeaaaaah IMO the 'cost' of failure is WAY WAY WAY more than what we expect

robust hill
#

well a lot of ppl seem to think

unique salmon
#

90% is the default for literally no reason other than "90 is a cool number"

robust hill
#

90% desired retention is remembering 90%

bold terrace
#

Also, the shorter the interval, the faster you detect the flip-coin reviews

robust hill
#

i swear like 90% of people think this

bold terrace
#

IMO there's no equation that describes well how getting things wrong at long intervals is fucking bad

lapis hearth
#

Why would they. They are a competitor of Anki and It may not play in their favour

unique salmon
bold terrace
#

SM2 was onto something when their logic was "You lapse ? You start over 😆 "

unique salmon
robust hill
#

best dr 82% dr

#

🔥

bold terrace
#

There's a guy in the #language-learning channel, @severe storm , he explained how he was working, 2-3 learning steps of 30min to make sure no cards are just "flip coins", and then high DR to make sure everything gets stabilized ASAP

#

He's doing 100 new japanese words/day

#

100 is a lot

#

And he did it for a long time now

lapis hearth
#

What about your neural net

#

Expertium pinged you. Dekki ai is seemingly able to incorporate a neural net with flashcard reviewing.

bold terrace
#

The whole "Find your optimal Knowledge/Workload" is for lazy mindset people

#

"I want to become fluent in 20min/day"

#

😆

severe storm
bold terrace
#

"I'm not lazy, I'm an optimizer"

robust hill
#

look what 95% dr would do for me

#

3x workload

bold terrace
#

FOR NOW

robust hill
#

maybe next year

bold terrace
#

That's the fucking key point : FOR NOW

robust hill
#

id try this strategy

robust hill
#

unfortunately

#

its too late for me in this year

bold terrace
#

I bet that if you survive the 95% at first, you will stop half-acquiring things and your overall stability will increase

robust hill
#

my average is 3.4

polar maple
robust hill
unique salmon
#

I want shiny numbers to write about

#

You're like 90% done with it

#

Come on

severe storm
severe storm
#

Today was 2.5 but that's because a bunch of new cards were immediate easy skips

robust hill
#

but my question is maybe

#

this is a good language learning strategy but idk about med

#

how much time on avg for 100 new cards excluding reviews

severe storm
#

I don't do this strategy for my grammar cards

#

I don't feel the need for it there

robust hill
#

and when

#

and also

#

are you sure its the 95% dr working in force

#

or is it the 3 learning steps

lapis hearth
#

@quasi shadow Is FSRS not compatible with an NN

bold terrace
#

@robust hill you're free to ask of course but there's a lot of discussion with @severe storm from the past day in #language-learning , might be interesting to check 😄

robust hill
#

okay will read

severe storm
robust hill
#

perhaps one of you two could show me where it starts 🥹

unique salmon
#

And realistically, the benefits of the neural net would have to be enormous to justify the switch

severe storm
unique salmon
#

SM-2 -> FSRS is a fairly easy choice
FSRS -> neural net is more debatable

severe storm
#

But yes it is too early to say

bold terrace
#

It's in off-topic sry

severe storm
#

The only immediate effect I can determine is that my true retention went up by like 15%

robust hill
#

but how much extra time

unique salmon
severe storm
#

When it stabilises I can say more

#

And unfortunately I will never be able to give a more scientifically objective evaluation because I went from 50 new words with 2 cards each to 100 words with 1 card each

#

So I can't directly compare

#

And isolate the effects of 95% DR and extra learning steps

bold terrace
severe storm
bold terrace
# severe storm And isolate the effects of 95% DR and extra learning steps

And both might help too. I've been discussing (well, alone) about "Knowledge-Stability" (flip-coin reviews) vs "Time-Stability". Some words, you might get them wrong 20% of the time because you "guessed wrong" between 2 options. In those cases, having too few learning steps + low DR means you might succeed on them just enough so they grow a bit in terms of interval, but then they crash suddenly

severe storm
#

And my retention back then would be higher ceteris paribus than it would be now because there were two cards per note

#

Despite this I currently have fewer relearns in terms of percentage than back then. This is also despite the fact that every single relearn has an extra step now.

#

But I cannot say anything definite right now

severe storm
#

That they're both helpful is evident

#

If I hadn't changed anything else about my routine then I could directly compare and therefore say with confidence exactly how 95% DR and extra learning steps affect the bottom line

#

But I did change other things, and not insignificantly at all

#

So I need to go a little bit off of vibes

polar maple
unique salmon
polar maple
#

so in a sense RWKV and FSRS-5-recency diverge in their predictions by ~10% in R

robust hill
#

is there a fsrs retention algorithm graph for 1 vs 2 vs 3 learning steps

#

🔥

unique salmon
#

nope

robust hill
#

make one

#

i will give you a high five

severe storm
#

Is 3 learning steps something people do at all? I've never heard of it and I just randomly got the idea when I saw the option

bold terrace
#

We have certain people here I won't mention that have around 20-30 learning steps / new card

robust hill
#

🔥

bold terrace
#

Failing every 10 seconds

robust hill
#

and i think the cards stick

robust hill
#

1m 7m 15 m

severe storm
#

I don't think that's useful

#

What's making my third step useful is that it's after an hour

robust hill
#

🥹

severe storm
#

It ensures that the new cards only pass if you actually remember them after doing something outside of anki and waiting an hour.

unique salmon
#

I wrote about the 5 benefits of using Alex's net before

  1. We can make R more accurate
  2. We won't have to show parameters, which means one less thing for users to worry about
  3. We can support proper same-day scheduling instead of the current mess
  4. We can throw in new input features, like time of the day, workload, etc. Not just interval lengths and grades
  5. We can remove "Optimize", which means even less stuff for users to worry about
    1 and 4 are probably not super important at this point. FSRS-6 with just interval lengths and grades is fine
    2 is nice
    3 is great!
    5 is nice

So we have 2 questionable benefits, two nice benefits and one great benefit

#

Idk if this would be enough to convince Dae, probably no

polar maple
#

ok 11.2% RMSE, 7.8% average absolute difference

#

this is pretty significant imo
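For reference, a minimal sketch of how two such divergence figures can be computed from two models' predicted R values for the same reviews. The function name and the sample predictions are made up; this is not the benchmark's actual code.

```python
import math


def prediction_divergence(p_model_a, p_model_b):
    """RMSE and mean absolute difference between two sets of predicted R."""
    diffs = [a - b for a, b in zip(p_model_a, p_model_b)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mean_abs


# Made-up predictions for four reviews.
rmse, mean_abs = prediction_divergence(
    [0.95, 0.90, 0.80, 0.60],
    [0.90, 0.92, 0.85, 0.75],
)
```

RMSE is never smaller than the mean absolute difference, and the gap widens when a few reviews have very large disagreements.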

robust hill
#

do NOT remove optimize

#

leave it and act like its doing something

unique salmon
robust hill
#

confirmation bias 🔥

#

yes but do like

severe storm
#

Placebo

robust hill
#

rand, (already optimized, optimized now)

unique salmon
#

Alex's net would be pre-trained on thousands of users and then used "as is"

robust hill
#

ifclickedbefore2days
write
already optimized
Ifclickedafter2days
write
optimized now

#

🔥 coding

polar maple
#

so RWKV is better able to separate the cards, probably

unique salmon
#

Just release the thing 😭
Stop edging

polar maple
#

for this one i was testing RWKV the curve version

unique salmon
#

Bro is edging so hard

#

He shared the metrics with me in DMs, but refuses to release it for the benchmark

polar maple
#

ill do it sometime

unique salmon
#

About all that stuff I wrote about

polar maple
#

apparently i don't have the RWKV-P file at hand so i can't do the same rmse & abs calculation with it

unique salmon
#

You should compare it to FSRS-6 btw

polar maple
#

yea i had a look at the raw probabilities, a lot of what is pushing RMSE is FSRS-5 under-predicting prob with the high decay

bold terrace
#

IMO having both RWKV and FSRS would be awesome 😆

#

Being able to select one or the other

#

But I guess it will be on a global level again 😦

unique salmon
#

Nah, too much cognitive load

#

Choice paralysis or whatever it's called

bold terrace
polar maple
#

LSTM is a nicer choice for a drop-in replacement i think

robust hill
#

put a secret code

#

that allows the manual readers

#

to enable the choice

#

🔥

unique salmon
polar maple
#

fair enough

unique salmon
#

Make RWKV-1B (1 billion parameters) that only people with an RTX 4090 can run 🤣

#

So that people with beefy PCs can make their algorithm even more accurate

polar maple
#

i cant train 1 billion params

#

if RWKV does go into anki then it would probably be a smaller version for weaker devices

#

maybe 10k params

#

sure on my cpu i can run 200 reviews per second but idk about phones

unique salmon
#

I semi-regularly see people on r/Anki saying "I have an Android 4 (or 5) device, why is AnkiDroid not working?"

#

Well, ok, I've only seen it 2 times, but still

#

2 is a lot in this context

cursive badge
ashen light
#

are we getting gpu-accelerated anki?

#

lets go

#

if it uses a gpu it might even get some vc funding now

lapis hearth
#

But so far, it's all talk and nothing is coming to fruition

#

🥲

quasi shadow
#

but Anki is if the nn has a reasonable forgetting curve.

robust hill
#

for the preset in which i learn languages

#

in preset of physiology mcq optimization

#

which is just straight memorization

#

seems like it might be better to learn stuff instead of spam memorization

lapis hearth
#

has someone actually tried testing it

unique salmon
old sedge
#

I'm getting absurdly high intervals on fsrs

#

stuff like, 26 days apart for stuff I've barely seen

#

here are my parameters:
3.6981, 6.5516, 15.4492, 27.3702, 7.2747, 0.4875, 1.5402, 0.0010, 1.4661, 0.1985, 0.9395, 1.8605, 0.1889, 0.2166, 2.1902, 0.2315, 2.9898, 0.4861, 0.5830

#

my desired retention is 90%

bold terrace
#

The first 4 params are your initial stability

#

It seems FSRS thinks that if you press Good, 15d Stability is a good interval for you to go to DR=90%.

What's your DR ?

#

Could you also press the Button "Evaluate" and give the logloss/RMSE?

old sedge
bold terrace
# old sedge .

Sorry ! So yes, those 4 numbers represent your first interval for all 4 buttons

bold terrace
#

This is logloss or RMSE ?

#

Logloss I guess

#

Since the output is like
Log loss: 0.3520, RMSE(bins): 2.90%. Smaller numbers indicate a better fit to your review history.

#

Well 0.2490 is pretty good !

#

So it means FSRS should be able to predict quite well your intervals

#

If you think it's still too long, maybe put a bigger DR ? Like 95% ?
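A quick sketch of what raising DR does to that first "Good" interval. It assumes the fixed-decay forgetting curve used before FSRS-6 (decay 0.5, factor 19/81) and ignores rounding, fuzz and learning steps, so the numbers are only approximate. The stability value is the third parameter posted above.

```python
DECAY = 0.5        # assumed fixed decay (pre-FSRS-6 curve)
FACTOR = 19 / 81   # chosen so that the interval equals S when DR = 90%


def next_interval(stability, desired_retention):
    """Days until predicted R drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (-1 / DECAY) - 1)


initial_good_stability = 15.4492  # third parameter in the list posted above

for dr in (0.90, 0.91, 0.95):
    print(dr, round(next_interval(initial_good_stability, dr), 1))
```

Under these assumptions, going from 90% to 95% roughly halves the interval, while 91% only trims it slightly.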

#

(Typical example of why having Evaluate helps troubleshoot situations like this much faster @unique salmon xD)

unique salmon
old sedge
old sedge
bold terrace
bold terrace
#

This is what your interval could look like if you continue to press Good

old sedge
#

I still set my retention to 91% just in case
yea 91 not 95. it's a good in between number

bold terrace
#

If you use "Hard" it seems FSRS is tuned to make your interval grow much much slower

#

so it's also an option, if you feel your retention is wacky

#

But be careful that if you press "Hard" but then succeed 100% of them, FSRS will learn to not care about your Hard and improve the multiplier

old sedge
#

I thought I wasn't supposed to use the hard button, since it fiddles with the card ease

bold terrace
#

With FSRS it's not an issue

old sedge
bold terrace
#

The ease hell is something related to SM2

old sedge
#

oooo ok

#

I'm more used to sm2 hehe

bold terrace
#

IF you don't know : press good

#

Pressing Good is 99% the right answer 😆

old sedge
bold terrace
#

Well if you failed it of course

old sedge
#

oh ok

bold terrace
#

But if you tell me your next interval is bigger, I guess you recalled it right

#

Also : Don't confuse Hard/Again. Again means you failed, Hard means success

old sedge
#

I know lol

bold terrace
#

If you use Hard as a fake Again, bad things happen 😆

old sedge
#

hard is a pass button

#

it's just not as much of a pass button as good or easy

bold terrace
old sedge
#

since it implies you had trouble recalling the material

bold terrace
#

You see, you have 2 params for Hard/Easy

#

Basically, if you press Hard, you'd get a 23% multiplier on the increase (which is why you get +4 instead of +17; 4/17 is about 23%)

tepid spoke
#

Are you pressing "Good" on the initial review?

old sedge
#

yes lol

bold terrace
#

And if you press Easy, you'd get 17*3=51d increase (Leading you to 70d interval)
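The arithmetic above as a small sketch. It assumes that the 16th and 17th values in the posted parameter list (0.2315 and 2.9898) are the Hard and Easy multipliers on the stability increase, and takes the +17 day increase for Good from the messages above; the function name is hypothetical.

```python
HARD_MULTIPLIER = 0.2315  # assumed: 16th value in the posted parameter list
EASY_MULTIPLIER = 2.9898  # assumed: 17th value in the posted parameter list


def stability_increase(good_increase_days, rating):
    """Scale the increase you would get for Good by the Hard/Easy multiplier."""
    multiplier = {
        "hard": HARD_MULTIPLIER,
        "good": 1.0,
        "easy": EASY_MULTIPLIER,
    }[rating]
    return good_increase_days * multiplier


for rating in ("hard", "good", "easy"):
    print(rating, round(stability_increase(17, rating), 1))
```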

old sedge
tepid spoke
#

Well, if you tell Anki/FSRS you already know the new card well, by pressing Good, of course the Intervals will be long

#

Is "Good" the honest answer? As in, did you already know the material on the card, and your recollection of it was Good?

bold terrace
tepid spoke
#

If you literally just learned the answers on the backside, and didn't know it before, the only correct rating is Again

bold terrace
#

Personally, my starting stabilities are : Again (0.13d), Hard (1d), Good (3d), Easy(31d). Easy feels enormous, but it's normal, if I press Easy on a new card = I already knew it from well before

tepid spoke
#

I don't think I ever pressed Easy on a new card

#

So whatever number FSRS puts there is meaningless

unique salmon
old sedge
# tepid spoke If you literally just learned the answers on the backside, and didn't know it be...

I'm studying a hard subject. I'm studying Vietnamese and Chinese characters (chu nom). I create cards off stuff I see while immersing and generally since I know the context the word appeared in I have a pretty good short term memory recall of the word. this way I can often guess what the word means on my first try since I've already seen it before. notice how I said short term; I often forget those words after a few days

tepid spoke
#

What I do in such cases is just instantly bury the new card instead of reviewing it

bold terrace
#

Personally I'd press Good, but then that's why FSRS adapted to give me an initial stability of 3d for first good

old sedge
#

good idea

bold terrace
#

and 31d for easy

#

In the end ...

#

It really doesn't matter

#

(Said some Chester guy)

#

IMO we don't trust the algorithm enough. I suspended some cards that had 2-3 of stability. I reintroduced them 1-2 months later. I got most of them right 🤷

tepid spoke
#

Well, I went with "Just trust the algorithm" for a really long time, and it went poorly

bold terrace
#

I think with the trainable decay of FSRS-6, it should be better at modeling those kinds of situations

old sedge
bold terrace
tepid spoke
#

If I'd have trusted the algorithm blindly, I'd now have a ton of cards scheduled 5+ years or more away, while having already forgotten a lot of them

old sedge
#

yea im a new anki user

tepid spoke
#

With that few reviews, it might be better to stick with default parameters for now

bold terrace
# old sedge 1,112 reviews

It's still very very young, so I'd say : For now, keep a good "review hygiene", meaning, try to be consistent in how you review things, and the algo will adapt to you 🙂

#

you already have a very low logloss so it's very good

#

I don't know your RMSE ?

old sedge
bold terrace
#

The RMSE is more or less "The percentage that FSRS could be wrong"

bold terrace
# old sedge 4.98%

5% is decent ! It means that "more or less" (it's a bit black magic) if FSRS thinks you'll get 90%, you'll have around 85-95% retention

#

With time it often drops to around 3%

#

So you do great 🙂
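A simplified sketch of a binned RMSE, to show why it reads roughly as "how far off the predicted retention can be". It groups reviews by predicted R only; the real "RMSE(bins)" in Anki may bin on other review features, so treat this as an illustration and not the actual metric.

```python
import math
from collections import defaultdict


def rmse_bins(predicted, recalled, n_bins=20):
    """Weighted RMSE between mean predicted R and actual recall rate per bin."""
    bins = defaultdict(list)
    for p, y in zip(predicted, recalled):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    squared_error = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        squared_error += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(squared_error / len(predicted))


# Ten reviews predicted at 90%: nine recalled vs. only eight recalled.
well_calibrated = rmse_bins([0.9] * 10, [1] * 9 + [0])
off_by_ten_points = rmse_bins([0.9] * 10, [1] * 8 + [0] * 2)
```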

wind palm
#

[I'm reading from the top, it's been busy here, so if I'm missing things I'll see them shortly.]

The clouds are starting to part, thank you!

The gap I am struggling with is that we're not teaching FSRS how to do something. FSRS is building a model based on our data. But because the model makes assumptions and predictions, FSRS doesn't just memorize all of the problems in the book, it creates its own theory of how addition works based on those problems and uses that to predict what the sums will be. So the thing we want to test is FSRS's theory against the original data to see how far off FSRS's model is from the real answers. [I suspect there's still something I'm not getting about this ...]

So now that I understand the train/test-split-the-data idea -- I don't know how much use that would be. I definitely see how it would be a purer and more robust test, but there would have to be a way to split the data that gave you 2 full data sets. But if both sets were full enough and thorough enough to match the user's habits and give a great exemplar ... wouldn't they be similar enough to each other that there's no point in separating them?

tepid spoke
#

The risky part with an early interval that's so high is that if you actually don't remember it well, it'll take really long to come back up

wind palm
#

So you could only evaluate the parameters in the next month with new data.
Would it surprise you to know that this is exactly what I do? 😅

After I use my parameters for a month, before I reoptimize, I run Evaluate -- testing the old parameters with the additional new data. I have the Evaluate result from last month to compare that to. Then I reoptimize and Evaluate again to see what changed.
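That routine amounts to a time-based split: fit on older reviews, then score the same parameters on reviews that came later. A minimal sketch, with a made-up review format (`day`, `predicted_r`, `recalled`) and made-up numbers; in practice the predictions would come from the old parameters.

```python
import math


def log_loss(predicted, recalled, eps=1e-15):
    """Mean binary cross-entropy between predicted R and pass/fail outcomes."""
    total = 0.0
    for p, y in zip(predicted, recalled):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(predicted)


def split_by_day(reviews, cutoff_day):
    """Reviews before the cutoff were seen by the optimizer; later ones were not."""
    older = [r for r in reviews if r["day"] < cutoff_day]
    newer = [r for r in reviews if r["day"] >= cutoff_day]
    return older, newer


# Made-up reviews; day 30 is when the parameters were last optimized.
reviews = [
    {"day": 5, "predicted_r": 0.92, "recalled": 1},
    {"day": 20, "predicted_r": 0.88, "recalled": 1},
    {"day": 35, "predicted_r": 0.90, "recalled": 0},
    {"day": 50, "predicted_r": 0.85, "recalled": 1},
]
older, newer = split_by_day(reviews, cutoff_day=30)
new_data_loss = log_loss(
    [r["predicted_r"] for r in newer],
    [r["recalled"] for r in newer],
)
```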

tepid spoke
#

That's what I'm kinda stuck in right now. I have a lot of cards FSRS was wrongly really confident I'd remember for 1~2 years or more, and they're now slowly coming out, completely trashing my actual retention percentage

#

which in turn makes the FSRS optimizer "panic" in a sense, and makes my intervals so short that it produces an almost unmanageable amount of daily reviews

bold terrace
bold terrace
tepid spoke
#

I've been told that for the last 1.5 years or so

#

so far it hasn't :D

bold terrace
#

Do you reschedule some of your older cards ?

#

You could do partial-rescheduling

#

it can help to reduce the issue

tepid spoke
#

I rescheduled all with an interval longer than 6 months, about 2 months ago

bold terrace
#

Ok cool !

tepid spoke
#

so in 4 months, I should be caught up

old sedge
#

how to reschedule cards on ankidroid?

bold terrace
#

I think your case is also very specific since you learn Kanjis in a vacuum right

tepid spoke
#

But I'm out of new cards now, and STILL getting more reviews, not less

bold terrace
bold terrace
tepid spoke
#

Well, the deck contains ~2000 Kanji, and ~7500 vocabs to reinforce the Kanji

old sedge
#

yea ik how to do it on desktop

bold terrace
tepid spoke
#

The Vocabs are in a vacuum, but the Kanji have the Vocabs as context

old sedge
bold terrace
old sedge
tepid spoke
#

Well, I see them in reviews of the Vocabs that contain them

bold terrace
#

In my case, most cards with 1y interval don't stress me at all because I'll probably see those dozens of time in books

tepid spoke
#

It's WaniKani, yeah

#

I don't really stress about forgetting a Vocab, since WaniKani isn't geared towards them primarily

#

But I also forget a lot of Kanji, which greatly upsets me

bold terrace
#

Remind me what's your DR ?

tepid spoke
#

I dropped from 90% to 88% now, since otherwise the review amount is too unmanageable

bold terrace
#

Maybe the trainable decay will help a bit too

tepid spoke
#

But in reality I'm hovering around 80-85%

bold terrace
#

Ok !

#

Yeah yet 80-85 for a DR of 88-90% is pretty good !

tepid spoke
#

With "Young Cards" usually around 90%, and mature ones 80% or less

bold terrace
#

Don't forget the LB will make your actual R drop a bit more

tepid spoke
#

The main issue I feel like is actually fatigue

#

I fail notably more cards towards the end of a review session than at the beginning

#

With the order being random

old sedge
bold terrace
tepid spoke
#

I can't design my day around Anki

#

So the reviews are done when they are done

bold terrace
#

I know people don't like that but ... you can always suspend cards 🙂

#

I did it and it felt super great

#

300 cards out of 3500 sacrificed, 25% of my workload gone

tepid spoke
#

That doesn't make them disappear though, and I need to know them eventually

bold terrace
#

Yeah but later 🙂

#

You do it for fun right

tepid spoke
#

I will need to eventually pass N1

bold terrace
#

If you get burn out it won't help

#

Yeah but N1 you can get it by understanding the language

#

Not by knowing exotic kanji 😄

tepid spoke
#

N1 calls for even more Kanji than on WaniKani

#

so I'll need to somehow learn another 400 or so

bold terrace
#

N1 feels still a bit difficult, I'm like 10% good answer only

#

But N2 is more like at 20-30%

tepid spoke
#

A lot of those Kanji are utter nonsense you'll almost never need

#

but it's in there...

bold terrace
#

I think if you can read japanese you won't have any trouble with N1

polar maple
bold terrace
tepid spoke
#

"Being able to read Japanese" isn't that simple

#

I can read a kids book or regular manga

#

but not a financial report or research thesis

unique salmon
bold terrace
#

N1 didn't felt that high level anyway though

tepid spoke
#

The problem with N1 is that it does not test production AT ALL

bold terrace
#

A lot of people succeed N1 without being fluent at all

tepid spoke
#

so you can pass N1 and not speak a word

bold terrace
#

But if you're fluent, you should be able to succeed right ?

polar maple
tepid spoke
#

If you pass N1, you should be able to have decent ability to listen and read

#

but you can be completely unable to speak still

bold terrace
#

okok

tepid spoke
#

not that rare even

bold terrace
#

but still my point remains : There's no point burn out-ing to do N1 sooner

#

Except if you have a job offer that impose you to succeed it right now

wind palm
#

I'm pretty sure the response was, "nah (the entire server is on fire on a daily basis and until folks can behave like grown-ups, we can't take on anything new)."

We seem to have gotten past that issue though, and I've had everything ready to go to switch to a channel for a while now. But every time I try, this megathread is in the midst of discussion/debate/complaint/protest, and there hasn't been a good time to shut it down and shift over. If I could copy or convert the thread into a channel, and you could just keep rolling along, that would be perfect. But Discord doesn't believe in that. So I keep trying to catch a pause. Y'all have been victims of your own volume.

cc: @clever cargo @hasty fractal

[There was also the issue of the desire for the channel to be help and dev which wasn't going to be a good fit outside of help, unless you were going to answer all the help questions. That seems to have solved itself by the basic questions being posted in their own #1266615749779390474 threads, and only the most extreme and exhausting questions getting dropped in here for y'all to take care of ... which suits me just fine.]

tepid spoke
#

Well.... N1 is what separates me from a Work-VISA

#

N2 is not enough, point wise

#

need N1

old sedge
#

I'm gonna screenshot one of my cards to use as an example, since I study kanji too

bold terrace
#

Seems I'm only at 24% for N1 lol

#

It matches my mock tests

polar maple
bold terrace
#

😆

unique salmon
#

Well, let's say that's how I wish RMSE could be interpreted

#

In a perfect world of interpretable machine learning algorithms and metrics...

#

And we had to write something in the manual, so here we are

old sedge
unique salmon
bold terrace
#

For example this kanji I'm sure my brain would remember it as "Ah yeah, there's a 公 at the bottom top right kanji"

#

But maybe we should shift to #language-learning (Well personally I'll just go to sleep)

old sedge
#

okay, gn

tepid spoke
#

I was about to say... That Kanji does not exist on Jisho :D

old sedge
#

that's because it's not a kanji lol

#

it's a chu nom

#

basically a Chinese character coined by the vietnamese

wind palm
tepid spoke
#

But yeah, this is what I'm looking at: https://japanprcalculator.com/
If I punch in my details, with no JLPT level, I end up at 60 points. 65 with N2, 70 with N1

#

and 70 is the minimum

bold terrace
#

I guess being software dev since 10 years already bring me to 90 points

#

(9M¥, that's the offer I see passing)

cursive badge
# wind palm [I'm reading from the top, it's been busy here, so if I'm missing things I'll se...

My example of overfitting was a bit of an extreme one to help get the point across. As Sound said, it is more relevant to neural networks that do have the ability to just memorize answers if they have enough parameters.

A key point is that we are teaching FSRS something. We give it a good starting point and only let it tweak the rules a little bit but we are teaching it: "this is how you need to set all the dials on the scheduling machine to make it work best for this user"
It is possible that when you change these dials it works slightly better on the reviews you have already seen, but slightly worse on new reviews that you have not seen yet.

The way you would do the test/train split is by choosing parts of the revlog for a preset that are only for optimisation and parts that are only for evaluation.
e.g.:

  • Evaluate using the last month's reviews, only optimise on reviews more than a month old.
  • Mark certain cards to only use for evaluation, use the rest in optimisation.
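
Both split strategies from the list above can be sketched in a few lines. This is only an illustration: the review layout (dicts with `card_id`, `time`, `passed`) and the function names are assumptions made for the example, not Anki's actual revlog schema or anything the optimizer exposes.

```python
from datetime import datetime, timedelta


def split_revlog_by_time(reviews, holdout_days=30):
    """Optimise on older reviews, evaluate on the most recent `holdout_days`."""
    cutoff = max(r["time"] for r in reviews) - timedelta(days=holdout_days)
    train = [r for r in reviews if r["time"] <= cutoff]
    test = [r for r in reviews if r["time"] > cutoff]
    return train, test


def split_revlog_by_card(reviews, holdout_every=5):
    """Reserve every Nth card (by sorted card id) for evaluation only."""
    card_ids = sorted({r["card_id"] for r in reviews})
    held_out = set(card_ids[::holdout_every])
    train = [r for r in reviews if r["card_id"] not in held_out]
    test = [r for r in reviews if r["card_id"] in held_out]
    return train, test


# Tiny synthetic revlog: 4 cards, one review every 10 days (made-up data).
start = datetime(2025, 1, 1)
reviews = [
    {"card_id": i % 4, "time": start + timedelta(days=10 * i), "passed": i % 3 != 0}
    for i in range(12)
]
train_t, test_t = split_revlog_by_time(reviews, holdout_days=30)
train_c, test_c = split_revlog_by_card(reviews, holdout_every=2)
```

In the time-based split every evaluation review is newer than every optimisation review; in the card-based split no card appears on both sides.
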
unique salmon
cursive badge
#

The thread is moving quite fast today. I have no idea if I have repeated someone while writing my response 😅

tepid spoke
#

We need sub-threads

unique salmon
#

Just sayin'

#

(mods plz)

cursive badge
unique salmon
#

I saw it

wind palm
# unique salmon <@820710428081389599> 👆

That's not how I interpret it for people either, if that's what you're worried about? I wouldn't say it's about whether you will get the answer right or wrong -- that's your retention.

RMSE is about whether FSRS can/will/does schedule the card at the right time -- or how often/by how much it misses the right point in your memory curve. FSRS predicts what days you'll get the answer right and what days you'll get it wrong. If FSRS determines that the 10th is the right date to study a card with that history, and you study it early, on the 5th, and get it wrong -- that's a miss. But if you study it late on the 15th and get it right, that's a miss too.

So RMSE 5% means when FSRS schedules a card the interval could be up to 5%-points too soon or too late. [It is 5-points on the scale of R, not 5% of R, right?]

unique salmon
wind palm
cursive badge
#

I always feel like it is easier to teach these kind of things in person with a whiteboard / notepad to scribble diagrams. It's hard to do only using text and having to write full responses 😅

unique salmon
#

The exact values depend on how we choose the bins. And we can choose them in pretty much infinitely many ways

#

At least log-loss is nicer in that sense, it just has one formula that everybody uses
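
To make the contrast concrete, here is a minimal sketch of both metrics. The log-loss function is the standard formula. The binned RMSE groups reviews into equal-width bins over predicted R, which is a simplification chosen for illustration and is not claimed to be the binning FSRS actually uses; the data is made up.

```python
import math


def log_loss(preds, outcomes):
    """Mean binary cross-entropy; one formula, no binning choices."""
    eps = 1e-15
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(preds)


def rmse_bins(preds, outcomes, n_bins):
    """Compare mean predicted R with the observed pass rate per bin,
    weighting each bin by its number of reviews."""
    bins = {}
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append((p, y))
    weighted_sq = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        weighted_sq += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(weighted_sq / len(preds))


# Hypothetical predictions and pass (1) / fail (0) outcomes.
preds = [0.95, 0.91, 0.85, 0.81, 0.75, 0.72, 0.92, 0.88]
outcomes = [1, 1, 1, 0, 1, 0, 1, 1]

rmse_coarse = rmse_bins(preds, outcomes, n_bins=2)   # everything lands in one bin
rmse_fine = rmse_bins(preds, outcomes, n_bins=10)    # three separate bins
loss = log_loss(preds, outcomes)
```

The same predictions give different RMSE values depending only on the number of bins, while the log loss has no such knob.
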

wind palm
unique salmon
#

This is also why a health check with Good - Acceptable - Poor will be nice: no need to think, here are simple words in plain English

wind palm
wind palm
unique salmon
#

And tbh, even I am not sure what to say, other than "Don't misuse Hard, answer honestly, don't just use Anki as a note taking app ignoring the SRS aspect"

#

If someone's "FSRS health" is Poor aka FSRS doesn't fit their review history well, idk what to recommend

#

It's hard to identify what's causing it. The best I can think of is "Don't misuse Hard, answer honestly, don't just use Anki as a note taking app ignoring the SRS aspect", as I said

cursive badge
#

I really enjoyed doing one-on-one teaching at university.

cursive badge
polar maple
# wind palm Yes, just about R -- but R is changing day to day, and the interval set so that ...

looking at only the R for a particular day does not tell the full story about individual card scheduling, for example maybe you studied two cards that day, their true underlying probability of success were [0, 1] and you predicted [0.5, 0.5]. While you got the predictions wrong for individual cards, the average number of successes for that day was also 0.5 since from [0, 1] you get exactly one Again and one passing grade
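
The two-card example can be checked numerically. This is a small sketch with made-up names; it only shows that an aggregate comparison looks perfect while a per-review metric such as log loss still separates the flat prediction from a sharper one.

```python
import math

flat_pred = [0.5, 0.5]   # same prediction for both cards
outcomes = [0, 1]        # card with true R = 0 fails, card with true R = 1 passes


def mean(xs):
    return sum(xs) / len(xs)


def log_loss(preds, ys):
    return mean([-(y * math.log(p) + (1 - y) * math.log(1 - p)) for p, y in zip(preds, ys)])


avg_pred = mean(flat_pred)       # average predicted R for the day
avg_outcome = mean(outcomes)     # observed pass rate for the day

flat_ll = log_loss(flat_pred, outcomes)        # equals ln 2
sharp_ll = log_loss([0.01, 0.99], outcomes)    # near-perfect per-card predictions
```

Averaged over the day the flat prediction matches the pass rate exactly, yet its log loss is far worse than that of the sharp prediction.
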

#

now for longer time intervals, if you are looking at R for the past year and you are studying for 2 months where last month you got 85% R whereas your target is 90%, then an algorithm could possibly start scheduling at 95% in order to balance out the 85% in order to achieve a perfect 90%

#

now to be clear this isn't what FSRS deliberately does, but FSRS could be doing it by accident sometimes

#

these exact issues also exist in RMSE (bins) which is why i'm not a fan of it being considered the primary metric

unique salmon
#

Man, imagine how nice life would be if we were doing binary classification
Just use accuracy as your primary metric. Super intuitive and easy to understand 🤣

#

...unless you have a heavily imbalanced dataset

#

Then you can get 99%/99.%/whatever% accuracy just by predicting the most common class
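
A quick illustration of the imbalance caveat, using a made-up dataset with 99 passes and 1 fail:

```python
# Hypothetical labels: 99 passed reviews (1) and a single failed review (0).
labels = [1] * 99 + [0] * 1

# "Model" that ignores its input and always predicts the most common class.
majority = max(set(labels), key=labels.count)
preds = [majority] * len(labels)

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

# How many of the rare failures did it actually catch?
fails = [i for i, y in enumerate(labels) if y == 0]
fail_recall = sum(preds[i] == 0 for i in fails) / len(fails)
```

Accuracy comes out at 99% even though the predictor never identifies a single failed review.
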

#

Actually, is there ANY machine learning task where you have a nice and simple and interpretable metric with no caveats? 🤔

hard ibex
#

i also added FSRS Helper for the first time and "rescheduled all" suddenly gave me a big backlog why is this so

#

i think i did not understand what the reschedule all button does 💀

lapis hearth
#

Is there a way to calculate the third recommended learning step from this

unique salmon
#

You can compare the numbers to the regular FSRS-6-recency to see how much it overfits

#

@quasi shadow I think we should ask Dae to be the judge and decide whether we're keeping the "data scientist" Evaluate or the current Evaluate

bold terrace
#

If it works I’ll modify my addon to do automatic deck switch for cards based on rules like leech classification

#

Why is it so hard to find the "Create Deck" button though 😆

#

You check all the top menu, the right click result

#

then you see it's just ... below

#

with the "below menu items"

#

the items that need to be below, you don't know why

unique salmon
#

He was actually relatively quick with responding to this PR, so maybe a few days

bold terrace
#

Uuuuuuuuuuuuuuuuuh

#

I managed to get WORSE log loss AND rmse after an optimize

#

isn't it a bit strange ?

#

The cards from the deck changed though, so maybe the final check to take or not the params depends on something else ?

#

LogLoss : 0.4422 -> 0.4463
RMSE : 7.08 -> 8

unique salmon
#

You have to evaluate on the same cards

bold terrace
#

Ok !

#

But then it's funny because training on older cards made better parameters

#

I would have expected that the outcome of the Optimize would not pick a worse curve, just because it was trained on different data 🤔

#

But anyway, it seems FSRS D is a better clustering metric than my performance_drop_count/ratio to train different parameters on 😛

#

Which is quite obvious since in the "all together" scenario, D was already some kind of clustering variable (Based "Again numbers")

unique salmon
#

Honestly, at this point I'm more worried about CMRR. We are severely understaffed - Luc is the only one doing CMRR stuff - and short on time because of the beta FeelsBadAnki

#

Lol
Am I some kind of niche Internet micro-celebrity now? 🤣

bold terrace
#

Petition to have this in the doc somewhere 😆

#

I have my little search query "from: expertium mentions: jarrettye reset" that I go check each time I'm wondering whether moving reset cards from one deck to another will mess up the optimization or not

#

Adding it here would be fine right ?

#

Isn't this the opposite of what was said earlier 😆 ?

bold terrace
unique salmon
formal peak
#

hey! I just recently switched to FSRS (ik im late 😭) and I was wondering how the "reschedule all cards" on the fsrs helper add on is different from when u click "reschedule cards on change"

unique salmon
formal peak
#

what's fuzz again?

unique salmon
formal peak
#

is that for cards to not be shown on the same day

#

oh okok I was kinda ish right lol

#

so if I don't click on it how long does it take for all of my cards to become fsrs?

unique salmon
#

Depends on how many cards you have in total and how many you are reviewing per day

#

Simplified example: let's say you have 1000 cards and you review 100 of them per day. Then it would take 10 days, assuming each day you review 100 other cards, not the same cards from the last day

formal peak
#

ah so however long it takes to get them all done maybe?

#

so if I were to do all 1000 in a day then it would convert all 1000 of them in that one day?

unique salmon
#

Yep

formal peak
#

wait but wouldn't that have made all 1000 of them converted anyways if Anki made them available to me to review on that one day?

#

or is it the 1000 that just happened to be due that day maybe the next day I had 2000 due that weren't there the day before and those would not have been converted yet?

unique salmon
#

If you review 1000 cards, you review 1000 cards ¯_(ツ)_/¯
I'm not sure what you are asking

formal peak
#

sorry my b im not the best at articulating my questions haha 😅

#

uhhh ok

#

so if let's say I switch to fsrs, but I don't click the "reschedule all cards" immediately, then Anki will slowly turn all of my cards into FSRS right?

unique salmon
#

Yes

formal peak
#

so how long that process takes depends on how many cards I study per day

unique salmon
#

And how many cards you have in total

formal peak
#

oh okok

#

so if I have 1000 cards in total and let's say only 100 of them were due today and I did all 100 of them, then 100 of them would be converted and 900 would not have been converted?

unique salmon
#

Yep

formal peak
#

and then if 200 of them were due tmr and I did all 200 of them then those 200 would be converted and id have a total of 300 that were converted and 700 that weren't

#

so a card gets converted when I do it on its current timeline due date without fsrs?

unique salmon
#

A card gets "converted" whenever you review it while FSRS is on

formal peak
#

oh thats a good way to put it lol thanks

bold terrace
#

FSRS-Porn

quasi shadow
#

Wow, Woz responded to the comparison between SM-17 and FSRS.

unique salmon
#

Go tell Guillem that

bold terrace
#

My boys

#

For CMRR : R*S^2 ? S for "I know it better" and a S-square because "I know it better and the cost of getting wrong at that stage is bigger"

#

But basically instead of "Cost for Again" being a static factor, it could be expressed through S

unique salmon
bold terrace
#

Sure

#

But the concept is nice. Higher Stability = More Cost

#

Independently from willing to know better

#

Or maybe through those parameters 🤔

unique salmon
#

@cosmic hedge I want you to try the stuff I described yesterday, with time=a·R+b, and then also try sum(R·S) on top of that. So we calculate time based on R AND use sum(R·S) and not just sum(R). In other words, we change both the numerator and the denominator in workload/knowledge
If even that fails, then we'll give up, and I will finally stop pinging you all the time 😅

#

If Jarrett is right, time=a·R+b should be sufficient to unfuck CMRR
If not, we can try that also using sum(S·R)
And if that's still not enough, then bye-bye CMRR
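
A rough sketch of the two ideas being combined: fit time=a·R+b by ordinary least squares, then use it in a workload/knowledge ratio where knowledge is sum(R·S). The function names, the sample data, and the way the ratio is assembled are assumptions for illustration only; the real simulator is more involved and is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical (R at review time, review duration in seconds) pairs for one
# answer button, with first reviews and same-day reviews already filtered out.
r_values = [0.70, 0.80, 0.90, 0.95]
durations = [12.0, 10.0, 8.0, 7.0]
a, b = fit_line(r_values, durations)


def review_time(r):
    """Estimated seconds per review as a function of R, replacing a fixed cost."""
    return a * r + b


def cost_per_knowledge(cards):
    """cards: list of (R, S) pairs. Workload uses time=a*R+b, knowledge uses sum(R*S)."""
    workload = sum(review_time(r) for r, _ in cards)
    knowledge = sum(r * s for r, s in cards)
    return workload / knowledge


ratio = cost_per_knowledge([(0.9, 10.0), (0.8, 5.0)])
```

With this toy data the fitted slope is negative, meaning reviews at lower R take longer, which is the effect a fixed per-button time cannot capture.
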

bold terrace
#

sum(S*R) I think I tried it

#

R, R*S, R*sqrt(S) I think I did

#

but it's always good to try again

#

I think my previous exponential would not have an issue because it penalize super heavily super low stability compared to >5-10 one, but I might be wrong since the optimizer could find a way to do just enough reps to have 5d stability, but with low enough decay, you'd still get 50% after ... thousands of years

polar maple
cosmic hedge
#

calculating r for the simulator isn't easy so I wanna check there's nothing else first

cosmic hedge
unique salmon
cosmic hedge
#

well we know what r does

unique salmon
#

?

cosmic hedge
#

well if some other metric has a correlation with duration and is 100* easier to implement then i'd use that

unique salmon
unique salmon
wet plume
#
  1. What will you be doing next with your free time
  2. Do you have any plans for writing papers not related to srs
  3. There has been a lot of research from China in the machine learning field. Planning on transitioning to such a company?
#

OH another question

#

how did you manage to have nerdy interests, have a job and work on fsrs for so long?

quasi shadow
quasi shadow
quasi shadow
#

Oh dear.

#

They made its knowledge graph manually...

#

😅 I hope it's an automatic work.

wet plume
#

Thx for the link

quasi shadow
#

I'm using Math Academy and have recommended it to my friend.

#

Maybe I will try to create a system like it.

wet nexus
#

Sorry, but could someone confirm that I've interpreted the FSRS vs. SM-17 benchmark correctly? But based on what I see, FSRS with its default parameters, and therefore without optimizing with user data, is apparently just as good at predicting performance as SM-17 with individual optimization data? In other words, FSRS is in the worst case as good as the best case in SM-17...

quasi shadow
unique salmon
#

I'll leave it to Jarrett to explain the difference in how optimization is done

wet nexus
#

I mean, fsrs with default parameters has a similar rmse to sm-17 based on data from users who have used sm-17 already with the algorithm adapted to their memory

unique salmon
#

We haven't tested FSRS with default parameters there

wet nexus
#

Oh, I see, thanks

unique salmon
#

And fsrs-vs-sm17 is based on a tiny sample size anyway, whereas the main benchmark is based on 10,000 users and hundreds of millions of reviews

quasi shadow
#

I made a cross-comparison between SM17 and FSRS-6.

#

In the view of SM-17, FSRS-6 calibrates the data well.

unique salmon
# quasi shadow

You should probably add an annotation to explain how to read this graph

quasi shadow
#

😅 I wouldn't like to repeat Woz.

unique salmon
#

No, really, please add an annotation. It's not obvious at all what's going on. Is higher = better? Lower = better? Closer to 0 = better? Can we get a definition of B-W?
(I know closer to 0 = better, but other people don't)

polar maple
#

srs-benchmark:

#

fsrs vs sm17:

#

i wonder if AVG would have better log loss than SM-17?

bold terrace
#

AH it's elapsed days meaning since last review ?

#

or since introduction?

unique salmon
quasi shadow
#

Maybe I can add AVG to the comparison.

#

added

#

benchmarking now

bold terrace
#

There's something I don't understand in AVG. If you do the Average Retention on a collection, well you might get a good prediction if let's say, most people put DR=80-90% anyway, no ?

#

I mean, the fact they had DR=90% and Avg R close to 90%, is what allow you to have an AVG that is not that bad, since most of the test set will be made on (probably) people with AVG R of 90%, no ?

#

And once you have that AVG (R), how do you make it a scheduling agent ? Transforming the R into Stability/Interval ?

#

I mean, if you train AVG on 70% of my review and test it on the 30%, well, those 30% were already scheduled in a way to make it close to 90% AVG R

#

But saying simply "I expect this review to have AVG(R) retrievability, is not really answering the question "WHEN will it have a R of 90% ?"

#

Ideally you'd want to test your R on things that were not pre-scheduled for you

#

Or maybe I miss something stupid and I'm sorry for the confusion 😆

quasi shadow
#

AVG is useless for scheduling.

#

It's just a baseline to evaluate predictions.

bold terrace
#

ah ok it's just for testing the prediction

#

I see

#

If a user used SM2, and a simple AVG is already able to do a good job at estimating those predictions, then it means we should do better than it

quasi shadow
#

However, if the scheduling is perfect (TR = DR), we cannot distinguish perfect predictions from AVG.

bold terrace
#

Ok got it

#

Thanks !

#

Ideally we'd like a prediction function that could even be more precise than giving prediction and would be able to say "This will be a 1, this will be a 0"

quasi shadow
#

@polar maple you are right

bold terrace
quasi shadow
#

Only the AUC is poor.

#

🤣

unique salmon
# quasi shadow

Wait what, that can't be right
AUC of AVG should be around 0.5

#

Like in the main benchmark

#

It can't be worse than random, especially THAT much worse

#

AUC of 0.08 would mean that you can get an insanely good algorithm if you just inversed AVG's predictions
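
The inversion argument follows from how AUC is defined: it is the probability that a randomly chosen passed review gets a higher predicted score than a randomly chosen failed one. A minimal pure-Python sketch with made-up scores:

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions that rank most passes BELOW the fails.
labels = [1, 1, 1, 0, 1, 0]
scores = [0.2, 0.3, 0.85, 0.9, 0.4, 0.8]

original = auc(scores, labels)
inverted = auc([1 - s for s in scores], labels)   # flip every prediction
constant = auc([0.9] * len(labels), labels)       # same prediction for everything
```

Flipping the predictions turns an AUC of x into 1 - x, and a predictor that outputs the same value for every review sits at exactly 0.5.
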

quasi shadow
#

it's moving average

#

it's updated for each review

#

if the last grade is pass, it increases. if the last grade is fail, it decreases
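
The behaviour described above can be sketched as a running average that is updated after every review. The exact update rule in the benchmark is not given in the thread, so an exponential moving average is assumed here, and `initial` and `weight` are hypothetical values.

```python
def moving_average_predictions(outcomes, initial=0.9, weight=0.1):
    """Predict each review with the current average, then update it.

    A pass (1) pulls the average up, a fail (0) pulls it down.
    """
    preds = []
    avg = initial
    for y in outcomes:
        preds.append(avg)           # prediction is made before the grade is known
        avg += weight * (y - avg)   # update with the grade that was just revealed
    return preds


preds = moving_average_predictions([1, 0, 1])
```

Because the prediction for a review depends only on earlier grades, it is not a constant, which is why its AUC does not have to be 0.5.
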

#

OK, at least AVG cannot beat SM17 in the cross-comparison.

unique salmon
quasi shadow
#

I will push the commit later

#

You can check the code and raw result

bold terrace
#

Sure there is no 1 or 0 confusion 😆 ?

#

"So in practice if you get an AUC-ROC score between 0 and 0.5 you might have a mistake in the way you labeled your classifier targets or you might have a bad training algorithm. If you get a score of 0.2 this shows that the data contains enough information to get a score of 0.8 but something went wrong."

#

So it means if you take -AVG, you get a AUC of 0.92

#

🤔

#

Another guy say what I copy pasted is bullshit though

#

And I don't understand much about AUC

#

lol

quasi shadow
#

But AUC is not a good metric for our case, as I mentioned before.

bold terrace
quasi shadow
bold terrace
#

Good lord I understand why I loved math but disliked stats

#

😆

quasi shadow
#

Oh shit

#

it's too good to believe

#

😂 What's wrong with the collections from SuperMemo users?

#

I need reviewer

#

@polar maple @unique salmon would you mind reviewing it?

polar maple
#

@quasi shadow

quasi shadow
#

copilot is dumb 😠

#

no vscode anymore, I need cursor 😂

#

It's 5:27 a.m. in China. My circadian rhythm is messed up in holiday. 😅

cosmic hedge
quasi shadow
#

Interesting, my friend in Japan also haven't slept.

#

😅 It's the geek's circadian rhythm.

cosmic hedge
quasi shadow
cosmic hedge
#

he reviews my pr's at times i 100% assume hes asleep

quasi shadow
#

It's time to go to bed.

polar maple
#

@quasi shadow I have a suspicion for why FSRS and AVG do so much better than SM. If the pandas dataframe is set up in the same way as it is in srs-benchmark then the problem is that FSRS is asked to make predictions about the review at the moment just before it happens, while SM is presumably doing the prediction after the last review of that card. In the time between the last review and now, FSRS is getting extra information

quasi shadow
#

so I need to store the stability and decay for each card after each review?

#

In the next review, we need to use the stored stability and decay to predict the retrievability.

polar maple
#

the current tensor column still needs to be kept since after revealing the label for the current review, you add (tensor, label) to the replay buffer

polar maple
bold terrace
#

I was also wondering, but more on a funny side, what could be the precision of a true baseline like just the constant function 😂. You set it to the average of a user to simulate we know the DR and then you just evaluate all reviews prediction based on that …

#

80…80…80…80

unique salmon
bold terrace
#

Aaah OK sorry I thought it was a moving AVG evaluated after each new review

#

sry sry

lapis hearth
#

@unique salmon

#

Isnt this the dekki guy

unique salmon
#

@quasi shadow @polar maple wanna implement it in the benchmark?

unique salmon
bold terrace
#

Yeaaaah it's going into the direction "More training set = Better than having a split" which can be quite harmful

#

But I think it might be for the best to have attention of Dae on those thematics now

#

It might slow this PR down a bit, but in the long term it might help people less involved in these discussions understand why, in a world of neural networks, you don't want the algo to memorize rules like "if it's 1 it's odd, if it's 2 it's even, ... if it's 27 ... if it's 286 ..." because it had the whole dataset as a training set, instead of learning the rule "ends in 0, 2, 4, 6, 8: even, else odd"

lapis hearth
#

@quasi shadow Umm Jarrett Did I break FSRS or what

#

I have been getting these 35 seconds for a while now for the same card. D is dropping, but interval is not changing

quasi shadow
#

did you use FSRS-6?

lapis hearth
lapis hearth
quasi shadow
#

25.05.2?

lapis hearth
#

yes

quasi shadow
#

Does the stability increase?

lapis hearth
#

Does not seem like it

#

It has been 1 minute for a while

#

I see D dropping

#

This has happened to 4 of my other cards

#

What is weird for me is that this card has never reached a D=99% or a 100%

#

So it did not even reach maximum difficulty

#

D is dropping and good interval stays the same

#

Not rising

#

Here is another card as well

#

D has reached 60% which is fairly moderate

#

Yet it cannot go above 35 seconds

#

I am going to have to press easy on both of these card because effing hell i cannot get out of this cycle

bold terrace
#

If i recall correctly, from @unique salmon , low decay (0.1) means super slow between 90% and 70%, but super quick drop from 100% to 90% no ? So maybe it has an impact for same-day review with people with such bizarre revlog ?
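
The shape claim can be checked with a short sketch, assuming the power-law forgetting curve with a trainable decay where the factor is chosen so that R equals 90% when the elapsed time equals S. The function names are made up for this example.

```python
def retrievability(t, stability, decay):
    """R(t) = (1 + factor * t / S) ** (-decay), with R(S) = 0.9 by construction."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return (1.0 + factor * t / stability) ** (-decay)


def time_to_reach(r, stability, decay):
    """Elapsed time at which retrievability has dropped to r."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return stability / factor * (r ** (-1.0 / decay) - 1.0)


S = 10.0  # days, arbitrary

# Halfway to S: a low decay has already lost more than a high decay.
early_low = retrievability(S / 2, S, decay=0.1)
early_high = retrievability(S / 2, S, decay=0.5)

# Time needed to fall from 90% to 70%: much longer with a low decay.
to_70_low = time_to_reach(0.7, S, decay=0.1)
to_70_high = time_to_reach(0.7, S, decay=0.5)
```

Under this assumption both curves pass through 90% at t = S, but the low-decay curve falls faster before that point and far more slowly after it.
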

#

But at least now you have your short term memory model ! Just keep spamming

#

If you press "Easy", does it help ?

lapis hearth
#

It would push my card to 2d

bold terrace
#

I mean, I think after 30-40 good, one per minute, should be easier right ?

#

Yeah maybe FSRS learnt that your "Goods" doesn't mean much, don't know

#

Or mean much until D is lower

#

It's nice, you have dynamic learning steps now

lapis hearth
#

Well D is low enough. At D= 60% that is where the majority of my cards at

bold terrace
#

So you'll have very good recall of those

lapis hearth
#

but i dont have such a very bizarre interval behaviour

#

at that D

quasi shadow
#

D has no impact on short term stability.

#

Could you send me your collection?

#

I need to reproduce this problem.

lapis hearth
#

I have sent you the deck with the problem for now

#

Has 10k cards

quasi shadow
#

The stability is 6 days after I optimized the parameters.

lapis hearth
#

Huh????

#

But why did it make that weird behaviour in the first place+

quasi shadow
#

I will fix it soon

#

OK, it increases from 35s to 42s!

#

😂 maybe it fits your memory

#

@lapis hearth the PR will fix it

bold terrace
quasi shadow
#

nope

#

I just increase the decimal places

bold terrace
#

Yeah 0.001 getting stuck on itself was what I meant
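
One way a value can get "stuck on itself" is when it is rounded to too few decimal places after each update. The sketch below is purely illustrative: the growth factor, the starting value, and the number of decimals are hypothetical and are not taken from the actual FSRS code.

```python
def repeat_good(stability_days, growth, decimals, reps):
    """Apply the same multiplicative increase `reps` times, rounding each time."""
    for _ in range(reps):
        stability_days = round(stability_days * growth, decimals)
    return stability_days


start = 0.0004  # days, roughly 35 seconds

# With 4 decimals, 0.0004 * 1.1 = 0.00044 rounds straight back to 0.0004.
stuck = repeat_good(start, growth=1.1, decimals=4, reps=30)

# With 6 decimals the small increase survives rounding and accumulates.
free = repeat_good(start, growth=1.1, decimals=6, reps=30)
```

With the coarse rounding the value never leaves its starting point no matter how many times Good is pressed; with more decimal places it grows normally.
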

#

Well

#

I guess @lapis hearth you're not edge-case tester

lapis hearth
#

Fuck me

#

I knew something was off

#

I always run into problems like these and end up helping with discovering bugs, but that's what happens when you are overloading FSRS to its extreme

#

I got FSRS, I am using the entire FSRS

lapis hearth
quasi shadow
#

...the only problem is

#

the stability increases so slowly

lapis hearth
quasi shadow
#

I need to set a lower bound for w[19]

lapis hearth
#

The intervals seem to be too brutal

#

Even too brutal for me

quasi shadow
#

What's the reasonable number of learning steps?

lapis hearth
quasi shadow
#

I can't help a lot here

quasi shadow
lapis hearth
#

I mean the way FSRS 5 was dealing with short term scheduling was good enough

#

My problem was that it couldnt drop below a certain interval

unique salmon
lapis hearth
#

What do you suggest I do with the cards caught in this 35s hell-cycle

#

Reschedule

#

Press easy

#

or what

quasi shadow
#

the next release will solve it

lapis hearth
#

Optimizing readjusted it

#

But the problem is bound to happen again

quasi shadow
#

the ad-hoc solution is disable FSRS short-term schedule

lapis hearth
lapis hearth
#

Only that it did not go below a certain interval point

bold terrace
#

You can always use Easy too

#

For example, if you get something right 5 times in a row, I assume it's easy, no ?

#

Or set some learning step that would be just bigger than the "blackhole point"

#

something like "35s 60s"

#

If 60s doesn't lead to >60s, try 65, 70 ...

#

Or compile Anki with Jarrett's commit

lapis hearth
#

Never thought about that

#

Well the problem is already discovered and Jarrett will be fixing it

#

Optimizing has made the problem go away for now

#

But if this ever comes again until the next release comes, i will be trying this

bold terrace
#

It's possible that with all those reviews at 35s, FSRS concluded "Well, he'll definitely have more than 35s stability if he answers good", thus helping jumping out of the blackhole

quasi shadow
#

the data_processing needs refactoring

#

I need to include the first learning entry for each card

#

wait... the collection file from SuperMemo doesn't include it.

quasi shadow
#

😅 i give up

quasi shadow
unique salmon
#

What is LCL?

#

What the hell is an XmR chart...

quasi shadow
#

I have no idea, man.

#

Maybe I need to read the paper.

bold terrace
#

15-20 ... words... by day

#

And what is even a "process change". I google LabPlot but it's all about showing data, not SRS-scheduler-correction or whatnot

#

It almost feel like the post was written by a bot

#

Search on FSRS on their website, nothing too

ashen light
#

.5 retention on two days?

bold terrace
ashen light
#

did they spend the day flipping coins instead of answering

bold terrace
#

But to be honest it almost feel like ChatGPT hallucinated post

unique salmon
bold terrace
unique salmon
#

Except that his replies don't make anything more clear 🤣

bold terrace
#

Why would he mention Jarrett to then, say "Oh yeah it's 15-20 reviews per day"

#

It feels like a very targeted ads no ?

ashen light
#

its got a kde.org subdomain so its not like some vc funded nonsense

bold terrace
#

Like those guys that ping you on Linkedin to sell their custom version of some profiling tool

#

Also "The FSRS couldn't reach the target"

#

The guy doesn't even sound like someone that does anki

#

"FSRS is not able to reach the target"

ashen light
#

or at least has the same name

#

¯_(ツ)_/¯

bold terrace
#

I guess he wants a collab

ashen light
#

why is he using the labplot twitter for this though

bold terrace
#

So a PBC is a "Process Behavior Chart"
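
For reference, an XmR chart is one common kind of process behavior chart: it plots individual values (X) and moving ranges (mR), with limits at the mean plus or minus 2.66 times the average moving range. LCL and UCL are the lower and upper of those limits. The sketch below uses made-up daily retention values and says nothing about what the original poster actually computed.

```python
def xmr_limits(values):
    """Return (LCL, mean, UCL) for the individuals (X) part of an XmR chart."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    lcl = mean - 2.66 * avg_mr
    ucl = mean + 2.66 * avg_mr
    return lcl, mean, ucl


# Hypothetical daily true retention values.
daily_retention = [0.85, 0.80, 0.90, 0.82, 0.88, 0.50, 0.86]
lcl, center, ucl = xmr_limits(daily_retention)

# Days outside the limits would be flagged as more than routine variation.
flagged = [x for x in daily_retention if x < lcl or x > ucl]
```

Points inside the limits are treated as ordinary day-to-day noise; only points outside them are read as a change in the process.
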

#

Ok I think I got it : He try to sell the fact that thanks to his PBC, he was able to go from a Desired Retention of 90% to one of 85%, which allowed him to increase his True Retention from 81% to 95%, and reduce the variation from 0.137 to 0.047

unique salmon
bold terrace
#

Which doesn't make any kind of sense: reducing your DR to increase your TR

unique salmon
bold terrace
# unique salmon Yeah, but he doesn't explain exactly what he did

Yeah, and also "Why change the DR to a lower value?". I guess with a PBC he's able to see changes that come from a different behavioural pattern rather than from normal random variation, which can be nice indeed, but I don't understand why he then claims that thanks to it he was able to solve the issue

ashen light
#

what is this black box using to increase retention

bold terrace
#

Since it's an account for LabPlot and he speaks only of PBC, I guess it's related to that, but I can't understand how 😆

unique salmon
#

He just increased TR by plotting a graph. Yep, that's how it works. You plot a graph and your TR increases

ashen light
#

that sounds about right for what happened

bold terrace
#

Or ...

#

He's just saying "I had 2 phases in my reviews, and thanks to LabPlot's PBC, I'm able to detect that there were 2 phases"

#

Because the "Process Behavior Chart" detected a change of behaviour?

#

And the how is not really the point, just the fact that he was able to detect it?
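
For reference, here is a minimal sketch of what an XmR chart (one common kind of Process Behavior Chart) computes. It is not what the poster or LabPlot actually ran; nobody here knows that. The function names and the retention numbers are made up for illustration. The 2.66 and 3.27 factors are the usual XmR scaling constants, and "LCL" is presumably the lower control limit that falls out of this calculation.

```python
from statistics import mean

def xmr_limits(values):
    """Return (center, lcl, ucl, mr_ucl) for an XmR chart of `values`."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    lcl = center - 2.66 * mr_bar   # lower limit for individual values
    ucl = center + 2.66 * mr_bar   # upper limit for individual values
    mr_ucl = 3.27 * mr_bar         # upper limit for the moving ranges
    return center, lcl, ucl, mr_ucl

def out_of_limits(values, baseline):
    """Indices of `values` that fall outside limits computed from `baseline`."""
    _, lcl, ucl, _ = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical daily true retention; the 0.50 day is invented for the example.
daily_retention = [0.82, 0.80, 0.83, 0.79, 0.81, 0.50, 0.84, 0.80]
signals = out_of_limits(daily_retention, baseline=daily_retention[:5])
```

A point outside the limits is read as "something changed" rather than ordinary day-to-day noise. The chart only flags that a change happened; it says nothing about why, and it cannot by itself raise anyone's retention.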

#

Golden rule of Twitter: Don't use Twitter to communicate things that should be described with more context 😆

ashen light
#

should I make a possibly inflammatory reply: "my thoughts are this is a handwavy graph with nothing supporting it"

bold terrace
#

The confusing part of my interpretation is the fact that he said himself that he optimized his params, changed his DR...
... So why do you even need a tool to detect a "Change of Behaviour" when ... you are the very source of this change of behaviour

cosmic hedge
lapis hearth
bold terrace
#

Done !

unique salmon
hasty fractal
#

this will be real useful to us students who create cards from material/lectures and also use pre-made stuff.

#

IMO that's a lot of med students. Maybe 90%? Then there are even language learners with a mix of pre-made and self-made cards (I personally only know Danika though).

#

@unique salmon your opinion is also welcome expertium. anything u have to say.

hasty fractal
unique salmon
unique salmon
#

I guess we could have 8 values of S0 instead of 4, for (first_review_date - creation_date) < median cards and for (first_review_date - creation_date) >= median cards, but I highly doubt that it would pass the "log-loss must decrease by at least 0.0015 per new parameter" standard that we have set for ourselves

#

And it would be a pain to implement

#

Instead of splitting cards into groups we could somehow directly incorporate (first_review_date - creation_date) into the calculation of S0, but I have literally zero clue how
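
A rough sketch of the "8 values of S0" split and the log-loss rule mentioned above, purely to make the idea concrete. Nothing here is FSRS code: the function names, the group encoding, and the example numbers are all hypothetical, and it assumes the rule is applied as 0.0015 of log-loss improvement for each added parameter.

```python
from statistics import median

def split_by_median_delay(delays_days):
    """Label each card 0 if (first_review_date - creation_date) is below
    the median delay, else 1. `delays_days` holds one delay per card."""
    m = median(delays_days)
    return [0 if d < m else 1 for d in delays_days]

def s0_index(first_rating, group):
    """Map (first rating 1..4, delay group 0/1) to one of 8 S0 slots."""
    if first_rating not in (1, 2, 3, 4) or group not in (0, 1):
        raise ValueError("rating must be 1..4 and group must be 0 or 1")
    return group * 4 + (first_rating - 1)

def worth_adding(logloss_before, logloss_after, n_new_params,
                 min_gain_per_param=0.0015):
    """True if log-loss dropped by at least 0.0015 per added parameter."""
    gain = logloss_before - logloss_after
    return gain >= min_gain_per_param * n_new_params

# Invented delays in days for four cards, then the slot for a card
# rated Good (3) on its first review in the "long delay" group.
groups = split_by_median_delay([0, 1, 10, 700])
slot = s0_index(first_rating=3, group=groups[3])
```

Going from 4 to 8 initial stabilities adds 4 parameters, so under this reading the split would need to lower log-loss by at least 0.006 to be accepted.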

polar maple
#

do we have the creation date info in the 10k dataset?

unique salmon
#

idk

#

If we don't, then sorata's idea DEFINITELY won't happen

polar maple
#

maybe jarrett has it

unique salmon
#

welp

tepid spoke
#

How would that even work? The creation time of the latest cards I studied was 2 years ago

#

But I hadn't seen them before

wind palm
wind palm
#

You can continue the conversation about FSRS in #fsrs-discussion .

* If you want to respond to something in here -- you are encouraged to forward that message to #fsrs-discussion. If you want the author to be notified you're responding, you'll need to ping them in your response.