#FSRS Megathread
Can't argue with that
Alright, so here's what I want you to do:
- Filter out first reviews and same-day reviews
- Calculate R for each review and record both R and the duration of the review
- Plot duration as a function of R, obtaining 4 graphs like these
- Fit a linear function to the data and save parameters a and b (time = a·R + b)
- Instead of using a fixed time per review, like "7 seconds per Good" for example, use time = a·R + b
Jarrett didn't want to do this because a and b will either have to be calculated during optimization and then stored somewhere, or recalculated every time we run the simulator
So it's kind of a pain in the ass
But it could significantly improve the accuracy of the simulator, and potentially fix CMRR
(ok, well, maybe not significantly, but still)
That's strange
Well, screw the integral
It's time=f(R) time!
I think for low decay the CMRR is doomed anyway
We can do time=f(R) AND use sum(R*S) as well
I mean, since the forgetting curve will "never" go to very low R, well, just adding more cards (and never reviewing them) is more time efficient than having a "healthy R"
If even that doesn't work, then we'll give up
Maybe a "Target R"?
For example, if the user's "Target R" is 85%, maybe the optimal DR could be 70% or 80% based on his decay
?
(Since all cards will be [DR, 100], but without a homogeneous distribution between those 2 points)
For example, if the forgetting curve were a straight line, DR=90% would give me an average R of 95%, I guess
Hmmm, let me think of a way to express it
The user wants an average R of 90%, let's say.
But if you put DR=90%
the average R will be more like 95%
(let's assume that between 100% and 90% it just declines linearly)
The DR is more like the "minimum R"
It's the average R of "today's review"
But in the grand scheme of the whole collection, it's the minimum
Is it clearer?
Look, my DR is 90%
but only the cards due today will have an average around 90%
most of them are just between 90 and 100
So now let's take a scenario where going from 100 to 90 takes 1 day, going from 90 to 70 takes 365 days, and going from 70 to 60 takes 3650 days.
If the user wants an average retention of 70%... Weeeell, since most cards will be stuck around 70% R for years, you could ask for a DR of 68%, for example
So the DR needed for a card to be "on average at the target R" would have to take into account: current R, S, D, decay
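To make the "Due-DR vs Collection-R" distinction concrete, here is a sketch that computes the time-averaged R over one review interval, assuming the card is reviewed exactly when R falls to DR. The linear case reproduces the 90% → 95% example. The power-curve case assumes an FSRS-6-style curve R(t, S) = (1 + f·t/S)^(−d) with f = 0.9^(−1/d) − 1; integrating it over the interval gives average R = (DR^(−(1−d)/d) − 1) / ((1 − d)·(DR^(−1/d) − 1)), which, conveniently, does not depend on S or f. Function names are hypothetical.

```python
def avg_r_linear(dr):
    """Time-averaged R if R fell in a straight line from 1.0 to DR."""
    return (1.0 + dr) / 2.0


def avg_r_power(dr, decay):
    """Time-averaged R over one interval of R(t) = (1 + f*t/S)^(-decay),
    with the review happening exactly when R reaches DR.
    The closed form does not depend on S or f."""
    if not 0.0 < decay < 1.0:
        raise ValueError("this closed form assumes 0 < decay < 1")
    if not 0.0 < dr < 1.0:
        raise ValueError("DR must be strictly between 0 and 1")
    numerator = dr ** (-(1.0 - decay) / decay) - 1.0
    denominator = (1.0 - decay) * (dr ** (-1.0 / decay) - 1.0)
    return numerator / denominator


for dr in (0.7, 0.8, 0.9):
    print(dr, avg_r_linear(dr), round(avg_r_power(dr, 0.1), 3), round(avg_r_power(dr, 0.5), 3))
```

For DR = 0.9 and decay = 0.5, `avg_r_power` gives about 0.947, slightly below the linear 0.95, because the power curve drops fastest right after a review. Note this only covers cards that actually get reviewed at DR; it does not model cards that are never reviewed again.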
The whole shebang
Sooo how is it related to finding the optimal workload/knowledge ratio?
My solution succeeds.
It's related in that right now, it might be normal that the optimal workload/knowledge will always be at the lowest R possible
We'll see if the proper version succeeds
Because you can't beat a non-zero R with zero reviews due
So let's add billions of cards that you will never review again
if they have >0 R, you'll have a better time than trying to make them float above any "normal DR"
So you need to introduce something like: "OK user, what do you want as a real average R for your whole collection?"
If he says 90%, now you know that all those billions of cards sitting at 10-20% R have a huge cost compared to his expectation
So automatically, you'll have to find a DR as low as possible while keeping the average collection R as close as possible to his expectation
Typically, if 100%→90% is linear, a Target Collection R of 95% would translate into a DR of 90%, for example
A Target Collection R of 70% might lead to a DR of 68% (because most cards will drop from 100 to 75 very quickly compared to how long they will spend around 68-75)
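One way to read the "Target Collection R" idea is as a search: find the lowest DR whose time-averaged R still meets the user's target. The sketch below does this by bisection. It keeps the same simplification as the linear example in the thread (each card is reviewed exactly when R hits DR), assumes an FSRS-6-style power curve R(t, S) = (1 + f·t/S)^(−decay) as the alternative, and assumes the average increases with DR. `dr_for_target_avg` is a hypothetical name, not an existing FSRS function.

```python
def avg_r_linear(dr):
    """Time-averaged R if R fell linearly from 1.0 to DR."""
    return (1.0 + dr) / 2.0


def avg_r_power(dr, decay):
    """Time-averaged R over one interval of R(t) = (1 + f*t/S)^(-decay),
    reviewed exactly when R reaches DR (closed form, independent of S)."""
    return (dr ** (-(1.0 - decay) / decay) - 1.0) / (
        (1.0 - decay) * (dr ** (-1.0 / decay) - 1.0)
    )


def dr_for_target_avg(target, avg_fn, lo=0.01, hi=0.999, iters=60):
    """Lowest DR in [lo, hi] whose time-averaged R reaches `target`.

    Bisection; assumes avg_fn is increasing in DR."""
    if avg_fn(hi) < target:
        raise ValueError("target average R is not reachable in [lo, hi]")
    if avg_fn(lo) >= target:
        return lo
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if avg_fn(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(dr_for_target_avg(0.95, avg_r_linear))  # the linear case from the thread
print(dr_for_target_avg(0.70, lambda dr: avg_r_power(dr, 0.1)))
```

Because the time-averaged R always sits above DR, the DR that hits a target comes out below the target, which is the "70% target → 68% DR" intuition; how far below depends on the decay.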
Problem right now is that a DR of 20% truly means "Never let me review it again" with a decay of 0.1
Soooooo of course the optimizer will want to have a shitton of cards at the lowest possible DR; there's no penalty for it
Introduce a penalty for diverging from the user's expectation (the "DR" for the whole collection), and CMRR will compute for you the optimal DR for today's reviews
(the tl;dr in bold)
What is done with S right now is just a way to introduce some kind of penalty: "If you don't know it well, you won't get a good score". But when you try to optimize knowledge/workload and there is a direct way to make workload ~= 0, well, you would need to put S on amphetamines to have any hope of it having an effect
You're fighting an exponential reduction of workload with a linear function; that won't be sufficient unless you add something to compensate for it. Having a "cost" that explodes when R gets very low can compensate for it
If it's still not clear, I'll find a way, I promise
Maybe it should be something like knowledge/(max(workload, minimum workload)), but idk how "minimum workload" would be calculated
I'm not entirely sure it would help because I think right now the workload might be extremely low
It would be interesting to have, for that curve, the values of Knowledge and Workload
I'm wondering if, under 50%, the workload is basically 0
"1 review every 10 year"
And truth is, CMRR would still be right to answer: just put DR=1%
Even if you'd recall 60% of everything, your workload would be almost 0
That's why I wrote knowledge/(max(workload, minimum workload))
Like, idk, 100 cards/max(30 minutes, 1 minute)
Well, I think if it's true that the user could have 0 workload and still manage to have his expected Collection-DR, why not recommend a very, very low DR, right?
I think that's maybe the missing part: not differentiating Collection-DR and Due-DR
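A sketch of the capped score, knowledge / max(workload, minimum workload), assuming knowledge is something like the expected number of cards remembered and workload is minutes per day. The `candidates` dictionary stands in for simulator output per DR; all names here are hypothetical.

```python
def capped_efficiency(knowledge, workload_min, min_workload_min):
    """knowledge / max(workload, minimum workload).

    Below the floor, cutting workload further no longer raises the score."""
    denom = max(workload_min, min_workload_min)
    if denom <= 0:
        raise ValueError("workload floor must be positive when workload can be 0")
    return knowledge / denom


def best_dr(candidates, min_workload_min):
    """Pick the DR with the highest capped score.

    candidates: {dr: (knowledge, workload_minutes)}, e.g. one simulator run per DR."""
    return max(candidates,
               key=lambda dr: capped_efficiency(*candidates[dr], min_workload_min))


print(capped_efficiency(100, 30, 1))  # the "100 cards / max(30 minutes, 1 minute)" example
```

Once every candidate's workload sits below the floor, the score reduces to comparing knowledge alone, so the highest-knowledge DR under the floor wins. That is the "only the minimum workload will be used" objection raised later in the thread, and it leaves open how the floor itself should be chosen.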
Due-DR might be 80% and you'd get a Collection-DR of 90%
what
And the optimal way to set your Due-DR would be based on your expected Collection-DR
hehehehehe
Still not clear?
whaaaaaaaaaaat
Due-DR=90% leading to a Collection-R of 95%, is that clear?
Then you're just re-defining DR without achieving anything
We're trying to find optimal DR, not redefine DR for no reason
You don't achieve anything by redefining it like this
But optimal Due-DR based on what?
Knowledge/workload
CMRR is already working perfectly fine for this, no?
If the guy has a decay of 0.1
And it will take 1y to go from R=100% to 70%, and 100y from 70% to 1%
CMRR is perfectly right to advise you to set your DR to 1%
I'm just gonna wait for @cosmic hedge to either implement it or say "nah, too much trouble"
Because it reduces workload, since by nature cards won't go very low in terms of R
So by defining a "Target R" for the collection, CMRR would compute the best DR to use for due reviews
If he wants 95%, a DR of 90% might work. If he wants 60% but his decay is super super super low, then maybe even a DR of 40% might be better
You need to understand the problem well before trying to find answers
Right now you're just trying to find a function that gives you something you think would be better, without interpreting the numbers
@cursive badge / @polar maple already suggested it, but feel free, you two, to tell me if that was not your interpretation
if i understand correctly, the user sets something like "time spent above R > 0.9" and CMRR optimizes for this?
if it's this then @unique salmon should like it since it now ties back to sum(S) formulas
Basically he would set "What is the DR I want on the collection level"
Typically: for this exam, I need to know 90% of all those cards
ok so the real review time would be like 85% or something
Setting a DR of 90% would lead him to a ~95% R at the collection level
Exactly
I say Due-DR=90% leads to Collection-DR=95% by assuming the forgetting curve is linear between 100% and 90%
Again, that's just redefining DR
It's not getting us closer to finding what DR is the best
No it's not at all
It's really 2 different values
that relate to each other based on the parameters of a user
And in fact it is very close to the integral interpretation
Because if a user wants to have 80% DR on a card on average, you have to account for the time where he was above and below that 80%
It's just that the integral right now is used to produce some kind of score where the minimum score corresponds to R=0
But this doesn't change the fact that if the guy will never get to 0 because of his decay, then CMRR will be able to maximize its Knowledge/Workload just by... almost zeroing the workload
When you think about it, and you realize it, you realize it's very dumb what is done right now
You ask how to minimize a function that is purely increasing for low decay
And you try to find way to change it, because the score is relative to ... R=0
Which would lead to a 0 workload
I'm sure you can do better than that @unique salmon
in a certain sense this is similar to interpretations you could come up with for sum(S * R)
but now we allow the threshold for S to be user-defined rather than be fixed at 90%, also the definition is now slightly different at a collection level rather than a card level
Yep, sum(S * R) is simply a score where zero is given to either 0 stability or 0 R. But 0 R can mean an almost-zero workload
I mean, having R=10% is still a "positive score", so if the workload is ~=0 while having it, of course CMRR will take it
Would be dumb to not take it
I mean, wouldn't it be a good case to use log loss?
i don't think that log loss has justification in this context
@bold terrace so what exactly would the metric be after a simulation is done? time spent such that the collection's average remains above the threshold R? i suppose this exact version could be cheated by only learning 1 card and learning it very well
Yeah collection average is bad
Could be computed on card level though
What is the average retention you want for a card ? Let's say 95%.
Then CMRR goes brr and computes, based on the forgetting curve, the DR that reduces workload as much as possible while achieving that 95%
Maybe it could be 90%, maybe it could be 80% for more aggressive decay
Because let's face it
If you ask the user "What is the lowest DR you want" (not the average), well the minimum DR recommended would be the DR, so yeah in that case, we're not solving anything
But asking "What is the max K/Workload can I get ?" when Workload can go exponentially closer to 0 with every bit of R lost, of course the "Score Function" won't be able to compensate it
Except if you tailor it specifically to make the graph convex again, but then you're just building a function to get what you want, which wouldn't be less biased than doing a "return .85"
But once again
First we need to know what the goal truly is. If you say "Maximize Knowledge/Workload", well, CMRR is doing it perfectly fine right now: set your DR to the lowest value possible if you have a low decay, NO card will ever go to 0%, so just never do any review, just add new cards
So either redefine the goal, or just remove CMRR as it is
(Well actually CMRR can work fine for short-deadline, no new items to add, and with high decays, so it's not completely useless)
Alex will probably disagree, but I'm fine with fiddling with how workload and knowledge are calculated if in the end we make CMRR output different values and not just a constant
If it's expressed more or less in the description of the calculator, why not
As long as for different users CMRR outputs all values between 70% and 95%, I think it's fine
We don't need to tell users how it's calculated
We can add it to the manual, but not to Anki itself
return randint(70, 95)
If someone really cares - go read the manual then
Yeah sure
btw i think that if you use a minimum workload formulation then only the minimum workload will be used
But personally I used it and trusted it once in my life, never again
I think it's when I joined this discord to try to understand FSRS a bit better
gosh I got f***ed badly by it
"Drop your DR to 70%"
I did
My R for the following days : 55-60%
We need to either make knowledge decrease faster than workload for low DR, or cap the workload at some minimum value
That's the day I learnt not to trust what FSRS gives too quickly
If as DR decreases, knowledge decreases faster than workload, that will solve the problem
tbh i always thought from the name that CMRR was a lower bound on what your desired retention should be rather than the optimal value for it
It's kind of both
It was called "Optimal retention" initially
I don't know how easy it is to determine at what R, and for what decay, the derivative of the forgetting curve is almost 0
If we call it "Optimal minimum recommended retention", then it's confusing. And hard to remember
If we know it, for example if we know that point is 72% for decay=0.1, you could then just bound the output of CMRR to [72%
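Whether such a point can be found is a short derivation, assuming an FSRS-6-style curve R = (1 + f·x)^(−d) with x = t/S and f = 0.9^(−1/d) − 1. Differentiating gives |dR/dx| = d·f·R^((d+1)/d), so the R at which the slope falls to a chosen threshold ε has the closed form R* = (ε/(d·f))^(d/(d+1)). The threshold ε (R lost per stability-length of time) is an arbitrary, hypothetical knob, since "almost 0" has to be defined somehow.

```python
def _factor(decay):
    # Chosen so that R(t = S) = 0.9
    return 0.9 ** (-1.0 / decay) - 1.0


def slope_at_r(r, decay):
    """|dR/dx| at retrievability r, where x = t/S is time in units of stability."""
    return decay * _factor(decay) * r ** ((decay + 1.0) / decay)


def r_where_curve_flattens(decay, eps):
    """R below which the curve loses less than `eps` of R per stability-length of time."""
    return (eps / (decay * _factor(decay))) ** (decay / (decay + 1.0))


for d in (0.1, 0.2, 0.5):
    print(d, round(r_where_curve_flattens(d, 0.005), 3))
```

Lower decay pushes R* up, i.e. the curve goes flat at a higher R, which matches the intuition in the thread. For a specific card, dividing the slope by S gives the per-day rate.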
just put a recommendation that 85% is the optimal retention
Problem solved
if someone asks why then say cuz i chose that
I mean, that's already the case with 90%
Yeaaaaah IMO the 'cost' of failure is WAY WAY WAY more than what we expect
well a lot of ppl seem to think
90% is the default for literally no reason other than "90 is a cool number"
90% desired retention is remembering 90%
Also, the shorter the interval, the faster you detect the flip-coin reviews
i swear like 90% of people think this
IMO there's no equation that describes well how fucking bad it is to get things wrong at long intervals
Why would they. They are a competitor of Anki and It may not play in their favour
Actually, I think the other reason is that 90% (well, 10% forgetting index, but whatever) is the default in SuperMemo
SM-2 was onto something when its logic was "You lapse? You start over"
If their neural net outperforms FSRS, they can boast about it
Dude. I'm starting to believe in 95% DR
There's a guy in the #language-learning channel, @severe storm; he explained how he works: 2-3 learning steps of 30 min to make sure no cards are just "coin flips", and then a high DR to make sure everything gets stabilized ASAP
He's doing 100 new Japanese words/day
100 is a lot
And he did it for a long time now
@polar maple
What about your neural net
Expertium pinged you. Dekki ai is seemingly able to incorporate a neural net with flashcard reviewing.
The whole "Find your optimal Knowledge/Workload" is for lazy mindset people
"I want to become fluent in 20min/day"
I just changed it to this configuration a few days ago, but so far it seems to have been the correct choice
"I'm not lazy, I'm an optimizer"
FOR NOW
maybe next year
That's the fucking key point : FOR NOW
id try this strategy
this is literally me
unfortunately
it's too late for me this year
I bet that if you survive the 95% at first, you will stop half-acquiring things and your overall stability will increase
how many reviews do you spend on a new card
my average is 3.4
the motivation isn't there, jarrett wouldn't want a nn for FSRS
@severe storm
But at least release it for the benchmark, man
I want shiny numbers to write about
You're like 90% done with it
Come on
Looks like 4-5 due to the extra step
Today was 2.5 but that's because a bunch of new cards were immediate easy skips
but my question is maybe
this is a good language learning strategy but idk about med
how much time on avg for 100 new cards excluding reviews
Yeah I suspect that too
I don't do this strategy for my grammar cards
I don't feel the need for it there
would you ever drop the retention
and when
and also
are you sure it's the 95% DR that's working
or is it the 3 learning steps
@quasi shadow Is FSRS not compatible with an NN
@robust hill you're free to ask, of course, but there's a lot of discussion with @severe storm from the past day in #language-learning, might be interesting to check
okay will read
In a few days when my anki recovers from the huge shock from my reform I can say more on how the reviews are looking compared to before the change
perhaps one of you two could show me where it starts
It's not that it's not "compatible", it's that it's against the philosophy of FSRS. Me and Jarrett agree that FSRS should remain interpretable and with few parameters
And realistically, the benefits of the neural net would have to be enormous to justify the switch
Also 2 relearn steps
SM-2 -> FSRS is a fairly easy choice
FSRS -> neural net is more debatable
But yes it is too early to say
My interrogatory started here #off-topic message
It's in off-topic sry
The only immediate effect I can determine is that my true retention went up by like 15%
but how much extra time
And yes, Alex's net is a lot more accurate than FSRS, but FSRS is good too. At this point improving the accuracy of predictions is unlikely to lead to tangible benefits for users, though I don't have a good proof of that
I cannot say yet because switching to 95% gave me thousands of reviews that I all did in one day so now my future dues are completely messed up
When it stabilises I can say more
And unfortunately I will never be able to give a more scientifically objective evaluation because I went from 50 new words with 2 cards each to 100 words with 1 card each
So I can't directly compare
And isolate the effects of 95% DR and extra learning steps
It's because you don't allow learning steps of <10seconds (/joke)
Now that I deleted the second card type all of my past history looks like I only did half the work
And both might help too. I've been discussing (well, with myself) "Knowledge-Stability" (coin-flip reviews) vs "Time-Stability". Some words you might get wrong 20% of the time because you "guessed wrong" between 2 options. In those cases, having too few learning steps + a low DR means you might pass them just often enough that their interval grows a bit, but then they suddenly crash
And my retention back then would be higher ceteris paribus than it would be now because there were two cards per note
Despite this I currently have fewer relearns in terms of percentage than back then. This is also despite the fact that every single relearn has an extra step now.
But I cannot say anything definite right now
I meant isolating the effects of 95% and relearn steps from other influences which are affecting my retention rate.
That they're both helpful is evident
If I hadn't changed anything else about my routine then I could directly compare and therefore say with confidence exactly how 95% DR and extra learning steps affect the bottom line
But did change other things, and not insignificantly at all
So I need to go a little bit off of vibes
i compared an older version of RWKV with FSRS-5-recency for the first 100 users: if you take their predictions, the RMSE between the two is 11%
Then again, being able to schedule short-term reviews accurately and not having "optimize" are pretty damn tangible
so in a sense RWKV and FSRS-5-recency diverge in their predictions by ~10% in R
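For reference, the comparison described above is just the RMSE (and mean absolute difference) between two prediction vectors over the same reviews. A minimal sketch; `preds_a` and `preds_b` are hypothetical per-review R predictions from the two models.

```python
import math


def rmse(preds_a, preds_b):
    """Root-mean-square difference between two models' predicted R for the same reviews."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(preds_a, preds_b)) / len(preds_a))


def mean_abs_diff(preds_a, preds_b):
    """Mean absolute difference, the "abs" companion to the RMSE above."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    return sum(abs(a - b) for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

An RMSE of 0.11 between the two models means their per-review R predictions typically differ by roughly 11 percentage points, which is how "diverge by ~10% in R" should be read; it says nothing about which model is closer to the actual outcomes.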
nope
Is 3 learning steps something people do at all? I've never heard of it and just randomly got the idea when I saw the option
IMO the number of learning steps means NOTHING compared to how much effort you put into them
We have certain people here I won't mention that have around 20-30 learning steps / new card
Failing every 10 seconds
i tried it for my language learning
and i think the cards stick
I don't think that's useful
What's making my third step useful is that it's after an hour
It ensures that the new cards only pass if you actually remember them after doing something outside of anki and waiting an hour.
I wrote about the 5 benefits of using Alex's net before
- We can make R more accurate
- We won't have to show parameters, which means one less thing for users to worry about
- We can support proper same-day scheduling instead of the current mess
- We can throw in new input features, like time of the day, workload, etc. Not just interval lengths and grades
- We can remove "Optimize", which means even less stuff for users to worry about
1 and 4 are probably not super important at this point. FSRS-6 with just interval lengths and grades is fine
2 is nice
3 is great!
5 is nice
So we have 2 questionable benefits, two nice benefits and one great benefit
Idk if this would be enough to convince Dae, probably not
You wouldn't need it
Placebo
rand, (already optimized, optimized now)
Alex's net would be pre-trained on thousands of users and then used "as is"
ifclickedbefore2days
write
already optimized
Ifclickedafter2days
write
optimized now
coding
interpretation: if FSRS predicts 90%, RWKV might predict something like 82% or 98% on average
so RWKV is better able to separate the cards, probably
Yeah, the AUC is a lot higher
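AUC measures how well predictions separate recalled from forgotten reviews, independent of calibration: it is the probability that a randomly chosen recalled review got a higher predicted R than a randomly chosen forgotten one. A minimal pairwise (O(n²)) sketch, counting ties as half; a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def auc(preds, outcomes):
    """Pairwise AUC: preds are predicted R values, outcomes are 1 (recalled) or 0 (forgotten)."""
    pos = [p for p, y in zip(preds, outcomes) if y == 1]
    neg = [p for p, y in zip(preds, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need both recalled and forgotten reviews")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # recalled review ranked above forgotten one
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

A model can have good RMSE and still a mediocre AUC if it predicts roughly the same R for every card, which is why a higher AUC suggests RWKV separates cards better.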
Just release the thing
Stop edging
for this one i was testing RWKV the curve version
Bro is edging so hard
He shared the metrics with me in DMs, but refuses to release it for the benchmark
ill do it sometime
Anyway, I wonder whether Dae would care
About all that stuff I wrote about
apparently i don't have the RWKV-P file at hand so i can't do the same rmse & abs calculation with it
You should compare it to FSRS-6 btw
yea i had a look at the raw probabilities, a lot of what is pushing RMSE is FSRS-5 under-predicting prob with the high decay
IMO having both RWKV and FSRS would be awesome
Being able to select one or the other
But I guess it will be on a global level again
The irony
LSTM is a nicer choice for a drop-in replacement i think
If we're seriously integrating a neural net into Anki, it better be the absolute most accurate beast ever
fair enough
Make RWKV-1B (1 billion parameters) that only people with an RTX 4090 can run
So that people with beefy PCs can make their algorithm even more accurate
i cant train 1 billion params
if RWKV does go into anki then it would probably be a smaller version for weaker devices
maybe 10k params
sure on my cpu i can run 200 reviews per second but idk about phones
I semi-regularly see people on r/Anki saying "I have an Android 4 (or 5) device, why is AnkiDroid not working?"
Well, ok, I've only seen it 2 times, but still
2 is a lot in this context
If it works anything like LLMs 1B could easily run on a phone. It would just take 5 years to process your initial revlog ;p
are we getting gpu-accelerated anki?
lets go
if it uses a gpu it might even get some vc funding now
I am looking at it as a way to get a makeshift short-term memory model
But so far, it's all talk and nothing is coming to fruition
no
but Anki is if the nn has a reasonable forgetting curve.
for the preset in which i learn languages
in preset of physiology mcq optimization
which is just straight memorization
seems like it might be better to learn stuff instead of spam memorization
Does the Dekki ai have a reasonable forgetting curve
has someone actually tried testing it
No clue, I don't think the guy shared the code anywhere
I'm getting absurdly high intervals on fsrs
stuff like, 26 days apart for stuff I've barely seen
here are my parameters:
3.6981, 6.5516, 15.4492, 27.3702, 7.2747, 0.4875, 1.5402, 0.0010, 1.4661, 0.1985, 0.9395, 1.8605, 0.1889, 0.2166, 2.1902, 0.2315, 2.9898, 0.4861, 0.5830
my desired retention is 90%
What reviews were done on that card, for example?
But if you want to check a bit yourself, you can put your params there https://open-spaced-repetition.github.io/anki_fsrs_visualizer/?w=3.6981,6.5516,15.4492,27.3702,7.2747,0.4875,1.5402,0.0010,1.4661,0.1985,0.9395,1.8605,0.1889,0.2166,2.1902,0.2315,2.9898,0.4861,0.5830&m=0.90
(I already prefilled it for you)
The first 4 params are your initial stability
It seems FSRS thinks that if you press Good, a stability of 15 days is a good interval for you to reach DR=90%.
What's your DR ?
Could you also press the Button "Evaluate" and give the logloss/RMSE?
.
Sorry! So yes, those 4 numbers represent your first interval for each of the 4 buttons
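Because this preset has 19 parameters (FSRS-5), the forgetting curve should be the fixed-decay one, R(t, S) = (1 + (19/81)·t/S)^(−0.5), as we understand it. Inverting it gives the interval for a desired retention; at DR = 0.9 the interval equals S by construction, which is why w0 to w3 read directly as first intervals in days. A sketch, ignoring Anki's rounding, fuzz and maximum-interval limits; `next_interval` is a hypothetical helper name.

```python
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1  # equals 19/81, so that R(S, S) = 0.9


def next_interval(stability, desired_retention):
    """Days until R drops to desired_retention (unrounded, no fuzz)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


# First-review stabilities w0..w3 from the parameters posted above
initial_s = {"Again": 3.6981, "Hard": 6.5516, "Good": 15.4492, "Easy": 27.3702}
for button, s in initial_s.items():
    print(button, round(next_interval(s, 0.90), 1), round(next_interval(s, 0.95), 1))
```

Under this curve, raising DR to 0.95 multiplies every interval by roughly 0.46, so the 15-day first Good interval would drop to about 7 days.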
0.2490
This is logloss or RMSE ?
Logloss I guess
Since the output is like
Log loss: 0.3520, RMSE(bins): 2.90%. Smaller numbers indicate a better fit to your review history.
Well, 0.2490 is pretty good!
So it means FSRS should be able to predict quite well your intervals
If you think it's still too long, maybe put a bigger DR ? Like 95% ?
(A typical example of why having Evaluate helps troubleshoot situations like this much faster, @unique salmon xD)
And it will be even better with the "health check" so that you don't have to think about whether the numbers are good or not
well for some cards I just answered it good like 3 times I think
won't that make stuff pile up
What can often happen is that if you have very few cases of cards with 3 straight Good answers, and they are generally never failed, FSRS will adapt to give you longer intervals for those. In general, even if it feels big, it is quite accurate
It will! Personally, I'd suggest you wait for those 26 days and see if you still recall those cards 90% of the time
This is what your interval could look like if you continue to press Good
I still set my retention to 91% just in case
yea 91 not 95. it's a good in between number
If you use "Hard", it seems FSRS is tuned to make your interval grow much, much more slowly
so it's also an option, if you feel your retention is wacky
But be careful: if you press "Hard" but then succeed on 100% of those cards, FSRS will learn to not care about your Hard and raise the multiplier
I thought I wasn't supposed to use the Hard button, since it fiddles with the card ease
With FSRS it's not an issue
so what am I supposed to do there?
The ease hell is something related to SM2
If you felt your retrieval was lacking, you really had to think hard to get it, but got it right: press Hard if you really want to make sure your next interval is lower
If you don't know: press Good
Pressing Good is the right answer 99% of the time
why not again?
Well if you failed it of course
oh ok
But if you tell me your next interval is bigger, I guess you recalled it right
Also: don't confuse Hard/Again. Again means you failed, Hard means success
I know lol
If you use Hard as a fake Again, bad things happen
since it implies you had trouble recalling the material
You see, you have 2 params for Hard/Easy
Basically, if you press Hard, you'd get a 23% multiplier on the increase (that's why you get +4 instead of +17; 4/17 ≈ 23%)
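The "multiplier on the increase" can be checked against the recall-stability formula as we understand it from FSRS-4.5/5: S' = S·(1 + e^(w8)·(11 − D)·S^(−w9)·(e^(w10·(1 − R)) − 1)·h·b), where h = w15 only for Hard and b = w16 only for Easy. With the parameters posted above, w15 = 0.2315, so Hard gets about 23% of Good's stability increase. A sketch; the difficulty D = 5 in the usage lines is an arbitrary example value, not taken from the user's card.

```python
import math

# Parameters posted above (FSRS-5, 19 values: w0..w18)
W = [3.6981, 6.5516, 15.4492, 27.3702, 7.2747, 0.4875, 1.5402, 0.0010, 1.4661,
     0.1985, 0.9395, 1.8605, 0.1889, 0.2166, 2.1902, 0.2315, 2.9898, 0.4861, 0.5830]


def stability_after_success(w, d, s, r, rating):
    """Stability after a successful review (2=Hard, 3=Good, 4=Easy).

    d: difficulty (1..10), s: stability before the review, r: R at review time."""
    hard_penalty = w[15] if rating == 2 else 1.0
    easy_bonus = w[16] if rating == 4 else 1.0
    increase = (math.exp(w[8]) * (11 - d) * s ** (-w[9])
                * (math.exp(w[10] * (1 - r)) - 1)
                * hard_penalty * easy_bonus)
    return s * (1 + increase)


s, d, r = 15.4492, 5.0, 0.9  # d = 5 is an arbitrary example difficulty
good = stability_after_success(W, d, s, r, 3)
hard = stability_after_success(W, d, s, r, 2)
print(round(good, 1), round(hard, 1), round((hard - s) / (good - s), 4))
```

The ratio of the two increases is exactly w15 regardless of D, S or R, which is why the observed interval gains (+4 vs +17) land close to 23%; the match is only approximate because Anki rounds and fuzzes intervals.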
Are you pressing "Good" on the initial review?
yes lol
why, am I not supposed to do that
Well, if you tell Anki/FSRS you already know the new card well, by pressing Good, of course the Intervals will be long
Is "Good" the Honest answers? As in, did you already know the material on the card, and your recollection of it was Good?
*Except if in the past, you had a lot of card you pressed Good as a first review, but then you failed all of those miserably after the elapsed interval
If you literally just learned the answers on the backside, and didn't know it before, the only correct rating is Again
Personally, my starting stabilities are: Again (0.13d), Hard (1d), Good (3d), Easy (31d). Easy feels enormous, but it's normal: if I press Easy on a new card, I already knew it from well before
I don't think I ever pressed Easy on a new card
So whatever number FSRS puts there is meaningless
FSRS needs to put something there, so it has some formulas for interpolating missing S0 if you have never used some buttons during your first review
I'm studying a hard subject. I'm studying Vietnamese and Chinese characters (chu nom). I create cards from stuff I see while immersing, and generally, since I know the context the word appeared in, I have pretty good short-term recall of the word. This way I can often guess what the word means on my first try, since I've already seen it before. Notice how I said short-term; I often forget those words after a few days
What I do in such cases is just instantly bury the new card instead of reviewing it
Personally, I'd press Good, but that's why FSRS adapted to give me an initial stability of 3d for the first Good
good idea
and 31d for easy
In the end ...
It really doesn't matter
(Said some Chester guy)
IMO we don't trust the algorithm enough. I suspended some cards that had 2-3 days of stability. I reintroduced them 1-2 months later. I got most of them right
Well, I went with "Just trust the algorithm" for a really long time, and it went poorly
I think with the trainable decay of FSRS-6, it should be better to model those kind of situation
so maybe I can keep doing what I'm already doing?
How many reviews do you have? When you press Evaluate, it tells you
If I had trusted the algorithm blindly, I'd now have a ton of cards scheduled 5+ years away, while having already forgotten a lot of them
1,112 reviews
yea im a new anki user
With that few reviews, it might be better to stick with default parameters for now
It's still very, very young, so I'd say: for now, keep good "review hygiene", meaning try to be consistent in how you review things, and the algo will adapt to you
you already have a very low log loss, so that's very good
What's your RMSE?
4.98%
The RMSE is more or less "The percentage that FSRS could be wrong"
5% is decent! It means that, "more or less" (it's a bit of black magic), if FSRS thinks you'll get 90%, you'll have around 85-95% retention
With time it often drops to around 3%
So you're doing great
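For intuition about the two numbers Evaluate reports, here is a minimal sketch of log loss and a simplified binned RMSE. Assumption: predictions are grouped into 10 equal-width bins by predicted R and each bin's (mean predicted R − observed recall rate) is weighted by its size; the FSRS optimizer's own RMSE(bins) may use a different binning scheme, so numbers from this sketch will not match Anki's exactly.

```python
import math


def log_loss(preds, outcomes):
    """Mean negative log-likelihood of outcomes (1 = recalled, 0 = forgotten)."""
    eps = 1e-15
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


def rmse_bins(preds, outcomes, n_bins=10):
    """Simplified calibration RMSE: equal-width bins by predicted R, weighted by bin size."""
    bins = {}
    for p, y in zip(preds, outcomes):
        k = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(k, []).append((p, y))
    sq, n = 0.0, 0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        sq += len(items) * (mean_p - mean_y) ** 2
        n += len(items)
    return math.sqrt(sq / n)
```

If every review is predicted at 90% and exactly 9 out of 10 are recalled, `rmse_bins` is 0: the predictions are perfectly calibrated even though individual reviews still fail.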
[I'm reading from the top, it's been busy here, so if I'm missing things I'll see them shortly.]
The clouds are starting to part, thank you!
The gap I am struggling with is that we're not teaching FSRS how to do something. FSRS is building a model based on our data. But because the model makes assumptions and predictions, FSRS doesn't just memorize all of the problems in the book, it creates its own theory of how addition works based on those problems and uses that to predict what the sums will be. So the thing we want to test is FSRS's theory against the original data to see how far off FSRS's model is from the real answers. [I suspect there's still something I'm not getting about this ...]
So now that I understand the train/test-split-the-data idea -- I don't know how much use that would be. I definitely see how it would be a purer and more robust test, but there would have to be a way to split the data that gave you 2 full data sets. But if both sets were full enough and thorough enough to match the user's habits and give a great exemplar ... wouldn't they be similar enough to each other that there's no point in separating them?
The risky part with an early interval that's so high is that if you actually don't remember it well, it'll take really long to come back up
So you could only evaluate the parameters in the next month with new data.
Would it surprise you to know that this is exactly what I do?
After I use my parameters for a month, before I reoptimize, I run Evaluate -- testing the old parameters with the additional new data. I have the Evaluate result from last month to compare that to. Then I reoptimize and Evaluate again to see what changed.
That's what I'm kinda stuck in right now. I have a lot of cards FSRS was wrongly really confident I'd remember for 1~2 years or more, and they're now slowly coming out, completely trashing my actual retention percentage
which in turn makes the FSRS optimizer "panic" in a sense, and making my intervals incredibly short where it produces an almost unmanageable amount of daily reviews
Yeah you're absolutely right on the fact that FSRS wouldn't be able to overfit (cheat to get the right answer without caring about the future) like crazy like another kind of algorithm. But if we want to compare FSRS and the algos of @polar maple, or any other algo, it's good to have the same "testing hygiene"
Yeah but the good news is that FSRS will adapt.
Do you reschedule some of your older cards ?
You could do partial-rescheduling
it can help to reduce the issue
I rescheduled all with an interval longer than 6 months, about 2 months ago
Ok cool !
so in 4 months, I should be caught up
how to reschedule cards on AnkiDroid?
I think your case is also very specific since you learn kanji in a vacuum, right?
But I'm out of new cards now, and STILL getting more reviews, not less
I don't know about AnkiDroid that much, I'm sorry
The easiest way is on desktop with the FSRS Plugin
Well, the deck contains ~2000 Kanji, and ~7500 vocabs to reinforce the Kanji
yea ik how to do it on desktop
Haha, I know that tsunami feeling
The Vocabs are in a vacuum, but the Kanji have the Vocabs as context
if the only way to do it is on my pc then I'll definitely grab it soon
Yeah, but what I mean is that you won't see your exotic kanji anywhere between reviews
what deck is this? wanikani?
Well, I see them in reviews of the Vocabs that contain them
In my case, most cards with a 1y interval don't stress me at all because I'll probably see them dozens of times in books
It's WaniKani, yeah
I don't really stress about forgetting a Vocab, since WaniKani isn't geared towards them primarily
But I also forget a lot of Kanji, which greatly upsets me
Remind me, what's your DR?
I dropped it from 90% to 88% now, since otherwise the number of reviews is unmanageable
Maybe the trainable decay will help a bit too
But in reality I'm hovering around 80-85%
With "Young Cards" usually around 90%, and mature ones 80% or less
Don't forget the LB (Load Balancer) will make your actual R drop a bit more
The main issue I feel like is actually fatigue
I fail notably more cards towards the end of a review session than at the beginning
With the order being random
have you tried using pomodoro to spread your learning over multiple sessions?
You do a lot of reviews/day?
I know people don't like that, but ... you can always suspend cards
I did it and it felt super great
300 cards out of 3500 sacrificed, 25% of my workload gone
That doesn't make them disappear though, and I need to know them eventually
I will need to eventually pass N1
If you get burn out it won't help
Yeah but N1 you can get it by understanding the language
Not by knowing exotic kanji
N1 calls for even more Kanji than on WaniKani
so I'll need to somehow learn another 400 or so
N1 still feels a bit difficult, I only get about 10% of the answers right
But N2 is more like 20-30%
A lot of those Kanji are utter nonsense you'll almost never need
but it's in there...
I think if you can read japanese you won't have any trouble with N1
not true, FSRS can overfit, I'll get some of the numbers later
Well, after the parenthesis there's the "like crazy", haha. But maybe you're referring to the RMSE cheat?
"Being able to read Japanese" isn't that simple
I can read a kids book or regular manga
but not a financial report or research thesis
Only slightly. I'll send you a file with FSRS-6-recency with --train_equals_test tomorrow, so you can compare the numbers
N1 didn't feel that high-level anyway, though
The problem with N1 is that it does not test production AT ALL
A lot of people pass N1 without being fluent at all
so you can pass N1 and not speak a word
But if you're fluent, you should be able to pass, right?
IIRC, FSRS with train = test roughly matches LSTM
If you pass N1, you should be able to have decent ability to listen and read
but you can be completely unable to speak still
okok
not that rare even
but still my point remains: there's no point burning out to pass N1 sooner
Unless you have a job offer that requires you to pass it right now
I'm pretty sure the response was, "nah (the entire server is on fire on a daily basis and until folks can behave like grown-ups, we can't take on anything new)."
We seem to have gotten past that issue though, and I've had everything ready to go to switch to a channel for a while now. But every time I try, this megathread is in the midst of discussion/debate/complaint/protest, and there hasn't been a good time to shut it down and shift over. If I could copy or convert the thread into a channel, and you could just keep rolling along, that would be perfect. But Discord doesn't believe in that. So I keep trying to catch a pause. Y'all have been victims of your own volume.
cc: @clever cargo @hasty fractal
[There was also the issue of the desire for the channel to be help and dev which wasn't going to be a good fit outside of help, unless you were going to answer all the help questions. That seems to have solved itself by the basic questions being posted in their own #1266615749779390474 threads, and only the most extreme and exhausting questions getting dropped in here for y'all to take care of ... which suits me just fine.]
Well.... N1 is what separates me from a Work-VISA
N2 is not enough, point wise
need N1
I'm gonna screenshot one of my cards to use as an example, since I study kanji too
this isn't right, forget this intuition
I learnt it from @unique salmon
Well, let's say that's how I wish RMSE could be interpreted
In a perfect world of interpretable machine learning algorithms and metrics...
And we had to write something in the manual, so here we are
if you can remember kanji as they appear in compounds but not when they're alone, what I suggest doing is putting vocabulary that includes the kanji on the front of the card, like I do (okay, that's vietnamese not japanese, but you get the idea)
@wind palm
I went the opposite way over time: I removed ALL context (or left very little, like conjugation) because my brain was able to use very stupid details as a way to remember things
For example, with this kanji I'm sure my brain would remember it as "Ah yeah, there's a [character] at the top right"
But maybe we should shift to #language-learning (Well personally I'll just go to sleep)
okay, gn
I was about to say... That Kanji does not exist on Jisho :D
that's because it's not a kanji lol
it's a chu nom
basically a Chinese character coined by the vietnamese
I hear you -- but that all sounds more significant for important things like Benchmarks, not for individual users.
But yeah, this is what I'm looking at: https://japanprcalculator.com/
If I punch in my details with no JLPT level, I end up at 60 points: 65 with N2, 70 with N1
and 70 is the minimum
I guess being a software dev for 10 years already brings me to 90 points
(9M¥, that's the kind of offer I see going around)
My example of overfitting was a bit of an extreme one to help get the point across. As Sound said, it is more relevant to neural networks that do have the ability to just memorize answers if they have enough parameters.
A key point is that we are teaching FSRS something. We give it a good starting point and only let it tweak the rules a little bit but we are teaching it: "this is how you need to set all the dials on the scheduling machine to make it work best for this user"
It is possible that when you change these dials it works slightly better on the reviews you have already seen, but slightly worse on new reviews that you have not seen yet.
The way you would do the test/train split is by choosing parts of the revlog for a preset that are only for optimisation and parts that are only for evaluation.
e.g.:
- Evaluate using the last month's reviews, only optimise on reviews more than a month old.
- Mark certain cards to only use for evaluation, use the rest in optimisation.
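The two options above can be sketched in Python. This is a minimal illustration, assuming each review is a record with hypothetical `card_id`, `day` and `rating` fields (not Anki's actual revlog schema); the 30-day window, the 20% test fraction and the seed are arbitrary choices:

```python
import random
from dataclasses import dataclass


@dataclass
class Review:
    card_id: int  # hypothetical field names, not Anki's revlog columns
    day: int      # day of the review, counted from some fixed start date
    rating: int   # 1 = Again, 2 = Hard, 3 = Good, 4 = Easy


def split_by_time(reviews, last_day, holdout_days=30):
    """Optimise on reviews older than the holdout window, evaluate on the newer ones."""
    cutoff = last_day - holdout_days
    train = [r for r in reviews if r.day <= cutoff]
    test = [r for r in reviews if r.day > cutoff]
    return train, test


def split_by_card(reviews, test_fraction=0.2, seed=0):
    """Reserve whole cards for evaluation so that no card appears in both sets."""
    card_ids = sorted({r.card_id for r in reviews})
    n_test = max(1, int(len(card_ids) * test_fraction))
    test_ids = set(random.Random(seed).sample(card_ids, n_test))
    train = [r for r in reviews if r.card_id not in test_ids]
    test = [r for r in reviews if r.card_id in test_ids]
    return train, test


# 10 cards, each reviewed on days 0, 40 and 80
reviews = [Review(card, day, 3) for card in range(10) for day in (0, 40, 80)]
train, test = split_by_time(reviews, last_day=80)
print(len(train), len(test))  # 20 10
```

Either way, the parameters are fitted only on `train` and the metrics are computed only on `test`, so a parameter set that merely memorised the training reviews gets no credit for it.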
That is one of the reasons why I was in favor of removing Evaluate. Metrics are great for benchmarking (for me and Alex and Jarrett), not so great for average users
The thread is moving quite fast today. I have no idea if I have repeated someone while writing my response
We need sub-threads
FSRS channel with its own threads
Just sayin'
(mods plz)
N.B. Danika just responded to this. Ironically you may have missed it in the flood of messages:
#1282005522513530952 message
I saw it
That's not how I interpret it for people either, if that's what you're worried about. I wouldn't say it's about whether you will get the answer right or wrong -- that's your retention.
RMSE is about whether FSRS can/will/does schedule the card at the right time -- or how often/by how much it misses the right point in your memory curve. FSRS predicts what days you'll get the answer right and what days you'll get it wrong. If FSRS determines that the 10th is the right date to study a card with that history, and you study it early, on the 5th, and get it wrong -- that's a miss. But if you study it late on the 15th and get it right, that's a miss too.
So RMSE 5% means when FSRS schedules a card the interval could be up to 5%-points too soon or too late. [It is 5-points on the scale of R, not 5% of R, right?]
@polar maple how do I explain that we can fudge RMSE binning to make RMSE give us (almost) any number between 0 and 1?
Okay, okay, okay. I get that. Thank you!
I always feel like it is easier to teach these kinds of things in person with a whiteboard / notepad to scribble diagrams. It's hard to do only using text and having to write full responses
It's not related to intervals at all, only to R. It would be great if the interpretation was "5% RMSE means that if FSRS predicts a 50% probability of recall, it could be 45% or it could be 55%", but that's not really the case
The exact values depend on how we choose the bins. And we can choose them in pretty much infinitely many ways
At least log-loss is nicer in that sense, it just has one formula that everybody uses
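To make the bin-dependence concrete, here is a toy Python sketch. It uses equal-width bins over the predicted R, which is an assumption for illustration only; the benchmark's RMSE (bins) groups reviews differently. The four predictions and outcomes are made up:

```python
import math


def log_loss(preds, labels, eps=1e-12):
    """Mean binary cross-entropy: one formula, no tunable choices."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(preds)


def binned_rmse(preds, labels, n_bins):
    """Group predictions into equal-width bins and compare mean prediction to mean outcome per bin."""
    bins = {}
    for p, y in zip(preds, labels):
        b = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((p, y))
    sq_err, count = 0.0, 0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        sq_err += len(items) * (mean_p - mean_y) ** 2  # weight each bin by its size
        count += len(items)
    return math.sqrt(sq_err / count)


preds = [0.6, 0.6, 0.9, 0.9]
labels = [1, 0, 1, 1]
print(binned_rmse(preds, labels, n_bins=1))   # 0.0: with one bin the averages agree
print(binned_rmse(preds, labels, n_bins=10))  # about 0.1
print(log_loss(preds, labels))                # no binning choice to make
```

Same predictions, same outcomes, and the "RMSE" goes from 0 to about 0.1 just by changing the number of bins.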
Yes, just about R -- but R is changing day to day, and the interval set so that the day you study lands on a particular R value, right? It might be 5-points of R too soon, or 5-points of R too late.
Honestly, I recommend just not worrying about the interpretation of metrics. Alex will agree
This is also why a health check with Good - Acceptable - Poor will be nice: no need to think, here are simple words in plain English
I was only trying to write it because it seemed like you were asking me to. I haven't needed to explain it in my own words ever before that.
I totally agree with you, lol
I can tell it's going to happen. I just think you're going to need to make sure there's a good army of helpers who understand what to do with those simple words when users ask. Because they will ask. I look forward to those lessons!
For "Good" and "Acceptable" we don't need to do/say anything. For "Poor", I've said this before: writing a list of advice in Anki itself is not going to be helpful. Most people will not read it and will treat it as noise. People who use Anki in weird ways - doubly so.
And tbh, even I am not sure what to say, other than "Don't misuse Hard, answer honestly, don't just use Anki as a note taking app ignoring the SRS aspect"
If someone's "FSRS health" is Poor aka FSRS doesn't fit their review history well, idk what to recommend
It's hard to identify what's causing it. The best I can think of is "Don't misuse Hard, answer honestly, don't just use Anki as a note taking app ignoring the SRS aspect", as I said
I always worry when I write a long explanation: "did I lose them near the start, and now the rest of this is useless?" It's so much easier when you can interrupt each other and draw silly diagrams.
I really enjoyed doing one-on-one teaching at university.
- Don't just use one mega preset or a million tiny presets. Try to group similar cards together.
- Maybe your cards are bad and need re-formulating.
- Watch out for interference and try to address it.
- Make sure you are actually learning your cards first (don't just brute force out of the Learn stage).
looking at only the R for a particular day does not tell the full story about individual card scheduling. For example, maybe you studied two cards that day, their true underlying probabilities of success were [0, 1], and you predicted [0.5, 0.5]. While you got the predictions wrong for the individual cards, the average success rate for that day was also 0.5, since from [0, 1] you get exactly one Again and one passing grade
now for longer time intervals: if you are looking at R over the past year and you have been studying for 2 months, where last month you got 85% R while your target is 90%, then an algorithm could possibly start scheduling at 95% to balance out the 85% and achieve a perfect 90% on average
now to be clear this isn't what FSRS deliberately does, but FSRS could be doing it by accident sometimes
these exact issues also exist in RMSE (bins) which is why i'm not a fan of it being considered the primary metric
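A tiny Python illustration of that first example, comparing a day-level check (does the average prediction match the average outcome?) with a per-review score (the Brier score, i.e. mean squared error per review). Both function names are made up for this sketch:

```python
def daily_calibration_error(preds, labels):
    """Absolute gap between the mean predicted R and the observed pass rate for one day."""
    return abs(sum(preds) / len(preds) - sum(labels) / len(labels))


def brier_score(preds, labels):
    """Mean squared error between each prediction and its own 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


labels = [0, 1]     # true outcomes: one Again, one pass
vague = [0.5, 0.5]  # the "wrong for each card" predictions from the example
sharp = [0.0, 1.0]  # predictions matching the true per-card probabilities

print(daily_calibration_error(vague, labels), brier_score(vague, labels))  # 0.0 0.25
print(daily_calibration_error(sharp, labels), brier_score(sharp, labels))  # 0.0 0.0
```

Both sets of predictions look perfect at the day level; only the per-review score tells them apart. The same averaging is what lets 85% one month and 95% the next look like a perfect 90% over the two months.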
Man, imagine how nice life would be if we were doing binary classification
Just use accuracy as your primary metric. Super intuitive and easy to understand 🤣
...unless you have a heavily imbalanced dataset
Then you can get 99%/99.9%/whatever% accuracy just by predicting the most common class
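A quick Python check of that imbalance trap, using a made-up review log of 99 passes and 1 lapse:

```python
def accuracy(preds, labels, threshold=0.5):
    """Fraction of reviews where the thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(preds, labels))
    return hits / len(labels)


labels = [1] * 99 + [0]    # 99 passes, 1 lapse
always_pass = [1.0] * 100  # a "model" that ignores the card entirely

print(accuracy(always_pass, labels))  # 0.99
```

Since most reviews in a typical collection are passes, a "model" that always says "you'll remember it" already scores very high, which is why accuracy alone tells you little here.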
Actually, is there ANY machine learning task where you have a nice, simple and interpretable metric with no caveats? 🤔
i also added FSRS Helper for the first time, and "rescheduled all" suddenly gave me a big backlog. Why is this so?
i think i did not understand what the reschedule all button does
Is there a way to calculate the third recommended learning step from this
@polar maple here's the output file for python other.py --algo FSRS-6 --recency --train_equals_test: https://drive.google.com/file/d/1lFCBa9kPFOB7yqQBBDfJiMM8Kymk5OrQ/view?usp=sharing
You can compare the numbers to the regular FSRS-6-recency to see how much it overfits
@quasi shadow I think we should ask Dae to be the judge and decide whether we're keeping the "data scientist" Evaluate or the current Evaluate
Oh, you know what? You just sparked an idea: I'll try grouping my cards by leech status, as determined by my algo. Splitting based on FSRS D already helped a lot, but I feel that a split based on past performance patterns could lead to an even better split
If it works, I'll modify my add-on to automatically move cards between decks based on rules like leech classification
Why is it so hard to find the "Create Deck" button though
You check all the top menus, the right-click menu
then you see it's just ... below
with the "below menu items"
the items that need to be below, you don't know why
https://github.com/ankitects/anki/pull/3962#issuecomment-2841720804
Alright, how long do you guys think it will take Dae to respond? 🤣
He was actually relatively quick with responding to this PR, so maybe a few days
Uuuuuuuuuuuuuuuuuh
I managed to get WORSE log loss AND RMSE after an optimize
isn't it a bit strange?
The cards in the deck changed though, so maybe the final check on whether to keep the new params depends on something else?
Log loss: 0.4422 -> 0.4463
RMSE: 7.08 -> 8
If the cards changed, then yes, the comparison is not valid
You have to evaluate on the same cards
OK!
But then it's funny because training on older cards made better parameters
I would have expected Optimize not to pick a worse curve just because it was trained on different data 🤔
But anyway, it seems FSRS D is a better clustering metric to train different parameters on than my performance_drop_count/ratio
Which is quite obvious, since in the "all together" scenario, D was already some kind of clustering variable (based on the number of Agains)
Honestly, at this point I'm more worried about CMRR. We are severely understaffed (Luc is the only one doing CMRR stuff) and short on time because of the beta
Lol
Am I some kind of niche Internet micro-celebrity now? 🤣
Petition to have this in the doc somewhere
I have my little search query "from: expertium mentions: jarrettye reset" that I go check each time I'm wondering whether moving reset cards from one deck to another will mess up the optimizer
Adding it here would be fine, right?
Isn't this the opposite of what was said earlier?
I mean, it's what is written in the doc, but people seem to say it shouldn't be used as is 😦
Yeah, we should remove that from the manual
hey! I just recently switched to FSRS (ik I'm late 😭) and I was wondering how the "reschedule all cards" option in the FSRS Helper add-on is different from clicking "reschedule cards on change"
They are the same, but because Anki has fuzz, the results of using the built-in reschedule and the add-on reschedule won't be identical
what's fuzz again?
Your intervals are slightly randomized
https://docs.ankiweb.net/studying.html?fuzz-factor#fuzz-factor
is that for cards to not be shown on the same day
oh okok I was kinda ish right lol
so if I don't click on it how long does it take for all of my cards to become fsrs?
Hard to say
Depends on how many cards you have in total and how many you are reviewing per day
Simplified example: let's say you have 1000 cards and you review 100 of them per day. Then it would take 10 days, assuming each day you review 100 other cards, not the same cards from the last day
ah so however long it takes to get them all done maybe?
so if I were to do all 1000 in a day then it would convert all 1000 of them in that one day?
Yep
wait, but wouldn't all 1000 of them have been converted anyway if Anki had made them available for me to review on that one day?
or is it the 1000 that just happened to be due that day maybe the next day I had 2000 due that weren't there the day before and those would not have been converted yet?
If you review 1000 cards, you review 1000 cards ¯\_(ツ)_/¯
I'm not sure what you are asking
sorry my b, I'm not the best at articulating my questions haha
uhhh ok
so if let's say I switch to fsrs, but I don't click the "reschedule all cards" immediately, then Anki will slowly turn all of my cards into FSRS right?
Yes
so how long that process takes depends on how many cards I study per day
And how many cards you have in total
oh okok
so if I have 1000 cards in total and let's say only 100 of them were due today and I did all 100 of them, then 100 of them would be converted and 900 would not have been converted?
Yep
and then if 200 of them were due tmr and I did all 200 of them then those 200 would be converted and id have a total of 300 that were converted and 700 that weren't
so a card gets converted when I do it on its current timeline due date without fsrs?
A card gets "converted" whenever you review it while FSRS is on
oh thats a good way to put it lol thanks
@JarrettYe That is precisely what I meant: the conclusions are thrown off by this metric when there is a lot of data that cannot be compared. I actually asked Piotr Wozniak, and this was his quick reply:
"To use this: sm18/systems/{collection_name}/stats/SM16-v-SM17.csv is a good idea, but the
Wow, Woz responded to the comparison between SM-17 and FSRS.
https://supermemo.guru/wiki/Universal_metric_for_cross-comparison_of_spaced_repetition_algorithms
The universal metric is basically just binned RMSE though
Go tell Guillem that
On that page: "For example, the cost of forgetting is much greater for memories reviewed at large intervals"
My boys
For CMRR: R*S^2? S for "I know it better", and S squared because "I know it better, and the cost of getting it wrong at that stage is bigger"
But basically instead of "Cost for Again" being a static factor, it could be expressed through S
I want something interpretable, though 😭
Sure
But the concept is nice. Higher Stability = More Cost
Independently of wanting to know it better
Or maybe through those parameters 🤔
@cosmic hedge I want you to try the stuff I described yesterday, with time=a·R+b, and then also try sum(R·S) on top of that. So we calculate time based on R AND use sum(R*S) and not just sum(R). In other words, we change both the numerator and the denominator in workload/knowledge
If even that fails, then we'll give up, and I will finally stop pinging you all the time
If Jarrett is right, time=a·R+b should be sufficient to unfuck CMRR
If not, we can try that also using sum(S·R)
And if that's still not enough, then bye-bye CMRR
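A rough Python sketch of the proposed ratio: the numerator sums time = a·R + b over simulated reviews, and the denominator is sum(R·S), or plain sum(R), over the collection. The classes, field names and numbers are hypothetical, not the Anki simulator's real structures, and the actual simulator may aggregate differently:

```python
from dataclasses import dataclass


@dataclass
class SimulatedReview:
    retrievability: float  # R at the moment the review happens


@dataclass
class CardState:
    retrievability: float  # R at the end of the simulation
    stability: float       # S in days at the end of the simulation


def total_time(reviews, a, b):
    """Review time modelled as a*R + b seconds, instead of a fixed cost per button."""
    return sum(a * r.retrievability + b for r in reviews)


def knowledge(cards, weight_by_stability=True):
    """sum(R*S) over the collection, or plain sum(R) when weight_by_stability is False."""
    if weight_by_stability:
        return sum(c.retrievability * c.stability for c in cards)
    return sum(c.retrievability for c in cards)


def workload_per_knowledge(reviews, cards, a, b, weight_by_stability=True):
    """The quantity a CMRR-style search would minimise over candidate desired retentions."""
    return total_time(reviews, a, b) / knowledge(cards, weight_by_stability)


# made-up numbers: a negative slope means low-R reviews take longer
reviews = [SimulatedReview(0.9), SimulatedReview(0.8)]
cards = [CardState(0.9, 10.0), CardState(0.95, 20.0)]
print(workload_per_knowledge(reviews, cards, a=-10.0, b=15.0))
```

Presumably the simulator would be run once per candidate desired retention, and CMRR would return the one with the smallest ratio.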
sum(S*R) I think I tried it
R, R*S, R*sqrt(S) I think I did
but it's always good to try again
I think my previous exponential wouldn't have this issue because it penalizes very low stability super heavily compared to >5-10 days, but I might be wrong, since the optimizer could find a way to do just enough reps to reach 5d stability, and with a low enough decay you'd still have 50% after ... thousands of years
yeah this is a significant value, as i remember it brings it to around LSTM's performance
🤔 mildly interesting
calculating r for the simulator isn't easy so I wanna check there's nothing else first
CMRR can always just be removed for one update if it doesn't work, there's no real rush
Why elapsed days on the X axis instead of R?
well we know what r does
?
here i double checked what r does on some random user as well
well if some other metric has a correlation with duration and is 100* easier to implement then i'd use that
Btw, I was thinking. Every time the user optimizes parameters, memory states have to be recalculated. And if we're recalculating R anyway, might as well use that opportunity to calculate a and b for time=aยทR+b. Then those values will be stored somewhere so that they can be accessed for simulations
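Fitting a and b is an ordinary least-squares line fit of review duration against R, so storing the result would only take two numbers (or one pair per answer button, if the fit is done separately for each). A minimal closed-form sketch in Python; the function name is hypothetical, and filtering out first and same-day reviews is assumed to have happened already:

```python
def fit_time_vs_r(retrievabilities, durations):
    """Least-squares fit of duration = a * R + b; returns (a, b)."""
    n = len(retrievabilities)
    if n != len(durations) or n < 2:
        raise ValueError("need at least two (R, duration) pairs of equal length")
    mean_r = sum(retrievabilities) / n
    mean_t = sum(durations) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(retrievabilities, durations))
    var = sum((r - mean_r) ** 2 for r in retrievabilities)
    if var == 0:
        raise ValueError("all R values are identical, so the slope is undefined")
    a = cov / var
    b = mean_t - a * mean_r
    return a, b


# made-up durations in seconds that fall exactly on time = -10 * R + 15
a, b = fit_time_vs_r([0.5, 0.7, 0.9], [10.0, 8.0, 6.0])
print(round(a, 6), round(b, 6))  # -10.0 15.0
```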
it would be very hard
Yeah, but I don't think elapsed days is a good candidate because the same number of elapsed days can correspond to different R, depending on S
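To see why, here is the FSRS-6-style power forgetting curve, R(t, S) = (1 + factor·t/S)^(-decay) with factor = 0.9^(-1/decay) - 1, so that R is exactly 90% when t = S. The decay of 0.5 below is the fixed value FSRS-4.5/FSRS-5 used; FSRS-6 learns it per user. A minimal Python sketch:

```python
def retrievability(elapsed_days, stability, decay=0.5):
    """Power forgetting curve; returns exactly 0.9 when elapsed_days == stability."""
    factor = 0.9 ** (-1 / decay) - 1
    return (1 + factor * elapsed_days / stability) ** (-decay)


# the same 10 elapsed days on a weak card and on a strong card
print(retrievability(10, stability=5))   # about 0.825
print(retrievability(10, stability=50))  # about 0.977
```

Ten elapsed days means R ≈ 82.5% for one card and R ≈ 97.7% for the other, so binning durations by elapsed days mixes very different recall situations.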
- What will you be doing next with your free time
- Do you have any plans for writing papers not related to srs
- There has been a lot of research from china in the machine learning field. Planing on transitioning to such a company?
OH another question
how did you manage to have nerdy interests, have a job, and work on FSRS for so long?
- I'm refactoring Maimemo's algorithm.
- No. I don't like to write papers.
- No. The economy is not good here. I don't want to bear the risk.
I had a lot of spare time in my job.
Oh dear.
They made its knowledge graph manually...
I hope it's done automatically.
Thx for the link
I'm using Math Academy and have recommended it to my friend.
Maybe I will try to create a system like it.
Sorry, but could someone confirm that I've interpreted the FSRS vs. SM-17 benchmark correctly? But based on what I see, FSRS with its default parameters, and therefore without optimizing with user data, is apparently just as good at predicting performance as SM-17 with individual optimization data? In other words, FSRS is in the worst case as good as the best case in SM-17...
In FSRS-vs-SM17 benchmark, I trained FSRS with SuperMemo users' data.
https://github.com/open-spaced-repetition/fsrs-vs-sm17
This? No, FSRS is optimized, just not like in Anki
I'll leave it to Jarrett to explain the difference in how optimization is done
I mean, fsrs with default parameters has a similar rmse to sm-17 based on data from users who have used sm-17 already with the algorithm adapted to their memory
Again, it's optimized in this benchmark
We haven't tested FSRS with default parameters there
Oh, I see, thanks
https://github.com/open-spaced-repetition/srs-benchmark
We have tested it here, but there are a lot of differences, so the results aren't very comparable with what you see in the fsrs-vs-sm17 repo
And fsrs-vs-sm17 is based on a tiny sample size anyway, whereas the main benchmark is based on 10,000 users and hundreds of millions of reviews
I made a cross-comparison between SM17 and FSRS-6.
In the view of SM-17, FSRS-6 calibrates the data well.
You should probably add an annotation to explain how to read this graph
I wouldn't like to repeat Woz.
No, really, please add an annotation. It's not obvious at all what's going on. Is higher = better? Lower = better? Closer to 0 = better? Can we get a definition of B-W?
(I know closer to 0 = better, but other people don't)
srs-benchmark:
fsrs vs sm17:
i wonder if AVG would have better log loss than SM-17?
Nice, am I reading correctly that for Again first, review time increases with time?
Ah, it's elapsed days, meaning since the last review?
or since introduction?
Lol, that would be funny
🤣
Maybe I can add AVG to the comparison.
added
benchmarking now
There's something I don't understand about AVG. If you compute the average retention of a collection, you might get a good prediction simply because, let's say, most people set DR=80-90% anyway, no?
I mean, the fact that they had DR=90% and an average R close to 90% is what allows AVG to not be that bad, since most of the test set will (probably) come from people with an average R of 90%, no?
And once you have that AVG(R), how do you turn it into a scheduler? By transforming R into stability/interval?
I mean, if you train AVG on 70% of my reviews and test it on the other 30%, well, those 30% were already scheduled in a way that makes their average R close to 90%
But simply saying "I expect this review to have AVG(R) retrievability" doesn't really answer the question "WHEN will it have an R of 90%?"
Ideally you'd want to test your R on things that were not pre-scheduled for you
Or maybe I'm missing something stupid, and I'm sorry for the confusion
ah ok it's just for testing the prediction
I see
If a user used SM-2, and a simple AVG is already able to do a good job at those predictions, then it means we should do better than it
However, if the scheduling is perfect (TR = DR), we cannot distinguish perfect predictions from AVG.
Ok got it
Thanks!
Ideally we'd like a prediction function that could be even more precise than giving probabilities, and would be able to say "This will be a 1, this will be a 0"
Already quite good actually
Wait what, that can't be right
AUC of AVG should be around 0.5
Like in the main benchmark
It can't be worse than random, especially THAT much worse
AUC of 0.08 would mean that you can get an insanely good algorithm if you just inversed AVG's predictions
it's moving average
it's updated for each review
if the last grade is pass, it increases. if the last grade is fail, it decreases
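A sketch of that kind of moving-average baseline in Python. It uses an exponential moving average; the smoothing factor `alpha` and the starting value are illustrative assumptions, not the benchmark's settings. The important detail is the order: predict first, then reveal the label, then update.

```python
def moving_average_predictions(outcomes, initial=0.9, alpha=0.05):
    """Return one prediction per review, each made before that review's outcome is known."""
    estimate = initial
    preds = []
    for recalled in outcomes:  # 1 = pass, 0 = fail, in chronological order
        preds.append(estimate)  # predict using past reviews only
        estimate += alpha * (recalled - estimate)  # a pass pulls it up, a fail pulls it down
    return preds


print(moving_average_predictions([1, 1, 0, 1]))
```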
OK, at least AVG cannot beat SM17 in the cross-comparison.
@polar maple surely this can't be right
"If you get a score of 0 that means the classifier is perfectly incorrect, it is predicting the incorrect choice 100% of the time. If you just changed the prediction of this classifier to the opposite choice then it could predict perfectly and have an AUC-ROC score of 1."
https://stats.stackexchange.com/questions/266387/can-auc-roc-be-between-0-0-5
Are you sure there's no 1/0 label confusion?
"So in practice if you get an AUC-ROC score between 0 and 0.5 you might have a mistake in the way you labeled your classifier targets or you might have a bad training algorithm. If you get a score of 0.2 this shows that the data contains enough information to get a score of 0.8 but something went wrong."
So it means if you take -AVG, you get an AUC of 0.92
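AUC can be computed as the probability that a randomly chosen successful review gets a higher predicted R than a randomly chosen failed one (ties count as half). A brute-force Python sketch on made-up numbers, showing that flipping every prediction turns an AUC of x into 1 - x, and that a constant predictor lands at exactly 0.5:

```python
def auc(preds, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U); O(n^2), fine for small examples."""
    pos = [p for p, y in zip(preds, labels) if y == 1]
    neg = [p for p, y in zip(preds, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pass and one fail")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


preds = [0.2, 0.3, 0.8, 0.9]
labels = [1, 1, 0, 0]  # every pass got a LOWER prediction than every fail
print(auc(preds, labels))                   # 0.0
print(auc([1 - p for p in preds], labels))  # 1.0
print(auc([0.9] * 4, labels))               # 0.5: a constant cannot rank anything
```

This is why an AUC well below 0.5 usually points to a bug (flipped labels, or predictions lined up with the wrong rows) rather than a genuinely bad model: a truly uninformative predictor sits at 0.5.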
🤔
Another guy says what I copy-pasted is bullshit, though
And I don't understand much about AUC
lol
But AUC is not a good metric for our case, as I mentioned before.
Oh shit
it's too good to believe
What's wrong with the collections from SuperMemo users?
I need a reviewer
@polar maple @unique salmon would you mind reviewing it?
@quasi shadow
copilot is dumb
no vscode anymore, I need cursor
It's 5:27 a.m. in China. My circadian rhythm is messed up during the holiday.
Did you just wake up or have you not slept?
I haven't slept.
Interesting, my friend in Japan hasn't slept either.
It's the geek's circadian rhythm.
am i allowed to guess thats asuka? XD
yep
he reviews my pr's at times i 100% assume hes asleep
OK, fixed
It's time to go to bed.
@quasi shadow I have a suspicion for why FSRS and AVG do so much better than SM. If the pandas dataframe is set up in the same way as it is in srs-benchmark then the problem is that FSRS is asked to make predictions about the review at the moment just before it happens, while SM is presumably doing the prediction after the last review of that card. In the time between the last review and now, FSRS is getting extra information
fair point
so I need to store the stability and decay for each card after each review?
In the next review, we need to use the stored stability and decay to predict the retrievability.
yeah, and i think you might need another column that is similar to tensor but also contains the features for the current row, this way you have enough information to predict the result of the next review if it exists
the current tensor column still needs to be kept since after revealing the label for the current review, you add (tensor, label) to the replay buffer
alternatively in the df preprocessing step, send the labels backwards 1 review step so that you don't need to store the stability/decay, but you still need to remember to add the data point to the replay buffer at the right time
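A stdlib Python sketch of the second option: group reviews by card, sort them, and pair each post-review memory state with the outcome of the same card's next review (with pandas, something like `df.groupby("card_id")["y"].shift(-1)` on a chronologically sorted frame). The tuple layout and the column names are hypothetical stand-ins for the benchmark's dataframe:

```python
from collections import defaultdict


def pair_state_with_next_outcome(reviews):
    """Attach to each review the outcome of the same card's NEXT review.

    `reviews` holds (card_id, review_index, state_after_review, recalled) tuples,
    a hypothetical layout. A card's last review has no next outcome and is dropped.
    """
    by_card = defaultdict(list)
    for card_id, index, state, recalled in reviews:
        by_card[card_id].append((index, state, recalled))
    pairs = []
    for card_id, items in by_card.items():
        items.sort(key=lambda item: item[0])  # chronological order within the card
        for (_, state, _), (_, _, next_recalled) in zip(items, items[1:]):
            pairs.append((card_id, state, next_recalled))
    return pairs


reviews = [
    (1, 0, "S after review 0", 1),
    (1, 1, "S after review 1", 0),
    (1, 2, "S after review 2", 1),
    (2, 0, "S after review 0", 1),
]
for pair in pair_state_with_next_outcome(reviews):
    print(pair)
```

Each prediction then uses only what was known right after the previous review, which is the same information SuperMemo had when it made its prediction.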
Isn't it OK, though? I mean, let's say I did 4 reviews on a card, the last one was 6 months ago, and it had a stability of 1y.
Based on my new reviews, if I optimize FSRS, I might be able to reschedule even that card and bring it forward to a more precise time.
So isn't it "the fault of SM" for not doing that?
I was also wondering, more for fun, what the precision of a true baseline like just the constant function would be. You set it to a user's average to simulate that we know the DR, and then you just evaluate all review predictions based on that…
80…80…80…80
In the main benchmark AVG is literally a constant
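For a constant baseline, the log loss has a closed form: predicting the overall pass rate p for every review gives a log loss equal to the binary entropy of p. A small Python sketch with a made-up log of 9 passes and 1 fail:

```python
import math


def log_loss(preds, labels, eps=1e-12):
    """Mean binary cross-entropy of probabilistic predictions."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def constant_baseline(labels):
    """Predict the collection's overall pass rate for every review."""
    rate = sum(labels) / len(labels)
    return [rate] * len(labels)


def binary_entropy(p):
    """Entropy (in nats) of a coin that comes up 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


labels = [1] * 9 + [0]
print(log_loss(constant_baseline(labels), labels))  # about 0.325
print(binary_entropy(0.9))                          # same value
```

If scheduling hit DR perfectly, every review's true R would be the same, and this constant would be as good as any model could be, which is the "TR = DR" situation Jarrett mentions above.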
Aaah OK sorry I thought it was a moving AVG evaluated after each new review
sry sry
@unique salmon
Isn't this the dekki guy?
Oh, ok. I wonder why their website doesn't link to this repo
@quasi shadow @polar maple wanna implement it in the benchmark?
https://github.com/ankitects/anki/pull/3962#issuecomment-2846796128
Also, this. You two should share your opinions
Yeaaaah, it's going in the direction of "More training data = better than having a split", which can be quite harmful
But I think it might be for the best to have Dae's attention on these topics now
It might slow this PR down a bit, but in the long term it might help other people less involved in these discussions understand why, in a world of neural networks, it's a problem to let the algo infer rules like "if it's 1, it's odd; if it's 2, it's even; ... if it's 27...; if it's 286..." just because it had the whole dataset as a training set, instead of learning the rule "ends in 0, 2, 4, 6, 8: even, else odd"
@quasi shadow Umm Jarrett, did I break FSRS or what?
I have been getting these 35 seconds for a while now for the same card. D is dropping, but interval is not changing
Did you use FSRS-6?
Yes I am on the current build
25.05.2?
Does the stability increase?
Does not seem like it
It has been 1 minute for a while
I see D dropping
This has happened to 4 of my other cards
What is weird for me is that this card has never reached a D=99% or a 100%
So it did not even reach maximum difficulty
D is dropping and good interval stays the same
Not rising
Here is another card as well
D has reached 60% which is fairly moderate
Yet it cannot go above 35 seconds
I am going to have to press easy on both of these card because effing hell i cannot get out of this cycle
If I recall correctly from @unique salmon, low decay (0.1) means a super slow drop between 90% and 70%, but a super quick drop from 100% to 90%, no? So maybe it has an impact on same-day reviews for people with such bizarre revlogs?
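Comparing the same FSRS-6-style curve, R(t, S) = (1 + factor·t/S)^(-decay) with factor = 0.9^(-1/decay) - 1, for a low and a high decay at equal stability shows the shape being described: both curves pass through 90% at t = S, but the low-decay curve drops faster before that point and much more slowly after it. A Python sketch with illustrative decay values of 0.1 and 0.5:

```python
def retrievability(elapsed_days, stability, decay):
    """FSRS-6-style power forgetting curve; equals 0.9 when elapsed_days == stability."""
    factor = 0.9 ** (-1 / decay) - 1
    return (1 + factor * elapsed_days / stability) ** (-decay)


stability = 1.0  # the shape only depends on elapsed_days / stability
for t in (0.1, 1.0, 10.0):
    low = retrievability(t, stability, decay=0.1)
    high = retrievability(t, stability, decay=0.5)
    print(f"t = {t:>4} x S   decay 0.1: {low:.3f}   decay 0.5: {high:.3f}")
```

That gives roughly 0.983 vs 0.988 at t = 0.1·S and 0.742 vs 0.547 at t = 10·S, so with low decay, sub-day intervals cost relatively more R than the long tail does.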
But at least now you have your short-term memory model! Just keep spamming
If you press "Easy", does it help?
It would push my card to 2d
I mean, I think after 30-40 Goods, one per minute, it should get easier, right?
Yeah, maybe FSRS learnt that your "Goods" don't mean much, don't know
Or mean much until D is lower
It's nice, you have dynamic learning steps now
Well D is low enough. At D= 60% that is where the majority of my cards at
So you'll have very good recall of those
D has no impact on short term stability.
Could you send me your collection?
I need to reproduce this problem.
I will fix it soon
OK, it increases from 35s to 42s!
maybe it fits your memory
@lapis hearth the PR will fix it
Too low stability, so the rounding was off?
Yeah 0.001 getting stuck on itself was what I meant
Well
I guess @lapis hearth you're not edge-case tester
Fuck me
I knew something was off
I always run into problems like these and end up helping discover bugs, but that's what happens when you push FSRS to its extremes
I got FSRS, I am using the entire FSRS
Thank you Jarrett
Yes
I need to set a lower bound for w[19]
What's the reasonable number of learning steps?
I don't know, I thought FSRS should be telling me that
I can't help a lot here
maybe you need to re-optimize your parameters
I mean, the way FSRS-5 dealt with short-term scheduling was good enough
My problem was that it couldn't drop below a certain interval
Not without a proper short-term memory model
I already optimized yesterday. I guess it doesn't hurt to optimize again
What do you suggest I do with the cards caught in this 35s hell-cycle?
Reschedule
Press easy
or what
the next release will solve it
the ad-hoc solution is to disable FSRS short-term scheduling
Well crap, I guess I will probably bury the cards until the next release then
And yes, the way FSRS 5 did it was pretty reasonable.
Only that it did not go below a certain interval point
You can always use Easy too
For example, if you get something right 5 times in a row, I assume it's easy, no?
Or set some learning step that would be just bigger than the "blackhole point"
something like "35s 60s"
If 60s doesn't lead to >60s, try 65, 70 ...
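The trial-and-error above (try 60s, then 65s, 70s...) can be sketched as a simple search. This is a hypothetical helper: `predict_good_interval` stands in for whatever tells you the next interval after pressing Good (in practice, the label on the Good button), and the start, increment and limit values are assumptions taken from the suggestion, not anything FSRS exposes.

```python
from typing import Callable, Optional


def find_escape_step(
    predict_good_interval: Callable[[float], float],
    start: float = 60,
    increment: float = 5,
    limit: float = 600,
) -> Optional[float]:
    """Return the first learning step (seconds) whose Good interval exceeds it.

    predict_good_interval is a hypothetical stand-in for reading the Good
    button's interval after a review at the given step.
    """
    step = start
    while step <= limit:
        # A step "escapes" the black hole if Good pushes the card past it
        if predict_good_interval(step) > step:
            return step
        step += increment
    return None  # no escaping step found up to the limit
```

For example, with a toy predictor that stays stuck at 35s below 70s, `find_escape_step(lambda s: 35 if s < 70 else 2 * s)` returns 70.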
Or compile Anki with Jarrett's commit
Huh
Never thought about that
Well the problem is already discovered and Jarrett will be fixing it
Optimizing has made the problem go away for now
But if this ever happens again before the next release, I will be trying this
It's possible that with all those reviews at 35s, FSRS concluded "Well, he'll definitely have more than 35s stability if he answers Good", thus helping you jump out of the black hole
I find that it's tricky.
the data_processing needs refactoring
I need to include the first learning entry for each card
wait... the collection file from SuperMemo doesn't include it.
I give up
What is PBC? What is UCL?
What is LCL?
What the hell is an XmR chart...
15-20 ... words... by day
And what is even a "process change"? I googled LabPlot but it's all about showing data, not SRS-scheduler-correction or whatnot
It almost feels like the post was written by a bot
Searched for FSRS on their website, nothing there either
.5 retention on two days?
The perk of doing 15-20 review/day I guess
did they spend the day flipping coins instead of answering
But to be honest it almost feels like a ChatGPT-hallucinated post
Nah, he's replying to Jarrett
It's a turing test man
Except that his replies don't make anything more clear
Why would he mention Jarrett and then say "Oh yeah it's 15-20 reviews per day"?

It feels like a very targeted ad, no?
Like those guys that ping you on LinkedIn to sell their custom version of some profiling tool
Also "The FSRS couldn't reach the target"
The guy doesn't even sound like someone that does anki
"FSRS is not able to reach the target"
OH WOW THROWBACK https://labplot.org/2025/04/28/labplot-2-12-released/ the author of this post is a dude who worked with me on the loadbalancer addon like a decade ago
or at least has the same name
¯\_(ツ)_/¯
I guess he wants a collab
why is he using the labplot twitter for this though
So a PBC is a "Process Behavior Chart"
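For the UCL/LCL questions above: in an XmR (individuals and moving range) chart, the center line is the mean of the values and the natural process limits are mean ± 2.66 × average moving range (2.66 = 3/d2, with d2 = 1.128 for ranges of two consecutive points). Points outside the limits count as signals of a process change. This is only a sketch of how such limits are conventionally computed; the function names are hypothetical, and we don't know what the post's author actually fed into his chart.

```python
from statistics import mean


def xmr_limits(values):
    """Return (LCL, center line, UCL) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range = absolute difference between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def signals(values):
    """Indices of points outside the natural process limits."""
    lcl, _, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

For example, a daily retention series like `[0.9, 0.9, 0.9, 0.9, 0.5]` flags only the last day.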
Ok I think I got it: he's trying to sell the fact that thanks to his PBC, he was able to go from a Desired Retention of 90% to one of 85%, which allowed him to increase his True Retention from 81% to 95%, and reduce the variation from 0.137 to 0.047
Yeah, but he doesn't explain exactly what he did
Which doesn't make any sense: why reduce your DR to increase your TR?
He's saying
FSRS = TR low, variance high
FSRS + secret sauce = TR high, variance low
But doesn't say what exactly the secret sauce is doing
Yeah, and also "Why change the DR to a lower value?" I guess with PBC he's able to see changes that aren't due to normal random variation but to a different behavioural pattern, which can be nice indeed, but I don't understand why he then claims that thanks to it, he was able to solve the issue
what is this black box using to increase retention
Since it's a LabPlot account and he only talks about PBC, I guess it's related to that, but I can't understand what
He just increased TR by plotting a graph. Yep, that's how it works. You plot a graph and your TR increases
that sounds about right for what happened
Or ...
He's just saying "I had 2 phases in my reviews, and thanks to LabPlot's PBC, I'm able to detect that there were 2 phases"
Because the "Process Behavioural Change" detected a change of behaviour?
And the how is not really the point, but the fact that he was able to detect it?
Golden rule of Twitter: don't use Twitter to communicate things that should be described with more context
should I make a possibly inflammatory reply: "my thoughts are this is a handwavy graph with nothing supporting it"
The confusing part of my interpretation is that he himself said he optimized his params and changed his DR...
... So why do you even need a tool to detect "Change of Behaviour" when ... you are the very source of this change of behaviour
@bold terrace do you wanna review this? https://github.com/open-spaced-repetition/fsrs4anki-helper/pull/560
So is this going to get benchmarked or what
Sure Iโll take a look
Done !
@quasi shadow @polar maple guuuuys
@quasi shadow hey, jarrett. uh, well, this is a selfish request veiled as a question. wdyt about this: https://github.com/open-spaced-repetition/fsrs4anki/issues/713
this will be really useful to us students who create cards from material/lectures and also use pre-made stuff.
IMO that's a lot of med students. Maybe 90%? Then there are even language learners with a mix of pre-made and self-made cards (I personally only know Danika though).
@unique salmon your opinion is also welcome, Expertium. Anything u have to say?
I think a ton of my first intervals come out totally messed up because of this issue. So I'm just hoping someone has an idea to fix it.
I just can't think of a way to incorporate the time difference between the creation of the card and the first review into the calculation of S0
I guess we could have 8 values of S0 instead of 4, for (first_review_date - creation_date) < median cards and for (first_review_date - creation_date) >= median cards, but I highly doubt that it would pass the "log-loss must decrease by at least 0.0015 per new parameter" standard that we have set for ourselves
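The 8-values idea above amounts to indexing initial stability by (first rating, gap group), where the gap is first_review_date − creation_date and the split is at the median gap. A minimal sketch, assuming cards arrive as hypothetical `(first_rating, creation_day, first_review_day)` tuples with dates as day numbers; the function names and the 0-7 index layout are illustrative, not how FSRS stores its parameters.

```python
from statistics import median


def s0_index(first_rating: int, gap_days: float, median_gap: float) -> int:
    """Map a card to one of 8 hypothetical S0 slots: 4 ratings x 2 gap groups."""
    if first_rating not in (1, 2, 3, 4):
        raise ValueError("rating must be 1..4")
    group = 0 if gap_days < median_gap else 1  # 0: below median, 1: at or above
    return (first_rating - 1) + 4 * group


def assign_s0_slots(cards):
    """cards: list of (first_rating, creation_day, first_review_day) tuples."""
    gaps = [first - created for _, created, first in cards]
    m = median(gaps)
    return [s0_index(rating, gap, m) for (rating, _, _), gap in zip(cards, gaps)]
```

Each extra slot is one more parameter to fit, which is where the log-loss-per-parameter standard would bite.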
And it would be a pain to implement
Instead of splitting cards into groups we could somehow directly incorporate (first_review_date - creation_date) into the calculation of S0, but I have literally zero clue how
do we have the creation date info in the 10k dataset?
idk
If we don't, then sorata's idea DEFINITELY won't happen
Btw, Alex, check this out and see if you can add it to the benchmark: https://github.com/marawangamal/dekki
maybe jarrett has it
not interested
welp
How would that even work? The creation time of the latest cards I studied was 2 years ago
But I hadn't seen them before
They aren't quite the same -- as the warning points out, using "reschedule on change" can fill up your database with useless entries. You also have to change something to use it. And it applies to the entire preset, while the Helper feature can be narrowly focused on just the cards you want (in the Browse window).