I feel like you're overestimating how much load balance affects retention. I remember Jarrett and Jake made changes to it and Easy Days, so it should be more accurate now and the "LB sends cards too far away" problem shouldn't exist anymore, or at least should be mostly mitigated
If anything, the real problem is likely that FSRS itself just isn't accurate enough at predicting probabilities
#FSRS Megathread
I'm considering splitting the easy deck in two. Cards with an even id go into one subdeck/filtered deck with a low DR, and cards with odd ids would get a higher DR. It could be a decent way of getting a feel for how DR affects my reviews
I'll also screenshot the predicted workload of the sims, to compare it to reality in the following weeks/months.
Let's say your DR is 60%, and for the question you have roughly a 50% chance of guessing it (you hesitate between meaning A and B). If I never show it to you more than strictly necessary, your stability will probably grow a bit while you have a cool mnemo to remember it. But after that quick rise in stability, which leads to an interval of 2 months after maybe a 20-30d interval, if you've forgotten that mnemo, you're back to hesitating between those two again
Which I think leads to those forgetting curves
Oh... I know it ... I know it better ... I ... Shit ... Review hell again
Forgetting is absolutely not that harmless in my opinion
Especially before you've REALLY internalized it long term
Not "I have a cool mnemo I'll forget after 30d"
Right, makes sense
Unfortunately I'm really just eating the extra 1d of interval that is the initial cost of LB, but I saw it may have been refactored
A 1d interval for a low-stability card often means R=60-70% instead of your target 80%
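For reference, a minimal sketch of where a 60-70% figure can come from. It assumes the FSRS-4.5 / FSRS-5 power forgetting curve with a fixed decay of -0.5 (FSRS-6 makes the decay trainable, so the numbers would differ). The stability values are made up for illustration and `retrievability` is just an illustrative helper name.

```python
# Sketch: retrievability after a forced 1-day interval for low-stability cards.
# Assumes the FSRS-4.5 / FSRS-5 forgetting curve with a fixed decay of -0.5.
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R == 0.9 when elapsed days == stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Predicted probability of recall after `elapsed_days`."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


# Hypothetical stabilities (in days), e.g. cards that just left relearning.
for s in (0.15, 0.2, 0.3, 0.5):
    print(f"S={s:.2f}d -> R after 1 day = {retrievability(1, s):.0%}")
```

Under these assumptions, a card with S around 0.15-0.2d sits in the 60s after a full day, even though an 80% target would have asked for a review well before the 1-day mark.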
With my filtered decks, that card would be reviewed at around 85% if it's an older one in terms of when it was introduced (prop:reps>25 -introduced:30 prop:s<1.3 -prop:rated=0 -is:due)
Which means I have a higher DR for low stability ones in some sense
I could maybe rewrite that filter as : "prop:s <1.3 prop:r<0.90"
But I like the current one because I can really target cards that have already been there for quite a long time
excluding the potential new easy young ones
hm, two Failure-Modes would actually be interesting, but hard to implement I guess 🤔
"Again" and "Almost" or something :D
I have a "Wrong but almost right" case quite frequently
Yeah but I think there are many types of "Wrong" and it would be impossible to have a list long enough to cover them all lol
"I confuse those 2 things". "I know a mnemo super well but I just got it wrong". "I was too quick but in fact I knew it", "I have a similar one in my relearning so in fact it caused me to confuse it"
I think it's OK to accept that Anki will never replace any kind of real life appropriation 😛
I very often have cases where I got it wrong, but just forgot a Rendaku or something
So it's definitely wrong, but only a little bit
I even saw a guy saying he was just pressing Good all the time
Because to him, the whole "being tested every time" is what can make those tools demotivating haha
He just used it as a way to encounter some words a few more times in the following days
When you watch real life content, you don't get a penalty each time you understand something wrong
We had one guy in the WaniKani Discord, who thought hitting enter auto-rates, based on the typed in answer being right or wrong
So he did anki for 3~4 months, only hitting Enter
And honestly, his line of thinking is reasonable
But I couldn't find a way to make it behave that way
Would need quite an involved addon that adds new JS functions
Let's say at school, every single day is a test day
No more practice, you fail something, you get it with lower interval
FSRS-School
I think you would burn out quickly
Tests are good to measure and evaluate progress over a long period of time
they are not the main drivers of learning/memorizing
These last few levels of WaniKani also feel like high risk of burnout
There is only very little new stuff that seems useful
Like, they are introducing Kanji now, that are only used in one word, and that one word is some random family name.
A bit off topic, but now that I do 1h of real active immersion per day, I learn WAY more than before, even when I know all the words
How a certain word is better used in certain conditions, how a specific word can carry some nuance
My problem is, there just is no immersion I can possibly do that can reinforce the last couple levels of WaniKani, cause the Kanji and Vocab is so... rare?
So maybe useless 😄 ? I mean, useless for real life content exposure
Maybe useful if the goal is to do Kanji Tests
competition etc
Don't know per se
I did some mock questions of N1-N2-N3 I was surprised that I could guess a fair amount for each
JLPT is simultaneously pretty easy and hard
given it's just multiple choice
but they do make a point of demanding knowledge of more unusual Kanji
Sure
if JLPT is the goal, I guess it's probably better to learn those then
Law of specificity, you get good at what you train for
https://www.wkstats.com/charts/jlpt summarized WK vs. JLPT quite nicely
I have learnt kanji in an extremely dumb way. I got into it because I was just interested in the writing system, not actually planning to learn Japanese. I kind of did a speedrun of kanji meaning so I've reviewed the meanings of all the WaniKani kanji, but I've only fully completed level 10 of WaniKani.
I'm now slowly converting that into actually learning Japanese.
Knowing all the Kanji can make learning the language easier. You're kinda doing it like a Chinese-Speaker would.
Though it's not the usual approach, since you do a lot and gain nothing useful for a pretty long time
I was 1 minute too slow on the review 😆
You need to start waking up at 3:00AM to check in with Jarrett ;p
Why do reviews jump off a cliff? 🤣
The first one is because the new cards run out. The second one is because the backlog is caught up.
why not something like this:
I think it looks ugly because of the not-so-light color used
What’s the meaning of the blue shadow?
I'd guess it's the max and minimum values in that range
I thought about doing something like that but wasn't sure how it would work when multiple lines overlap
We already use the perfect seed by default, we don't need to compare with others:
yeah but when you change the parameters the lines are going to change right?
like when a blue line and an orange line overlap
Yeah, I think it's a nice idea, but gets messy quickly.
Yeah exactly
You might be able to get away with two overlapping but any more will quickly become unreadable.
I always end up messing up my easy days setting when swiping through deck options, any way to fix that?
I think I've seen buttons etc. that don't toggle if you were swiping on top of them.
On mac I don't have that issue with that field, even if I just clicked on it before
I do have it with desired retention though, but only if it had focus
@quasi shadow why is RMSE increasing after optimisation? I thought applying weightage in evaluation would make that not happen?
What’s your previous RMSE?
Is it calculated in 25.02?
like 7.49 and then it became 7.58 maybe
no it's the first rc
Could you evaluate your previous parameters in 25.02 and report it here?
ah, wait, maybe it's cuz of log loss going in a different direction?
Oh. I guess it’s because you have two relearning steps.
yeah
Then the optimizer disabled the short-term memory parameters.
yup
We really need to replace "set the last two params to 0" with "set the last two params to the minimum values that do not cause problems"
And it also skips the comparison of RMSE.
It’s not a simple “setting the last two parameters to 0”.
It sets the gradients of the last two parameters to 0.
What do the minimum values mean?
I have no idea how to calculate it.
The minimum values of the last two parameters such that they do not cause the "next interval after Again is longer than the previous one" problem
But it’s good if they are equal to each other?
Yes. You could also make it so that the next interval has to be shorter by at least some factor, like 0.9
Such that new_interval <= 0.9 * last_interval holds true
Fine, but it requires the optimizer to know the number of relearning steps. It’s not elegant 
But it will make metrics better
I remember DerIshmaelite was complaining about his RMSE going up because of this
Fine. I will take a look.
I like it how Jarrett sounds like he's annoyed but he ends up doing the work anyway 🤣
something like this?
do you think the percentages should be in the boxes or would just a tooltip be ok?
Definitely in the boxes
well for some reason my code's giving up after doing half the numbers but you get the gist 🤷‍♂️
Damn, these percentages are tiny. Either FSRS is working like a charm or you messed up the conversion to %
the intra-day ones give me faith that my code works 😂
https://www.reddit.com/r/Anki/comments/1iulebw/we_should_delete_the_anki_manual/
Man, we really need one of these 3:
- A deck that comes with Anki and has cards based on the manual (SuperMemo way)
- Two UI layouts: Beginner and Pro (also SuperMemo way)
- An interactive tutorial, like in videogames
I think I’m leaning toward the tutorial deck idea. We could start by creating it, gather feedback, and iterate until the deck covers all essential use cases. Additionally, we could split the deck into basic and advanced sections to cater to different user levels.
I don’t have a strong opinion on the second option. For me, all the options in Anki feel necessary. In SuperMemo, the Pro mode hides features that are truly unnecessary for beginners (like the sleep tracker and very complex statistical graphs).
As for the third option, it seems like it would be very challenging to implement. When you think about it, it’s essentially a more complex version of the first approach.
Also, about the second approach, I have a feeling that most YouTube guides would start with something like, "First, make sure to switch to the Pro UI."
make a deck that comes with anki
question: what should you do now?
answer: read the manual
answer: RTFM please, otherwise GTFO.
I've proposed all of them before and Dae isn't enthusiastic about any of them 
Why not a semi-official tutorial deck?
1.3 hours avg, 19.71 days
1.34 hours avg,20.32 days
1.4 hours, 21.24 days
1.49 hours, 22.69 days
1.66 hours, 25.2 days
2.22 hours, 1.13 months (34.2 days)
In the sense, that it won't be preinstalled, but will be referenced in the docs or on anki website
It wouldn't be as effective as a deck that literally comes with Anki. Plus translation strikes again
DeepL translate
Or even start with Reddit; after some time (if it's good enough and gets a lot of positive feedback), we can show it to Dae and discuss what to do next.
It's way better than blaming Dae and complaining about Anki's difficulty
Anki Forums
@dae , looks like the iframe embeds aren’t working for some reason. It could be a Discourse bug, but they are working on https://community.ankihub.net/. Here’s an example of what an embedded tutorial looks like “full exploded” on Discourse. The downside to this type of embed is that they won’t be dynamically updated when the tutorial is modifie...
It is not an Anki deck. And literally nobody knows about it.
Jokes aside, i really think that every user should just read the manual. No exception.
Then that's their problem. Such people will also skip any tutorial and will run to reddit to ask the question when it will come up.
It is a mentality question, i think.
Yeah, I just wanted to point out that some people have shown initiative, but it didn't really go anywhere in the end and the discussion died down
I mean we have a community that answers questions, nothing wrong with it.
I personally wouldn't want to help someone if they don't even read something I ask them to (that is to say, if I link them to the paragraph they need to read; some people refuse that to my face)
but their choice
Exactly! I think that Reddit community members should just more often send links to Anki Manual (exact paragraphs) with message to read the manual, and ignore people who refuse to do so. Maybe this could change people's attitude.
Another question, however, what to do if the user, after reading the manual, still does not understand what they need to do.
And again, in this case no tutorial is going to help here.
*Any tutorial will have the same effect
I was advised to add a third learning step in the 1-6h range (e.g., 5m 10m 2h) to improve retention, as I tend to forget new cards quickly in my production deck. My new card retention rate is far below the desired level. However, the FSRS Helper add-on only suggests 1-2 learning steps (e.g., 8s 13m). Would it make sense to combine both approaches and add a third learning step (e.g., 8s 13m 2h)? Would you expect this to result in more efficient learning?
I always ask people with trouble reading text what sentence they didn't understand and why? How are they interpreting it?
that helps me clarify any confusion
also helps to improve the manual
Yeah, I didn’t mean that in an offensive way. Sometimes, no matter who you are, there are just things that don’t click right away. In those cases, getting help from real person in the conversation is the best solution
ye, I was just telling u my approach
the only time u offended me was last year 🤣
u remember it probably
Yeah, I’m really sorry about that. Sometimes I react strongly when something I care deeply about changes in a way I didn’t expect
@cosmic hedge This may be a dumb question. How do you find the timestamp for today (collection-existed-days) in your addon?
Ok, now I've got an initial version ported to your addon. Now I just need to figure out why it is horribly broken 😂
Hmm. Apparently I have a card with a Retrievability of 21942%. That's probably part of it...
Success! It still needs a lot of work, but it works as well as my native proof-of-concept one.
https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/30 oh yeah i already have some matrix code implemented here
i dont know if you care about that
add some leech matrix to the add-on as an experimental graph (and hope someone interested contributes later)
I've done it in a different way. It's still got a lot of work before it's actually ready to merge.
I created a draft PR in case you want to have a look @cosmic hedge
https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/31
on a card like this and similar ones
where i struggle a bit, and it gets into this 2 day trap
fsrs refuses to give a 1day interval more than once in a row
yet
FSRS helper add on, reschedules it for the next day
please help
its similar to this
can i get help or just another welp
or maybe that it will be worked on or something idk
and of course
when i said this, i tried to optimize the parameters "parameters are optimal"
my dr is 93
I don't think it's really a bug, I just think your stability is growing slowly, so in terms of days it's still rounded to 2d
To be honest, looking at your review log, it feels quite logical
https://open-spaced-repetition.github.io/anki_fsrs_visualizer/
You can use this tool to see the stability increase
Also, Stability=Interval when DR=90%
When DR is greater than 90%, Stability is bigger than the Interval
Look at mine. One mistake, means I'll need 7 reviews up to the previous stability.
But since each time I have 90% retention, it means that every 10 steps, on average, I will have to go back 7 steps
Now, the most depressing part ? With my parameters, if I do 9 good reviews, 1 bad (which is the "ideal" scenario of really having my DR=90%), guess what ? My stability will even decrease with time
In this sim, I do 9 right answers, 1 wrong, 9 right, 1 wrong, 9 ...
On the very long run, it converges to stability = 6d
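To illustrate why a fixed "N Good, 1 Again" pattern can plateau instead of growing forever, here's a toy sketch. Big caveat: the two update rules below are NOT FSRS's formulas and the constants are invented. They only share two qualitative properties with what's described here: a Good helps relatively less the larger stability already is, and an Again knocks stability far down.

```python
# Toy model only: these update rules are NOT FSRS's formulas and the
# constants are invented for illustration.


def after_good(s: float) -> float:
    """Relative gain from a Good shrinks as stability grows."""
    return s * (1 + 2.0 * s ** -0.5)


def after_again(s: float) -> float:
    """A lapse compresses stability hard (and never increases it)."""
    return min(s, 0.5 * s ** 0.3)


def plateau(goods_per_lapse: int, start: float = 1.0, cycles: int = 200) -> float:
    """Repeat 'Again, then N x Good' and return stability after the last cycle."""
    s = start
    for _ in range(cycles):
        s = after_again(s)
        for _ in range(goods_per_lapse):
            s = after_good(s)
    return s


for n in (9, 19):
    print(n, round(plateau(n), 1), round(plateau(n, start=1000.0), 1))
```

In this toy, the starting stability doesn't matter: each pattern settles at its own ceiling, and more Goods per lapse raises that ceiling. Where the ceiling sits in real FSRS depends entirely on your own parameters.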
Which, funny enough, is ALSO what my 1y of Anki with FSRS led me to
I stopped adding any new words 3 months ago 🙂
This is why, while I think FSRS is a very very good prediction tool, it's an abysmal learning tool.
use Anki/FSRS to test yourself, not to hope to learn, basically
That's actually very interesting
@quasi shadow maybe something like this happens when running the simulations for SSP-MMC, hence why it doesn't work for 25% of users
why not round down
😠
Depends on whether the unrounded value is closer to 1 or 2
Unless it's exactly 1.5, in which case it depends on the specifics of the implementation
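A quick sketch of how the tie case differs between rounding rules, using Python's standard `decimal` module. Which rule Anki actually applies isn't stated here, so this only shows that "round 1.5" or "round 2.5" has more than one reasonable answer; `round_days` is a made-up helper name.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_days(value: str, mode: str) -> int:
    """Round an interval (given as a decimal string) to whole days using `mode`."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))


# Columns: value, half-up, half-even ("banker's"), always round down.
for value in ("1.4", "1.5", "1.6", "2.5"):
    print(
        value,
        round_days(value, ROUND_HALF_UP),
        round_days(value, ROUND_HALF_EVEN),
        round_days(value, ROUND_FLOOR),
    )
```

Only the exact .5 cases disagree between half-up and half-even. Always rounding down changes every non-integer value, which would systematically shorten intervals.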
@unique salmon : If I do the same with 19 good reviews, 1 bad, it converges to 77 stability (which would mean something like ~30d interval I guess ?)
should always round down
: |
77 stability is 77 days at desired retention 90%
As for other retentions, I don't have a calculator at hand for calculating the multiplicative factor
That changes interval lengths according to desired retention
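For anyone without a calculator at hand, a minimal sketch of that multiplicative factor. It assumes the FSRS-4.5 / FSRS-5 forgetting curve with a fixed decay of -0.5 (FSRS-6 uses a trainable decay, so the factor would differ), and it ignores rounding, fuzz and load balancing. The function names are made up.

```python
# Sketch: interval length as a multiple of stability for a given desired retention.
# Assumes the FSRS-4.5 / FSRS-5 forgetting curve with a fixed decay of -0.5.
DECAY = -0.5
FACTOR = 19 / 81  # makes the factor exactly 1 at 90% desired retention


def interval_factor(desired_retention: float) -> float:
    """Interval divided by stability for the given desired retention."""
    return (desired_retention ** (1 / DECAY) - 1) / FACTOR


def interval_days(stability: float, desired_retention: float) -> float:
    """Unrounded interval in days."""
    return stability * interval_factor(desired_retention)


for dr in (0.80, 0.90, 0.95):
    print(f"DR={dr:.0%}: factor={interval_factor(dr):.2f}, "
          f"S=77 -> {interval_days(77, dr):.1f}d")
```

Under these assumptions, a stability of 77 comes out at roughly 35 days for DR=95%, and at exactly 77 days for DR=90%.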
Welcome to Filtered Decks world ! Where we just review every day everything that has Stability <1.3 😄
"deck:Japan::1. Vocabulary" prop:reps>25 -introduced:30 prop:s<1.3 -prop:rated=0 -is:due
Sure, but since I did 19 good, 1 wrong, we can assume the DR is set to 95% here
So, a bigger DR will lead to better stability in this case
What do you mean "assume"? You select DR in the visualizer
AND better interval since the 77 multiplied by the 90->95% factor should still be greater than 6d interval
aaaah
Shit
With DR=0.95, then we go back to Interval=5
So yeah, interval 6d or nothing I guess lol
Btw, I love the idea of rounding to the nearest even number. Since half of all integers are odd numbers and half are even numbers, this means that with this method 50% of the time the original (unrounded) value will be rounded up and 50% of the time it will be rounded down!
someone in #general was misusing hard 😔
can anyone here do the work for removing hard/easy buttons? dae said he's willing to accept the change.
With 80% DR and 4 good, 1 wrong, we get around 10d interval, which is also kinda logical
someone already did
download the pass/fail plugin
actually, is there a timeline for the code migration that will happen? I just keep hearing it
i will reject it
https://github.com/ankitects/anki/milestones
Probably no
I don't even see an issue for the two-button mode
To get this, I had to do :
13333333333 : 10 Good Answers
13333333333 : 10 Good
13333333333333 : 13 Good
1333333333333333 : 15 Good
133333333333333333 : 17 Good
So with my params, to get an increasingly bigger stability long term, I should increase the number of good answers I get in a row by at least ~2 per fail
But since Desired Retention is 90%, I'm not sure how I'd get to 94% (17/18) by myself
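The arithmetic behind that 94%, as a small sketch. It reads the rating strings the same way as above (1 = Again, 3 = Good); `success_rate` is just an illustrative helper.

```python
def success_rate(ratings: str) -> float:
    """Share of reviews in a rating-digit string that were not Again ('1')."""
    passed = sum(1 for r in ratings if r != "1")
    return passed / len(ratings)


# The five runs listed above: one Again followed by 10, 10, 13, 15 and 17 Goods.
sequences = ["1" + "3" * n for n in (10, 10, 13, 15, 17)]
for seq in sequences:
    print(f"{len(seq) - 1} Good per Again -> {success_rate(seq):.1%}")
```

So the required success rate climbs from 10/11 (about 90.9%) to 17/18 (about 94.4%) over the five runs, which is above what a 90% desired retention is aiming for.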
Well, at least in Anki 🙂 But I think it emphasizes something : Anki alone won't cut it
Which also kinda suggests that with time, you SHOULD aim for higher and higher retention or, based on your parameters, you'll just have decreasing stability
It generally just seems to have come up with something somewhat akin to FSRS at first glance
Well, the other way around, but yeah
FSRS is based on SM algorithms
"Based" in a very loose sense
I also never realized the SM in SM-2 is for SuperMemo
https://supermemo.guru/wiki/Algorithm_SM-17#Past_vs._Future just makes me think "the dude really likes to write a lot"
He sure does
I at this point have largely given up on tuning FSRS, or any algorithm really, for my WaniKani deck. Since it's just not enough memorization/recall to work properly :D
It'd probably work much better on a pure Kanji deck
What do you mean?
Well, I don't actually recall the large majority of my cards
I solve them
I know how to read the Kanji, so when a Vocab using them comes up, I just read the Kanji, get it right, and pass the card.
But I did not actually remember it at all
And at the same time, that obviously reinforced the Kanjis in that Vocab, which will have their own Intervals
Oh, and it also has this extremely confusing algorithm description
Which get messed with this way
especially the last line in that sounds weird
I really wonder how R is used in the formula for D, because I have tried my damn hardest to improve FSRS by making D depend on R, but nope
at least from that outline, it isn't?
Theoretically, R should affect D. For example, if you were 99% likely to recall a card and pressed "Again", that's a lot more surprising than if you were 1% likely to recall it and pressed "Again". The latter isn't surprising.
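One way to put a number on "surprising" is surprisal, the negative log of the probability of what actually happened. This is purely an illustration of the intuition; nothing here says FSRS's difficulty formula uses it, and the function names are made up.

```python
import math


def surprisal_bits(p_outcome: float) -> float:
    """Information content, in bits, of an outcome that had probability p_outcome."""
    return -math.log2(p_outcome)


def again_surprisal(r: float) -> float:
    """Surprisal of pressing Again when the predicted recall probability was r."""
    return surprisal_bits(1 - r)


for r in (0.99, 0.90, 0.50, 0.01):
    print(f"R={r:.0%}: Again carries {again_surprisal(r):.2f} bits")
```

An Again at R=99% carries several bits of surprise, while an Again at R=1% carries almost none, which matches the intuition that only the first one should tell you much about the card.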
Looking at this formula, it is
I wonder if difficulty is an intrinsic property of the card itself. At least, that's what all my failed attempts of making D depend on R make me think
I tried something like that, yeah
On the topic of "a silly number of review buttons" I recently had the thought that recall kind of has two components:
- Accessibility - How easy is it to "find a memory"
- Integrity - How "good" is the memory
Good accessibility, Poor integrity = "I know it is X or Y. I cannot remember which one."
Poor accessibility, Good integrity = "It's on the tip of my tongue!" / "Oh of course! How did I forget that"
It would be painful to give two grades per review, but it's interesting because you probably want to do different things to solve poor accessibility vs poor integrity.
Mr. Supermemo uses incremental writing on top of incremental reading on top of SRS flashcards. Maybe that's why he's able to write so much.
I've thought about ways to maybe fix FSRS for my WK deck. And what I came up with basically boils down to splitting the one deck into 3*60 subdecks. Radicals, Kanji, Vocab, and 60 levels for each. And then optimize all those 180 decks separately.
"Nearly all texts at SuperMemo Guru have been written in SuperMemo using the incremental writing approach" that explains a lot.
Message deleted because I didn't realize it was a rehash of general
At that point there won't be enough data for FSRS to work with
I'm not even sure about that
By now that's still hundreds or a couple thousand reviews per deck
I think I understand that desired retention is best used as a way of controlling review load.
That said, as a language learner, would setting different desired retentions for decks of different word frequency bands be wise? (Or is this overcomplicating it?)
||This feels like it might be out of scope for a pure FSRS discussion area, sorry if so.||
That's an interesting idea. I can see how it might be good to keep higher frequency words at a higher retention than lower frequency words.
Some thoughts:
Might Make FSRS Worse if done Naively
If you do that, you'll reduce the number of reviews that FSRS has to optimize over, which can make FSRS perform worse. Because FSRS only looks at the cards in the preset, by default.
But, you can tell FSRS to optimize over multiple presets by modifying the search query beneath the FSRS parameters (which would let you have different desired retention for multiple presets while still using all of your related cards to teach FSRS how you remember things).
e.g. in both of your presets
(preset:"Preset 1" OR preset:"Preset 2") -is:suspended
This would tell FSRS to look at Preset 1 and Preset 2's cards.
You should test this in the browse window to make sure it's pulling back the cards you want.
You may be overcomplicating it.
I think if you're following the platonic ideal of language learning with Anki + Immersion, you'll very quickly stop ever missing very common words (since you'll be immersing and seeing them all the time). Since you never get them wrong, their intervals will grow quickly and reviewing in Anki won't be necessary for you to maintain that knowledge. Essentially, they're no longer taking your time (big intervals) and whatever desired retention you set no longer matters for them (review constantly via media).
So even without splitting out lower frequency and higher frequency words to set different target retentions with, changing your desired retention will essentially only affect the "not seeing all the time" words. Which might work out the same in the end as setting a different retention for lower frequency words.
A dumb thing you can do that I believe makes FSRS work better for language learning
(Different presets for TL -> NL vs NL -> TL.)
#language-learning message
The problem is, we cannot distinguish leech and non-leech in this case.
Andyʼs working notes
But the fact that
- Anki/FSRS will prompt you a card when prop:r dropped below the desired retention which means you should still expect a 10% failure rate (if your DR is 90%),
- The fact the Optimizer will do everything it can to change parameters so you hit that 90% retention
Combined with the new insights that :
- Based on parameters, it's possible that a 90% retention leads to decreasing stability
Isn't there something off ? I wouldn't call that a contradiction: FSRS never promises to help you learn, just to predict your retention on a certain date, which I think it does well. But then, the way my behaviour led FSRS to optimize shows that merely following the path of "doing a long chain of 90% predicted reviews" is apparently leading me to decreasing stability.
My point / instinct : it shows that searching for an optimum that reduces workload **might**, for people over-relying on Anki/FSRS, lead to worse and worse memorization (depending on their activities outside Anki).
This also comes back to the request that the longer a card has been in the system, the higher its target retention should be. So instead of a single retention there should be a retention range which changes fluidly
Given infinite time, the stability will converge to a small value. But there isn't infinite time.
Most of cards will become mature and you will never review them in 10 years.
Well, we're talking about a few months here, and only 4-5 failed reps total before the max interval mostly "converged".
You have to realize: 1 mistake after 9 good reviews, and, apart from the first one, you will always go a bit lower in terms of stability.
The maximum stability achievable (and interval, since DR=90%) here is 25d. We're really far far far away from the promise that you'll never see it for 10 years.
And even if you're right, it means you would need way more good reviews than FSRS expects to be able to reach that point. And don't dare fail a single time before that threshold, because each failure will cost you MORE good reviews to compensate for it. Which defeats the purpose of precisely predicting and prompting the user when prop:r < DR.
I'm sorry, but this is obviously a shortcoming that needs to be at least acknowledged. Once again, I'm not saying FSRS is doing its job badly, it does a good job at predicting. My criticism is that the model of "having 1 review each time prop:r goes below DR" is not a model that will (always) lead to any kind of long term progress for the user.
Now, is it because of the lack of reviews ? User behaviour outside Anki ? Most certainly. But whatever the reason, I think it's not really safe, long term, to try to ignore this potential behaviour.
Could you sum these intervals up?
It's around 78, I did 26+18+12+8+5+3+2+2+1+1
That phenomenon is something that I'm sure many experience
And once again, for FSRS, being a mature or young card won't alter when it will be scheduled : when R < DR. So given an FSRS that is precise enough, you are "doomed" to fail at some point
I think there is not any solution for this case with the given parameters. The solution is out of the scope of scheduler. The real solution is to improve the parameters - your memory.
A leech is always a leech until you re-formulate the card and re-encode it.
If I may :
- The solution is out of the scope of FSRS
- The solution would need to come from the scheduler,
- If not, then it needs to come from the user that would have to know, that relying on Anki/FSRS won't lead to better memorization. Which is a fault of the tool, or at least, a lack of disclaimer with it
is this like one of those cases where mmc fsrs wouldn't converge?
There are some significant differences between the parameters of failed convergence and successful convergence.
Leech action is designed for this case.
FSRS predicts some card's stability will converge to a small value even if there is an optimal scheduling.
Then you're suggesting to use FSRS only for things that won't have more than X lapses before they reach interval N.
Which means you're suggesting using FSRS for things that won't be failed too many times before their stability grows large enough that you can forget about them.
Which means, some specific domains of study, won't be supported by Anki
Could SM-2 support these specific domains of study?
I'm afraid there isn't any scheduler that could support them.
By increasing the workload for specific items I guess, through the ease factor
The whole prop:r < DR concept doesn't really exist with SM2
And by nature of the ease factor, it's "card specific", not deck global (compared to the FSRS parameters)
That's why I was thinking about dynamic DR and how it could help people
I think dynamic DR cannot fix it completely. You will get something like "review it every day".
That's exactly what I'm doing right now for some low stability cards indeed
Right now it's still early, so I'll have more insights if it's really working later, but right now I do feel that instead of "edging the forgetting point", I start to just get "boringly used to them"
I think, one of the thing that could also help, is being able to qualify how "Good" a review was
Pressing "Good" after 25s of self-doubt, thinking very loud about it, vs 2s "no-brainer" kind-of retrieval, there is a huge huge difference
And since interval grow as long as you press "Good", no matter what kind of "Good" it was, I think it plays a role
The "luck" aspect is also something that could be quantified, that's why I use multiple learning steps now.
I see 大地. I'm 100% sure it's "daichi" or "taichi". But I'm 60% sure it's "daichi", not "taichi", or the other. The "Quality" of my Good, is also lower than if I was 100% sure it was "daichi"
So you see why I'm not criticizing FSRS by itself : I think the true weakness is not necessarily coming from it.
And I think SM2, by its nature, compensated for those weaknesses by allowing the user to drastically alter the ease_factor for different cards.
In FSRS, how you press Hard/Easy will once again, only alter deck-global parameters, not card specific per se
It also increases the difficulty of card.
My bad, it's true I don't really think about this one
But it's a really good point though. Maybe a solution could come on how different difficulty classes are computed. and how it adapts to review
The parameter won't change until you optimize it.
According to my research, it's better to have a higher DR for high D cards.
Experience wise, it's also what I'm feeling and what I'm doing right now with my Filtered Decks
Difficulty evolves though, but it's true it's quite conservative right now
Maybe during the optimization you get better parameters by squeezing it into a narrow range, while in fact cards might benefit from having more room to breathe
In an early experimental version of FSRS, I designed a kind of difficulty without upper limit.
It gives difficult cards more room.
But the cost is, it's very hard to let them become easy😂
Which might be the case, I had a few examples like that
Some cards that would go 1d, 2d, 6d .. fail again and again. I've done them 10 days in a row, now it grew larger
That's why I'm relying a bit on Filtered Decks right now, for ex :
"deck:Japan::1. Vocabulary" prop:reps>25 -introduced:30 prop:s<1.3 -prop:rated=0 -is:due
The intuition is something like : Before going to Interval >N, are you even able to really master it, with Interval<N ?
It's basically a rewriting of your : "Higher DR for more complex cards" or "Higher DR for Younger Cards"
It's basically : Before letting you play in "adult's interval area", let's already be sure you can manage a high retention on lower intervals
Said differently, if you succeed 4 times in a row, every day, we have better input on the "quality" of those 4 Goods, instead of having you fail 50% of the time
So you can get a bit more qualitative information by allowing a few more reviews than the bare minimum FSRS is giving you right now
Woz has an interesting model for stability.
In this model, the stability has two components: molecular and structural.
Yep I just read it now, it's interesting
I think most low S cards have some structural problems.
Instinctively I agree, because it also highlights the fact that "Low S Good reviews" might be "low-quality Good reviews"
Aside from Anki, I'm doing more exposure now, and as an exercise I learn some words that I DON'T add to Anki, and sometimes when I see them again 6-7 days later, I still remember them quite easily
And I'm starting to realize that reviewing things in a vacuum in an app like Anki, with a "checklist" mindset (you have to do it for the day, so you cram it), might just lead to very, very shallow structural memory
Basically, you do the minimum effort to get it done
But then, the "daichi" vs "taichi" just becomes a coin flip
"Got it right? Nice!" "Got it wrong? OK, I'll just remember it until my relearning phase is done"
And if I get it the next day, I might still remember that "coin flip", not really because of the content itself, but because I just remember yesterday's frustration
But still, there isn't really any structural memory there, it's mostly emotion-based adjustment on top of low-quality encoding
You get it wrong 2d in a row? Even more frustration! Emotion-based memory gets a boost! Now you remember it for 4d! But guess what, 4d later you've had many other frustrations with many other words, and you just don't know anymore. You fail it, but you get mild frustration again
And you repeat that cycle of low stability for months and months, until you really structure it more deeply in some way
That's why I was talking about behavioural patterns earlier, I think they play a key role in how stability builds up
"New ideas may enhance structural stability. Review may enhance molecular stability"
This passage is really, really insightful. I've noticed in the past how adding more words/concepts helped me remember other ones that felt "alone"
That's why considering the reviews of related cards makes sense.
If we had a model that could consider the impact of related cards, it could even suggest how to introduce new related cards.
I wonder if it is possible to cut up an LLM and use it to Frankenstein a model that lets you do:
- "card text A" -- LLM --> "encoding of concept A"
- "encoding of concept A" + randomness = "encoding of similar concept B"
- "encoding of concept B" -- LLM --> "card text B"
To automatically generate supporting cards without the need for a full knowledge graph.
Or find the midpoint between two similar concepts and generate a card from that.
Given how powerful LLMs are, wouldn't it be simpler to directly generate card B from card A?
Yes, but I was thinking if you had access to the innards you might have more insight into how similar / different concepts are by looking at the distance between their encodings.
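To make the "distance between encodings" idea concrete, here's a minimal sketch. It assumes you already have some vector per card; the 4-dimensional vectors, the card names and the helper names below are all made up for illustration, and a real setup would get the vectors from whatever embedding model you have access to.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def midpoint(a, b):
    """Point halfway between two encodings (a candidate 'in-between' concept)."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def most_similar(name, encodings):
    """Name of the other card whose encoding is closest to `name`."""
    others = (n for n in encodings if n != name)
    return max(others, key=lambda n: cosine_similarity(encodings[name], encodings[n]))

# Hypothetical toy encodings, NOT output of a real model.
encodings = {
    "card A": [0.9, 0.1, 0.3, 0.0],
    "card B": [0.8, 0.2, 0.4, 0.1],
    "card C": [0.0, 0.9, 0.1, 0.7],
}

print(most_similar("card A", encodings))
print(midpoint(encodings["card A"], encodings["card B"]))
```

Turning a midpoint vector back into card text is the hard part, and nothing here addresses that.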
I see, would be interesting indeed
Once you are back into the realm of text it becomes much more nebulous.
I've only ever done simple perceptrons and CNNs before though, so I have no idea if it is possible/ feasible.
lol, I was arguing with 大地 earlier and guess what I get on my reviews today ? That's the kind of "Good" that should count as a "very low confidence-level good" 😂
Ofc I can mark it as "Hard", but would it have been hard in the first place? Who knows
its been rescheduled with fsrs helper add on but like
what should i trust
am i to trust fsrs helper add on rescheduling or normal fsrs rescheduling
I don't understand the expected situation and how it differs from the actual one ?
you notice how when i hit the 3 rating/good
interval goes up
for some reason today, the interval instead of continuing to go up, it went down to 12d, for some reason
(probably optimising or changing params)
yes
but
nothing changed so catastrophically
in order for intervals to be reduced by that much
so what should i even trust
because when i reschedule with normal it doesnt make it like that
You mean if you press Good but then reschedule ? The 12d becomes how much then ?
Because yes, it's not that big of a deal that at first, parameters evolve a lot
Yep quite new
🤷♂️
I'd say my parameters really started to stay stable after 6-9 months
New patterns emerge, so the function will adapt
it's not a big deal
When you press evaluate, what's your logloss and RMSE ?
To see how good or bad it is
another deck of mine
But for your information, Rescheduling through deck options/plugin has small differences, for example the plugin will flatten the review number of the next 4 days, the deck option one will ignore the LB
So sometimes you can get 2-3d of difference between both
which one should i do then
Also there was a bug in the previous anki release
the memory state was not refreshed in relearning phases, so it's possible some dates might not make sense once they start to get re-computed now with the fix
but don't stress too much about it, except if you get 300d instead of 10 🙂
Does your actual retention more or less match the desired one?
Looks fine indeed
this has become 93% now tho
Some difference is to be expected on top of the RMSE because, if you look at the "Target R" column in the Browse view, you'll see many cards are in fact a bit below the goal when you review them
For low-stability ones it can be a 10% drop, for most mid/long-term ones it will be 1-3%
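A rough sketch of where part of that gap can come from. It assumes the FSRS-4.5/FSRS-5 power forgetting curve (DECAY = -0.5, FACTOR = 19/81) and assumes intervals are simply rounded to whole days with a 1-day minimum; fuzz, load balancing and easy days are ignored, and the function names are mine.

```python
DECAY = -0.5
FACTOR = 19 / 81

def retrievability(t_days, stability):
    """Predicted recall probability t_days after the last review."""
    return (1 + FACTOR * t_days / stability) ** DECAY

def ideal_interval(desired_retention, stability):
    """Fractional number of days at which R drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)

def r_at_scheduled_review(desired_retention, stability):
    """Round to whole days (minimum 1) and report R on that day."""
    days = max(1, round(ideal_interval(desired_retention, stability)))
    return days, retrievability(days, stability)

for s in (0.15, 0.3, 1.3, 10.0, 100.0):
    days, r = r_at_scheduled_review(0.80, s)
    print(f"S={s:>6}d  interval={days:>3}d  R at review={r:.3f}")
```

With these assumptions the 1-day minimum is what hurts very low stability cards, while for long intervals the rounding error is tiny. Rounding alone can push R slightly above or below the target, so it doesn't explain a consistent shortfall by itself.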
what are examples of cards that are related to eachother? i dont understand
or cards that would impact other cards when they are reviewed
e.g. Japanese vocab cards: "一" = one "一つ" = one thing
If you have just seen "one", you are much more likely to get "one thing" correct.
oh i see
Though usually they make it harder to memorize each other, not easier
Like, for example, the flags of Poland, Monaco and Indonesia
Or 米 vs 来 in Japanese
One of my recent arch nemesis: 情 vs 靖
the first is 心 related ofc
as long as u keep that in mind
The difficult thing is knowing whether items support or interfere with each other; it will also depend on what the user tests.
If he tests the meaning, sure, it will help. If he also tests the reading, then いち(一)ひとつ(一つ)いっかい(一回)ゆいいつ(唯一).
So I think interference/support is something that can't really be trained statically, it's something that has to be inferred dynamically, while the user is actually doing reviews.
Also, sometimes it can come from less obvious interference, for example:
I see 王国. I fail to type the correct reading of the second kanji. A few minutes later, 国内 comes up. They are very different, but my brain was so primed by getting the first one wrong that I instinctively typed 王国 very fast, without realizing it wasn't the lapsed card but another word. But too late, I'd seen the answer, so marking it as Good might be risky, so I marked it as wrong
I exported a supermemo collection to xml, then imported it to anki. I activated fsrs and assigned a preset for that group of cards. The problem is that as I do repetitions, the algorithm seems to ignore them because when I press evaluate in the optimize fsrs section, it tells me that I have done 0 repetitions. Why does this happen? Thanks
(I'm using Anki 25.02)
@quasi shadow take a look
The problem with trying to detect interference dynamically is that you probably need a lot of data to do it, far more than a single user can generate.
I'm guessing that to do it effectively you probably want:
- Review data from many users.
- A knowledge graph (so you know where to pay attention instead of having to compare all cards).
In the case of Kanji it would be really interesting to have access to the WaniKani dataset.
Could you share some screenshots of your cards’ info?
I want to check the review history.
It sounds like Ignore cards reviewed before could be excluding all your cards. What is it set to? (under Advanced in deck options)
Interestingly when I look at card info it's a one-way interference. I have not got 情 wrong since 2024-11-11. On the other hand I get 靖 wrong quite frequently.
u have too many 情 in your input
I was curious so I tried programming in the "suspend on leech / lapse count" feature for the simulator
(#1: no limit, #2: 14, #3: 7, #4: 100)
good idea?
i am confused here
does #1 mean you dont suspend
yep thats just the normal one
Looks good. But why do you have so many leeches?
because i'm bad at making cards 😭
At least it's using the full range. I've got nothing below 40%
I'm quite harsh with my marking.
how can i learn this power?!?
thats a lot of hards XD
I only count it as good if there is zero doubt, and only a second or two for me to get the answer. Easy = "why am I even being shown this?"
It seems to be mostly working. 🤷♂️ I'm meeting my DR.
this is my 2nd language that I know how to speak in
and i learned recently how to write and read
so i put all words, even simple words, just to make sure i have the standardized spelling
did you do this with sm-2 as well?
before i would say a word that means "I take" "zemi земи" and recently, i found out theres another letter before it "vzemi вземи"
so of course once i knew it was right i just get it right from then on
even my english deck has a pretty high difficulty skew to the right 🤷♂️
i guess its because theyre only words i'm never going to use
wow somehow i have a higher learning% than young%
0 learning steps btw
i think?
maybe i changed that that would make sense XD
yeah i'm going to go with that
I only got back into Anki ~ August 2024 and only used FSRS this time. I think I started marking more harshly about a month in because I noticed FSRS was pushing things a little too far. I was remembering them, but always with a big delay / a lot of uncertainty.
I've always really struggled with language learning.
how r ur percents going down
i think thats an issue
😭
im pretty sure everyone does
except like mega geniuses or something
yeah no idea 😂
ohh its because the decks so small i always review them just after i add them
yeah that adds up
I think that's just an inherent flaw in how current FSRS works. It only cares about the average of all R == DR.
If I remember correctly that is the allure of SSP-MMC. It instead tries to target "graduating" cards where you retain it so long you don't have to look at it again.
Not exactly, average R is higher than DR unless you took a reeeeaaaaaaally long break so that now everything is overdue
More precisely, R when the card is due == DR
🤓
I knew that. I'm just really bad at explaining what I mean. 😅
Maybe "the average pass rate of all cards on their scheduled review date = DR" would be a better way of saying it.
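To put a number on the "average R is higher than DR" point: a small sketch, again assuming the FSRS-4.5/FSRS-5 forgetting curve (DECAY = -0.5, FACTOR = 19/81) and a card that is always reviewed exactly on its due date. The helper names are made up.

```python
DECAY = -0.5
FACTOR = 19 / 81

def retrievability(t_days, stability):
    return (1 + FACTOR * t_days / stability) ** DECAY

def interval_for(desired_retention, stability):
    """Days until R has decayed to desired_retention (not rounded)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)

def mean_retrievability(desired_retention, stability, steps=10_000):
    """Time-average of R between a review and the next due date (midpoint rule)."""
    due = interval_for(desired_retention, stability)
    total = sum(retrievability((i + 0.5) * due / steps, stability) for i in range(steps))
    return total / steps

dr, s = 0.90, 10.0
print("R on the due date:", round(retrievability(interval_for(dr, s), s), 4))
print("average R in between:", round(mean_retrievability(dr, s), 4))
```

So "R when the card is due == DR" and "average R over the whole collection > DR" can both be true at once, as long as cards aren't left overdue.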
Ross no speak good
Sure. Thank you
I think he meant the "Card Info" screen that looks something like this:
You can find it by right clicking on a card in the "Browse" window and then clicking Info...
"no reviews found" is going to be your problem
have you saved the preset to all decks and subdecks?
after reading the chain I am now aware you are aware it is the problem 😔
could you search preset:"java" -is:suspended in the deck stats and show what it says?
A simple Ignore cards reviewed before problem seems less likely now. Unless the supermemo export/import went very wrong and somehow some reviews claim to have happened before 1970.
The bottom data is missing
That's interesting. "Reviews: 3" but no revlogs.
it's a long shot because i don't really know exactly what it does, but try and run "check database"
that is clearly not good
Tools -> Check Database from the top menu bar.
didn´t work 😭
Well, then something went wrong with importing
In fact I tried to create a new collection in supermemo with new cards, to test exporting, and the same problem keeps happening.
I would suggest resetting the cards (go to Browse, select all cards and click "Reset...")
It seems like the Anki DB might be messed up by the import too if new reviews are not counting.
What is the effect of doing this?
Your cards will be treated as new
But the intervals will be forgotten?
Even better - delete them and import without enabling "Import review history" or whatever it's called
The cards will be treated as if they have never been reviewed before, so if that's what you mean - yes
So, apparently the only option is to start the deck from 0 without review history. It's a shame, although with this deck there aren't that many cards, there are others that do have many more. I guess for those I have to leave them in supermemo
focusing on example sentences have helped me. always used to do with english and recently started doing it with jp too. basically read/try to recall the example every time the card comes up.
There is obviously a bug in the Anki importer. You should report it on the forum so there can be a more long-form discussion: https://forums.ankiweb.net/
If you can export a file that repeatably breaks Anki and share it someone will be able to figure out what is going wrong.
Thank you.
If only though 🙂 I mean, it will pop up when R < DR, but sometimes that drop can be quite big, especially for low stability ones
I don't think it's really dependent on intelligence though, and I'm not sure being smarter speeds up the process by much. I mean, it would definitely help with finding strategies, putting good "learning hygiene" in place and so on, but I don't think it requires you to be that smart.
I even think sometimes being too smart (in the sense that you try too hard to create rules/principles/...) can be an obstacle to learning something that just requires you to experience it with a more straightforward mindset
I wonder if there is a study analyzing whether IQ correlates with scores on some foreign language proficiency test after controlling for the amount of time spent preparing for that test
Oh, btw, I don't have specific studies at hand right now, but I remember reading studies about US immigrants that showed that age of arrival correlates with English proficiency more than...literally anything else
One of the issues with IQ is that having a better vocabulary helps, I think there are questions like "Hands are to arms what leaves are to .... [branch, trunk, soil, sky]"
Ofc, vocabulary in that one is not that difficult but you get my point
Also I don't always trust human science studies that much, because you can find waaaay too many confounders.
For example, maybe higher-IQ people have more confidence in how they can learn (even if they don't know their IQ per se), which translates to more motivation, getting a bit addicted to the "reward" of acquiring something they had trouble learning.
While lower-IQ people might get discouraged, because they got crushed each time they mentally compared themselves to others, etc
So even if there is a correlation, I'd be cautious about calling it a cause (i.e. that higher IQ by itself leads to faster language acquisition)
u just deleted it?
isn't it measuring general intelligence though? or at least purporting to do
Yeaaaah it wasn't adding much to the discussion because it was pure trolling, the account wasn't making any sense
It is ! But sometimes general intelligence is also quite influenced by prior experience in my opinion
is Steven pinker trustable on this? he says it's a myth that IQ scores aren't real.
or something
You take a kid that could be super smart, you keep bashing him, putting him in non stimulating context, telling him he's an idiot, you break him, and I'm sure his IQ will be way lower than what it could
Just to note, I don't imply it's not a thing 🙂
I personally expect that too, because he's lost out on so much learning
Sure, but that's like saying "Height isn't real because you can just saw a person's legs off"
I once got bashed for bringing up IQ in a server so I never talk about it
the only reason I trust it to some degree is flynn effect
what explains that Africa's scores are rising so much? Ofc because they're relatively better off compared to the rest of the world now than a century ago.
(NOT saying Africa is better off than everyone else, but that they're better off now than before)
It's more like...
Imagine a world where we can't measure height. Maybe all rulers are broken. So instead we "measure" height based on how high a person can reach when they jump, by attaching a bunch of wooden planks to a wall (relatively high) and asking people to jump and touch those planks.
So you get results that are correlated with height, but also with the jumping ability.
I mean, it's a measurement that takes into account the ability to abstract, memorize, ... Those are a lot of great intellectual qualities for sure, and it will probably correlate with A LOT of things, but my point is, he might have reached that point because of a good background
Sure, the problem with "Intelligence" is how abstract it is compared to "Height" in the first place
So well, IQ is indeed like the height of how high you can jump. A bit of genetics, a bit of skill, a bit of prior training
But in the end, if you want people able to reach a certain height, a certain "level" of intelligence, it's still a good measurement
And you obviously don't care how he got there, you just WANT that criterion
My only issue with IQ and other measurements in general is more about "OK, and now what do you do with that?"
If it was a clear discriminant for something really precise, it makes sense
If it's to brag about yourselves everywhere, wellll... go brag about yourself alone 😄
(And ofc, most people that might defend it might have high IQ, and a lot of people saying it's dumbshit might have lower avg IQ, so those discussion won't ever lead to any kind of happy medium lol)
You could use it for
- Figuring out which children need special education (which is what IQ was historically used for, btw)
- Finding suitable candidates for a big brain job
Yeah, for those it definitely makes sense
That's a bit why I'm so against the "High Potential" concept/term
It's not really the potential people will care about, it's the actual situation you are in
"My kid has all bad grades, has no friends, has discipline issues and eats bugs, but he's HP and he needs to be understood correctly"
Feels like a "I have low IQ but in a different world I'd be high IQ"
I'm not sure what you mean here
If you're high IQ, you'd just say you're high IQ.
Btw, this should probably be moved to off topic
yep sry
The idea of a recruiter ignoring all relevant qualifications and prior accomplishments for a big brain job and instead deferring to racist astrology is genuinely hilarious
I say that as someone with an IQ of 130 (doesnt mean anything. I am a moron.)
u can have really high fluid intelligence
I think the opposition is mostly due to racists using it to justify their racism.
another explanation would be if the average intelligence in your cohort is low from a subjective pov...
fluid and crystallized intelligence theory is bad
yeah, let's stop here... I can sense the convo going in a bad direction
my reasoning had always been to do with flynn effect anyway
I just had another look at ramf's Card Info and noticed it has Ease instead of Stability and Difficulty even though they have FSRS turned on.
It looks like something went wildly wrong when they imported that SuperMemo file.
Or the various biases of the test worked in my favor. Or the test is far too narrow to measure more than a thin sliver of what we call “intelligence”
well, there are some horrible IQ tests out there from what I've learned
Yes. I had mine done as a young child in school so idk remember but the kind of cultural bias and general knowledge questions I’ve heard of are absolutely ridiculous
I wouldn't speak on which tests are good and which are bad but there's some meta out there in cognitive tests community. It's generally agreed that a lot of tests can be horrible.
I looked into it some years ago but have forgot all of it almost. also have lost interest in IQ.
I guess the importer only converted the items to cards. But the review history is lost.
The thing that is really weird though is this bit:
The problem is that as I do repetitions, the algorithm seems to ignore them [...]
I could be misunderstanding, but I was taking that to mean new reviews done in Anki were also not counting which suggests something bigger is going wrong.
Their screenshot of their "Reviews" stats shows nothing. I assumed it was for a deck that they had done some in-Anki reviews on.
I probably should have made sure to ask for "Card Info" for a card they know they reviewed in Anki.
For this case, the review history of the card is incomplete, so the optimizer will ignore it.
Ah. So maybe they were just showing us a screenshot of a deck they had not done any reviews on yet and the optimiser was acting as expected on the different deck they had done Anki reviews on.
Why does the optimiser ignore cards with missing revlogs?
I assume you usually are only missing old revlogs (no holes in the middle).
Isn't that effectively the same as if you had just been learning something outside of Anki?
Thinking about it, even holes are pretty much the same as reviewing outside of Anki.
https://github.com/ankitects/anki/pull/2922#discussion_r1445896159 This pr might help you.
Allows users to ignore revlogs generated before a given date while reviewing.
Useful for ignoring bad reviews, a feature of the non-inbuilt optimizer.
Implemented for Optimize, Evaluate and Optimi...
@bold terrace
That's, super super nice
We need to import Repetition History to convert them into revlogs.
The XML file of SuperMemo doesn't contain the records of repetitions.
It's possible to associate the Element data with the Repetition data.
You think "extended sibling burying" would make sense/work? i.e. sibling burying, but not just for siblings, also for "related cards". E.g. if you study a vocab card that contains a Kanji, and that Kanji is also scheduled today, it'll get buried. Or vice versa.
Could be extended to also bury vocab that contains the same Kanji, but that feels too disruptive, given some Kanji appear in a lot of vocab.
That feels like it needs something like the Math Academy "Fractional Implicit Repetition"
It would be cool, but Anki doesn't allow you to define arbitrary connections between cards 
I could easily implement that for at least my Kanji+Vocab deck
Either by just plain parsing the words, or I could add the related subject IDs to every note and bury based on that
WaniKani already provided a list of other "subjects" that are related to the current one
It would have to be custom scheduler / addon time but you could store it in the custom card data.
i think you could do something like that using tags
like
cardiology::hypertension::drugs
would connect all the questions that are related to drugs
or even
cardiology::hypertension::drugs::diuretics
that would cover all cards with that tag
and you can also add the tag
physiology::electrolytes::hypokalemia
to the card
Loop diuretics cause {{c1::hypokalemia}}
I don't think you need a new system. Just need to find a way to leverage tags
writing this I realize you can't make tags have hierarchical importance
but still it's better than nothing
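A rough sketch of how tag-based "related card burying" could look. This is plain Python over (card id, tags) pairs, not the Anki API; `min_depth` is a made-up knob that gives a crude form of hierarchical importance by only counting tags shared at least that many levels deep.

```python
def tag_ancestors(tag):
    """'a::b::c' -> {'a', 'a::b', 'a::b::c'}"""
    parts = tag.split("::")
    return {"::".join(parts[:i]) for i in range(1, len(parts) + 1)}

def all_prefixes(tags):
    """Every tag and parent tag a card belongs to."""
    return set().union(*(tag_ancestors(t) for t in tags))

def related(tags_a, tags_b, min_depth=2):
    """True if the cards share a tag prefix at least min_depth levels deep."""
    shared = all_prefixes(tags_a) & all_prefixes(tags_b)
    return any(p.count("::") + 1 >= min_depth for p in shared)

def bury_related(due_cards, min_depth=2):
    """due_cards: list of (card_id, tags) in review order.
    Keeps the first card of each related group and buries the rest."""
    kept, buried = [], []
    for card_id, tags in due_cards:
        if any(related(tags, kept_tags, min_depth) for _, kept_tags in kept):
            buried.append(card_id)
        else:
            kept.append((card_id, tags))
    return [card_id for card_id, _ in kept], buried

due = [
    (1, ["cardiology::hypertension::drugs::diuretics",
         "physiology::electrolytes::hypokalemia"]),
    (2, ["cardiology::hypertension::drugs"]),
    (3, ["physiology::electrolytes::hyperkalemia"]),
    (4, ["neurology::stroke"]),
]
print(bury_related(due, min_depth=2))
print(bury_related(due, min_depth=3))
```

Raising `min_depth` makes the burying less aggressive, which might help with the "too disruptive" worry for Kanji that appear in a lot of vocab.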
had FSRS on for a while, does setting Reschedule cards on change do any harm?
it potentially unearths a lot of cards at once
if you can, you should reschedule via the FSRS Helper AddOn though
I think if you reschedule via the helper addon, you can just undo it
not sure about the built in one
can always just make a backup first
will try later!
The helper addon respects the load balancer and easy days (the built in option does not). ||I know for sure about the load balancer, only confidently guessing about easy days.||
The helper addon also doesn't create an unnecessary review entry (in the card info table).
You can also use the helper addon to reschedule only specific cards by selecting them in the browser.
helper addon = love = life
Anyone care to give me their optimized parameters? I want to check my 9good->1bad->9good curve against other people's curves
With the defaults the same pattern emerges but still converges to a high value, 500-600d of stability
While mine converges to 5-6d
0.2449, 6.6747, 29.6870, 83.2370, 7.1572, 0.1256, 1.6081, 0.0301, 1.5763, 0.2629, 1.1031, 2.0592, 0.1111, 0.2567, 3.5012, 0.3858, 5.7947, 0.1542, 0.6841
And 286 if you really push it at most
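For anyone who wants to try the same experiment, here's a rough sketch of the 9 Good -> 1 Again loop. It uses the FSRS-5 update formulas as I remember them from the FSRS wiki, so treat it as an approximation: same-day reviews and relearning steps are ignored, intervals are just rounded to whole days, and the clamping details may differ from the real implementation. `W` is the 19-parameter set posted just above.

```python
import math

DECAY = -0.5
FACTOR = 19 / 81

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def retrievability(t, s):
    return (1 + FACTOR * t / s) ** DECAY

def interval_for(dr, s):
    """Whole-day interval (minimum 1) at which R is about dr."""
    return max(1, round(s / FACTOR * (dr ** (1 / DECAY) - 1)))

def initial_difficulty(w, grade):
    return w[4] - math.exp(w[5] * (grade - 1)) + 1

def next_difficulty(w, d, grade):
    delta = -w[6] * (grade - 3)
    damped = d + delta * (10 - d) / 9                                  # linear damping
    reverted = w[7] * initial_difficulty(w, 4) + (1 - w[7]) * damped   # mean reversion
    return clamp(reverted, 1, 10)

def stability_after_recall(w, d, s, r, grade):
    hard_penalty = w[15] if grade == 2 else 1.0
    easy_bonus = w[16] if grade == 4 else 1.0
    gain = (math.exp(w[8]) * (11 - d) * s ** -w[9]
            * (math.exp(w[10] * (1 - r)) - 1) * hard_penalty * easy_bonus)
    return s * (1 + gain)

def stability_after_lapse(w, d, s, r):
    s_f = w[11] * d ** -w[12] * ((s + 1) ** w[13] - 1) * math.exp(w[14] * (1 - r))
    return min(s_f, s)  # a lapse should never increase stability

def simulate(w, pattern, dr=0.9, cycles=30, first_grade=3):
    """Repeat `pattern` (grades 1-4) and return stability after every review."""
    s = w[first_grade - 1]
    d = clamp(initial_difficulty(w, first_grade), 1, 10)
    history = []
    for _ in range(cycles):
        for grade in pattern:
            r = retrievability(interval_for(dr, s), s)
            if grade == 1:
                new_s = stability_after_lapse(w, d, s, r)
            else:
                new_s = stability_after_recall(w, d, s, r, grade)
            d = next_difficulty(w, d, grade)  # stability update above used the old d
            s = clamp(new_s, 0.01, 36500)
            history.append(s)
    return history

W = [0.2449, 6.6747, 29.6870, 83.2370, 7.1572, 0.1256, 1.6081, 0.0301, 1.5763,
     0.2629, 1.1031, 2.0592, 0.1111, 0.2567, 3.5012, 0.3858, 5.7947, 0.1542, 0.6841]

history = simulate(W, [3] * 9 + [1])
# Stability right after each lapse, i.e. the floor each cycle restarts from.
print([round(history[i], 2) for i in range(9, len(history), 10)])
```

I haven't checked this against the optimizer or the Anki scheduler, so the exact numbers it prints may not match what others in the thread are getting.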
@bold terrace
Turbo Mega Hard Deck Of Absolute Hell: 0.0690, 0.1361, 0.3771, 0.7439, 6.9436, 0.7726, 2.4885, 0.0013, 1.1994, 0.2544, 0.7253, 2.0119, 0.0824, 0.2102, 1.6911, 0.2482, 4.6876, 0.4448, 1.2809
Turbo Mega Easy Deck Of Memory Gods: 0.3926, 1.6209, 13.5989, 100.0000, 6.5253, 1.1444, 1.3246, 0.2590, 2.1167, 0.0009, 1.5381, 2.0056, 0.0014, 0.2261, 2.4639, 0.0418, 3.9873, 0.7226, 1.2180
@quasi shadow
@bold terrace
Do me do me!
Super Easy: 1.7897, 5.2633, 14.1069, 69.7609, 6.9608, 0.5037, 1.6173, 0.0102, 1.7623, 0.0000, 1.2258, 1.9436, 0.0965, 0.3572, 2.3282, 0.0000, 2.9909, 0.5971, 0.7215
||The FSRS simulator predicts that I'll have to do only .06% of this deck every day eventually.||
Hardish: 1.2629, 1.5894, 1.8647, 2.2586, 7.2076, 0.0512, 1.6107, 0.0085, 1.4163, 0.5319, 0.8862, 1.9795, 0.0942, 0.3124, 2.2257, 0.2881, 2.9501, 0.6150, 0.8739
||The FSRS simulator predicts that I'll have to do around 3% of this deck every day forever.||
I thought Hardish is a language
now it sounds like an Indian guy lol
First one converges at 6.85 stability but the second one is a different beast lol
Super Easy
15.20d stability for the second
@bold terrace I think you should be randomly sampling 3s and 1s rather than spacing them apart like that
Maybe, but part of the goal is also to see how it would behave in a "perfect" setup, to try to increase stability as much as possible
Love me some 100 years stability
Well, in fact to respect the 90% R but have S as high as possible, 19G-1F-1F-19G-... should be better
In that completely hypothetical case, it COULD work
But still, something tells me that, if you want increasing stability with a fixed set of parameters, you NEED to outperform your previous performance
THEN, FSRS would I guess adapt the params, to make the intervals longer for the same DR
Or, you increase the DR (at the same rate you out-perform the prediction)
Interesting. This is close to the FSRS simulator results.
If I set up my current deck to grow to 100,000 cards over 10 years (to make the numbers easy to work with, okay? I'm not actually shooting that high... maybe), it thinks that I'll eventually settle down into 4,000 reviews per day (4% of total). Which I feel like you can kind of extrapolate (assuming that every card is equally difficult) into a 100k/4k per d = 25d stability.
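The back-of-the-envelope version of that extrapolation, as a tiny sketch. It assumes a steady state where every card is reviewed once per average interval; treating that average interval as "stability" is a further simplification (in FSRS-5 the two only coincide at DR = 90%). The function names are made up.

```python
def implied_average_interval(total_cards, reviews_per_day):
    """Steady state: each card comes up once per interval, so interval = cards / daily reviews."""
    return total_cards / reviews_per_day

def daily_fraction(total_cards, reviews_per_day):
    """Share of the deck reviewed per day."""
    return reviews_per_day / total_cards

print(implied_average_interval(100_000, 4_000))
print(daily_fraction(100_000, 4_000))
```

By the same logic, reviewing 3% of a deck per day corresponds to an average interval of roughly 33 days.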
Sadly there is no log scale, but here
I crudely simulated gradually increasing DR
A little zoomed in
With fixed params, I guess it can make sense, since the more you outperform the prediction, the faster the exponential grows
(But I know from one of Jarrett's analyses on Reddit that the cause of a non-shrinking load in the simulator is probably predicted leeches (very difficult cards) taking up an excessive amount of reviews, so that would imply the average stability would be higher.)
Yeah, but depending on your params I don't think leeches are that easy to define
lapsing ~5 times a card can happen super quick
It still grows long-term with DR=98%
Sorry, I actually intended to not use the word "leeches" and forgor, lmao.
Jarrett says:
the behavior is caused by a set of cards with extreme high difficulty
Maybe we do need SSP-MMC after all
Since otherwise S will stop growing
So unless you are learning super easy stuff, you won't reach very high S
That's kind of my point with the whole "good prediction tool but lacking as a learning tool" thing: by constantly targeting the same DR, you don't necessarily build your S very far
its so over?
ssp happening?
Maybe it is time for a new benchmark.
We've now got algorithms that are good at predicting R.
Maybe we need a benchmark that generates synthetic data (assuming the FSRS curves are perfect) and assesses algorithms on how good they are at "graduating" cards.
Not until we
- Verify that it does actually result in lower time spent per card than with any fixed DR
- Fix the matrix convergence problem
expertium doesn't understand poetry 😕
sounds like what SSP-MMC-FSRS is doing
https://github.com/open-spaced-repetition/SSP-MMC-FSRS
No error bars (unlike in the benchmark) because neither Jarrett nor I care enough to run these simulations dozens of times 🤣
holy acronym man
might as well start with PDD-Y1.5-SSP-MMC-FSRS-SRS-2.5-EXP3-RTI-UM
https://en.wikipedia.org/wiki/Basis_set_(chemistry)
6-31G(3df,3pd) – 3 sets of d functions and 1 set of f functions on heavy atoms and 3 sets of p functions and 1 set of d functions on hydrogen
In theoretical and computational chemistry, a basis set is a set of functions (called basis functions) that is used to represent the electronic wave function in the Hartree–Fock method or density-functional theory in order to turn the partial differential equations of the model into algebraic equations suitable for efficient implementation on a ...
aug-cc-pVDZ, etc. – Augmented versions of the preceding basis sets with added diffuse functions.
def2-QZVPPD – Valence quadruple-zeta with two sets of polarization functions and a set of diffuse functions
I don't think these are even the longest ones
Chemistry for people who don't want to leave their room
i am in biochemistry and i want to annihilate the department
Need to give the algorithms fun nicknames a la Wifi vs IEEE 802.11ac
I checked the ORCA (quantum/computational chemistry software) manual
"aug-cc-pwCVnZ-PP-F12-OptRI" and "cc-pVnZ-PP-F12-MP2fit" were some of the longest basis set names I could find 🤣
computational genetics, computational (numerical) taxonomy. we've got some too.
Can't wait for FSRS-5.5.1-recency-sibling-fatigue, or just simply FSRS-5.5.1-R-S-F for short
I name him Phillip ;p
Btw, hopefully thanks to @polar maple we will be using probabilities given by a neural net rather than by FSRS as the ground truth
Whenever Alex makes his Super Duper Mega Neural Net 9000
No, but we might get some 200 IQ parameter optimization
@polar maple I'm counting on u 🫂
```
RWKV-P RMSE(bins) (mean±std): 0.0168±0.0088
RWKV-P AUC (mean±std): 0.8210±0.0668
RWKV LogLoss (mean±std): 0.3159±0.1485
RWKV RMSE(bins) (mean±std): 0.0367±0.0279
RWKV AUC (mean±std): 0.7658±0.0659
LSTM-short-secs-equalize_test_with_non_secs LogLoss (mean±std): 0.3271±0.1519
LSTM-short-secs-equalize_test_with_non_secs RMSE(bins) (mean±std): 0.0374±0.0261
LSTM-short-secs-equalize_test_with_non_secs AUC (mean±std): 0.7358±0.0758
FSRS-5-recency LogLoss (mean±std): 0.3432±0.1614
FSRS-5-recency RMSE(bins) (mean±std): 0.0537±0.0342
FSRS-5-recency AUC (mean±std): 0.7065±0.0781```
RWKV-P is a model that predicts the probability of the result of a review immediately before the review happens
RWKV is a model that does a curve prediction after the last review
These numbers are only for the first 1000 users
Oh no, don't even try to explain it like that
xD
Even I had no idea wtf you mean
When you put it that way
Just say "one has a forgetting curve and predicts memory stability as an intermediate value, the other one just does magic directly, without S as an intermediate"
can we use it in anki in the future by your grace 
idk
RWKV-P can't be used for scheduling
I did choose the RWKV architecture specifically so that it can run efficiently on CPUs
In some sense the strength of RWKV-P is fake because the other algorithms on the benchmark aren't able to update their state fast enough
just a limitation of the benchmark itself
Also WHERE IS MY SCALING LAW YOU MFER?!
(I will keep pestering you about the scaling law forever...or until you actually do it)
the proper comparison is if FSRS can optimize its parameters after every single review, which is not done in the benchmark
So, now the question is: if you need to outperform the prediction to grow a higher stability than the theoretical Xd (e.g. 6d in my case) you'd get by really hitting 90% every cycle, how do you do that 😂
(ofc without reviewing it outside anki)
Better encoding, better "feel" for the material by learning adjacent knowledge
Adding more words ?
More Words => More Time Spent => More Potential adjacent knowledge => Better stability on certain elements
uniformly spacing 3 and 1s is too perfect to the point that it is straight up unrealistic
@unique salmon RMSE (bins) didn't improve much for RWKV
i think that the improvement for LSTM over GRU for RMSE (bins) comes from base + ceil and the different curve shape
i mean for RWKV compared to LSTM
Ah
RWKV with LSTM has a similar log loss improvement as compared to LSTM with GRU-P
but the same cannot be said for RMSE (bins)
idk
unfortunately due to the way RWKV was built i can't easily simulate cards like what Sound has been doing
RWKV relies on global context and i'd have to mock other reviews or something idk
a bit annoying to implement
Does it give you better insights about the question like this then ?
"Get extremely lucky so you get a very large stability until you fail it and you have now a 6d stability again" ?
@unique salmon I think RWKV and LSTM are close enough such that if LSTM was properly pretrained on 5k users rather than just 100, and LSTM used a larger model, then LSTM would get similar performance
yeah, do this independently for many random strings of reviews
then find some metric you're interested in and measure it
Well I think one conclusion can already be : "Having a 300d stability doesn't mean shit about how stable the knowledge is"
seems like it
i believe that FSRS doesn't handle lapses well, i'll check with a nn model later by doing the same thing that you're doing
Which then basically means "Anki doesn't really help me"
So to be cautious, and to apologize for my previous criticism of FSRS being a bad learning tool,
SRS as it's used might just be 😂 At least the "1 review, with increasing interval, will lead to good memorization"
One big problem FSRS has, at least for me, is that it does not deal well at all with "outside influence"
i.e. if you recall some vocab super well cause they appear all the time in daily life, it'll "taint" FSRS if enough of them are in the deck, cause it then thinks you're much better at remembering the other cards as well
so they keep lapsing after too long intervals
Yeaaah at the same time, there are a few words I learnt before Anki-ing them, (like 汚い、dirty)
And for those, funny enough, I never fail them apart the very first time
So it's really like : If I know it, Anki will be able to show me I know it well
If I don't know it, Anki won't really be able to make me learn it, but it will be able to show me I don't really know it well
More like "Anki is more a grader than it's a teacher"
It's quite common that I encounter cards for reviews where I remember having seen/read it the other day in some video or article
And where I'm also reasonably sure that I would have not known this word if that hadn't happened
That's also a huge factor, but for those I think you might get a lucky pass and then screw later on, if really it's a difficult one
But yeah got your point
I just feel in Anki, I don't have a "rich enough experience" with some words to really have it stable long term
well, on a lot of those cards, pressing Good can be 1~2 years in the future
Also, sometimes it feels like you would need to really hammer something down at first, but then it would be stable for long time
a problem is that hammering down some words means less time for other words, the net effect could be negative
Also something to keep in mind is that in Anki, you test yourself with zero context
Very likely with context around the word, recall would be much easier and more stable
I don't buy this theory that much because for example, English is not my native language but give me any of the words we wrote until now "alone", and I'll know how they would be used
So it's not really that the recall is done in a vacuum the problem, but maybe the fact I LEARNT it in a vacuum
The other way around
I think reviewing in a vacuum can be OK, but not learning
Which would support the fact that Anki is NOT a learning tool but a reviewing one
A lot of words have a lot of nuance, so without context around them they're really weird and hard to remember
but in a sentence it's usually obvious what it means
I feel like English does not have that nearly as much as japanese though
you should at least have 1-2 meaning right, right ?
I mean take "get"
There's 99 possibilities
but one could be "obtain" and another one "enter" (get in)
my favourite example of that problem is かける
good luck learning that as a vocab on its own
Yeah sure but you see
I see かける and automatically I have 2-3 meaning that comes directly
I'd mark the card as Good for that for example
See, and in context, the other 20+ meanings are usually obvious from context.
Which is what I'm saying
Sure, but I'm more talking about cards that you should really get even without context
Things like material objects, seasons, etc
I have like 10 different cards for かける and its variants. I quite frequently get them wrong, cause I got transitive vs. intransitive, or the kind of かける wrong.
probably more than 10 even
I do use context though, but ONLY if the context would 100% be there
for ex :
so いえ and け can't be confused
Because 100% of the time, the け would follow a name
But it's still a review in a "vacuum"
While, for Learning, I think without different context, sound, emotion, ... you might just not build stability that much
So Anki is more like a Grader than anything
OFC you will "learn" some stuff, but a vast majority will have low stability
I mean, after 60K reviews, I'm still with an average stability of 1month, but a huge huge huge blob of thing under 15-16d
Because while I "more or less know them"
I don't really master them
Because seeing 1 word 5 times in 60 days, even if I got them 5 times right is not really a good marker for "long long term appropriation"
And sure, my Memorised can be larger and larger, but there's in fact very little I can really use instinctively
I even stopped adding any substantial amount of new words for 80-90 days, and my reviews/day is not even really going down
Retention is going up, so it's not like it's not doing any effect though
BUT, if my math is correct, at this rate, I'll be fluent in 1000 years
Earlier, I even failed 小説家 (novelist) while 小説 has a stability of 3 months and I see it often ...
Why ? Because my brain starts to build retention based on the number of kanji, font size (if one deck has a slightly different font size, sometimes that's what helps me remember it was word X)
I start to think Anki might just be bad for learning at least 🥲
@unique salmon for rmse (bins) since you like it more, this time on 1600 users
Anki is a tool for repetition. As woz said, you need understand before you learn.
I guess it is off-topic for this FSRS thread, but I thought this was somewhat of a debated topic. My understanding was that recent thinking is more that understanding and learning (memorizing) really need to go hand-in-hand. It is hard to advance in understanding without memorizing important, core information. And it is hard to learn (memorize) without understanding.
@bold terrace
The visualizer now has an option to make the Y axis logarithmic (the dev responded to my issue really fast)
Though it seems like anything like 0.1 or 0.01 gets rounded to 0
EDIT: already fixed
@cosmic hedge the simulator PR is merged into Anki.
What do the colors mean?
Number of cards in that bin. Blue = small, Red = big
I don't know if it's actually useful, but it turned into a fun little side project to play with.
So Easy Days sliders are not shown in the simulator window?
Yep. I leave this work for others.
Perhaps, we can add it in that PR.
Any ideas where the easy sliders would fit? A drop-down or something?
A modal in the modal???
GitHub
Adds the "Suspend after X leeches" feature to the simulator.
Limits for examples:
(1: no limit, 2: 14, 3: 7, 4: 100)
simulator keeps getting better 🚀
@quasi shadow out of curiosity, do you do any kind of machine learning, as a hobby or at work, that is not related to spaced repetition?
Also, take a look at this: https://forums.ankiweb.net/t/25-02-possible-bug-minimum-recommended-retention-0-70/56210/18?u=expertium
Anki Forums
@L.M.Sherlock can you implement average discounted stability and see if CMRR gives vastly different outputs if we optimize for average discounted stability/time instead of total knowledge/time? Perhaps we should move this to a new topic EDIT: man, how does Jarrett always manage to make code I don’t understand…I’m trying to run optimal_retentio...
If you asked me 2 weeks ago I'd say 100% agree with you (that learning/memorizing need to go hand in hand). But now I just realized "learning" is something so broad that merely doing "reps" doesn't necessarily improve it. So now I'd say : They need to go hand in hand, but I'm not sure you can really do both in the same app/same tool.
What I'm thinking is : Is something like Anki "doomed" to not be a good learning tool, or can it be, in certain circumstances : For example, here https://supermemo.guru/wiki/Two-component_model_of_memory_stability, you find that "New Ideas may enhance structural stability", which is what I think is closer to "Learning".
If I do some retrospective of that year spent on Anki, I'd say some words became easier sometimes by ... adding more of them. For example, 手伝う felt difficult to remember until I learnt 手 was "hand" and 伝 carries the "transmit, convey, ..." meaning, which now gives 手伝う a very easy way to remember "You give a hand", you "Help".
So I (for now) think Anki can actually help in some extent learning, but not really by merely reviewing the same cards again and again. It also means, the cards should have some kind of overlap, just enough so connections are possibles between them. The order on how they're introduced will also help connecting those things together or not, etc.
It also explain why with time, your Anki performance is somewhat accelerating : The more atomic/different words/concepts you know (which are harder because not interconnected), the faster you can learn "compound" words.
Still, one area where Anki (sorry for the term) sucks, is that it's quite a static way of having a front concept with a back definition. Ideally, you'd like to learn things by seeing their different angles. Probably 手伝う is now "easy" to remember as "helping someone", but maybe certain usages in certain situations exist and you won't necessarily build the right connection just by reviewing it in Anki again and again.
I did NLP before the release of gpt-3.5.
Then there are times when you don't want to think of the components, like in ||苦笑い||
This word is cursed
About what you are saying though, I'll even go further than you. Anki encourages mindless repetitions when you should be doing the opposite.
It's just so much easier to not think of the material and just spam reviews with Anki and get a feeling of "ah, i'm doing a lot of work"
japanese has a lot of compounds, and I think mindless repetition for them especially is a huge negative when it comes to jp.
I've thankfully learned the habit of actually trying to understand my cards unlike what I'd be doing before.
also, a very interesting compound word: ||二番煎じ||. look it up.
My guess was "biscuit"
@bold terrace https://forums.ankiweb.net/t/25-02-possible-bug-minimum-recommended-retention-0-70/56210/18?u=expertium
So unless I screwed up the code, this new metric, "discounted average stability" as I call it, makes the minimum recommended value of desired retention higher
Here’s an idea for a metric: average discounted stability.
It would be calculated as sum(R_i × S_i) / n, where R_i is retrievability of the ith card, S_i is the stability of the ith card, and n is the number of cards.
The difference between this and simply average S is that average S doesn’t take into account the fact that you won’t be able to recall 100% of your cards, only some fraction <100%.
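A minimal sketch of the metric as described above, next to plain average S for comparison. The function names and the two-card deck are made up for illustration; this is not the code from the forum post.

```python
import numpy as np

def avg_discounted_stability(r, s):
    """mean(R_i * S_i): stability weighted by the chance of recalling the card."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    return float(np.mean(r * s))

def avg_stability(s):
    """Plain mean(S_i), which ignores retrievability entirely."""
    return float(np.mean(np.asarray(s, dtype=float)))

# Hypothetical deck: two cards with the same stability but very different R.
r = [0.99, 0.01]
s = [100.0, 100.0]

print("average S:           ", avg_stability(s))
print("average discounted S:", avg_discounted_stability(r, s))
```

Average S only sees the two stabilities, while the discounted version is pulled down by the card that is almost certainly forgotten.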
...the new value with default FSRS parameters is 0.93, which is very different from 0.84.
This means that using average discounted stability would push optimal retention very high compared to using total knowledge.
Next I used parameters for one of my hardest decks where MRR is always at 0.7. I got 0.87 with the new metric.
Next I used parameters for a hard deck where MRR is 0.73. I got 0.85 with the new metric.
Next I used parameters for an easier deck where MRR is 0.87. I got 0.88.
So far:
0.84 → 0.93
0.70 → 0.87
0.73 → 0.85
0.87 → 0.88
The idea is that instead of just using the number of cards that the user is expected to remember as our estimate of knowledge, we also add memory stability into the equation, to account for the, well, stability (or memory strength, whatever you wanna call it)
So now we consider both how likely the user is to recall a card and how well he knows it already, since under the theory that FSRS uses, memory strength/stability and probability of recall are not the same
@quasi shadow please double check that my/Luc's code is correct - that only the values of R and S on the final simulated day are used, rather than all values of R and S
Other than that, thoughts on the new metric?
We would still use workload (time spent on reviews), just divide it by this new value instead of the sum of R. That's the way I implemented it, the workload calculation is unchanged, only the denominator
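Roughly what "same workload, different denominator" looks like, as a sketch. `cost_per_knowledge`, `pick_retention` and the `results` dict are hypothetical names, and the numbers in `results` are arbitrary placeholders, not simulator output.

```python
import numpy as np

def cost_per_knowledge(total_review_seconds, r, s=None):
    """Workload divided by a knowledge estimate taken on the final simulated day.

    s is None -> denominator is sum(R)       ("total knowledge")
    s given   -> denominator is mean(R * S)  ("discounted average stability")
    """
    r = np.asarray(r, dtype=float)
    if s is None:
        knowledge = r.sum()
    else:
        knowledge = np.mean(r * np.asarray(s, dtype=float))
    return total_review_seconds / knowledge

def pick_retention(results, use_stability):
    """results maps desired retention -> (seconds, R values, S values) from some simulator."""
    def score(dr):
        seconds, r, s = results[dr]
        return cost_per_knowledge(seconds, r, s if use_stability else None)
    return min(results, key=score)

# Placeholder inputs only, chosen to show that the two denominators can disagree.
results = {
    0.7: (1000.0, [0.7, 0.7], [10.0, 10.0]),
    0.9: (1500.0, [0.9, 0.9], [40.0, 40.0]),
}
print(pick_retention(results, use_stability=False))
print(pick_retention(results, use_stability=True))
```

The workload term is identical in both calls; only the definition of "knowledge" changes which desired retention looks cheapest.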
Also, based on my limited testing, it seems like this one has a much narrower range. It basically never goes below 0.8.
Assuming my implementation isn't flawed, this is kind of good news?
I mean, not from a perspective of a power user, but rather from a perspective of an average user who doesn't want to tinker with the settings and just leaves everything as default. If the optimal value of desired retention is something like 90%±3%, that would mean that leaving desired retention always at 90% is perfectly fine
Although, maybe it's just my brain, lol
We definitely need some more testing
Oh, nevermind, it does go below 0.8 for the latter of these params from obe. I get 0.74
Alright, maybe it's not always 90%±3%, lol
That's interesting. I was always a little suspicious that the CMRR for my Kanji deck was firmly at the minimum of 0.7 (even with 3650 days).
Give me your params and current MRR at 365 days
Params: 0.2449, 6.6747, 29.6870, 83.2370, 7.1572, 0.1256, 1.6081, 0.0301, 1.5763, 0.2629, 1.1031, 2.0592, 0.1111, 0.2567, 3.5012, 0.3858, 5.7947, 0.1542, 0.6841
Current MRR: 0.7
With the new metric MRR=0.81
That seems much more sensible. I assume the current method would have found something even lower than 0.7 if there was not a hard limit, which seems weirdly low.
@unique salmon what sort of numbers would you get if you just ignore R and use sum(S_i) / n?
Higher numbers. I don't feel like making a spreadsheet for this, but higher by 2-4%
Based on, like, 4 sets of params
So not a whole lot of testing
Tbh I just like discounted average stability more than average stability
It's more nuanced
With just average S, you would get the same value if you had a card at R=1% and R=99%, but intuitively, those two cases are pretty damn different
I think if we're really trying to come up with a good metric, it should include both S and R
I think sum of R is fine
Well, sound disagrees 🤣
Hence this entire discussion
if it is worse at lower DR then we can add heuristics
but i think sum of R makes more intuitive sense than R*S
Yeah, it's more intuitive
I told him that
R itself is fine, it’s more about the Total Knowledge and how considering “more knowledge” to have 1000 items with 1% R (Total Knowledge = 10) than 9 with 100% (Total Knowledge = 9), sounds off to me, and it’s also why I think for the “minimum recommended” (which is in fact optimizing Total Knowledge) tends to just advice to lower R as much as possible.
1000 items would likely be 'activatable' in a certain sense so i'd prefer the 1000 items for sure
What do you mean?
you can relearn these 1000 items much quicker than to learn 990 items from scratch
I mean...probably, but that's not very useful unless we can incorporate it into the metric
it is already indirectly incorporated, sum of R does this when the simulated time is long enough
If it wouldn't be such a pain to rate each card in two ways I would love to see if my idea of (Accessibility + Integrity) would turn up anything interesting.
?
I'm guessing you mean "how easy it was to recall" and "how confident I am that I will recall it again tomorrow or at some point"?
My thought was Retrievability might be made up of something like:
- Accessibility - How easy is it to "find a memory"
- Integrity - How good "quality" is the memory
Good accessibility, Poor integrity = "I know it is X or Y. I cannot remember which one."
Poor accessibility, Good integrity = "It's on the tip of my tongue!" / "Oh of course! How did I forget that"
In which case yeah, then we could probably extract information about both S and R from grades, but that sounds like a pain, yeah
Linking with what Alex said: 1000 cards might have poor accessibility, but ok integrity, so would be easier to re-learn than completely new cards.
If you could model both you might be able to do some more interesting things than you can with plain R, but you would probably have to ask people to rate cards on the two axis, not just Again-Easy, which would be a pain.
A bit more intuition why we need a metric with both S and R in it
Example 1: you have two cards at R=90%, except one has S=1 day and the other has S=365 days. Clearly, no sane person would consider them equivalent.
Example 2: you have two cards with S=100 days, except one is at R=1% and the other is at R=99%. Clearly, no sane person would consider them equivalent.
Hence we need to combine both into a single number, such as R*S
I don't know why I haven't thought of this before...
R*S is not good, i'd much rather have 365 cards at R=90% and S=1 than a single card at R=90% and S=365
Do you have any other ideas for f(R, S)?
both R and S are already considered when we only sum R, when we simulate up to a long enough duration as long as the algorithm doesn't work towards abusing the metric
and by abuse, i mean something like if we measure at 1 year, the algorithm doesn't start randomly cramming new cards near the end of the year
idk, a strict improvement i think is R * log(S)
but i still think sum(R) is optimal
Oh, btw, CMRR assumes you can do an infinite number of new cards and an infinite number of reviews...as long as time spent per day is <=30 minutes. In other words, Jarrett uses time spent on reviews as a constraint, rather than strictly limiting new/review cards
Why?
a heuristic for the fact that i'd rather have 365 cards at S=1 than 1 card at S=365
deck_size=10000, learn_span=365, max_cost_perday=1800, learn_limit_perday=math.inf, review_limit_perday=math.inf, max_ivl=36500,
These are used for CMRR
I'd prefer it if it had finite values for learn_limit_perday and review_limit_perday, but such is the word of God Jarrett
Then maybe log(1+S), otherwise it can be 0 or negative
yea
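A quick sketch of the three candidates on the "365 cards at S=1 vs 1 card at S=365" example. The function names are made up here, and the decks are just the numbers from the messages above.

```python
import numpy as np

def sum_r(r, s):
    return float(np.sum(r))

def sum_rs(r, s):
    return float(np.sum(r * s))

def sum_r_log1p_s(r, s):
    # log(1 + S) stays >= 0 for S >= 0, unlike log(S), which goes negative for S < 1
    return float(np.sum(r * np.log1p(s)))

many_weak = (np.full(365, 0.9), np.full(365, 1.0))   # 365 cards, R=90%, S=1 day
one_strong = (np.array([0.9]), np.array([365.0]))    # 1 card, R=90%, S=365 days

metrics = [("sum(R)", sum_r), ("sum(R*S)", sum_rs), ("sum(R*log(1+S))", sum_r_log1p_s)]
for name, metric in metrics:
    print(f"{name:>16}: many_weak={metric(*many_weak):8.2f}  one_strong={metric(*one_strong):8.2f}")
```

sum(R*S) scores the two decks the same, which is the objection; the log version goes back to preferring the 365 cards while still giving some credit for higher S.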
Ah OK I didn't understand the idea at first but now I see why bringing the stability helps
The idea of the computation is still to say : "At the end of that period, what is the DR that would lead me to the "best" situation", so I don't think the "best situation" would be a situation where, you know nothing but could reactivate it later
ironically "you know nothing but could reactivate it later" is more present in sum(S*R) than it is in sum(R) alone
But I 100% agree that this 'Best' might be subjective. For exams where basically your score will be the amount of right answer, having the 1000 total with 50% R is probably better than having 499 with 100%. (Score 500 vs 499), but as I use Anki to do more than just pass an exam, it feels a bit shallow
an algorithm might fear reviewing a high S card in case it fails, in which S would catastrophically fall
or it could just leave it at low R; R would decrease slowly anyways so S*R would still remain high
Hmmm but in S*R, better S will lead to a better score, I don't see why the algo would fear it ? The R would normally be in [DR, 100], so something like [80, 100], and for that range, bigger S means "better" state
Dropping R would only hurt S no ? I mean, S is not related to DR, it's its own measurement (not like the interval). So Dropping R, is a higher risk of dropping S as well ... So by nature the algo will try to not drop R too much, to not hurt S
Thus also why Expertium gets bigger recommended DR with this
i mean if the metric is S*R then to achieve better performance on this metric, an algorithm may choose to not review at a fixed DR but rather have a specifically lower retention for high stability cards
But we're using fixed DR
Since R is between [DR, 100], and Expertium showed that the recommended DR is bigger than before, I think it's safe to assume that R will also be better no ?
boring
Then of course, once again, if the goal is really to pass a test, and the low R are accurate, the current Total Knowledge might be better for that use case, of course.
But I'm really not that sure that the goal is to maximise the arithmetical mean of good answers, without considering how the stability would be impacted
TBH right now, I tested with my params the Days to Simulate with 5 days, 30 days, 360 days, 3000 days ... I always always get that 0.70 recommended
It really feel like this is not really achieving much right now
Time for an advanced tab on CMRR.
Imagine three beautiful radio buttons:
- Least amount of work per amount of things "known" (R)
- Least amount of work per amount of things "known well" (S, R)
- Least amount of work only. ||Please, I want to see the med students try it.||
If you take the extreme :
1000 cards with R=20% and 2d stability (Total K : 200 score, R*S : 400 score)
vs
250 cards with R=79% and 30d stability (Total K : 197.5score, R*S : 5925 score)
I really feel in that setup, the 250 cards with 79% and 30d stability is a way way better situation than having 1000 at 20% (with a mere 2d of stability)
If we really really want to push this further we can even think "Shouldn't the difficult class be included ?". Since it drops with good answer and goes up with wrong, it's probably also a factor, but a smaller one
Difficulty isn't well-defined
At least S and R are properly defined
D is just...a thingy
R*S is not good, let's roughly equalize R*S and have
15000 cards with R=20% and 2d stability (Total K : 3000 score, R*S : 6000 score)
vs
250 cards with R=79% and 30d stability (Total K : 197.5score, R*S : 5925 score)
so if we talk about S we should use examples that use log(S+1) or sqrt(S) or anything else idk
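The arithmetic for those two decks, plus the log and sqrt variants, as a sketch. `scores` is a made-up helper and it assumes every card in a deck has the same R and S, like in the examples.

```python
import math

def scores(n_cards, r, s):
    """Deck-level totals for n identical cards with retrievability r and stability s (days)."""
    return {
        "total_knowledge": n_cards * r,
        "R*S": n_cards * r * s,
        "R*log(1+S)": n_cards * r * math.log1p(s),
        "R*sqrt(S)": n_cards * r * math.sqrt(s),
    }

small_strong = scores(250, 0.79, 30)

# How many R=20%, S=2d cards it takes to match small_strong on R*S
# (about 14812.5, which is where the rounded 15000 comes from).
break_even = small_strong["R*S"] / (0.20 * 2)

big_weak = scores(15000, 0.20, 2)

print(f"break-even card count: {break_even:.1f}")
for key in small_strong:
    print(f"{key:>16}: big_weak={big_weak[key]:9.1f}  small_strong={small_strong[key]:9.1f}")
```

With R*S the two decks are roughly tied, while the dampened versions lean towards the bigger deck, so the choice of f(R, S) decides which situation counts as "better".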
Hmmm I think I see your point
Still true that having a choice for the user to choose between :
"Focus on Total Knowledge, no matter the quality (perfect for one-shot exam)" vs
"Focus on Knowledge and how long you'll remember it afterward, better for higher retention and things you'll keep learning later on / will want to already practice along the way"
Might make sense
No 😠
Option 2 if you ask me then haha
sum of R at a certain defined time, vs average R per review cost over infinite time periods
both do not need to include S explicitly
It wouldn't even work in practice, since mean(R*S) results in higher DR than sum(R) according to my testing, and if we loosely label those as "For an exam" and "For lifelong learning", people are gonna be like "My MRR for an exam is LOWER than for lifelong learning?!?!?! BUT I NEED 99.999% FOR MY EXAM!!!"
I'm bad at figuring out things without examples, the average R per review cost would look like a curve ? Won't it just try to take the minimum R possible just like now ?
Yes, I should rephrase "For an exam you just want to have a barely 50% with minimum amount of work"
"Success not guaranteed"
I mean something like the "knowledge per minute" column on the SSP-MMC-FSRS table
this simulation is done with an infinite-sized deck i think
Nope
ok, why is it not infinite?
Why should it be?
Also, knowledge/time is already what CMRR is optimizing for
It's just that we disagree on how to define "knowledge"
plenty of people have > 10k decks, and SSP-MMC-FSRS is ideally targeted towards lifetime learning
Isn't time a bit difficult to measure ? Sure you have the Anki stats, but for example if you fail something you might spend ~50-60seconds really looking into it (not timed in the answer time)
So you need to come up with some approximation of what an error cost compared to a good answer
doesn't this partially explain why SSP-MMC is not as good as it can be? the deck is pretty much completely learnt in the benchmark...
It's noisy, yeah. Which is why we use a median + some smoothing when n(reviews) is low
and IVL=7 is unfairly limited, it is stuck at 9999 and it cannot go higher because of the deck size
The issue with these simulations is that they have so many parameters, man
Days to simulate, deck size, new limit, review limit, max. interval, time limit (per day)
Anyway, go back to discussing f(R,S)
No, no, wait. Now I want to see what SSP-MMC's results look like with a deck of 100,000 cards.
Maybe I'll run it myself later...
And with a different new/reviews limit ratio...and with a different number of days to simulate...and with a different time limit per day...
Here's the CMRR code, feel free to try it guys. You can replace this line
avg_discount_s[today] = (card_table[col["retrievability"]] * card_table[col["stability"]]).mean()
FSRS parameters are around line 395, I had to hard-code them cause I have no idea how this thing is supposed to be run normally
@unique salmon, Do you know a way how to extract these values from user collection/deck?
LEARN_COSTS
REVIEW_COSTS
FIRST_RATING_PROB
REVIEW_RATING_PROB
FIRST_RATING_OFFSETS
FIRST_SESSION_LENS
FORGET_RATING_OFFSET
FORGET_SESSION_LEN
Nope. Ask Jarrett
Actually, no, I have an idea
https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/v5.3.3/fsrs4anki_optimizer.ipynb
Upload your deck/collection here, follow the instructions and the values will be in section 3 "Optimize retention to minimize the time of reviews"
Yeah, I wrongly checked the fsrs4anki_simulator.ipynb instead of fsrs4anki_optimizer.ipynb. Thank you!
https://github.com/open-spaced-repetition/SSP-MMC-FSRS/blob/6b6ff1fc09f6b0175500512659ecf0a3cae2ccae/simulator.py#L10-L11
@quasi shadow is this a potential bug? the purpose of this code seems to be to treat cards with stability of >= 3 years (s_max is given as 3 years) as completely learnt, but this ignores t. If t > s then this still returns 1.
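To make the question concrete, a sketch of what such a cap does. This is a hypothetical reconstruction from the description above, not a copy of the linked lines; only the power forgetting curve itself (FSRS-4.5/5, decay -0.5) is standard, and `S_MAX` and the function names are assumptions.

```python
import numpy as np

DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1   # = 19/81, chosen so that R(t=S) = 0.9

S_MAX = 365 * 3  # assumed cap: the "3 years" mentioned above

def forgetting_curve(t, s):
    """FSRS-4.5/5 power forgetting curve."""
    return (1 + FACTOR * t / s) ** DECAY

def forgetting_curve_with_cap(t, s, s_max=S_MAX):
    """Hypothetical: cards with S >= s_max are treated as never forgotten, whatever t is."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(s, dtype=float)
    return np.where(s >= s_max, 1.0, forgetting_curve(t, s))

t, s = 5000.0, float(S_MAX)   # elapsed time far larger than stability
print("without cap:", float(forgetting_curve(t, s)))
print("with cap:   ", float(forgetting_curve_with_cap(t, s)))
```

With the cap, a card at the threshold still reports R=1 after 5000 days, which is the "ignores t" behaviour being asked about.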
I think it's just the assumption of SSP-MMC that cards above a certain stability are "permastore" in Woz's terminology. Aka unforgettable.
i think this needs justification, we should check in the dataset to measure if this is true before using the assumption in SSP-MMC
if it is true? go ahead and set S = infinity itself in FSRS and voila we have improved FSRS. do not include the assumption in SSP-MMC specifically
Ask Jarrett. IMO 5 years should be enough
I'm not sure how to choose a good value tbh
100 years is DEFINITELY permastore and 100 days is DEFINITELY NOT permastore, now we just need to find something inbetween 🤣
Thanks for the reply. I think that's an interesting take. To sum up your opinion, it sounds like you're saying that because Anki doesn't really have any built-in support for "new Ideas may enhance structural stability", it is sort of missing a big part of what makes learning possible/easier? I've definitely felt the same thing.
And to be honest, I became fluent in Japanese through Anki and never realized your mnemonic for 手伝う 🙂
This was intended as a reply for this message #1282005522513530952 message from @bold terrace.
but there are like 400 messages between then and now
It's not a bug. It's what I did in my paper's simulation.
okay i'd like to see experimental justification for this by looking at the dataset
I don't have any experimental justification for that. You can throw out this assumption, and see the difference becomes negligible between SSP-MMC and optimal fixed DR.
But there are some use cases: https://ankiweb.net/shared/info/1666520655
The CMRR becomes less useful when we provide the simulator.
Should we introduce more complex formula into CMRR?
@quasi shadow i cloned SSP-MMC-FSRS and ran script.py. All the results except for SSP-MMC match the readme. Is there something i'm missing?
The readme has two tables.
Oh, I checked the commits, and I forgot to update the table.
@unique salmon I found a typo.
Btw, for the benchmark of proprietary algorithms, we have two comparisons.
Feel free to make a PR. Or just tell me what the typo is
Oh, ok, I see
I really need someone to proofread the article once it's done, but idk who would
Just make it based on real deck sizes and review limits
That alone will make it waaaay more useful
Max. studying time per day can also be estimated from data, btw
For each day calculate the sum of review times, then find the day with the highest sum
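Something like this, as a sketch. It assumes the review log has already been reduced to (day, seconds) pairs; that input format and the function name are made up here.

```python
from collections import defaultdict

def max_daily_study_seconds(reviews):
    """reviews: iterable of (day, seconds_spent) pairs, one per review."""
    per_day = defaultdict(float)
    for day, seconds in reviews:
        per_day[day] += seconds          # sum of review times for each day
    return max(per_day.values()) if per_day else 0.0

# Toy log: day index and seconds spent on each review.
reviews = [(0, 8.0), (0, 12.0), (1, 30.0), (2, 5.0), (1, 4.5)]
print(max_daily_study_seconds(reviews))
```

Taking the single busiest day is sensitive to one-off cram sessions, so a high percentile of the daily sums might be a safer variant.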
regarding convergence for ssp mmc I just changed the code to use torch for parallelism and it's like 1000x faster now on my pc
so running for 10k users won't be as bad anymore, hopefully
I'll work on a pr tmr
Should it also consider the existing cards?
Torch is awesome. I thought Numpy is good enough but I'm wrong, lol.
Does it require GPU?
Yes
I was wondering if maybe CMRR and the simulator should just use exactly the same config. I'm not sure how much difference the options like leech threshold would make. I just assumed the fixed 10/day thing was for some fsrs reason I wouldn't understand 😂.
Actually, that wouldn't work for people with no reviews, so we'll need a default. My proposal:
- Go over all 10k users and calculate workload (minutes of study/day)
- For each user, calculate the average
- Take the 95th percentile of the averages
That way we can get a reasonable max. of how much time an Anki user is going to spend per day at most
I'm talking about max_cost_perday
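A sketch of that proposal. The input format (user id -> minutes studied on each study day) and the function name are assumptions; whether days with zero reviews should count towards a user's average is left open, and here only the days present in the list are averaged.

```python
import numpy as np

def default_max_cost_perday(daily_minutes_by_user, percentile=95):
    """Per-user average workload, then a high percentile across users.

    Returns seconds, since max_cost_perday=1800 corresponds to 30 minutes.
    """
    averages = [np.mean(days) for days in daily_minutes_by_user.values() if len(days) > 0]
    return float(np.percentile(averages, percentile)) * 60

# Toy data: three users with average workloads of 10, 20 and 30 minutes/day.
toy = {"a": [10, 10], "b": [15, 25], "c": [30]}
print(default_max_cost_perday(toy))
```

Users with no reviews are skipped, which matches the point that they need a default in the first place.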