#FSRS Megathread
maybe we can go lower than 0.5% RMSE (bins) by combining RWKV-P + RMSE-BINS-EXPLOIT
I don't count RMSE bins exploit because it's cheating
how is it cheating?
the problem is that we cannot exactly know if a good algorithm isn't cheating the metric a bit
For that one in particular - it uses the formulas for binning internally, which is obviously cheating
If an algorithm doesn't use our binning formulas internally, it's fair
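For reference, here is a minimal sketch of the general idea behind RMSE (bins). Reviews are grouped into bins. Within each bin, the mean predicted R is compared with the observed retention, and the squared gaps are weighted by bin size. The benchmark's actual binning formulas are not reproduced here. The sketch simply assumes each review arrives with a bin label already attached, and the function name `rmse_bins` is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def rmse_bins(predictions, outcomes, bin_labels):
    """Review-weighted RMSE between mean predicted R and observed retention per bin.

    predictions: predicted probabilities of recall, one per review
    outcomes:    1 if the review was a success, 0 if it was a lapse
    bin_labels:  hashable bin identifier per review (how bins are chosen is
                 assumed here, not the benchmark's actual formula)
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_pred, sum_outcome, count]
    for p, y, b in zip(predictions, outcomes, bin_labels):
        s = sums[b]
        s[0] += p
        s[1] += y
        s[2] += 1

    total = sum(s[2] for s in sums.values())
    weighted_sq_err = 0.0
    for sum_pred, sum_out, n in sums.values():
        mean_pred = sum_pred / n
        mean_out = sum_out / n
        weighted_sq_err += n * (mean_pred - mean_out) ** 2
    return sqrt(weighted_sq_err / total)
```

Only the per-bin mean of the predictions enters the metric. An algorithm that knows the bin assignment can therefore move individual predictions around freely, as long as each bin's mean matches its retention.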
i don't see how being aware of a metric constitutes cheating in the same way that models like FSRS are aware of log loss and RMSE (bins) as well
FSRS is even directly optimized on RMSE (bins), direct cheating apparently
and when you guys find improvements in FSRS it is often in terms of % reduction in RMSE (bins), you are optimizing over the metric directly over time
models like FSRS are aware of log loss and RMSE (bins) as well
?
FSRS doesn't use the loss functions internally for its own calculations
Sure, but again, FSRS doesn't internally count bins using the binning formulas
And it doesn't internally keep track of log-loss
So it's fair
whatever algorithms do internally is their own business, we should make the metrics as robust as possible to anything that they try
i now believe that RMSE (bins) should just be removed everywhere; it doesn't have a nice interpretation, and the only reason we still use it right now no longer applies
https://github.com/ankitects/anki-manual/issues/368
@quasi shadow please give fsrs wiki modify permissions (I think i need permission to modify the page?), i'll write about RMSE (bins)
also, I disagree with this exact line: FSRS does compute log-loss when doing gradient descent
and at least on anki it does compute RMSE (bins) to make sure that the new parameters do better on RMSE (bins) than before
so it also does compute bins
its just not in the way that you consider cheating, but the fact is that it does compute the bins
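Much of the disagreement is about where log loss gets computed, so here is the quantity itself. This is the generic textbook binary cross-entropy between predicted R and review outcomes. The clipping constant `eps` is an assumption and not necessarily what the FSRS optimizer uses.

```python
from math import log


def log_loss(predictions, outcomes, eps=1e-7):
    """Mean binary cross-entropy between predicted R and review outcomes (1 = recalled)."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(predictions)
```

During optimization, gradient descent minimizes a loss of this kind over the training reviews. Whether that counts as "inside the algorithm" is exactly the point being argued.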
The log-loss is calculated outside of FSRS, same goes for RMSE
RMSE-BINS-EXPLOIT calculates the bins inside the algorithm itself, hence cheating
this isn't loss and I am disappointed
How about we skip the forum and just ask Dae directly to remove "Evaluate"? 🤔
did you make a mistake in the diagram? FSRS is cheating? not that I disagree by your standards!
It's just two different alternatives
Feel free to replace "FSRS" with {algorithm_name}
The point is that if it doesn't calculate the loss internally, it can't be cheating
FSRS computes log loss when doing internal gradient descent and also RMSE (bins) when updating parameters, so it is definitely cheating
in the same way that you claim that RMSE (bins) is cheating
literally where
literally gradient descent optimization
and also in anki, FSRS does not give new parameters unless it does better than the previous ones on RMSE (bins)
But it's not a part of how FSRS calculates DSR. Gradient descent is outside of the algorithm
it is crazy to claim that gradient descent is outside of the algorithm of FSRS
Why?
it's the entire optimization process that FSRS uses
without optimization FSRS doesn't learn
You cannot be serious man...
When FSRS calculates difficulty, it does not use log-loss/RMSE
When FSRS calculates stability, it does not use log-loss/RMSE
When FSRS calculates retrievability, it does not use log-loss/RMSE
okay then, please remove 'optimize' from anki. FSRS now only has one set of parameters, if you ever want to optimize, you are no longer using FSRS
Maaaaaaaaaaaaaan 😭😭😭😭😭
I'm just saying that RMSE-BINS-EXPLOIT is keeping track of the loss inside the algorithm itself, while FSRS doesn't. Hence why one is cheating and the other is not
i mean that's my point, FSRS also needs to compute loss internally so this specific thing isn't cheating imo
to me RMSE-BINS-EXPLOIT isn't cheating and it shows that RMSE (bins) is unreliable
Do you love Evaluate though?
@unique salmon how about we go back to plain old RMSE, no bins? nice and human interpretable
It will be too similar to log-loss, both in terms of absolute values and in terms of being correlated with retention
fair enough, do we know if AUC is also correlated with retention? could show AUC instead of RMSE (bins)
https://github.com/ankitects/anki/issues/3926
Just remove Evaluate
I made an issue
(screw forums because people on forums disagree with me 🤣)
evaluate makes me feel like a scientist
Give me a moment
expertium if you delete evaluate button im going to delete you
no i wont mods dont punish me pls
Ok, this was a huge pain, but here
Love me some AUC less than 0.5, lol
Love it when FSRS does worse than random 🤣
Zoomed in a bit
seems usable
if AUC is less than 0.5, turn off FSRS
@unique salmon btw I'll take a look at NN D, but I'll write my own code, won't make any promises on this
Nah, reverse FSRS predictions
Whenever you see R=90%, treat it as 10%
Whenever you see R=10%, treat it as 90%
🤣
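The "reverse FSRS predictions" joke is actually exact for AUC. AUC is the probability that a random successful review gets a higher predicted R than a random failed one. Replacing every R with 1 - R therefore turns AUC into 1 - AUC. Here is a minimal pairwise sketch, O(n·m) and fine for illustration but not for a whole collection. The toy predictions are made up.

```python
def auc(predictions, outcomes):
    """Probability that a random successful review gets a higher predicted R
    than a random failed review (ties count as half)."""
    pos = [p for p, y in zip(predictions, outcomes) if y == 1]
    neg = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))


preds = [0.9, 0.8, 0.7, 0.6]
outcomes = [0, 1, 0, 1]
print(auc(preds, outcomes))                   # 0.25, worse than random
print(auc([1 - p for p in preds], outcomes))  # 0.75, "reversed" predictions
```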
https://github.com/ankitects/anki/issues/3926
Wanna chime in here?
Dae, despite what the screenshot above shows, I think we should disregard that poll and remove "Evaluate" anyway. David agrees, btw. "Evaluate" gives the user a bunch of numbers...
I agree with @polar maple: if a metric is exploitable, you can defend an algorithm as not internally trying to exploit it, but by definition, in trying to minimize the metric, it might. A neural network could be led to such cheating without being specifically instructed to do it
But at the same time, with log loss wouldn't you over-privilege precision on short intervals? Since the vast majority of reviews are 1-20d
You could argue that then having better precision for that vast majority should be the goal though
You’re a complete psycho 😅
Alex made LSTM and another net, and in both cases RMSE and log-loss improved. We haven't seen a case where log-loss gets better or stays the same, but RMSE gets worse
That would be evidence in favor of NN cheating
Please read what I wrote on Github
Which I did before writing this
Not actionable? Based on it, people could understand that something is wrong in their predictions and adjust accordingly
They can also see whether splitting different decks into different presets helps or not
Most users have no idea what the numbers mean
So explain how to interpret those numbers
You’re going full dictatorship
Or just trolling but this is just ridiculous
Btw, RMSE isn't used during optimization at all, it's only used after the optimization is finished. So a neural net couldn't "internalize" it anyway
well i only have 1 day of reviews so that's probably why i have .38% 💀
Why do you hate regular users man...
And before you say "That is not what I said at all" - there is absolutely no way in hell we can make "Evaluate" intuitive
Like, none
The best we can do is "lower = better", which is what it already says
We could add the log-loss formula, but that would scare the average person even more
Because you never cared to write something like “an RMSE of 3% can be seen as, on average, a normal range of 3% precision around your DR”
Here's your intuitive log-loss, lol
Funnily enough, you’re the one saying they should not have to decide
I guarantee you if I polled r/Anki, most people would say that they don't know what to do with the numbers that "Evaluate" shows
You’re playing the dumb troll, but you know you bend the truth to justify being smarter
You give people a “normal range of log loss” based on users in the 10k dataset and voilà, case solved
...except that the benchmark uses a different procedure
But you know that very well you’re just playing pretend
Give them range based on the one from evaluate then
You’re just playing the “I know better than people so people should not be able to judge by themselves”
People neither want nor should have to decide this stuff. Users use Anki to review cards, not to tweak a bunch of abstract numbers
What people want is what you get in your poll you decided to ignore
The whole point of FSRS is (supposed to be) that it outsources tweaking to the computer
IMO you just get so attached to FSRS, like it’s your baby, that you just want to control its course
Alright, fine, I'll make a poll on r/Anki and ask things like
- Do you find "Evaluate" useful?
- Do you know what the metrics mean?
- Do you know what values would be considered "good" and what would be considered "too high"?
You unilaterally create a request to do something that most people, when asked by you, said not to do
It tells more about you than average users or than about me
Because you think your Reddit sect will side with you
the evaluate button is literally useless
the numbers mean literally nothing to anyone except 5 people
Or because you think redditors are smarter than the average user from Anki board ?
No, because I think Redditors are dumber than the average power user from forums
So explain the number, put it in plain terms, give examples of what healthy values can look like
even if it is explained, what actionable things can I do with that number?
like even if I knew exactly what the fuck "Log loss: 0.2826, RMSE(bins): 3.35%. " means, what do I do with that information
Reflect on things that could explain an unhealthy value: Hard presses that should have been Again, decks that mix very different material, not enough cards rated “Good” that you didn’t already know, leading to overly optimistic stability…
Btw jake, I'd appreciate it if you commented here or gave me a thumbs up, just so that Dae sees that it's more than me and David
https://github.com/ankitects/anki/issues/3926
what percentage of anki users, if this number was suddenly useful, would actually reflect on this
If you’re telling me it’s better to just have people with screwed-up parameters, then what will you do to help them? How do you ask them for those parameters so you can help and educate them?
" I think over more than a year of helping on r/Anki, "Evaluate" came in handy, like, once." (https://github.com/ankitects/anki/issues/3926) it sounds like this isn't even helpful
Then FSRS would have to be removed too, to be sure people are not screwed by it; this is nonsense
what imaginary problem will suddenly be resolved by the contents of the evaluate popup
I'm not suggesting removing the parameter field
As I wrote, the parameter field can be useful
out of curiosity, has anyone outside of maybe that ismael guy actually posted truly shit numbers?
Also, I’m sorry, but most of the time the “help” I got before diving into understanding FSRS was “you understand nothing”…
I'm pro hiding the parameter list too, maybe a "click here to copy info when asking for help" that puts it all in the clipboard for debugging help
Comparing my DR to my actual retention? I was told I can’t…
Yeah, I once saw a guy who used "Remedy Hard Misuse" and didn't optimize afterwards, he had RMSE=20% or 30% or something like that
?
Who said that?
anyway from what I understand the actual argument is not to actually remove evaluate, but to put it somewhere that downplays its importance/relevance
(at least that seems to be david's idea)
I think the preset ui is just really bad at containing weird niche options
so everything just has the same weight to it no matter the importance
That would be nice, but I'm not sure how to do that in practice
We can't put it in "Advanced", that would be too awkward + hard to make it clear that it's related to FSRS
IF IT WERE ME: multiple setting categories: core, extra, fringe
this is peak 'fringe' category
I have proposed having two layouts, Beginner and Pro. Dae was not very fond of that
this isn't about beginner/pro, this is about multiple tabs that may have the same categories but with different tiers of dumbshit
“You’re so wrong your beyond the point of being helped”
maybe you shoulda posted your log loss
https://forms.gle/EyJpGmpR6M8JAFGy6
I will post this on r/Anki tomorrow
that might have solved everything
I was simply explaining how the Average Predicted R could be lower than DR in cases where a lot of new cards per day were introduced
But I was so wrong it was difficult to explain it to me
...and then I am called a troll 😅
Then the same people are held up as examples of helping others, while they’re just getting their ego dose of feeling smarter than everyone
all this talk yet no real world use of rmse bins found
I gave 3 practical actions
Unfortunately those are ignored I guess
they will be ignored by literally every anki user
I was roleplaying the average person using this app
Hey, it's useful in the benchmark. We need a number that can get close to 0 🤣
Pretending to know everyone is also not that great when we don’t even have a clue what percentage of users use FSRS in the first place
"I have to read the manual to understand what this number means and what to do with it? FINALLY I LOVE THIS FEATURE" - sound, probably
More like “why is my retention 10% lower than expected, if the gospel was that enabling FSRS would make my life better than with SM2”
@bold terrace did you move to sm2?
Followed by “you’re really a geek to have enabled FSRS and complain about that difference “
Rough approximation (this is from a year ago)
Well, I can't ask random Anki users, so this is the best I've got
@bold terrace maybe accept you're in the 0.001% of people in the dataset that is better served by sm2 and use it instead 🍃
can't dae like....scan all the data on ankiweb for decks that have the fsrs feature enabled?
That's a good question, actually
@bold terrace can you do an experiment for science and use sm2 for a few months?
The guy still tweaks SM2
FSRS works great once you have some grasp on how it works, which is why I’m against removing Evaluate and would rather help people interpret those values
https://forms.gle/EyJpGmpR6M8JAFGy6
I added a new question
I have zero grasp on how it works and I just chose to not care to think about it and I'm doin fine
¯\_(ツ)_/¯
Because you’re using Anki and developing in Anki for years 🤷
that is quite an overstatement of my experience
fsrs didn't even exist the previous time I was using anki
Funny enough Anki is in a boom
SM2 is 6 times bigger than FSRS for whatever that means
shitpost: reminder that sm is currently at version 11 and sm2 is basically 20 years old
🍃
17 🤓
Actually, no, 18
Actually, wait...
gimme a sec
I'm unsure whether SM-19 exists or not
SM-18 is definitely a thing
man I am OUTDATED
Well, FSRS has been around for less than two years, so it's not super surprising
also how many "sm2" searches are looking for super mario 2
a nonzero amount thats for sure
I’m going a bit in the Ad Hominem territory but sometimes I feel Anki is not always used by the people to actually learn the stuff they want to learn but more because it became almost its own thing 😅
I got Spiderman 2 in the results
There is SuperMemo 19 (software), but unclear if it's using a new algo
I'll take that as "SM-18 is the latest algo"
Btw, I deleted (I assume) Jake's response because of the addition of another question 😅
nooooooooo now I have to fill it out again
It sure as heck can, lol
The issue is that nobody is going to give us data
Jarrett barely scraped together 16 or so collections of SM users
Yeah
why has no one reverse engineered any of the sms past like 2
On that note at least FSRS community is open about data
Apparently a long time ago Anki devs tried, but gave up
back in the dark ages of sm-5
Also, FSRS sorta-kinda counts as "reverse engineered SM"
Priorities don't affect the calculation of the probability of recall though, unless I misunderstand how SuperMemo works
@bold terrace would priorities solve your problem
https://supermemo.guru/wiki/SuperMemo_Guru
Woz wrote a ton of stuff
From articles about algorithms and math to...uhhh...some vague stuff about the brain of a certain political leader and about Elongated Muskrat
He's a bit of an odd fella
"can supermemo be used to forget things"
Investing and Vtubers too ?
Lol
Yeah that one caught my eyes too
I read the list before I realized you ALSO pointed that one out
I mean technically I'm using anki for that purpose
the idea is to just fill my brain with so much garbage it pushes other things out
I realized I was spending way too much on alcohol
and anki seemed like a cost effective replacement long-term
By the bins ?
I'm a craft beer weenie
RMSE (bins)
personally i get a little dopamine rush when i see that parameters have changed and also the metrics look better
I think anki needs more visual flair, like you hit the optimize button and there's a graphic that shows the numbers moving
animated bar graphs going up or down
A bar that represents RMSE getting shorter
yep yep
Adds zero utility, but at least it's fun 🤣
Actually, nvm, it would be a nightmare if you have 20 presets and use "Optimize all presets"
Would be like getting ads
nah its one big set of graphs all animating at once
this is how we get the young users and vc funding
brb gonna make my own anki
called wanki
with this shit
it'll be great
DerIshmaelite watching 260 graphs going down
Wanna attract more people ? Preinstall decks 😂
Things like yomitan or Anki where you have to install external decks or dictionaries is a no go for most people
if an nn like RWKV were optimized on RMSE (bins) as the optimization metric then it would likely be able to reverse engineer RMSE-BINS-EXPLOIT
Oh, yeah, configuring dictionaries for yomitan is a huge filter
sponsored preinstalled decks? vc will love this
coca cola presents: basic spanish vocab
Defining the data type of your card fields …
Even the difference between a card and a note
I know more or less FSRS but I’m still afraid of using “cloze”
Good thing we don't do that 🤣
Ok, ngl, you are kinda convincing me to remove it from the benchmark
Let’s document cards, notes and cloze with UML diagrams
Well, technically you can't anyway - it's not differentiable
RMSE (bins) I mean
on a more serious note: the fact that you gotta use a website then copypaste like a number to get decks/addons is mega jank
like how is that not built in still
it is differentiable, the bins are fixed items
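Here is a short derivation of the claim that RMSE (bins) is differentiable in the predictions once bins are fixed. The notation is introduced here for illustration. Bin $b$ holds $n_b$ of the $N$ reviews, $p_i$ are predicted R values, $y_i \in \{0, 1\}$ are outcomes, and $P_b$, $A_b$ are the per-bin mean prediction and retention. This only holds while the assignment of reviews to bins does not itself depend on the predictions, and where the RMSE is nonzero.

```latex
\mathrm{RMSE}_{\text{bins}} = \sqrt{\sum_b \frac{n_b}{N}\,(P_b - A_b)^2},
\qquad P_b = \frac{1}{n_b}\sum_{i \in b} p_i,
\qquad A_b = \frac{1}{n_b}\sum_{i \in b} y_i

\frac{\partial\,\mathrm{RMSE}_{\text{bins}}}{\partial p_i}
  = \frac{1}{2\,\mathrm{RMSE}_{\text{bins}}}\cdot 2\,\frac{n_b}{N}\,(P_b - A_b)\cdot\frac{1}{n_b}
  = \frac{P_b - A_b}{N\,\mathrm{RMSE}_{\text{bins}}},
  \qquad i \in b
```

Every review in the same bin gets the same gradient. A model trained on this objective is pushed toward matching bin means, not toward ranking individual reviews correctly.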
I actually think it's neat. No unpacking files and copy-pasting them into folders, just a single number and Anki does the rest
this is I think the most reasonable thing anki does in this category of things
I meant the browser should be in-client
Ok but no average user does CSS
Some graphical editor would help many people
how do you propose it be done
WYSIWYG
maybe i am stoopid
@bold terrace glad to hear you are working on this feature I expect to see a PR soon
RMSE (bins) can be made to be uncheatable by actually 5-way splitting it properly, but it would no longer make sense to use it on algorithms that adapt on the fly like RWKV
Add an AI that writes CSS based on user input
Tomorrow by 8AM
but think of the tryhards who don't want a wysiwyg experience
I need layers of js in my templates and dae refused to let me add a separate scripting field
I mean having both WYSIWYG and markup is not that rare
ui complexity oh nooooooo
ui complexityyyyyyyyyy
I'm trying to visualize how to hide "Evaluate" in a way that isn't jank and doesn't require lots of scrolling
Come on
but now we got tabs and css radio buttons
just make the font smaller
kek
I mean
Come on
Be slightly honest for one minute
Preview/Markup, UI Complexity compared to this ?
where even is that
Anki IOS
I mean that's peak ui performance right there
The only one that you need to pay
ship it
You know what would make it even better? Smaller button + less opacity
You know what would make it even better? Make it even smaller and decrease opacity further!
Perfection
[insert that vince mcmahon meme here using these images]
Don’t know if I’ll be able to use Anki with CMRR outliving Evaluate
At least get rid of that with it
he doesn't know
Jarrett wants to remove CMRR
Will it be improved or removed?
And I am like "LUUUUC! SAVE US!"
Jarrett: cmrr bad
Me: cmrr need realism. luc make cmrr unsuck
You see how it’s frustrating to have people solving issues by removing unilaterally things right ?
Except that I can see the benefits of CMRR, but have to think of rare edge cases to maybe come up with some questionable benefits of Evaluate
the people involved think it's easier to remove than fix
unless you're gonna do the work making it better, it's probably gone
So in the end everything is about how YOU see it 🥲
We'll see how my Reddit poll turns out
Ad hominem, but maybe changing that could help you bond with real people 😅
Aka we'll see what the average user thinks
I think we burn down all metrics, remove everything except "enable fsrs" and an "optimize" button
Or at least the average r/Anki guy
maybe "optimize" and "optimize with reschedule" and get rid of that stupid fucking toggle
how do you handle auto optimize and rescheduling on optimize
auto-optimize to me implies no rescheduling
ahem
Anyway, this is peak
(David's pic btw)
TBH I wouldn’t mind those options to be in FSRS helper and leave the normal use case being DR only
https://i.imgflip.com/9qx5v1.jpg anyway I quickly used some garbage website to make this garbage meme, it's not even high enough res to see anything and I didn't even have a 4th image
I’m against removing evaluate but moving it I don’t mind
I was hitting "I'm spending too much time on this" territory and so you get this half-finished thing
But moving it would mean moving it with the optimize
actually david's image should be in the last slot
Btw, I expanded my Github comment a bit
Moving "Evaluate" somewhere else is nice in theory, but I can't think of a good implementation. If we put it in "Advanced", it will be awkward (scrolling back and forth between sections) and unclear that this button is related to FSRS in the first place, unless it says "Evaluate FSRS parameters" or something. Maybe we could collapse it, but again, I can't visualize a good implementation.
https://i.imgflip.com/9qx6qt.jpg I must have nothing to do today wow
if only it had enough pixels to actually see anything
BTW, top comments on this "Why do you still use SM2?" thread:
https://www.reddit.com/r/Anki/comments/1h2k4m2/to_people_still_using_sm2_instead_of_fsrs_why/
- I prefer to have a bit more control
- I don't want to believe what strangers tell you as it is
- Happy with retention in real-world, and not as percentage
...
Response from FSRS community : "Make it more blackboxy !" 
While someone still has to pay his rent with his anki videos
Ok but this kinda true
Oh come on
you could have screenshotted one of your 9+ comments
A seamless experience, smooth and with no bad surprise
- https://www.reddit.com/r/Anki/comments/1bwxd22/even_with_retention_rate_set_to_70_fsrs_is/
- https://www.reddit.com/r/Anki/comments/1c16u4y/fsrs_mature_card_retention_rate_dropping_pt2/
- https://www.reddit.com/r/Anki/comments/1an2pha/fsrs_my_retention_for_mature_cards_went_down_by_10/
- https://www.reddit.com/r/Anki/comments/18zvrhq/im_not_sure_fsrs_is_actaully_better_than_sm2_but/
- https://www.reddit.com/r/Anki/comments/1h2k4m2/comment/lzlfevb/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
- https://www.reddit.com/r/Anki/comments/1h2k4m2/comment/m0ce32k/
- https://www.reddit.com/r/Anki/comments/1an0ck6/how_long_before_i_see_results_from_fsrs/
- https://www.reddit.com/r/Anki/comments/1hqaukz/since_turning_on_fsrs_my_retention_has_been_lower/
Oopsi
Yeah that one I agree
When you think about it
I knew Anki through r/LearnJapanese
Yeah, TR<DR is something that has been popping up with alarming frequency
I have no idea why, honestly
but actively following r/Anki is not something you do as an average guy
I mean, I can think of a few explanations, such as not optimizing parameters and Hard misuse, but still
but who would complain about TR>DR?
I think Jarrett nailed it: Hyrum's Law. With SM2, since there is no concept of "DR" or "R threshold", you reset cycles at every lapse, with always-decreasing intervals, so people get higher and higher retention through lapses. Basically, even if SM2 does not specify R(Mature) > R(Younger), it is a side effect of its implementation
https://www.hyrumslaw.com/
Eh, I've seen a few people, though they were more confused rather than sad/angry
I've only just caught up 😅
Evaluate can be useful to decide if splitting presets is a good idea. You don't need to understand what the numbers mean, just before > after = good
You might be able to make nice UI that tells you without showing the numbers, but until then having the numbers is useful.
FSRS on the other hand is "expected retention"-based. You don't get "higher prediction" at every lapse per se, you might get higher D and lower I, but the prediction will still be a constant DR (not increasing R)
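Here is the "constant DR" point made concrete. In FSRS-4.5/FSRS-5 the forgetting curve is a power function of elapsed time t and stability S. The interval is found by solving R(I) = DR, so every scheduled review is predicted at DR whatever S or D are. The constants below (DECAY = -0.5, FACTOR = 19/81) are the FSRS-4.5/5 values as I understand them. Later versions make the decay trainable.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R(t=S, S) = 0.9


def retrievability(t, stability):
    """Predicted probability of recall after t days for a card with the given stability."""
    return (1 + FACTOR * t / stability) ** DECAY


def next_interval(stability, desired_retention):
    """Interval (in days, unrounded) at which predicted R drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


for s in (1.0, 10.0, 250.0):
    i = next_interval(s, 0.9)
    print(f"S={s:>6}: interval={i:8.2f} d, R at due date={retrievability(i, s):.3f}")
```

Anki also rounds the interval and applies fuzz, so the real R at the due date is only approximately DR. The key point is that it does not drift upward after lapses the way SM2's effective retention does.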
You're too precious for this world ❤️
I wanted to switch one deck to SM2, and one to FSRS, but it's a global setting
so I need 2 user profiles
and I'm a bit lazy for that now
And IMO FSRS is superior at both tasks (precision AND increasing retention) if you know how to tackle it (which is extremely simple: increase your DR, whether globally through Deck Options or through Filtered Decks)
Time for another spaced repetition civil war
SM2 is "let's calibrate your ease factor for each card" which is less efficient than letting FSRS guess the profile of your card in 4-5 reviews
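For contrast, here is the ease-factor update from the original SM-2 algorithm. Quality q runs from 0 to 5, with 3 or more counting as a pass, and EF is floored at 1.3. Anki's "SM2" is a modified variant with four buttons and different ease adjustments, so treat this as the textbook version, not Anki's code. `CardState` and `sm2_review` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0   # consecutive successful reviews
    interval: int = 0      # days until the next review
    ease: float = 2.5      # ease factor (EF)


def sm2_review(state: CardState, quality: int) -> CardState:
    """One review under textbook SM-2; quality is 0..5, with >= 3 meaning a pass."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval * state.ease)
        repetitions = state.repetitions + 1
    else:
        # Lapse: the repetition count restarts, so intervals start over from 1 day.
        repetitions, interval = 0, 1
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(repetitions, interval, max(ease, 1.3))
```

Each lapse resets the interval and also lowers EF. A card that keeps lapsing gets shorter and shorter intervals, which pushes its retention up. There is no explicit target retention anywhere in the update.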
And I know I can sound contradictory, but my point is: FSRS is truly the way forward, but we just need a way to hold the user's hand
Remove Evaluate side: me, David, sorata, jake
Keep Evaluate side: Sound, Danika, rossgb
Seems you forgot @cursive badge , and all people in the Anki forum that were too dumb for this poll 1h ago 😄
I edited my comment just before you made yours 😅
ultimately none of our opinions matter on the topic
since daes gonna do what daes gonna do
Dae giveth and Dae taketh away
I ain't goin to blame him
Well maybe I'm going to make my own SRS app with blackjack and hookers 😝
What about a clean and simple architecture 🥲 ?
hey me too
And 0 users because I'll probably get bored and go back to Anki ;p
how would you make anki clean and simple
Unironically, why not just keep "Evaluate" in the Helper Add-on?
this is what I'm saying
No 50 languages meshed together
Maybe a full js memory-vore thing
thats like....the least complicated thing about it
Or some kind of full-python stuff
We could just hide the button, but keep the underlying code to be accessed by the add-on
anki used to be 90% python
I still dream of WASM/JS runtime.
rust was done to allow the code to be used on all platforms
But just like Ruby was before it
Although Jarrett said that he doesn't plan to add any new features to the add-on 
can't really have a python backend running on ios
there were real reasons for the rust move
Why not Java? It runs on 3 billion devices you know. ;p
and the current multi-lang architecture
java could have been an option!
point is ankidroid is able to have feature-parity trivially due to the rust backend, before that it was constantly year(s) behind in terms of compatibility
Also personally, in my little dream SRS, no concept of Notes/Cards, no cloze, searching deck and addon from within the app, just "Again/Good" by default
how do you handle cases where multiple cards naturally fall out of the same source data
No notes/cards would be ass though
Different cards
note/cards is like one of the most convenient things, its just the ux is garbage
I don't know, I feel notes/cards is a good idea theoretically, but in practice I don't think I could transform my Vocabulary deck into a Sentence deck just by mapping different fields in my front
The quality of the card would suffer
my own system would be very frustrating without it, especially as I bury my siblings
I think notes vs cards is a necessary evil: it's confusing, but the alternative is having to make a lot more cards, not being able to edit all of them at once, and not having "bury siblings"
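The note/card split is easy to sketch as a data structure. One note holds the fields, and each template renders one card from them. Editing the note therefore changes every card at once, and "bury siblings" just means hiding the other cards of the same note. The class and function names here are hypothetical illustrations, not Anki's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    fields: dict[str, str]
    cards: list["Card"] = field(default_factory=list)


@dataclass
class Card:
    note: Note
    template: str          # front template, e.g. "{Japanese}"
    buried: bool = False

    def front(self) -> str:
        # Rendering always reads the note's current fields, so one edit updates every card.
        return self.template.format(**self.note.fields)


def add_note(fields: dict[str, str], templates: list[str]) -> Note:
    """Create one note and one card per template."""
    note = Note(fields)
    note.cards = [Card(note, t) for t in templates]
    return note


def review(card: Card) -> None:
    """Hypothetical 'bury siblings': reviewing one card hides its siblings for the day."""
    for sibling in card.note.cards:
        if sibling is not card:
            sibling.buried = True
```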
For ex, I think people doing JP->EN and EN->JP from the same notes will always struggle with synonyms
Because you might need some context around the JP term to know which one you want in the EN->JP answer, and context for the EN in the JP->EN
Same goes for deck vs preset: a necessary evil. Otherwise you will have to either
- configure each deck separately
OR - have the same settings for all decks
It would probably be terrible UX, but I kind of wish cards were decoupled more from notes. I like the idea of there being an explicit knowledge graph where cards can be based on multiple nodes.
Oh !
It's probably something that would work better hidden behind the scenes in something like Duolingo
@cursive badge you got me there, I was about to point out "F'cking card links"
Right now I have a field that help me tremendously : "Confusion". When I confuse card A with card B, I add "B" in "Confusion" of card "A"
So at every review of A, I also reinforce my ability to dissociate A and B
Having things completely atomic in Anki is not that great IMO
Being able to relate cards together would be a killer feature
"Synonyms", "Antagonist", "Different form"...
Like "Card A reminds me of card B"
FSRS could take that into account in theory
what do those relations do in terms of actually reviewing
Let's say you got a word wrong because you mixed it up with another; Anki could then alternate those 2 over the next few days
For example, if you review card A and your memory stability for that card increases, stability for card B also increases
We tried that with siblings, but the improvement was too small
Typical problematic scenario : A and B are very similar.
Day 1 : Review A (i=5)
Day 5 : Review A (i=10)
Day 15 : Review A (i=20)
Day 20 : Review B, got it wrong because you thought it was A
Day 21 : Review B, got it right
Day 26, Review B, got it right
...
Day 35 : Review A, got it wrong because now you thought it was B.
...
Day 65 : Review B, got it wrong because now you thought it was A
You do that "confusion dance" until A and B fall in the same day bin
With relationship, confusing B could reduce the interval of A, or sync it to B, so you review always them together, to be sure you can differentiate them
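The proposed "sync on confusion" behavior could be sketched roughly as below. Everything here is hypothetical, since Anki has no such link type today. When a card lapses and the user marks which card it was confused with, both cards are linked and pulled onto the earlier of their two due days, so the pair gets reviewed together.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedCard:
    name: str
    due_day: int
    confused_with: set[str] = field(default_factory=set)


def record_confusion(cards: dict[str, LinkedCard], lapsed: str, mistaken_for: str) -> None:
    """Link the two cards and move both onto the earlier of their due days."""
    a, b = cards[lapsed], cards[mistaken_for]
    a.confused_with.add(b.name)
    b.confused_with.add(a.name)
    shared_day = min(a.due_day, b.due_day)
    a.due_day = b.due_day = shared_day


# Day 20 in the scenario: B lapses because it was mistaken for A.
cards = {"A": LinkedCard("A", due_day=35), "B": LinkedCard("B", due_day=21)}
record_confusion(cards, lapsed="B", mistaken_for="A")
```

Taking the minimum only ever shortens intervals. A real design would also have to decide how FSRS updates stability for the linked card, which this sketch ignores.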
A dream feature for me would be to map answers to the knowledge graph and automate the creation of "scaffolding" cards when interference is detected.
The Math Academy FIRe stuff would be fun too.
Yuup
the math academy stuff is carefully crafted decks and relations though?
But IMO, this kind of graph might be easier to do if you do a tool speciallized to a domain (math, japanese ...)
like how much work goes into that
if you do it agnostic, then it's a bit difficult to build all that model yourself
With a domain-specific approach, you could even use "community defined relationship"
It would be really hard to create a good UX. That's why I said it makes more sense for a Duolingo-like. I can dream though.
New users would directly benefit from some relationship like "Synonyms" etc
finally the peanut gallery can affect my anki reviews
Model: FSRS-5-siblings
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3270±0.1525
FSRS-5-dev RMSE(bins) (mean±std): 0.0507±0.0325
Model: FSRS-5
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3276±0.1526
FSRS-5 RMSE(bins) (mean±std): 0.0518±0.0333
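Writing the two benchmark runs as relative reductions makes the "too small" verdict concrete. The numbers are copied from the output above. The helper is plain arithmetic.

```python
def relative_reduction(baseline: float, variant: float) -> float:
    """Percentage by which `variant` is lower than `baseline`."""
    return (baseline - variant) / baseline * 100


results = {
    "LogLoss": (0.3276, 0.3270),     # FSRS-5 vs FSRS-5-siblings
    "RMSE(bins)": (0.0518, 0.0507),
}
for metric, (fsrs5, siblings) in results.items():
    print(f"{metric}: {relative_reduction(fsrs5, siblings):.2f}% lower with siblings")
```

That comes to roughly 0.18% on log loss and 2.1% on RMSE (bins). Both differences are small next to the reported standard deviations across users.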

Yeah but this is completely different thing
For the record, a neural net didn't do much better either
Though maybe Alex's nets are smarter
here we're not really talking JP->EN vs EN->JP, but more like mesh of networks, with semantic relationship
it would be cool to get a dataset with actual information on the cards
Also, the precision would not necessarly be the first benefit
The fact that B entered the chat could even change the initial prediction of A
so prediction could change very organically based on what other things happen
"A card C has been introduced ? It has "Synonyms" links with cards A and B ? Let's sync their recall timing !"
You could even create "routes" of learning in that network
bulk things together, etc
building such a thing seems user and deck specific and would probably take more time making all the links than just bruteforcing your way through it 🍃
I need a WaniKani DB heist.
what I have problems with some other person might just get
Not really though!
I had cards with >100 reviews I kept getting wrong until I did the work to manually do that outside Anki
I think Anki is an extremely inefficient way to learn even vocabulary
and identifying and doing the work outside anki would be faster than creating some sort of relation graph in anki
If you consider that a language like Japanese has 500k words, and let's say you need to know 50k words, at a rate of 8 new/day that's 17 years!
Yeah my deck is basically a big siterip of Netflix lol
Sure thing! But if you make it community-based, then it's a bit like students creating their own study resources
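A quick back-of-the-envelope check of the 17-year figure. This is a minimal sketch that assumes a constant new-card rate every single day, 365-day years, and ignores reviews entirely; the function name is just for illustration.

```python
import math


def years_to_learn(total_words: int, new_per_day: int) -> float:
    """Years needed just to introduce every word once (365-day years, no breaks)."""
    days = math.ceil(total_words / new_per_day)
    return days / 365


print(round(years_to_learn(50_000, 8), 1))  # 6250 days, about 17.1 years
```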
bro is learning core 500k 💀
The 17 years were for 50K lol
Before, I had that mindset: "Every word I know should be in Anki so I'll be able to really track my progress"
Now I realize how infeasible it is
because you need a card for ever word
every word should be in anki because anki is a grind
and are you grinding if you don't add everything you don't know to it?
how else are you gonna push the unwanted memories out
anki is a form of penance
In fact, many words overlap, so finding ways to batch them is really a nice trick to achieve higher new/day without having to wait to master 20k words before it goes faster
I batch my next new card by common kanji now
Also, I just realized that it's a triple negative 🤣
And to find the next batch, I added a sorting order "Unseen Card" in Kanji Grid (By Kuuuube)
so I take a mastered-kanji for which I have a lot of cards to learn
and I can learn new words much more easily
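The "batch by a mastered kanji" workflow above can be sketched roughly like this. It is not the Kanji Grid add-on's actual sort order; the function and its inputs (a list of unseen words and a set of mastered kanji) are hypothetical.

```python
from collections import defaultdict


def best_kanji_batch(unseen_words, mastered_kanji):
    """Pick the mastered kanji that appears in the most unseen words.

    Returns (kanji, sorted list of unseen words containing it), or (None, [])
    if no unseen word contains a mastered kanji.
    """
    by_kanji = defaultdict(list)
    for word in unseen_words:
        for ch in set(word):  # count each word once per kanji
            if ch in mastered_kanji:
                by_kanji[ch].append(word)
    if not by_kanji:
        return None, []
    # Most words first; ties broken by the character itself so the result is deterministic.
    kanji = max(by_kanji, key=lambda k: (len(by_kanji[k]), k))
    return kanji, sorted(by_kanji[kanji])


print(best_kanji_batch(["情報", "情熱", "感情", "精進"], {"情", "精"}))
```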
not even 100%
IMO grinding better logloss/RMSE prediction already hit a threshold of diminishing returns
you need to grind harder clearly
As a side topic: I reduced my DR from 90%->80% last month because I was struggling. It halved my reviews but somehow my TR for the month only dropped to 87% 😅
Lucky boy
That makes sense if you didn't use "reschedule cards on change"
I did
#Team90DRforevernow
Well, the helper addon version.
CMRR is full of lies and shadows
@cursive badge: But you still use a higher DR for mature words, no?
So it pulls the retention higher?
I remember you said you had Filtered Decks for mature
CMRR be like:
This user has a deck with exactly 10*days_to_simulate cards in it. And he can learn an infinite number of new cards, but only as long as the overall studying time does not exceed 30 minutes
This is why I asked Luc to just hook it up to the simulator config
I dropped doing that too after a while. It has not been a great month for me ☹️
Gotta take some 0.70?
Sorry to hear that 😦
I like how it just returns the lowest valid value, "I mean it is the minimum recommended value"
CMRR was intended to have as few settings as possible, but as a result you get this "The user has no learned cards, deck size=10*days_to_simulate, Easy Days don't exist, fuzz doesn't exist, sort order doesn't exist, new card limit is infinite" shit
sounds a bit half-baked
Also, fundamentally, there is the issue that the exponential decays slower and slower at low numbers, while the contribution of those low numbers is purely linear
So it's easier to have a full backlog of 30% Retention than to maintain cards at 80% R
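To see why lowering DR stretches intervals so much (e.g. 90% -> 80% roughly halving reviews), here is a sketch using the FSRS-5 forgetting curve, which is a power function rather than an exponential: R(t, S) = (1 + F·t/S)^DECAY with DECAY = -0.5 and F chosen so that R(S, S) = 90%. It assumes the fixed FSRS-5 decay (later versions make the decay trainable) and ignores the extra lapses a lower DR causes.

```python
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1  # = 19/81, so that R(t=S) = 0.9


def retrievability(t: float, s: float) -> float:
    """FSRS-5 power forgetting curve: recall probability after t days with stability s."""
    return (1 + FACTOR * t / s) ** DECAY


def next_interval(s: float, desired_r: float) -> float:
    """Days until R falls to desired_r (inverse of the curve above, before rounding/fuzz)."""
    return s / FACTOR * (desired_r ** (1 / DECAY) - 1)


# Interval in units of S for a few desired retentions.
for dr in (0.95, 0.9, 0.8, 0.7):
    print(dr, round(next_interval(1.0, dr), 2))
```

With this curve, the interval at DR=80% is about 2.4×S versus exactly S at 90%, and the gap keeps widening as DR drops because the curve flattens at low R.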
flat forgetting curve supremacy
Secretly I was one of the low decay weirdos all along 😮
I'm definitely not remembering this stuff in 100 years though, so maybe not that weirdly low ;p
Also IMO the default graph selection of Anki should be a bit more useful than "What was your Retention at 9PM".
- Stability over Time
- Memorized over Time f(R,S), not just sum(R)
- Daily Load profiling (by lapse, repetitions...)
Kind of, but not in the sense that you mean. Rather, I think we picked all the low-hanging fruit and past FSRS-6 there just aren't any clear ways to improve FSRS
You could get much better results with a neural net though
Memorized over Time f(R,S), not just sum(R)
Oh come oooon, how many times have we discussed this...
Sum(R) has an intuitive interpretation, f(R, S) will almost certainly have none
Like, my R is around 90% between noon and 6PM and very low after 8PM ...
No shit sherlock, I had DR=70/80% when I was doing my reviews at night
?
f(R,S) means : a function depending on R and S
Yep
Think of one that still has an intuitive interpretation
Oh, the integral one does, btw
exactly
R*(1-exp(-S)) or whatever doesn't
whatever
Or R×sqrt(S) or R×ln(S)
Damn, you really misunderstand everything 🥲
By F(R,S) I just mean any kind of function taking into account R and S
so if you like your integral so be it
I Don't care xD
Go jerk on vtubers
There is a difference between "it has desirable mathematical properties" and "it can be explained in one sentence with <20 words to an average user"
You want something with desirable mathematical properties, but in this case it makes it very hard to come up with something that also has a simple, intuitive interpretation
And not just "higher number = better"
For "memorised over time" just sum(R) is the best
I was planning to use the integral for CMRR
Ross like line go up. Line go up make Ross feel good.
sum(R) should be called "Expected Score at a test" IMO 😛
If you have 50% R for all your cards, you might be able to get 50% on a test with the same constraints as in Anki
So yeah, sum(R) can make sense
But c'mon, please people, have a bit more self-love than just trying to get x-% at a test 🥲
Try to remember it more than one day 😄
I think it's even more important because normally, in a perfect world, TR=DR
Soooo if all you see is a retention line that sticks at DR for months
you might feel you're getting nowhere
but in fact, Stability might increase, workload might be dropping ...
all those tiny positive things need to be brought up 🙂
That's also why I think sum(R) is a bit silly: if your DR is 80%, then sum(R), without adding cards... will always be somewhere around N*90% (N being your number of active cards)
Well, yes. That's the point
Kind of sad if you are looking for progress though.
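Why sum(R) sits around N×90% at DR=80%: between reviews, R slides from 100% down to DR, so a snapshot over many cards at random points in their intervals sees the time-average of R, not DR. A minimal sketch assuming the FSRS-5 power curve with decay -0.5, cards reviewed exactly when due, and no lapses; under those assumptions the average works out to 2·DR/(1+DR), i.e. 8/9 ≈ 89% for DR=80%, independent of S.

```python
def average_r_between_reviews(desired_r: float, decay: float = -0.5) -> float:
    """Mean of R(t) over [0, T], where T is the interval at which R falls to desired_r.

    With R(t) = (1 + F*t/S) ** decay, substitute u = 1 + F*t/S. The mean becomes
    integral_1^U u**decay du / (U - 1), where U = desired_r ** (1/decay); S and F cancel.
    Assumes decay != -1.
    """
    u_end = desired_r ** (1 / decay)
    integral = (u_end ** (decay + 1) - 1) / (decay + 1)
    return integral / (u_end - 1)


for dr in (0.7, 0.8, 0.9):
    print(dr, round(average_r_between_reviews(dr), 3))
```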
Yeah to me sum(R) is really just a "test score estimate" in some way
Well not even an estimate 😛
If you know 50% of your deck at 100% R, Memorized will show 50% of N, but in practice you can easily fail a test if the passing condition is 50% correct answers
(if you're unlucky and get questions on cards that were at 0% in your deck)
So sum(R) is like "Your estimated score test result, if the test consist of all the cards of your deck" lol
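The "test score" reading follows from linearity of expectation: if each card is recalled with probability R, the expected number of correct answers on a test covering every card is sum(R), and the expected score is its mean. A tiny sketch (function name is illustrative); note that two very different decks can have the same expected score.

```python
def expected_score(retrievabilities):
    """Expected fraction of correct answers on a test that asks every card once."""
    return sum(retrievabilities) / len(retrievabilities)


uniform_deck = [0.5] * 10              # every card at 50% R
split_deck = [1.0] * 5 + [0.0] * 5     # half known perfectly, half forgotten
print(expected_score(uniform_deck), expected_score(split_deck))
```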
Maybe I need a Stability over time heatmap to show progress.
You can plot sum(S), yes. Though I can't think of a nice interpretation
Average S is a bit better in that sense
I mean this, but into the past:
Yeah average S is nice, median too but is a bit less smooth in practice
(N.B. the cut-off at 21 days is because of the filtered decks Sound was talking about earlier)
I'm not entirely sure what the due dimension brings to the table though 😄
Could be useful to find "spike" of workload, but those are often the 1-5d stability that are accounted only for one rep in those things
I need to redo it, but Stability over Reps is the most depressing thing I ever plotted
It's ... a declining function
The more you rep, the less the average stability
It's at that point I thought : Ok now my focus is to find where my workload goes XD
My highest-lapse cards represent 10% of the workload for each 5% slice of cards 🥲
54% of my workload goes to the 33% of cards with the most lapses 🥲
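A figure like "54% of the workload for 33% of the highest-lapse cards" can be computed as below. This is a sketch over a hypothetical list of (lapses, review_count) pairs per card, not an Anki API; ties in lapse count keep their input order.

```python
def workload_share_of_top_lapsers(cards, fraction):
    """Share of all reviews spent on the top `fraction` of cards, ranked by lapses.

    cards: list of (lapses, reviews) tuples, one per card.
    """
    ranked = sorted(cards, key=lambda c: c[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top = sum(reviews for _, reviews in ranked[:k])
    total = sum(reviews for _, reviews in ranked)
    return top / total


cards = [(10, 40), (5, 25), (1, 10), (0, 5), (0, 5), (0, 5)]
print(round(workload_share_of_top_lapsers(cards, 0.5), 3))
```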
I have not found the "due in" version very useful. It just lets you see a bit more of what is happening in "Future Due".
A version looking into the past would let you see how the card stabilities changed over time (hopefully lots of them increasing).
Yeah with the past it could be cool 🙂
So now I'm even considering lowering my lapse threshold for moving cards from normal -> hard to 4, and from hard -> suspended to 8 lol
The overall idea would be : Discover as many easy words as possible, taking the low hanging fruits, and then build the more difficult words based on them
I have 1157 cards I never lapsed a single time xD
Over 3000
So when I see a word like 躱す (to dodge) that I reviewed 62 times in 4 months for a current stability of 4d... I'm like... ok, maybe I should just postpone it
One of my worst is 靖 (109 reviews, 23 lapses). I kept on confusing it with 情.
Yeah for those I really think brute force is not the key
So either you take time to really analyze it, or you just suspend it
I finally got it after noticing the issue and spending some extra effort. Now the thing that usually gets me is answering "peace" instead of "peaceful".
I think you just aren't brute forcing hard enough
bro just 500 more reviews and i will remember it bro
maybe meditate on the card for an hour
new preset: brute force. 20 learning steps and they're all 15 minutes
I'm willing to bet someone is actually doing that
Though, finding the exact user among millions of users is a problem
Holy moly
Mother of all leeches
one day
What was their stability?
I'm waiting for dae to sign off on it before I touch code
my screenshot software crashed 😭
3 days
i mean theres more than 3 XD
精進 is the top one
yeah I need a leech detector now tbf 😅
I think we can safely say without a detector that those 3 are leeches 😂
Lapse>100 might already give you some insights 😂
As stated a few times, I'm an extremist for minimalism: I'd consider removing everything but DR
I still think there needs to be a component of looking at study time and passed intervals in a leech detector. The Poisson Binomial stuff is interesting, but I don't think it fully captures "leechness" on its own.
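For context on the Poisson Binomial idea: given FSRS's predicted recall probability for each past review of a card, the number of lapses follows a Poisson binomial distribution, and a very small tail probability means the card failed far more than predicted. A minimal sketch, assuming reviews are independent (they are not, strictly); it is not the detector being discussed and, as noted above, says nothing about study time.

```python
def poisson_binomial_pmf(probs):
    """P(exactly k successes) for independent Bernoulli trials with the given probabilities."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf


def prob_at_least_this_many_lapses(predicted_recall, observed_lapses):
    """Tail probability of seeing >= observed_lapses given per-review predicted recall."""
    fail_probs = [1 - r for r in predicted_recall]
    pmf = poisson_binomial_pmf(fail_probs)
    return sum(pmf[observed_lapses:])


# A card predicted at 90% on each of 10 reviews but failed 6 times is very unusual.
print(prob_at_least_this_many_lapses([0.9] * 10, 6))
```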
Based
Now let's see how Dae reacts to my issue 😅
Maybe the solution is to remove all the advanced stuff but leave the APIs for an "Advanced Mode" addon that puts them back in. Then the burden of maintaining the separate UI is offloaded to the Addon maintainers instead of core Anki / Dae.
thats something I've been thinking on how it would sort of work, like anki could give a handful of hooks into say fsrs logic and addons could use that in a variety of ways
imagine: fsrs auto-optimize addon
conflicts are your own problem 🍃
the issue with this though is that mobile users get fucked
My kingdom for a [cross platform addon system] 😂
if only Apple didn't explicitly forbid them
We just have to get the EU to harass them harder until they submit. I saw something about them having caved on emulators recently because of EU pressure.
see you in 5 years
Just in time for the Svelte migration to be finished! 😂
🍃
AUC is not a good metric for our optimization goal.
then let's just leave only log loss
or none, expertium wants to remove the evaluate button
It's my initial position years ago.
Expertium wants the metric to be human-readable, so we have RMSE(bins).
Do you mean this page: https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Metric ?
or how about we only display RMSE non-bins and not log loss? if we don't show log loss then we don't run into this problem
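For reference, here is what log loss and a plain (non-binned) RMSE look like when each predicted R is compared directly with the 0/1 review outcome. This is a sketch, not the srs-benchmark code: RMSE(bins) first groups reviews into bins and compares average predictions with average outcomes per bin, and that binning scheme is not reproduced here.

```python
import math


def log_loss(preds, outcomes, eps=1e-15):
    """Mean binary cross-entropy; outcomes are 1 (recalled) or 0 (forgot)."""
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


def rmse(preds, outcomes):
    """Plain per-review RMSE between predicted R and the 0/1 outcome."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds))


print(log_loss([0.9, 0.9], [1, 0]), rmse([0.9, 0.9], [1, 0]))
```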
yeah
Now you can edit it.
ok thanks
@unique salmon pretraining decay doesn't work well.
Model: FSRS-rs-dev
Total number of users: 345
Total number of reviews: 10524780
Weighted average by reviews:
FSRS-rs-dev LogLoss (mean±std): 0.3275±0.1511
FSRS-rs-dev RMSE(bins) (mean±std): 0.0479±0.0311
FSRS-rs-dev AUC (mean±std): 0.7184±0.0831
Weighted average by log(reviews):
FSRS-rs-dev LogLoss (mean±std): 0.3496±0.1618
FSRS-rs-dev RMSE(bins) (mean±std): 0.0620±0.0389
FSRS-rs-dev AUC (mean±std): 0.7078±0.0884
Weighted average by users:
FSRS-rs-dev LogLoss (mean±std): 0.3517±0.1638
FSRS-rs-dev RMSE(bins) (mean±std): 0.0640±0.0399
FSRS-rs-dev AUC (mean±std): 0.7071±0.0909
parameters: [0.2027, 1.0535, 2.8078, 15.9455, 6.9865, 0.5577, 2.2141, 0.0069, 1.5326, 0.1223, 1.0383, 1.8223, 0.1175, 0.3022, 2.2859, 0.2162, 3.0055, 0.79, 0.2611, 0.1427, 0.2029]
Model: FSRS-rs
Total number of users: 345
Total number of reviews: 10524780
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3273±0.1509
FSRS-rs RMSE(bins) (mean±std): 0.0479±0.0309
FSRS-rs AUC (mean±std): 0.7187±0.0838
Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3489±0.1607
FSRS-rs RMSE(bins) (mean±std): 0.0615±0.0374
FSRS-rs AUC (mean±std): 0.7079±0.0888
Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3509±0.1624
FSRS-rs RMSE(bins) (mean±std): 0.0634±0.0382
FSRS-rs AUC (mean±std): 0.7072±0.0913
parameters: [0.216, 1.1977, 2.8019, 15.7018, 6.9865, 0.5514, 2.2311, 0.007, 1.533, 0.1272, 1.0386, 1.8204, 0.1162, 0.2988, 2.2863, 0.2181, 3.0072, 0.8048, 0.2625, 0.1379, 0.1914]
There isn't any deck in math academy. I'm using it.
#1282005522513530952 message I still personally think that "a better fit/log loss than x% of users" would be the most readable option.
Crap. Oh well
It's meaningless to compare the log loss among users because it's related to the retention.
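A quick way to see why log loss depends on retention: even a perfect predictor, one that always outputs the true R, has an expected log loss equal to the binary entropy of R. This floor rises from about 0.33 at 90% to about 0.61 at 70%, so users (or decks) with lower retention get worse log loss for reasons that have nothing to do with model quality. The sketch assumes every review has the same true R.

```python
import math


def best_possible_log_loss(true_r: float) -> float:
    """Expected log loss (nats) of a perfect predictor when recall probability is true_r."""
    if true_r in (0.0, 1.0):
        return 0.0
    return -(true_r * math.log(true_r) + (1 - true_r) * math.log(1 - true_r))


for r in (0.95, 0.9, 0.8, 0.7, 0.5):
    print(r, round(best_possible_log_loss(r), 3))
```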
People complaining about long intervals, meanwhile my hard deck: 50 consecutive Good ratings, stability 28 lol
4 reviews for my normal deck to get to 25d stability
Hard : logloss 0.4353, RMSE 4.34%
Normal : logloss 0.3579, RMSE 3.29%
Merged : logloss 0.4203, RMSE 3.39%
But funnily enough, a mistake in the normal deck is penalized more than in the hard one in terms of interval (but not in reps to recover)
...based on 16 collections
You should add AUC btw, just for the sake of consistency with the other benchmark
Actually, once your PR is merged I'll make another one just to re-write some stuff in readme
If everything goes well, the benchmark will be done tomorrow.
@quasi shadow would you move "Evaluate" to the Helper add-on if Dae was ok with it?
yeah
Then Sound won't complain 🤣
if you remove my evaluate button
Just let Jarrett retire 😂
yes
If R=50%, would the cost of a "Good" be equal to the cost of an "Again"?
If you mean time per review, no. And Jarrett removed that "correction", if that's what you're talking about
Ah no no in terms of optimization, to reduce logloss and RMSE
I was wondering if someone is strange enough to put as DR, 50%
What would be all the implications
Ah, yes. At R=50% logloss is the same for any grade
Could it be, then, that the closer to 50% you are, the less precise FSRS is?
For example, for DR=60%, the cost of a 3 or a 1 would be much more similar, so the optimization result might not be a model that targets 60%, but a model "that just doesn't really care" 🤔
Of course depends on the subject, getting 50% if the questions are "Yes/No" would be different than "Type the year when this happened"
The "end goal" question being: "Isn't that why a higher DR like 90-95% could just be easier for FSRS to predict than a lower one like 70%?"
If your true R is 50% all the time, FSRS would still do its best to adapt to predict that. So it wouldn't "not care", it has to care to accurately predict R=50% all the time
It's not like it won't be penalized. If your true R=50% all the time and FSRS predicts 40%, it would be penalized
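Both points can be checked numerically. At a predicted R of 50%, a pass and a fail cost the same (ln 2 each), yet the expected log loss is still minimized only when the prediction matches the true R; predicting 40% when the truth is 50% scores worse. A minimal sketch with an illustrative function name:

```python
import math


def expected_log_loss(true_r: float, predicted_r: float) -> float:
    """Expected log loss per review when the real recall probability is true_r."""
    return -(true_r * math.log(predicted_r) + (1 - true_r) * math.log(1 - predicted_r))


print(-math.log(0.5), -math.log(1 - 0.5))   # cost of a pass vs. a fail at p = 0.5
print(expected_log_loss(0.5, 0.5))          # ln 2
print(expected_log_loss(0.5, 0.4))          # higher: the miscalibration is penalized
```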
Yep, you would get very volatile parameters only if the answers were a coin flip ("Yes / No")
(By nature you'd have a R of 50%)
For the DR=95% vs 70% though
what if ur true R is at 2%
I dunno if they would be volatile
@polar maple ok man, I didn't want to bother you, but I REALLY hope you can release RWKV soon. My benchmarking article has been in the making for months and I want to finish it 😅
https://docs.google.com/forms/d/1Uy8zr9QOS6u-oLVRwVCuQfyFUiSwKxt9pWEvlTGdn9k/viewanalytics
Regarding Evaluate, things may change as I gather more responses, but so far a very strange pattern emerges: there are a lot of users who use Evaluate without understanding where the numbers come from or even what values are sane.
As of right now:
- ~60% of users use Evaluate regularly
- Only ~15% of users can give a range of log-loss/RMSE values that are good. Out of the remaining 85%, most don't know what values are reasonable at all, not even roughly
- ~88% of users don't know the math behind the metrics
- Yet only ~40% of users are confident that their Anki routine would not be negatively affected by the removal of Evaluate
That's...strange. It means that a lot of users are using Evaluate on a regular basis without knowing what values are good or how they are calculated, and those users feel like removing numbers that they don't understand would (somehow) disturb their way of using Anki.
@bold terrace thoughts?
TLDR: 85-90% users have no idea what the numbers mean or what values are good, but 60% of users use Evaluate regularly anyway
I'm not sure how to reconcile these two facts
maybe in the hard deck there doesn't exist cards that have gotten out of the leech zone, so fsrs can't learn how to predict stability for these cards
ask people what's the function of evaluate. imo some people think you should press it "for the algo" or it's just stats porn for them.
and anki is educational software, we gotta remove all porn!
ah wait, u didn't give them enough options. I personally don't fit into any of those groups.
I used evaluate a lot at one time (for presets) but have completely stopped using it.
I only use it now if a new update comes (stats porn).
I think you are still not considering that knowing exactly what the numbers mean / how they are calculated does not matter.
Knowing that number goes down = good is enough for people to:
- See that FSRS is improving over time
- Check if splitting / reorganising Presets is worth it
Knowing what range is "good" would be useful, but could be replaced with a simple traffic light Good/Ok/Bad (if we actually know what ranges are "good").
Ok, but even if knowing exactly how the numbers are calculated isn't important and only knowing the range of good values is, 85% (80% as of now, the results have changed a bit) of users don't even know the range
So we're still left with a situation where the majority of users don't know what values are good, but keep using Evaluate anyway
To be fair I don't know what ranges are technically "good". If I created a new preset and saw it had massively larger values than existing presets it still helps me know something might be wrong, even if I don't know exactly what range I should be expecting.
I like the idea of a "health check", but Evaluate is really poorly suited for that. Evaluate is like a health check that tells you "You have fatal organ failure" when it's already too late AND doesn't tell you which organs are shutting down or why
In an ideal world I do agree that they would be debugging/advanced values and we would have nice "Health Check" tools.
I've proposed detecting it based on one of the parameters, but Jarrett said it's a bad idea
I'm surprised 19 has checked the formulas for RMSE, I didn't, I just asked you how to interpret it once or twice and that was it
For the rest, I align myself on this interpretation by @cursive badge
I assumed it was impossible to detect Hard misuse because you are effectively lying and we have no way of knowing the objective truth apart from what the user tells us.
Yep
However, if there is something I think could be simplified... or even... removed... it would be showing both logloss and RMSE on the screen. You know lower is better, but what if RMSE goes down but not logloss, etc.
We can kind of assume that the user is misusing Hard if FSRS decided to set their SInc(Hard) to 1 aka S doesn't increase with Hard, but again, Jarrett said it wouldn't work well
Even to this day, something I'm like "Ok now I have 0.40 logloss instead of 0.60, but I get a bigger RMSE by splitting the deck... so what do I chose ? Lower logloss ? Lower RMSE ?"
have we tried something like treating 'hard' as 'again' and checking if the metrics look better after?
Nope
Yeah, agreed, that would basically be a constant factor of 2, and at least we could suggest to the user: "Hey, you might have a better time treating your Hard as Again (or even just ignoring those)"
ah shit can't ignore those
Then you break it for weirdos like me 😂
Not necessarily, the optimizer would run twice, once with Hard=Hard and once with Hard=Again, and you take the best fit
Definitely more of a thing for the add-on though?
The best solution is to have a "I use Hard as fail" toggle
That's it
Simple and no false positives/negatives
It would require maintaining two versions of FSRS though, that sucks
I'm fucking surprised at 97% FSRS though
apparently 'Remedy Hard Misuse' just does this Hard -> Again relabelling, why isn't this just automatically done?
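The relabelling itself is trivial, using Anki's rating codes (1 = Again, 2 = Hard, 3 = Good, 4 = Easy). This is a sketch of the idea only, not the Helper add-on's actual code. One caveat, raised just below in the chat: relabelling changes the outcomes being predicted, so log loss before and after is not a like-for-like comparison.

```python
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def hard_as_again(ratings):
    """Treat every Hard press as a failure (Again); other ratings are unchanged."""
    return [AGAIN if r == HARD else r for r in ratings]


print(hard_as_again([GOOD, HARD, AGAIN, EASY, HARD]))
```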
But what if Hard=Fail fits better if you assume that's what I meant when I did not.
Considering that this is a survey about FSRS, I wouldn't take that particular % seriously
I'm really curious how well this fits a broader population, like the 500k people who watched the "Anki introduction" video where the guy still tweaks SM2 in 2024
ah indeed
what
That's like making a survey asking "What's you favorite anime?" and being surprised that 95% of participants watch any anime at all
I think we have had this reaction before ;p
Well to be honest if your memory model fits better with Hard=Fail, why not use that model 😄 ?
OH you know what?? Do you think it would be possible to have an Anki add-on that would remove Hard/Easy, but would input those instead of "Good" when the time taken to answer is less than or greater than certain thresholds???
but I didn't mean Hard=Fail when I was grading, so FSRS would push all the intervals a lot shorter in trying to get me to match my DR
That would be DOPE
please no
https://expertium.github.io/Buttons.html
I hope nobody will interpret this article as “It’s ok to use review time to automatically select the answer button for the user”.
Time to answer varies not only between different people but also between different types of material. So Anki will have to estimate what time corresponds to Again-Hard-Good-Easy for this specific user and for this specific material.
average_t(Again) > average_t(Hard) > average_t(Good) > average_t(Easy) is true only for 40% of users.
There will be outliers if the user went to the toilet or got distracted by a phone call or something.
It’s WAY easier to just use self-reported grades. There are a lot of arguments about using 2 vs 4 buttons, and those arguments will likely last as long as Anki itself, but using time as a proxy for the answer button will be worse than either of those options. Using time as a proxy will work reliably only for about 40% of users, will be prone to outliers, and the exact cutoffs will have to be adjusted for each user individually and for different decks.
Compare that to just asking the user to click a button.
Force all <X sec to be "Hard", all >Y sec to be "Easy", run optimization on it, and see if FSRS fits better?
That would be proof
In case you are confused: for example, Again > Hard > Good > Easy means “Average time for ‘Again’ is greater than the average time for ‘Hard’, which in turn is greater than the average time for ‘Good’, which in turn is greater than the average time for ‘Easy’”. But that’s too long, so I just wrote it as Again > Hard > Good > Easy.
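Checking whether a user shows the Again > Hard > Good > Easy time pattern could look like this. A sketch over hypothetical (rating, seconds) pairs; as pointed out below, Anki records total time per review rather than time spent on the front, so real data would be noisier.

```python
from statistics import mean


def mean_time_by_rating(reviews):
    """reviews: iterable of (rating, seconds); returns {rating: mean seconds}."""
    by_rating = {}
    for rating, secs in reviews:
        by_rating.setdefault(rating, []).append(secs)
    return {r: mean(ts) for r, ts in by_rating.items()}


def follows_time_ordering(reviews):
    """True if average time is strictly Again > Hard > Good > Easy (ratings 1..4)."""
    means = mean_time_by_rating(reviews)
    if any(r not in means for r in (1, 2, 3, 4)):
        return False  # can't judge a user who never used some button
    return means[1] > means[2] > means[3] > means[4]


print(follows_time_ordering([(1, 20), (1, 18), (2, 12), (3, 6), (3, 8), (4, 3)]))
```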
whats your fav fsrs param
choose log loss most of the time
You would probably want to do it based on the response distribution rather than fixed thresholds. Also Anki only records total study time, not time looking at front of card which I think taints the data.
Let's run FSRS on the 40% that respect the Again > Hard > Good > Easy ordering and change all their ratings based on their answer time
And see how much it improves their predictions
Also
Hard > Good > Easy is the only thing we need to take into account
Using both time and grades would be neat, but idk how to do it in practice with FSRS
I tried it once (time + number of reviews done on that day) and it didn't do shit
So either I'm dumb or it's just hard to do
Well, it's true that the time taken to answer is already somewhat captured in the retention info... so long answers already weigh more on the "fail" side
And since those people didn't use Hard/Good/Easy themselves, fitting a model on those "faked entries" in the benchmark means that if they press "Good" for everything, they won't benefit from it
So we'd need to take users that already respect that pattern
can't do that on people not using Hard/Easy consistently in the first place like me
But if an Addon was forcing those Hard/Easy, and the user was just pressing "Good", it would solve that
Going back to Evaluate
- 66% of users who use FSRS use Evaluate regularly
- Only 23% of them can give ranges of sane values
- Only 21% of them know the math (that's actually surprisingly high, I thought it will be like 2%)
- Only 30% of them believe that removing Evaluate will not be bad for them
But FSRS optimizer wouldn't have to change at all since the time info is captured in those 3 values
iirc i did small tests with LSTM and excluding duration information affected log loss by ~0.001 and treating hard = good = easy affected log loss by ~0.003
Hope crushed
Speaking of which
binary means "hard = good = easy"
I'm surprised by how not-shit it is
And better than FSRS-5 btw
I still suspect time-to-flip could be useful.
So FSRS-6 with "pretend that Hard = Good = Easy" is still better than FSRS-5
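"Binary" here means collapsing the three passing grades into one. A sketch of that mapping (pass mapped to Good), assuming Anki's 1-4 rating codes; how the benchmark implements its binary variant internally may differ.

```python
def to_binary(ratings):
    """Collapse Hard/Good/Easy into a single pass grade: 1 stays Again, everything else becomes 3 (Good)."""
    return [1 if r == 1 else 3 for r in ratings]


print(to_binary([1, 2, 3, 4]))
```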
(marginally)
Maybe 4 buttons really are placebo
Another idea, another hope! Engineering a feature that would represent how much the card's front info is represented in the deck!
For example, if the fronts of the cards are A, AB, B, C, D, E, you'd have a higher feature value for A, B, AB than for C, D, E
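A crude version of that feature, counting how many other fronts share at least one character, reproduces the A/AB/B vs C/D/E example. This is purely hypothetical (character overlap is a stand-in for real semantic similarity), and as the replies explain, it can't be evaluated on the benchmark because the dataset contains no card content.

```python
def overlap_feature(fronts):
    """For each front, count how many *other* fronts share at least one character with it."""
    sets = [set(f) for f in fronts]
    return {
        front: sum(1 for j, other in enumerate(sets) if j != i and sets[i] & other)
        for i, front in enumerate(fronts)
    }


print(overlap_feature(["A", "AB", "B", "C", "D", "E"]))
```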
Nope, no can do
Anything that involves the content of the card is a "no"
Only soulless numbers and IDs
What do you think @polar maple 😄 ?
yeah we just don't have the info available to us
As in "you LITERALLY can't", not "Expertium is telling you it's bad"
Not in the benchmark dataset
the dream is that we get some vocabulary deck data so i can throw it at a nn to let it figure it out
use some word/sentence embedding nn to encode the card info
Would 74K reviews suffice ?
prob not
The dataset is anonymized. No text, no audio, no images. Only deck IDs, preset IDs, card IDs and note IDs
You can see all the available columns on the huggingface page: https://huggingface.co/datasets/open-spaced-repetition/anki-revlogs-10k
Sad, there are a few things that could be useful while remaining anonymous (glossary, front, ...)
I even started noting down the words I confused with others whenever that happened
Imagine this on NN
Going back to Evaluate
- 66% of users who use FSRS use Evaluate regularly
- Only 23% of them can give ranges of sane values
- Only 21% of them know the math (that's actually surprisingly high, I thought it will be like 2%)
- Only 30% of them believe that removing Evaluate will not be bad for them
So...now what?
Until someone syncs their "memorizing people's phone numbers" deck with AnkiWeb...
Could do some anonymization on those!
Deck1::Subdeck2::NuclearCodes
For example if you Remove "Screenshot" and "Sentence" in mine, you won't see my dirty talk
do nothing, seems like a significant portion of users get something useful out of evaluate
I'm assuming Dae does not want to hand check 10k users worth of decks for sensitive data.
Sure, but those 10K are set in stone; what about the future!
I'm sure 90% of people don't even mine their own card
make the fsrs helper addon have a button to upload a deck to allow science to be done on it
they just download a shared core deck
Have an "I donate my decks to science" setting inside Anki ;p
insidious: make the fsrs helper addon just do that with no prompting 🍃
With shared decks, inferring card relationships would be even easier since we'd have a huge amount of data
you'll kill your reputation but who cares when you have all that fresh real data
Didn't stop Facebook/Twitter/Google from getting where they are
yeah but they have money
Because they didn't care about their reputation first
I mean when you control the flow of info you can just hide flows that make you look bad
¯\_(ツ)_/¯
what I'm hearing is we need fsrs incorporated first, then we can use it to steal all the decks 🍃
hahah you weren't even memeing about nuclear codes
Not even in Anki, damn ... on "Chegg"
I really want WaniKani to donate their dataset to science.
They have a massive dataset and all the "cards" already have nice links showing how they are related.
Imagine DuoLingo
Never used it
But when I read that I'm like maybe it's not too late
Unfortunately the WK SRS is terrible 😦
They just use fixed intervals. Not even SM2 levels of adapting to the user 😦
Don't have to imagine: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/N8XJME
Only 13 million reviews though, it's way smaller than our 10k dataset
Excerpt
```csv
p_recall,timestamp,delta,user_id,learning_language,ui_language,lexeme_id,lexeme_string,history_seen,history_correct,session_seen,session_correct
1.0,1362076081,27649635,u:FO,de,en,76390c1350a8dac31186187e2fe1e178,lernt/lernen<vblex><pri><p3><sg>,6,4,2,2
0.5,1362076081,27649635,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,4,4,2,1
1.0,1362076081,27649635,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,5,4,1,1
0.5,1362076081,27649635,u:FO,de,en,0cf63ffe3dda158bc3dbd55682b355ae,frau/frau<n><f><sg><nom>,6,5,2,1
1.0,1362076081,27649635,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,4,4,1,1
1.0,1362076081,27649635,u:FO,de,en,56429751fdaedb6e491f4795c770f5a4,der/der<det><def><m><sg><nom>,4,3,1,1
1.0,1362076081,27649635,u:FO,de,en,1bacf218eaaf9f944e525f7be9b31899,kind/kind<n><nt><sg><nom>,4,4,1,1
1.0,1362082032,444407,u:dDwF,es,en,73eecb492ca758ddab5371cf7b5cca32,bajo/bajo<pr>,3,3,1,1
1.0,1362082044,5963,u:FO,de,en,76390c1350a8dac31186187e2fe1e178,lernt/lernen<vblex><pri><p3><sg>,8,6,6,6
0.75,1362082044,5963,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,6,5,4,3
0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
0.8,1362082044,5963,u:FO,de,en,0cf63ffe3dda158bc3dbd55682b355ae,frau/frau<n><f><sg><nom>,8,6,5,4
0.8,1362082044,5963,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,5,5,5,4
1.0,1362082044,5963,u:FO,de,en,56429751fdaedb6e491f4795c770f5a4,der/der<det><def><m><sg><nom>,5,4,5,5
1.0,1362082044,5963,u:FO,de,en,1bacf218eaaf9f944e525f7be9b31899,kind/kind<n><nt><sg><nom>,5,5,3,3
1.0,1362082130,77,u:dDwF,es,en,73eecb492ca758ddab5371cf7b5cca32,bajo/bajo<pr>,5,5,1,1
0.0,1362082194,150,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,10,9,1,0
1.0,1362082194,150,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,15,13,1,1
```
No idea what some of these mean, but whatever
what could p_recall be?
No, that's the easiest one 🤣
it's not 0/1
did they measure over a session or over a set of users or something
for the same item
or did they include their HLR predictions into the dataset itself
Per session, it seems
No, based on the last two column names
0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
If session_seen=9 and session_correct=8, that gives us 0.888888888889
So yeah, checks out
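The same check can be run over rows of the Duolingo excerpt: p_recall equals session_correct / session_seen for every row. A small sketch using three of the rows above.

```python
import csv
import io

# Three rows copied from the Duolingo excerpt above.
SAMPLE = """\
p_recall,timestamp,delta,user_id,learning_language,ui_language,lexeme_id,lexeme_string,history_seen,history_correct,session_seen,session_correct
0.75,1362082044,5963,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,6,5,4,3
0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
0.0,1362082194,150,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,10,9,1,0
"""


def p_recall_is_session_accuracy(text: str, tol: float = 1e-9) -> bool:
    """Check that p_recall == session_correct / session_seen for every row."""
    for row in csv.DictReader(io.StringIO(text)):
        ratio = int(row["session_correct"]) / int(row["session_seen"])
        if abs(float(row["p_recall"]) - ratio) > tol:
            return False
    return True


print(p_recall_is_session_accuracy(SAMPLE))
```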
yeah this isn't usable for us
Btw, I find it funny that Duolingo reports a lower AUC on their own dataset than we do on ours
https://github.com/open-spaced-repetition/srs-benchmark
HLR | 3 params | LogLoss 0.41±0.012 | RMSE(bins) 0.105±0.0030 | AUC 0.633±0.0050
https://github.com/duolingo/halflife-regression/blob/master/settles.acl16.pdf
I'm not joking when I'm telling Jarrett to contact Duolingo and just straight up tell them "HLR sucks, use FSRS instead"
Is there enough outcome reporting to make that claim convincing?
Considering that we have a dataset with ~700 million reviews and Duolingo thought that 13 million reviews was good enough for their paper - yes
Review count and external testing results might not necessarily correlate with each other
They will hire him and forbid him from contributing on FSRS anymore 😦
Well, maybe right now US-China relations are not that great for international hiring though
50.1% wow
Should I remove L2 regularization when changing the default value?
😅I guess we don't need to check the distribution.
After I changed the default value of w[11] from 1.8 to 4.0, the median of the optimized values of w[11] is 3.77.
Notice that the median values of w[12] and w[13] also changed significantly.
When w[12] increases, S_fail decreases.
When w[13] decreases, S_fail decreases, too.
When w[11] increases, S_fail increases.
So, to some degree, the changes in w[12] and w[13] compensate for the change in w[11].
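To see why these parameters can trade off against each other, here is a sketch of the post-lapse stability formula as used in FSRS-4.5/FSRS-5: S_fail = w[11] · D^(−w[12]) · ((S+1)^w[13] − 1) · e^(w[14]·(1−R)). The parameter values below are hypothetical, chosen only to show the direction of each effect; they are not the defaults or the optimized medians. Implementations may also clamp the result, which is omitted here.

```python
import math


def stability_after_lapse(w: dict[int, float], d: float, s: float, r: float) -> float:
    """Post-lapse stability S_fail (FSRS-4.5/5-style formula, no clamping).

    d: difficulty in [1, 10], s: stability before the lapse (days),
    r: retrievability at the time of the review.
    """
    return (
        w[11]
        * d ** (-w[12])
        * ((s + 1) ** w[13] - 1)
        * math.exp(w[14] * (1 - r))
    )


# Hypothetical parameter values, only to illustrate directions of change.
base = {11: 1.8, 12: 0.1, 13: 0.3, 14: 2.0}
d, s, r = 5.0, 10.0, 0.9

s0 = stability_after_lapse(base, d, s, r)
higher_w11 = stability_after_lapse({**base, 11: 4.0}, d, s, r)
higher_w12 = stability_after_lapse({**base, 12: 0.5}, d, s, r)
lower_w13 = stability_after_lapse({**base, 13: 0.1}, d, s, r)

print(higher_w11 > s0, higher_w12 < s0, lower_w13 < s0)  # True True True
```

Because S_fail is a product of these factors, a larger w[11] can be offset by a larger w[12] (whenever D > 1) or a smaller w[13] (whenever S > 0), which matches the compensation described above.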
Their paper has several problems.
They use the in-day correct rate of a word as P(recall).
It assumes that the trials of the same word within a session are i.i.d., but that's not true.
This one is also problematic when p = 0 or 1.
😅 I don't know whether they were aware of these problems. But the paper was accepted. That's why I think peer review is meaningless when the reviewers know nothing about the niche domain.
Note that to prevent computational overflow and underflow errors, we bound p̂_Θ ∈ [0.0001, 0.9999] and ĥ_Θ ∈ [15 min, 9 months] in practice.
(fwiw, in A.3)
Yeah, they were aware of that. But the solution was weird...
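For context on why p = 0 or 1 is a problem: in HLR (the paper linked above), the model predicts p = 2^(−Δ/h), and the training loss also compares the predicted half-life against a target half-life derived from the observed p via h = −Δ / log2(p). That target is 0 when p = 0 and infinite when p = 1, which is what the quoted bounds work around. The sketch below mirrors those bounds; measuring Δ and h in days, reading "9 months" as 274 days, and the helper names are my assumptions, not the paper's code.

```python
import math

MIN_P, MAX_P = 0.0001, 0.9999
# Half-life bounds in days; 9 months ~ 274 days is an assumed conversion.
MIN_H, MAX_H = 15 / (24 * 60), 274.0


def clip(x: float, lo: float, hi: float) -> float:
    return min(max(x, lo), hi)


def hlr_recall(delta_days: float, half_life_days: float) -> float:
    """HLR's forgetting curve p = 2^(-delta / h), with both h and p clipped."""
    h = clip(half_life_days, MIN_H, MAX_H)
    return clip(2 ** (-delta_days / h), MIN_P, MAX_P)


def target_half_life(p_observed: float, delta_days: float) -> float:
    """Half-life implied by an observed recall rate: h = -delta / log2(p).

    Without clipping, p = 1 gives log2(p) = 0 (division by zero) and
    p = 0 makes log2 raise a math domain error, hence the bounds.
    """
    p = clip(p_observed, MIN_P, MAX_P)
    return clip(-delta_days / math.log2(p), MIN_H, MAX_H)


DELTA_DAYS = 5963 / 86400  # the 5963-second gap from the sample rows above
print(target_half_life(1.0, DELTA_DAYS))  # 274.0 (pinned to the upper bound)
print(target_half_life(0.0, DELTA_DAYS))  # pinned to the 15-minute lower bound
```

So every session with a perfect score gets the same arbitrary 9-month target, regardless of how much evidence it contains.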
For testing the distribution issue, yes
the result is similar
Good work FSRS friends ❤️
I thought Jarrett would stop at FSRS-5 lol.
Lunar new year is long gone, Jarrett is still here.
Can't wait for it to reach Anki 🥲 Good job
FSRS-5 has some severe problems in its same-day stability formula, so I have to fix it anyway.
😅 That's why I planned to release FSRS-5.5.
But I accepted more improvement ideas so we have FSRS-6.
@cosmic hedge I have a problem.
After I modify the Easy Days Config in the Options screen, the Easy Days Config in the Simulator screen doesn't stay in sync with it.
try this
diff --git a/ts/routes/deck-options/SimulatorModal.svelte b/ts/routes/deck-options/SimulatorModal.svelte
index 1b587c985..afd0d1eb2 100644
--- a/ts/routes/deck-options/SimulatorModal.svelte
+++ b/ts/routes/deck-options/SimulatorModal.svelte
@@ -178,7 +178,7 @@ License: GNU AGPL, version 3 or later; http://www.gnu.org/licenses/agpl.html
);
}
- let easyDayPercentages = [...$config.easyDaysPercentages];
+ $: easyDayPercentages = [...$config.easyDaysPercentages];
</script>
<div class="modal" class:show={shown} class:d-block={shown} tabindex="-1">
I remember removing this on purpose for some reason
https://github.com/ankitects/anki/pull/3837/commits/8086edca5e19f8d02cf97072d4c3453142fd2bdd
I think it might have been because of saving to preset options. Maybe it was that I used a subscribe for some reason. idk, as long as it works. 😂
dae still has to review and merge it
noooooo
But dae takes 10 working days to respond to smth
and 10 more days to make a new build
Could someone bring this to dae's attention?
My colleagues recently finished refactoring our app's scheduling module, so I will take over the rest of the work (refactoring the long-term scheduling algorithm). That means I won't have time to improve FSRS in the next several months.
Btw, other than CMRR, there's this: https://forums.ankiweb.net/t/desired-retention-ui-overhaul/57678/33?u=expertium
But it hasn't got an explicit ok from Dae
And there's also this: https://forums.ankiweb.net/t/ideas-to-make-deck-preset-interactions-more-clear/58773/5?u=expertium
Which also hasn't got an explicit ok from Dae

Ok, how about an idea suggested by Brayan: answer buttons that show interval lengths. The interval lengths above answer buttons would change instantly when desired retention is changed. More from Brayan: put the fsrs parameters at the bottom of the FSRS section and add some title to the “query input” (idk what is called the form below...
The realest answer (from my survey on Evaluate)
Btw, results: https://docs.google.com/forms/d/1Uy8zr9QOS6u-oLVRwVCuQfyFUiSwKxt9pWEvlTGdn9k/viewanalytics
- 53% of users use Evaluate regularly
- Only 15% can give a range of reasonable values
- Only 14% know the mathematical formulas used to calculate the values
- 33% believe that removing Evaluate would have a negative impact on their Anki routine, 30% are unsure, and 37% believe it wouldn't have a negative impact
also question
why doesn't Hard count as Good for first-time cards?
as fulfilling one of the learning steps
curious why it was that way
Because learning steps are shit
alright
I've said this many times - the whole thing with learning steps shouldn't exist in the first place
how should it work then?
It's a mess
Just the same algorithm for all intervals, from minutes to years
i see
well
doesn't that mean that when u learn new things, u have to be on anki the whole day
if u learn it in the morning?
does retention decrease equally
e.g. over 8 waking hours vs 8 sleeping hours?
That's a very good question. I don't know 🤷♂️
no and the reason is
sleep is when memory consolidation begins (oversimplified)
boom cooked
unfortunately i can't bring myself to do new cards at the end of the day 💔
billions must nap
@unique salmon after starting a new deck
with let's say 30 new cards a day
when would you recommend starting the first optimization?
i use default fsrs parameters
haven't optimized yet
Whenever you want, really
https://www.reddit.com/r/Anki/comments/1k23tvn/atrocious_true_retention/
I also haven't optimized FSRS yet because I didn't know this was an option.
😭
I swear, if FSRS only had one toggle, people would still find ways to not use it properly
So this guy didn't realize that optimization is a thing AND he also didn't realize that he can control interval lengths by adjusting desired retention
maybe the problem is that people have to hit secret hidden buttons
secret
a giant-ass blue button in the middle of the screen
it's in a corner with like 50 other things to also look at
Like, I can see not realizing that DR affects interval lengths if you have never changed DR, but not realizing that optimization is a thing...
We need an interactive tutorial so bad, man
no one would use it
or, the type of person who would is also the type to not have this problem in the first place
what's needed is basically an exam when you open anki for the first time: you gotta answer a bunch of questions that show you read the manual
only then can you use the program
have fun implementing such a thing
Hi guys, how many new cards can I learn every day? What is the upper limit of human cognition?
That's a difficult question, and there aren't a lot of good estimates
https://supermemo.guru/wiki/How_much_knowledge_can_human_brain_hold
The upper limit of knowledge for a human brain may amount to 300,000 stable items of knowledge as consolidated in spaced repetition
Items = cards in Anki
Over 50 years, that's 6000 cards per year aka ~16 new cards per day
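Spelled out, the arithmetic behind that estimate (taking the 300,000-item figure at face value):

```latex
\frac{300\,000\ \text{items}}{50\ \text{years}} = 6000\ \text{items/year},
\qquad
\frac{6000\ \text{items/year}}{365\ \text{days/year}} \approx 16.4\ \text{new items/day}
```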