#FSRS Megathread
Ah I see
I did a lot of Bunpro for grammar
but I stopped after reaching N1, because most points are more vocabulary than grammar now
There is a surprising amount of overlap between "it's a grammar point" and "it's a vocab"
Like, is stuff like によって a vocab or a grammar point?
Yeah that one I can understand
but at some point it was really like "Put いきなり to mean it's sudden"
And a lot of things I was training didn't really occur in any material I checked
に加えて、を込めて... feels more like vocab to me
*shakes fist in general direction of Japan*
I find myself sometimes going too fast and getting meanings wrong. Then I take a harder look, remember the reading and go "of course!"
I really wish WK would export their warning list via the API...
I can understand though. They don't want to make it too easy to steal their secret sauce.
It would be really interesting if we could have access to all their review records. Then you could start doing some Kanji-specific SRS optimisation.
It's literally just a list of "yes this is right but not what we're asking for", "this is a common typo" and other cases where they give you a second chance when entering them
I implemented that to a limited degree myself
where if you type a correct reading of a Kanji, but not the one WK asks for, it will flash the input yellow and you can reconsider
But they had to pay people to curate that data. We are quite lucky that they give us access to structured data at all.
Well, to use the API, you require a paid account
I know. I have made my own script that downloads the data and generates notes/cards.
They actually took down the WaniKani deck from AnkiWeb
which is fair, it's literally piracy imo
I got a lifetime sub. So I feel happy downloading the data.
Writing an Anki add-on that syncs it for that one user is fair game
but then sharing that deck publicly is not
I keep meaning to improve my cards, but I got it good enough and keep getting distracted by shiny new projects.
I just took the old WK3 decks templates and slightly tweaked them
https://ankiweb.net/shared/info/391275087 if you want to see how it looks
I wish their SRS was not so bad. With their dataset you could probably get into fancy domain-specific SRS stuff like Math Academy. Instead they just do fixed intervals 😦
Yeah, but it'll be very hard to analyse
given they treat meaning+reading as separate but not separate things
if you can systematically differentiate these then you can just make more presets
I wouldn't know how to possibly do that
I'd have to manually go through over 18000 cards and classify them
Why not press easy on them?
Interestingly, if you zoom in enough on the 100% difficulty spike, you get something that looks like a normal distribution
This was [90%,100%] with 100 steps
can we have tooltips (help text like in deck options) for stats page? I think it'll be helpful.
especially for the new fsrs related stats which are somewhat complex imo
expertium did bring this up before but nothing transpired after it
for now, the writings can be just copy pasted from the manual
@quasi shadow
bruh...
materialists be like, "gravity is also physical, it's made of 'virtual' particles like gravitons"
one day they will say consciousness is made up of virtual particles 🤣
sorata, I mean this in the nicest way possible: stick to crunching numbers. No philosophy.
bro u should meditate. crunching numbers has destroyed your psyche.
Since D is the hot topic these days, I decided to get back to trying to improve D
First, I tried a very simple approach:
```python
def surprise_f(self, r: Tensor, binary_rating: Tensor):
    r = r.clamp(0.0001, 0.9999)
    surprise = -torch.log(1 - torch.abs(r - binary_rating))
    return surprise.clamp(0, 100)

def next_d(self, old_d: Tensor, r: Tensor, rating: Tensor) -> Tensor:
    binary_rating = torch.where(rating > 1, torch.ones_like(rating), torch.zeros_like(rating))
    delta_d = -self.w[6] * (rating - 3) * self.surprise_f(r, binary_rating)
    new_d = old_d + self.linear_damping(delta_d, old_d)
    new_d = self.mean_reversion(self.init_d(4), new_d)
    return new_d
```
Here we multiply delta_d by a surprise factor=-ln(1-abs(R - grade)), where grade is binary. The bigger the difference between R (prediction) and binary grade (reality), the bigger the surprise factor.
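To get a feel for the numbers, here is a minimal standalone sketch of that surprise factor in plain Python (no torch). The function name `surprise` and the sample R values are made up for illustration; only the formula and the clamp on R come from the snippet above.

```python
import math

def surprise(r: float, passed: bool) -> float:
    """Surprise factor -ln(1 - |R - grade|), with a binary grade (1 = pass, 0 = fail)."""
    r = min(max(r, 0.0001), 0.9999)  # same clamp on R as in the snippet above
    grade = 1.0 if passed else 0.0
    return -math.log(1.0 - abs(r - grade))

# Illustrative values only: an expected pass is barely surprising,
# an unexpected fail is very surprising.
for r in (0.7, 0.9, 0.99):
    print(f"R={r:.2f}  pass: {surprise(r, True):.3f}  fail: {surprise(r, False):.3f}")
```

At R=0.9 a pass gives about 0.105 and a fail about 2.303, so a fail on a card predicted at 90% moves D roughly 20 times more than a pass does.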
As you can see in the image, it didn't help. Next I'll try completely re-defining D.
couldnt find in manual
what is the dotted lining supposed to represent
and whys it there
desired retention
Yep
i see
no lapses in this card, 93% desired retention
but these are its intervals
surely this cannot be right
Cause it's exceptionally rare that I'm so confident about an answer that I'd give it an "Easy".
It happens sometimes, but only for some few very basic words and kanji that I'm 100% confident I won't forget in the 3+ years hitting Easy will push it into the future.
Same same
So many cases where you were "Oh this one I'll press easy" then you realize you got it wrong for some reason
Also, the 0.5-1s it takes to think "Was it easy?" is like ~10-20% of my avg review time (~5s)
So yeah, easy is really more like card well known before Anki
that's why you should use a binary grading system of pass/fail, it's way less cognitive load
I don't see how pass/fail would work
I could do without the Easy button, but a lot of cards are Hard
and if the Hard button was gone, I'd probably press Again on them instead of passing them
Also, if "Hard" means "Took me longer than expected", that's the kind of things that with AI you can detect without the user having to press "Hard"
The answer time is absolutely useless to judge anything by
Unless you install surveillance cameras around the PC, and feed that into some predictive network, there can be so many other reasons a card took long to answer...
There's also plenty of other reasons that something was Hard, other than time
It is !
And I'm the kind of guy to alt tab
real ones know 1234 = goated
Anything that felt hard
Can be that it took me a lot of thinking to piece it back together from the mnemonics, or having almost confused it with a very loosely similar word
Really anything that makes me feel like this wasn't "Good"
Another example would be getting the meaning right, but thinking the most uncommon nuance of the vocab
In that case I often bury the card and then rate it hard the next day if I get it right then, or Fail it if I still don't get the nuance
That's also the beauty with neural networks, you give them all those values, if there is a trend, it will train to recognize it, if not, it won't
what if the neural network goes rogue
and just fucks up my deck so i fail my exams
yea checkmate
Well then don't use Anki in case it fucks up your collection, back to paper
i dont trust paper..
NN doesn't just go rogue, they minimize a goal function
Anki so far behaves very predictable. Adding some neural network to do "something" would make it more or less a random unpredictable rollercoaster
if the goal function is difference between prediction and outcome, if the difference is minimal, it can't be "THAT" bad
Neural Networks have a tendency to be incredibly volatile and hard/impossible to understand and debug. So no thanks.
It COULD be THAT bad
I'm assuming you're studying Japanese, in the case of vocab cards you would just:
- look at the kanji
- recall the reading (if you miss the reading, you need to fail the card)
- recall picture or definition (but if you have a strong gut feeling when seeing it, you should immediately pass the card)
The NN could conclude that it can get your desired retention to the set value by just showing you the same 10 cards every day, and the others never.
@polar maple I hear NN slander 🤣
If you design the goal function as the daily desired retention yes, but then it's your goal function that is wrong
If your goal function is to reduce the distance by reviews, it's OK
BTW
FSRS is not NN but it's how it's working right now
It minimizes a cost function
Sooo, stop using FSRS
And do your own mental gymnastics to evaluate what you think is the best interval
Meaning and Reading are separate cards
But the big advantage in both, is that FSRS/NN are aimed to REDUCE your cognitive load
It's WaniKani
And the why is pretty simple, cause they're different things to learn.
I'm doing WaniKani, just in Anki.
Why not use WK in WK?
Cause their website and SRS sucks
It's not that slow, I think you can finish on their site in 1-1.5 years if you always stay on top of the reviews
But that's quite an intense workload
It'll have taken me ~2.5 years now when I'm done in mid-April
"done" in the sense of no more new or young cards
Live with the monks somewhere in the mountains
Mine goes up again at the end
WK has 6609 Vocabs and 2080 Kanji. Though the Vocab are primarily a reinforcement-tool for the Kanji, and learning them themselves is more a bonus.
what in the hard usageee
i only press hard around 4% of the time
I know I'm weird, but it's working. 🤷
Which also would be fine with NN since it has a lot of parameters to bend the predictions
I could randomly throw them into some deck, but I highly doubt I could learn them nearly as well by "just" doing that
While with the WK method, I'm pretty confident about the vast majority of the Kanji
WK just does an excellent job of giving you and reinforcing tools to be able to recognize even an "aged" Kanji, and it works exceptionally well
And I simply don't consider language learning any kind of rush or race. I can already read the vast majority of stuff, and am fairly confident about it.
So why would I go hard on trying to hyper-optimize it?
I don't see what's "in isolation" about it
The Vocab are somewhat, but WK very clearly says that it uses the Vocab to give you context for the Kanji
It's then your job to find context for the Vocab :D
That seems like it'd be horribly overloaded
way too much stuff on one card
People tend to forget that others were learning languages perfectly well before SRS apps existed
It's a complementary tool
The appeal is having your little sandbox with little graphs
Learning Kanji without some kind of SRS system seems borderline impossible to me
I think it's what Japanese kids in school have been using since ages, just manually
Believing it's not possible is often the first step to making it not possible
For example when I started learning english I had no internet, and no SRS knowledge or anything
I just looked up words in a dictionary book
took me ages
but got it eventually
So yeah, in my case I see Anki as very nice supplement
but not like some kind of requirement
But guess what ... We think "Internet + Easy Tools to learn/review", so can only mean better learning right ?
Except now you also have "Constantly getting notified for random anonymous people talking to you online, getting "recommendation" for a new video, switching core decks every 6 months"
I'm also guilty of it, but right now I still have 200 reviews to do today that should take me ~20min, but guess what, I'm losing time here, discussing "optimal way of doing it"
So no offense, it's also self-criticism
We need to make an addon that locks you out of discord until you have done all your reviews ;p
I guess ;D
Point is: NOTHING beats hard work and true effort. But we always play pretend by trying to "optimize"
but I guess I'm off topic now
I was wondering, anyone could explain how to read a "B-W Matrix" ?
I searched a bit online but I'm not really sure I found the right info
It's under the "Memorised" graph
hover over the cells
I do I do
For example :
"Predicted 71.81, Actual 66 (Prediction at 71.81 ?) compared to a total of 84 prediction ?"
So it predicted 71.81 less than it should have ?
What does it describe ?
its for all the reviews that are done with that stability and difficulty
hold on i'll find the thing i copied XD
#1282005522513530952 message
Ah, x-axis is difficulty and y-axis stability
it means that fsrs is underestimating how well you know cards with that difficulty and stability by 13%
oh yeah i should really label that XD
yeaaaah with the axis explained now it makes more sense
So yeah for example in my case, for Stability around 7d, for Difficulty at 90%, it is overestimating by 7%
yeah
@unique salmon if we can determine that, even if it has only a very low impact on global RMSE, why not mitigate that by a malus for those card retention ?
Ok global RMSE won't change much
But it's not going to hurt to do a ~2-3% malus on that, even if we overestimate it for a few reviews, worst case scenario that -6.9 will just go up until the "reverse mitigation" kicks in
I mean, I'm all for crunching numbers, but RMSE is just one part of the story in this case, no ?
I'm not sure what would be a good way to do that. And right now I want Jarrett to test something else: https://github.com/open-spaced-repetition/srs-benchmark/issues/186
Sure no worries
But do you more or less agree with the fact that, if RMSE itself can't be reduced that much anymore, having some compensation techniques for specific problematic class would be a nice way forward ?
I mean, we can't just add or subtract stuff from the forgetting curve, that would cause all sorts of issues
We should aim to improve FSRS formulas
Sure, if it's doable, I'm all for it
Or just make a neural net that is far more accurate than FSRS ¯\_(ツ)_/¯
That might also explain why though
Technically, we have one already
With enough parameters all those problematic classes of cards could be customized
But a one-shot pre-training might not be enough then
Who knows if someone else might have a different class of problematic cards
but I think @polar maple said it would be possible to modify the weight incrementally with new reviews
We'll wait for Alex's RWKV. It seems like it really does Just Work
As in - pretraining only beats the hell out of FSRS
Exciting stuff
If you search guinea pig, you know where I am
In the mean time ...
"deck:Japan::1. Vocabulary" prop:s>5 prop:s<7 prop:d>0.9 prop:r<.90 -is:due
My good old Filtered Deck will go "brrrrrrr"
Poor man's "AI"
This is like a whole new level of tweaking...
Holy tweaking Batman
Yeaaah half my reviews are coming from those
That's also a bit why sometimes I say I don't think FSRS should be the only one to "find solution"
I mean, Anki could have different services, a prediction services which would be FSRS or RWKV
On top of those 2, you could then have some anomaly detection service, card interference service ...
So instead of having an ever-growing equation for FSRS, or having nothing left to be able to interact with RWKV, you could extend certain capabilities
you're free to use sm2 🤷
Yes ! But SM2 / FSRS still have really different paradigms
SM2 doesn't really predict for example
So you need to have a very clear separation of responsibilities between those different capabilities
also i'd like to clarify that the b-w matrix is for reviews not cards so if your cards have changed stabilities or difficulties since that review then the search wont really find all those cards
i mean it still works well enough
Gotcha
But I think it's fine because then, I'm assuming that the difference for S=7 D=9 is high enough that potentially the current one might have issues
Well, we have the load balancer/fuzz and Easy Days
So yeah, we could add extra stuff on top
I have an idea for leech detection, but that requires storing DR at each review in Card Info
Actually, no, not DR. It would require storing R at each review
Wouldn't R be always more or less equal to the DR ?
Not always and not exactly
The DR I can understand, you want to check compared to an expected baseline
But maybe you should spell out your idea instead of us trying to guess what it is
Uh, no, you can't
Not without some really weird conversion mechanism
That's what I meant by "weird conversion mechanism"
I'm not 100% sure myself. I want to use a series of reviews to calculate the probability that a card would be failed k times out of n total reviews, each with its own R_i probability of recall. But I don't know how to do this exactly
I think I get the idea, and I think it's also a nice addition
I'll have to Google/try to do the math myself
If you're doing reviews each time with R=80% but you get them wrong 90% of the time, it's a bit strange
Yeah
It might also be more reactive than my previous idea of looking at the full history
Btw, this could also be used to find anti-leeches: cards that are so easy you almost wonder why you are even reviewing them
I mean, if you fail 3 times in a row a 90%, you can react to that more quickly than checking if those 3 fails happened in 30 reviews lapse of time
I mean, if you have 10% chance of getting something wrong, getting it 3-times in a row is a 0.1% chance
Yes, but I want to extend that to fails that didn't happen in a row
And all had different p(recall)
That gets complicated
Yes
Not good enough at stats to remember how to do that by heart haha
probabilities*
In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted. It is named after Jacob Bernoulli, a 17th-century Swiss mathematician, who analyzed them in ...
But it's for fixed p
Where every trial has the same probability of success
Oh, here we go
https://en.wikipedia.org/wiki/Poisson_binomial_distribution
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.
In other words, it is the probability distribution of the
number of successes in a collectio...
Alright, I found a package that does this
https://github.com/tsakim/poibin
I am not even going to try to understand the math with complex numbers, but the usage is actually fairly simple. You just give it a list of probabilities for each trial and the number of successes, and then you can calculate the probability of a given number of successes.
Example:
p = np.asarray([0.9, 0.85, 0.95, 0.92, 0.87])
n_succ = 2
This gives me a p-value of 0.836%. So if a card has been reviewed 5 times with these probabilities (note that the order doesn't matter) there is a 0.836% chance that 2 or fewer reviews will be successful.
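For anyone who wants to sanity-check that number without installing anything, here is a brute-force sketch that enumerates every pass/fail outcome. It is not the poibin package, just a cross-check; the function name is made up, and it only makes sense for small n because it walks all 2^n outcomes.

```python
from itertools import product

def prob_at_most_k_successes(p, k):
    """P(successes <= k), by enumerating all 2^n pass/fail outcomes (small n only)."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(p)):
        if sum(outcome) <= k:
            prob = 1.0
            for p_i, passed in zip(p, outcome):
                prob *= p_i if passed else 1.0 - p_i
            total += prob
    return total

p = [0.9, 0.85, 0.95, 0.92, 0.87]
print(f"{prob_at_most_k_successes(p, 2):.3%}")  # prints 0.836%
```

Shuffling `p` gives the same result, which matches the note that the order of the reviews doesn't matter.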
@polar maple @quasi shadow Here's the code and an example of usage
EDIT: see an updated example #1282005522513530952 message
So now we can make an automated leech detector
(as long as we figure out how to port this to Rust)
(and if Anki stores R at the time of each review in Card Info, otherwise we can't do this. Actually, maybe we could do the same trick of re-calculating R that we do for the forgetting curve)
Yeah we could recompute and store it retroactively if necessary I think
Given the card's history, we can either store or re-calculate the probability of recall predicted by FSRS, and then use the Poisson binomial distribution to calculate the probability of a given number of successes. I am not even going to try to understand the math with complex numbers, but the usage is actually fairly simple. You just give...
Good job though
There is another issue, actually: with this function the p-value is always 0 if the number of successes is 0
So if the number of successes is 0, we have to do something else
I guess realistically it doesn't matter since cases where a user has never pressed Hard/Good/Easy would be extraordinarily rare.
If it's 0 success, normally it would not even be leaving the learning phase right ?
Not if "Set Due Date" is used though
Is it not a bug of the function/lib though ? If you have 1 fail 0 success at 80% DR, it's strange it's considered as a leech
Same-day reviews don't count
I think it's less of a bug and more just a limitation of the formula
There's always the "lazy" way to only consider cards with >= N day of review
like N=3
You mean N reviews?
There is an issue with that. For example, if you have 3 reviews at 70% p(recall), the probability of failing all 3 is 2.7%, not low enough. At 90% it would be sufficient, at 70% - nope
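A quick sketch of why a fixed review count can't work for every DR: it counts how many failures in a row are needed before the all-fail probability drops below a cutoff. The function name is hypothetical, and the 1% cutoff is the value that gets proposed later in this thread, used here only as an assumption.

```python
def failures_needed(r: float, cutoff: float) -> int:
    """Smallest n such that failing n reviews in a row, each with recall
    probability r, has a probability strictly below cutoff."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    n, prob = 0, 1.0
    while prob >= cutoff:
        prob *= 1.0 - r  # one more failed review
        n += 1
    return n

# Assumed 1% cutoff; R values are just examples.
for r in (0.7, 0.8, 0.95):
    print(f"R={r:.2f}: {failures_needed(r, 0.01)} failures in a row")
```

With these assumptions, R=0.7 needs 4 straight failures (3 only gets you to 2.7%), while R=0.95 needs just 2.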
Isn't it what you would also expect from that algo ?
I mean, failing 3 times if it's lower DR is more expected than failing it 3 times with higher DR
And at least here you know exactly why, there's a clear formula
I mean that applying this formula after a fixed number of reviews doesn't work equally well for all cards
And all DRs
What is sufficient to identify a leech at DR=90% is not sufficient at DR=70%
aaah sure
But I mean the idea of the >= N is just to handle the edge case
if the N is too low to be possible, it's not that much of an issue
in general in programming you make sure your edge cases are treated (non-zero denominator, non-negative sqrt argument...) but then if a value is not possible with certain other parameters, you don't necessarily overcomplicate the code (except if you can really change the algorithm complexity, but in this case, it's not really worth it)
Ok, hold on, something doesn't add up here, I'm investigating it
I'm running simulations to confirm that the function works, and it's off
Or, rather, the simulations do match the output of the function, but for the other number of successes...
Ok, I have no idea why the p-value in this function is calculated the way it is, I'll have to actually use my brain to figure out how to get what I need
Ok, so the way they calculate the p-value is weird plus there is the whole "exactly n successes" vs "n successes or fewer successes" thing. Ok, I got it all figured out now
@polar maple @quasi shadow Here's an updated usage example
Also, I asked Claude 3.7 Thinking to re-write it in Rust and remove the calculation of p-values (I calculate them from the PMF) and CDF, leaving only PMF. Idk if it's any good, but so far Claude 3.7 Thinking has been really freaking good, at least for Python.
https://drive.google.com/file/d/10QaOXwyh8F58wRTlGizOUc0VOaEOqBIy/view?usp=sharing
See this
Also, I was initially thinking of using 0.1% as a cutoff, but it seems like that's too conservative, let's use 1%
Oh, and let's limit it to N reviews >=2, in other words, if a card has only been reviewed once, let's not tag it as a leech
But yeah, we can identify leeches at very high DR with merely 2 reviews, that's cool as hell!
I agree it's really nice
Even if we can't really schedule it differently, at least we have something to flag those more precisely
The whole "Flag leeches after X lapses" was not correct for FSRS, now at least there is a nice alternative
At least, someone has to implement it haha
You take commissions @ashen light ?
Also, it seems like identifying anti-leeches may not be viable. With 10 reviews at 90% R, there is a 34.9% chance of every single review being successful. Even with 10 reviews at 70% R, there is still a 2.8% chance of all of them being successful.
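The arithmetic behind those two figures, plus a rough count of how many perfect reviews it would take before "all successes" becomes unlikely enough to flag. This is only a sketch: the function names are made up, and reusing the 1% cutoff for anti-leeches is an assumption, not something anyone here has settled on.

```python
def all_success_probability(r: float, n: int) -> float:
    """Probability of passing n reviews in a row, each with recall probability r."""
    return r ** n

def reviews_needed_for_anti_leech(r: float, cutoff: float = 0.01) -> int:
    """Smallest n for which n straight passes at recall probability r
    has a probability strictly below cutoff (hypothetical anti-leech rule)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    n, prob = 0, 1.0
    while prob >= cutoff:
        prob *= r
        n += 1
    return n

print(f"{all_success_probability(0.9, 10):.1%}")  # 34.9%
print(f"{all_success_probability(0.7, 10):.1%}")  # 2.8%
print(reviews_needed_for_anti_leech(0.9))  # 44
print(reviews_needed_for_anti_leech(0.7))  # 13
```

So under these assumptions a card reviewed at 90% R would need 44 passes in a row before it could be flagged, which is why this looks impractical.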
Then again, it's also arguably less important
Now the big question is: do we want a "Recalculate leeches" button if automatic leech detection is enabled? 🤔
Since changing FSRS parameters will change R, which in turn can change whether the card counts as a leech or not
Yes I would expect those card to grow stability very quickly
Hmm if R is stored at-review, and we don't touch it anymore, then I guess normally it should not be that necessary ?
Right now it's not stored, but recalculated (for the forgetting curve)
In your proof-of-concept, you consider all the history of the reviews or only a certain amount ? In both case, I guess a card marked as "leech" could leave that state in theory, with new good reviews ?
And recalculating would make it more accurate, so that's another reason to recalculate leeches too
It could. Also, we can limit the number of recent reviews used for the calculation
Say, last 32 or last 64 reviews
Not that it would matter for most cards
Yeah I was thinking, if a card let say with 10 Reviews, is counted as a leech, because let say there was 9 fail. Then, the user start to review it "more normally", with a success rate of around 90% (The DR). I guess, with the formula, it would slowly increase the p-value until it goes above the threshold
But of course, if something is a leech with 200 reviews, the user would have to review it a lot I guess before it leaves the Leeches state (which could be logical since well, it was a leech for so much time)
So limiting the window is maybe not a game changer
We should also probably add a rule that a change in the leech/not a leech status should occur no more frequently than once per 3 reviews, in case some cards are very close to the threshold all the time
So if a card has been tagged as a leech, it needs at least 3 more reviews before it can be untagged
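A rough sketch of how that "no flip-flopping" rule could look, in Python rather than Rust. Everything here (the `LeechState` class, the field names, the function) is hypothetical and not taken from Anki's codebase; it only illustrates the "at least 3 reviews between status changes" idea.

```python
from dataclasses import dataclass

MIN_REVIEWS_BETWEEN_CHANGES = 3  # from the rule proposed above

@dataclass
class LeechState:
    is_leech: bool = False
    # Starts "unlocked" so the very first verdict is allowed to apply.
    reviews_since_change: int = MIN_REVIEWS_BETWEEN_CHANGES

def update_leech_state(state: LeechState, looks_like_leech: bool) -> LeechState:
    """Call once per review with the detector's raw verdict for the card."""
    reviews = state.reviews_since_change + 1
    if looks_like_leech != state.is_leech and reviews >= MIN_REVIEWS_BETWEEN_CHANGES:
        # Enough reviews have passed since the last change, so flip the status.
        return LeechState(is_leech=looks_like_leech, reviews_since_change=0)
    # Either the verdict agrees with the current status or the status is still locked.
    return LeechState(is_leech=state.is_leech, reviews_since_change=reviews)

state = LeechState()
for verdict in (True, False, False, False):
    state = update_leech_state(state, verdict)
    print(verdict, state.is_leech)
```

In this run the card gets tagged on the first review, stays tagged for the next two even though the detector disagrees, and is only untagged on the third review after tagging.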
Which will likely be annoying to implement
have u tested the idea itself?
Lol, no
https://forums.ankiweb.net/t/automated-leech-detection/56887
I'm just waiting for Jarrett, ain't no way I'm writing Rust code
Given the card's history, we can either store or re-calculate the probability of recall predicted by FSRS, and then use the Poisson binomial distribution to calculate the probability of a given number of successes. I am not even going to try to understand the math with complex numbers, but the usage is actually fairly simple. You just give...
wdym complex numbers
you can do it with a couple for loops
unless you are crunching large inputs and want to use FFT but we don't need it for anki
The package uses FFT
And yeah, for small n you can calculate the probabilities exactly, but I don't want to mess with it
And we do want it to be fast for large n, for cards with a lot of reviews
for FFT to start to benefit i would expect that you need at least n = 10^4 or so, which just isn't happening
theres a simple O(n^2) way to compute this that can be written in like 5 lines of code
Here's a new example file
For n<=6 combinatorics are faster (thank Claude 3.7, lol, it has been saving my ass so many times). For n>6 FFT is faster
n=20, 10 successes
the only conclusion to draw is that claude didn't write good code
its not a maybe lol this is simple computing
FFT has a larger constant overhead
6^2 can be done in tight for loops very quickly
Ok, I asked it to speed it up
Now combinatorics are faster for around n=40, then FFT is faster
Tbf, both are way under 1 second for n=64
try this
```python
import numpy as np

def poisson_binomial_pmf(p, k=None):
    p = np.asarray(p)
    n = len(p)
    # pmf[j] = probability of exactly j successes among the trials processed so far
    pmf = np.zeros(n + 1)
    pmf[0] = 1
    npmf = np.zeros(n + 1)
    for i in range(n):
        for j in range(n + 1):
            npmf[j] = 0
        for j in range(n):
            npmf[j] += pmf[j] * (1 - p[i])  # trial i fails
        for j in range(1, n+1):
            npmf[j] += pmf[j - 1] * p[i]  # trial i succeeds
        pmf, npmf = npmf, pmf
    return pmf
```
lol what your computer isn't 1000x slower than mine
?
Here's the code
Meanwhile I tried to completely redefine D in terms of R minus binary grade. I won't go into the details because it sucks anyway, even if I let it run for 10 epochs instead of 5 like normally
So now I will work on implementing decay based on D...or at least try to
I wrote the same version of my code in C++ because python loops are slow. When i read the updated claude code it is pretty much equivalent to my code but writes more of it in python compared to numpy, i was considering doing a similar thing but luckily i read the claude code first. The c++ performance is more in line with what I expect with such small and tight iteration. For the C++ I took the average of n = 100 to n = 200. Doing n = 40 alone took 2.9 microseconds.
milli: 0.01059
micro: 10.58614
Rust would prob get similar performance as C++ here
so FFT not needed
surely you can do it yourself
can you commission me to do a study on the difference between truly forgetting and "was it A or B?"
There are, like, 2 people on this planet who can implement a leech tagger in Anki. One of them is you, the other one is Jarrett
wait actually?
surely it isn't that hard
I mean I showed up like a handful of months ago and did shit its not like I did anything particularly difficult
Unless there is some random Rust enjoyer who never comments but always reads everything in this channel/on the forums and is currently reading this...yes, actually
I mean surely someone here can just do it
like honestly, I would not say I have particularly deep knowledge
I mean I showed up one day not touching anki in a decade and made a PR, its actually not that big a thing. someone else can just do something similar
even you could
I don't know Rust, man
its not hard
I wouldn't even be able to make a PR in Python
you know python
Apps have their own App Python
It's like legalese
But for apps
Same goes for any other language tbh
here this is what you need
I mean, syntax stuff is the least interesting part of languages, just fix what the compiler complains about
Lol
me and another unnamed person were speculating on your actual usage
anyway
go hit that rust deck
then become a valued anki contributor
so is your background just in math (or stats) then?
If by "background" you mean "reading articles on the Internet and watching YouTube", then yes
oh if thats your background then you're fully equipped to read/watch the internet but with rust instead of math
Wouldn't that be far too aggressive?
It seems to me that this method only works if you have a good model of the probability of recalling the card.
At only two reviews FSRS has not been given much of a chance to tweak D and S so its prediction might not be great.
You might get a lot of false positives for cards where D₀ and S₀ are not a good fit.
I think we can see how many false positives we get in the 20k dataset
comparing what the method gives us with a few first reviews vs with all the reviews
We can limit it to start only after the 3rd/4th/nth review
I'm not sure how you would do that without user feedback. How would you know whether the user considers this card a leech or not?
Well, considering that neither Jarrett nor Jake seem to be interested in implementing it anyway, I'm not sure if there is a reason to discuss it
Algo itself probably not, but I'm already apprehensive about building anki from source locally on my mac
But it's never too late to try
Also the complex thing with jumping into a codebase like this is knowing where best to put that logic, and being careful of different entrypoints you might not have expected
But unfortunately it's not something you learn until you break that software
I haven't taken a look. The messages flooded me again.
Hopefully, you'll take a look at the forum post I made once you have some time
my first post on forums was about using difficulty for leech detection
well that didn't work out and I didn't expect this to progress any further
a year later, seem like things will move forward
maybe it'll take a few years to get there
It'll take either two weeks or an infinite number of years, nothing in between 🤣
Seriously though, it's just math + doing the tagging. I don't see any major roadblocks. The only issue is whether there are people who are willing to implement it. I assume Dae will have no objections.
So it's either:
- There are people who want to implement it -> it gets done by the next Anki release
Or: - Nobody wants to implement it -> the idea dies in obscurity
It's like Easy Days - it could have been implemented literally years ago, it's just that it relies on one guy, Jake, to do the work
It's not like there were any new developments that suddenly made Easy Days possible
If we had a clone of Jake, he could've implemented Easy Days, like, 5 years ago 🤣
well you're too hopeful if you expect it to easily work out. I imagine it'll take some amount of polishing to be actually good.
but yeah agree with u on other points.
Not really
- Use 1% as the cutoff for the leech tagger.
- Use the leech detector only if there are at least 3 reviews, to avoid early false positives.
- Make it so that a card's status as leech/not leech can only change once every 3 reviews, to avoid "zig-zagging" where it's a leech after one review, then not a leech after the next review, then a leech again, then not a leech again. Such cases would be rare, but we should still consider them.
- Add an "Automatic leech detection" button.
The only part that is debatable is re-calculating leeches. Should it be a separate button? Should it be combined with "Optimize", so that leeches are automatically recalculated when FSRS parameters change?
well, the metric itself might need some polishing is what I was trying to say.
but ofc no use debating over this
imo leeches should be recalculated yeah
and no more options please
There's not a whole lot of polishing to do, it's just a bunch of math. The only "polishing" that I can think of is the choice of the cutoff %
@polar maple I tested it again and FFT is somehow slower even for n=20000 🤔
Ok, I genuinely don't understand what this guy was cooking
FFT uses an enormous amount of RAM for n>10000
It's just strictly worse than combinatorics
If I did the maths right the "Direct Convolution" algorithm should only need ~800KB of memory for n=50,000.
Pretty good memory efficiency compared to 74.5GB
tried fft on c++
Now if you could do it in Rust...
nah no card has this many reviews
No, I mean just implement it at all
it turns out that we can write it with only 1 array so it would take 400 KB, even better
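A quick back-of-the-envelope check of those memory figures, assuming the DP stores n + 1 values as 64-bit floats (8 bytes each) and nothing else. The helper name `dp_memory_bytes` is made up for illustration.

```python
BYTES_PER_F64 = 8  # assumption: probabilities are stored as 64-bit floats


def dp_memory_bytes(n, arrays):
    """Bytes needed for `arrays` buffers of n + 1 floats each."""
    return arrays * (n + 1) * BYTES_PER_F64


two_arrays = dp_memory_bytes(50_000, arrays=2)  # prev + curr buffers
one_array = dp_memory_bytes(50_000, arrays=1)   # single buffer updated in place

print(f"two arrays: {two_arrays / 1000:.0f} KB")
print(f"one array:  {one_array / 1000:.0f} KB")
```

So the ~800 KB and ~400 KB estimates are just the array sizes for n=50,000.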
i got the same excuse as you, i don't know how to write rust
dang
all this talk of me writing a feature and I still don't even know what the feature actually is
I literally just showed up and picked a spot for the load balancer
- Take all probabilities of recall over the card's history
- Plug them into The Mathematizer 9000
- It returns a bunch of probabilities for every possible outcome, like 0 successes, 1 success, 2 successes... n successes, where n is the number of reviews aka length of the array with probabilities, without the first review
- Check how many successes (k) the card actually has
- Sum the probabilities of 0 through k successes (the first k+1 values) to find p(successes<=k)
- If it's <1%, tag the card as a leech
Basically, we check how likely it is that a card would be successfully reviewed k times (or less than k, it's a "less than or equal to" kind of situation) out of n total reviews, given an array of probabilities from FSRS
Plus extra rules to avoid the card going from "leech" to "not a leech" too often and early false positives
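The steps above can be sketched roughly like this. It's only an illustration of the idea, not Anki code: the function names, the `cutoff` and `min_reviews` parameters, and the assumption that both input lists already exclude the first review are all made up here.

```python
def poisson_binomial_pmf(probs):
    """Return [P(0 successes), ..., P(n successes)] for independent reviews."""
    pmf = [1.0]  # with zero reviews, zero successes is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)  # this review was failed
            nxt[j + 1] += mass * p      # this review was passed
        pmf = nxt
    return pmf


def is_leech(recall_probs, outcomes, cutoff=0.01, min_reviews=3):
    """recall_probs[i] is the predicted R at review i, outcomes[i] is True on a pass.

    Assumption: both lists already exclude the card's first review.
    """
    if len(recall_probs) < min_reviews:
        return False  # too few reviews, avoid early false positives
    k = sum(outcomes)  # actual number of successes
    pmf = poisson_binomial_pmf(recall_probs)
    p_value = sum(pmf[: k + 1])  # p(successes <= k)
    return p_value < cutoff


# Five reviews predicted at R=0.9, all failed: very unlikely, so it gets tagged.
print(is_leech([0.9] * 5, [False] * 5))
# Same predictions with two fails out of five: not rare enough for a 1% cutoff.
print(is_leech([0.9] * 5, [True, True, False, True, False]))
```

The "once every 3 reviews" rule is left out here on purpose, it's a separate layer on top of this check.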
1PM I was like "Let's try to build Anki locally", 1:01PM wife was like "Let's replace the washing machine". That's my excuse for letting my development anxiety win today
@ashen light, is there any reason why load_balancer is required for QueueBuilder and CardQueues, so it is load_balancer: LoadBalancer and not load_balancer: Option<LoadBalancer>?
In the sense that can it be refactored to be Option?
Because at the moment, even with LB disabled, Anki still runs code that is only required for LB functionality:
- Computing LoadBalancer::new for QueueBuilder::new: https://github.com/ankitects/anki/blob/63c2a09ef6760890c03be4bd83f613c03c512d1f/rslib/src/scheduler/queue/builder/mod.rs#L149-L158
- add_card: https://github.com/ankitects/anki/blob/63c2a09ef6760890c03be4bd83f613c03c512d1f/rslib/src/scheduler/answering/mod.rs#L352-L364
- remove_card: https://github.com/ankitects/anki/blob/63c2a09ef6760890c03be4bd83f613c03c512d1f/rslib/src/scheduler/queue/undo.rs#L42-L51
If you already have a background in something like C++ it is not too hard to learn Rust.
For reference here is what I got when I had a little go last night:
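```rust
/// PMF of the Poisson binomial distribution: entry k is P(exactly k successes).
pub fn poisson_binomial_pmf(probabilities: &[f64]) -> Vec<f64> {
    let n = probabilities.len();
    // prev holds the PMF after i - 1 trials, curr the PMF after i trials.
    let mut prev = vec![0.0; n + 1];
    let mut curr = vec![0.0; n + 1];
    prev[0] = 1.0; // with zero trials, zero successes is certain
    for i in 1..=n {
        let p = probabilities[i - 1];
        curr[0] = prev[0] * (1.0 - p);
        for j in 1..=i {
            // j successes: trial i failed with j before, or succeeded with j - 1 before
            curr[j] = (prev[j] * (1.0 - p)) + (prev[j - 1] * p);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev
}
```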
N.B. I don't really follow exactly what is happening in this algorithm, so I may have messed it up a little. It seems to be giving sensible results though.
@sonic forge ...I could have sworn that it was an Option<LoadBalancer> exactly because of that toggle
Yeah, but then we also need to do all the other stuff, not just pure math
Like tagging cards
oh wait it did need it, it probably could be optional if you want to make a pr for it
Yep, I looked at that bit and decided it didn't look fun enough to drop my current project.
also I'll look into the leech stuff later today I've only half-read this backlog and yuki's question was an easier answer
hey if Expertium can't do it then I can't either lol
to reduce the space complexity you can just maintain the current value in a register and update the array in-place. a[i] affects a[i+1] in the next iteration which is why we just store the unchanged value in a register first
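A small sketch of that in-place trick, in Python for readability. The `carry` variable plays the role of the register holding the unchanged previous entry; the function name is made up.

```python
def poisson_binomial_pmf_in_place(probs):
    """Poisson binomial PMF using a single array of n + 1 floats."""
    n = len(probs)
    pmf = [0.0] * (n + 1)
    pmf[0] = 1.0
    for i, p in enumerate(probs, start=1):
        carry = 0.0  # old value of pmf[j - 1], saved before it was overwritten
        for j in range(i + 1):
            old = pmf[j]
            pmf[j] = old * (1.0 - p) + carry * p
            carry = old
    return pmf


print(poisson_binomial_pmf_in_place([0.5, 0.5]))
```

Iterating j downwards would also work without the extra variable, since then pmf[j - 1] is still untouched when pmf[j] is updated.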
Yeah, but I don't know C++ either 🤣
honestly though rust is similar enough to something like python that you can just pick it up, i hear good things about the package manager as well
it's only hard learning a new language when it's a completely different paradigm, like going from python to haskell
Cargo ❤️
I'm kind of amazed how bad Python has been in comparison for so long. It has been getting a lot better in recent years. I'm really liking UV.
Have you ever had a go at constraint programming? That's quite a fun weird one. Not something that you can use for everything though.
I haven't. The syntax on wikipedia seems similar to something you can achieve on haskell with the List monad and do notation
I often used Haskell to verify my combinatorics homework for this reason, nice syntax
I like how I'm getting credit now for work I didn't do
jarret did easy days cause I totally ghosted
also why is everyone here afraid of rust
its literally the easiest language because it doesn't let you do stupid shit
I used https://www.minizinc.org/ . It's quite interesting because you have to focus more on defining what a good solution looks like instead of how to get the solution.
MiniZinc is a free and open-source constraint modeling language.
do u plan to do anything again 🥺
u brought us some good stuff (LB)
I guess it was Load Balancer then. My point still stands: Easy Days and/or Load Balancer could've been implemented years ago if someone had enough expertise and enthusiasm
I've made a handful of small PRs
maybe yall should learn rust so you can leverage your own enthusiasm instead of praying someone randomly shows up and does it
we can write a wiki post on the forums titled "Cool Ideas to Implement: Needs Dev"
new dev comes here and we link it to them. then we sit down and just pray
at that point it would literally be easier to just do these things yourselves
here: I'll coach someone doing this leech thing
and like
- Plug them into The Mathematizer 9000
someone pls write spec for mathematizer 9000
and don't just say "its like the mathematizer 6000 but with more features"
Written by Claude 3.7
Python version:
```python
import numpy as np


def fast_poisson_binomial_pmf(p):
    """
    Calculate the exact PMF of the Poisson Binomial distribution using
    dynamic programming and vectorized NumPy operations.

    Parameters:
    -----------
    p : array-like
        Array of success probabilities for each Bernoulli trial

    Returns:
    --------
    numpy array of PMF values for k=0,1,...,len(p)
    """
    p = np.asarray(p, dtype=np.float64)
    n = len(p)
    # Validate input
    if not np.all((0 <= p) & (p <= 1)):
        raise ValueError("All probabilities must be between 0 and 1")
    # Handle trivial cases
    if n == 0:
        return np.array([1.0])
    # Initialize the PMF - we'll use a dynamic programming approach
    # pmf[j] will represent P(X = j) after considering the first i trials
    pmf = np.zeros(n + 1, dtype=np.float64)
    pmf[0] = 1.0  # Base case: probability of 0 successes with 0 trials is 1
    # Process each probability one at a time
    for prob in p:
        # For each new Bernoulli trial, we update the entire PMF.
        # The key insight: P(X=k after adding new trial) =
        #   P(X=k with no success in new trial) + P(X=k-1 with success in new trial)
        # The shifted copy keeps the old values, so nothing is overwritten early.
        # This is where the vectorization happens
        pmf_shifted = np.zeros_like(pmf)
        pmf_shifted[1:] = pmf[:-1] * prob  # Probability of success for this trial
        # Update PMF by combining the two possibilities
        pmf = pmf * (1 - prob) + pmf_shifted  # No success + success for this trial
    return pmf
```
```python
pmf_exact = fast_poisson_binomial_pmf(p_succ)
p_value_exact = sum(pmf_exact[0:n_succ + 1])
```
n_succ is how many successes there were in reality
cool can you turn that into rust for me
See the Google link
why not just ask claude expertium
woah, let's hope C3.8 gets that feature for us
btw, can I ask what happened with the hyperoptimise thingy?
@unique salmon prove ai isn't garbage and get an entire PR written only using ai
bet you can't
You mean preset grouping?
@spring adder
Give me 5 years 🤣
Then Github will support LLMs making PRs natively
cuz imo in the ideal future presets should be separated from params
wat
how
The whole point was to group decks into presets optimally
Not to do some...uhhhh...idk
Idk what you want
do u want 30 presets for your collection?
You mean presets? Why not
params aren't the
How do you separate parameters from presets?
only thing u change
I don't know how to do it on a code level ofc.
Forget about code level. Conceptually, how?
you'll have hyperoptimisation. you don't need to see the behind the scenes.
params will be optimised by one button for all decks.
or maybe invent "general-presets" and "param-presets" and make everything more confusing.
So you just want to have per-deck parameters, except decks are grouped, except those groups aren't presets?
Thats...a very strange wish
lmao true
the problem is sometimes I'm trying to change my sort order and now I have to go through 40 fuckin presets all because I was trying to make my scheduling optimal.
whats so hard about making a PR
The fact that I don't even know the basics of Rust
I linked that anki deck
I guess the general thought is you seem to have a lot of stuff you'd like in anki and yet won't do the thing that'll let you actually do those things
relying on me is unreliable!
I'll do a thing then disappear for months
I also rely on Jarrett. Two people = more robustness 🤣
like the only reason lb happened is cause I REALLY wanted it
I'm just saying, it's not as hard as you might think
The results of what I was attempting didn't look super promising, and I got bored.
There's definitely a benefit to grouping and splitting presets, and that process could probably be mostly automatic, but I don't really intend to look back into it.
I didn't dig too deep, but it looks like the first annoying thing would be that you don't have access to the revlog at the point where Anki currently marks leeches. You would have to work your way back until you find somewhere with access to the revlog and refactor everything in-between.
Do you have to convince Dae, or as long as it's backward compatible and togglable (like off by default) it's all good? 🤔
You mean the leech detector? As long as it's togglable, Dae won't object, I think
Oh I was thinking about some kind of being able to configure some triggers based on S/D/R... to trigger "Due" state
Could be fun with the B-W matrix showing you which class of cards (based on S/D) is over/underestimated by FSRS
The leech detector, I guess, is not that disruptive
What I'm describing is somewhat close to Filtered Deck, but it could be then plugged dynamically to things like the B-W matrix
So instead of scheduling based on R, it could schedule based on R/S/D based on those past observations
Typically, the LB and the Easy Days would be part of this "Scheduling Post-Processing"
In fact, we can even argue it's not "Post-Processing" but plain and simple Scheduling
The R < DR might just be another rule in this set of rules
Hmm not really in fact, LB/Easy Days are per nature "Post Processing"
It would break both LB and Easy Days
Well, not "break" per se, but make them much worse at doing their job
We should allow some small deviation from desired retention
But having that split in place could help having more information about an Initial Schedule and the Post-Processed one (because sometimes, you don't know if you get 5d because it's 3d + 1d LB + 1d Easy Days, and you get R=50% instead of 90% ....)
Yeah LB/ED I was wrong to consider them as the same as the Scheduling
You only know how to LB/ED once you already solved the scheduling aspect
Maybe we could allow a deviation of 25% in terms of odds or something
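For a feel of what "25% in terms of odds" could mean in practice, here's a tiny sketch. It assumes the deviation is multiplicative on the odds R/(1-R) of desired retention, which is just one reading of the suggestion; `retention_band` is a made-up name.

```python
def retention_band(desired_retention, odds_deviation=0.25):
    """Return (lowest R, highest R) when the odds may move by +/- odds_deviation."""
    odds = desired_retention / (1.0 - desired_retention)
    low_odds = odds * (1.0 - odds_deviation)
    high_odds = odds * (1.0 + odds_deviation)
    # Convert odds back to probabilities: R = odds / (1 + odds)
    return low_odds / (1.0 + low_odds), high_odds / (1.0 + high_odds)


low, high = retention_band(0.90)
print(f"DR=90% -> allowed R between {low:.1%} and {high:.1%}")
```

With DR=90% the odds are 9, so the band runs from odds of 6.75 to 11.25, i.e. roughly 87% to 92% retention.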
https://github.com/open-spaced-repetition/fsrs4anki-helper/issues/419#issuecomment-2359076992
IMO the threshold with rescheduled should be based on "How low would be my Target R if I don't reschedule it now ?"
I investigated this a bit since I do reschedule a lot
@unique salmon what exactly is the calculation to find leeches for your idea? is it to find cards in the bottom 1% in terms of total failures?
And in general, it's a lot of big interval, like 6 month becoming 3 month, but in fact, the new Target R would be ~70% instead of 80%, since the stability is very very high in the first place
IMO, we should have a leech detection better suited than the number of lapses
Having N lapses is not really a measure of a leech in FSRS
Pretty much, yes
See this
We add up the probabilities of 0, 1, 2...k successes, where k is the real number of successes
Which gives us the probability of failing this card n-k times or even more times
ok just a small concern, this would probably flag more than 1% of cards since the rarity of cards behaves as some sort of random walk and cards can fall below the 1% threshold (and come back over it) over time, so this idea requires some more investigation first
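One way to look at this concern empirically is a toy simulation. It assumes a perfectly calibrated scheduler where every review happens at the same R, and it applies the bare p(successes<=k) check after every review with no "once per 3 reviews" rule. All names and default values are made up for the sketch.

```python
import random


def simulate(num_cards=2000, num_reviews=30, r=0.9, cutoff=0.01,
             min_reviews=3, seed=0):
    """Return (fraction of cards ever flagged, fraction flagged at the end)."""
    rng = random.Random(seed)
    ever = final = 0
    for _ in range(num_cards):
        pmf = [1.0]  # distribution of the success count so far
        successes = 0
        flagged_once = flagged_now = False
        for n in range(1, num_reviews + 1):
            # Extend the PMF by one more review with recall probability r
            nxt = [0.0] * (len(pmf) + 1)
            for j, mass in enumerate(pmf):
                nxt[j] += mass * (1.0 - r)
                nxt[j + 1] += mass * r
            pmf = nxt
            successes += rng.random() < r  # the review outcome is truly random here
            if n >= min_reviews:
                flagged_now = sum(pmf[: successes + 1]) < cutoff
                flagged_once = flagged_once or flagged_now
        ever += flagged_once
        final += flagged_now
    return ever / num_cards, final / num_cards


ever_flagged, flagged_at_end = simulate()
print(f"ever flagged: {ever_flagged:.2%}, flagged at the end: {flagged_at_end:.2%}")
```

Comparing the two numbers shows how much the "falls below the threshold and comes back" effect adds on top of the share of cards flagged at any single point in time.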
- Make it so that at least 3 reviews (without the first review) are required
- Allow a change in the leech status only once per 3 reviews
So if a card has been tagged as a leech, it cannot be un-leeched for the next 2 reviews
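The "only once per 3 reviews" rule could look something like this. It's a sketch with made-up names (`LeechStatus`, `cooldown`), not how Anki stores tags.

```python
class LeechStatus:
    """Tracks a card's leech tag and limits how often it may flip."""

    def __init__(self, cooldown=3):
        self.cooldown = cooldown
        self.is_leech = False
        # Start "cooled down" so the very first change is allowed immediately
        self.reviews_since_change = cooldown

    def update(self, detector_says_leech):
        """Call once per review with the raw detector verdict; returns the tag state."""
        self.reviews_since_change += 1
        wants_change = detector_says_leech != self.is_leech
        if wants_change and self.reviews_since_change >= self.cooldown:
            self.is_leech = detector_says_leech
            self.reviews_since_change = 0
        return self.is_leech


status = LeechStatus()
# Tagged on the first review, then the detector changes its mind right away:
print([status.update(verdict) for verdict in (True, False, False, False)])
```

In this example the tag survives the next 2 reviews and is only removed on the third one after tagging.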
Oh, and yes, we would need to code the un-leeching part from zero
Right now Anki can automatically tag cards as leeches, but not automatically remove the tag
i mean it requires some proper investigation in terms of the memory model. Suppose that D doesn't exist in FSRS, then you would actually expect every single card to eventually become a leech at some point in their lifetime, but i'm not sure if this is the behaviour that you want
and now let's reintroduce D. Make the assumption that D is computed solely based on the first few reviews. Then on the 10th review and on, an easy card can very easily become a leech since it rolls the same dice as the high difficulty cards
the DR formula doesn't include D or anything
I'm really not sure what you're trying to say
just a retention based formula is not enough to find leeches
We're not using DR though, we're using R at the time of the review
Picture this, you have a tree that models card histories. Going right corresponds to a pass, going left corresponds to a fail. So suppose we sampled the nodes at 4, 5, 6, 7 for fail fail, fail pass, pass fail, pass pass. Now also suppose that 4 < 5 < 6 < 7 in terms of card easyness. This is reasonable in terms of the review history, 6 and 7 were passed on the first review, 4 and 5 failed. But your method would treat 4 and 6 as having the same rarity
(just suppose that each decision point is 50%)
so D must be used as part of the formula, not just R
or just use D only? technically it has the right interpretation
How would you use D if the detector is based on probabilities?
D is not a probability
But your method would treat 4 and 6 as having the same rarity
No? The detector is based on the entire history of the card (or the last 64 reviews, whatever)
2-4 = 2 fails
3-6 = 1 fail
2-4 -> left-left -> fail-fail
3-6 -> right-left -> pass-fail
4 = fail fail
5 = fail pass
6 = pass fail
7 = pass pass
in this example we suppose that these corresponds to 4 separate cards
ah yeah i miscounted but yeah 5 and 6 has the same rarity here
but all you need is to add another layer to the binary tree to make even weirder results
the point of this exercise is to show that counting failures does not preserve the order of the elements
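To make the tree example concrete, here's a sketch that decodes node numbers into review histories. It assumes heap-style numbering (root is 1, left child is 2i and means fail, right child is 2i+1 and means pass), which matches the 4/5/6/7 labels listed above.

```python
def path_from_node(index):
    """Decode a node number into a list of 'fail'/'pass' steps from the root."""
    bits = bin(index)[3:]  # drop '0b' and the leading 1 that marks the root
    return ["pass" if bit == "1" else "fail" for bit in bits]


for node in (4, 5, 6, 7):
    history = path_from_node(node)
    print(node, " ".join(history), "-> failures:", history.count("fail"))
```

Nodes 5 and 6 end up with the same failure count even though the example assumes 5 is the harder card, which is the ordering problem being pointed out.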
in this one, counting failures suggests that the review history of the red line is not as bad as the blue line
blue has 2 failures, red has just 1
one simple thing to try is to find the distribution of D and just count the bottom 1% as lapses
You mean leeches?
Meh, then we're back to just counting without taking the probability of recall into account
Since D doesn't depend on R
Finding cards with the highest D is so strongly correlated with counting Agains it might as well be the same thing, up to a constant
sounds like a separate problem for FSRS if it cannot fine tune D based on R
that's what I'd expect but this way also saves easy cards from becoming leeches with 100% probability in the long run
I still don't get why this would happen
Easy cards won't have a small enough number of successful reviews to get tagged
sure but that relies on humans not using anki long enough. and remember this is just a worst case example that shows that the method is wrong. how else could it go wrong? what reason do we have to believe that it is even reasonable to use? that's why you should investigate more
sure but that relies on humans not using anki long enough.
?
If you mean that a card can have an unlucky streak just by accident, sure, but as long as the rest of the review history is normal, the number of successes will still be high enough for it to not get tagged
I mean, I guess it's theoretically possible for a normal card to fail 64 times in a row, but I bet that will never happen
just look at the binary tree example. it shows that easy cards can easily get tagged with high probability
you should use the fsrs simulator or something
but idk what metric you would go for to count proper leeches other than just the bottom 1% of D lol
Alright, assume a perfect scheduler that always schedules a card at exactly R=90%. Suppose we did 2 reviews.
The card always has a 90% chance to go "right" and a 10% chance to go "left". So in the end there are 4 possible outcomes:
- Left-left: 1% chance
- Left-right: 9% chance
- Right-left: 9% chance
- Right-right: 81% chance
Explain what's wrong
@polar maple
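The two-review example can be enumerated directly. This sketch assumes R is exactly 0.9 at both reviews, as in the example, and then groups the four paths by number of successes since that is all the detector looks at.

```python
from itertools import product

R = 0.9  # assumed fixed probability of recall at every review

paths = {}
for path in product(("left", "right"), repeat=2):
    prob = 1.0
    for step in path:
        prob *= R if step == "right" else 1.0 - R
    paths[path] = prob

# The detector only sees the number of successes, so group paths by that count
by_successes = {}
for path, prob in paths.items():
    k = path.count("right")
    by_successes[k] = by_successes.get(k, 0.0) + prob

for k in sorted(by_successes):
    print(f"{k} successes: {by_successes[k]:.2f}")
```

Left-right and right-left land in the same bucket (9% + 9% = 18%), so the detector treats them identically.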
apply the same logic to the bigger binary tree here and you would wrongly find that the blue line is more of a leech than the red line
I still don't see the problem
Card 11 (or card in the state 11, whatever) is more of leech than card 10 because 10 has two successes and 11 has one success
Why would this be wrong?
here we make the assumption that cards from left to right are in decreasing difficulty; this makes sense when you examine individual reviews, cards that failed the first review are all of a harder difficulty than all cards that passed the first review
while this isn't a correct assumption this example shows that your idea isn't correct as-is
also since card 11 passed the first review, you would also expect the intervals that it uses to perhaps be longer than the ones in card 10
this is definitely true in the case of FSRS
i guess it boils down to this
give me evidence that your idea would work well
don't expect me to disprove it
it isn't mathematically correct or anything
so at least show that it works well empirically
Well, the current approach in Anki is based on just counting Agains. This doesn't take into account the fact that pressing Again when R is high is a pretty different situation from pressing Again when R is low. The former is surprising, the latter is not. So this method would be more precise because it takes the probability of recall into account. Of course, if FSRS sucks at predicting probabilities, this will suck as well.
As for how many cards will be tagged as leeches, we can use some threshold, like 1%. If the user has a large number of leeches, in reality more than 1% will be tagged as leeches. The more leeches - more precisely, cards for which FSRS consistently overestimates R - the more cards will be tagged as leeches, more than 1%.
Whether this results in a satisfactory user experience is something that we won't know until we implement it.
replacing a bad method with another bad method isn't satisfactory especially when we have no reason to believe that this new method is any good
I literally just explained why it's better - because it takes into account the probability of recall
then here's another one: take the bottom 1% of D. Why is yours better than mine?
Failing a card 3 times at 99%, 99%, 99% is clearly worse than failing a card 3 times at 70%, 70%, 70%
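Putting numbers on that comparison, assuming the reviews are independent and the predicted R values are taken at face value (`prob_all_failed` is a made-up helper):

```python
def prob_all_failed(recall_probs):
    """Probability of failing every review, given the predicted R for each."""
    prob = 1.0
    for r in recall_probs:
        prob *= 1.0 - r
    return prob


at_99 = prob_all_failed([0.99, 0.99, 0.99])
at_70 = prob_all_failed([0.70, 0.70, 0.70])
print(f"three fails at 99%: {at_99:.6f}")
print(f"three fails at 70%: {at_70:.6f}")
```

That's 0.01^3 = 0.000001 against 0.3^3 = 0.027, so the same three Agains are far more surprising in the first case, which a plain Again counter can't see.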
Because D is not a probability, and is not directly related to it. So we're back to counting Agains, just in a roundabout way
i dont see why D has to be a probablity
i have never disagreed about this, but imo pass pass pass fail fail fail is not worse than fail fail fail pass pass pass especially when you look at the intervals that would be involved with these cards
This is just another interpretation of what a leech can be.
In this sentence, it shows you expect a leech to be the hardest card.
Expertium expect a leech as a state a card is when multiple reviews start to diverge far from what would be normal if R~=DR.
In your case, "leech = difficult", in expertium case, "leech = off the predictions"
The most difficult card, with R=DR most of the time, would be a leech for you, not for Expertium
My opinion is, the current leech definition is just worse than any of those 2 interpretations
yeah, i dont see why leech = difficult is not the goal here. we want to identify cards that would take too much effort to learn
Because given enough time, lapsing N times is just normal with FSRS
and leech = off predictions can easily happen for easy cards by just random luck as i have demonstrated in my examples
Personally I think historically, there was always a difference between a leech and a card with high difficulty, so I think it's intuitively different case
It can be hard, but your predicted R might be matched
Yep, that's the crux
It's also interesting to know, what cards can't be matched correctly to R
It's 2 different questions
This is why Philosophy is sometimes useful haha
If a card is insanely difficult subjectively, but gets successfully recalled roughly as often as we expect it, then it's not a leech under my definition
So maybe we could have different leech detectors
Anki will look like a boeing cockpit but it's all fine
it's fun
even in FSRS, D is not this 'subjective difficulty'
However, I'd argue D can't be used. The "neutral" point for D, will be higher for lower DR
D work really for similar DR
Also I noticed D has like multiple class with multiple normal distribution
You see this
you think "It's like an exponential"
but no
It's different curves
like different clusters or difficulty
So "leeches" could be the rightmost part of each curves
That's because it gets updated by a (roughly) fixed number that depends (mostly) on the grade
do not confuse issues with FSRS with this, we now know that FSRS is not actually a very good prediction model when it gets beaten by a simple moving average
This is FSRS "D" concept
if 100 people flip a fair coin for long enough, eventually all of them will have the lowest (bottom 1%) count of total tails
But we can then un-leech the card later
The tag can be removed as new reviews come in
Btw, this is another advantage over the current method, where the road to leeches is one-way 🤣
i would rather not be wrongly notified of a leech
Well, he'll get an alarm "You got 1% unlucky here !", he'll check it, he'll move past
Also, with the "max D" solution, it will also happen
he got unlucky, press again too much time in a row -> max D
then what is the point of leech detection if the detections are of no value?
is it just another false positive?
how can i trust it?
We can choose a threshold such that there will be very few false positives
made for user that will have a chance to check what cards underperformed, and assess themselves the reasons
1% or 0.1% or 0.01% or whatever
Also, we're not flipping coin here
exactly, my first point i brought in is that a 1% threshold will flag more than 1% of cards
We're assessing memory
It's not because R=90% that TRULY the memory has a 90% chance of getting the value
WE estimate it to be 90%
if he gets it 3 times in a row wrong, it's not just bad luck
It's bad memory
So the interpretation is totally different than a coin flip
We think he will get it at 90% with 60d stability ? Nop
30d ? Nop
5d ? Nop
It's way more than being unlucky
There IS a reason for this sudden loss of memory
it's not just a flipped coin
But that's not a criticism of the method itself
it kind of is, i want a statistical test or something instead
This could be configurable also
We can make the threshold 0.2% if that makes you sleep at night better
Or 0.1%, whatever
i guess i disagree, memory can be random, if a card had 3 passes that brought it to 60d and now 3 failures brought it to 5d, it is probably an easy card but there is some interference somewhere, whereas a card that struggled at fail fail fail pass pass pass and is now at 5d as well, will probably grow much slower in the future and also encounter more failures
Sure, Interference could aggregate with Leech in this algorithm
Unfortunately interferences are a bit difficult to find out
Maybe another term than "leech" could be better for sure
But the current "leech" (lapse >= N) would have to disappear then
So it's not really a criticism of Expertium's proposal, but a criticism of Anki's own choice of using Leech as a concept
also i still don't understand why R != DR is even the goal lol, you can predict the distribution of R assuming that FSRS is a good prediction model, but how would you even interpret the bottom 1%? Is it just bad luck or something else? Whereas a high D even in FSRS has a more direct interpretation: these cards will have their intervals grow slower, so they are probably harder
One could even argue "Leech" could just mean "It leeches your workload for very low returns", computing something like "Utility*Stability/Reviews"
hmmmmmm i wonder if D does this...
🤔
Unfortunately no
yes it does, high D cards have their stabilities grow slower
How would you define utility?
You're nitpicking the concepts you find useful or not
You can't really, thus why I think "Leech" in that interpretation would be useless
You'd just proxy a vaguely defined term "leech" with another "utility"
"Hard" cards and "Out-of-distribution" card could be better name than "Leeches"
But to me, "Leech" as it is right now is even worse, it's just useless
(The Lapse > N)
For SM2 it can make sense though
Anki having to maintain SM2+FSRS requires some flexibility in terms of interpretation if you want to keep the same UI and options
if "Leech" needs to be amalgamated with "Out-of-distribution results", it's fine by me
@unique salmon what is your interpretation of cards that would be in the bottom 1%?
certain interpretations might even lead you to develop a formula for FSRS
Alright, how about a really dumb compromise - find cards within the bottom 5% D AND with <5% p(successes<=k), where k is the current number of successes
In other words, find cards that are leeches according to both methods
Please don't nitpick the thresholds, btw
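A sketch of that compromise. It reads "bottom 5% D" as the 5% of cards with the highest difficulty, and it assumes each card is a dict with hypothetical keys `id`, `difficulty` and `p_value`, where `p_value` is the already computed p(successes<=k).

```python
def combined_leeches(cards, d_fraction=0.05, p_cutoff=0.05):
    """Return ids of cards that are leeches according to both methods."""
    ranked = sorted(cards, key=lambda card: card["difficulty"], reverse=True)
    top_count = max(1, int(len(ranked) * d_fraction))  # at least one card
    hardest = {card["id"] for card in ranked[:top_count]}
    return [
        card["id"]
        for card in cards
        if card["id"] in hardest and card["p_value"] < p_cutoff
    ]


# Made-up data: 20 cards where difficulty grows with the id
cards = [{"id": i, "difficulty": float(i), "p_value": 0.5} for i in range(20)]
cards[19]["p_value"] = 0.01  # hardest card also fails more often than predicted
cards[3]["p_value"] = 0.01   # easy card with an unlikely history

print(combined_leeches(cards))
```

Only the card that is both among the hardest and has a low p-value gets through; the easy card with an unlucky streak does not.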
also, this idea could pretty much add a new dimension to the usual DSR models, if the running history likelihood is actually important, surely you could add it to DSR and get DSR + H or something and improve FSRS?
Cards for which the forgetting curve/FSRS formulas just don't work
D is not really comparable with different DR though
ok then i will stop talking, this is not the definition of a leech to me, leech is a hard card that i keep on forgetting
"Leeches are cards that you keep forgetting. Because they require so many reviews, they take up a lot more of your time, compared to other cards."
seems reasonable
With Lower DR you get Higher D, so it's not really working well unfortunately
Higher "Neutral" D let's call it like that
D is not influenced by DR
So the more you fail, the more the "balance" goes close to 100%
so DR=60% just by nature will have higher D than DR=90%
Yeah, but Alex wants to look at cards with relatively high D, relative to other cards from the same preset
Blue is 1 fail 1 good
Red is 3 Good 1 Fail
the balance point will be higher for blue
hmm ok
I'm looking at my top D now
Top 1.4%, I have this :
1 fail, 2 hard
Now I look at my top performer, D=82%
2 fails in fewer reviews
Not that convinced about D
But yeah, basically it compounds over multiple lapse
You can easily fail 3 times in a row and still be considered "easier" than a card that fails from time to time
IMO in those case, The probability detection is better
Since D will get higher and higher at each lapse, the "High D = leech" comes back to "The more you lapse, the more it's a leech"
Which is stupid
You ask a lot that we should justify those probability detection
But I start to feel you should start to justify it a bit more :/
wait a minute, i'm not the one trying to add a new feature here
In a perfect world with a perfect D, might make sense, but it's not the D we have
Fixing a broken feature with something at least a bit useful*
Also, don't really have to justify it to you
for these two examples, isn't the second picture of lower difficulty? i mean visually it seems that it reached 22 days stbility much faster than the first picture, the first picture seems to indeed be more difficult
It did because by nature, Difficulty starts way lower initially
So it got to 22d stability "just because"
(In my optimization though)
yes, because the learning steps indicate that it was an easier card or something like that
(Maybe other start high ?)
tbh i'm very confused about your example, it does not paint a bad picture about difficulty at all
@unique salmon maybe you can explain it?
Idk what to say, other than "look at the FSRS formulas"
i mean, why would this example in particular be an argument against difficulty
those histories seems to be modelled by D well
The thing is, the more you lapse, the higher the D will become, the lower the stability will be.
Problem is, my top 1.4% most difficult card is not per se a very difficult one, it's just one that lived long enough to have many lapses
i'm sure you can find other examples that paint D in a bad light but these examples just aren't it
But having lapses, is perfectly healthy
FSRS is predicting me 80% success rate, over time, having 10-15 lapses is just perfectly normal
but those, will get incredible high D
Compared to card I might fail more, but in sooner lapses
Yeah, since "reversion to the mean" (as Jarrett calls it) is very weak for most users, it takes literally thousands of "Good"s to undo one "Again"
Going to sleep though, but you get the idea
So D becomes just "an Again counter"
as Sound suggested if we continue with this idea we would need to rename 'leech' since it isn't what most people expect anymore
this is the definition of leech that most people expect right now
i really struggle to see the problem
Do we need to, though? I mean, with my method a "leech" is still a very hard card, broadly speaking
"A card that you fail more often than expected"
"A card that you fail more often than expected" and "A hard card that you keep on forgetting" seem like the same thing with different wording, no?
The only situation we disagree on is if a card feels difficult subjectively, yet the number of lapses and successes is in line with what is theoretically expected
wait a moment, you defined your way out of this situation just earlier when i suggested if you can use the knowledge of history likelihood to improve FSRS formulas
if the history likelihood actually matters then you can improve FSRS right?
otherwise it doesn't matter and its just a useless metric
You mean using the history of most recent cards, not just this specific card?
Idk how we would use that in FSRS
i mean this specific card
Uhhh...then I'm not sure what do you mean
if the probability of this card's reviews matters then by all means incorporate it into FSRS
Do you mean something where the order matters?
Because in my current method fail - pass - fail is treated the same way as fail - fail - pass
So the idea of leech detection is that we find some signal that suggests that future reviews of this card will be difficult in some manner. But this metric, if it is insightful, should be able to be added as a formula into FSRS to improve predictions
so one way to show if this is actually a useful signal, the likelihood of the review history, is to see if you can find any formulas that use this value
if you can, then you have found an improvement to FSRS. If you cannot then the metric is not insightful
Ah, ok. So you want me to try to incorporate this into the formulas themselves. Interesting. I'll think about it.
Idk how the hell I'm going to do the math, though
Like, with torch
i'd guess you need to do some plotting and then make some guesses
And also this means that we would have to store every R value in the memory state, Jarrett is not going to be happy about that
hmmm. Compute DSR with FSRS, add this historical likelihood thing as H, have a nn print out a forgetting curve from these 4 values
try this without H as well after
to compare
ask claude to update the numpy code to do it in parallel with another dimension
then ask it to convert it into torch
Actually, the more I think about it, the more it seems like a nightmare. Doing Poisson binomial PMF stuff and storing every value of R...man...
actually if its just to find the likelihood of the history you don't need poisson binomial pmf, you just multiply all the probabilities together
you don't need the bottom 1% or anything like that in this case
you just need the exact probability, which is easily computed by just multiplication
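A minimal sketch of the "just multiply all the probabilities together" idea, assuming we already have FSRS's predicted R for each review of one card. The function names and the sample numbers are made up for illustration; none of this is FSRS code.

```python
import math

def history_likelihood(predicted_r, passed):
    """Probability of seeing exactly this pass/fail sequence.

    predicted_r: predicted recall probability at each review (each in (0, 1))
    passed:      True if the review was a pass, False if it was a lapse
    """
    likelihood = 1.0
    for r, ok in zip(predicted_r, passed):
        likelihood *= r if ok else (1.0 - r)
    return likelihood

def history_log_likelihood(predicted_r, passed):
    """Same quantity in log space, which avoids underflow on long histories."""
    return sum(math.log(r if ok else 1.0 - r) for r, ok in zip(predicted_r, passed))

# Hypothetical card: three reviews, the second one was a lapse.
rs = [0.9, 0.8, 0.85]
grades = [True, False, True]
print(history_likelihood(rs, grades))
print(history_log_likelihood(rs, grades))
```

The raw product shrinks with every extra review, so two cards with different numbers of reviews are not directly comparable on this value alone.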
No, I need bottom 1%
ok sure, but it will prob make a nn have a harder time for the DSR + H idea
Ain't no way I'm making an nn compatible with the benchmarking code, mate
Ain't no way
nvm about this, the raw likelihood is useless, we care more about relative likelihoods so yeah poisson binomial pmf it is
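For the relative version, here is a rough sketch of a Poisson binomial PMF over the number of lapses, built with a simple dynamic program. It assumes each review is an independent pass/fail with the predicted R as its pass probability. The helper names and the "tail probability of at least this many lapses" reading of "bottom 1%" are assumptions, not the method actually used in the thread.

```python
def lapse_count_pmf(predicted_r):
    """PMF of the total number of lapses, given per-review recall probabilities."""
    pmf = [1.0]  # pmf[k] = probability of exactly k lapses so far
    for r in predicted_r:
        p_fail = 1.0 - r
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * r           # review passed: lapse count unchanged
            nxt[k + 1] += mass * p_fail  # review failed: one more lapse
        pmf = nxt
    return pmf

def prob_at_least(predicted_r, observed_lapses):
    """Probability of seeing observed_lapses or more lapses by chance."""
    return sum(lapse_count_pmf(predicted_r)[observed_lapses:])

# Hypothetical card: ten reviews predicted at 90%, four of them failed.
rs = [0.9] * 10
print(prob_at_least(rs, 4))
```

Only the count of lapses enters here, so fail - pass - fail and fail - fail - pass give the same result. It also needs every past R, which is the storage cost raised earlier.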
Ok, screw it, I highly doubt I will be able to implement it. You can ask Jarrett
Unfortunate. But if this historical likelihood has rich information then such a nn should get significant performance boost from it so it should be investigated
otherwise the leech idea isn't promising
You can incorporate it into something like an LSTM as an input feature
And see if it helps
Sorry if I'm not being helpful here
This sounds pretty interesting, actually. It would make it self-reflective, in a way. Like, "ok, I see that my own predictions are off, I need to account for this fact"
With FSRS I can do a simplified version - just a moving average of abs(R - binary grade)
And then see if I can turn that into some sort of multiplier or something
Or maybe -ln(1-abs(R - binary grade))
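A small sketch of both candidate signals, assuming "binary grade" means 1 for a pass and 0 for Again. The thread doesn't say what kind of moving average is meant, so the exponential one below and its `alpha` are assumptions. One thing worth noting: -ln(1-abs(R - binary grade)) equals -ln(R) on a pass and -ln(1-R) on a fail, i.e. the ordinary per-review log loss.

```python
import math

def surprise(r, passed):
    """Return (abs error, log-loss form) for one review with predicted recall r in (0, 1)."""
    err = abs(r - (1.0 if passed else 0.0))
    return err, -math.log(1.0 - err)

def moving_average(values, alpha=0.3):
    """Exponential moving average; alpha is a made-up smoothing factor."""
    avg = None
    out = []
    for v in values:
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

# Hypothetical review history of one card: (predicted R, passed?)
history = [(0.9, True), (0.85, False), (0.9, True), (0.8, False)]
errors = [surprise(r, ok)[0] for r, ok in history]
print(moving_average(errors))
```

Unlike the PMF approach, this only needs one extra running value per card rather than the full list of past R values.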
I've actually tried incorporating this into the update of D as an extra multiplier, but it didn't do anything good
Maybe with some more parameters and with using it for S instead of D it could be useful
Maybe it needs to be its own variable
I mean, instead of just a modifier for D
Well, at least a moving average of abs(R - binary grade) is workable, I can do it, unlike the PMF and all that stuff
a problem about this specifically is that LSTM could compute something similar internally even without it being given explicitly, that's why a nn would need only the 4 values DSRH, and no other input about the history of that card
since LSTM is given the full history of the card it has the same information required to compute H
(lmk if you want to call it something else btw, this historical likelihood thing)
🤣 This thread will become research references for spaced repetition.
too bad discord is where information goes to die
I'm asking my friend to develop a daily summary bot for our thread
He has developed a bot for telegram: https://github.com/asukaminato0721/telegram-summary-bot
can't wait to see how it generalizes my chat activity
"jake continues to refuse to help"
what do you think about jake as a person
gemini only let me paste half the chat XD
it missed the sarcastic jake
thats the only jake there actually is
I made a flowchart so that I don't have to type out the same thing all the time 🤣
Thoughts?
TBF, I'd just put in the bottom "Or just wait to have more reviews before optimizing like crazy"
Also, first step would be "Is your Retention around your Desired Retention (~10% ballpark)". Yes -> Intervals are OK
"Do you have fewer than 10K reviews" -> Review more
"Do you change your DR all the time" -> Stop
When people say "There is NO way this interval makes sense", they tend to forget that FSRS didn't come up with that interval on its own, it just read your history and that's what it saw
10% is reasonable if your DR is 70%, but not 95%, for example
So it's hard to say how much deviation is ok
how bad is it that when
i optimize with FSRShelper addon, and it gives me more cards to do, then i do it, but if it reduces the cards i have to do, then i undo the optimize
i always feel sketched having less to do idk why
I mean, with Stability and LB considered, a 95% DR could unfortunately translate into a 70-80% R, so your average retention on those days might be way lower
The higher the stability, the less of a problem it will be, but with low stability, for example if you suddenly added a lot of new cards/day and ~50% of your cards have stability <1d, you'll see a bigger difference
It's not a bug or a problem in itself, since it will correct itself with higher stability and a bigger deck, but it's still something to keep in mind when differences occur
i have average card stability of 18 days with 93% dr on a deck from october 1 with avg of 8ish new cards a day avg difficulty 77%
As you can see, even if my DR is 84%, my Target R for many cards can be around 71-80%, sometimes just because they have very low stability, sometimes because the LB or rescheduler pushed them a bit too far
with only 8 cards being after 30 days out of 900
I wonder if we should tweak LB a bit. Here's the formula
Maybe we should make it more aggressively schedule cards earlier by using the square of the interval length (or something like that) in the weight
And your daily retention is at how much ? (The Actual one) ? And your RMSE ?
Ok !
this is on my end
To be honest your situation is quite good
There are symptoms that will show you if something is not right
That's why I wanted the average stability so badly: it's a good sign of whether too many cards are added every day, so that stability can't be built
I think keeping a high DR is also a smart move
I made the mistake of lowering it over time to be able to add more new cards/day, and I really shouldn't have
So after X lapses you put them in a leech deck, do you do something specific with them ?
last month was when i put them into their own leech deck and made it optimize with their own deck options, currently went from 13 rmse to 11 now
Different DR ?
Never thought of it but that can be quite good actually
It dropped your RMSE for both ?
It's OK, I'll experiment and search a bit
but as far as i remember yes
What I do is I do some Filtered Decks to manipulate when some cards are due
but it still pollutes my parameters, potentially
in the normal deck it was like 3% and being stubborn to stay around there, then when i separated the leeches out
Leech deck was like 14% ish
after a month, the normal deck is at like 2.5% right now, and leech deck is currently at
10.77%
but there are not so many reviews
only 600 in the past month
compared to the main deck which is like 6000
Yeah it's also difficult to compare because potentially, the RMSE with the old parameters on the non-leech cards would still have been lower than 3% (if the leeches were the ones messing with the RMSE, while not necessarily being optimized on)
But at least now you have a "proof" that for non-leech cards, your RMSE is quite low
let me check one of my language learning decks
.3170, 2.85% including leeches
after optimization
.3282, 2.34% excluding leeches
i do not have a leech deck for the language learning deck, probably i should tho
Would be fun @unique salmon some kind of "Multi-class FSRS", but I guess we're reinventing neural networks here
- Cluster cards by difficulty rating
- Create different parameters and optimize those for those different difficulty class
By zooming in on my difficulty graph (and increasing granularity), I noticed that there are a lot of smaller normal distributions of difficulties:
actually i think i know why and its my own fault
because after 4 months of working TL -> NL i made a note type to make that deck NL -> TL
and so i guess it copied the tags
how can i do this
There's no easy way, I have a difficulty viewer branch of the addon and I'm tweaking in the code directly for now
(GitHub repo: JSchoreels/Anki-Search-Stats-Extended)
I think I can always find a local build with that view
but it's on another user session so I'll take a look later to upload it or to improve it so it can go in the main branch
but for now it's just personal stuff
okay no worries
probably mine will be like that
would you prefer me to send u the deck
im genuinely curious what mine would look like
if i export with include scheduling information + deck presets, does it keep statistics ? it should right
I think it will be easier if I just send you a local build when I have it
haha okay
I'll check on my private session later
no worries
I'm on another one right now
It seems like a good idea!
@quasi shadow https://github.com/open-spaced-repetition/load-balance-simulator
Is this code up to date? I mean, have there been any changes to Anki's code that are not reflected here?
https://github.com/ankitects/anki/commit/69e699dc134419112956209a67cb0d62380d27cd
There was this change, but as far as I can tell it doesn't touch the load balancer itself, only Easy Days. So I assume the code from the repo above is up-to-date
the initial easy days impl was trying a bit too hard to force the graph into a certain shape and that just sorta lessened that effect
theres some extra multipliers in the logic for siblings and easy days but yeah the code in the comment above the lb is still correct
anyway re: lb biasing further to earlier days, is it necessary?
it already (if in a vacuum and days have the same amount of cards scheduled) will prioritize an earlier day. cards due naturally sort of gravitates to a 1/x curve. are the specific numbers of this not working properly? is it not 1/xing optimally?
According to Yuki and Sound, no
but really my actual question: given how it already will prioritize an earlier day, how would this cause problems the original fuzzer would not
call me when they double-blind some tests, yuki already had a mental bias against it before it even was in anki. sound has real numbers at least
but my point point point is: can someone create a measure that can be tested or at least have a sample size of more than two people?
Nonetheless, I'll run simulations to see if I can both reduce volatility AND bring the average retention closer to the desired value
oh for sure
The thing is that because (1 / (cards_due))**2 is squared, it has a huge impact on the weight and it "outshines" the (1 / target_interval)
Current implementation only prioritizes earlier days if earlier day and further day have the same card count (or near the same) - so the (1 / (cards_due))**2 value is the same.
It is obvious that (1 / (cards_due)) and (1 / target_interval) variables need to be raised to the same power to accomplish fair LB
So yes, the point is that these two variables need to be in the same power.
so I think theres a bit of a misunderstanding? in (1/cards_due)**2 the ^2 makes it smaller, not larger? 1/2 * 1/2 = 1/4
though the numbers are the real numbers, perhaps normalizing those numbers would be better
either way, "prioritizes earlier days if earlier day and further day have the same card count (or near the same)" yes most due graphs look like this and so it should end up being no different than the normal fuzzing routines in the long term
Sorry if I was not clear about it, but yeah, the fuzzer would have the same problem
But what about the case when user's due graph looks like decreasing exponent (y = 1000/x, x>= 5, for example)?
Further days have smaller card count.
I am a little confused, because I messed up the calculations. It seems like the current formula weight = (1 / (cards_due))**2 * (1 / target_interval) already prioritizes target_interval: because cards_due is squared, it has less impact than target_interval, right?
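To make the back-and-forth concrete, here is a quick sketch that plugs numbers into the formula exactly as quoted in the thread, weight = (1 / cards_due)**2 * (1 / target_interval), and normalizes the weights into probabilities. The function name, the sample card counts, and the guard for empty days are assumptions for illustration; the real load balancer also has extra multipliers (siblings, Easy Days) that are left out here.

```python
def lb_weights(candidates):
    """candidates: list of (interval_days, cards_due) pairs, one per candidate day.

    Returns the normalized probability of picking each candidate day.
    """
    raw = []
    for interval, cards_due in candidates:
        due = max(cards_due, 1)  # assumption: avoid dividing by zero on empty days
        raw.append((1 / due) ** 2 * (1 / interval))
    total = sum(raw)
    return [w / total for w in raw]

# Same load on every day: the earliest day gets the largest share.
same_load = lb_weights([(3, 50), (4, 50), (5, 50)])
# Day 3 has twice as many cards due: it drops to the smallest share.
uneven = lb_weights([(3, 100), (4, 50), (5, 50)])
print(same_load)
print(uneven)
```

With these made-up numbers, doubling the card count on a day cuts its raw weight by a factor of 4, while moving from day 3 to day 4 only cuts it by a factor of 4/3. So in this form, differences in card count move the weight more than a one-day difference in interval.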
So basically my point personally is just : If I have a card with low stability, I'd prefer to not take an extra hit with LB/Fuzz
Typically in this example, my DR was 84% when I did it, the Target R will be 79% on March 14th, but it will already drop below my DR tomorrow (since it's 85% today)
It's thus a bit silly, because one of my beloved Filtered Decks marks cards with R<DR as due ("deck:Japan::1. Vocabulary" prop:r<0.844 -is:due)
(Yeaaah I also use more than 2 decimals lol)
But doing so, my "Future Target R" graph is just perfect
Without it, my average Target R would be, every day, ~5% lower than my DR
Is it a big deal ? A bit, look how my weekly Retention is way more stable than before
Before doing so, I would have to do some mental gymnastics, thinking "If I want to remember 80% of the words when I see them, should I put 90% DR ? 85% DR ?"
Now I'm always in the ball park of DR+/-RMSE
Which is way more motivating than wondering if today will be a bad day or not haha
you probably were but I just forgot
Now to the question "Isn't it a ~1/x", in theory with a regular rhythm of new card/day, it should yes ! But of course, if you stop adding new cards, you'll get a flatter curve, and if you suddenly add more, it will be more aggressive.
I might be wet dreaming, but I think the best way to know what would be the "ideal" curve for a user, would be to base it on his "Review Intervals" curve
"Further days have smaller card count." also typically further days are less susceptible to having retention issues by being off a day. most of the issues @bold terrace brought up initially were about fuzzing at short intervals
Yes, example :
I reviewed it 2025/01/17, Target R will be 80% on 2025/05/29 with stability 1.8 month.
If I reschedule it, the target R will be 85.54% on 2025/04/13.
Basically, there's 1.5 month of "room" between a 80 and a 85% R, which is more than fine
Very low target R happens when, well, a 1d stability card takes a +1d increment just by passing through the Fuzzer/LB
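A rough check of those numbers, assuming the FSRS-4.5/FSRS-5 power forgetting curve R(t, S) = (1 + 19/81 * t/S)^(-0.5) and reading "1.8 month" as about 54 days (both are assumptions on my side; the exact stability isn't given, so the results only land near the quoted 80% and 85.54%).

```python
from datetime import date

DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R = 90% when elapsed time equals stability

def retrievability(elapsed_days, stability_days):
    """Predicted recall probability after elapsed_days for a given stability."""
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY

reviewed = date(2025, 1, 17)
stability = 54.0  # assumption: "1.8 month" taken as roughly 54 days

for due in (date(2025, 4, 13), date(2025, 5, 29)):
    elapsed = (due - reviewed).days
    print(due, elapsed, round(retrievability(elapsed, stability), 3))

# The short-interval case: a card with 1 day of stability, seen on time vs. one day late.
print(retrievability(1, 1.0), retrievability(2, 1.0))
```

Under these assumptions the long-interval card has weeks of slack between the two dates, while the 1d-stability card falls from 90% to roughly 82.5% from a single extra day.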
I'm also talking about short intervals. I tweaked my weights, so graduating interval becomes 3 days. It is important that card interval would become two or three days and not four or five.
Also, even without any LB/Fuzz (since my Filtered Deck overwrites the scheduling), I still have a somewhat constant amount of reviews every day (the spike is just a change of DR and I did the backlog); in 2 days, the curve went back to the previous baseline
(As you can see, half my reviews are through Filtered Decks now)
thats just the fuzzer doing fuzzer things independent of the loadbalancer, and yeah it might fuzz short intervals a bit too hard and there was some discussion before about caring about stability when doing this stuff but I don't think anything came of it?
and like, at the first 5 days, the lb is very weighted towards earlier days
Oh, right, we were talking about making fuzz based on S rather than interval lengths
Actually, wait, wouldn't that make the problem worse at high DR?
At DR>90% S>ivl, so the intervals would have more fuzz, not less
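The S versus interval relationship can be sketched by inverting the forgetting curve, again assuming the FSRS-4.5/FSRS-5 form with DECAY = -0.5 and FACTOR = 19/81 (an assumption; other FSRS versions use different constants). The function name is made up for illustration.

```python
DECAY = -0.5
FACTOR = 19 / 81

def interval_for(stability_days, desired_retention):
    """Interval at which predicted recall drops to desired_retention."""
    return stability_days / FACTOR * (desired_retention ** (1 / DECAY) - 1)

# Hypothetical card with S = 10 days at a few desired retention values.
for dr in (0.80, 0.90, 0.95):
    print(dr, round(interval_for(10.0, dr), 2))
```

At DR = 90% the interval equals S; above 90% the interval is shorter than S, so fuzz sized by S would be larger than fuzz sized by the interval.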
What about ...
Making ...
NO fuzz?
I googled a bit about how to disable it, it seems it's like something sacred in Anki
But seriously, let people turn it off
Especially now with FSRS where we SEE the R impacted
With SM2 I guess people could make wild assumptions without having anything to rely on
"You have low stability ? YOU are the problem"
But now it's clear that a +1d fuzz at early stage of memorizing something is not that great
I tried programming %correct into the rust simulator quickly and ran it with and without the fuzz turned on
idk if this helps?
With 0 New/day I think you're removing the main problematic point lol
I mean anki has always had fuzz
I see you have put 80 reviews/day, so I'd suggest doing the same with ~10 new/day
there was like a brief period of time where it didn't because anki was being rewritten and it wasn't added in yet
Yeah but it's software, "soft" meaning there is nothing sacred here
Just because something was always there doesn't mean it has to stay
I mean at this point dae is very opposed to any option unless it really pulls its weight
and I don't think this toggle does
¯\_(ツ)_/¯
you're free to make your own build with no fuzzing though
Yeah, that's the point. Cards in the short-term stage (<=10 days) should be scheduled at the exact interval FSRS predicted
its pretty easy to remove
https://github.com/ankitects/anki/blob/9b5da546be49f37c8d6c286e09c86074b2f0c278/rslib/src/scheduler/states/fuzz.rs#L16
static FUZZ_RANGES: [FuzzRange; 3] = [
    FuzzRange { start: 2.5, end: 7.0, factor: 0.15 },
    FuzzRange { start: 7.0, end: 20.0, factor: 0.1 },
    FuzzRange { start: 20.0, end: f32::MAX, factor: 0.05 },
];
As far as I can tell, fuzz isn't applied to intervals <2.5 (before rounding, I assume)
for some reason when you give the rust simulator new cards it does this 🤷‍♂️
You need to read all the code, this is not an exclusive selection
maybe i programmed it wrong
But we already discussed it lol
I wonder if it would be preferable to increase that 2.5 to like 6
It's a compounding thing
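A Python sketch of how the ranges seem to combine, based on my reading of the linked fuzz.rs. The table values are the ones pasted above; the base delta of 1.0 and the accumulation across all ranges are my understanding of that file, not something quoted in the chat, so treat them as assumptions and check the source.

```python
# (start, end, factor) triples copied from the FUZZ_RANGES table pasted in the thread
FUZZ_RANGES = [
    (2.5, 7.0, 0.15),
    (7.0, 20.0, 0.1),
    (20.0, float("inf"), 0.05),
]

def fuzz_delta(interval):
    """Half-width of the fuzz window in days (assumed behaviour, see lead-in)."""
    if interval < 2.5:
        return 0.0  # no fuzz for very short intervals
    delta = 1.0  # assumption: base delta of one day
    for start, end, factor in FUZZ_RANGES:
        # Every range contributes for the part of the interval that lies inside it,
        # so the ranges add up rather than one of them being selected.
        delta += factor * max(min(interval, end) - start, 0.0)
    return delta

for ivl in (2, 3, 5, 10, 30):
    print(ivl, round(fuzz_delta(ivl), 3))
```

Under this reading a 3-day interval already gets a delta a little above one day, which is why raising the 2.5 threshold or skipping fuzz for short intervals comes up as an option.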
Maybe disable all fuzzing for <=10 or <=7 intervals?