#FSRS Megathread


tepid spoke
#

do not optimize it

bold terrace
#

Don't know that one, I'll take a look

#

Right now I really like my own deck

tepid spoke
#

it's a fully integrated grammar course in an Anki-Deck

bold terrace
#

Ah I see 🙂

#

I did a lot of Bunpro for grammar

#

but I stopped after reaching N1, because most points are more vocabulary than grammar now

tepid spoke
#

There is a surprising amount of overlap between "it's a grammar point" and "it's a vocab"

#

Like, is stuff like によって a vocab or a grammar point?

bold terrace
#

Yeah that one I can understand

#

but at some point it was really like "Put いきなり to mean it's sudden"

#

And a lot of things I was training didn't really occur in any material I checked

#

に加えて、を込めて... feels more like vocab to me

cursive badge
#

*shakes fist in general direction of Japan*

#

I find myself sometimes going too fast and getting meanings wrong. Then I take a harder look, remember the reading and go "of course!"

tepid spoke
#

I really wish WK would export their warning list via the API...

cursive badge
#

I can understand though. They don't want to make it too easy to steal their secret sauce.

#

It would be really interesting if we could have access to all their review records. Then you could start doing some Kanji-specific SRS optimisation.

tepid spoke
#

It's literally just a list of "yes this is right but not what we're asking for", "this is a common typo" and other cases where they give you a second chance when entering them

#

I implemented that to a limited degree myself

#

where if you type a correct reading of a Kanji, but not the one WK asks for, it will flash the input yellow and you can reconsider

cursive badge
tepid spoke
#

Well, to use the API, you require a paid account

cursive badge
#

I know. I have made my own script that downloads the data and generates notes/cards.

tepid spoke
#

They actually took down the WaniKani deck from AnkiWeb

#

which is fair, it's literally piracy imo

cursive badge
#

I got a lifetime sub. So I feel happy downloading the data.

tepid spoke
#

Writing an Anki-AddOn that sync it for that one user is fair game

#

but then sharing that deck publicly is not

cursive badge
#

I keep meaning to improve my cards, but I got it good enough and keep getting distracted by shiny new projects.

tepid spoke
#

I just took the old WK3 decks templates and slightly tweaked them

cursive badge
#

I wish their SRS was not so bad. With their dataset you could probably get into fancy domain-specific SRS stuff like Math Academy. Instead they just do fixed intervals 😦

tepid spoke
#

Yeah, but it'll be very hard to analyse

#

given they treat meaning+reading as separate but not separate things

polar maple
#

if you can systematically differentiate these then you can just make more presets

tepid spoke
#

I wouldn't know how to possibly do that

#

I'd have to manually go through over 18000 cards and classify them

quasi shadow
#

Why not press easy on them?

bold terrace
#

Interestingly, if you zoom in enough on the 100% difficulty spike, you get something that looks like a normal distribution

#

This was [90%,100%] with 100 steps

hasty fractal
#

can we have tooltips (help text like in deck options) for stats page? I think it'll be helpful.

#

especially for the new fsrs related stats which are somewhat complex imo

#

expertium did bring this up before but nothing transpired after it

#

for now, the writings can be just copy pasted from the manual

unique salmon
#

@quasi shadow

hasty fractal
#

bruh...

#

materialists be like, "gravity is also physical, it's made of 'virtual' particles like gravitons"

#

one day they will say consciousness is made up of virtual particles 🤣

unique salmon
#

sorata, I mean this in the nicest way possible: stick to crunching numbers. No philosophy.

hasty fractal
#

bro u should meditate. crunching numbers has destroyed your psyche.

unique salmon
#

Since D is the hot topic these days, I decided to get back to trying to improve D
First, I tried a very simple approach:
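`# Methods of the FSRS model class; assumes: import torch; from torch import Tensor
def surprise_f(self, r: Tensor, binary_rating: Tensor) -> Tensor:
    # Clamp R away from 0 and 1 so the logarithm stays finite.
    r = r.clamp(0.0001, 0.9999)
    # The larger the gap between predicted R and the binary outcome, the larger the surprise.
    surprise = -torch.log(1 - torch.abs(r - binary_rating))
    return surprise.clamp(0, 100)

def next_d(self, old_d: Tensor, r: Tensor, rating: Tensor) -> Tensor:
    # Again (1) counts as a failure (0); Hard/Good/Easy (2-4) count as a success (1).
    binary_rating = torch.where(rating > 1, torch.ones_like(rating), torch.zeros_like(rating))
    delta_d = -self.w[6] * (rating - 3) * self.surprise_f(r, binary_rating)
    new_d = old_d + self.linear_damping(delta_d, old_d)
    new_d = self.mean_reversion(self.init_d(4), new_d)
    return new_d`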

Here we multiply delta_d by a surprise factor=-ln(1-abs(R - grade)), where grade is binary. The bigger the difference between R (prediction) and binary grade (reality), the bigger the surprise factor.
As you can see in the image, it didn't help. Next I'll try completely re-defining D.
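To get a feel for the surprise factor described in this message, here is a minimal standalone sketch in plain Python (no torch). The function name `surprise` and the sample values of R are made up for illustration; only the formula -ln(1 - |R - grade|) and the clamp range come from the message.

```python
import math

def surprise(r: float, passed: bool) -> float:
    """Surprise factor -ln(1 - |R - grade|), with grade = 1 for a pass and 0 for a fail."""
    r = min(max(r, 0.0001), 0.9999)  # same clamp range as in the posted snippet
    grade = 1.0 if passed else 0.0
    return -math.log(1.0 - abs(r - grade))

# Illustrative values of R (not from the thread).
for r in (0.5, 0.9, 0.99):
    print(f"R={r:.2f}  pass: {surprise(r, True):.3f}  fail: {surprise(r, False):.3f}")
```

At R = 0.9 a pass is barely surprising (about 0.105) while a fail is very surprising (about 2.303), so under this scheme failed reviews with a high predicted R would move D the most.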

robust hill
#

couldnt find in manual

#

what is the dotted lining supposed to represent

#

and whys it there

bold terrace
#

desired retention

unique salmon
#

Yep

robust hill
#

no the one going down

#

sorry

unique salmon
#

Ah, just a projection

#

Into the future

robust hill
#

i see

#

no lapses in this card, 93% desired retention

#

but these are its intervals

#

surely this cannot be right

tepid spoke
#

It happens sometimes, but only for some few very basic words and kanji that I'm 100% confident I won't forget in the 3+ years hitting Easy will push it into the future.

bold terrace
#

Same same

#

So many cases where you were "Oh this one I'll press easy" then you realize you got it wrong for some reason 😂

#

Also, the 0.5-1s it takes to think "Was it easy?" is like ~10-20% of my avg review time (~5s)

#

So yeah, easy is really more like card well known before Anki

naive dome
tepid spoke
#

I don't see how pass/fail would work

#

I could do without the Easy button, but a lot of cards are Hard

#

and if the Hard button was gone, I'd probably press Again on them instead of passing them

bold terrace
tepid spoke
#

The answer time is absolutely useless to judge anything by

#

Unless you install surveillance cameras around the PC, and feed that into some predictive network, there can be so many other reasons a card took long to answer...

#

There's also plenty of other reasons that something was Hard, other than time

bold terrace
#

And I'm the kind of guy to alt tab

robust hill
#

real ones know 1234 = goated

tepid spoke
#

Anything that felt hard

robust hill
tepid spoke
#

Can be that it took me a lot of thinking to piece it back together from the mnemonics, or having almost confused it with a very loosely similar word

#

Really anything that makes me feel like this wasn't "Good"

#

Another example would be getting the meaning right, but thinking the most uncommon nuance of the vocab

#

In that case I often bury the card and then rate it Hard the next day if I get it right then, or fail it if I still don't get the nuance

bold terrace
#

That's also the beauty with neural networks, you give them all those values, if there is a trend, it will train to recognize it, if not, it won't

robust hill
#

what if the neural network goes rogue

#

and just fucks up my deck so i fail my exams

#

yea checkmate

bold terrace
#

Well then don't use Anki in case it fucks up your collection, back to paper

robust hill
#

i dont trust paper..

bold terrace
#

NNs don't just go rogue, they minimize a goal function

tepid spoke
#

Anki so far behaves very predictably. Adding some neural network to do "something" would make it more or less a random unpredictable rollercoaster

bold terrace
#

if the goal function is difference between prediction and outcome, if the difference is minimal, it can't be "THAT" bad

tepid spoke
#

Neural Networks have a tendency to be incredibly volatile and hard/impossible to understand and debug. So no thanks.

#

It COULD be THAT bad

naive dome
tepid spoke
#

The NN could conclude that it can get your desired retention to the set value by just showing you the same 10 cards every day, and the others never.

unique salmon
#

@polar maple I hear NN slander 🤣

bold terrace
#

If your goal function is to reduce the distance by reviews, it's OK

#

BTW

#

FSRS is not NN but it's how it's working right now

#

It minimizes a cost function

#

Sooo, stop using FSRS

#

And do your own mental gymnastics to evaluate what you think is the best interval

tepid spoke
bold terrace
#

But the big advantage in both, is that FSRS/NN are aimed to REDUCE your cognitive load

tepid spoke
#

It's WaniKani

#

And the why is pretty simple, cause they're different things to learn.

#

I'm doing WaniKani, just in Anki.

bold terrace
#

Why not using WK in WK ?

tepid spoke
#

Cause their website and SRS sucks

#

It's not that slow, I think you can finish on their site in 1-1.5 years if you always stay on top of the reviews

#

But that's quite an intense workload

#

It'll have taken me ~2.5 years now when I'm done in mid-April

#

"done" in the sense of no more new or young cards

robust hill
#

what if we cant trust our own memories

#

what should we do

unique salmon
cursive badge
tepid spoke
#

WK has 6609 Vocabs and 2080 Kanji. Though the Vocab are primarily a reinforcement-tool for the Kanji, and learning them themselves is more a bonus.

robust hill
#

😭

#

i only press hard around 4% of the time

cursive badge
bold terrace
tepid spoke
#

I could randomly throw them into some deck, but I highly doubt I could learn them nearly as well by "just" doing that

#

While with the WK method, I'm pretty confident about the vast majority of the Kanji

#

WK just does an excellent job of giving you and reinforcing tools to be able to recognize even an "aged" Kanji, and it works exceptionally well

#

And I simply don't consider language learning any kind of rush or race. I can already read the vast majority of stuff, and am fairly confident about it.

#

So why would I go hard on trying to hyper-optimize it?

#

I don't see what's "in isolation" about it

#

The Vocab are somewhat, but WK very clearly says that it uses the Vocab to give you context for the Kanji

#

It's then your job to find context for the Vocab :D

#

That seems like it'd be horribly overloaded

#

way too much stuff on one card

bold terrace
#

People tend to forget others were learning languages perfectly well before SRS apps existed

#

It's a complementary tool

#

The appeal is having your little sandbox with little graphs

tepid spoke
#

Learning Kanji without some kind of SRS system seems borderline impossible to me

#

I think it's what Japanese kids in school have been using since ages, just manually

bold terrace
#

Believing it's not possible is often the first step to making it not possible

#

For example when I started learning english I had no internet, and no SRS knowledge or anything

#

I just looked up words in a dictionary book

#

took me ages

#

but got it eventually

#

So yeah, in my case I see Anki as very nice supplement

#

but not like some kind of requirement

#

But guess what ... We think "Internet + Easy Tools to learn/review", so can only mean better learning right ?
Except now you also have "Constantly getting notified for random anonymous people talking to you online, getting "recommendation" for a new video, switching core decks every 6 months"

#

I'm also guilty of it: right now I still have 200 reviews to do today that should take me ~20min, but guess what, I'm losing time here, discussing the "optimal way of doing it"

#

So no offense, it's also self-criticism

cursive badge
bold terrace
#

Point is: NOTHING beats hard work and true effort. But we always play pretend by trying to "optimize"

#

but I guess I'm off topic now

bold terrace
#

I was wondering, anyone could explain how to read a "B-W Matrix" ?

I searched a bit online but I'm not really sure I found the right info

#

It's under the "Memorised" graph

cosmic hedge
bold terrace
#

I do I do

#

For example :
"Predicted 71.81, Actual 66 (Prediction at 71.81 ?) compared to a total of 84 prediction ?"

#

So it predicted 71.81 less than it should have ?

#

What does it describe ?

cosmic hedge
#

its for all the reviews that are done with that stability and difficulty

#

hold on i'll find the thing i copied XD

#

#1282005522513530952 message

bold terrace
#

Ah, x-axis is difficulty and y-axis stability

cosmic hedge
#

it means that fsrs is underestimating how well you know cards with that difficulty and stability by 13%

#

oh yeah i should really label that XD

bold terrace
#

yeaaaah with the axis explained now it makes more sense 😄

#

So yeah, for example in my case, for Stability around 7d and Difficulty at 90%, it is overestimating by 7%
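The "Predicted 71.81, Actual 66, total 84" cell quoted earlier can be turned into the roughly 7% figure with a little arithmetic. This sketch assumes that "Predicted" is the sum of predicted R over the 84 reviews in that cell and "Actual" is the number of successful reviews; that reading is an inference from the numbers in this thread, not something confirmed by the add-on's documentation. The function name is hypothetical.

```python
def bw_cell_bias(predicted_sum: float, actual_successes: int, total_reviews: int) -> float:
    """Actual minus predicted retention for one stability/difficulty cell, as a fraction."""
    predicted_r = predicted_sum / total_reviews   # average predicted R in the cell
    actual_r = actual_successes / total_reviews   # observed success rate in the cell
    return actual_r - predicted_r

bias = bw_cell_bias(71.81, 66, 84)
print(f"{bias:+.1%}")  # -6.9%
```

A negative value means FSRS predicted more successes than actually happened (overestimation); a positive value would mean underestimation.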

cosmic hedge
#

yeah

bold terrace
#

@unique salmon if we can determine that, even if it has only a very low impact on global RMSE, why not mitigate that by a malus for those card retention ?

#

Ok global RMSE won't change much

#

But it's not going to hurt to do a ~2-3% malus on that, even if we overestimate it for a few reviews. Worst case scenario, that -6.9 will just go up until the "reverse mitigation" kicks in

#

I mean, I'm all for crunching numbers, but RMSE is just one part of the story in this case, no ?

unique salmon
bold terrace
#

Sure no worries

#

But do you more or less agree with the fact that, if RMSE itself can't be reduced that much anymore, having some compensation techniques for specific problematic class would be a nice way forward ?

unique salmon
#

I mean, we can't just add or subtract stuff from the forgetting curve, that would cause all sorts of issues

bold terrace
#

I see

#

And I agree

#

I was thinking more like "Post Processing" techniques

unique salmon
#

We should aim to improve FSRS formulas

bold terrace
#

Sure, if it's doable, I'm all for it 🙂

unique salmon
#

Or just make a neural net that is far more accurate than FSRS ¯\_(ツ)_/¯

bold terrace
#

That might also explain why though

unique salmon
#

Technically, we have one already

bold terrace
#

With enough parameters all those problematic classes of cards could be customized

#

But a one-shot pre-training might not be enough then

#

Who knows if someone else might have a different class of problematic cards

#

but I think @polar maple said it would be possible to modify the weight incrementally with new reviews

unique salmon
unique salmon
bold terrace
#

Exciting stuff

#

If you search guinea pig, you know where I am

#

In the mean time ...

#

"deck:Japan::1. Vocabulary" prop:s>5 prop:s<7 prop:d>0.9 prop:r<.90 -is:due

My good old Filtered Deck will go "brrrrrrr" 😄

#

Poor man's "AI"

unique salmon
#

Holy tweaking Batman

bold terrace
#

Yeaaah half my reviews are coming from those

#

That's also a bit why sometimes I say I don't think FSRS should be the only one to "find solution"

#

I mean, Anki could have different services, a prediction services which would be FSRS or RWKV

#

On top of those 2, you could then have some anomaly detection service, card interference service ...

#

So instead of having an ever-growing equation for FSRS, or having nothing left to be able to interact with RWKV, you could extend certain capabilities

cosmic hedge
bold terrace
#

Yes ! But SM2 / FSRS still have really different paradigms

#

SM2 doesn't really predict for example

#

So you need to have a very clear separation of responsibilities between those different capabilities

cosmic hedge
#

also i'd like to clarify that the b-w matrix is for reviews, not cards, so if your cards have changed stabilities or difficulties since that review then the search won't really find all those cards

#

i mean it still works well enough

bold terrace
#

Gotcha

#

But I think it's fine because then, I'm assuming that the difference for S=7 D=9 is high enough that potentially the current one might have issues

unique salmon
#

So yeah, we could add extra stuff on top

#

I have an idea for leech detection, but that requires storing DR at each review in Card Info

#

Actually, no, not DR. It would require storing R at each review

bold terrace
#

Wouldn't R be always more or less equal to the DR ?

unique salmon
#

Not always and not exactly

bold terrace
#

The DR I can understand, you want to check compared to an expected baseline

#

But maybe you should spell out your idea instead of us trying to guess what it is 😄

unique salmon
#

Uh, no, you can't

#

Not without some really weird conversion mechanism

#

That's what I meant by "weird conversion mechanism"

unique salmon
bold terrace
#

I think I get the idea, and I think it's also a nice addition

unique salmon
#

I'll have to Google/try to do the math myself

bold terrace
#

If you're doing reviews each time at R=80% but you get them wrong 90% of the time, it's a bit strange

unique salmon
#

Yeah

bold terrace
#

It might also be more reactive than my previous idea of looking at the full history

unique salmon
#

Btw, this could also be used to find anti-leeches: cards that are so easy you almost wonder why you are even reviewing them

bold terrace
#

I mean, if you fail 3 times in a row at 90%, you can react to that more quickly than by checking whether those 3 fails happened within a span of 30 reviews

#

I mean, if you have 10% chance of getting something wrong, getting it 3-times in a row is a 0.1% chance

unique salmon
#

Yes, but I want to extend that to fails that didn't happen in a row

#

And all had different p(recall)

#

That gets complicated

bold terrace
#

Yes

#

Not good enough at stats to remember how to do that by heart haha

#

probabilities*

unique salmon
#

In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted. It is named after Jacob Bernoulli, a 17th-century Swiss mathematician, who analyzed them in ...

#

But it's for fixed p

#

Where every trial has the same probability of success

#

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.
In other words, it is the probability distribution of the number of successes in a collectio...

unique salmon
#

Alright, I found a package that does this
https://github.com/tsakim/poibin
I am not even going to try to understand the math with complex numbers, but the usage is actually fairly simple. You just give it a list of probabilities for each trial and the number of successes, and then you can calculate the probability of a given number of successes.
Example:
p = np.asarray([0.9, 0.85, 0.95, 0.92, 0.87])
n_succ = 2
This gives me a p-value of 0.836%. So if a card has been reviewed 5 times with these probabilities (note that the order doesn't matter), there is a 0.836% chance that 2 or fewer reviews will be successful.
@polar maple @quasi shadow Here's the code and an example of usage

EDIT: see an updated example #1282005522513530952 message

GitHub

Poisson Binomial Probability Distribution for Python - tsakim/poibin
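The 0.836% figure can be checked independently of any package. This is a brute-force sketch that enumerates every pass/fail pattern, so it is only practical for a handful of reviews; the function name is made up for illustration and it does not use or imitate the poibin API.

```python
from itertools import product

def prob_at_most_k_successes(probs, k):
    """P(successes <= k) for independent reviews with success probabilities `probs`.

    Brute force: enumerates all 2**n pass/fail patterns, so only usable for small n.
    """
    total = 0.0
    for pattern in product((0, 1), repeat=len(probs)):
        if sum(pattern) <= k:
            weight = 1.0
            for p, passed in zip(probs, pattern):
                weight *= p if passed else 1.0 - p
            total += weight
    return total

p = [0.9, 0.85, 0.95, 0.92, 0.87]
print(f"{prob_at_most_k_successes(p, 2):.3%}")  # 0.836%
```

Because every pattern is weighted by the product of its per-review probabilities, shuffling `p` gives the same result, which is the "order doesn't matter" remark in the message.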

#

So now we can make an automated leech detector

#

(as long as we figure out how to port this to Rust)

#

(and if Anki stores R at the time of each review in Card Info, otherwise we can't do this. Actually, maybe we could do the same trick of re-calculating R that we do for the forgetting curve)

bold terrace
unique salmon
#
bold terrace
#

Good job though

unique salmon
#

There is another issue, actually: with this function the p-value is always 0 if the number of successes is 0

#

So if the number of successes is 0, we have to do something else

#

I guess realistically it doesn't matter since cases where a user has never pressed Hard/Good/Easy would be extraordinarily rare.

bold terrace
#

If it's 0 success, normally it would not even be leaving the learning phase right ?

#

Not if "Set Due Date" is used though

#

Is it not a bug of the function/lib though ? If you have 1 fail 0 success at 80% DR, it's strange it's considered as a leech

unique salmon
#

Same-day reviews don't count

unique salmon
bold terrace
#

There's always the "lazy" way to only consider cards with >= N day of review

#

like N=3

unique salmon
#

You mean N reviews?

bold terrace
#

Just wanted to be sure we exclude the same day review

#

but yeah

unique salmon
#

There is an issue with that. For example, if you have 3 reviews at 70% p(recall), the probability of failing all 3 is 2.7%, not low enough. At 90% it would be sufficient, at 70% - nope

bold terrace
#

Isn't it what you would also expect from that algo ?

#

I mean, failing 3 times if it's lower DR is more expected than failing it 3 times with higher DR

#

And at least here you know exactly why, there's a clear formula

unique salmon
#

I mean that applying this formula after a fixed number of reviews doesn't work equally well for all cards

#

And all DRs

#

What is sufficient to identify a leech at DR=90% is not sufficient at DR=70%
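The numbers behind this point are easy to reproduce. A minimal sketch, assuming every review has the same p(recall) and reviews are independent (the function name is hypothetical):

```python
def prob_all_fail(recall_prob: float, n_reviews: int) -> float:
    """Chance of failing n independent reviews that each have the same p(recall)."""
    return (1.0 - recall_prob) ** n_reviews

for r in (0.9, 0.7):
    print(f"p(recall)={r:.0%}: 3 fails in a row -> {prob_all_fail(r, 3):.1%}")
```

Three straight fails are a 0.1% event at 90% but a 2.7% event at 70%, so a fixed review count cannot serve as the leech criterion across different DRs.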

bold terrace
#

aaah sure

#

But I mean the idea of the >= N is just to handle the edge case

#

if the N is too low to be possible, it's not that much of an issue

#

in general in programming you make sure your edge cases are handled (non-zero denominator, non-negative sqrt argument...), but if a value is not possible given certain other parameters, you don't necessarily over-complicate the code (unless it really changes the algorithmic complexity, but in this case it's not really worth it)

unique salmon
#

Ok, hold on, something doesn't add up here, I'm investigating it

#

I'm running simulations to confirm that the function works, and it's off

#

Or, rather, the simulations do match the output of the function, but for the other number of successes...

#

Ok, I have no idea why the p-value in this function is calculated the way it is, I'll have to actually use my brain to figure out how to get what I need

#

Ok, so the way they calculate the p-value is weird, plus there is the whole "exactly n successes" vs "n or fewer successes" thing. Ok, I got it all figured out now

unique salmon
#

Also, I was initially thinking of using 0.1% as a cutoff, but it seems like that's too conservative, let's use 1%

unique salmon
#

Oh, and let's limit it to N reviews >=2, in other words, if a card has only been reviewed once, let's not tag it as a leech

#

But yeah, we can identify leeches at very high DR with merely 2 reviews, that's cool as hell!

bold terrace
#

I agree it's really nice

#

Even if we can't really schedule it differently, at least we have something to flag those more precisely

#

The whole "Flag leeches after X lapses" was not correct for FSRS, now at least there is a nice alternative

#

At least, someone has to implement it haha

#

You take commissions @ashen light ? 😄

unique salmon
#

Also, it seems like identifying anti-leeches may not be viable. With 10 reviews at 90% R, there is a 34.9% chance of every single review being successful. Even with 10 reviews at 70% R, there is still a 2.8% chance of all of them being successful.
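A quick way to see why anti-leech detection needs so many reviews: compute the chance of an unbroken streak of passes, and how long the streak has to be before that chance drops below a cutoff. This sketch assumes a constant R per review and uses the 1% cutoff mentioned above; the function names are hypothetical.

```python
def prob_all_pass(recall_prob: float, n_reviews: int) -> float:
    """Chance that n independent reviews with the same R are all successful."""
    return recall_prob ** n_reviews

def reviews_needed(recall_prob: float, cutoff: float = 0.01) -> int:
    """Smallest n for which an all-pass streak has probability below `cutoff` (needs 0 < R < 1)."""
    n = 1
    while prob_all_pass(recall_prob, n) >= cutoff:
        n += 1
    return n

for r in (0.9, 0.7):
    print(f"R={r:.0%}: 10/10 passes -> {prob_all_pass(r, 10):.1%}, "
          f"streak needed for <1%: {reviews_needed(r)} reviews")
```

Under these assumptions a card at 90% R would need 44 consecutive passes before the streak alone looks improbable, and 13 at 70% R.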

#

Then again, it's also arguably less important

#

Now the big question is: do we want a "Recalculate leeches" button if automatic leech detection is enabled? 🤔
Since changing FSRS parameters will change R, which in turn can change whether the card counts as a leech or not

bold terrace
bold terrace
unique salmon
bold terrace
#

In your proof-of-concept, do you consider the whole review history or only a certain number of reviews? In both cases, I guess a card marked as "leech" could leave that state in theory, with new good reviews?

unique salmon
#

And recalculating would make it more accurate, so that's another reason to recalculate leeches too

unique salmon
#

Say, last 32 or last 64 reviews

#

Not that it would matter for most cards

bold terrace
#

Yeah I was thinking: say a card with 10 reviews is counted as a leech because there were 9 fails. Then the user starts to review it "more normally", with a success rate of around 90% (the DR). I guess, with the formula, the p-value would slowly increase until it goes above the threshold

#

But of course, if something is a leech with 200 reviews, the user would have to review it a lot I guess before it leaves the Leeches state (which could be logical since well, it was a leech for so much time)

#

So limiting the window is maybe not a game changer

unique salmon
#

We should also probably add a rule that a change in the leech/not a leech status should occur no more frequently than once per 3 reviews, in case some cards are very close to the threshold all the time

#

So if card has been tagged as a leech, it needs at least 3 more reviews before it can be untagged

#

Which will likely be annoying to implement
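One possible shape for the "no more than one status change per 3 reviews" rule is a small state object carried along with the card. This is only a sketch: `LeechState`, `update_leech_state` and the `cooldown` parameter are hypothetical names, and nothing here reflects how Anki actually stores card data.

```python
from dataclasses import dataclass

@dataclass
class LeechState:
    is_leech: bool = False
    reviews_since_change: int = 3  # start "unlocked" so the first tag is not delayed

def update_leech_state(state: LeechState, detector_says_leech: bool, cooldown: int = 3) -> LeechState:
    """Apply one review's detector verdict, allowing a status flip at most once per `cooldown` reviews."""
    reviews_since_change = state.reviews_since_change + 1
    if detector_says_leech != state.is_leech and reviews_since_change >= cooldown:
        return LeechState(detector_says_leech, 0)
    return LeechState(state.is_leech, reviews_since_change)

# The card is tagged on the first verdict, then the detector changes its mind right away.
state = LeechState()
for verdict in [True, False, False, False, False]:
    state = update_leech_state(state, verdict)
    print(state.is_leech)
```

In this run the tag stays on for the two reviews after tagging and only comes off at the third, which is the "at least 3 more reviews before it can be untagged" behaviour.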

hasty fractal
unique salmon
# hasty fractal have u tested the idea itself?

Lol, no
https://forums.ankiweb.net/t/automated-leech-detection/56887
I'm just waiting for Jarrett, ain't no way I'm writing Rust code

polar maple
#

you can do it with a couple for loops

#

unless you are crunching large inputs and want to use FFT but we don't need it for anki

unique salmon
#

And yeah, for small n you can calculate the probabilities exactly, but I don't want to mess with it

#

And we do want it to be fast for large n, for cards with a lot of reviews

polar maple
#

theres a simple O(n^2) way to compute this that can be written in like 5 lines of code

unique salmon
#

n=20, 10 successes

polar maple
#

the only conclusion to draw is that claude didn't write good code

unique salmon
#

Lol

#

Maybe

polar maple
#

its not a maybe lol this is simple computing

#

FFT has a larger constant overhead

#

6^2 can be done in tight for loops very quickly

unique salmon
#

Tbf, both are way under 1 second for n=64

polar maple
# unique salmon

try this


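import numpy as np

def poisson_binomial_pmf(p, k=None):
    """PMF of the number of successes among independent trials with success probabilities p.

    Returns the whole PMF (index j = P(exactly j successes)), or only P(exactly k) if k is given.
    """
    p = np.asarray(p, dtype=float)
    n = len(p)
    pmf = np.zeros(n + 1)
    pmf[0] = 1.0  # before any trial: 0 successes with certainty
    npmf = np.zeros(n + 1)
    for i in range(n):
        npmf[:] = 0.0
        for j in range(n):
            npmf[j] += pmf[j] * (1 - p[i])  # trial i fails: count stays at j
        for j in range(1, n + 1):
            npmf[j] += pmf[j - 1] * p[i]  # trial i succeeds: count goes from j-1 to j
        pmf, npmf = npmf, pmf
    return pmf if k is None else pmf[k]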
polar maple
#

lol what your computer isn't 1000x slower than mine

unique salmon
#

?

#

Meanwhile I tried to completely redefine D in terms of R minus binary grade. I won't go into the details because it sucks anyway, even if I let it run for 10 epochs instead of 5 like normally
So now I will work on implementing decay based on D...or at least try to

polar maple
# unique salmon

I wrote the same version of my code in C++ because Python loops are slow. When I read the updated Claude code, it was pretty much equivalent to my code but does more of the work in plain Python rather than NumPy. I was considering doing a similar thing but luckily I read the Claude code first. The C++ performance is more in line with what I expect from such small and tight iteration. For the C++ I took the average over n = 100 to n = 200. Doing n = 40 alone took 2.9 microseconds.

milli: 0.01059
micro: 10.58614
#

Rust would prob get similar performance as C++ here

#

so FFT not needed

ashen light
#

can you commission me to do a study on the difference between truly forgetting and "was it A or B?"

unique salmon
ashen light
#

wait actually?

#

surely it isn't that hard

#

I mean I showed up like a handful of months ago and did shit, its not like I did anything particularly difficult

unique salmon
# ashen light wait actually?

Unless there is some random Rust enjoyer who never comments but always reads everything in this channel/on the forums and is currently reading this...yes, actually

ashen light
#

I mean surely someone here can just do it

#

like honestly, I would not say I have particularly deep knowledge

#

I mean I showed up one day not touching anki in a decade and made a PR, its actually not that big a thing. someone else can just do something similar

#

even you could πŸƒ

unique salmon
#

I don't know Rust, man

ashen light
#

its not hard

unique salmon
#

I wouldn't even be able to make a PR in Python

ashen light
#

you know python

unique salmon
#

Apps have their own App Python

#

It's like legalese

#

But for apps

#

Same goes for any other language tbh

ashen light
#

here this is what you need

#

I mean, syntax stuff is the least interesting part of languages, just fix what the compiler complains about

ashen light
#

actually, do you even use anki

#

I have seen no evidence that you do

unique salmon
ashen light
#

lame

#

here I was hoping to out you as a fraud

#

rip

unique salmon
#

Lol

ashen light
#

me and another unnamed person were speculating on your actual usage

#

πŸƒ

#

anyway

#

go hit that rust deck

#

then become a valued anki contributor

#

so is your background just in math (or stats) then?

unique salmon
#

If by "background" you mean "reading articles on the Internet and watching YouTube", then yes

ashen light
#

oh if thats your background then you're fully equipped to read/watch the internet but with rust instead of math πŸƒ

cursive badge
hasty fractal
#

I think we can see how many false positives we get in the 20k dataset

#

comparing what the method gives us with a few first reviews vs with all the reviews

unique salmon
unique salmon
#

Well, considering that neither Jarrett nor Jake seem to be interested in implementing it anyway, I'm not sure if there is a reason to discuss it

bold terrace
#

But it's never too late to try

#

Also complex thing with jumping into codebase like this, is to know where to best put those logic, and being careful of different entrypoints you might not have expected

#

But unfortunately it's not something you learn until you break that software πŸ˜‚

quasi shadow
unique salmon
hasty fractal
#

my first post on forums was about using difficulty for leech detection

#

well that didn't work out and I didn't expect this to progress any further

#

a year later, seem like things will move forward

#

maybe it'll take a few years to get there

unique salmon
# hasty fractal maybe it'll take a few years to get there

It'll take either two weeks or an infinite number of years, nothing in between 🤣
Seriously though, it's just math + doing the tagging. I don't see any major roadblocks. The only issue is whether there are people who are willing to implement it. I assume Dae will have no objections.
So it's either:

  • There are people who want to implement it -> it gets done by the next Anki release
    Or:
  • Nobody wants to implement it -> the idea dies in obscurity
#

It's like Easy Days - it could have been implemented literally years ago, it's just that it relies on one guy, Jake, to do the work

#

It's not like there were any new developments that suddenly made Easy Days possible

#

If we had a clone of Jake, he could've implemented Easy Days, like, 5 years ago 🤣

hasty fractal
#

but yeah agree with u on other points.

unique salmon
# hasty fractal well you're too hopeful if you expect it to easily work out. I imagine it'll tak...

Not really

  1. Use 1% as the cutoff for the leech tagger.
  2. Use the leech detector only if there are at least 3 reviews, to avoid early false positives.
  3. Make it so that a card's status as leech/not leech can only change once every 3 reviews, to avoid "zig-zagging" where it's a leech after one review, then not a leech after the next review, then a leech again, then not a leech again. Such cases would be rare, but we should still consider them.
  4. Add an "Automatic leech detection" button.

The only part that is debatable is re-calculating leeches. Should it be a separate button? Should it be combined with "Optimize", so that leeches are automatically recalculated when FSRS parameters change?
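Rules 1 to 3 above can be sketched as a tiny state machine. This is only an illustration, not Anki code: `LeechState`, `update_leech_state` and the three constants are hypothetical names, and the tail probability p(successes<=k) is assumed to be computed elsewhere.

```python
from dataclasses import dataclass

CUTOFF = 0.01      # rule 1: flag when p(successes <= k) is below 1%
MIN_REVIEWS = 3    # rule 2: no verdict before 3 reviews
COOLDOWN = 3       # rule 3: status may change at most once every 3 reviews


@dataclass
class LeechState:
    is_leech: bool = False
    # Starts at COOLDOWN so that the very first status change is not blocked.
    reviews_since_change: int = COOLDOWN


def update_leech_state(state: LeechState, n_reviews: int, tail_probability: float) -> LeechState:
    """Return the new state after one more review (hypothetical helper)."""
    since = state.reviews_since_change + 1
    # Too few reviews, or the status changed too recently: keep the old status.
    if n_reviews < MIN_REVIEWS or since < COOLDOWN:
        return LeechState(state.is_leech, since)
    wants_leech = tail_probability < CUTOFF
    if wants_leech != state.is_leech:
        return LeechState(wants_leech, 0)  # status flips, cooldown restarts
    return LeechState(state.is_leech, since)
```

With these constants, a card that gets tagged stays tagged for at least the next 2 reviews, whatever the tail probability does in between.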

hasty fractal
#

but ofc no use debating over this

#

imo leeches should be recalculated yeah

#

and no more options please

unique salmon
#

💀

#

Ok, I genuinely don't understand what this guy was cooking

#

FFT uses an enormous amount of RAM for n>10000

#

It's just strictly worse than combinatorics

cursive badge
#

If I did the maths right the "Direct Convolution" algorithm should only need ~800KB of memory for n=50,000.
Pretty good memory efficiency compared to 74.5GB 😂
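The ~800KB figure is consistent with a back-of-the-envelope count, assuming the direct approach keeps two rolling buffers of n+1 64-bit floats and nothing else of significant size (the function name below is made up for illustration):

```python
def direct_convolution_bytes(n: int, bytes_per_float: int = 8) -> int:
    """Memory for two rolling buffers ("previous" and "current") of n + 1 floats each."""
    return 2 * (n + 1) * bytes_per_float


print(direct_convolution_bytes(50_000))  # 800016 bytes, i.e. roughly 800 KB
```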

unique salmon
polar maple
#

nah no card has this many reviews

unique salmon
#

No, I mean just implement it at all

polar maple
polar maple
unique salmon
#

dang

ashen light
#

all this talk of me writing a feature and I still don't even know what the feature actually is

ashen light
unique salmon
# ashen light all this talk of me writing a feature and I still don't even know what the featu...
  1. Take all probabilities of recall over the card's history
  2. Plug them into The Mathematizer 9000
  3. It returns a bunch of probabilities for every possible outcome, like 0 successes, 1 success, 2 successes...n successes, where n is the number of reviews aka length of the array with probabilities, without the first review
  4. Check how many successes (k) the card actually has
  5. Sum the first k+1 probabilities (0 through k successes) to find p(successes<=k)
  6. If it's <1%, tag the card as a leech

Basically, we check how likely it is that a card would be successfully reviewed k times (or less than k, it's a "less than or equal to" kind of situation) out of n total reviews, given an array of probabilities from FSRS
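The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the actual proposal's code: `leech_p_value` and `is_leech` are hypothetical names, `predicted_recall` is assumed to hold FSRS's R at each review except the first, and `outcomes` holds the matching pass/fail results.

```python
def poisson_binomial_pmf(probabilities):
    """Return [P(X = 0), ..., P(X = n)] for independent trials with the given success probabilities."""
    pmf = [1.0]  # with zero trials, zero successes is certain
    for p in probabilities:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)  # this trial fails: count stays at j
            nxt[j + 1] += mass * p      # this trial succeeds: count becomes j + 1
        pmf = nxt
    return pmf


def leech_p_value(predicted_recall, outcomes):
    """p(successes <= k), where k is the number of successes actually observed."""
    pmf = poisson_binomial_pmf(predicted_recall)
    k = sum(1 for passed in outcomes if passed)
    return sum(pmf[: k + 1])  # entries for 0, 1, ..., k successes


def is_leech(predicted_recall, outcomes, cutoff=0.01):
    return leech_p_value(predicted_recall, outcomes) < cutoff


# Four reviews predicted at 90% each, only one of them passed.
print(is_leech([0.9, 0.9, 0.9, 0.9], [False, False, True, False]))  # True
```

In this example p(successes<=1) = 0.1^4 + 4 * 0.9 * 0.1^3 = 0.0037, which is below the 1% cutoff.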

#

Plus extra rules to avoid the card going from "leech" to "not a leech" too often and early false positives

bold terrace
sonic forge
#

@ashen light, is there any reason why load_balancer is required for QueueBuilder and CardQueues, so it is load_balancer: LoadBalancer and not load_balancer: Option<LoadBalancer>?
That is, can it be refactored to be an Option?
Because at the moment, even with LB disabled, Anki still uses LB code / runs code that is required for LB functionality:

cursive badge
# polar maple i got the same excuse as you, i don't know how to write rust

If you already have a background in something like C++ it is not too hard to learn Rust.

For reference here is what I got when I had a little go last night:

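```rust
/// Returns the Poisson binomial PMF: element k is P(exactly k successes).
pub fn poisson_binomial_pmf(probabilities: &[f64]) -> Vec<f64> {
    let n = probabilities.len();

    // Two rolling buffers: `prev` holds the PMF after i - 1 trials,
    // `curr` receives the PMF after i trials.
    let mut prev = vec![0.0; n + 1];
    let mut curr = vec![0.0; n + 1];

    // With zero trials, zero successes is certain.
    prev[0] = 1.0;

    for i in 1..=n {
        let p = probabilities[i - 1];

        // Zero successes so far: the new trial must fail too.
        curr[0] = prev[0] * (1.0 - p);

        // j successes: either j before and a failure now,
        // or j - 1 before and a success now.
        for j in 1..=i {
            curr[j] = (prev[j] * (1.0 - p)) + (prev[j - 1] * p);
        }

        std::mem::swap(&mut prev, &mut curr);
    }

    prev
}
```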

N.B. I don't really follow exactly what is happening in this algorithm, so I may have messed it up a little. It seems to be giving sensible results though.

ashen light
#

@sonic forge ...I could have sworn that it was an Option<LoadBalancer> exactly because of that toggle

unique salmon
#

Like tagging cards

ashen light
#

oh wait it did need it, it probably could be optional if you want to make a pr for it

cursive badge
ashen light
#

also I'll look into the leech stuff later today I've only half-read this backlog and yuki's question was an easier answer

polar maple
unique salmon
polar maple
#

it's only hard learning a new language when it's a completely different paradigm, like going from python to haskell

cursive badge
#

Cargo ❤️

#

I'm kind of amazed how bad Python has been in comparison for so long. It has been getting a lot better in recent years. I'm really liking UV.

cursive badge
polar maple
#

I often used Haskell to verify my combinatorics homework for this reason, nice syntax

ashen light
#

jarret did easy days cause I totally ghosted

#

also why is everyone here afraid of rust

#

its literally the easiest language because it doesn't let you do stupid shit

cursive badge
hasty fractal
#

u brought us some good stuff (LB)

unique salmon
ashen light
#

I've made a handful of small PRs

#

maybe yall should learn rust so you can leverage your own enthusiasm instead of praying someone randomly shows up and does it

hasty fractal
#

we can write a wiki post on the forums titled "Cool Ideas to Implement: Needs Dev"

#

new dev comes here and we link it to them. then we sit down and just pray 🙏

ashen light
#

at that point it would literally be easier to just do these things yourselves

#

here: I'll coach someone doing this leech thing

#

and like

#
  1. Plug them into The Mathematizer 9000
    someone pls write spec for mathematizer 9000
#

and don't just say "its like the mathematizer 6000 but with more features"

unique salmon
#

Written by Claude 3.7

#

Python version:
```python
import numpy as np


def fast_poisson_binomial_pmf(p):
    """
    Calculate the exact PMF of the Poisson binomial distribution using
    dynamic programming and vectorized NumPy operations.

    Parameters
    ----------
    p : array-like
        Array of success probabilities for each Bernoulli trial

    Returns
    -------
    numpy array of PMF values for k=0,1,...,len(p)
    """
    p = np.asarray(p, dtype=np.float64)
    n = len(p)

    # Validate input
    if not np.all((0 <= p) & (p <= 1)):
        raise ValueError("All probabilities must be between 0 and 1")

    # Handle trivial cases
    if n == 0:
        return np.array([1.0])

    # pmf[j] represents P(X = j) after considering the trials seen so far
    pmf = np.zeros(n + 1, dtype=np.float64)
    pmf[0] = 1.0  # Base case: probability of 0 successes with 0 trials is 1

    # Process each probability one at a time
    for prob in p:
        # P(X=k after adding the new trial) =
        #   P(X=k before) * P(new trial fails) + P(X=k-1 before) * P(new trial succeeds)

        # Shift the PMF by one slot and weight it by the success probability
        pmf_shifted = np.zeros_like(pmf)
        pmf_shifted[1:] = pmf[:-1] * prob

        # Combine the two possibilities into a fresh array
        pmf = pmf * (1 - prob) + pmf_shifted

    return pmf
```
#

```python
pmf_exact = fast_poisson_binomial_pmf(p_succ)
p_value_exact = sum(pmf_exact[0:n_succ + 1])
```
n_succ is how many successes there were in reality

ashen light
#

cool can you turn that into rust for me

unique salmon
hasty fractal
#

why not just ask claude expertium

unique salmon
#

I did

#

Hence the link

#

But I can't ask it to do the PR

hasty fractal
#

woah, let's hope C3.8 gets that feature for us

#

btw, can I ask what happened with the hyperoptimise thingy?

ashen light
#

@unique salmon prove ai isn't garbage and get an entire PR written only using ai

#

bet you can't

unique salmon
#

@spring adder

hasty fractal
#

ye that

#

hyperoptimise is a better name actually

unique salmon
hasty fractal
#

cuz imo in the ideal future presets should be separated from params

unique salmon
#

how

#

The whole point was to group decks into presets optimally

#

Not to do some...uhhhh...idk

#

Idk what you want

hasty fractal
unique salmon
#

You mean presets? Why not

hasty fractal
#

params aren't the

unique salmon
#

How do you separate parameters from presets?

hasty fractal
#

only thing u change

hasty fractal
unique salmon
#

Forget about code level. Conceptually, how?

hasty fractal
#

you'll have hyperoptimisation. you don't need to see the behind the scenes.

#

params will be optimised by one button for all decks.

#

or maybe invent "general-presets" and "param-presets" and make everything more confusing.

unique salmon
#

So you just want to have per-deck parameters, except decks are grouped, except those groups aren't presets?
That's...a very strange wish

hasty fractal
#

lmao true

#

the problem is sometimes I'm trying to change my sort order and now I have to go through 40 fuckin presets all because I was trying to make my scheduling optimal.

ashen light
unique salmon
ashen light
#

I linked that anki deck

#

I guess the general thought is you seem to have a lot of stuff you'd like in anki and yet won't do the thing that'll let you actually do those things

#

relying on me is unreliable!

#

I'll do a thing then disappear for months

unique salmon
#

I also rely on Jarrett. Two people = more robustness 🤣

ashen light
#

like the only reason lb happened is cause I REALLY wanted it

#

I'm just saying, it's not as hard as you might think

spring adder
cursive badge
#

I didn't dig too deep, but it looks like the first annoying thing would be that you don't have access to the revlog at the point where Anki currently marks leeches. You would have to work your way back until you find somewhere with access to the revlog and refactor everything in-between.

bold terrace
unique salmon
bold terrace
#

Oh, I was thinking about being able to configure triggers based on S/D/R... that put a card into the "Due" state

#

Could be fun with the B-W matrix showing you which class of cards (based on S/D) is over/underestimated by FSRS

#

The leech part, I guess, is not that disruptive

#

What I'm describing is somewhat close to Filtered Deck, but it could be then plugged dynamically to things like the B-W matrix

#

So instead of scheduling based on R alone, it could schedule based on R/S/D, using those past observations

#

Typically, the LB and the Easy Days would be part of this "Scheduling Post-Processing"

#

In fact, we can even argue it's not "Post-Processing" but plain and simple Scheduling

#

The R < DR might just be another rule in this set of rules

#

Hmm not really in fact, LB/Easy Days are per nature "Post Processing"

unique salmon
#

We should allow some small deviation from desired retention

bold terrace
#

But having that split in place could help having more information about an Initial Schedule and the Post-Processed one (because sometimes, you don't know if you get 5d because it's 3d + 1d LB + 1d Easy Days, and you get R=50% instead of 90% ....)

bold terrace
#

You only know how to LB/ED once you already solved the scheduling aspect

unique salmon
bold terrace
#

IMO the threshold for rescheduling should be based on "How low would my Target R be if I don't reschedule it now?"

#

I investigated this a bit since I do reschedule a lot

polar maple
#

@unique salmon what exactly is the calculation to find leeches for your idea? is it to find cards in the bottom 1% in terms of total failures?

bold terrace
#

And in general, it's mostly big intervals, like 6 months becoming 3 months, but in fact, the new Target R would be ~70% instead of 80%, since the stability is very, very high in the first place

bold terrace
#

Having N lapses is not really a measure of a leech in FSRS

unique salmon
#

We add up the probabilities of 0, 1, 2...k successes, where k is the real number of successes

#

Which gives us the probability of failing this card n-k times or even more times

polar maple
#

ok just a small concern, this would probably flag more than 1% of cards since the rarity of cards behaves as some sort of random walk and cards can fall below the 1% threshold (and come back over it) over time, so this idea requires some more investigation first
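One way to probe this concern is a small simulation. The sketch below makes strong simplifying assumptions: a perfectly calibrated scheduler with a constant R at every review, no cooldown rule, and made-up names (`tail_probability`, `simulate`) and default values. It counts how many cards are ever flagged versus how many are still flagged after the last review.

```python
import random


def tail_probability(p, n, k):
    """P(successes <= k) over n reviews that each succeed with probability p."""
    pmf = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)
            nxt[j + 1] += mass * p
        pmf = nxt
    return sum(pmf[: k + 1])


def simulate(cards=2000, reviews=30, p=0.9, cutoff=0.01, min_reviews=3, seed=0):
    """Return (share of cards ever flagged, share flagged after the last review)."""
    rng = random.Random(seed)
    cache = {}  # p is constant, so the tail probability depends only on (n, k)
    ever = 0
    final = 0
    for _ in range(cards):
        k = 0
        flagged_ever = False
        flagged_now = False
        for n in range(1, reviews + 1):
            k += rng.random() < p  # one simulated review, success with probability p
            if n < min_reviews:
                continue
            if (n, k) not in cache:
                cache[(n, k)] = tail_probability(p, n, k)
            flagged_now = cache[(n, k)] < cutoff
            flagged_ever = flagged_ever or flagged_now
        ever += flagged_ever
        final += flagged_now
    return ever / cards, final / cards


ever_flagged, flagged_at_end = simulate()
print(ever_flagged, flagged_at_end)
```

By construction the first number can never be smaller than the second. How far apart they are in practice is exactly what would need investigating, ideally with real review histories or the FSRS simulator rather than this toy model.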

unique salmon
#

So if a card has been tagged as a leech, it cannot be un-leeched for the next 2 reviews

#

Oh, and yes, we would need to code the un-leeching part from zero

#

Right now Anki can automatically tag cards as leeches, but not automatically remove the tag

polar maple
#

i mean it requires some proper investigation in terms of the memory model. Suppose that D doesn't exist in FSRS, then you would actually expect every single card to eventually become a leech at some point in their lifetime, but i'm not sure if this is the behaviour that you want

#

and now let's reintroduce D. Make the assumption that D is computed solely based on the first few reviews. Then on the 10th review and on, an easy card can very easily become a leech since it rolls the same dice as the high difficulty cards

#

the DR formula doesn't include D or anything

unique salmon
#

I'm really not sure what you're trying to say

polar maple
#

just a retention based formula is not enough to find leeches

unique salmon
#

We're not using DR though, we're using R at the time of the review

polar maple
#

Picture this, you have a tree that models card histories. Going right corresponds to a pass, going left corresponds to a fail. So suppose we sampled the nodes at 4, 5, 6, 7 for fail fail, fail pass, pass fail, pass pass. Now also suppose that 4 < 5 < 6 < 7 in terms of card easiness. This is reasonable in terms of the review history, 6 and 7 were passed on the first review, 4 and 5 failed. But your method would treat 4 and 6 as having the same rarity

#

(just suppose that each decision point is 50%)

#

so D must be used as part of the formula, not just R

#

or just use D only? technically it has the right interpretation

unique salmon
#

How would you use D if the detector is based on probabilities?

#

D is not a probability

unique salmon
#

2-4 -> left-left -> fail-fail
3-6 -> right-left -> pass-fail

polar maple
#

ah yeah i miscounted but yeah 5 and 6 have the same rarity here

#

but all you need is to add another layer to the binary tree to make even weirder results

#

the point of this exercise is to show that counting failures does not preserve the order of the elements

#

in this one, counting failures suggests that the review history of the red line is not as bad as the blue line

#

blue has 2 failures, red has just 1

polar maple
unique salmon
#

Meh, then we're back to just counting without taking the probability of recall into account

#

Since D doesn't depend on R

#

Finding cards with the highest D is so strongly correlated with counting Agains it might as well be the same thing, up to a constant

polar maple
polar maple
unique salmon
#

Easy cards won't have a small enough number of successful reviews to get tagged

polar maple
#

sure but that relies on humans not using anki long enough. and remember this is just a worst case example that shows that the method is wrong. how else could it go wrong? what reason do we have to believe that it is even reasonable to use? that's why you should investigate more

unique salmon
#

If you mean that a card can have an unlucky streak just by accident, sure, but as long as the rest of the review history is normal, the number of successes will still be high enough for it to not get tagged

#

I mean, I guess it's theoretically possible for a normal card to fail 64 times in a row, but I bet that will never happen

polar maple
unique salmon
#

Ok, I'll do a graph with probabilities later

polar maple
#

you should use the fsrs simulator or something

#

but idk what metric you would go for to count proper leeches other than just the bottom 1% of D lol

unique salmon
#

Alright, assume a perfect scheduler that always schedules a card at exactly R=90%. Suppose we did 2 reviews.
The card always has a 90% chance to go "right" and a 10% chance to go "left". So in the end there are 4 possible outcomes:

  1. Left-left: 1% chance
  2. Left-right: 9% chance
  3. Right-left: 9% chance
  4. Right-right: 81% chance

Explain what's wrong
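The four numbers above can be reproduced by enumerating every pass/fail path and multiplying the per-review probabilities. This is a small sketch; `outcome_probabilities` is a name made up for illustration.

```python
from itertools import product


def outcome_probabilities(recall_probabilities):
    """Map each pass/fail path ("right" = pass, "left" = fail) to its probability."""
    result = {}
    for outcome in product((False, True), repeat=len(recall_probabilities)):
        prob = 1.0
        for passed, r in zip(outcome, recall_probabilities):
            prob *= r if passed else 1.0 - r
        label = "-".join("right" if passed else "left" for passed in outcome)
        result[label] = prob
    return result


for label, prob in outcome_probabilities([0.9, 0.9]).items():
    print(f"{label}: {prob:.0%}")
# left-left: 1%
# left-right: 9%
# right-left: 9%
# right-right: 81%
```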

#

@polar maple

polar maple
# polar maple

apply the same logic to the bigger binary tree here and you would wrongly find that the blue line is more of a leech than the red line

unique salmon
#

Card 11 (or card in the state 11, whatever) is more of leech than card 10 because 10 has two successes and 11 has one success

#

Why would this be wrong?

polar maple
#

while this isn't a correct assumption this example shows that your idea isn't correct as-is

#

also since card 11 passed the first review, you would also expect the intervals that it uses to perhaps be longer than the ones in card 10

#

this is definitely true in the case of FSRS

unique salmon
#

I still don't see the problem

#

Genuinely

polar maple
#

i guess it boils down to this

#

give me evidence that your idea would work well

#

don't expect me to disprove it

#

it isn't mathematically correct or anything

#

so at least show that it works well empirically

unique salmon
#

Well, the current approach in Anki is based on just counting Agains. This doesn't take into account the fact that pressing Again when R is high is a pretty different situation from pressing Again when R is low. The former is surprising, the latter is not. So this method would be more precise because it takes the probability of recall into account. Of course, if FSRS sucks at predicting probabilities, this will suck as well.
As for how many cards will be tagged as leeches, we can use some threshold, like 1%. If the user has a large number of leeches, in reality more than 1% will be tagged as leeches. The more leeches - more precisely, cards for which FSRS consistently overestimates R - the more cards will be tagged as leeches, more than 1%.
Whether this results in a satisfactory user experience is something that we won't know until we implement it.

polar maple
#

replacing a bad method with another bad method isn't satisfactory especially when we have no reason to believe that this new method is any good

unique salmon
#

I literally just explained why it's better - because it takes into account the probability of recall

polar maple
#

then here's another one: take the bottom 1% of D. Why is yours better than mine?

unique salmon
#

Failing a card 3 times at 99%, 99%, 99% is clearly worse than failing a card 3 times at 70%, 70%, 70%
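To put numbers on that comparison: with three reviews and zero successes, p(successes<=0) is just the probability of failing all three. A quick sketch, with a helper name made up for illustration:

```python
def probability_of_all_failures(recall_probabilities):
    """Probability of failing every review, given the predicted recall at each one."""
    prob = 1.0
    for r in recall_probabilities:
        prob *= 1.0 - r
    return prob


surprising = probability_of_all_failures([0.99, 0.99, 0.99])    # about 0.000001
unremarkable = probability_of_all_failures([0.70, 0.70, 0.70])  # about 0.027

print(surprising < 0.01, unremarkable < 0.01)  # True False
```

So with a 1% cutoff the first card would be tagged and the second would not, even though both have the same number of Agains.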

unique salmon
polar maple
polar maple
bold terrace
#

The most difficult card, with R=DR most of the time, would be a leech for you, not for Expertium

#

My opinion is, the current leech definition is just worse than any of those 2 interpretations

polar maple
#

yeah, i dont see why leech = difficult is not the goal here. we want to identify cards that would take too much effort to learn

bold terrace
#

Because given enough time, lapsing N times is just normal with FSRS

polar maple
#

and leech = off predictions can easily happen for easy cards by just random luck as i have demonstrated in my examples

bold terrace
#

Personally I think historically, there was always a difference between a leech and a card with high difficulty, so I think it's an intuitively different case

#

It can be hard, but your predicted R might be matched

bold terrace
#

It's also interesting to know, what cards can't be matched correctly to R

#

It's 2 different questions

#

This is why Philosophy is sometimes useful haha

unique salmon
#

If a card is insanely difficult subjectively, but gets successfully recalled roughly as often as we expect it, then it's not a leech under my definition

bold terrace
#

So maybe we could have different leech detectors 😄

#

Anki will look like a boeing cockpit but it's all fine

#

it's fun

polar maple
unique salmon
polar maple
polar maple
unique salmon
#

The tag can be removed as new reviews come in

#

Btw, this is another advantage over the current method, where the road to leeches is one-way 🤣

polar maple
bold terrace
#

Also, with the "max D" solution, it will also happen

#

he got unlucky, pressed Again too many times in a row -> max D

polar maple
#

is it just another false positive?

#

how can i trust it?

bold terrace
#

Human interpretation and feedback

#

The whole review chain is not made for machines

unique salmon
#

We can choose a threshold such that there will be very few false positives

bold terrace
#

It's made for users who will have a chance to check which cards underperformed and assess the reasons themselves

unique salmon
#

1% or 0.1% or 0.01% or whatever

bold terrace
#

Also, we're not flipping a coin here

polar maple
bold terrace
#

We're assessing memory

#

Just because R=90% doesn't mean the memory TRULY has a 90% chance of retrieving the value

#

WE estimate it to be 90%

#

if he gets it 3 times in a row wrong, it's not just bad luck

#

It's bad memory

#

So the interpretation is totally different than a coin flip

#

We think he'll get it at 90% with 60d stability? Nope
30d? Nope
5d? Nope

It's way more than being unlucky

#

There IS a reason for this sudden loss of memory

#

it's not just a flipped coin

unique salmon
polar maple
bold terrace
unique salmon
#

We can make the threshold 0.2% if that helps you sleep better at night

#

Or 0.1%, whatever

bold terrace
#

Finding, by dichotomy, the percentage that will flag 1% haha

#

Just joking

polar maple
bold terrace
#

Unfortunately interferences are a bit difficult to find out

#

Maybe another term than "leech" could be better for sure

#

But the current "leech" (lapse >= N) would have to disappear then

#

So it's not really a criticism of Expertium's proposal, but a criticism of Anki's own choice of using Leech as a concept

polar maple
#

also i still don't understand why R != DR is even the goal lol, you can predict the distribution of R assuming that FSRS is a good prediction model, but how would you even interpret the bottom 1%? Is it just bad luck or something else? Whereas a high D even in FSRS has a more direct interpretation: these cards will have their intervals grow slower, so they are probably harder

bold terrace
#

One could even argue "Leech" could just mean "It leeches your workload for very low returns", computing something like "Utility*Stability/Reviews"

polar maple
#

🤔

bold terrace
#

Unfortunately no

polar maple
#

yes it does, high D cards have their stabilities grow slower

bold terrace
#

"Utility"

#

A low stability card bumping your workload could have high value

unique salmon
#

How would you define utility?

bold terrace
#

You're nitpicking the concepts you find useful or not

bold terrace
#

You'd just proxy a vaguely defined term "leech" with another "utility"

#

"Hard" cards and "Out-of-distribution" card could be better name than "Leeches"

#

But to me, "Leech" as it is right now is even worse, it's just useless

#

(The Lapse > N)

#

For SM2 it can make sense though

#

Anki having to maintain SM2+FSRS requires some flexibility in terms of interpretation if you want to keep the same UI and options

#

if "Leech" needs to be merged with "Out-of-distribution results", it's fine by me

polar maple
#

@unique salmon what is your interpretation of cards that would be in the bottom 1%?

#

certain interpretations might even lead you to develop a formula for FSRS

unique salmon
#

Alright, how about a really dumb compromise - find cards within the bottom 5% D AND with <5% p(successes<=k), where k is the current number of successes
In other words, find cards that are leeches according to both methods 😅

#

Please don't nitpick the thresholds, btw

polar maple
#

also, this idea could pretty much add a new dimension to the usual DSR models, if the running history likelihood is actually important, surely you could add it to DSR and get DSR + H or something and improve FSRS?

unique salmon
bold terrace
#

D is not really comparable with different DR though

polar maple
#

"Leeches are cards that you keep forgetting. Because they require so many reviews, they take up a lot more of your time, compared to other cards."

#

seems reasonable

bold terrace
#

With Lower DR you get Higher D, so it's not really working well unfortunately

#

Higher "Neutral" D let's call it like that

#

D is not influenced by DR

#

So the more you fail, the more the "balance" goes close to 100%

#

so DR=60% just by nature will have higher D than DR=90%

unique salmon
#

Yeah, but Alex wants to look at cards with relatively high D, relative to other cards from the same preset

bold terrace
#

Blue is 1 fail 1 good

#

Red is 3 Good 1 Fail

#

the balance point will be higher for blue

bold terrace
#

I'm looking at my top D now

#

Top 1.4%, I have this :

#

1 fail, 2 hard

#

Now I look at my top performers, D=82%

#

2 fails in fewer reviews

#

Not that convinced about D

#

But yeah, basically it compounds over multiple lapse

#

You can easily fail 3 times in a row and still be considered "easier" than a card that fails from time to time

#

IMO in those cases, the probability detection is better

#

Since D will get higher and higher at each lapse, the "High D = leech" comes back to "The more you lapse, the more it's a leech"

#

Which is stupid

#

You ask a lot that we should justify those probability detection

#

But I start to feel you should start to justify it a bit more :/

polar maple
bold terrace
#

In a perfect world with a perfect D, it might make sense, but it's not the D we have

bold terrace
#

Also, don't really have to justify it to you

polar maple
# bold terrace Top 1.4%, I have this :

for these two examples, isn't the second picture of lower difficulty? i mean visually it seems that it reached 22 days stability much faster than the first picture, the first picture seems to indeed be more difficult

bold terrace
#

So it got to 22d stability "just because"

#

(In my optimization though)

polar maple
bold terrace
#

(Maybe other start high ?)

polar maple
#

tbh i'm very confused about your example, it does not paint a bad picture about difficulty at all

#

@unique salmon maybe you can explain it?

unique salmon
polar maple
#

i mean, why would this example in particular be an argument against difficulty

#

those histories seem to be modelled by D well

bold terrace
polar maple
#

i'm sure you can find other examples that paint D in a bad light but these examples just aren't it

bold terrace
#

But having lapses, is perfectly healthy

#

FSRS is predicting an 80% success rate for me, so over time, having 10-15 lapses is just perfectly normal

#

but those will get incredibly high D

#

Compared to cards I might fail more often, but with lapses that come sooner

unique salmon
#

Yeah, since "reversion to the mean" (as Jarrett calls it) is very weak for most users, it takes literally thousands of "Good"s to undo one "Again"

bold terrace
#

Going to sleep though, but you get the idea

unique salmon
polar maple
polar maple
#

i really struggle to see the problem

unique salmon
#

"A card that you fail more often than expected"

unique salmon
#

The only situation we disagree on is if a card feels difficult subjectively, yet the number of lapses and successes is in line with what is theoretically expected

polar maple
#

if the history likelihood actually matters then you can improve FSRS right?

#

otherwise it doesn't matter and its just a useless metric

unique salmon
#

You mean using the history of most recent cards, not just this specific card?

#

Idk how we would use that in FSRS

polar maple
#

i mean this specific card

unique salmon
#

Uhhh...then I'm not sure what you mean

polar maple
#

if the probability of this card's reviews matters then by all means incorporate it into FSRS

unique salmon
#

Do you mean something where the order matters?

#

Because in my current method fail - pass - fail is treated the same way as fail - fail - pass

polar maple
#

So the idea of leech detection is that we find some signal that suggests that future reviews of this card will be difficult in some manner. But this metric, if it is insightful, should be able to be added as a formula into FSRS to improve predictions

#

so one way to show if this is actually a useful signal, the likelihood of the review history, is to see if you can find any formulas that use this value

#

if you can, then you have found an improvement to FSRS. If you cannot then the metric is not insightful

unique salmon
#

Ah, ok. So you want me to try to incorporate this into the formulas themselves. Interesting. I'll think about it.

#

Idk how the hell I'm going to do the math, though

#

Like, with torch

polar maple
#

i'd guess you need to do some plotting and then make some guesses

unique salmon
#

And also this means that we would have to store every R value in the memory state, Jarrett is not going to be happy about that

polar maple
# unique salmon Like, with torch

hmmm. Compute DSR with FSRS, add this historical likelihood thing as H, have a nn print out a forgetting curve from these 4 values

#

try this without H as well after

#

to compare

unique salmon
#

No, I mean, Poisson binomial stuff

#

But with torch arrays and whatnot

polar maple
#

ask claude to update the numpy code to do it in parallel with another dimension

#

then ask it to convert it into torch

unique salmon
#

Actually, the more I think about it, the more it seems like a nightmare. Doing Poisson binomial PMF stuff and storing every value of R...man...

polar maple
#

actually if its just to find the likelihood of the history you don't need poisson binomial pmf, you just multiply all the probabilities together

#

you don't need the bottom 1% or anything like that in this case

#

you just need the exact probability, which is easily computed by just multiplication
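The multiplication described here can be sketched as below. The function names are made up for illustration, and `predicted_recall` is assumed to hold FSRS's R at each review. Unlike the tail probability p(successes<=k), this is the probability of the exact pass/fail sequence that was observed, so it depends on which reviews were failed and not only on how many.

```python
import math


def history_likelihood(predicted_recall, outcomes):
    """Probability of the exact observed pass/fail sequence."""
    likelihood = 1.0
    for r, passed in zip(predicted_recall, outcomes):
        likelihood *= r if passed else 1.0 - r
    return likelihood


def history_log_likelihood(predicted_recall, outcomes):
    """Same quantity in log space, which avoids underflow on long histories."""
    return sum(math.log(r if passed else 1.0 - r)
               for r, passed in zip(predicted_recall, outcomes))


# Failing at R=0.99 is far less likely than failing at R=0.70.
print(history_likelihood([0.99, 0.70], [False, True]))  # about 0.007
print(history_likelihood([0.99, 0.70], [True, False]))  # about 0.297
```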

unique salmon
#

No, I need bottom 1%

polar maple
#

ok sure, but it will prob make a nn have a harder time for the DSR + H idea

unique salmon
#

Ain't no way I'm making an nn compatible with the benchmarking code, mate

#

Ain't no way

polar maple
unique salmon
#

Ok, screw it, I highly doubt I will be able to implement it. You can ask Jarrett

polar maple
#

Unfortunate. But if this historical likelihood has rich information then such a nn should get significant performance boost from it so it should be investigated

#

otherwise the leech idea isn't promising

unique salmon
#

You can incorporate it into something like an LSTM as an input feature

#

And see if it helps

#

Sorry if I'm not being helpful here

unique salmon
#

With FSRS I can do a simplified version - just a moving average of abs(R - binary grade)

#

And then see if I can turn that into some sort of multiplier or something

unique salmon
#

I've actually tried incorporating this into the update of D as an extra multiplier, but it didn't do anything good

#

Maybe with some more parameters and with using it for S instead of D it could be useful

#

Maybe it needs to be its own variable

#

I mean, instead of just a modifier for D

#

Well, at least a moving average of abs(R - binary grade) is workable, I can do it, unlike the PMF and all that stuff
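A minimal sketch of that simplified metric, assuming an exponential moving average. The thread does not say which kind of moving average is meant, and `alpha` and the function name are hypothetical. The binary grade is taken as 1 for a pass and 0 for Again.

```python
def surprise_moving_average(predicted_recall, outcomes, alpha=0.3):
    """Exponential moving average of abs(R - binary grade); None for an empty history."""
    average = None
    for r, passed in zip(predicted_recall, outcomes):
        error = abs(r - (1.0 if passed else 0.0))
        if average is None:
            average = error  # the first review seeds the average
        else:
            average = alpha * error + (1.0 - alpha) * average
    return average


# A pass at R=0.9 contributes 0.1, a fail at R=0.9 contributes 0.9.
print(surprise_moving_average([0.9, 0.9], [True, False], alpha=0.5))  # about 0.5
```

Only the running average has to be kept per card, so this avoids storing every past R value.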

polar maple
#

since LSTM is given the full history of the card it has the same information required to compute H

#

(lmk if you want to call it something else btw, this historical likelihood thing)

quasi shadow
#

🤣 This thread will become research references for spaced repetition.

ashen light
#

too bad discord is where information goes to die

quasi shadow
ashen light
#

can't wait to see how it generalizes my chat activity

#

"jake continues to refuse to help"

cosmic hedge
#

gemini only let me paste half the chat XD

hasty fractal
#

it missed the sarcastic jake

ashen light
#

thats the only jake there actually is

unique salmon
#

I made a flowchart so that I don't have to type out the same thing all the time 🤣
Thoughts?

robust hill
#

what if

#

fsrs gives intervals that are too medium

bold terrace
#

TBF, I'd just put in the bottom "Or just wait to have more reviews before optimizing like crazy"

#

Also, the first step would be "Is your Retention around your Desired Retention (~10% ballpark)?". Yes -> Intervals are OK

#

"Do you have fewer than 10K reviews?" -> Review more

#

"Do you change your DR all the time?" -> Stop

#

When people say "There is NO way this interval makes sense", they tend to forget that FSRS didn't come up with that interval on its own, it just read your history and that's what it saw

unique salmon
#

So it's hard to say how much deviation is ok

robust hill
#

how bad is it that when

#

i optimize with FSRShelper addon, and it gives me more cards to do, then i do it, but if it reduces the cards i have to do, then i undo the optimize

#

😭

#

i always feel sketched having less to do idk why

bold terrace
#

The higher the stability, the less of a problem it will be, but with low stability, for example if you suddenly added a lot of new cards/day and ~50% of your cards have stability <1d, you'll see a bigger difference

#

It's not a bug or a problem in itself, since it will correct itself with higher stability and a bigger deck, but it's still something to keep in mind when differences occur

robust hill
#

i have average card stability of 18 days with 93% dr on a deck from october 1 with avg of 8ish new cards a day avg difficulty 77%

bold terrace
#

As you can see, even if my DR is 84%, my Target R for many cards can be around 71-80%, sometimes just because they have very low stability, sometimes because the LB or rescheduler pushed them a bit too far

robust hill
#

with only 8 cards being after 30 days out of 900

unique salmon
#

I wonder if we should tweak LB a bit. Here's the formula
Maybe we should make it schedule cards earlier more aggressively by using the square of the interval length (or something like that) in the weight

bold terrace
robust hill
#

havent finished today yet

#

log loss .3368 rmse 2.52
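
For readers unfamiliar with the first number, here is a sketch of plain binary log loss over predicted retrievability and review outcomes. This is the textbook definition; whether Anki applies extra weighting is not covered here, and the RMSE figure (a binned metric) is not reproduced.

```python
import math


def log_loss(predicted, recalled):
    """Mean binary cross-entropy.

    predicted: predicted probabilities of recall (0..1), one per review.
    recalled: 1 if the review was a success, 0 if it was a lapse.
    """
    eps = 1e-15  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(predicted, recalled):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predicted)


# Toy data: always predicting 0.9 when 9 out of 10 reviews succeed.
baseline = log_loss([0.9] * 10, [1] * 9 + [0])
```

On that toy data the loss comes out around 0.325, which gives a rough sense of scale for values in the low 0.3s at a true retention near 90%.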

bold terrace
#

So it's quite nice !

#

And how many reviews per day more or less ?

#

200 ?

robust hill
bold terrace
#

Ok !

robust hill
bold terrace
#

To be honest your situation is quite good

robust hill
#

some day i would not do new cards, some days id do a lot

#

well thats good atleast

bold terrace
#

There are symptoms that will show you if something is not right

#

That's why I wanted to have the average stability so badly: when it's low, it's a good sign that too many cards are being added every day, so stability can't be built

#

I think keeping a high DR is also a smart move

#

I made the mistake of lowering it over time to be able to add more new cards/day, and I really shouldn't have

robust hill
#

u should see the leech deck

bold terrace
#

So after X lapse you put them in a leech deck, do you do something specific with them ?

robust hill
# robust hill

last month was when i put them into their own leech deck and made it optimize with its own deck options, currently went from 13 rmse to 11 now

bold terrace
#

Different DR ?

robust hill
#

same dr but optimize the parameters against leeches only

#

its working pretty well

bold terrace
#

Never thought of it but that can be quite good actually

robust hill
#

i talked about it here

#

and someone gave me an idea

bold terrace
#

It dropped your RMSE for both ?

robust hill
#

i dont remember

#

ive uploaded screenshots before so one sec

#

i guess i dont have it

bold terrace
#

It's OK, I'll experiment and search a bit

robust hill
#

but as far as i remember yes

bold terrace
#

What I do is I do some Filtered Decks to manipulate when some cards are due

#

but it still pollutes my parameters, potentially

robust hill
#

in the normal deck it was like 3% and being stubborn to stay around there, then when i separated the leeches out
Leech deck was like 14% ish
after a month, the normal deck is at like 2.5% right now, and leech deck is currently at

#

10.77%

#

but there are not so many reviews

#

only 600 in the past month

#

compared to the main deck which is like 6000

bold terrace
#

Yeah it's also difficult to compare, because the RMSE with the old parameters on the non-leech cards might already have been lower than 3% (if the leeches were the ones messing with the RMSE, while not necessarily being optimized on)

#

But at least now you have a "proof" that for non-leech cards, your RMSE is quite low

robust hill
#

let me check one of my language learning decks

#

.3170, 2.85% including leeches

#

after optimization
.3282, 2.34% excluding leeches

#

i do not have a leech deck for the language learning deck, probably i should tho

bold terrace
#

Would be fun @unique salmon some kind of "Multi-class FSRS", but I guess we're reinventing neural networks here

  • Cluster cards by difficulty rating
  • Create separate parameters and optimize them for each difficulty class
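
A rough sketch of the first bullet only, assuming difficulty is available as the 0-100% value shown in Anki's stats. The class boundaries (33 and 66) and the function name are invented for illustration; the per-class optimization step is left out because it would need the real optimizer.

```python
from collections import defaultdict


def group_by_difficulty(cards, edges=(33.0, 66.0)):
    """Split cards into difficulty classes.

    cards: iterable of (card_id, difficulty_percent).
    edges: hypothetical class boundaries; class 0 is below the first edge.
    Returns {class_index: [card_id, ...]}.
    """
    groups = defaultdict(list)
    for card_id, difficulty in cards:
        # Count how many boundaries the card's difficulty reaches or exceeds.
        class_index = sum(difficulty >= edge for edge in edges)
        groups[class_index].append(card_id)
    return dict(groups)
```

Each resulting group could then be given its own preset and optimized separately, much like the leech deck above.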
robust hill
#

but some cards are tagged leech for a strange reason

#

0 lapses, 7 reviews

bold terrace
#

By zooming in on my difficulty graph (and increasing the granularity), I noticed that there are a lot of smaller normal distributions of difficulties:

robust hill
#

because after 4 months of working TL -> NL i made a note type to make that deck NL -> TL

#

and so i guess it copied the tags

bold terrace
#

There's no easy way, I have a difficulty viewer branch of the addon and I'm tweaking in the code directly for now

#

I think I can always find a local build with that view

#

but it's on another user session so I'll take a look later to upload it or to improve it so it can go in the main branch

#

but for now it's just personal stuff

robust hill
#

okay no worries

#

probably mine will be like that

#

would you prefer me to send u the deck

#

im genuinely curious what mine would look like

#

if i export with include scheduling information + deck presets, does it keep statistics ? it should right

bold terrace
#

I think it will be easier if I just send you a local build when I have it 🙂

robust hill
#

haha okay

bold terrace
#

I'll check on my private session later

robust hill
#

no worries

bold terrace
#

I'm on another one right now

robust hill
#

take your time

#

i keep procrastinating my studies anyways 😭

unique salmon
unique salmon
ashen light
#

the initial easy days impl was trying a bit too hard to force the graph into a certain shape and that just sorta lessened that effect

#

theres some extra multipliers in the logic for siblings and easy days but yeah the code in the comment above the lb is still correct

#

anyway re: lb biasing further to earlier days, is it necessary?

#

it already (if in a vacuum and days have the same amount of cards scheduled) will prioritize an earlier day. cards due naturally sort of gravitate to a 1/x curve. are the specific numbers of this not working properly? is it not 1/xing optimally?

unique salmon
ashen light
#

but really my actual question: given how it already will prioritize an earlier day, how would this cause problems the original fuzzer would not

#

call me when they double-blind some tests, yuki already had a mental bias against it before it even was in anki. sound has real numbers at least 🏃

#

but my point point point is: can someone create a measure that can be tested or at least have a sample size of more than two people?

unique salmon
#

Nonetheless, I'll run simulations to see if I can both reduce volatility AND bring the average retention closer to the desired value

ashen light
#

oh for sure

sonic forge
#

The thing is that because (1 / (cards_due))**2 is squared, it has a huge impact on the weight and it "outshines" the (1 / target_interval) term.
The current implementation only prioritizes earlier days if the earlier day and the later day have the same card count (or nearly the same), so that the (1 / (cards_due))**2 value is the same.
It is obvious that the (1 / (cards_due)) and (1 / target_interval) variables need to be raised to the same power to get a fair LB

#

So yes, the point is that these two variables need to be in the same power.

ashen light
#

so I think theres a bit of a misunderstanding? in (1/cards_due)**2 the ^2 makes it smaller, not larger? 1/2 * 1/2 = 1/4

#

though the numbers are the real numbers, perhaps normalizing those numbers would be better

#

either way, "priorities earlier days if earlier day and further day have the same card count (or near the same)" yes most due graphs look like this and so it should end up being no different than the normal fuzzing routines in the long term

bold terrace
sonic forge
# ashen light either way, "priorities earlier days if earlier day and further day have the sam...

But what about the case when the user's due graph looks like a decreasing curve (y = 1000/x, x >= 5, for example)?
Later days have a smaller card count.
I am a little confused, because I messed up my calculations. It seems like the current formula weight = (1 / (cards_due))**2 * (1 / target_interval) already prioritizes the target_interval - because cards_due is squared, it has less impact than target_interval, right?
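
To make the question concrete, here is a small sketch that evaluates the quoted weight on the toy due graph cards_due = 1000/x. It assumes target_interval is the candidate day's interval and that weights are normalized into pick probabilities; the function names, the power arguments, and the zero-cards guard are hypothetical and not taken from Anki's code.

```python
def lb_weight(cards_due, target_interval, due_power=2.0, interval_power=1.0):
    """Weight of one candidate day; a higher weight means a more likely pick."""
    due = max(cards_due, 1)  # assumption: avoid dividing by zero on empty days
    return (1 / due) ** due_power * (1 / target_interval) ** interval_power


def pick_probabilities(candidates, **powers):
    """candidates: list of (target_interval, cards_due). Returns normalized weights."""
    weights = [lb_weight(due, ivl, **powers) for ivl, due in candidates]
    total = sum(weights)
    return [w / total for w in weights]


# Toy due graph from the message above: cards_due = 1000 / x
candidates = [(x, 1000 / x) for x in (5, 6, 7)]
current = pick_probabilities(candidates)  # squared cards_due term
same_power = pick_probabilities(candidates, due_power=1.0, interval_power=1.0)
```

In this toy model the current form gives a weight proportional to x, so `current` increases from day 5 to day 7 and the later day is favored, while `same_power` comes out flat at one third each. That only says something about this idealized 1/x graph, not about real due graphs.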

bold terrace
#

So basically my point personally is just : If I have a card with low stability, I'd prefer to not take an extra hit with LB/Fuzz

Typically in this example, my DR was 84% when I did it, the Target R will be 79% on March 14th, but it will already drop below my DR tomorrow (since it's 85% today)

#

It's thus a bit silly, because one of my beloved Filtered Decks marks as due the cards with R<DR ("deck:Japan::1. Vocabulary" prop:r<0.844 -is:due)

#

(Yeaaah I also use more than 2 decimals lol)

#

But doing so, my "Future Target R" graph is just perfect

#

Without it, my average Target R would be, everyday, ~5% lower than my DR

#

Is it a big deal ? A bit, look how my weekly Retention is way more stable than before

#

Before doing so, I would have to do some mental gymnastics, thinking "If I want to remember 80% of the words when I see them, should I put 90% DR ? 85% DR ?"

#

Now I'm always in the ball park of DR+/-RMSE

#

Which is way more motivating than wondering if today will be a bad day or not haha

ashen light
bold terrace
#

Now to the question "Isn't it a ~1/x", in theory with a regular rhythm of new card/day, it should yes ! But of course, if you stop adding new cards, you'll get a flatter curve, and if you suddenly add more, it will be more aggressive.

I might be wet dreaming, but I think the best way to know what would be the "ideal" curve for a user, would be to base it on his "Review Intervals" curve

ashen light
bold terrace
#

Very low target R happens when, well, a 1d stability card takes a +1d increment just by passing through the Fuzzer/LB
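
A quick sketch of how much one extra day costs, assuming the FSRS-4.5/FSRS-5 power forgetting curve with fixed constants (newer versions make the decay trainable, so treat the exact numbers as illustrative).

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R equals 0.9 when elapsed time equals stability


def retrievability(elapsed_days, stability):
    """Predicted probability of recall after elapsed_days, for a given stability in days."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


# A card with 1 day of stability, reviewed on time vs. one day late.
on_time = retrievability(1, 1)
one_day_late = retrievability(2, 1)

# The same +1 day on a card with 30 days of stability.
mature_late = retrievability(31, 30)
```

Under these assumptions the 1d card falls from 0.9 to roughly 0.83 with one extra day, while the 30d card barely moves (just under 0.9).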

sonic forge
bold terrace
#

Also, even without any LB/Fuzz (since my Filtered Decks overwrite the scheduling), I still have a somewhat constant amount of reviews every day (the spike is just a change of DR and I did the backlog); in 2 days, the curve went back to the previous baseline

#

(As you can see, half my reviews are through Filtered Decks now)

ashen light
#

and like, at the first 5 days, the lb is very weighted towards earlier days

unique salmon
ashen light
#

I think for intervals under like a week it could be preferable!

#

I'd help but

ashen light
unique salmon
#

Actually, wait, wouldn't that make the problem worse at high DR?
At DR>90% S>ivl, so the intervals would have more fuzz, not less
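
The S vs. interval relationship can be checked with a short sketch, again assuming the FSRS-4.5/FSRS-5 forgetting curve with fixed constants rather than a trainable decay. The function name is made up for illustration.

```python
DECAY = -0.5
FACTOR = 19 / 81  # makes the interval equal to stability at a desired retention of 0.9


def interval_for(stability, desired_retention):
    """Days until predicted retrievability drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


at_90 = interval_for(10, 0.90)  # equal to the stability
at_95 = interval_for(10, 0.95)  # shorter than the stability
at_80 = interval_for(10, 0.80)  # longer than the stability
```

With these constants a 10-day stability gives about 4.6 days at DR 95% and about 24 days at DR 80%, so anything keyed to S instead of the interval would see a larger number at high DR.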

bold terrace
#

What about ...

#

Making ...

#

NO fuzz 😄 ?

#

I googled a bit about how to disable it, it seems it's like something sacred in Anki

#

But seriously, let people turn it off

#

😂

#

Especially now with FSRS where we SEE the R impacted

#

With SM2 I guess people could make wild assumptions without having anything to rely on

#

"You have low stability ? YOU are the problem"

#

But now it's clear that a +1d fuzz at early stage of memorizing something is not that great

cosmic hedge
#

I tried programming %correct into the rust simulator quickly and ran it with and without the fuzz turned on

#

idk if this helps? πŸ˜‚

bold terrace
ashen light
#

I mean anki has always had fuzz

bold terrace
#

I see you have put 80 reviews/day, so I'd suggest doing the same with ~10 new/day

ashen light
#

there was like a brief period of time where it didn't because anki was being rewritten and it wasn't added in yet

bold terrace
#

Just because something was always there doesn't mean it has to stay

ashen light
#

I mean at this point dae is very opposed to any option unless it really pulls its weight

#

and I don't think this toggle does

#

¯\_(ツ)_/¯

#

you're free to make your own build with no fuzzing though

sonic forge
ashen light
#

its pretty easy to remove

unique salmon
# bold terrace But now it's clear that a +1d fuzz at early stage of memorizing something is not...

https://github.com/ankitects/anki/blob/9b5da546be49f37c8d6c286e09c86074b2f0c278/rslib/src/scheduler/states/fuzz.rs#L16
static FUZZ_RANGES: [FuzzRange; 3] = [
    FuzzRange { start: 2.5, end: 7.0, factor: 0.15 },
    FuzzRange { start: 7.0, end: 20.0, factor: 0.1 },
    FuzzRange { start: 20.0, end: f32::MAX, factor: 0.05 },
];

As far as I can tell, fuzz isn't applied to intervals <2.5 (before rounding, I assume)
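
A Python sketch of how such a table could translate into a fuzz width. The table values are the ones quoted above; the accumulation rule (a base of 1 day plus each range's factor times the part of the interval that falls inside it) is an assumption about the rest of that file, not something shown in the quoted snippet, so check the linked source before relying on it.

```python
# (start, end, factor) copied from the quoted FUZZ_RANGES table
FUZZ_RANGES = [
    (2.5, 7.0, 0.15),
    (7.0, 20.0, 0.1),
    (20.0, float("inf"), 0.05),
]


def fuzz_delta(interval):
    """Half-width of the fuzz window in days (assumed accumulation rule)."""
    if interval < 2.5:
        return 0.0  # no fuzz for very short intervals
    delta = 1.0  # assumed base width
    for start, end, factor in FUZZ_RANGES:
        # Only the part of the interval inside this range contributes.
        delta += factor * max(min(interval, end) - start, 0.0)
    return delta
```

Under this reading, a 3-day interval already gets a window a bit wider than one day each way, which matches the concern about short intervals in the thread.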

cosmic hedge
bold terrace
cosmic hedge
#

maybe i programmed it wrong

bold terrace
#

But we already discussed it lol

ashen light
#

I wonder if it would be preferable to increase that 2.5 to like 6

bold terrace
#

It's a compounding thing

sonic forge
#

Maybe disable all fuzzing for <=10 or <=7 intervals?