#FSRS Megathread


quasi shadow
#

😎 21 parameters ≈ 297 weights

unique salmon
#

Now if only we could finally make D depend on R, I would consider FSRS to be complete

unique salmon
#

...oh, right

lapis hearth
#

You bet your sweet potato I won't forget about this

quasi shadow
#

FSRS-6 with optimizable decay reduces RMSE(bins) by 16% in relative terms.

#

The absolute difference is 0.0085, which is equal to the difference between FSRS v4 and FSRS-5 recency

#

😎 So it's good enough for a major version.

#

😅 The only problem is I have to refactor fsrs-rs to support it...

cosmic hedge
quasi shadow
#

So, there are two ways: 1) store the decay in parameters, or 2) store it in the card

unique salmon
#

Oh, yeah, decay will probably have to be stored in card info

unique salmon
cosmic hedge
lapis hearth
quasi shadow
quasi shadow
#

I’m afraid that FSRS-6 won't make it into this release.

bold terrace
#

I was at first like “oh no gain then” then I saw the GRU haha

#

Well done 🙏🏻

unique salmon
quasi shadow
lapis hearth
#

the next update could wait long enough

#

He could make betas

bold terrace
#

Now I have wet dream about D shenanigans (clustered parameters etc …) but let’s celebrate first 😂

unique salmon
#

Just let me run 50 more tests with neural D, surely I will find something good

#

🍃

hasty fractal
#

we gotta throw a party!

#

hmm... so do we gotta update the manual too? optimise if u change DR.

#

@unique salmon please confirm before I open an issue.

hasty fractal
#

oh ok then it's a nice change

bold terrace
unique salmon
bold terrace
#

Sure

unique salmon
#

Imagine two scenarios: the user presses “Easy” when R=99% and the user presses “Easy” when R=1%. Clearly, in the latter case this is a very surprising outcome, whereas in the former case it’s not surprising at all. Meaning that D should be updated by a different amount in those cases.

bold terrace
#

Yeah, as I see it, the optimizer already finds the best way to move D to fit the user's history, and the fact that it optimizes itself into very distinct clusters might be a sign that, instead of trying to bind the equation with those D parameters, we could just optimize all the other parameters based on D clusters

hasty fractal
#

Wait, so you might need to optimise after you've changed your DR?

#

Because you'll have more reviews in a different R region?

#

someone other than expertium confirm it for me.

#

I think I'm confused. I'll leave it up to others. Signing off.

unique salmon
#

Like
Again = +2
Hard = +1
Good = +0
Easy = -1

unique salmon
#

If you change DR your data won't change

#

At least not until you actually do reviews

bold terrace
#

It could have decided differently if that would have been the optimal way

unique salmon
#

Nope. The linear relationship is hard-coded

bold terrace
#

My low D has a more healthy D management for example

unique salmon
#

The D update formula is basically just new_d = old_d - (parameter * (grade - 3))
Where Again=1, Hard=2, Good=3, Easy=4
So for Good new_d = old_d - (parameter * (3 - 3)), which is just old_d
For Again new_d = old_d - (parameter * (1 - 3)), which is old_d + 2* parameter
We've tried making the values associated with each grade optimizable, it didn't do shit

#

So overall
Again -> new_d = old_d + 2*parameter
Hard -> new_d = old_d + parameter
Good -> new_d = old_d
Easy -> new_d = old_d - parameter
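
The per-grade updates listed above can be sketched in a few lines. This is only the simplified linear step as described in this thread, not the full FSRS difficulty update (the extra smoothing is left out), and the starting difficulty and `parameter` value are made-up numbers for illustration.

```python
# Grade numbering as given in the message above.
GRADES = {"Again": 1, "Hard": 2, "Good": 3, "Easy": 4}


def simplified_next_d(old_d: float, grade: int, parameter: float) -> float:
    """Linear part of the D update only; FSRS applies extra smoothing on top."""
    return old_d - parameter * (grade - 3)


if __name__ == "__main__":
    d = 5.0          # hypothetical starting difficulty
    parameter = 0.5  # hypothetical value of the optimizable parameter
    for name, grade in GRADES.items():
        print(f"{name}: {simplified_next_d(d, grade, parameter):.2f}")
```

With only this linear step, every reachable D is the starting value plus a whole multiple of `parameter`, which is one way to see why the values bunch into clusters.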

#

Hence why you get clusters

#

Then there is extra stuff to make it a little smoother

bold terrace
#

My "Low-Normal D deck" : 0.3888, 1.4114, 3.4578, 32.9702, 7.4108, 0.4662, 1.5312, 0.0677, 1.3478, 0.3241, 0.8557, 1.9796, 0.0889, 0.2942, 2.2884, 0.1258, 3.3983, 0.3663, 0.7039

#

And the distribution is quite nice too

unique salmon
unique salmon
hasty fractal
#

that's what I meant ofc

#

Because you'll have more reviews in a different R region

unique salmon
#

Oh, ok, my bad. Still, I'm not sure if I would recommend optimizing parameters more than before FSRS-6

hasty fractal
#

hmm, fair enough.

soft skiff
#

I would like to ask if there is an exam at the end of next month, is one month of "fsrs" review enough?

quasi shadow
#

I'm evaluating some collections with extremely low decay.

#

Their retention is > 93%.

#

And their decay is < 0.03.

polar maple
#

i guess decay would be low whenever you have data where R looks like it is increasing over time, which can happen by chance if the collection size is small

quasi shadow
bold terrace
quasi shadow
quiet saddle
#

I have a question about the parameters I use for FSRS:
When I switched to FSRS, I followed some explanations suggesting to use something along the lines of preset:"Bases" -is:suspended in the optimization search field. But now that I think about it, since I suspend leeches, FSRS never uses their data for optimization. So maybe the -is:suspended part creates some kind of survivorship bias 🤔
Should I remove that part?

quasi shadow
#

😅 The mystery of float 32.

#

I cannot solve this problem (

unique salmon
#

@quasi shadow here are the results of trying optimizable decay
It's very slightly better than decay=-0.2, according to my tests. Regularization doesn't help much, and increasing learning speed makes results worse
Regarding clamping, I already said it on Github - even if for some users decay >-0.1 provides a better fit, we shouldn't use it for scheduling reasons. We don't want people to have intervals measured in thousands of years

quasi shadow
#

In my test, it’s ~5% better.

unique salmon
#

Here it's like 2-4% better

#

In these tests the decay parameter is clamped between (-0.1, -0.7) btw

#

Again, as I said on Github, we can extend the lower limit to -1, but the upper limit must be -0.1. Anything closer to 0 than that will not be usable for scheduling

#

For example, with S=1 and decay=-0.025, the first interval at DR=80% would be something like 120 days, and the first interval at DR=70% would be something like 25000 days

unique salmon
unique salmon
# quasi shadow Slightly better?

S=1

Decay=-0.025, the first interval at DR=80% is around 120 days, the first interval at DR=70% is around 25000 days
Decay=-0.1, the first interval at DR=80% is around 4.5 days, the first interval at DR=70% is around 18 days
Decay=-0.15, the first interval at DR=80% is around 3.4 days, the first interval at DR=70% is around 9.5 days

On second thought, even -0.1 is a little crazy for scheduling. Let's make the limit -0.15. Again, I understand that for some users it won't provide the best fit, but we have to worry about the intervals being reasonable.
@bold terrace @polar maple @hasty fractal your input is welcome

unique salmon
#

I feel like we have a spectrum, where Alex is on the far end of "Screw scheduling as long as metrics look good", Jarrett somewhat closer to the center, and me at the other end (but not super far) of "Screw metrics a long as scheduling looks good"

bold terrace
#

Personally, I think that if a user has a profile that leads the optimizer to a decay very close to 0, that's fine as long as he realizes that he will have to push the DR very high or set some max interval limit. I don't think it's very healthy to put restrictions on decay itself if the problem is the interval.

After all, if S=1 and DR=90%, then in all cases the first interval would be 1d, by definition, right? So it's not per se a big issue, as long as the user is aware that, since his retention doesn't easily drop to 70-80%, he either chooses a higher DR or chooses to "never see those cards".

But I'm not 100% against putting some limit. I'm just worried about the kind of compounding effect it could have: if the decay is limited to, let's say, 0.15, but the user truly would need 0.10, then the other parameters will be optimized to make the intervals longer anyway. Sure, it won't have the same drastic effect as a lower decay, but if the user keeps outperforming the prediction, the other parameters will try to compensate for the decay we didn't allow to go to 0.10, for example.

bold terrace
unique salmon
hasty fractal
#

actually I'm a left-leaning moderate liberal.

unique salmon
#

We now have SRS-left and SRS-right 🤣

bold terrace
#

I'd be vertically aligned lol: the model itself should be metric-focused, but the UX could be controlled by factors external to the model 😄

unique salmon
#

SRS political compass

bold terrace
#

alt-left

#

or alt-right

#

But alt-right has bad connotation

#

😄

hasty fractal
#

(rightly so)

unique salmon
#

To me anything political has bad connotations tbh

hasty fractal
#

expertium hides his political opinions behind an apolitical mask

#

we've seen it all

bold terrace
#

I also think that people might consider things as "not looking good" when they don't necessarily realize that it's their history that led their current intervals to be what they are. Even if the scheduler overshoots, it will self-correct once those overshoots are actually evaluated as such. And even if a card scheduled 2y out won't be reviewed again for the next 2 years, a lot of other cards will be, the optimizer will take those into account, and with some regular rescheduling the intervals will be adjusted

#

So there are 2 scenarios: either you're studying for something in 1 month, and you're not in the mood to see cards as little as possible but want to maximise your score: then crank up the DR, do mass reviews, set your max interval to 1d, whatever.

But if, on the contrary, your goal is long-term learning, forgetting a few things for a few months is, in the grand scheme of things, really trivial

unique salmon
# bold terrace Personally I think if an user has a profile that lead the optimizer to get a dec...

Sadly, I'm pretty sure the result of this approach will be 100 posts with "Why is my first interval 100 years?"

bold terrace
#

Max Interval means: whatever the DR, I always want to be reminded of something at least every X days

unique salmon
#

And then every single interval will max out

#

Nah, we gotta set a reasonable limit to decay

bold terrace
#

And if with time they realize they have a 99.99% retention in that hardcoded interval, they will gradually get more confidence increasing it 🙂

bold terrace
unique salmon
#

The thing about decay is that the difference between -0.1 and -0.01 is not 10x longer intervals, more like 30,000x longer intervals (at DR=80%, specifically)

bold terrace
# bold terrace And if with time they realize they have a 99.99% retention in that hardcoded int...

Because in the end, I think most of those people are not confident enough to trust an algorithm, so maybe the max interval route is not the worst for them. Let's not forget that with a max interval of 90d, you could have 1000 cards and it would still give you an average workload of ~11 reviews per day... The price to pay "to be sure to never go beyond 90d" is not that high, right?

#

And I'm pretty sure those guys have less than 500 cards in a review state

#

Personally, at 3k cards and having done Anki for the past ~15 months, I'm in a state where if you tell me ~20% of my cards will have a 1y interval instead of 30d, I'll tell you thank you lol

unique salmon
#

@quasi shadow what are the 5th and 95th percentiles here?
(if the 5th percentile is >-0.1 [aka <0.1 in absolute values], let's just pretend it's -0.1 😉)

cursive badge
unique salmon
#

Lol

bold terrace
#

I mean, I think the algorithm should be kept "pure" of any external alteration, while setting external limits is OK 🙂 That way, you can troubleshoot more easily because you have clear information: "Right now, the system thinks you'll need X months to remember it, but the interval will be 30d because you wanted it like that"

unique salmon
#

The thing is, even people for whom it provides a better fit wouldn't be happy, because nobody wants 100 years intervals

#

I really don't think that making R 0.1% more accurate is worth making users 100x more concerned about interval lengths

bold terrace
#

And maybe those users have a big DR

cursive badge
#

*cough* *points vigorously at book covered in sigils and giving off a menacing aura*

unique salmon
bold terrace
#

After all, the decay was optimized that way, because predicting the next interval to be 100 years later, was indeed the better prediction

bold terrace
#

I'd say, let's first see what decay the 95th percentile has

unique salmon
#

Just guess
S=100 days, DR=95%, decay=-0.01

bold terrace
#

maybe we're arguing for 0.10 instead of 0.09 or 0.11

bold terrace
#

273 year ?

unique salmon
#

0.45 days

bold terrace
#

wait xD

#

Ah

#

WAit

#

Isn't that maybe a sign that an exponential is not that great xD ?

unique salmon
#

At DR=99% and decay=-0.01 and S=100 days, the interval would be something like 0.0045 days

bold terrace
#

Strange that between [100,90] and [90, 0] you get a compressing/expanding effect on the interval, no?

#

I mean, mathematically it makes sense

#

but then it means people with low decay might just be people with DR<90%

#

and people with higher decay are people with DR>90%?

unique salmon
bold terrace
#

As long as the predictions are accurate 🤷

unique salmon
#

This is decay=-0.5 (purple) vs decay=-0.01 (green) at S=100 days
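
Since the plot itself is not reproduced here, the two curves can be tabulated instead. The sketch below assumes the FSRS power forgetting curve R(t) = (1 + factor * t / S)^decay with factor = 0.9^(1/decay) - 1, which makes R equal 90% at t = S for any decay and reproduces the interval figures quoted in this thread; the function name and the sample time points are illustrative.

```python
def retrievability(t_days: float, stability: float, decay: float) -> float:
    """Power forgetting curve, normalised so that R(stability) == 0.9."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t_days / stability) ** decay


if __name__ == "__main__":
    stability = 100.0
    print(f"{'t (days)':>10} {'decay=-0.5':>12} {'decay=-0.01':>12}")
    for t in (1, 10, 100, 1_000, 10_000):
        steep = retrievability(t, stability, -0.5)
        flat = retrievability(t, stability, -0.01)
        print(f"{t:>10} {steep:>12.4f} {flat:>12.4f}")
```

Both curves cross 90% at t = S. Before that point the decay=-0.01 curve sits lower, and after it the curve barely drops at all, which is the compressing/expanding effect discussed above.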

bold terrace
#

But once again, it might be a sign that the predictions are good because it was trained at that specific DR

#

Not because the forgetting curve is truly good

#

The optimized decay might just be a way to slightly adjust the predictions around the DR

#

but going from 90% DR to 60% or from 60% to 90% is asking for very bad predictions

unique salmon
#

It's only bad for people with REALLY low retention, like, 20-35%

#

This graph is kinda weird because people with really low retentions were combined into one bin and I'm not sure what their average retention within that bin is, I assume around 20%

unique salmon
#

(I think)

#

(Jarrett, is it fixed here?)

bold terrace
#

I think there is still a lot of stuff we don't understand about the relation between short-term and long-term memory 😅

unique salmon
#

Man, this is getting tiresome
@quasi shadow, my guy, can we just agree to make the limits of decay (-0.1, -0.8) and be done with it? 😅

bold terrace
#

Maybe it's something like "You have a certain level of recall probability in long-term memory, and short-term recall might both make you more able to recall it right now and SLIGHTLY bump your long-term recall chance"

#

Which could explain that kind of "baseline" recall that people never seem to get below (~20-40%, let's say), even though they drop like crazy initially

#

and so maybe multiple decay rates would be necessary 😂

sick moth
bold terrace
#

One to represent the short term loss, one to represent the long term baseline

unique salmon
#

Actually, let's take it even further, like Alex
Let's calculate THREE stabilities for THREE curves with THREE different decays and then take their WEIGHTED average
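
A rough sketch of what that idea could look like, purely as an illustration: nobody in this thread has implemented it, and the weights, stabilities and decays below are invented. It assumes the same power forgetting curve as FSRS (R = 90% when t = S) for each component.

```python
def retrievability(t_days: float, stability: float, decay: float) -> float:
    """Power forgetting curve, normalised so that R(stability) == 0.9."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t_days / stability) ** decay


def blended_retrievability(t_days, curves):
    """Weighted average of several curves; `curves` holds (weight, S, decay)."""
    total_weight = sum(weight for weight, _, _ in curves)
    weighted_sum = sum(
        weight * retrievability(t_days, stability, decay)
        for weight, stability, decay in curves
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical short-, mid- and long-term components.
    curves = [(0.2, 2.0, -0.8), (0.4, 30.0, -0.3), (0.4, 400.0, -0.1)]
    for t in (1, 5, 30, 365):
        print(t, round(blended_retrievability(t, curves), 4))
```

Because the result is an average of values in (0, 1], it stays a valid probability, and a component that has already dropped close to 0% only pulls the blend down in proportion to its weight.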

#

Ngl, I actually kinda want to try that. It sounds horrible, but I want to see the metrics

bold terrace
#

Yeah, the weighted average would make more sense I think

cursive badge
#

At a certain point you just end up putting it all into the Memotron 9000 Neural Network

bold terrace
#

Short Term probability after 5d might be 0%
Mid term might be ~60%
Long Term might be 40%

A pure avg would count the 0% of the short term as being as important as the others

bold terrace
#

Memotron is all fun until he decides to kill everyone

unique salmon
#

His unreleased neural net can achieve RMSE of around 1.4% and logloss of around 0.27, beating the hell out of everything you see here

lapis hearth
#

Holy shit I just realized the number of params

cursive badge
#

The downside of going full NN is that it becomes much more difficult to fine tune and fine tuning can break it entirely if done wrong.

bold terrace
#

Also, all of this is real fun but to be honest we tend to forget how much FSRS is already god tier

#

I mean, since switching to DR=90% and splitting my deck into Low/High D

#

I have not had a single day with a difference of more than 1.5% from my DR

#

Sure, 1.3% would be better than 1.5%

#

But it's already completely god tier

#

Or how cleanly my average stability follows the trend

lapis hearth
bold terrace
#

sum(R*f(S)) same

cursive badge
#

It's kind of amazing that any of this works at all considering how little information we give the scheduling algorithms.

lapis hearth
#

There is so much information FSRS is missing out on (Time of Day, Sleeping Time, Answer Time, Contextual Content of the cards, Interference and Similarity of cards with other cards etc.. etc.)

bold terrace
unique salmon
#

Btw, his neural net uses answer time, deck ID, preset ID, sibling information and whatnot

bold terrace
#

Next graph I'd like to do is this one but with percentile on x-axis instead of actual repetitions count

#

Would be great to see that the 90-100th percentile represent ~20% of your workload

lapis hearth
lapis hearth
unique salmon
lapis hearth
#

Full-blown Neural Network Mode

lapis hearth
unique salmon
#

And instead of optimizing it for each user and testing on the same user, it's just pre-trained on 5k users and tested on the other 5k

lapis hearth
#

So it could be even better than expected

unique salmon
#

So the optimization procedure is very different

unique salmon
lapis hearth
#

Should it not be trained and optimized on the same users

unique salmon
#

I'm saying that unlike FSRS, where you optimize parameters for each user, this one is trained on a massive dataset and then parameters are kept fixed
So there would be no "Optimize" if it was used in Anki

lapis hearth
#

Hmm

#

Dae wouldn't like this

unique salmon
#

I think FSRS-6 will be good enough that there won't be much of a reason to use a neural net

lapis hearth
#

When is it coming out presumably. I cannot wait

unique salmon
#

Oh, and Alex's net uses fractional interval lengths too

cursive badge
#

If it can inherently learn enough that it doesn't need fine-tuning it would be amazing. A lot fewer support requests if there are fewer dials to twiddle 😅

lapis hearth
unique salmon
#

So it kind of has a short-term memory model somewhere within its matrices with tons of floating point numbers

lapis hearth
cosmic hedge
bold terrace
#

Speaking of which, I made some progress on the percentile x-axis 🙂

#

It's really nice because now each bar represents 5% of your cards

#

and you see the total load ratio it represents

#

in my case, my last 5% represent 10% of my load

#

But the 5% between 60 and 65%, represent only 4%

bold terrace
#

no it's the same but it's still in a feature branch

robust hill
#

new name dropped for him

unique salmon
#

(aside from Dae saying "FSRS is good enough", which is quite likely)

polar maple
#

the problem is probably syncing issues which i don't understand fully

unique salmon
cursive badge
polar maple
polar maple
unique salmon
#

Surely there are no benefits beyond 2-4 curves

polar maple
polar maple
#

because now we can maybe later on interpret it as a probability distribution over stabilities

unique salmon
#

Oh, that's interesting

cursive badge
polar maple
#

also for cpu performance i expect maybe around 200 rows of the revlog / second, which is enough in the amortized sense imo but there could be other problems that i'm not aware of

cursive badge
#

Hmmm. That's a tricky one. My first thought is you could save "snapshots" at each sync so you only have to reset to the oldest common point, but that doesn't solve it completely. You could always have a rogue device that hasn't been synced for a while that forces you to go really far back in time.

lapis hearth
#

Is it even possible for Anki to have a neural network? Would it work like FSRS and be easy to run on basic consumer-grade laptops?

unique salmon
#

An example of a user with decay=-0.028 that Jarrett shared
That negative slope 🤣
FSRS's predictions are anti-correlated with his retention - the lower the value of R that FSRS predicts, the higher the user's retention

cursive badge
unique salmon
#

Distillation time! Just train a 10x smaller net on the big "teacher" net's predictions

lapis hearth
#

But that is alien to the concept of Anki

#

And you would exclude a lot of people

#

People who live in poorer countries

cursive badge
#

You would be surprised what you can get running on phones though. You can even run LLMs that give vaguely sensible output on phones now.

clever cargo
#

someone's going to provide anki as a service in that case (i guess that's already ankiweb xD)

lapis hearth
#

Is it because of the millions of params that this NN is tough on a device

clever cargo
#

even qt5 being dropped

lapis hearth
polar maple
lapis hearth
#

@polar maple would you want to personally use a NN on your own Anki cards

lapis hearth
#

If it is safe enough, I am all for it

polar maple
#

not wrong but we'll have to try it to see

unique salmon
#

Anyway, can we all just collectively convince Jarrett to clamp decay to (-0.15, -0.7) or (-0.15, -0.8)?

polar maple
lapis hearth
#

He said he would add optimizable decay didnt he

unique salmon
unique salmon
#

At 70% DR

polar maple
lapis hearth
#

But only the weak would choose DR at 70%

unique salmon
polar maple
#

if i input the 100 most common english words to anki and i want a 99% DR then i also expect an infinite interval 🤣

clever cargo
#

there's no short-term memory model, and there's no long-term dementia-or-death model either apparently

polar maple
#

anki now predicts your death

lapis hearth
polar maple
#

so we should get a good memory model first

unique salmon
#

CIVIL WAR

polar maple
#

that's what separates FSRS and SM-2 in the first place

polar maple
#

FSRS predicts R, SM-2 doesn't

#

so FSRS claims superiority

unique salmon
#

Sure, but at some point making R more accurate by a fraction of a percent at the cost of user experience is just a terrible trade-off

cursive badge
# polar maple anki now predicts your death

It turns out the only way to get a perfect scheduler is to first invent an oracle algorithm that perfectly simulates the future. All the "this is the day you die" stuff is just a bonus. ;p

unique salmon
#

And decay between 0 and -0.1 is exactly such case

bold terrace
#

sum load sry

polar maple
unique salmon
#

How? Please no "We predict R using one value of decay but use a different value for scheduling"

#

Capping max. interval? Then all intervals will just be equal to the max. interval

#

Capping the relative increase between two consecutive intervals? Same issue, though probably better in practice because it's harder for the user to spot

#

Maybe some combination of capping both interval lengths AND the relative increase. But then that could lead to TR not being equal to DR

#

Decay close to 0 just introduces intervals that are way too insane

unique salmon
#

S=1 day
Decay=-0.01, the first interval at DR=80% is around 130 000 days, the first interval at DR=70% is around 10^11 days
Decay=-0.025, the first interval at DR=80% is around 120 days, the first interval at DR=70% is around 25 000 days
Decay=-0.1, the first interval at DR=80% is around 4.5 days, the first interval at DR=70% is around 18 days
Decay=-0.15, the first interval at DR=80% is around 3.4 days, the first interval at DR=70% is around 9.5 days
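
The figures above can be reproduced with a few lines. The sketch assumes the FSRS power forgetting curve R(t) = (1 + factor * t / S)^decay with factor = 0.9^(1/decay) - 1, solved for the t at which R equals the desired retention; the function name is illustrative.

```python
def interval_days(stability: float, desired_retention: float, decay: float) -> float:
    """Days until predicted R falls to desired_retention, for a given decay."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return stability / factor * (desired_retention ** (1.0 / decay) - 1.0)


if __name__ == "__main__":
    stability = 1.0
    print(f"{'decay':>7} {'DR=90%':>10} {'DR=80%':>14} {'DR=70%':>14}")
    for decay in (-0.01, -0.025, -0.1, -0.15):
        row = [interval_days(stability, dr, decay) for dr in (0.9, 0.8, 0.7)]
        print(f"{decay:>7} {row[0]:>10.2f} {row[1]:>14.4g} {row[2]:>14.4g}")
```

At DR=90% the interval equals S for every decay, so the blow-up only shows below 90%. Above 90% the effect reverses: the same function gives roughly 0.45 days for S=100, DR=95%, decay=-0.01.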

polar maple
#

remember that predicted R does affect S updates so it is in our best interest to have it be as accurate as possible

hasty fractal
#

truth doesn't matter if people don't use anki and it's not like there's a global state that'll help us

polar maple
#

i just want the memory model and the scheduler to have separate responsibilities

#

don't lie in the memory model to get good scheduling, keep them separate

lapis hearth
unique salmon
# unique salmon > S=1 day > Decay=-0.01, the first interval at DR=80% is around 130 000 days, th...

Let's just vote based on this
@bold terrace @hasty fractal @polar maple @lapis hearth @cursive badge @cosmic hedge
I want to choose the limit of the "decay" parameter in the upcoming FSRS-6. The closer it is to 0, the longer the intervals at DR<90%. I want you guys to vote on what the limit should be based on these examples

hasty fractal
bold terrace
polar maple
polar maple
#

how about 0.01 internally and 0.1 externally?

unique salmon
#

If you use a different value of decay, the other parameters will be sub-optimal

lapis hearth
unique salmon
#

You messed up the description of Decay =-0.0025 (and it's -0.025 btw), but oh well

lapis hearth
polar maple
unique salmon
#

Nothing like reviewing every card every day, lel

polar maple
cursive badge
#

I abstain because I've not been following the conversation closely enough to make an informed choice.
Also, isn't it suspected that at a certain stability memories enter another domain where they are effectively permanent? Hence that other algorithm that begins with S that Jarrett worked on.

polar maple
cursive badge
#

Some things might just be so easy to remember that they one-shot into permanent status and the tiny decays are just silly ways to try to model that.

lapis hearth
polar maple
#

@unique salmon btw how did you implement the regularization for decay?

unique salmon
#

I have an extremely dumb and janky idea:

  1. Optimize parameters
  2. If decay >-0.1 (aka <0.1 in absolute terms), optimize them again with decay=-0.1
  3. Keep both sets of parameters
  4. Use the first set with very small decay to schedule intervals
  5. Use the second set as a "sanity check": the intervals given by the first set of parameters AT ANY DR should not be shorter than with the second set at DR=99% AND they also should not be longer than with the second set at 70%

So when Anki calculates the interval length for a card, it checks this:
interval(params_2, S, 99%) <= interval(params_1, S, users_DR) <= interval(params_2, S, 70%)
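
The check above could be sketched roughly like this. It is a simplification under loud assumptions: each parameter set is reduced to its decay alone, both sets are fed the same S (in practice each set would produce its own stability), and the forgetting curve is the FSRS power curve with R = 90% at t = S. The function names are hypothetical.

```python
def interval_days(stability: float, desired_retention: float, decay: float) -> float:
    """Days until predicted R falls to desired_retention, for a given decay."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return stability / factor * (desired_retention ** (1.0 / decay) - 1.0)


def sanity_checked_interval(
    stability: float,
    users_dr: float,
    fitted_decay: float,           # decay from the first (unclamped) optimization
    fallback_decay: float = -0.1,  # decay forced in the second optimization
) -> float:
    """Clamp the fitted interval between the fallback's DR=99% and DR=70% intervals."""
    raw = interval_days(stability, users_dr, fitted_decay)
    lower = interval_days(stability, 0.99, fallback_decay)
    upper = interval_days(stability, 0.70, fallback_decay)
    return min(max(raw, lower), upper)


if __name__ == "__main__":
    for dr in (0.99, 0.90, 0.80, 0.70):
        print(dr, round(sanity_checked_interval(1.0, dr, fitted_decay=-0.01), 4))
```

For S=1 and a fitted decay of -0.01, the DR=70% interval is capped at about 18 days instead of around 10^11 days, while the DR=90% interval stays at exactly 1 day.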

polar maple
#

just add a scheduling layer to play nice to human values and output a smaller interval

lapis hearth
# polar maple not sure what you mean

What I mean is an interval could look very absurd to you when it is actually the truth. But then you want to end up choosing some decay value which makes intervals look better to the eye.

I am saying it should not come to this

polar maple
#

@unique salmon can you try with a much higher std?

unique salmon
polar maple
#

increase std to a very high value until you can see the decays are stuck at 0.2, then decrease it a bit to let it learn near the neighbourhood of [-0.1, etc]

unique salmon
cursive badge
polar maple
robust hill
#

I'm on both sides at once

polar maple
lapis hearth
cosmic hedge
# unique salmon > S=1 day > Decay=-0.01, the first interval at DR=80% is around 130 000 days, th...
GitHub

candidate for FSRS-6
Log Loss: 0.3273 -> 0.3257 (-0.0016)
RMSE(bins): 0.0518 -> 0.0510 (-1.5%)
Model: FSRS-5-dev
Total number of users: 9999
Total number of reviews: 349923850
Weighte...

cursive badge
unique salmon
#

I wonder what's the ratio of "time spent arguing about the limits of decay" versus "time spent coding FSRS-6"

cursive badge
#

At that point R is kind of a lie anyway because we are operating outside of the bounds where it is valid 🤷‍♂️

polar maple
cosmic hedge
cursive badge
cosmic hedge
unique salmon
#

Well, it wouldn't be zero for Jarrett, since me and him have been arguing

polar maple
unique salmon
#

I'm not even joking, first interval for S=1 at decay=-0.01 is like 10^11 days at DR=70%

cosmic hedge
#

if I'd remember that card in 273 million years then yes

unique salmon
#

I don't think Jarrett has uploaded a file with metrics for all 10k users with optimizable decay, has he?

#

Actually, no, I think we need something a little different - find all users with decay >-0.1 and re-run the optimization for them with decay clamped to -0.1, and see how much worse the metrics become

cursive badge
polar maple
unique salmon
bold terrace
#

Also, for DR between 90% and 100%, a decay of 0.01 will make you review almost every day

#

So all that risk of million-year intervals can easily be controlled by DR

#

If the guy has 10-million-year intervals at DR=70% but only a few days at DR=90%, I don't think it's a big issue

#

It's just that, yes, workload won't be that easy to map to DR anymore, but that's normal

unique salmon
unique salmon
#

It defeats the purpose of Anki

bold terrace
#

Maybe but that's why the fact it's controlled by DR is fine

#

To be honest, it depends a lot on the user's approach, but I think with a dynamic decay everyone can now be represented, so it's a big win

#

People who never want to review a card again if they have at least a 70% chance of recalling it forever? That's a win
People who want to review their cards endlessly with a DR of 99% and an aggressive decay? That's a win
People who want sensible intervals and control their workload with DR? That's a win

unique salmon
#

Alright, on Github I told Jarrett that if he wants to, he can re-run FSRS-6 with opt. decay clamped to (-0.1, -0.8) or (-0.15, -0.8) and check how much worse metrics become

bold terrace
#

SELECT cid as card_id, id as review_time, ease as review_rating, type as review_state, time as review_duration FROM revlog ?
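
For reference, a query like the one above could be run from Python roughly as follows. This is a sketch, not the optimizer's actual loader: it assumes the standard Anki `revlog` columns (`id`, `cid`, `usn`, `ease`, `ivl`, `lastIvl`, `factor`, `time`, `type`) and uses an in-memory table with one made-up row so that it runs without a real collection file.

```python
import sqlite3

QUERY = """
SELECT cid  AS card_id,
       id   AS review_time,
       ease AS review_rating,
       type AS review_state,
       time AS review_duration
FROM revlog
"""


def load_reviews(conn: sqlite3.Connection) -> list:
    """Return every revlog row as a dict keyed by the aliased column names."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(QUERY)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real collection file
    conn.execute(
        "CREATE TABLE revlog (id INTEGER PRIMARY KEY, cid INTEGER, usn INTEGER, "
        "ease INTEGER, ivl INTEGER, lastIvl INTEGER, factor INTEGER, "
        "time INTEGER, type INTEGER)"
    )
    # One made-up review: card 42, rated Good (3), took 5200 ms.
    conn.execute("INSERT INTO revlog VALUES (1700000000000, 42, -1, 3, 1, 0, 0, 5200, 0)")
    print(load_reviews(conn))
```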

unique salmon
#

Anki nerds arguing whether some parameter in the Poopen-Farten algorithm should be 0.1234 or 0.1235

bold terrace
#

40min later the optimizer runs 😂

#

0.2924

#

I guess that would be my new decay

#

For my normal D deck the result would be
"w": [0.1687, 1.1435, 3.1934, 20.4036, 7.2316, 0.5491, 2.0316, 0.0686, 1.3334, 0.1155, 0.8393, 1.8538, 0.1024, 0.3336, 2.3554, 0.1919, 3.0933, 0.7447, 0.3726, 0.079, 0.1328],

My current log loss is 0.3530 and RMSE is 3.23%; the optimizer now tells me:

Loss before training: 0.3686
Loss after training: 0.3654
Last rating = all
R-squared: 0.8470
MAE: 0.0077
ICI: 0.0064
E50: 0.0043
E90: 0.0154
EMax: 0.1520
RMSE(bins): 0.0257
AUC: 0.6197
#

RMSE from 3.23 to 0.0257 seems like a violent upgrade

#

Seems my decay for that deck is a .13 instead of the previous 0.20

#

Let's see on the hard now

#

"w": [0.0104, 0.0222, 0.0743, 0.0617, 7.766, 0.2282, 2.4887, 0.0302, 0.9422, 0.2648, 0.4128, 1.8164, 0.1254, 0.2906, 2.2589, 0.2292, 2.9629, 0.6093, 0.1445, 0.1923, 0.3794],

#

current logloss : 0.4395, RMSE:4.42%

Loss before training: 0.7136
Loss after training: 0.6013
Last rating = all
R-squared: 0.9718
MAE: 0.0177
ICI: 0.0129
E50: 0.0103
E90: 0.0233
EMax: 0.0565
RMSE(bins): 0.0470
AUC: 0.6616

Not much gain on that one, seems it's even worse 🤔

unique salmon
#

It's weird that there is such a big discrepancy in log-loss. Something's off

bold terrace
#

The 0.13 decay and 0.379 might make sense though: for the first deck, I highly doubt my retention would drop lower than 50-60% even if I didn't review the cards for multiple months

unique salmon
bold terrace
#

Yes sorry

#

but still violent

#

in the good sense

unique salmon
bold terrace
#

the left one is for my hard deck

#

right one for the normal/easy

unique salmon
#

Jesus that is dogshit calibration

#

On both

polar maple
bold terrace
#

Yes

polar maple
#

lol

bold terrace
#

I got ~.12 decay for my normal D deck, and ~.37 for my hard one

#

Thing is, I only review things at my DR

unique salmon
#

For your hard one there's almost no correlation between predicted retention and actual retention

bold terrace
#

Soooo I guess the "actual R" for everything outside 90% is dogshit 😄

polar maple
#

for the hard deck was it from taking lower D cards from the normal deck?

robust hill
#

its like

bold terrace
#

the spike at ~97% D

robust hill
#

every day the topic of discussion is changed so much

bold terrace
#

my Retention for the hard above

#

The one for the normal

#

I don't have to complain to be honest

#

But yeah those graphs are funky

polar maple
#

ok so i guess since D is closely related to the lapse ratio, it is already going to be flattened to be a certain R

bold terrace
#

Yep

#

That's why I also think some clustering could be interesting

#

there's really different profiles of card/review story inside the same deck

unique salmon
robust hill
#

yk what i vote for .001 decay

bold terrace
#

Last test, I'll run on both

#
Paste this into your scheduling code
{
    // Generated, Optimized anki deck settings
    "deckName": "revlog.yomitan.both",// PLEASE CHANGE THIS TO THE DECKS PROPER NAME
    "w": [0.0564, 0.3174, 2.3289, 17.0138, 6.991, 0.8772, 2.3117, 0.001, 1.1084, 0.1602, 0.6028, 1.7299, 0.122, 0.2495, 2.1961, 0.1854, 3.1603, 0.7786, 0.3116, 0.1502, 0.3762],
    "requestRetention": 0.7,
    "maximumInterval": 36500,
},

Loss before training: 0.5829
Loss after training: 0.5331
Last rating = all
R-squared: 0.9567
MAE: 0.0181
ICI: 0.0109
E50: 0.0087
E90: 0.0280
EMax: 0.0573
RMSE(bins): 0.0332
AUC: 0.6626

From logloss .4225 and RMSE .0344

#

Dogshit version 2

#

😄

unique salmon
#

Congrats, now there is negative correlation between FSRS predictions and real retention

polar maple
#

on fsrs-optimizer does it use a train/test split?

#

i don't see how the calibration can be so bad if it is trained and evaluated on the same data

bold terrace
#

To be fair I have almost no cards with predicted R under 80%

unique salmon
unique salmon
#

In Anki suspended cards are excluded by default, but in the google colab optimizer it's the opposite

#

That could explain the difference in log-loss

#

Btw, this is the hardest deck I have and it has reasonable calibration

#

Not great, but at least somewhat reasonable

bold terrace
#

The distribution of your predicted R also looks better

#

But for example I can't really have reviews with predicted R of 0.2-0.6

#

My DR was at 80-90 and I never skip any day so having a 40% is quite unlikely

unique salmon
bold terrace
#

Sure Sure

#

With my Filtered Deck I also think I'm able to really squeeze the predicted R close to the DR

#

which can also explain that distribution

unique salmon
#

I really want you to try not using filtered decks in whatever version of Anki will have FSRS-6 + fine-tuned LB
Fine-tuned LB is guaranteed to make it into the next release, idk about FSRS-6

bold terrace
#

Well my workflow is quite simple, I have one Filtered Deck for R<DR, and I keep checking the ratio it represents

Interestingly, moving from 85% to 90% DR made the number of items scheduled by the Filtered Deck lower than before

#

I also have had higher and higher stability these past weeks so I think it also plays a role

#

The future avg predicted R is also closer to DR, thus limiting the need for those Filtered Reviews

#

Still, this is without LB

#

So I guess it's the Fuzz that still pushes the due dates a bit further than it should

unique salmon
#

I increased the weight of interval lengths in the fuzz formula, making it more likely to schedule cards earlier

#

So in the next Anki release LB will be better

bold terrace
#

Does it also affect the fuzz or only the LB ?

unique salmon
#

...unless the simulations are very inaccurate

unique salmon
#

Just different names

#

LB is more appropriate

#

LB is just "fuzz that chooses the random interval in a less random way"

bold terrace
bold terrace
unique salmon
#

Ah, idk

#

I don't know how that command works

bold terrace
#

The Fuzz has apparently been in Anki for years now

unique salmon
#

Yes

#

LB is new, fancier fuzz

#

I didn't know you can disable LB but still have the old fuzz

#

You might be the only person on the planet using it 🤣

bold terrace
#

Probably 😛

bold terrace
#

Load by Lapse 20-quantile 🙂 Definitely a bit more gradual than Load by Reps 20-quantile

ashen light
bold terrace
#

I said so many times flagging a leech based on lapse was dumb

#

But now I realize it was my statement that was dumb

#

I checked another deck for another language, same tendency

quasi shadow
#

The preliminary result:

#
Model: FSRS-6-dev
Total number of users: 844
Total number of reviews: 27826685
Weighted average by reviews:
FSRS-6-dev LogLoss (mean±std): 0.3346±0.1594
FSRS-6-dev RMSE(bins) (mean±std): 0.0491±0.0330
FSRS-6-dev AUC (mean±std): 0.7109±0.0790

Weighted average by log(reviews):
FSRS-6-dev LogLoss (mean±std): 0.3557±0.1665
FSRS-6-dev RMSE(bins) (mean±std): 0.0652±0.0432
FSRS-6-dev AUC (mean±std): 0.7056±0.0874

Weighted average by users:
FSRS-6-dev LogLoss (mean±std): 0.3583±0.1680
FSRS-6-dev RMSE(bins) (mean±std): 0.0675±0.0444
FSRS-6-dev AUC (mean±std): 0.7048±0.0895

parameters: [0.20255, 1.1585, 2.8436, 15.9828, 6.96915, 0.562, 2.2429, 0.00835, 1.51745, 0.11915, 1.0329, 1.7994, 0.11795, 0.2945, 2.28385, 0.21265, 3.00505, 0.7968, 0.29115, 0.14205, 0.204]

Model: FSRS-6
Total number of users: 844
Total number of reviews: 27826685
Weighted average by reviews:
FSRS-6 LogLoss (mean±std): 0.3342±0.1593
FSRS-6 RMSE(bins) (mean±std): 0.0486±0.0327
FSRS-6 AUC (mean±std): 0.7103±0.0806

Weighted average by log(reviews):
FSRS-6 LogLoss (mean±std): 0.3552±0.1667
FSRS-6 RMSE(bins) (mean±std): 0.0646±0.0430
FSRS-6 AUC (mean±std): 0.7050±0.0885

Weighted average by users:
FSRS-6 LogLoss (mean±std): 0.3578±0.1682
FSRS-6 RMSE(bins) (mean±std): 0.0669±0.0441
FSRS-6 AUC (mean±std): 0.7042±0.0906

parameters: [0.19025, 1.1416, 2.84035, 16.0223, 6.96865, 0.56225, 2.24175, 0.00775, 1.52485, 0.11935, 1.0378, 1.79665, 0.11955, 0.2907, 2.27985, 0.2125, 3.00505, 0.81515, 0.28365, 0.13125, 0.2077]
#

The clipper I apply to FSRS-6-dev is w[20] = w[20].clamp(0.15, 0.8).
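
The clipping step can be sketched in plain Python. This is an illustration only: `clamp_decay` is a hypothetical helper name, `w` is assumed to be the 21-element parameter list with the decay at index 20, and the default bounds are the ones quoted in the message.

```python
def clamp_decay(w, lo=0.15, hi=0.8):
    """Return a copy of w with w[20] (the decay) clipped to [lo, hi]."""
    clipped = list(w)
    clipped[20] = min(max(clipped[20], lo), hi)
    return clipped
```

Other bounds under discussion, such as (0.01, 1.0), can be tried by passing `lo` and `hi` explicitly.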

bold terrace
#

IMO I find it too arbitrary to clamp just based on what we feel should be right or not

quasi shadow
#

It's ~1% worse than (0.01, 1.0)

bold terrace
#

If we really want to clamp, we could always use the 2nd/98th percentile of the training set

bold terrace
#

Feels random but quite good actually 😅

quasi shadow
#

2% percentile of decay values: 0.0347
98% percentile of decay values: 0.7270
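
A minimal sketch of deriving clamp bounds from percentiles, assuming a plain list of per-user optimized decay values is available. The `percentile_bounds` helper is hypothetical and nothing here reproduces the benchmark data; it only shows the mechanics using the standard-library `statistics.quantiles`.

```python
import statistics

def percentile_bounds(decays, low_pct=2, high_pct=98):
    """Return the (low_pct, high_pct) percentiles of the decay values."""
    # n=100 yields 99 cut points; cut point k sits at index k - 1.
    cuts = statistics.quantiles(decays, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]
```

With the values reported above, such a helper would suggest a clamp of roughly (0.03, 0.73).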

bold terrace
#

At least we can assume that 96% of people will fit in that clamp 🙂

#

And the 4% we can just ask them to reflect on how they use the algorithm haha

quasi shadow
#

If I use (0.1, 0.8) as the clipper:

Model: FSRS-6-dev
Total number of users: 876
Total number of reviews: 28673715
Weighted average by reviews:
FSRS-6-dev LogLoss (mean±std): 0.3339±0.1604
FSRS-6-dev RMSE(bins) (mean±std): 0.0486±0.0325
FSRS-6-dev AUC (mean±std): 0.7101±0.0785

Weighted average by log(reviews):
FSRS-6-dev LogLoss (mean±std): 0.3557±0.1667
FSRS-6-dev RMSE(bins) (mean±std): 0.0646±0.0426
FSRS-6-dev AUC (mean±std): 0.7053±0.0869

Weighted average by users:
FSRS-6-dev LogLoss (mean±std): 0.3582±0.1681
FSRS-6-dev RMSE(bins) (mean±std): 0.0669±0.0437
FSRS-6-dev AUC (mean±std): 0.7046±0.0891

parameters: [0.19415, 1.13795, 2.8374, 15.98545, 6.9694, 0.56155, 2.2378, 0.00775, 1.51735, 0.11995, 1.0336, 1.799, 0.1187, 0.29145, 2.28435, 0.2106, 3.0051, 0.81215, 0.28495, 0.1352, 0.2056]

Model: FSRS-6
Total number of users: 876
Total number of reviews: 28673715
Weighted average by reviews:
FSRS-6 LogLoss (mean±std): 0.3338±0.1605
FSRS-6 RMSE(bins) (mean±std): 0.0485±0.0325
FSRS-6 AUC (mean±std): 0.7091±0.0804

Weighted average by log(reviews):
FSRS-6 LogLoss (mean±std): 0.3556±0.1670
FSRS-6 RMSE(bins) (mean±std): 0.0645±0.0427
FSRS-6 AUC (mean±std): 0.7046±0.0881

Weighted average by users:
FSRS-6 LogLoss (mean±std): 0.3581±0.1684
FSRS-6 RMSE(bins) (mean±std): 0.0668±0.0438
FSRS-6 AUC (mean±std): 0.7039±0.0902

parameters: [0.1933, 1.1416, 2.84035, 16.0035, 6.9689, 0.5619, 2.2396, 0.0077, 1.5194, 0.1196, 1.03675, 1.79805, 0.1194, 0.2913, 2.28145, 0.2105, 3.0053, 0.8154, 0.2847, 0.1302, 0.2079]
#

It's only 0.2% worse.

#

OK, let's use it.

#

@polar maple @unique salmon The Civil War of Decay has its ending!

#

😂 This week, I have run a dozen benchmarks.

bold terrace
#

Computer goes brrr

#

Btw I played a bit with fsrs-optimizer yesterday. I ran it on my "normal D" deck (low lapse count), my "high D" deck (higher lapse count), and on both combined. The decays I got were 0.1328, 0.3794, and 0.3762 for both combined.

More and more these past weeks I get the feeling that behind a user, or even a deck, there might be multiple populations of cards/reviews.

Right now that is somewhat handled by D, but since even the decay can be very different depending on which population we're in, wouldn't it make sense to try to cluster the reviews and have different sets of parameters for different populations?

quasi shadow
#

So, the heterogeneity is still very high.

bold terrace
#

Sure but look at D and how for many people it is a proxy for "Lapse" (which can be inferred from the reviews alone)

Also, isn't it possible to run a first optimization, and based on Difficulty to then cluster it and run 2nd-layer optimization on each ?

#

(But I agree that then, implementing that in Anki would be difficult, having parameters not really based on deck but attached to cards, based on a population-id..)

unique salmon
#

@quasi shadow how many users (%) have decay between 0.1 and 0.8?

#

I assume something like 80%?

quasi shadow
#

Number of users with decay between 0.1 and 0.8: 8074
Percentage of users with decay between 0.1 and 0.8: 80.75%

bold terrace
#

But if [.1, .8] is only 0.2% worse ... I mean ... being in that 20% (1-80.75%) is probably fine

#

We could argue that for those 20%, the predictions will be worse than they could be, but I guess they'll already be way better than before

unique salmon
#

http://www.incompleteideas.net/IncIdeas/BitterLesson.html
TLDR: forget about carefully crafted rule-based models that utilize human knowledge, just use general-purpose models AND LOTS OF COMPUTE and get better results
Chess? Just use a lot of compute and a general-purpose model
Go? Just use a lot of compute and a general-purpose model
Image recognition and speech recognition? Just use a lot of compute and a general-purpose model

I like this article because right now we have a crystal clear example of it: a neural net outperforming the carefully crafted FSRS with its simple formulas based on our understanding of human memory. If all we want is maximum predictive accuracy, making a giant neural net and just taking advantage of more compute would be a better approach

quasi shadow
unique salmon
quasi shadow
#

so why not implement it in Anki?

unique salmon
#

Ask Alex

quasi shadow
#

@polar maple is there any problem to implement it in Anki?

cursive badge
# quasi shadow @polar maple is there any problem to implement it in Anki?

I don't know that it is the only problem, but from the discussion I had with them yesterday speed / sync is one problem.
It doesn't run fast enough to just give it the entire revlog each time you start Anki (~200 reviews/s) so you need to cache the NN internal state.
Caching the internal state causes sync problems when you want to merge non-linear revlogs.

robust hill
#

we must carb load jessie

#

to increase our retention

quasi shadow
#

derpy So FSRS will survive.

lapis hearth
unique salmon
lapis hearth
#

FSRS or neural nets

unique salmon
#

FSRS

slim hollow
#

the NNs currently used are super small (thousands of parameters); if you talk about big neural nets, they go into billions of parameters

unique salmon
#

The largest neural net in the srs-benchmark repo has 9k params, but Alex has another with 2.7 million params that he hasn't released yet. It blows all other algorithms out of the water, according to his preliminary tests

cursive badge
#

llama 4 behemoth apparently has 2 trillion parameters 😮

slim hollow
#

a 2.7m network should be able to run on cpu at ok speed depending on architecture, but anki can currently run on pretty much anything

bold terrace
#

Don't know. Sure, any kind of algorithm that is able to "learn by itself" is impressive, but I also noticed how difficult they can be to actually improve without becoming hugely inefficient in terms of energy, and how their black-box nature makes it difficult as a dev to build a good feedback loop with them (train them, test them, fix their weaknesses...)

#

But it's still a very useful tool in the toolkit

unique salmon
#

Same goes for image generation

#

And image recognition too

slim hollow
#

what NN are good at is noticing patterns that can be non linear which is hard to achieve if you want to model things like FSRS

bold terrace
unique salmon
#

That's a strange question. I was talking about generating realistic, human-like text

#

You are asking a completely different question

#

Oh, btw, this Veritasium video is also about the "bitter lesson", even if he doesn't say it that way
https://youtu.be/P_fHJIYENdI
Neural nets outperformed algorithms made by expert biologists

bold terrace
#

I mean I've worked in software development for the past 10 years, went the computer science route, did a few projects on AI, a project on computer vision, and while of course AI is a super super super great tool, I still regularly felt the 2 limitations I explained above

bold terrace
#

Which is not necessarily true

#

For problems fundamentally related to probability, with different layers of fact-checking the results, it can be immensely useful

bold terrace
#

But for problems requiring a very specific solution, it falls a bit short.

#

I even find it very funny how the most common example of job that could be replaced by AI would be software development, when in fact I think it's probably all the others jobs of this industry (project management, analyst, manager...) that could more easily be

unique salmon
#

We don't have an AI that can do all of that and more and replace all humans...yet 🙂

lapis hearth
unique salmon
lapis hearth
#

Yes

cursive badge
lapis hearth
#

Not necessarily Alex's

bold terrace
#

But hey, I already spent 3-4y in that industry with everyone explaining how blockchain would change absolutely everything in society

#

I guess the next 2-3y will be AI hype laughcry

cursive badge
#

Blockchain was just entirely dumb from the start though.

#

LLMs at least do something useful.

bold terrace
#

Well, it's still a tool in the toolkit, but the problems it could solve were indeed a bit too specific to really be a broad revolution

unique salmon
# bold terrace I guess the next 2-3y will be AI hype laughcry

Oh come on. I get that people love saying "X is a bubble" and "X is just a trend, it's gotta end", but this is AI we're talking about. The only way it won't be a revolutionary technology is if achieving general intelligence is - for whatever reason - so incredibly difficult that it will take 1000+ years and in the meantime we will just have ChatGPT-7-Pro or something

cursive badge
#

Blockchain has such vanishingly narrow use cases. Even its big initial example of "decentralised money" never really worked, it was far too unstable to be used as cash. It was so weird seeing so many people trying to shoehorn it into completely irrelevant things.

unique salmon
#

And for the record, I don't think that making AGI is going to take a 1000 years

#

Or even 100 for that matter

#

100 starting from today, I mean

bold terrace
#

Why people associate LLM and AGI though

#

We're talking AI-hype, LLM replacing humans

#

you talk about AGI

#

I mean, doing prolog was considered AI at some point

#

When I did AI, I did Alpha-Beta/Min-Max

#

The current AI stuff was called Datamining in my classes back then

#

AGI has never had anything to do with all those things

unique salmon
#

If you think modern LLMs will never be generally intelligent (fair enough, btw, I won't argue with that), do you envision the future like this?

  1. LLMs plateau around 2028-2030 when the current "just throw in more compute" paradigm runs out of gas as it's physically impossible to build bigger datacenters and produce more chips to train larger models + the entire Internet is used for training, so there is no more unused training data
  2. Instead of the paradigm shifting to something else, all of the progress just dies out and then there are no interesting news about AI for decades
#

Because I ABSOLUTELY do not think that number 2 will happen

bold terrace
#

I'm definitely more on option 1. AI/LLMs will continue to exist and will continue to solve very, very interesting problems in a way nothing else can right now, but unfortunately most startups based on them will die out, and big companies will find something else to promote to get investors excited

cursive badge
#

It might not be no interesting news, but there could be a gap before anything new that gets people widely excited comes out.

bold terrace
#

Porn Industry might be an exception though

#

I mean Zuck' spent I don't know how many millions on VR

cursive badge
#

I was really excited about AlphaGo. I don't think normal people were 😂

unique salmon
bold terrace
#

2 doesn't have to come after 1

#

I don't see why the progress would have to die

unique salmon
#

I thought you would say "1 and 2"

#

Like, I thought you would say "Yes, this is what I imagine"

unique salmon
#

I thought you would say "Yes, I imagine that the progress will halt for decades"

cursive badge
bold terrace
#

recently it's the whole "vibe coding" that everyone talks about

cursive badge
#

My biggest personal gripe is AI hype made all the GPUs cost silly money 😦

bold terrace
#

When I see that nvidia box at 3000$, even as a skeptic about AI and coding, I almost pull the trigger laughcry

#

My Github Copilot right now goes "Enable/Disable" every 10min, it's kinda maniac

#

So when my CEO says "AI will replace Devs" I'm like "I WISH IT WOULD"

#

I mean, making me saving time

#

But I guess he has better insights than poor me haha

cursive badge
bold terrace
#

lol ! I bought a 4070 Ti S one year ago

#

I'm happy now when I see the 5070 is basically a worse model

#

I also saw on reddit a lot of people are very very disappointed with the latest Llama scout

#

maybe you'll be able to buy a new GPU soon ;D

unique salmon
#

We can already make AI that talks in a human-like way and, by some metrics, even outperforms humans. For example, frontier LLMs definitely know more simple facts like "When was Shakespeare born?" than the average person, and are better at solving math problems than the average person. So can we get to general intelligence via more compute, more training data, and some incremental improvements to the Transformer architecture? Or do we need some special sauce?

In the straight-line-goes-brrrr world we just need to work on how we train AI, scale it up even more, and tweak the Transformer architecture. And then we get AGI.

In the secret-sauce world, ChatGPT-7.5.5-Pro-Ulta will be better at answering PhD-level questions than any human alive, yet will be unemployable as a software engineer, let alone as a movie director or a CEO. And things will remain that way for who knows how long.

So the crux is: how straightforward is the path from the current AI (which, again, is in some sense already superhuman, compared to an average Joe) to AGI that is actually undeniably superhuman at everything?

bold terrace
#

Replace "Bruno" by "AGI" laughcry

#

But #off-topic anyway I guess

quiet saddle
# quiet saddle I have a question about the parameters I use for FSRS: When I switched to FSRS I...

Just to follow up on that question I asked yesterday: since I have a specific tag for the cards I suspended because they became leeches, and I'm pretty sure they're just difficult cards and not badly designed cards, I had the option to add those specific cards to the FSRS optimization field. I did that and rescheduled all cards, which added ~1000 cards to my backlog.

Today I followed my usual procedure to reduce that backlog, reviewing by decreasing retrievability. It's a little too soon to draw conclusions, but I was surprised by the feeling that many of those cards were on the "edge of being forgotten". Also, the scheduler is less optimistic with new cards introduced today, which seems good since it's difficult to judge the difficulty of a card with only one review.

#

So, I'm even more convinced that the usual advice "suspend the leeches" and "don't use the suspended card for the optimizer" are good advice in isolation but don't go well together.

robust hill
#

interesting

#

i have never thought about this

#

what if i just make it so fsrs doesnt use leeches for the optimizer

bold terrace
#

Yeah very interesting, I also wondered about that

robust hill
#

well i already kinda split the leeches out in all my decks

#

probs shouldve left a control deck

bold terrace
#

To optimize WITH suspended, you change that to preset:"..." then ?

unique salmon
bold terrace
#

OKok

#

Something I also wonder a bit about: when a card was suspended but now you want to give it another try, ideally you'd like to reset it, but after resetting it I'm not entirely sure whether the past revlog will be used or not

#

the counter on the browse view says 0 reps, 0 lapses

#

but in the history you still see the reviews

#

so not sure how they get taken into account or not

robust hill
#

i am a bit confused

#

how do i get an extra 2,000 reviews

unique salmon
bold terrace
#

the suspended I guess no ?

bold terrace
robust hill
#

alright number adds up

#

you are right

#

very interesting scheme i have here

bold terrace
#

True that the default might be "Let's consider the suspended with those"

#

Sure you don't review them anymore, but they're still part of how well or not you review things

robust hill
#

very interesting

unique salmon
#

@quasi shadow once again: how does FSRS work with "Reset"? Does it only use the info after the card has been reset?
I promise I will make a card so that I don't ask again 🤣

robust hill
#

why due so far away

#

if so difficult

#

dr is 92%

bold terrace
#

Gimme a sec to think the shortest way to explain it laughcry

#

Basically, most of the time :

  • D : Lapse in disguise
  • D : goes up, D never goes down.
  • Splitting that deck into 2, one with "High D" and one with "Normal D" could benefit your parameters
#

For example, my "normal D" has very good logloss/rmse with even default FSRS parameters

robust hill
#

based on that screenshot how should i divide it

#

💀

#

i got like 3 sky highers and 4 wide ones

#

this is language learning

#

yk i probably need to divide them depending on the back to front

#

my brains gonna kaboom

bold terrace
#

What I did, is I did "prop:d" > 0.80 and played with it to see where I had relatively a good chunk in both (like half half), and that it was clear that NO cards with prop:d>0.80 had lapse under X (5, 6...)

#

At the end I did prop:d>0.9 in my case

#

but looking at you I think the 0.80 -> 1 might make more sense

#

Now in my "normal D" I have a lapse threshold of 6-7, and in my "high D" at 12-14. If I reach the first, I tag them and weekly I move them to the High D

#

in High D, at 12-14, it's auto-suspend
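
The two-tier rule described above can be sketched as a small decision function. This is purely illustrative: the `next_action` helper, the deck labels, and the exact cut-offs (6 and 12, picked from the quoted 6-7 and 12-14 ranges) are assumptions, and the tagging, moving, and suspending would still be done in Anki itself.

```python
NORMAL_D_LAPSE_LIMIT = 6   # assumed; the message gives a range of 6-7
HIGH_D_LAPSE_LIMIT = 12    # assumed; the message gives a range of 12-14

def next_action(deck, lapses):
    """Return what to do with a card given its deck ("normal" or "high") and lapse count."""
    if deck == "normal":
        # Cards over the first threshold get tagged, then moved weekly.
        return "tag_for_high_d" if lapses >= NORMAL_D_LAPSE_LIMIT else "keep"
    if deck == "high":
        # Cards over the second threshold are suspended.
        return "suspend" if lapses >= HIGH_D_LAPSE_LIMIT else "keep"
    raise ValueError(f"unknown deck: {deck!r}")
```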

quiet saddle
#

I'm not sure I understand what you're doing with your decks @robust hill and @bold terrace, are you making subdecks depending on the difficulty of the cards?

bold terrace
bold terrace
#

The previous one became my "normal difficulty", and the new one gets all the difficult ones

#

The RMSE and workload of the normal one was hugely improved

quiet saddle
#

but what happens for new cards then?

bold terrace
#

New Card still in the normal D

#

after 5-6 lapse they go in the difficult one

#

(as they would in the previous deck)

quiet saddle
#

so FSRS will be too optimistic for new cards, I don't see this as an improvement

bold terrace
#

Well, workload wise it's been a blessing

#

and my R is still at 90%

quasi shadow
bold terrace
robust hill
#

maybe it wont work for some decks

#

but i have a finished deck

#

i have a deck with 1156 cards, no longer new

bold terrace
#

In fact there aren't many options other than:

  • Be too optimistic about new card
  • Be too pessimistic about new card
  • Be a bit of both
unique salmon
# quasi shadow It only uses the reviews after the reset.

I need some more specific info
Imagine a card with a history like this:
L R R | L R R
where L - "Learn", R - "Review" and | means "This is where the reset happened"

  1. Does FSRS only use the second half for optimization?
  2. Does FSRS only use the second half for scheduling?
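
A minimal sketch of "only the reviews after the reset", using the same L / R / | notation as the example above. The `reviews_after_last_reset` helper and the list-of-strings representation are illustrative assumptions, not how Anki or FSRS actually store or filter the revlog.

```python
def reviews_after_last_reset(history, reset_marker="|"):
    """Return the entries that follow the last reset marker (all entries if there is none)."""
    if reset_marker not in history:
        return list(history)
    # Index of the last occurrence of the marker.
    last = len(history) - 1 - history[::-1].index(reset_marker)
    return history[last + 1:]

history = "L R R | L R R".split()
kept = reviews_after_last_reset(history)  # the second "L R R" half
```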
quiet saddle
bold terrace
#

The Low->High D is quite clear

quasi shadow
quasi shadow
#

Yes

quiet saddle
robust hill
#

which language are you learning

quiet saddle
#

Korean

robust hill
#

i see

#

not sure if my advice would work

#

but at the moment i am splitting my decks into 2 ways

bold terrace
robust hill
#

I am learning Greek, so
1 deck with options that encompass anything that the question is English, and I have to say it in Greek
another deck with options that encompass the reverse, so question is Greek and i have to say it in English

bold terrace
#

The main initial motivation in my case was the fact I realized the "average stability by repetition" was a purely decreasing function

#

The more I reviewed a card, the shorter its interval seemed to be

#

So having more and more workload didn't really result in better stability

quiet saddle
#

also I'm probably on the ADHD spectrum, so I need the scheduler to be a little bit pessimistic 😄

bold terrace
#

Just the same stability with higher workload

#

Conversely, the cards with long intervals all had at most 1-2 lapses

bold terrace
#

and very few reps, something like 10-20

#

So while mass-repetition feels like "you don't allow it to be forgotten", I was in fact hiding the fact that I wasn't really helping them building higher stability

robust hill
#

4 deck options i am making

bold terrace
#

Question being : Why didn't they ? A lot of different factors, but it's not reping them every day that will help

robust hill
#

English -> Greek
English -> Greek - leeches
Greek -> English
Greek -> English - Leeches

#

🔥

quiet saddle
# robust hill English -> Greek English -> Greek - leeches Greek -> English Greek -> English - ...

I have:

  • Vocabulary: simple words, audio+written word in Korean-> French definition / audio+hint if needed->writing + French definition
  • Sentences or collocations: various note types including cloze, dictation, French->Korean for basic greetings, ..
    This second deck has 3 levels of priority: Essential, Normal, Optional

Leeches stay in their decks, suspended until I see them again in the wild and/or decide to try to learn them again

bold terrace
#

French is your mother tongue ? It's mine

quiet saddle
bold terrace
#

Front back for me 🙂 The reading is not shown in the preview, but I have to type it in the front card and it's highlighting mistake in the back

quiet saddle
bold terrace
#

I removed every single thing from the front because otherwise my brain would memorize words by silly things like the sentence shown, the color of the hint, etc etc

unique salmon
lapis hearth
# lapis hearth
Poll: Vote you neeks
Winning answer: Decay -0.01 --> 130000 days at DR =80% (2 of 3 votes)

robust hill
#

this is not a good voting strategy

#

i voted for .001

unique salmon
#

Whatever, Jarrett already agreed to make it -0.1

#

So now nobody will ever get a first interval of a million years, yay!
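
The "130000 days at DR = 80%" figure from the poll can be checked with a short sketch. It assumes the FSRS-6 power forgetting curve R(t) = (1 + factor · t / S)^(−decay) with factor = 0.9^(−1/decay) − 1 (so that R = 90% when t = S), and a stability of 1 day. The `interval_days` helper is a hypothetical name, and `decay` is passed as a positive number.

```python
def interval_days(stability, desired_retention, decay):
    """Interval at which predicted retention falls to desired_retention."""
    # factor is chosen so that retention is exactly 90% when t == stability.
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return stability / factor * (desired_retention ** (-1.0 / decay) - 1.0)

tiny_decay = interval_days(1.0, 0.8, 0.01)   # on the order of 130,000 days
floor_decay = interval_days(1.0, 0.8, 0.1)   # a few days
```

Under the same assumptions, a decay of 0.1 gives about 4.5 days for a 1-day stability at DR = 80%, which is why the lower bound rules out the absurdly long first intervals.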

polar maple
#

but idk how cpu friendly the training would be for a small nn, would need to investigate

polar maple
unique salmon
cursive badge
#

At 200 reviews/s RWKV would take ~6.5 mins to process my collection if it had to discard its cache. That's kind of a blocker if sync invalidating the cache cannot be worked around.
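
The estimate above is simple throughput arithmetic. A minimal sketch, where `processing_minutes` is a hypothetical helper and the collection size of about 78,000 reviews is inferred from the quoted numbers (200 reviews/s for 6.5 minutes) rather than stated in the chat:

```python
def processing_minutes(n_reviews, reviews_per_second=200):
    """Minutes needed to replay n_reviews from scratch at the given throughput."""
    return n_reviews / reviews_per_second / 60

minutes = processing_minutes(78_000)  # inferred collection size, see note above
```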

polar maple
#

or as i mentioned before, training a robust version of the nn by mangling the revlogs that it is trained on

polar maple
cursive badge
#

👍
I meant in response to Expertium that a small loss in accuracy may be worth it for a large gain in convenience.

unique salmon
#

The more I think about it, the more I think it's actually very desirable

  1. We can make R more accurate
  2. We won't have to show parameters, which means one less thing for users to worry about
  3. We can support proper same-day scheduling instead of the current mess
  4. We can throw in new input features, like time of the day, workload, etc. Not just interval lengths and grades
  5. We can remove "Optimize", which means even less stuff for users to worry about
robust hill
#

NO

#

do NOT remove optimize

cursive badge
robust hill
#

oh

ashen light
#

I would still like the optimize button to be there, if only as a placebo

bold terrace
#

Yeaaaaah I mean

#

People are worrying because they feel they're losing control with FSRS

#

It's not the existence of parameters that stress, it's the fact you don't understand them

#

so giving them 9000 params ?

#

See how people freak out about hard misuse

#

let's now make them stress about NN thinking on saturday every retention is -10% because they were drunk 2 saturdays in a row laughcry

#

So while I would be super super excited to try that NN

#

I don't think they will be less stressed lol

unique salmon
#

Hard misuse would still be a problem btw

#

It can't be solved "inside" the algorithm

bold terrace
#

but but but but

#

If NN has no rule about "Hard" being a good answer

unique salmon
#

Though, I imagine that for people who misuse Hard a NN would still do better than FSRS

bold terrace
#

it could theoretically infer that for some users, Hard might result in a reduced stability ?

#

just like an Again

#

You could almost create new buttons and have your own rules about them ;D

#

"Don't remember" "Misspelled" "Confused"

unique salmon
bold terrace
#

ah yeah

unique salmon
#

If Hard=1 but the user uses it as if it was 0, welp...

#

Actually, now I'm really curious if a NN would be better than FSRS in that case
Then again, how do you determine what is "better" if you can't tell corrupted labels apart from good ones?

bold terrace
#

I mean

#

sometimes I watch for a few seconds too long a video of Sabrina Carpenter

#

And then Facebook decide I'm her biggest fan

#

and I should have every single ads about her

ashen light
#

I will continue to advocate for an "almost" button that is just the again button (maybe +5minutes on the relearn step so people feel like its different)

bold terrace
#

But then it realizes I'm in fact a bit more inclined to watch videos about videogames, so it adapts again

#

Personally I use hard/good/easy based on speed of good answer

#

quick quick good answer, 1-2s -> easy

#

3-4 -> good

#

7 hard

#

who cares about 5-6

unique salmon
#

Ok, after thinking about it some more, I have no idea whether we would even be able to tell whether FSRS or a NN is better for people who misuse Hard, if we can't un-corrupt the labels aka if we can't confidently say "Here Hard=1 and here Hard=0"

bold terrace
#

Well at least it's the goal

#

In practice it's Good-Good-Good-Oshit

#

IMO who cares about misuse

#

it will fix itself with time

unique salmon
ashen light
bold terrace
#

I think I might have misused hard for maybe ~500 reviews when I started Anki

#

I stopped worrying after the 50k one

bold terrace
ashen light
#

I have no idea who that is

bold terrace
#

Lucky guy

#

I didn't either, and now even Fortnite ads are about her

#

she's like the female equivalent of bieber

south lodge
unique salmon
#

If you mean that people will hear about it from other people and be like "Oh, wow, I didn't know this was a problem, thanks my dude, you saved my Anki life!", I'm afraid that this will be the minority, and most people who misuse Hard will keep misusing it

#

Not everyone browses r/Anki or watches YouTube videos about Anki or whatever

ashen light
#

people don't want to fail so they need an almost-fail button

bold terrace
#

The most biased poll of all your history

bold terrace
unique salmon
#

Let's ask people on r/Anki whether they browse r/Anki

ashen light
#

placebo buttons are important I think

bold terrace
#

TBH

#

most normal people use Anki for 2 days and then never use it anymore

ashen light
#

yeah see everyone has those buttons

bold terrace
#

I know 3 colleagues that tried Anki. Never more than for 2 days

ashen light
bold terrace
#

Main reason has nothing to do with scheduling, it's the clunky UI

#

They were amazed I had images, sound, I could type answer ...

ashen light
#

like I have a friend who married some Chinese girl and got Anki (of his own motivation) to learn Chinese and spent maybe like a week on it before giving up

bold terrace
#

I mean, this is the first image you get from Google Images

unique salmon
#

Except for that one reddit guy who quit Anki because of interval lengths

bold terrace
#

I see that screenshot

#

First thing I wonder is if I need a compatibility mode for Windows 95 to run it

#

or some kind of DOS emulator

unique salmon
#

Lol
The article with that image is from 2015

ashen light
bold terrace
ashen light
#

honestly, any anki reddit complaint should be treated as an outlier

unique salmon
#

Did Anki really look like this in 2015? Or is the article using an even older screenshot?

ashen light
#

yeah I think it looked like that back then

bold terrace
#

The best-looking search tool that ever existed

#

People on that page are sure not to download any of those decks, for fear of getting Russian malware

#

Look how user friendly it is to tweak your own cards

#

Even as a dev it took me 2-3 months before I dared to try doing something myself

#

"Manage Note Type" > "Create Field"

#

So when we're talking "the average user", we should not imagine an "average normal human being"

cursive badge
#

Dae is interested in making a sharable template system so it might be easier one day. He wants to do it after Svelte migration so who knows when it will actually be started on though.

unique salmon
#

Oh, yeah, and apparently we're not actually getting a two-button mode any time soon

south lodge
# unique salmon ?

I guess what I'm thinking is something like:
Wouldn't it be nice if:

  • Users had a slider to decide their own interval length when grading
  • A dataset of many such gradings for specific cards existed
  • The model only needed to exist for those specific cards
unique salmon
south lodge
#

That would be for the training input, the later implementation would be just a 'okay I saw the card' 'next'

unique salmon
#

I still don't see how this would be beneficial

#

And it sounds impractical as heck

#

Ideally, we want to make it so that users don't have to think about intervals at all

south lodge
#

Yes (the end user would have only one button, next)

cursive badge
#

People are probably terrible at guessing intervals.

south lodge
#

True

unique salmon
#

Training a model to predict what intervals the user wants is an interesting idea, but most people probably want constant intervals and/or very short intervals

#

I imagine if you took a bunch of people and asked them to do this for a year, most of them would end up with intervals that are either constant or grow very slowly

cursive badge
#

I did previously think it might be interesting to let people grade cards on more than one axis to give the scheduler more info, but it would be terribly impractical and bad UX.

unique salmon
#

Though, it would be interesting just for research purposes, to see what intervals people like

#

Maybe it could help to pick better default FSRS parameters and a better default value of desired retention

south lodge
cursive badge
#

Anki currently records the time from first showing the card to you giving it a grade. I think ideally we would also have time to first input (for typed answers) and time to answer/card flip.

lapis hearth
#

Hey guys

#

Don't know if this is the appropriate time to mention this

#

When is it time that the R=100 for learning cards be changed

#

Should this not be also changed with FSRS 6

quasi shadow
#

R=100?

#

Do you mean the R column in the card browser?

lapis hearth
#

R is 100% every time for learning cards no matter what

#

which is simply not accurate

#

I know that Anki did it in the earlier days as a simplifying method

#

There was no need for Anki to register it because there was no FSRS back then

#

But with the new decay thing, would it not be worth a shot to try and see if it is advantageous?

quasi shadow
#

The decay thing is still a long-term thing.

lapis hearth
quasi shadow
#

The short-term memory model is still inaccurate.

lapis hearth
#

Oh well....here is to waiting for however long...

lapis hearth
quasi shadow
#

I have caught a dozen bugs since last weekend.

unique salmon
lapis hearth
#

I am aware

#

But how long would it take to find a short-term memory model

#

Weeks, Fortnights, Months, Quartals, Tertials, Years..

#

Idk

unique salmon
#

Once I'm done with experimenting with D, I'll see if I can use a neural net for this

unique salmon
#

Actually, nvm. The idea with neural D didn't work. I was getting really shitty results and tried a bunch of things, but nothing worked. And on top of getting shit results, now I'm also getting errors sometimes

unique salmon
lapis hearth
#

Does this mean you will start experimenting with NN on short term memory

unique salmon
#

I could, but meh

#

It's probably not gonna work

bold terrace
unique salmon
# bold terrace What was the procedure ? You give the NN a shitton of revlog, you ask him to giv...

I tried two architectures:

  1. I give it the grade, last D (squished between 0 and 1) and R as input, and it predicts new D (also squished between 0 and 1)
  2. I give it the grade, last D (unlimited, from -inf to inf) and R as input, and it predicts the difference between new D (unlimited) and last D (unlimited), I add the difference to last D to obtain new D and then I squish it for it to be used in the S formulas

The first one was shit, the second one was shit AND was throwing errors sometimes for some reason
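For readers following along, the two ways of updating D described above can be sketched roughly as below. This is only an illustration: `model` is a hypothetical stand-in for the neural net, and using a logistic sigmoid as the "squish" is an assumption, since the message does not say which squashing function was used.

```rust
/// Logistic "squish": maps any real number into (0, 1).
/// (Assumption: the chat does not name the squashing function.)
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Variant 1: the model maps (grade, last squished D, R) straight to a new D.
/// `model` is a hypothetical stand-in for the network; its raw output is
/// squished so the result stays in (0, 1).
fn update_d_direct<F: Fn(f64, f64, f64) -> f64>(
    model: F,
    grade: f64,
    last_d: f64,
    r: f64,
) -> f64 {
    sigmoid(model(grade, last_d, r))
}

/// Variant 2: D is tracked as an unbounded number. The model predicts a
/// difference that is added to the last raw D, and only the squished value
/// is handed to the stability formulas.
/// Returns (new raw D, squished D).
fn update_d_residual<F: Fn(f64, f64, f64) -> f64>(
    model: F,
    grade: f64,
    last_d_raw: f64,
    r: f64,
) -> (f64, f64) {
    let new_d_raw = last_d_raw + model(grade, last_d_raw, r);
    (new_d_raw, sigmoid(new_d_raw))
}
```

In the second variant the raw value carries over between reviews, so the squish is applied only at the point where D feeds into the S formulas.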

bold terrace
#

I see ! Thanks for the explanation

polar maple
#

freeze fsrs params, make only the D nn learn, overfit on a small amount of data

#

alternatively you should check that with the nn, you achieve a lower training loss than what FSRS-5 achieves

unique salmon
polar maple
#

at least print out the training loss in the Trainer class and make sure that the nn is training for long enough so that the training loss from the nn is lower than for FSRS-5

unique salmon
#

@quasi shadow https://github.com/open-spaced-repetition/srs-benchmark/blob/main/plots/w[11].png
I'm worried about w11, it seems like it barely changes. According to this graph, all values are close to the default value, which is close to 2. You could say "Well, maybe that's just a really good default value and this distribution is just very narrow", but I'm not so sure.
For example, I'm testing a new D function, and I changed the default value of w11 to 10, and I get this (based on 132 users so far):
5th percentile of w[11]=9.790
95th percentile of w[11]=10.020
This means that this parameter barely changes

Maybe it's just that my implementation is flawed, but I suggest you do some tests:

  1. Try different default values of w11, like 3-5 values, with any version of FSRS you want
  2. See how it affects the final distribution of w11 across all users
    If the distribution of values of w11 ALWAYS ends up being very narrow and centered around the default value, even if you change the default value by at least a factor of 2, then we have a problem
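The check proposed above boils down to computing the 5th and 95th percentiles of a parameter across users for each default value and seeing whether the band stays glued to that default. A minimal sketch, assuming linear interpolation between ranks (the benchmark may use a different percentile definition) and with `relative_spread` being a hypothetical helper name:

```rust
/// Percentile with linear interpolation between closest ranks (`p` in 0..=100).
/// Returns None for empty input.
fn percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted array.
    let rank = (p / 100.0).clamp(0.0, 1.0) * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Width of the 5th-95th percentile band relative to the initial value.
/// A small ratio for every tested initial value would suggest that the
/// optimizer barely moves the parameter.
fn relative_spread(values: &[f64], init: f64) -> Option<f64> {
    let p5 = percentile(values, 5.0)?;
    let p95 = percentile(values, 95.0)?;
    Some((p95 - p5) / init.abs())
}
```

For the numbers quoted above (9.790 and 10.020 around a default of 10), the band is about 2.3% of the default.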

quasi shadow
#

It is the diff between without and with L2 regularization.

unique salmon
# quasi shadow

Please try different default values, for example, 1, 2 and 4, and plot the resulting distributions

quasi shadow
#

I'm running the benchmark.

unique salmon
#

Well, we already have approximately 2, so try 1, 3 and 4, or 1, 3 and 5

quasi shadow
#

So my device is not available now.

#

Could you test it?

unique salmon
unique salmon
# quasi shadow You can modify the init_w in other.py.

That's how I got this

For example, I'm testing a new D function, and I changed the default value of w11 to 10, and I get this (based on 132 users so far):
5th percentile of w[11]=9.790
95th percentile of w[11]=10.020

quasi shadow
#

OK, maybe two or three days later

#

The distribution of w[14] is also very narrow, isn't it?

unique salmon
# quasi shadow

So I calculated the coefficient of variation, defined as std(x)/mean(x)
https://en.wikipedia.org/wiki/Coefficient_of_variation
I took the absolute value of the mean just because. I did it for FSRS-5 with an extra parameter for decay
Coef. of variation of w[0]=4.752
Coef. of variation of w[1]=2.254
Coef. of variation of w[2]=1.997
Coef. of variation of w[3]=1.066
Coef. of variation of w[4]=0.051
Coef. of variation of w[5]=0.441
Coef. of variation of w[6]=0.291
Coef. of variation of w[7]=1.333
Coef. of variation of w[8]=0.228
Coef. of variation of w[9]=0.952
Coef. of variation of w[10]=0.306
Coef. of variation of w[11]=0.100
Coef. of variation of w[12]=0.349
Coef. of variation of w[13]=0.420
Coef. of variation of w[14]=0.172
Coef. of variation of w[15]=0.805
Coef. of variation of w[16]=0.230
Coef. of variation of w[17]=0.445
Coef. of variation of w[18]=0.620
Coef. of variation of w[19]=0.387

w[4], w[11] and w[14] have the lowest coefficient of variation
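A small sketch of the statistic used above, std(x)/|mean(x)|. Whether the original numbers used the population or the sample standard deviation is not stated, so dividing by n here is an assumption.

```rust
/// Coefficient of variation: std(x) / |mean(x)|.
/// Uses the population standard deviation (dividing by n); that choice is an
/// assumption, the chat does not say which one was used.
/// Returns None for empty input or a zero mean.
fn coefficient_of_variation(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    Some(variance.sqrt() / mean.abs())
}
```

Taking the absolute value of the mean keeps the result non-negative for parameters whose values are mostly negative.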


#

This isn't inherently bad, but again, if their distributions end up being centered around the default value even when you change it by 2-4 times, that suggests that something is wrong with optimization

quasi shadow
#

@unique salmon I cannot find the source where you ask for it

#

Now the hist only shows values between 2%-ile and 98%-ile.

unique salmon
quasi shadow
#

😂 OK, I correct it.

#

Btw, it's an interesting post about improving performance.

cosmic hedge
quasi shadow
#

Do you know Gwern?

cosmic hedge
#

Nope who is he?

quasi shadow
#

He is the author of the best literature review of spaced repetition.

#

(16 years ago)

#

If you search spaced repetition in LessWrong, you will find his article, lol.

cosmic hedge
quasi shadow
unique salmon
#

Mostly dedicated to obscure philosophical problems and "AI WILL KILL US ALL, DOOM IS NIGH!"

cosmic hedge
quasi shadow
#

OK, it works🤣

#

But I don't know whether the decay is optimal.

#

...It is stuck...

#

The py version gets a better result.

#

If I provide the optimal parameters found by the py version, it will output it.

#

So... the py version is correct.

#

...I can't even set boundaries for the parameters.

#

I need the L-BFGS-B method, which is the default method in the Python version when the params have bounds.

#

@unique salmon man, the infrastructure of numerical optimization is not good enough in Rust.

unique salmon
quasi shadow
#

Maybe I can ask Claude to translate SciPy to Rust 😂

unique salmon
# quasi shadow

Idk, try particle swarm, apparently it's pretty easy to implement

quasi shadow
#

I give up

#

😅 I need to learn some math first.

unique salmon
#

Give me some data and I'll try to implement particle swarm

quasi shadow
#

Good news: I figured out how to add bounds to the Nelder-Mead method.

#
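impl CostFunction for OptimizationProblem {
    type Param = Vec<f64>;
    type Output = f64;

    fn cost(&self, param: &Self::Param) -> Result<Self::Output, Error> {
        // param[0] is the initial stability, param[1] the (positive) decay.
        let s = param[0];
        let decay = param[1];
        // Predicted recall per interval; the sign of decay is flipped for the curve.
        let y_pred = power_forgetting_curve(&self.delta_t, s, -decay);
        // Binary cross-entropy, weighted by the number of reviews per interval.
        let logloss = (-(self.recall.clone() * y_pred.clone().mapv_into(|v| v.ln())
            + (1.0 - self.recall.clone()) * (1.0 - y_pred).mapv_into(|v| v.ln()))
            * self.count.clone())
        .sum();
        // L1 penalty that pulls both parameters toward their defaults.
        let l1 = ((s - self.default_s0).abs() + (decay - self.default_decay).abs()) / 16.0;
        let mut total = logloss + l1;
        // Bounds as a penalty: out-of-range points get a 1000x larger cost.
        if decay < 0.1 || decay > 0.8 || s < S_MIN.into() || s > INIT_S_MAX.into() {
            total *= 1000.0;
        }
        Ok(total)
    }
}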
#

It's not elegant but it works.

#

Oh, wait.

#

we have four decays because there are four ratings.

unique salmon
#

Yep, just take a weighted average, weighted by the number of reviews for each first rating
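A minimal sketch of that weighted average, assuming each first rating contributes a `(decay, review_count)` pair. The function name and the tuple layout are illustrative, not taken from fsrs-rs.

```rust
/// Weighted mean of per-rating decay values, weighted by review counts.
/// Each entry is (decay, number_of_reviews) for one first rating.
/// Returns None if the total weight is zero.
fn weighted_average_decay(entries: &[(f64, u32)]) -> Option<f64> {
    let total: f64 = entries.iter().map(|&(_, n)| f64::from(n)).sum();
    if total == 0.0 {
        return None;
    }
    let weighted: f64 = entries.iter().map(|&(d, n)| d * f64::from(n)).sum();
    Some(weighted / total)
}
```

Ratings with no reviews get zero weight, so a missing first rating simply drops out of the average.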

unique salmon
quasi shadow
#
        let delta_t = Array1::from(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
        let recall = Array1::from(vec![
            0.86684181, 0.90758192, 0.73348482, 0.76776996, 0.68769064,
        ]);
        let count = Array1::from(vec![435.0, 97.0, 63.0, 38.0, 28.0]);
#

here you are
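To experiment with this data outside fsrs-rs, a scalar version of the objective might look like the sketch below. The curve follows the published FSRS form, where the factor is chosen so that R = 0.9 at t = S. Treating that as exactly what `power_forgetting_curve` does is an assumption, and the L1 term and bounds penalty from the cost function are left out.

```rust
// Data points posted above: interval in days, observed recall, review count.
const DELTA_T: [f64; 5] = [1.0, 2.0, 3.0, 4.0, 5.0];
const RECALL: [f64; 5] = [0.86684181, 0.90758192, 0.73348482, 0.76776996, 0.68769064];
const COUNT: [f64; 5] = [435.0, 97.0, 63.0, 38.0, 28.0];

/// Power forgetting curve, scalar form. `decay` is negative here, and the
/// factor makes R equal 0.9 when t == s (assumed to match fsrs-rs).
fn forgetting_curve(t: f64, s: f64, decay: f64) -> f64 {
    let factor = 0.9_f64.powf(1.0 / decay) - 1.0;
    (1.0 + factor * t / s).powf(decay)
}

/// Count-weighted binary cross-entropy between observed recall rates and
/// the curve's predictions.
fn weighted_log_loss(delta_t: &[f64], recall: &[f64], count: &[f64], s: f64, decay: f64) -> f64 {
    delta_t
        .iter()
        .zip(recall)
        .zip(count)
        .map(|((&t, &y), &n)| {
            let p = forgetting_curve(t, s, decay);
            -(y * p.ln() + (1.0 - y) * (1.0 - p).ln()) * n
        })
        .sum()
}
```

Calling `weighted_log_loss(&DELTA_T, &RECALL, &COUNT, s, -decay)` for a few candidate `(s, decay)` pairs gives a quick way to compare optimizers against each other on the same objective.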

#

wait

#

The four initial stabilities are optimal when the decay is optimal.

#

But if we use the weighted average, the stabilities are not optimal.

#

😅

#

For example:

[src/pre_training.rs:31:5] &stability_map = {
    3: 4.8061557,
    4: 13.298449,
    1: 0.6919513,
}
[src/pre_training.rs:32:5] &decay_map = {
    4: 0.1,
    1: 0.12588397,
    3: 0.1,
}
unique salmon
quasi shadow
#

The weighted average decay is 0.10524022.

#

OK

unique salmon
quasi shadow
#

What's the result?

unique salmon
#

It's slow as ass though

#

170 ms vs 5 ms

#

welp

#

It's kind of like an evolutionary algorithm, so no wonder

quasi shadow
#

---- pre_training::tests::test_search_parameters stdout ----
search_parameters took 253.875µs

#

😎 The speed of Rust.

unique salmon
#

It's extremely simple:

  1. Try reducing a parameter by step_size while holding all other parameters constant
  2. Try increasing a parameter by step_size while holding all other parameters constant
  3. If neither helps, reduce step size
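The three steps above can be sketched as follows. This is not the file shared in the thread: the halving factor, the `min_step` stopping rule, and the function name are assumptions made for illustration.

```rust
/// Minimal coordinate descent following the three steps above.
/// `cost` is the function to minimize and `params` the starting point.
/// Halving the step and stopping at `min_step` are illustrative choices.
fn coordinate_descent<F: Fn(&[f64]) -> f64>(
    cost: F,
    mut params: Vec<f64>,
    mut step: f64,
    min_step: f64,
) -> Vec<f64> {
    let mut best = cost(params.as_slice());
    while step > min_step {
        let mut improved = false;
        for i in 0..params.len() {
            let original = params[i];
            // Steps 1 and 2: move this one parameter down, then up,
            // while all other parameters stay fixed.
            for candidate in [original - step, original + step] {
                params[i] = candidate;
                let c = cost(params.as_slice());
                if c < best {
                    best = c;
                    improved = true;
                    break;
                }
                // No improvement: restore the old value.
                params[i] = original;
            }
        }
        // Step 3: nothing helped at this step size, so shrink it.
        if !improved {
            step *= 0.5;
        }
    }
    params
}
```

Bounds like the ones in the Nelder-Mead cost function could be handled by clamping each candidate before evaluating it, and running the search from several starting points is a cheap guard against local minima.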
#

@quasi shadow

quasi shadow
#

not more code (

#

I have implemented it via the lib.

unique salmon
#

That is easy to fix

#

Oh wait, no, hold up

#

Whatever, it's very fast anyway

unique salmon
cursive badge
#

Oh my. I just had a peek at L-BFGS-B to see what you were talking about.
Every time I look beneath the surface of ML stuff a part of me just immediately starts internally screaming.

unique salmon
#

Yep

#

Same

unique salmon
quasi shadow
#

You need.

#

I have run into some weird cases.

#

It's time to have a rest.

#

No more coding!

unique salmon
# quasi shadow You need.

God damn it
Still, I recommend coordinate descent (see my file above). It's so much faster than particle swarm that you can just run it from 10 different starting points (to worry less about local minima) and still beat particle swarm optimization in terms of speed

#

Actually, on second thought, if we do have to worry about local minima, particle swarm is probably better

unique salmon
#

Meanwhile, here are some graphs comparing metrics of different FSRS versions (the values for FSRS-6 may change)

#

(yes, FSRS v2 is better than v3 here)

#

While I don't have a really good estimate of the limits of accuracy on this dataset, I'd say:

  1. 0.27-0.26 is the limit of log-loss (aka no algorithm ever will achieve less than that on this dataset)
  2. 0.5%-1.0% is the limit of RMSE
  3. 0.83-0.84 is the limit of AUC
#

And if some algorithm gets close to these limits, it won't be FSRS 🤣
It will be a giant neural net

unique salmon
#

@quasi shadow NOW it's time for you to contact Duolingo and tell them "Your algorithm sucks, look at mine"
(well, maybe not now, but once FSRS-6 is done)

#

I got an idea for a very high effort and obscure meme
You know the scene with business cards from American Psycho? That, but with algorithm metrics 🤣