#FSRS Megathread


unique salmon
#

Yeah, I feel like we might as well merge them

#

They use the same underlying code anyway

#

CMRR is just less realistic for...no reason

#

Instead of using its own "spherical in vacuum" config, CMRR should copy the simulator config

cosmic hedge
cosmic hedge
# unique salmon Yeah

Ok I suggested that before i thought about where to put it 😂. Simulator modal is going to get even more crowded if we're not careful.

unique salmon
#

If it copies the configuration of the simulator, then nothing else is needed

cosmic hedge
unique salmon
#

Just move it where the simulator is

bold terrace
#

@cosmic hedge, do you think it would be possible to have a graph of Average Stability over Time? Right now we have the "Review Interval Time Machine", but it's not that practical for seeing whether the trend goes left or right, and the time machine doesn't seem to affect the default graph Anki has

#

For example here I'm at average stability 1.23

#

I think last week I was around 1.05 or something

#

Average and median would be even better 🥲 But I don't want to ask too much

unique salmon
#

I actually wanted estimated total knowledge over time, but seems like it won't be implemented natively

#

Only in an add-on

bold terrace
#

Ah yes indeed in the addon it's quite nice

#

Sometimes I feel the default/native graphs are a bit "Let's give you some graphs" without much thought about what kind of interpretation you can make

#

It's nice to have graphs that "answer questions"

#

If AVG stability over time is not possible (I don't think it's stored in the revlog, so it would have to be recomputed each time), there's always the option of doing AVG Interval over time, but that would not really be ideal, since a change in DR would shift the intervals even when stability stays the same

cursive badge
#

On the topic of graphs, I found something interesting/weird when I make the bins really small on my SxR heatmap

#

I've not taken any days off since starting this deck, so it's not just a block of days where I did no reviews

bold terrace
bold terrace
cursive badge
bold terrace
#

Indeed

cursive badge
#

Maybe it had to do with a param change, or I had a period of struggling with cards so I never got new cards with those stabilities for a bit. It's a bit mysterious.

bold terrace
#

I don't know if it's related, but I see something strange with Stability in Card Info:

<= 30d, I see the 30 days

1.03days, I see Month (days)

30 1.03 days, I see Month WITHOUT days aside

#

But it's probably just graphical

#

Like 1.03 is rounded to 30d, so the UI thinks there's no point showing (30d)

cosmic hedge
#

#1282005522513530952 message should be easy enough

bold terrace
#

And it doesn't really explain why your hole also goes up with higher stability for higher R

cosmic hedge
#

well "easy enough" as in i plan to XD

bold terrace
#

You would have to recompute it ?

#

For every review state ?

cosmic hedge
#

well yes but the memorised graph already does that

bold terrace
#

ah ok 🙂

#

Would be super nice, I'm really wondering how my stability evolves with time

hasty fractal
unique salmon
hasty fractal
#

That would be removed later ofc

#

let's just have it for one version

#

for smooth transition

polar maple
#

I made a new simple baseline for algorithms that can adapt on the spot immediately after each review

#

it's similar to an exponentially weighted average, except i use backpropagation on the log loss
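A minimal sketch of what such a baseline could look like, assuming a single global logit that gets one gradient step on the log loss after every review. The function name, learning rate, and starting logit are illustrative assumptions, not the benchmark's actual MOVING-AVG code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def online_log_loss_baseline(outcomes, lr=0.1, init_logit=2.0):
    """Predict each review right before it happens, then adapt on the spot.

    outcomes: iterable of 0 (forgot) / 1 (recalled), in review order.
    Returns the prediction that was made before each outcome was seen.
    """
    logit = init_logit  # hypothetical starting point, roughly p = 0.88
    predictions = []
    for y in outcomes:
        p = sigmoid(logit)
        predictions.append(p)
        # For log loss with a sigmoid output, d(loss)/d(logit) = p - y.
        logit -= lr * (p - y)
    return predictions


preds = online_log_loss_baseline([1, 1, 1, 0, 0, 0])
```

Each success nudges the next prediction up and each failure nudges it down, so recent reviews weigh more than old ones, much like an exponentially weighted average.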

#

i hope this shows why RWKV-P could achieve such strong results

unique salmon
polar maple
#

its just that the other algorithms in the benchmark are missing out on a ton of information

#

look at AVG, it optimizes on the same 5-way split as FSRS

#

FSRS is so much stronger than AVG bc it uses an actual memory model

#

now if MOVING-AVG already surpasses FSRS, what happens when we add a memory model on top of it? that's what RWKV-P represents

unique salmon
#

That still doesn't explain the difference in optimization and testing and whatnot

#

Like, I still don't get why the moving average is better

polar maple
polar maple
polar maple
unique salmon
#

Oh, you mean optimizing after every review

#

Just call it that 🤣

polar maple
#

exactly, i've said that

unique salmon
#

Just say "Moving average optimizes after every review"

polar maple
#

i said something 99% similar to that

#

lol

hasty fractal
bold terrace
#

I mean, then it does not really show that it's better than FSRS, just that FSRS should maybe be more aggressive with how recency is weighted

unique salmon
#

We could benchmark optimizing FSRS after every single review, but it would take an eternity

#

Unless we cut down the dataset by a factor of 100 or something

bold terrace
#

I mean I hit the optimize button every day and it didn't change for the past 60 days

#

If every optimization changes the params, that's a bit strange, no? Or would that mean much more aggressive recency weighting?

unique salmon
bold terrace
#

But in this new way of doing things, you would still select this one even if the new RMSE is higher?

unique salmon
#

@polar maple

bold terrace
unique salmon
polar maple
#

another thing is that the moving average does not try to predict the outcome of reviews ahead of time which would be important for scheduling purposes, it only tries to predict the outcome of a review immediately before it happens

#

FSRS mostly does do ahead of time predictions in the benchmark but due to how it was implemented, it can update its predictions 5 times

bold terrace
# polar maple a problem is just how FSRS is benchmarked. It only optimizes 5 times per each us...

But then is it even comparable? I mean, if FSRS could achieve better results in the simulation with more optimization being done, we're not really comparing the precision of prediction/scheduling under the same constraints?

Also, for an end user like me, even when triggering a re-optimize every day, my parameters stayed the same for the past 2 months. It's in those situations that it would be interesting to see whether the moving avg really performs better, no? Then you would really compare the performance of both algos on a comparable basis

#

Otherwise it would mean we're trying to create the best algorithm for the simulator constraints, and not really for people that will actually use it.

polar maple
polar maple
#

a possible way to optimize FSRS on the spot is to do something like a gradient step over the last 50 reviews

#

this way it's efficient enough to be done after every review
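A rough sketch of the "gradient step over the last 50 reviews" idea, under loud assumptions: it uses the FSRS-style power forgetting curve with the fixed -0.5 decay, and it adapts only one hypothetical parameter (a log multiplier on stability) rather than the full FSRS parameter set. The gradient is numeric to keep the example dependency-free.

```python
import math
from collections import deque

DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability(s, s) == 0.9


def retrievability(elapsed_days, stability):
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def window_log_loss(log_scale, window):
    """Mean log loss over (elapsed_days, stability, recalled) tuples."""
    scale = math.exp(log_scale)
    total = 0.0
    for elapsed_days, stability, recalled in window:
        p = retrievability(elapsed_days, stability * scale)
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
        total -= math.log(p) if recalled else math.log(1 - p)
    return total / len(window)


def gradient_step(log_scale, window, lr=0.5, eps=1e-4):
    """One central-difference gradient step on the windowed log loss."""
    grad = (window_log_loss(log_scale + eps, window)
            - window_log_loss(log_scale - eps, window)) / (2 * eps)
    return log_scale - lr * grad


recent = deque(maxlen=50)  # only the last 50 reviews are kept
log_scale = 0.0
for review in [(10.0, 10.0, 0), (5.0, 8.0, 0), (20.0, 15.0, 1)]:
    recent.append(review)
    log_scale = gradient_step(log_scale, recent)
```

Because the window is capped at 50 reviews, the cost of each update stays constant no matter how long the review history is.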

hasty fractal
#

but still, people who care about benchmarks and numbers are optimising regularly so it's only fair that we try to see how well that performs (too).

unique salmon
polar maple
sonic forge
#

Auto-optimization was discussed in the past. It is not an option for people who need to tweak the generated parameters after optimization. So if you want to implement it, there should be a toggle (Enable Auto-optimization).

#

Basically, with auto-optimization always enabled, a user can't preserve tweaked weights: every optimization produces weights that need to be tweaked again, so you end up in a cycle where tweaked weights never survive.

unique salmon
#

Well, Alex has an interesting idea for optimizing parameters, so maybe you won't have issues in the future

#

I mean, maybe you won't need manual tweaking

polar maple
unique salmon
polar maple
#

i'll explain what the idea was here since it was discussed in dms, the idea is to make an RWKV model predict FSRS params on the spot after every review for potentially better params and also for speed (RWKV can run efficiently on a cpu)

but this would still be strictly worse than just letting RWKV do card predictions directly; imo there is no world where RWKV would be introduced into Anki as a way to predict FSRS params, rather than just doing the scheduling itself

now the other part of the idea is to see how far FSRS formulas can be pushed to the limits but we already know the result of this in a certain sense, if you let FSRS look at the answers by optimizing on the test set, you still get a model that does slightly worse than LSTM

#

and this idea would take a while to implement so it's pretty much a waste of time

#

copying from a previous message and added RWKV :

I wanted to see how expressive FSRS's formulas are, so I decided to train FSRS on the test set that it would be evaluated on (same 5-way split), and I did the same thing for LSTM

Total users: 100
Number of reviews: 2097825
LSTM (cheat): LogLoss (mean±std): 0.3303±0.1598
RWKV (normal): LogLoss mean: 0.3429
LSTM (normal): LogLoss (mean±std): 0.3546±0.1668
FSRS (cheat): LogLoss (mean±std): 0.3550±0.1698
FSRS (normal): LogLoss (mean±std): 0.3743±0.1767

unique salmon
#

Oh well

#

We could still try my teacher-student idea, though

#

Or you could find a way to make sure that RWKV can do scheduling properly and doesn't do anything weird, like p(recall) increasing as time passes or the interval for Good being longer than the interval for Easy, that kind of stuff

#

And then we could implement it in Anki instead of FSRS

#

Jarrett wouldn't be happy though 🤣

polar maple
#

RWKV (non-P) predicts monotonically decreasing forgetting curves so the first part is automatically satisfied

#

the second part could be a layer on top by the scheduler

#

i'm still hoping to find some simple insights as to where FSRS goes wrong, maybe small changes to FSRS can largely close the gap

unique salmon
#

We would have to remove S-related stats, I assume? Since it has 3-4 different S values

polar maple
#

you can still compute an S as the x where p(x) = 0.9
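A small sketch of that computation, assuming only that the predicted curve is monotonically decreasing and starts above the target. The function name and the bisection approach are illustrative choices, not how RWKV or Anki actually do it.

```python
def stability_from_curve(p, target=0.9, tol=1e-9):
    """Find x such that p(x) == target for a monotonically decreasing p."""
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the curve has dropped below the target.
    while p(hi) > target:
        hi *= 2
        if hi > 1e12:
            raise ValueError("curve never drops below the target")
    # Plain bisection on the bracket [lo, hi].
    for _ in range(200):
        mid = (lo + hi) / 2
        if p(mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Example with an FSRS-style power curve whose stability is 30 days.
def curve(t):
    return (1 + 19 / 81 * t / 30) ** -0.5


s = stability_from_curve(curve)
```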

unique salmon
#

Would that be meaningful for RWKV, though?

polar maple
#

nope

unique salmon
#

welp

#

Then just remove S stats

unique salmon
#

To give some numbers, I would be very surprised if FSRS's RMSE can get below 4%

#

Maybe if you make some massive changes to D, idk

#

If we really, and I mean REALLY want every last bit of predictive accuracy, we're going to need a neural net, I'm certain

polar maple
#

i forgot to mention, RWKV as trained right now also does same-day review predictions

#

i only filter those out when finding the stats for the benchmark

unique salmon
#

I wonder what Jarrett's and other people's reactions would be like if we announced "Oh, yeah, remember FSRS? We're not going to be using that anymore, we're going full black-box neural net now"

polar maple
#

a nn is undesirable

#

lets try to improve FSRS first

unique salmon
polar maple
#

personally i wouldn't mind a nn, but people want to customize all sorts of things and it would be impossible to customize RWKV beyond setting a desired retention

unique salmon
#

(unironically)

polar maple
polar maple
#

actually one of the first experiments i did on srs-benchmark was to make GRU-P learn the decay exponent and as expected it gave much better results than a fixed -0.5

unique salmon
polar maple
#

yeah standard GRU, oops

#

ok nvm GRU-P doesn't use a forgetting curve so my whole point there is bogus, forget it

unique salmon
#

I'm not sure how you intend to use RWKV to improve FSRS. I mean, you can see that it's more accurate, and you can even look at specific cases where difference is especially large, but that doesn't tell you what a good formula should look like

polar maple
#

RMSE (bins) only measures the bias of each bin, it doesn't really care about the specifics of individual predictions

#

information is lost when taking averages
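To illustrate the point, here is a simplified sketch that bins reviews by predicted probability and compares the bin-level error with log loss. This is an assumption-laden stand-in: the real RMSE (bins) metric uses its own binning scheme, and the function names here are made up for the example.

```python
import math
from collections import defaultdict


def rmse_bins(preds, outcomes, n_bins=20):
    """Weighted RMSE between mean prediction and mean outcome per bin."""
    bins = defaultdict(lambda: [0.0, 0.0, 0])  # sum of p, sum of y, count
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        b = bins[idx]
        b[0] += p
        b[1] += y
        b[2] += 1
    total = sum(b[2] for b in bins.values())
    squared = sum(b[2] * (b[0] / b[2] - b[1] / b[2]) ** 2 for b in bins.values())
    return math.sqrt(squared / total)


def log_loss(preds, outcomes):
    return -sum(math.log(p) if y else math.log(1 - p)
                for p, y in zip(preds, outcomes)) / len(preds)


# Two coin-flip predictions: the bin is perfectly unbiased, yet neither
# individual prediction was informative.
flat_rmse = rmse_bins([0.5, 0.5], [1, 0])
flat_log_loss = log_loss([0.5, 0.5], [1, 0])
```

In this toy case the binned error is zero while the log loss is not, which is the sense in which averaging within bins hides what happens at the level of individual predictions.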

unique salmon
#

Still, it doesn't tell you what a new formula for FSRS looks like

#

How do you go from "For these reviews RMSE is particularly big" to "This is the new formula for FSRS"?

polar maple
#

i dont understand why you wonder that, this is just a normal part of data analysis, we just need to fit some curves

#

jarrett has been making plots of all sorts of things this whole time

#

i've made LSTM vs FSRS curve plots

#

make plot, make prediction, test it

cursive badge
#

For all we know, there may even be windows in which it is better to review. If you miss a window, it might be better to wait for another one than just do backlog as soon as possible.

unique salmon
unique salmon
#

And I will sooner eat chalk than believe in a non-monotonic forgetting curve

cursive badge
#

It could be all wiggly, not a nice slope.

unique salmon
#

That's a different question though, since now you're adding another variable - how tired you are

cursive badge
#

It's just really hard to capture the "true retrievability function"

unique salmon
#

Anyway, if we can make a NN that doesn't do anything weird with intervals, I'm on board with replacing FSRS, as long as no functionality is lost (minus D and S graphs, but whatever, that's not important)

#

I'm more worried about things like interval(Hard) < interval(Again) or the next interval being x100 times longer than the last one

#

Well, tbf, both are solvable

#

Just add some extra scheduling rules on top of the NN
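One possible shape for such rules, as a sketch only: clamp each raw interval so it cannot explode relative to the last interval, then force the ordering Again <= Hard <= Good <= Easy. The function name, the dict layout, and the 10x growth cap are all hypothetical choices for illustration.

```python
def constrain_intervals(raw, last_interval, max_growth=10.0):
    """Post-process raw model intervals (in days) into a sane ordering.

    raw: dict with keys "again", "hard", "good", "easy".
    """
    cap = max(1.0, last_interval * max_growth)  # limits runaway growth
    fixed = {}
    floor = 0.0
    for grade in ("again", "hard", "good", "easy"):
        interval = min(raw[grade], cap)
        interval = max(interval, floor)  # never shorter than a lower grade
        fixed[grade] = interval
        floor = interval
    return fixed


result = constrain_intervals(
    {"again": 5, "hard": 3, "good": 2000, "easy": 40}, last_interval=10
)
```

A layer like this leaves the model's probability predictions untouched and only changes what the scheduler is allowed to do with them.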

unique salmon
# polar maple a nn is undesirable

The more I think about it, the more I think it's actually very desirable

  1. We can make R more accurate
  2. We won't have to show parameters, which means one less thing for users to worry about
  3. We can support proper same-day scheduling instead of the current mess
  4. We can throw in new input variables, like time of the day, workload, etc. Not just interval lengths and grades
  5. We can remove "Optimize", which means even less stuff for users to worry about
#

That's a whopping bonanza of advantages

cursive badge
unique salmon
#

Everyone is already used to Again < Hard < Good < Easy, and it makes sense intuitively

cursive badge
#

Sometimes intuition is wrong 🤷‍♂️

unique salmon
#

Sorry if that sounds mean, but no, it just means that the way you use answer buttons makes no sense

#

And you need to change your rating habits

cursive badge
#

I'm not saying I'm doing any weird manual adjustments to my rating right now. I'm just saying in some circumstances it may actually be more effective in terms of learning/study time to give a bigger interval to something you forgot, than something you remembered but struggled with.

#

I could also be completely wrong. I'm just saying I would not completely discard it as a possibility.

unique salmon
#

Even if it's better in some fringe cases, I still doubt that the gains from it are worth making buttons unintuitive

unique salmon
cursive badge
#

I mean if we are being radical you could argue that an ideal scheduling algorithm might not statically schedule a card at the time you review it because its retrievability could be affected by cards you review later.

polar maple
# unique salmon Anyway <@142448513622605824> thoughts on this?
  1. we could hide it right now if you want
  2. short term scheduling is still difficult, i don't think desired retention is the right metric for short term scheduling
  3. auto optimize is not even allowed for FSRS even if you could easily do something like the mini gradient step idea
polar maple
#

RWKV uses the entire sequence as well but it cannot reschedule cards. i can work on such rescheduling later

#

but RWKV-P represents the limit of what's possible when you use most available information

unique salmon
polar maple
#

RWKV right now maintains a hidden state at the card level, the siblings level, the deck level, the preset level, and the global level. the way it updates these is kind of equivalent to if we had auto-optimize in FSRS and I think this was unwanted due to syncing issues

unique salmon
#

Ah

#

Dang

cosmic hedge
#

b-w matrix should be under the memorised graph now if you update SSE

#

hope i didn't screw it up somehow 🙏

cursive badge
cosmic hedge
cursive badge
cosmic hedge
cursive badge
#

I mean Anki git main + SSE v1.10.0

cosmic hedge
#

After you've run it

cursive badge
cosmic hedge
# cursive badge

In search stats extended, not the simulator. The past memorised.

cursive badge
cosmic hedge
cursive badge
#

@cosmic hedge the curse continues. The Y axis is really blurry for me on the B-W matrix. The Y axis is still fine for me on the SR Heatmap though.

#

Maybe it's something to do with the viewBox starting at x=-40 instead of 0?

cosmic hedge
#

If thats the problem I'd rather not reprogram it 🥲

#

Google didn't work, chatgpt didnt work. I'm stumped.

cursive badge
#

It miraculously unblurs for me.

cosmic hedge
cursive badge
cosmic hedge
#

nope no filter: blur()

cursive badge
#

The raw (opacity="0.5") SVG renders just fine in Firefox, so it is something specific to Anki. It's just really mysterious.

quasi shadow
quasi shadow
#

Does it consider all reviews of the collection when it predicts the P(recall) of the next review?

polar maple
quasi shadow
#

Is it possible to draw the forgetting curve for a given card with RWKV-P?

polar maple
# quasi shadow Is it possible to draw the forgetting curve for a given card with RWKV-P?

nope. i have a version of RWKV that does curve prediction but i haven't trained it to adapt its prediction over time; in theory I could've made RWKV-P predict a curve but it would still be equivalent to just predicting a P directly since it knows delta_t.

If I want to train a model that can update its curve predictions in a reasonable manner, I would likely have to sample a random point between the last review and the current review as a point to update the prediction. if we represent this time interval as [0, 1] then RWKV is directly at 0 and RWKV-P is at 1. So RWKV-P not predicting curves is not necessarily important, it just says what kind of performance is possible at the right endpoint of the interval

#

also another reasonable time to make RWKV predict a curve is at the beginning of the day, before the reviews start, to lessen the impact of a user's mental state affecting a day's reviews

quasi shadow
polar maple
#

but we can always go back to the LSTM nn for the simulation environment

#

LSTM works at a per-card level so it would work well

#

i could also train a RWKV model that locks in its internal parameters and then works as a per-card model afterwards

#

the only thing LSTM makes difficult right now is that it uses the duration of review as an input. I could just remove that feature and then we would have something that would work for the simulator right away

quasi shadow
#

Seems like the RWKV-P could capture the impact of interactions among cards.

#

Because it uses the entire sequence as input.

#

If we change the order of previous reviews of other cards before the next review, RWKV-P will give a different prediction, right?

polar maple
#

this was the goal of such a model, to use as much information as possible

#

the only information that I reject is the parent_id of the deck

#

rwkv uses note, deck, presets

quasi shadow
#

Could you draw the calibration graph?

#

I wonder the distribution of p of RWKV-P.

polar maple
polar maple
#

thanks

bold terrace
#

Personally I find the approach very nice, if it's doable on every user's computer ... I mean, each Anki card having its own "little bubble of stability/difficulty/retrievability" without considering the others always felt like a limitation of all current schedulers. Having different cards potentially being dynamically linked is very, very nice

grizzled cedar
#

Hi guys, I'm an FSRS noob. I have about ~1000 reviews and I've literally just activated FSRS and optimised it. I've set my learning steps to '10m 15m.'

So, with that context, my question is as follows:
I used to have 'easy decks.' These are collations of information I find easy to recall (to the point where I'm pressing easy for pretty much every card). I use Anki in preparation for my final exams in October so I make these decks just to ensure I remember all this content by the time that rolls around.
With the old SM-2, I simply just increased the graduating and easy intervals.

So, as you've probably guessed, my question is how can I do this on FSRS.

Side-note: my RMSE seems to be 5.62%. Is this an issue? The guide says a lower value is better, and showcases a 2.03%.

Thanks in advance

unique salmon
#

Don't worry about RMSE btw

#

So yeah, desired retention is the "lever" that you pull to steer FSRS

grizzled cedar
#

ahhh nevermind, i get it now

#

thanks for your help!

unique salmon
#

Desired retention is like "How many cards do you want to be able to recall when Anki shows them to you?"

grizzled cedar
#

yea lol it seems so obvious now

polar maple
#

but for random interactions like "on weekend X the user learnt some new cards that were particularly difficult", RWKV would struggle on this more

bold terrace
#

Still working on it, but plotting Average Stability over Time gives quite good insights ... For those past 90 days, I stopped adding words, and I increased my DR bit by bit, so my review count doesn't drop that much, but I still wanted to see if my stability was increasing or not.

#

It's also insightful because it shows that even if your R is not dropping much, and your Memorised curve is growing quickly, adding more and more words still tends to lower the Card Stability (and maybe not just because of new words, but also because of older cards being replaced in memory)

#

Ofc, If R is similar, and Work Load is bigger, then you can infer that stability is probably dropping, but it's a bit more direct like this

cursive badge
bold terrace
#

That's a nice idea indeed. Having each vertical bar in a lighter or darker green depending on whether the stability contribution comes from mature or young cards

#

I need to refine the graph a bit too. Right now I use the "buckets" of stability, so if a card has stability 1.2, it counts as a "1". It doesn't cause too many issues, because the default Anki plugin gives me an average of 1.27 months and mine gives 36.5 (so I guess 1.27 months = 30*1.27 = 38.1d)

#

Ideally I'd like also to add the median because it can be quite different, I have a median stability of 20d while my average is 38d

Very easy cards bend the avg ...
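A toy illustration of both effects, with made-up stability values: a few very stable cards pull the mean far above the median, and flooring each stability into its bucket drags the average down slightly. The function name and the data are assumptions for the example.

```python
import math
import statistics


def stability_summary(stabilities):
    """Exact mean, bucketed (floored) mean, and median of card stabilities."""
    return {
        "mean": statistics.fmean(stabilities),
        "bucketed_mean": statistics.fmean([math.floor(s) for s in stabilities]),
        "median": statistics.median(stabilities),
    }


# Hypothetical collection: mostly young cards plus one very easy card.
summary = stability_summary([1.2, 3.8, 7.5, 20.0, 157.5])
```

With data shaped like this, plotting the median next to the mean shows whether a rising average is broad progress or just a handful of easy cards.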

#

(Those are the default graph right now, which is nice but doesn't give you a sense of evolution)

cursive badge
#

@bold terrace BTW Luc merged my SR Heatmap and released an update to SSE yesterday if you hadn't noticed.

#

I advise Log(S) otherwise all the small S cards get clumped together.
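A sketch of log-scale binning, assuming stabilities in days and a hypothetical choice of two bins per factor of 10. With linear bins, everything under a few days would land in the first bin or two; on a log scale those cards are spread out.

```python
import math
from collections import Counter


def log_bin(stability_days, bins_per_decade=2):
    """Index of the log10-scale bin that a stability value falls into."""
    return math.floor(math.log10(stability_days) * bins_per_decade)


def log_histogram(stabilities, bins_per_decade=2):
    """Count cards per log-scale bin, skipping non-positive values."""
    return Counter(log_bin(s, bins_per_decade) for s in stabilities if s > 0)


histogram = log_histogram([0.5, 1, 3, 5, 10, 100])
```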

bold terrace
#

Allowed me to detect that I have a huge bunch of reviews waiting for me in May-June 🙂 Those ~80S 94-98% S

polar maple
#

@quasi shadow

#

And i also made these where i aggregate users 1 to 500

quasi shadow
quasi shadow
quasi shadow
polar maple
#

but unexpectedly it predicts a wide range of probabilities like the other algorithms

#

so its not cheating in that way

bold terrace
polar maple
#

@quasi shadow bad news for SSP-MMC, if you give fixed DR the ability to make the last interval shorter to minimize effort to reach the arbitrary "3 years stability is treated as infinite stability" rule, then the gap is closed

polar maple
#

but if SSP-MMC-FSRS cannot do better than a fixed DR then there's no point in adding a different memory model

quasi shadow
#

I guess SSP-MMC is still the best.

#

Btw, I've learned that the optimization goal of SSP-MMC is not equivalent to maximizing the retention at the end of the simulation.

polar maple
quasi shadow
#

Weird. Maybe the eps is not small enough to find the optimal policy?

polar maple
#

i think if for the cost you only included cards that reached the target then SSP-MMC would do better

#

otherwise, maybe SSP-MMC is keeping many cards at low R and these cards dont reach the target

#

did i write the right expression to only include the costs for cards that reached the right target?

```python
reached_target = card_table[col["stability"]] > s_max
memorized_cnt_per_day[today] = reached_target.sum()
cost_per_day[today] = card_table[col["cost"]][reached_target & (true_review | true_learn)].sum()
```
#

there has to be a mistake, otherwise the fixed IVL doesn't make sense

#

i see, i think i was just looking at the cost for the last review that brings the card to the target stability

quasi shadow
#

`cost_per_day[today] = card_table[col["cost"]][reached_target & (true_review | true_learn)].sum()`
Yeah, this line has some problems.

#

We need to add a new col to the card table to record the total cost per card.
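A plain-Python stand-in for that idea, not the simulator's actual array-based card table: each card accumulates the cost of every review it receives, and the cost of reaching the target is summed only over cards whose stability ended up above `s_max`. The `Card` class, `total_cost` field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Card:
    stability: float = 0.0
    total_cost: float = 0.0  # hypothetical extra column: cumulative review cost


def record_review(card, review_cost, new_stability):
    """Add this review's cost to the card's running total."""
    card.total_cost += review_cost
    card.stability = new_stability


def cost_of_cards_at_target(cards, s_max):
    """Total cost spent on cards whose stability exceeds the target."""
    return sum(c.total_cost for c in cards if c.stability > s_max)


a, b = Card(), Card()
record_review(a, 10.0, 50.0)
record_review(a, 8.0, 1200.0)   # card a passes a 3-year (1095 day) target
record_review(b, 12.0, 30.0)    # card b does not
total = cost_of_cards_at_target([a, b], s_max=1095)
```

Unlike masking the per-day cost column, a running total includes every earlier review of a card, not just the last one that pushed it over the target.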

polar maple
#

also "knowledge per minute" is slightly inaccurate since it's also multiplied by the number of days in the simulation

#

it makes no sense that you would learn 863 items per minute

ashen light
#

hey guys I just had a cursed idea: if one of the blockers for fsrs auto-optimize is people who have custom modifications to their params, can we just add a filter hook for fsrs param optimization so an addon can do the modification for them?

cursive badge
#

Clobbering hand crafted params can be simply prevented with an Auto/Manual toggle.

quasi shadow
#

We will have native "Grade Now".

hasty fractal
#

jarrett is the best thing that happened to us recently

quasi shadow
#

I have replied just now.

sonic forge
#

Auto-optimization will not make the scheduler any better, it is copium.

lapis hearth
#

I was previously for auto optimisation but now since the 0 problem at params w18 w19 has not yet been resolved, I am reluctant. I don't want it to go back to having zeros there. This is still an issue with just some makeshift bandaid put on it.

cosmic hedge
#

I'm guessing someone might have solved this for you in the meantime. but I'm pretty sure you might just be looking at your RMSE instead of your log loss (what FSRS optimizes for)

#

with the defaults your log loss goes up both times

#

I'm not very well acquainted with your problem sorry 🤷‍♂️

unique salmon
hasty fractal
#

opinions on "true retention" versus "retention rate"?

#

I think true retention is a horrible name

#

it's just some historical relic, don't see any reason for naming the stat as such

#
  • the name is more confusing in some other languages which don't have the habit of adding weird adjectives to nouns to make a cool terminology
unique salmon
unique salmon
quasi shadow
#

Yeah, I will test my idea this week.

quasi shadow
unique salmon
quasi shadow
#

And it’s impossible to show the interval if you grade a bunch of cards.

sonic forge
# quasi shadow It has the same effect as a normal review.

Can you explain how it interacts with the scheduler/load_balancer?
Here it creates a new state: https://github.com/ankitects/anki/blob/eb1ed140223aca5fec34a2b6b821a9a93a5bf30c/rslib/src/scheduler/reviews.rs#L150
(`let states = col.get_scheduling_states(card_id)?;`), which contains all the scheduler/load_balancer logic in the next_states methods. That new state then goes to `new_state: Some(new_state.into())`, with new_state selected by the grade rating and carrying the new interval.
Then this new state goes to revlog_partial:

```rust
let revlog_partial = updater.apply_study_state(current_state, answer.new_state)?;
self.add_partial_revlog(revlog_partial, usn, answer)?;
```

But where does this new state/interval become the new due for the card?
In the fn maybe_requeue_learning_card, the entry is created with card.due:

```rust
let entry = LearningQueueEntry {
    due: TimestampSecs(card.due as i64),
    id: card.id,
    mtime: card.mtime,
};
```

But when exactly does the new state/interval get to card.due?

quasi shadow
#

It’s hard to explain. The way I understand the code is by inserting a lot of prints everywhere, doing a review, and checking the log.

sonic forge
#

fn apply_<insert_state_here>_state, to be precise

unique salmon
#

Just debug by inserting print() everywhere 🤣

sonic forge
quasi shadow
#

I wonder how the stability of a given card predicted by moving average changes over reviews.

#

Does it follow an intuitive memory pattern?

polar maple
lapis hearth
#

After optimizing (notice the 0 at w18 and 19)

cosmic hedge
bold terrace
#

But maybe it has changed

#

There was some discussion about log loss being a better measurement

unique salmon
# bold terrace But maybe it has changed

It's a new thing. If the last two parameters result in a situation where, after all same-day reviews following a lapse, the next interval is longer than it was before the lapse, the last two params are set to zero.
For example: the card had an interval of 10 days -> you forgot the card and pressed Again -> you did a same-day review and pressed Good -> you did a same-day review and pressed Good -> the next interval is 15 days

#

However, setting to 0 is overkill. There should always exist small non-zero values that don't cause this issue. Hopefully, Jarrett will work on it
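A hedged sketch of the kind of check being described, assuming the FSRS-5 same-day update S' = S * exp(w_second_last * (G - 3 + w_last)) as I understand it. The function names and the example numbers are hypothetical, and the real optimizer's check may differ in its details.

```python
import math


def same_day_stability(stability, grade, w_second_last, w_last):
    """Assumed same-day update; grade is 1 (Again) to 4 (Easy)."""
    return stability * math.exp(w_second_last * (grade - 3 + w_last))


def relearning_overshoots(s_before_lapse, s_after_lapse, grades,
                          w_second_last, w_last):
    """True if same-day reviews push stability above its pre-lapse value."""
    s = s_after_lapse
    for grade in grades:
        s = same_day_stability(s, grade, w_second_last, w_last)
    return s > s_before_lapse


# 10 days before the lapse, 3 days right after it, then Good twice.
bad = relearning_overshoots(10.0, 3.0, [3, 3], 2.0, 0.5)
fine = relearning_overshoots(10.0, 3.0, [3, 3], 0.5, 0.6)
```

Under this formula, zeroing both parameters always passes the check because same-day reviews then leave stability unchanged, but so do many small non-zero pairs, which is the sense in which setting them to 0 is overkill.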

bold terrace
#

@cursive badge @cosmic hedge , I've tried something for the Card Stability over Time with Young/Mature contribution ...

Basically, in this example it's not that the stability of Young is 5.79 and Mature is 30.48; the two numbers show how much the young average and the mature average each contribute to the total average.

For example: let's say your Young AVG is 10 and your Mature AVG is 10. Your Total AVG would still be 10, but since they contribute at a ratio of 1/1, you would have YOUNG Contribution (Ratio) = 5, MATURE Contribution (Ratio) = 5, Stability Total = 10

#

It's a bit confusing at first, but it can be very insightful to see whether your stability is driven by young or mature cards

#

But at the same time, I'm not sure how much info it brings, since in general younger cards will have low stability anyway, so unless you have 2x more young than mature cards, young avg stability should not really matter much

#

Wait, nevermind ... since it's an AVG, the number of young cards won't change anything .......

#

So I think the Young/Mature split is just useless blobSweat

cosmic hedge
polar maple
#

RWKV curves tend to drop off quickly at the start. I believe this is from a failure to encode the card in memory properly. RWKV was trained to also predict same-day reviews, so these could also be from needing to anticipate failed re-learning steps, while on the other hand FSRS just assumes that some relearning steps have already happened

#

Regarding the asymptote behavior, we know from the aggregate calibration graphs that FSRS does underpredict for low R, so this suggests that more likely than not RWKV is correct here. RWKV has near-perfect calibration

quasi shadow
polar maple
# quasi shadow How many params does the RWKV forgetting curve use?

it uses a weighted sum of 4 power curves

```python
# method of the model class; assumes `import torch`
def forgetting_curve(self, w, s, d, label_elapsed_seconds):
    elapsed = torch.max(torch.tensor(1.0), label_elapsed_seconds)
    return 1e-5 + (1 - 2 * 1e-5) * torch.sum(w * (1 + elapsed / (1e-7 + s)) ** -d, dim=-1)
```

some of those numbers are for numerical stability

#
```
w: [0.01882943883538246, 0.10095643252134323, 0.4705328941345215, 0.4096812903881073], s: [2.923821449279785, 77058.578125, 32219614.0, 1296332928.0], d: [0.4262007474899292, 0.010785482823848724, 2.2498745918273926, 0.988442599773407]
w: [0.03417219966650009, 0.13965930044651031, 0.4500780999660492, 0.37609046697616577], s: [1.9724270105361938, 63864.07421875, 28334582.0, 950819136.0], d: [0.3487342596054077, 0.011508260853588581, 1.5801490545272827, 1.6873809099197388]
w: [0.004207565449178219, 0.24725909531116486, 0.34103667736053467, 0.40749669075012207], s: [5.856897830963135, 811459.875, 29277376.0, 7455286272.0], d: [0.36546364426612854, 0.020167982205748558, 3.5514843463897705, 0.28269582986831665]
w: [0.01884218119084835, 0.17774465680122375, 0.41338032484054565, 0.3900328278541565], s: [11.130600929260254, 97529.4140625, 25254084.0, 1843096832.0], d: [0.36693474650382996, 0.01851370558142662, 1.8968570232391357, 1.2613049745559692]
w: [0.015502666123211384, 0.2253720462322235, 0.25992316007614136, 0.4992022216320038], s: [7.5156378746032715, 88570.5625, 16159777.0, 4799276032.0], d: [0.3775652050971985, 0.017523538321256638, 2.102546215057373, 0.41862088441848755]
```
here is a sample of these values, each row corresponds to the first review in a user's revlog that has ~30 days stability. the stabilities are in seconds
#

so to me it seems that w[0] is the immediate dropoff in the curve and w[3] is the asymptote

#

w[1] and w[2] control the main shape of the curve
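
To make the roles of w, s and d a bit more concrete, here is a minimal pure-Python sketch of the same weighted sum, evaluated with the first sample row above (rounded). The name `mixture_retention` is made up for illustration; this is not the RWKV code itself, just the same formula without tensors.

```python
def mixture_retention(w, s, d, elapsed_seconds):
    """Weighted sum of power curves, mirroring forgetting_curve above."""
    t = max(1.0, elapsed_seconds)  # elapsed time is floored at 1 second
    total = sum(wi * (1 + t / (1e-7 + si)) ** -di for wi, si, di in zip(w, s, d))
    return 1e-5 + (1 - 2e-5) * total

# First sample row, rounded; stabilities are in seconds.
w = [0.0188, 0.1010, 0.4705, 0.4097]
s = [2.92, 77058.6, 32219614.0, 1296332928.0]
d = [0.4262, 0.0108, 2.2499, 0.9884]

DAY = 86400
for days in (0, 1, 30, 365):
    print(days, round(mixture_retention(w, s, d, days * DAY), 3))
```

Because s[0] is only a few seconds, the w[0] term is essentially gone after the first minutes, which is the immediate drop-off.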

quasi shadow
#

Btw, I added moving average into my metric comparison.

#

It doesn't perform well in random sampling data.

#

So, MOVING-AVG really learnt something from the data from real users...?

#

Otherwise, it cannot calibrate so well as that.

polar maple
polar maple
#

that, or the underlying scheduler (i assume SM-2) is able to consistently schedule similar cards together

#

and MOVING-AVG just becomes a calibration step

#

but i doubt sm-2 is any good so idk

#

https://pastebin.com/AAgBMbHK
i made this before, i think it is for user 58, for which RWKV/LSTM/FSRS do horribly at 0.65+ log loss but RWKV-P/MOVING-AVG do well at 0.33/0.42

ahead refers to RWKV, it has to predict the outcome of the review ahead of time, right after the previous review of this card. imm refers to RWKV-P which predicts the outcome of the review immediately before it happens

you can see how this user has long strings of 0.0s or 1.0s. At the end of the file you can see the imm column creep up and up. MOVING-AVG would also be able to exploit this behavior. So, the performance of RWKV-P and MOVING-AVG is fake in this sense, i don't really think this kind of knowledge is useful for a scheduler

#

actually it just depends on if the user is truthful or not. If the user is truthful and there are long strings of 0s or 1s then the scheduler should be able to adapt on the fly

#

otherwise if the user is not truthful then yeah it is fake performance

#

some users just want to pass all the remaining cards to get their day over with

lapis hearth
unique salmon
quasi shadow
bold terrace
# cosmic hedge I'm just impressed you managed to pull it off with my crap code-base 😅

Your code is not crappy at all 🙂 And for now it's me who is adding a lot of plain-flat-logic when it should be refactored a bit haha. But I like to keep it that way until I'm happy with the result. I was thinking, maybe the ratio should not be a ratio of the average but a simple ratio of young/mature dividing the avg stability, so you can see the "volume" of reviews potentially impacting how the stability evolves. I'll try that a bit later

#

I was even thinking earlier about how clean and even predictable the avg stability increase is. I'm wondering if that increase is driven by sheer repetition volume, which would mean that more repetitions, even though they might not be "optimal" (in a review/time optimal way), might be how you build stability more quickly

#

Which would then be another justification for why a retention higher than the theoretical optimal one (in terms of knowledge/review) can be a good thing

#

Because "Memorised" is just a view of "how many words you can get right at a certain point in time". But it does not take into consideration "for how long you will be able to keep them memorised", which is exactly what stability is

#

So an optimal scheduling, might more often than not, be not only related to how much you memorized, but how high you were able to build stability.

#

Now the question is, how much to sacrifice one for the other ? The R*S, R*log(S), ....

#

But avg stability and DR are not even sufficient to really determine this. Indeed, the number of new cards/day also impacts the rate of increase, and even decrease, of avg stability over time

#

For example, in my case, I only recovered my "old avg stability" of ~29d around 30d after I stopped adding new cards

#

Which can be explained partly by the volume of very-low stability cards that had to build over, but still, it's still a long time to recover a stability that was not that crazy in the first place

lapis hearth
unique salmon
# lapis hearth Isnt there a way to not just make it happen in the first place. Why should there...

I've explained this 3 times already

Basically, it detects whether using same-day reviews to adjust memory stability could result in a situation where your next interval after a lapse is longer than before. For example: 10 days -> you press "Again" -> you have an insane number of re-learning steps -> you do them, S increases -> next interval is 15 days
If the optimizer detects that your re-learning steps and parameters would result in that kind of problem, the optimizer will run for the second time, but the last two parameters will be "frozen", meaning that same-day reviews will have no impact on S

So if something like 10 days -> Again -> (your re-learning steps) -> 15 days can happen, the last two params will be set to 0

For example: the card had an interval of 10 days -> you forgot the card and pressed Again -> you did a same-day review and pressed Good -> you did a same-day review and pressed Good -> the next interval is 15 days
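
A rough sketch of that check, for illustration only. It assumes the FSRS-5 same-day update is S' = S * exp(w17 * (G - 3 + w18)) and treats the interval as equal to S (which holds at 90% desired retention). The function names and the example numbers are hypothetical, and the real optimizer logic may differ.

```python
import math

def same_day_update(s, grade, w17, w18):
    """Assumed FSRS-5 short-term update; grade is 1 (Again) .. 4 (Easy)."""
    return s * math.exp(w17 * (grade - 3 + w18))

def relearning_overshoots(s_before_lapse, s_after_lapse, relearning_grades, w17, w18):
    """True if same-day reviews push S above its pre-lapse value."""
    s = s_after_lapse
    for grade in relearning_grades:
        s = same_day_update(s, grade, w17, w18)
    return s > s_before_lapse

# Hypothetical numbers: S was 10 days, drops to 3 after "Again",
# then the user goes through many re-learning steps, pressing Good each time.
many_steps = [3] * 8
print(relearning_overshoots(10.0, 3.0, many_steps, w17=0.5, w18=0.4))
# With the last two parameters frozen at 0, same-day reviews leave S unchanged.
print(relearning_overshoots(10.0, 3.0, many_steps, w17=0.0, w18=0.0))
```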

hasty fractal
#

I call it the pass-fail-pass-fail trap

lapis hearth
#

I am saying that wasn't a problem beforehand

#

What made it into a problem. It was working just fine

#

It doesn't help like at all

unique salmon
lapis hearth
#

Now with 0 being practically set every time I optimize at w18, w19, FSRS 5 is basically switched off

cosmic hedge
bold terrace
#

But it's funny that your AVG stability is around 60 but Good are 30 and Easy 100

#

Would mean you would fail a lot of those after a few reviews

#

and strange that then FSRS optimizer doesn't learn from it and make the initial stability lower

#

Your DR is at how much ?

cosmic hedge
#

83% (my dr brother)

bold terrace
#

Strange strange that it's so high

#

Because if you really do succeed them 80% of the time

#

those stabilities would get even higher

#

so the average being at 60d feels super low

#

Can you test those sequences with your parameters (and desired retention) ?

#

1331333
1313333
3333133
4313333

cosmic hedge
#

i think it's to do with the fact that i re-did all my cards a while ago (hence all the re-introduced (I massively regret not just making new notes))

#

so some of the info i already know

bold terrace
#

yes that could explain

bold terrace
#

Yeaaaah got it

#

Basically the model learnt that if you know it already, well, you might as well not review it anymore

#

Personally I also sometimes create cards for what I already know, but I make sure when I review them I press Good for the ones "I just knew based on inference" and Easy for "the ones I know very well from before"

cosmic hedge
#

yeah i've only ever failed 26 cards i initially did good on

bold terrace
#

And I also have a very large stability for first-easy

cosmic hedge
#

which makes sense because if going in i knew it who cares

bold terrace
#

I know with time my parameters evolved to make it less optimistic

#

I think once you'll fail some of those it will adapt

#

but if you don't after such a long time, it's OK I guess

cosmic hedge
#

I'm going to assume when I start encountering the cards I don't know from the start I'm just going to hit "again"

#

so Idk if it will affect it too much

bold terrace
#

It's just strange I have a better stability for Good than you

#

But I don't have such big initial stabilities

#

If you want to test, I did the split Young/Mature, but be aware that it's looking at stability >= 21, not interval so the ratio will be different than the one presented by anki

#

I think it would be better if Young/Mature was computed on stability and not interval

#

21d is "nothing" with a DR to 70% compared to 90%
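
To put numbers on that, a small sketch assuming the FSRS-4.5/FSRS-5 power forgetting curve R(t) = (1 + (19/81) * t / S) ^ -0.5, where S is the time at which R falls to 90%. The function name is made up for illustration.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R(t=S) = 0.9

def interval_for(stability_days, desired_retention):
    """Days until predicted R falls to the desired retention."""
    return stability_days / FACTOR * (desired_retention ** (1 / DECAY) - 1)

for dr in (0.9, 0.8, 0.7):
    print(dr, round(interval_for(21, dr), 1))
```

Under that assumption, a card with S = 21 days gets a 21-day interval at 90% DR but a much longer one at 70%, so an interval-based young/mature cutoff means different things at different DRs.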

#

(also I did not implement it for median, so only avg with this build)

lapis hearth
#

What is this

#

Is this the new helper addon

bold terrace
#

No it's the search stats extended from the almighty @cosmic hedge

hasty fractal
# lapis hearth But in what way is it a problem exactly. FSRS 5 was working just fine and still ...

it was a problem for my deck. here's a look into the issue:

  • I have a new card. I learn it. ivl = 3d.
  • I fail it after 3d. I relearn the card. ivl = 5d.
  • I fail it after 5d. I relearn the card. ivl = 10d.
  • I fail it after 10d. I relearn it. ivl = 14d.
  • I fail it after 14d. I relearn it. I see next ivl is 21d. I open discord and start spamming Expertium's DMs with a long rant. Then there is an issue opened in the repo. The Jarrett solves it. Yay! Happy ending.
#

I unconsciously wrote "The Jarrett" 🤣

unique salmon
#

Your example has nothing to do with same-day reviews, whereas the "set last two parameters to 0" thingy is specifically to deal with same-day reviews

#

Unless by "relearn" you mean "it goes through a bunch of re-learning steps", in which case yeah

hasty fractal
#

please enlighten me

#

(bruh there's only one relearn)

unique salmon
robust hill
#

am i winning chat 93% dr

cursive badge
#

I assume those really low R are suspended cards? You might want to set a custom search at the top to deck:current -is:suspended for nicer bins.

bold terrace
hasty fractal
#

After going through the steps...

bold terrace
#

Because if that release was multiple steps, at least in your case you always go to higher stability which is already nice

hasty fractal
#

The issue is solved though in the current ver.

bold terrace
#

Ah ok !

bold terrace
#

By fixed you mean what happens now ? You fail and the interval is not reduced ?

robust hill
cosmic hedge
# robust hill

Ross made it search if you click the squares so that's a quick way to see if you want 😄

hasty fractal
hasty fractal
robust hill
#

for my language learning

#

so what does this mean

#

am i winning

bold terrace
#

But it's not a bug in a sense that indeed, many cards in my deck behave like this

#

In my case, forgetting is a "very bad" incident 😄

hasty fractal
#

brother I have no idea what are you talking about but more importantly, I think u have no idea what I'm talking about either

bold terrace
#

relearning not in a anki term, but in a "lot of reviews"

hasty fractal
hasty fractal
#

if the interval keeps increasing after every relearning session, no way I'll ever pass such a card.

quasi shadow
#

I made a presentation to a group of researchers at Cognitive Computational Neuroscience two days ago. Now I know a fun fact: they didn't do any research about "long-term memory" in the sense that Anki users would understand. In their term, the scope of "long-term memory" is several minutes to hours.🤣

#

😅 Their long-term memory research is my short-term memory research.

spring adder
hasty fractal
quasi shadow
hasty fractal
# quasi shadow My papers.

how did they respond? were they interested, or they were like "nah our research is good. LTM is a few hours"

quasi shadow
#

Their professor is interested. The graduate students feel alien with my papers’ topic.

hasty fractal
#

urgh, they gotta use anki

#

it's perhaps more interesting then

quasi shadow
#

Anki is not very popular in China.

hasty fractal
#

yea, not here either

#

people are kinda stuck with coaching/school and stupid traditional methods

#

guess it's the same with China too

bold terrace
#

I mean, Anki is not that ground breaking in the first place. Anki without the full suite of addon/integration is quite ... bland. I have a few colleagues using Anki, when I discussed with them about it, they were surprised how rich my cards were, because they just did the "Anki the normal way" and they get extremely bland cards, at a very high human cost

#

As I was saying in the yomitan discord, it really feels sometimes like Open Source software is tools made by devs for devs

#

Which you can embrace, and it gives us all the shiny things (like FSRS is doing with the simulator, parameter optimizer, etc ...)

#

Or you can try to streamline into "One way of using it" (duolingo-like, you boot up anki, you import a deck, you review, no integration whatsoever, no knowledge about what the scheduler is doing, etc)

#

For example, my wife is a math teacher and we already discussed Anki but it's quite clear it's not something that would really appeal to her students or even her

bold terrace
#

@unique salmon / @quasi shadow : Shouldn't difficulty be a bit more reactive to Good/Easy reviews ? It almost seems like the "Ease Hell" is even more pronounced with FSRS. But maybe it's perfectly normal if the prediction is better like this ?

#

Feels like you need 3 Easy to compensate one Again, and ... infinite number of Good 😄

#

Wild idea but please don't kill me too quickly : What if Difficulty were outside the realm of optimization, and more like an adjustment variable on a card-level basis ?

#

For example, if a card doesn't match the DR, the Difficulty would adjust to that, to bend the model for it ?

unique salmon
#

That's the weird part about difficulty - it works better like this for some incomprehensible reason. And making reversion to lower D more aggressive makes metrics worse

bold terrace
# unique salmon How exactly?

Let's try to build an example :

Your DR is 90%. You do 10 reviews, and you fail 3 of them. It means, your actual percentage of retention for that card is 70%. Which means, you have a delta of 20 over a margin of 30, so a 66% difficulty. (Harder than expected)
Your DR is 90%. You do 10 reviews, you fail 1. The delta is 0, the difficulty penalty is thus 0.
Your DR is 90%, you do 10, you fail 0. The delta is -10 over a margin of 10, so you have a -100% difficulty rating. (That word is easier than expected)

Then the question is, how to bend the stability based on that difficulty rating. I don't know 😄
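
The three cases above can be reproduced in a few lines, under one possible reading of the example: the delta is DR minus the observed success rate, divided by the observed failure rate when the card is harder than expected and by (1 - DR) when it is easier. That choice of "margin" is an interpretation, not something spelled out above, and the function name is hypothetical.

```python
def difficulty_rating(desired_retention, reviews, fails):
    """Signed score: > 0 means harder than expected, < 0 means easier."""
    observed = (reviews - fails) / reviews
    delta = desired_retention - observed
    if delta > 0:
        margin = 1 - observed            # observed failure rate
    else:
        margin = 1 - desired_retention   # failure rate the DR allows
    return delta / margin

for fails in (3, 1, 0):
    print(fails, round(difficulty_rating(0.9, 10, fails), 2))
```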

#

Thus, FSRS optimizes your "Average" Forgetting Curve, and Difficulty plays the role of the "Case-by-Case adjustment variable"

#

(As intended)

#

The benefit is, instead of moving D at every review, it would be adjusted based on the full history of that card. If over 50 reviews you have a 60% success rate on that card instead of your expected DR, something is smelly, right ?

#

Of course, optimization like moving average has to be considered

unique salmon
#

@quasi shadow I don't think this is compatible with how FSRS works, but maybe you have something to say anyway

slim hollow
#

max D means you are answering thoughtfully, min D means you are cheating and you should be pressing easy; pretty much D in current form is not a parameter that humans would interpret as difficulty

unique salmon
#

Maybe I should go back to experimenting with adding R to the D formula, though last time I did that the metrics didn't budge even a bit

bold terrace
#

I mean, this is my data : https://docs.google.com/spreadsheets/d/1Eysl4bocAg9KD3YpVCjR28ACkvpMa3D8eMP2x6fWieE/edit?usp=sharing
For each card, I just counted the amount of Success, the amount of Error. I didn't do anything to filter out the good after bad the same day, so of course it wont match exactly "True" Retention.
But still, we see the distribution of success rate looks like a normal distribution.
So you would expect more or less to have a difficulty to follow something like that, with card a bit more problematic, and some a bit less

#

Now the standard deviation is 7.8% ... So indeed, maybe even with no Difficulty handling, you get something good enough, and since the RMSE/logloss seems already quite good (logloss 0.49 and RMSE 3.360), I guess "it won't change much"

#

But, I think right now nothing is really changing much, so maybe handling those outliers could help

bold terrace
#

I read your blog here @unique salmon https://expertium.github.io/Algorithm.html

If I understand correctly from the "Changes in FSRS-5", Difficulty is still not based on R but only G right ? Maybe that could be useful then, since a "Fail" doesn't necessarily mean the card was more difficult than previously right ?

I mean, if we take back my example, for DR=90%, a fail every 20 reviews should even be a sign that the card is easier than expected. So the Delta D would have to take into account R and DR right ?

unique salmon
#

We can't take DR into account, just R, btw

#

DR doesn't exist in the training data

bold terrace
#

Hmm indeed

#

And I guess even with Training Data coming from Anki user, the DR is not stored anywhere

#

I think it could be useful, because when you think about it, that R is relative to other cards

#

Without DR, we can always look at R of a card, and the mean R of the dataset

#

Would not help if different decks have different DRs though ....

quasi shadow
#
Jarrett Ye's Notes on Notion

An individual can only engage in a single review for a specific card at any given moment. It's impossible to conduct multiple reviews under identical conditions without parallel universes. This irreproducibility makes precise measurement of memory states impossible.

quasi shadow
#

Difficult cards are unlikely to become easy in most cases.

quasi shadow
bold terrace
# quasi shadow I guess leech is more common than the feeling of “ease hell”.

I think leeches might express themselves differently (Let say DR=80%) :

  • Leeches : Your Stability won't go very high, even though your DR might be respected with great accuracy. Basically, you HAVE those 80% perfectly predicted, but with still very low stability.
  • "Difficult" card : You seem to not be able to reach the DR, doing only 60-70% R instead of the Predicted R, and their stability will thus even be lower.

Then of course, saying that "Leeches are then also difficult" is also a valid way of expressing Difficulty.

Maybe the problem with "Difficulty" is how loosely defined it is (Is it related to stability ? inability to respect the DR ? Inability of having a stability converging, and having it all over the place ? etc etc)

#

Maybe Difficulty is then just a word we should stop using and instead refining it into different evaluation of why a card is not "satisfactory" 😅

unique salmon
#

In SuperMemo D is actually defined differently, based on "missed expectations" - difference between R and the real review outcome (with smoothing and shit, since the outcome is binary). I've tried that, but it didn't improve FSRS. I could try some more, but considering how many attempts at improving D have failed, I don't feel like doing it anymore, since 99% of the time my ideas don't work.

quasi shadow
#

😂I found more traditional models about the memory.

#

As in P&A, however, PPE does not require a successful retrieval attempt to receive these gains

#

OK, they both are shit.😅

quasi shadow
#

For anyone using the FSRS algorithm in Anki, I'd strongly advise against it because of multiple issues:

- Inescapable Ease Hell (default w[7] value is 0.0046, rendering mean reversion useless)
- Optimizing with bad learning habits will actually result in a HIGHER workload

#

Any thoughts?

#

Maybe we need to introduce something like momentum into the formula.

cosmic hedge
#

I always just blame my high difficulty cards on my card design, figured fsrs recognises them as "doomed".

slim hollow
#

with how currently difficulty is used in formulas:

  • user uses 2 buttons / content is normal or hard: most cards will go to 10D
  • user uses 2 buttons / content is easy: most cards will go as low as possible, so ~5D

These are the most common patterns, and many people will fall into the 1st tier, which looks like difficulty hell but really isn't, as the variable name is a misnomer
bold terrace
hasty fractal
#

I guess if it's something like: 〇〇学園筆記試験過去問題集

bold terrace
bold terrace
hasty fractal
#

link ?

bold terrace
#

Scroll up 🙂

slim hollow
#

you can rescale the D in FSRS from 0-10 to 0-1 and it brings much more natural distribution but this doesn't improve the prediction

bold terrace
#

I might get crucified for saying that but maybe D should be more like a post-processing on top of the FSRS equation more than part of the FSRS equation.
Take a look at GPU, they'll often have different layers to be performant and precise instead of trying to have only one shader doing all the work

#

Difficulty could be like that: a variable not necessarily part of the optimized equation, but something that adjusts the prediction in real time based on actual specific feedback

#

I mean, FSRS is already quite precise, RMSE around 3-5% for most of us. So sometimes it might just be a matter of slightly shifting the prediction on a lower side, or higher side, to get a perfect match

#

If I take my own distribution of Success Rate (R for one Card over the whole revlog), sure, most cards are within an acceptable range (.70-.75), but for all those that are around [.50,.65], they clearly deviate from the distribution and while it might not have a big RMSE impact, being able to detect them and adjust their Stability would help them be more centered

#

Because of course, if you optimize D behaviour for the whole set, it'll be optimized to have an average effect helping the whole logloss/RMSE minimization

#

But, is it really what you want ? Or instead, would you like D to be, a compensating variable on a card-by-card basis, quicker to react to what's actually happening right now (instead of what was planned in the training model)

hasty fractal
#

IMHO we should focus on having dynamically selected DRs

#

the predictions are already pretty good

#

well, I did see that ssp didn't interest anyone

quasi shadow
bold terrace
# quasi shadow I mean, something like Straight Reward.

Yup I see the idea. On the specific example of SRS Kai, it still means the ease "reward" would only be computed on a grade-level, instead of an "history-level".
I'm not sure for example an "Again" should always lead to a loss of ease. If you press Again exactly as often as predicted by your DR, to me, your card has a neutral difficulty.

quasi shadow
#

I imitate Straight Reward in this PR.

bold terrace
#

Tell me if I read it wrong in the code, but isn't it only trying to compensate for cards with R>DR ?
Also, since you give a reward for a succession of good reviews, it means someone with DR=95% might get a lot of rewards, when in fact, doing a longer chain of 1 is not necessarily him outperforming the prediction ?

#

I really do think what would lead someone to have a bonus/malus reward should be his actual performance (Retention) based on expected prediction (Desired Retention)

#

Doing 10 "Good" in a row doesn't mean you're really that good if your DR was 99%

#

Doing 1 fail every 4 reviews is outperforming if you had your DR at 50%

#

Now in terms of "momentum", the question would then be : How that actual R should be computed ? The full history ? Only the last portion ? Excluding Same-Day Reviews ? Basically, how to define a Good average to compute actual R (moving average, filtered, global ...)

#

You see, it can change quite quickly, but those phases are still somewhat pronounced in my experience

#

But problem is, to compute that, you'd need to store the "Predicted R" and then, for each averaging window, compare the average predicted R with the actual observed R
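
A minimal sketch of that bookkeeping, assuming each review stores the predicted R alongside the pass/fail outcome. The record layout, window size, and function name are all hypothetical.

```python
def windowed_gap(history, window=5):
    """For each full trailing window, return mean(predicted R) - observed pass rate.

    history: list of (predicted_r, passed) tuples in review order.
    A positive gap means the card did worse than predicted in that window.
    """
    gaps = []
    for end in range(window, len(history) + 1):
        chunk = history[end - window:end]
        mean_pred = sum(p for p, _ in chunk) / window
        pass_rate = sum(1 for _, ok in chunk if ok) / window
        gaps.append(mean_pred - pass_rate)
    return gaps

# Made-up history for one card, always predicted at R = 0.9.
history = [(0.9, True), (0.9, True), (0.9, False), (0.9, True),
           (0.9, False), (0.9, False), (0.9, True)]
print([round(g, 2) for g in windowed_gap(history)])
```

Whether same-day reviews should be filtered out of `history` first is left open here, as in the question above.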

#

In my example, you see that D more or less works, since the yellow phase was at 99%, the red at 100%, the green at 96-97%

quasi shadow
#

(if the cards turn out to be easy)

#

It's not related to DR-stuff. I just want to test the idea of Straight Reward.

bold terrace
#

Sure, fine !

#

I'm doing a quick experiment with the visualizer :
I do a succession of Good/Fail, I plot the Difficulty, and I alter the Desired Retention

#

Increasing/Decreasing DR doesn't change how D move when I enter a 1 or a 3

#

Here, I failed enough times to go to ~97% D, which seems to be my "neutral point".
I do 9 "Good" and only then, I go back to the last Difficulty of 95%

#

No matter the Desired Retention, it stays the same

#

Which means : D is somewhat "locked" to measure my performance based on a DR~90% around 95-99% D

quasi shadow
#

Wow, the PR does improve the distribution of difficulty a little in my collection.

#

But the main metrics don't become better than before.

bold terrace
#

Yeaah and I'd also be cautious to look at what are those card with D<90%

#

in my case, it's a lot of very very very young card in terms of review number

#

This is mine for my main deck

#

If I check the 65-70D, I get this list of card :

#

Remark how they all have 0 lapses

#

BUT, having lapses should be perfectly fine in a model that predict your 80-90% DR

#

You should, lapse, 10-20% of your review count

#

Over 1350 cards with prop:lapses>3, I have only 2 with prop:d < 0.9

#

I have 180 cards with prop:lapses<3 and prop:reps>20; 46 of those cards have difficulty >89%

#

Hmmmmmm

#

I'm doing
deck:current prop:lapses<=4 prop:reps>20
deck:current prop:lapses<=5 prop:reps>25
deck:current prop:lapses<=6 prop:reps>30

To find the "not too bad one".

#

The one with indeed, a number of lapse around ~ (1-DR) compared to my number of reps

#

And they all seems to be with difficulty 90-100%.

#

Which means D might work as intended, but it's just that yeah, plotting it with bars of 10% width won't give much info

#

After all, isn't it normal that Stability/Difficulty have the same kind of curve ?

#

Sorry for the monologue but I realize I might have had false expectations 😂

#

Still, I think the current D won't work for DR too low. Your malus with Again answer will completely erase all D bonus you got with "Goods", since D variation is not related to DR

unique salmon
#

I've tried the idea with streaks before, so I will be very surprised if it improves metrics.

cosmic hedge
bold terrace
#

I know everyone dislikes those perfect sequences of 3s then 1 fail, but I think it's quite insightful here: a perfect sequence at 80% DR and at 90% DR

#

To me ... It seems... actually pretty good

#

The very non intuitive part is the fact that for lower DR, your D will be higher

#

Because since D variation is DR independent, you get "screwed" a bit more

#

BUT, there are many good points :

  • Lapse after lapse, you'll have shorter cycle, which might help you not fail the next one as fast as the previous
  • Yeah, everything is still clamped up to 90-100%, but it's not like reviews just get "ignored".
  • If the D impact is small, maybe it's just because in my case, there's not that much variation in it. Also, it's just a value, maybe a small variation might lead to a bigger stability change
#

Which is the case, since that Difficulty being different between cycles is probably what explains why each new cycle would go to a Stability lower than the previous one, until a point where it would stagnate

#

Soooo ... Maybe we just want D to be super pretty, super centered around 50% D... but maybe we should trust the optimizer 😂

#

If I change w[4] to 99%, to start right away at 99% Difficulty, we can see the shape is completely different, and it will converge to a lower Difficulty with time and cycles

#

Sooooooooo, yeah, maybe the biggest culprit is the difficulty scale, NOT the difficulty itself

quasi shadow
#

So... is this case real?

bold terrace
quasi shadow
#

I think we should find a method to detect them and draw the calibration graph on them.

#

If the calibration is poor, we may find a systematic weakness of FSRS.

#

Then we can try to fix it via systematic methods.

bold terrace
#

Anomaly detection 😄 ?

#

to be fair, I think it might be very simple to detect though. For example, how fast stability grew on those

unique salmon
#

I've also tried making f(D) mostly linear, but switch to a power function for extremely hard cards. That didn't help either.
But I guess I'll try adding R to D and report the results

bold terrace
#

like w[2] (good initial stability) / w[0] (again initial stability), or very very low w[0]/w[1]

bold terrace
slim hollow
bold terrace
#

So adding graphs default and options to reflect the same kind of observation with difficulty would solve the misconception

#

Right now that graph is a kind of “all” one and in comparison to Card Stability one it is probably even more clear and well distributed

#

I think by filtering out the very young one and doing some kind of zoom on 80-100% difficulty people would have a better sense of their difficulty distribution

#

It’s something we can try to implement in the stats plugin before proposing a PR in Anki itself

#

I still do believe that ideally D should have the same “neutral point” between different DR but it’s not a game breaker at all, it’s something that can be explained in legends, docs, blogs

slim hollow
#

also a food for thought, instead of blending easy with current response in mean reversion add additional parameters for reversion so that easy/good/hard have their separate optimizable parameters for reduction

vital apex
# quasi shadow So... is this case real?

in the case of studying Japanese, this is a pretty large amount of people. they dive into Japanese and use a core vocabulary deck of the 2000 most common words for example, and then it's natural as a complete beginner they won't know any words

bold terrace
#

Yes and more "easy" card (With D <80%) only really happen when you start having words you can somewhat infer from others, which is absolutely not the case for your first 1-3K words

#

To me difficulty in this case represent more how atomic/disconnected the cards are. (Pure) Kanjis might have an even more abrupt curve then mine

hasty fractal
#

wdys

lapis hearth
unique salmon
lapis hearth
#

I mean pretend that it is not a problem that literally every other deck is rendering fsrs 5 useless post optimizing

#

(without manually resetting values for 100s of decks, which is the furthest from being practical)

unique salmon
#

Jarrett is working on a fix

lapis hearth
#

Yes because my RMSE has definitely worsened. It has gotten from 4% to just shy of - %

#

6%

hasty fractal
#

and lower time

lapis hearth
#

It literally just gives me back the default value

#

Pre-whatever update that was-update that was absolutely not the case

unique salmon
hasty fractal
#

ah...

#

k

#

time isn't taken into account

lapis hearth
#

And as I was reviewing I became aware of a very noticeable drop in my reviewing performance prior to that update which is odd, since my reviewing habit is consistent for the past 2.5-3years on my Anki

hasty fractal
#

jarrett said he'll try something

#

so

unique salmon
#

@quasi shadow am I misunderstanding this code or does it only count "Good" and "Easy" for the streak?

#

This one is also strange - are you counting "Hard" as fail?

#

The initial value after the first review should be
```python
new_streak = torch.where(
    X[:, 1] > 1,
    torch.ones_like(state[:, 2]),
    torch.zeros_like(state[:, 2]),
)
```
And then for all other reviews it should be
```python
new_streak = torch.where(
    X[:, 1] > 1,
    state[:, 2] + 1,
    torch.zeros_like(state[:, 2]),
)
```

#

Oh, I see, Straight Reward was made by a madman who counts "Hard" as "not success" but at the same time not as "fail". So in Straight Reward:
Easy = success
Good = success
Hard = not success
Again = fail

#

So I guess you faithfully imitated that, which is not a good idea
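
For reference, the two streak conventions side by side in plain Python rather than tensors. Grades are 1=Again, 2=Hard, 3=Good, 4=Easy. Treating Hard under Straight Reward as "leave the streak unchanged" is an assumption based on the description above, and the function name is made up.

```python
def streak_after(grades, hard_counts_as_success=True):
    """Return the current success streak after a sequence of grades."""
    streak = 0
    for g in grades:
        if g == 1:
            streak = 0          # Again always resets the streak
        elif g == 2 and not hard_counts_as_success:
            pass                # Hard: neither success nor fail
        else:
            streak += 1         # success extends the streak
    return streak

grades = [3, 3, 2, 4, 1, 3, 2]
print(streak_after(grades, hard_counts_as_success=True))
print(streak_after(grades, hard_counts_as_success=False))
```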

#

i'm sorry what

#

Why on Earth do you need RELU here?

#

Oh, that's just the world's weirdest way to do "add a reward if the streak is >= some parameter, don't add anything otherwise"
Still don't get why you use leaky RELU though

#

Instead of the regular one

#

Using torch.clamp(new_streak - self.w[20], min=0) instead of RELU would be a lot clearer, btw. Just to make the code more readable
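
The difference between the two variants, sketched in plain Python. `threshold` stands in for `self.w[20]`, and the leaky slope of 0.01 is just a common default, not necessarily what the PR uses.

```python
def relu_reward(streak, threshold):
    """Reward grows with the streak only once it exceeds the threshold."""
    return max(streak - threshold, 0.0)

def leaky_relu_reward(streak, threshold, slope=0.01):
    """Same above the threshold, but with a small nonzero slope below it,
    so the gradient with respect to the threshold is never exactly zero."""
    x = streak - threshold
    return x if x > 0 else slope * x

for streak in (1, 3, 6):
    print(streak, relu_reward(streak, 3.0), leaky_relu_reward(streak, 3.0))
```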

unique salmon
# slim hollow also a food for thought, instead of blending easy with current response in mean ...

Some time ago me and Jarrett agreed to stick to "2% relative improvement per parameter" rule. In other words, if you tweak FSRS and add new parameters, logloss and/or RMSE have to decrease by at least 2% (relatively) per each new parameter. This is just to avoid adding a crapton of new parameters and bloating FSRS for extremely marginal improvements. And I really doubt that what you suggested would be anywhere near 2%
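
The rule above boils down to a one-line check. A small sketch, with a made-up function name and hypothetical log loss values:

```python
def passes_two_percent_rule(old_metric, new_metric, new_params, threshold=0.02):
    """True if the relative decrease is at least `threshold` per added parameter."""
    relative_improvement = (old_metric - new_metric) / old_metric
    return relative_improvement >= threshold * new_params

# Hypothetical log loss values, for illustration only.
print(passes_two_percent_rule(0.350, 0.340, new_params=1))
print(passes_two_percent_rule(0.350, 0.340, new_params=3))
```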

Trust me, D is just that much of a bitch 🤣
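
The "2% per parameter" rule can be written as a quick check. This is only a sketch: the function name is hypothetical, and it assumes the rule means "relative gain divided by the number of new parameters must be at least 2%".

```python
def passes_relative_rule(
    old_metric: float,
    new_metric: float,
    n_new_params: int,
    threshold: float = 0.02,
) -> bool:
    """Check the relative-improvement rule for a lower-is-better metric.

    Assumption: the metric (logloss or RMSE) must drop by at least
    `threshold` (relative) for each newly added parameter.
    """
    if n_new_params <= 0:
        raise ValueError("n_new_params must be positive")
    if old_metric <= 0:
        raise ValueError("old_metric must be positive")
    relative_gain = (old_metric - new_metric) / old_metric
    return relative_gain / n_new_params >= threshold


# One new parameter, logloss 0.327 -> 0.320 (about 2.1% better): passes.
print(passes_relative_rule(0.327, 0.320, n_new_params=1))
# Same gain spread over two new parameters: does not pass.
print(passes_relative_rule(0.327, 0.320, n_new_params=2))
```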

#

D doesn't care what you or me think makes sense

#

Even something really obvious, like using optimizable parameters instead of Again=1, Hard=2, Good=3, Easy=4 doesn't do jack shit to improve metrics

#

Btw, I'm currently benchmarking some stuff related to using R in D, will share the results later. I'll try a very simple approach first that doesn't even require adding new parameters, and then I'll try redesigning D entirely

#

I bet 20$ neither will help

bold terrace
#

@cosmic hedge / @cursive badge , I'm tinkering about "Lapses" and trying to extract something useful from them. I came up with an "Avg Repetitions / Lapse" metric that could be useful to detect whether cards with more lapses no longer respect the DR (because they are so difficult that you lapse on them more than you should)

I came up with this (see attachment)

I have the feeling the second view is more useful since you see how many repetitions you can do in each "lapse", so in my case I can see that the more lapses I have, the lower the average retention (expressed here in repetitions/lapse)

Any hot ideas before I create a PR so we can improve it a bit more with time ?

#

Maybe a "Lapse Ratings" would be best suited, similarly to

hasty fractal
#

hope ross doesn't mind the ss

bold terrace
#

That's a good idea ! But I think it would have to be implemented with something other than simply a few computations/ratios based on card metrics

#

I don't know much about anomaly detection either

#

But I see the point

quasi shadow
unique salmon
#

Ah, ok

quasi shadow
#

It's used to address issues of vanishing gradients.

bold terrace
# bold terrace @cosmic hedge / @cursive badge , I'm tinkering about "Lapses" and...

I created the PR here, I merged both graphs into one with a toggle "Divide Avg Repetition by Lapse"

https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/33

I attach a local build for those who want to try it (it also includes the Stability over Time)

GitHub

Note : This include right now the changes of #32. So we might need to merge that one first, but at least I don't have conflict between both in this one :)
I also added an option in the conf...

cosmic hedge
#

I'll hold off on the "make it redder!!!" till you're done XD

#

I also wanted that bar default feature for a while so feel free to default it to "bar" because people tend to like the bars more

polar maple
# quasi shadow So... is this case real?

could be one of those cases where MOVING-AVG does better than FSRS, but it's a real problem if it persists even after FSRS is optimized on these newer reviews

bold terrace
#

The average stability is already good enough I think, I'm working right now on making it more precise (for ex, using stability = 2.3 instead of 2 from the bin value)

#

buuut it won't change much

polar maple
cosmic hedge
unique salmon
#

(unless I'm bad at math)

polar maple
unique salmon
#

Maybe we can find leeches like that, actually. If a card has been failed n times in a row at some DR level, we can calculate how likely that is. And if it's super unlikely (for example, <0.1%), then it's tagged as a leech

unique salmon
#

It's just that math is simpler if all failures are one after the other

#

This gets problematic if DR changes, though

#

Since Anki doesn't store it anywhere

#

It doesn't store DR at the time of the review in Card Info

#

And I'm not sure if binomial distribution math even works if p(success) changes

bold terrace
#

It might sound/look prettier, but it's rewriting the history, with different commit IDs; it's a mess

unique salmon
bold terrace
polar maple
unique salmon
#

Btw, any card that has been failed 6 times in a row would be considered a leech at any DR that is used in Anki, if we use 0.1% as a cutoff

#

BINOM.DIST(0;6;0.7;TRUE) = 0.073% (Excel)

#

0 successes, 6 trials, p=70%
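
The same number can be reproduced without Excel. A minimal sketch, assuming each review is independent and succeeds with probability equal to the DR (the function names are made up for illustration):

```python
def prob_all_failed(n_fails: int, desired_retention: float) -> float:
    """Probability of failing n reviews in a row.

    Assumption: reviews are independent and each one succeeds with
    probability equal to the desired retention (DR).
    """
    return (1.0 - desired_retention) ** n_fails


def min_fails_for_leech(desired_retention: float, cutoff: float = 0.001) -> int:
    """Smallest run of consecutive failures whose probability is below cutoff."""
    if not 0.0 < desired_retention < 1.0:
        raise ValueError("desired_retention must be strictly between 0 and 1")
    n = 1
    while prob_all_failed(n, desired_retention) >= cutoff:
        n += 1
    return n


# 6 failures in a row at DR=70%: about 0.073%, same as BINOM.DIST(0;6;0.7;TRUE)
print(f"{prob_all_failed(6, 0.7):.3%}")
print(min_fails_for_leech(0.7))  # 6
```

Since 70% is the lowest DR here, it is the worst case: at any higher DR, six failures in a row are even less likely, so the 6-in-a-row cutoff holds everywhere.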

unique salmon
polar maple
#

the user does 150 reviews at 90% DR.
the user does 20 reviews at 70% DR
the expected distribution is (Binomial(150, 90%) + Binomial(20, 70%)) / (150 + 20)

unique salmon
#

No, I mean, calculate the probability that a card has been failed k times out of n reviews at different DRs
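
When p(success) differs between reviews, the count of failures follows what is usually called a Poisson binomial distribution, and it can be computed with a small dynamic program. A sketch under assumptions: reviews are independent, and the per-review success probabilities are known. Since Anki doesn't store the DR at review time, they would have to be reconstructed somehow. Function names are hypothetical.

```python
def failure_count_distribution(success_probs: list[float]) -> list[float]:
    """Return dist where dist[k] = P(exactly k failures).

    Each review i succeeds independently with probability success_probs[i].
    """
    dist = [1.0]  # before any review: 0 failures with certainty
    for p in success_probs:
        q = 1.0 - p
        new_dist = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new_dist[k] += mass * p      # this review succeeded
            new_dist[k + 1] += mass * q  # this review failed
        dist = new_dist
    return dist


def prob_at_least_k_failures(success_probs: list[float], k: int) -> float:
    """P(k or more failures) across all reviews."""
    return sum(failure_count_distribution(success_probs)[k:])


# Two reviews at DR=90% and one at DR=70%
probs = [0.9, 0.9, 0.7]
print(failure_count_distribution(probs))
print(prob_at_least_k_failures(probs, 2))
```

If every entry of `success_probs` is the same, this reduces to the ordinary binomial distribution.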

polar maple
polar maple
# bold terrace Yup I see the idea. On the specific example of SRS Kai, it still means the ease ...

the models work by averages over a distribution of possible cards. The forgetting curve is just an average of individual forgetting curves drawn from this distribution, and success/failures should in theory update on these individual cards rather than the average. For example, maybe the model expects a user to learn 1 hard card for every 9 easy cards. Then even after just 3 successes for a new card, the model can update its belief; it's highly likely that this card is one of those easy cards and can schedule a longer interval accordingly. So yeah it's reasonable for ease to update after every review

#

RWKV curves at a fixed stability show how much this underlying hidden distribution can affect the average distribution

hasty fractal
#

shouldn't this become a different channel already at this point. we will have more interaction from people this way. who knows some random passer-by might get interested.

#

(deleted and reposted cuz didn't want to destroy Alex's message)

#

mods noticed me :blushed:

bold terrace
polar maple
#

on another note, we must prioritize using log loss especially when developing models to account for average performance. e.g. if a card with 0.6 overall retention was just unlucky, we don't want to overcompensate when the 0.7 predictions were already correct. I've shown before that RMSE (bins) encourages overcompensation for mistakes so we cannot use it to measure improvement benefits for this

polar maple
unique salmon
#

Well, if we're being realistic, 0.001 improvement in absolute terms, I guess. Like from 0.327 to 0.326

#

So 0.001 improvement in logloss per parameter

#

Maybe 0.002, if we're being conservative

polar maple
#

if you compare GRU-P-short to FSRS's RMSE difference and try to linearly map that to log loss then you get something like 0.00075 per 1% improvement

#

so i guess 0.0015 per 2% RMSE

unique salmon
#

Alright, let's say 0.0015 absolute improvement in logloss per new parameter
@quasi shadow new rule just dropped 🤣

bold terrace
#

Having said that ... If GRU-WHATEVER allows you to translate your DR=80% model to DR=70% without a big loss of prediction precision ...

#

... 😄

#

Would open a lot of doors for future algorithms 🙂

unique salmon
# bold terrace Having said that ... If GRU-WHATEVER allow you to translate your DR=80% model to...

Alex's RWKV would actually have a ton of advantages over FSRS

The more I think about it, the more I think it's actually very desirable

  1. We can make R more accurate
  2. We won't have to show parameters, which means one less thing for users to worry about
  3. We can support proper same-day scheduling instead of the current mess
  4. We can throw in new input features, like time of the day, workload, etc. Not just interval lengths and grades
  5. We can remove "Optimize", which means even less stuff for users to worry about
    RWKV would be pre-trained on 10k users, so it wouldn't need further optimization
#

In that sense, it would be more like ChatGPT - it Just Works™ out of the box

polar maple
#

reminder that it would still be optimizing, but it would just be optimizing on the spot

#

well it would be doing the equivalent to optimizing on the spot is what i should say

#

you wouldnt say that an llm is optimizing on the spot

#

and we don't know how well it translates DR=80% to DR=70%, the only thing we have is faith in its better log loss

bold terrace
#

Is it possible to test such a DR=80% into a DR=70% ?

#

I mean, evaluating how good an algorithm is at that

#

I never actually really understood how we actually test if a prediction is correct or not

#

I mean, you see the user entered "Again" when you predicted a 60% retention, how do you know if it's a good prediction or not ?

unique salmon
#

And then the optimizer tweaks parameters to minimize that "distance"

#

If the algorithm always outputs 100% for every non-Again and always outputs 0% for every Again, then it will have a "distance" of 0, since predictions are exactly equal to real data
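
That "distance" can be sketched as binary log loss. This is a minimal illustration, not the optimizer's actual code: outcomes are 1 for any non-Again rating and 0 for Again, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math


def log_loss(predictions: list[float], outcomes: list[int], eps: float = 1e-15) -> float:
    """Average binary log loss.

    predictions: predicted probability of recall for each review
    outcomes: 1 if the user recalled the card (non-Again), 0 for Again
    """
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)


# Predicted 60% and the user pressed Again: loss is -ln(0.4)
print(log_loss([0.6], [0]))
# Predicted 60% and the user recalled it: loss is -ln(0.6), which is smaller
print(log_loss([0.6], [1]))
```

A single review can't say whether 60% was "right"; the loss only becomes meaningful when averaged over many reviews, where well-calibrated predictions score lower than miscalibrated ones.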

bold terrace
#

Ok thanks ! I got it now 🙂 Couldn't guess that it's minimizing the cost between the prediction at again and 0%

polar maple
# bold terrace Is it possible to test such a DR=80% into a DR=70% ?

some things to try:

  1. find gaps in the review history. if someone took a 1 week break then you would expect the retentions to be lower and we can measure the performance of the curve this way. This could be how a 80% DR gets shifted to 70% DR. If someone took a break for the year we can measure the far end of the curve.
  2. trust that the underlying reviews have a high variance in retention. this is probably true since I expect the underlying scheduler to have been SM-2. To illustrate this, if a perfect scheduler produced the data at 90% DR then the data is not enlightening at all, the perfect model would in turn just predict 90% all the time with no curve. But if the underlying scheduler that produced the data sucks then there will be plenty of data to work with already.
#

but i havent tried 1) yet and 2) is just an assumption

bold terrace
#

The forgetting curve doesn't really match how fast I forget between 80 and 70

#

It does predict 80 very well, but it's too optimistic for 70, and too pessimistic for 90

#

I do my review by descending retrievability, in general, I have around ~95-98% for all my DR=90%

#

~60% for all my DR=70%

unique salmon
#

We actually found that different curves are better at different retentions. And now we have no idea what to do with that information 😅

polar maple
#

the best way of course is to run an actual experiment with users but we don't have the resources for that

bold terrace
#

Oh OK now I get the gap idea

#

But if your training set is with people that used SM2, don't you have "by default" gaps ? Since for example, some might have a too high ease factor and it creates gap that FSRS would have filled with reviews ?

unique salmon
polar maple
polar maple
#

and RWKV curves suggest that this might matter for the shape of the curve

bold terrace
#

It's not impossible that for some cases, you get more something like this

#

Typically, knowledge that "holds together" with 2-3 mnemonics: it could hold together short term, but suddenly drop once they have all dropped

unique salmon
#

We could try passing D into the forgetting curve directly instead of using it only to calculate S. But that would screw up a lot of things, especially how we calculate S0 and the interpretation of S itself

polar maple
#

interpret S as the point where retention is 0.9 and all problems are solved

polar maple
unique salmon
#

I mean this
We would no longer be able to do this since the forgetting curve would also depend on D, so there is another dimension now

polar maple
polar maple
unique salmon
#

The main issue is S0, btw

#

Since it requires choosing a fixed shape of the curve to estimate

polar maple
bold terrace
#

I mean, take this example :
You remember 駅 as "Station". Now, you have also 訳, as "Reason". Next time you see 駅, you hesitate between "Reason" and "Station", and you get it wrong.

#

Can we consider you have forgotten the word ?

#

Or should we instead just say you got it wrong

#

forgotten != wrong, my point is

#

In this case, you could have a "long stable knowledge" that becomes a "very short stability one", not because of how you forget, but because you had too much simplification of the knowledge in your brain, from not experiencing similarities

unique salmon
bold terrace
#

And even if we can't do anything about it, it's still interesting, because it shows that if you see in practice that the "Forgetting Curve" is a lie, it might only "seem like it", simply because "Forgotten" is not super well defined

polar maple
bold terrace
#

And maybe one day, the successor of Anki, instead of having "Again/Hard/Good/Easy", could have "Forgot/Confused/Recalled/Slow Recall"

unique salmon
bold terrace
#

For example, I use "Easy" as "I already knew it", and FSRS adapted its value quite well to it

#

The issue is that you still have to mark an option as "good or bad", like "Hard" that is for some people "good" and others "bad"

#

Or you have something so flexible like neural network that it will compute for you if "Hard" was used as an Again or a Good 😄

unique salmon
#

It would be cool if we didn't have to treat the grades as binary when calculating the loss, but idk how to do that

#

We could make a neural net that outputs 4 different probabilities (one for each button), but idk if it would be advantageous in any way, and I anticipate all sorts of issues

polar maple
#

RWKV-P predicts 4 probabilities, i believe it should help with improving gradient information

#

but idk how you would do this for a forgetting curve

unique salmon
#

Wait, really?

#

Huh

polar maple
#

yeah i'm even considering predicting the duration of the review

unique salmon
#

It's not that predicting multiple probabilities is impossible, it's that there is no way to combine that with the "forgetting curve" approach

bold terrace
#

To be fair right now, for me a simple linear regression would be a good enough forgetting curve hahahaha

#

I don't play with DR anymore, PTSD from last time

#

I do increment it 1% every month more or less

polar maple
# polar maple RWKV-P predicts 4 probabilities, i believe it should help with improving gradien...

https://arxiv.org/abs/1707.06887
the same idea is used in RL

unique salmon
#

What if we have 3 forgetting curves - one for Hard, one for Good, and one for Easy? 🤣
And p(Again) is just 1 - p(Hard) - p(Good) - p(Easy)

#

Idk how we would use that Frankenstein monstrosity for scheduling, though

#

Oh, and their sum must always add up to 1, which is problematic
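
For the neural-net version (as opposed to three separate forgetting curves), one common way to get four button probabilities that always sum to 1 is a softmax over four raw scores. The sketch below only illustrates that normalization. Whether RWKV-P does exactly this isn't stated here, and the function names and the [Again, Hard, Good, Easy] ordering are assumptions.

```python
import math


def button_probabilities(logits: list[float]) -> list[float]:
    """Softmax: turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recall_probability(probs: list[float]) -> float:
    """R = 1 - p(Again), assuming probs are ordered [Again, Hard, Good, Easy]."""
    return 1.0 - probs[0]


probs = button_probabilities([-1.0, 0.0, 2.0, 0.5])
print(probs, sum(probs))
print(recall_probability(probs))
```

This sidesteps the "must add up to 1" problem, but it doesn't by itself give a forgetting curve: the scores would still have to depend on elapsed time somehow.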

polar maple
#

if we make certain assumptions then it should be doable

bold terrace
#

Yeaah ! .01 RMSE earned ! New params day baby !
😄

polar maple
#

nice, you will learn 0.01% faster now

bold terrace
#

Well it seems in the past a "hard" first was similar to an "again" first

#

Now a Hard first is actually better !

unique salmon
#

Anki users tweaking 13456890 settings to learn flashcards 0.01% faster be like

#

(I am an Anki user)

bold terrace
#

I mean, we're geeks

#

I won't start to feel guilty for it at almost 34

#

(I had to double check, I thought I was already 34)

#

(I know my RMSE better than my age)

unique salmon
#

lol

#

Too bad no amount of Anki and parameter tweaking can help me get a gf

#

Well, I guess theoretically I could make my own dating app with my own algorithm, but that is not going to happen

bold terrace
#

We've been together for 10 years now, she's a math teacher, and I can tell you this : even she doesn't care in the slightest about Anki or FSRS

#

It's made for the geeks, for the geeks

#

We should just celebrate it together

polar maple
unique salmon
#

And make them learnable on a per-user basis

#

Hmmm

ashen light
#

just find an anki deck for that topic

ashen light
ashen light
bold terrace
# ashen light alternatively, maybe tweaking the variables less would help 🍃

IMO, but I'll just send 1 message about this topic otherwise this will go from FSRS to "#date-advice" very quickly, being a geek/overthinker is not that much of an issue, and as long as you build a "healthy sense of self confidence" it's always great to see passionate people (as long as they are also able to open up to others' passions lol). I always add the "healthy sense of" because you might land on r/TheRedPill, wearing long coats and stuff 😂 .

ashen light
#

maybe what he needs most is a long coat

unique salmon
bold terrace
#

Yeaaah but (OK let's go off-topic a bit :D), you CAN FEEL people when they identify to those exterior things. It's almost like you see someone from Peaky Blinders coming to you

#

And the light, and the posture, and ryan gosling

ashen light
#

the problem is you'd probably look like a dork wearing that 🍃

bold terrace
#

It's like those chinese clothes on Amazon model pictures vs on a random russian in the review section

unique salmon
#

kek

unique salmon
ashen light
#

none of us are

bold terrace
#
  • Do a bit of gym, but don't build your whole identity around it
  • Try to wear nice clothes, but don't become a parody of Karl Lagerfeld
  • Try to be positive and nice, but don't be a boot-licker
ashen light
#
  • become a monk and join a monastery
bold terrace
#

Available on
► Digital: http://smarturl.it/LynSkyFADigital
► DVD: http://smarturl.it/LynSkyFloridaDVD
► BR: http://smarturl.it/LynSkyFloridaBR
► CD: http://smarturl.it/LynSkyFloridaCD
Earlier this year Lynyrd Skynyrd performed their first two studio albums, “Pronounced 'Lĕh-'nérd 'Skin-'nérd” and “Second Helping”, live in their entirety for t...

#

Be a simple man

#

Well, maybe not like them

#

But you get the idea

bold terrace
#

"But not with the little ones"

unique salmon
#
  • Wait for AI to become advanced enough that I can have an AI gf
bold terrace
#

"Or to change church every year, you'll be doomed"

ashen light
bold terrace
#

I might have found on some private trackers some first AI porn

#

It might have been quite interesting

#

When I see posts on LinkedIn about how AI will change the world, I just think about how much they were talking about VR during Covid

#

So many VR Headsets sold for "VR Experiences beyond imagination"

#

With more install of DeoVR than Metaverse

bold terrace
#

The "fsrs4anki_scheduler.js", is it a file from the playbook or the anki github ? can't find it somehow

bold terrace
#

Ok, the step 2.2 gave me something compatible with anki though

#

so probably not necessary anymore to do that step

#

and just cc-cv them in deck options

bold terrace
#

Difficulty with 1% granularity

#

Let's be honest ... It doesn't really bring much much more value at all 😂

cosmic hedge
cosmic hedge
bold terrace
#

Telepathy haha

#

I was thinking "Maybe a Time machine would be better"

#

Doing an average over time, except if you reaaaally zoom on 90-100, you won't see anything

cosmic hedge
#

Did you try it?

bold terrace
#

no

cosmic hedge
#

If you look at mine you can see difficulty really starts going up

#

Maybe i should add an "average difficulty%" just at the bottom instead of as a separate graph

bold terrace
#

I kinda like being able to see trend

#

Especially like stability when you see very little steps compounding

#

But it should be elsewhere

#

I started as a Pie but then I was like let's use a ScrollBar to configure bins=100

cosmic hedge
#

While i was doing the pies I remember the d3 tutorial had something like "never use pie charts" in it

#

But hey they make the addon look a little less like the bar-chart-fest that it really is XD

bold terrace
#

Yup 🙂

#

Anyway, enough for today

#

No big eureka for this time

#

I think D is probably something better left unseen

#

lol

#

I mean Difficulty right

#

By the way, Github Copilot is atrocious from time to time for JS ...

#

Full full hallucinations

unique salmon
#

Wait, I have a genius idea
@quasi shadow
Right now we estimate S0 like this:
`def loss(stability):
    y_pred = self.forgetting_curve(delta_t, stability)
    logloss = sum(
        -(recall * np.log(y_pred) + (1 - recall) * np.log(1 - y_pred))
        * count
    )
    l1 = np.abs(stability - init_s0) / 16 if not SECS_IVL else 0
    return logloss + l1

res = minimize(
    loss,
    x0=init_s0,
    bounds=((S_MIN, INIT_S_MAX),),
    options={"maxiter": int(sum(count))},
)`

If we want to make decay depend on D, we could either just assume some fixed value of D...or estimate it from the data!

  1. The formula for converting D into decay is very simple. decay=-0.1×D. That's it. When D=1, decay=-0.1, the curve is flat. When D=10, decay=-1.0, the curve is steep.
  2. The modified minimize function should look like this:
    res = minimize(
        loss,
        x0=[init_s0, init_d0],
        bounds=((S_MIN, INIT_S_MAX), (1, 10)),
        options={"maxiter": int(sum(count))},
    )
    Now it will fit both S0 and D0 rather than just S0. So now we have a way to estimate D0 (kind of) directly from the data.
  3. In pretrain's loss you can do this:
    def loss(params):
        stability, difficulty = params[0], params[1]
        y_pred = self.forgetting_curve(delta_t, stability, difficulty)
  4. In the forgetting curve itself you can do this:
    def forgetting_curve(self, t, s, d):
        DECAY = -0.1 * d
        FACTOR = 0.9 ** (1 / DECAY) - 1
        return (1 + FACTOR * t / s) ** DECAY
  5. Then you remove D from the formulas of S and pass D into the forgetting curve instead.

So now we have a flexible curve that can adapt to difficult material. Now we aren't just adapting S for difficult material, we are adapting the curve itself.
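
A standalone sketch of that D-dependent curve, to sanity-check its shape. It just copies the formula from step 4 into a free function (no `self`) and is not code from the optimizer. The values of t, S and D below are made up.

```python
def forgetting_curve(t: float, s: float, d: float) -> float:
    """Power forgetting curve whose decay depends on difficulty.

    Assumptions taken from the proposal above: decay = -0.1 * D with D in
    [1, 10], and FACTOR chosen so that R = 0.9 when t == S.
    """
    decay = -0.1 * d
    factor = 0.9 ** (1 / decay) - 1
    return (1 + factor * t / s) ** decay


s = 10.0
for d in (1, 5, 10):
    # R at t = S stays at 0.9 for every D; only the shape around it changes
    print(d, round(forgetting_curve(s, s, d), 6), round(forgetting_curve(2 * s, s, d), 4))
```

Because FACTOR is recomputed from the decay, S keeps its meaning as "the interval at which R = 90%" for any D, so changing D changes the shape of the curve on either side of that point rather than S itself.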

bold terrace
#

And then you tell me I overthink 😄

unique salmon
#

Usually "we are not the same" is said ironically, but I am being 100% serious

#

You were getting into sorata level "energy and force are non-physical" crap

#

Actually, I'm not sure which one makes me want to say "shut up and calculate" more - your "forgotten != wrong" or sorata's "energy and force are non-physical"
Both make me want to say "Look guys, don't do this. Just don't. Stop. For your own good and for everyone else's good, stick to crunching numbers, please."

#

And before you say "But 'forgotten != wrong' actually makes sense because..." - stop. Sit down. Take a deep breath. Do it three times. Now...numbers. Focus on the numbers. Or do your reviews. Or go outside. Or do anything else but this.

bold terrace
#

I was just about to suggest you take a deep breath ahah. Breathe in, breathe out, everything is fine 🙂

#

You're enough @unique salmon ❤️

unique salmon
# ashen light ...same to you?

I'm not the one who says "Force and energy are mystical" or "Forgetting the card and getting the card wrong are different things"

ashen light
#

you need to go outside though

unique salmon
#

Ok, fair 🤣

ashen light
#

I do agree with sound that forgetting something completely and "is it A or B" are two distinct things

bold terrace
#

Oh no

ashen light
#

like doesn't supermemo have like 3 grades of failure?

bold terrace
#

Don't trigger a new chain reaction 🥲

unique salmon
#

...I'm going to sleep

bold terrace
#

Good 😄

ashen light
#

I think anki needs two types of failure buttons: "how did I even get here" and "in a 50/50 I am wrong"

bold terrace
#

Yeaaah but in the end it doesn't matter that much since FSRS will just predict when you were unable to get it right (including both situations)

#

The main point was to describe why sometime "forgetting" in Anki can be brutal

tepid spoke
#

I have so many bloody cards that are stuck in forever-leech-mode cause I consistently lose the 50/50

ashen light
#

me too!

#

hence why I say this is an important distinction!

bold terrace
#

Because it's not really forgetting, it's more like suddenly, you realize you learnt something not the "full way"

tepid spoke
#

99% of the time for me it's the fault of rendaku

#

Or the random lack thereof

bold terrace
#

It's also a danger of doing too many reviews I think, the more reviews you do, the more discriminant features will survive to recognize something, so if you reviewed only a subset of a learning domain, your brain will have reduced the patterns to something that is NOT sufficient

#

Once again, I remembered 駅 because it was "The R shape on the right"

#

Without even looking at the left after a few reviews

ashen light
#

the solution obviously is just to add 20 more cards with that word on it

bold terrace
#

But then 訳 comes and now, I built dozens of reviews "recognizing the R shape"

bold terrace
tepid spoke
#

ironically, just looking at the right side is how most people read :D

#

Cause the right side most of the time denotes the reading, and then you just read the words

bold terrace
#

I also find it very interesting that Japanese people are able to recognize kanji even if the center is blurred

#

It's like in their brain, the outside shape of a kanji is sufficient to recognize them

tepid spoke
#

It's just generic pattern recognition

#

in the end confusing two kanji is also a non-issue, since you rarely ever read a Kanji in Isolation and out of Context

#

And then you get into "recognizing entire words by their shape" territory

bold terrace
#

Yeah sometimes it can be nasty things though

#

社会 vs 会社

#

If your brain learnt one by remembering the association of both

#

when the second comes, you're good to re-learn them a bit

#

That's why sometimes adding more cards can help remembering older ones

#

More rooms for connections I guess

#

And less room for too-simplistic/bad-pattern recognition

tepid spoke
#

I can pretty much just read them to their sounds, and then the words are obvious

bold terrace
#

You see

#

Basically you build another recognition pattern, based on pronunciation/reading/etc

tepid spoke
#

階段 vs. 段階 is a much meaner one imo

bold terrace
#

So learning is sometimes a lot of iteration over the same material

#

(But not simple bruteforced iteration, more like remodeling knowledge all the time, until you stabilize it)

tepid spoke
#

I feel like some of the words I'm grinding right now I'll never truly stabilize

#

since they're so incredibly rare, I might never see them outside of Anki

bold terrace
#

That might be a challenge indeed

ashen light
#

the solution obviously is just to add 20 more cards with that word on it

tepid spoke
#

Like, the latest levels of WaniKani have started teaching historical names, and Kanji that appear only in that single name

bold terrace
#

I know in the first month I had trouble remembering じょうきょう (状況) until ONE TIME, I heard a character say it in an anime, and since then, its voice is associated to that word and the meaning with it

#

Same with とにかく that I always hesitated between "BTW" and "Anyway"

#

Until that character in Violet Evergarden said it 20 times per episode

tepid spoke
#

yeah, having proper connections to the meaning of words is vitally important

#

And Anki can't easily provide that

#

though I do remember words I learned from my Grammar-Deck substantially better than the blank vocabs I learn

#

Cause I learned them in some other context...

bold terrace
tepid spoke
#

I also now have the reverse problem this causes on FSRS

#

Since I obviously know a lot of words by heart by now, those always get a good rating

#

But since those get optimized together with the random other ones, those get pushed away too quickly now

bold terrace
#

aaaah indeed

#

yeah

#

Basically, that's why, even if it might sound a bit philosophical, even if FSRS had a perfect prediction function tomorrow ... it wouldn't really change much about Anki itself

tepid spoke
#

So I can either just accept that I will never properly learn those words cause of that, or make my parameters harder and be shown the easy ones way too often

#

Currently opting for the last one, since when I just let it optimize as it pleased, it was actually harmful

bold terrace
#

There's a YouTuber who always insists a lot on how SRS can be super tempting, but in the long term falls short against more mind-mapping/contextual learning of concepts

#
tepid spoke
#

SRS is just a helpful tool

bold terrace
#

Justin Sung

tepid spoke
#

But to actually acquire a language, it's just not enough on its own

bold terrace
#

Yup

tepid spoke
#

Though tbf, I did learn japanese entirely in Anki to a degree that I could watch native content

#

But that is very much just shoehorned into Anki, and kinda abuses the SRS system a bit

#

as indicated by this

bold terrace
#

For my vocab deck though

#

This is the whole jazz

#

But problem is, sure I know a few words, but how sentences are built, meanings, nuances, I just don't learn them well with Anki

tepid spoke
#

Well, you can

#

But it doesn't exactly fit into an SRS system :D

bold terrace
#

Those past weeks I've been dedicating 30-60min per day really pausing subs, analyzing, and my overall understanding really improved

tepid spoke
#

That's why my stats are so ridiculous on that deck

bold terrace
#

Sure you can !

#

But I think then you're basically trying to recreate the outside world in Anki

tepid spoke
#

The JLab deck really works well

bold terrace
#

Generating sentences, Generating Audio, randomizing cloze ...

tepid spoke
#

but you absolutely must stay on default parameters when using it with FSRS