#FSRS Megathread
They use the same underlying code anyway
CMRR is just less realistic for...no reason
Instead of using its own "spherical in vacuum" config, CMRR should copy the simulator config
Do you think it might be worth moving it to the simulator modal just so people can mess around with the settings for CMRR?
Yeah
Ok I suggested that before i thought about where to put it 😂. Simulator modal is going to get even more crowded if we're not careful.
Just add a single "Compute Minimum Recommended Retention" button
If it copies the configuration of the simulator, then nothing else is needed
Do you think we should leave the old one where it is as well to avoid confusing people?
No, then we would have two buttons. Now THAT would be confusing
Just move it where the simulator is
@cosmic hedge, do you think it would be possible to have a graph with Average Stability over Time? Right now we have the "Review Interval Time Machine", but it's not very practical for seeing whether the trend goes left or right, and the time machine doesn't seem to affect the default graph Anki has
For example here I'm at average stability 1.23
I think last week I was around 1.05 or something
Average and median would be even better 🥲 But I don't want to ask too much

I actually wanted estimated total knowledge over time, but seems like it won't be implemented natively
Only in an add-on
Ah yes indeed in the addon it's quite nice
Sometimes I feel the default/native graphs are a bit "Let's give you some graph" without really much thought about what kind of interpretation you can make
It's nice to have graphs that "answer questions"
If AVG stability over time is not possible (I don't think it's stored in the revlog, so it would have to be recomputed each time), there's always the option of doing AVG Interval over time, but that would not really be ideal since DR fluctuation could change it while having the same stability
On the topic of graphs, I found something interesting/weird when I make the bins really small on my SxR heatmap
I've not taken any days off since starting this deck, so it's not just a block of days where I did no reviews
I'm not entirely sure why days off would impact this? Stability is something that won't change until you review the card again, so gaps in stability might just mean that, based on your params, some stability values aren't really achievable.
For example, there aren't many sequences that would produce those stabilities with the default params:
If some stabilities are just not possible wouldn't it be a straight horizontal line on my chart?
I mentioned days off because you can see it has a shape similar to how cards decay each day.
Indeed
Maybe it was to do with a param change, or I had a period of struggling with cards so I never got new cards with those stabilities for a bit. It's a bit mysterious.
I don't know if it has any link, but I see something strange in my Stability in Card info:
<= 30d, I see the 30 days
1.03days, I see Month (days)
30 1.03 days, I see Month WITHOUT days aside
But it's probably just graphical
Like 1.03 is rounded to 30d, so the UI thinks there's no point showing (30d)
#1282005522513530952 message should be easy enough
And it doesn't really explain why your hole also goes up with higher stability for higher R
well "easy enough" as in i plan to XD
well yes but the memorised graph already does that
ah ok 🙂
Would be super nice, I'm really wondering how my stability evolves with time
IMO, leave a small-sized (grayed out?) text like "CMRR has been moved to simulator"
That would be annoying and confusing for new users
That would be removed later ofc
let's just have it for one version
for smooth transition
I made a new simple baseline for algorithms that can adapt on the spot immediately after each review
its similar to an exponential weighted average except i use backpropagation on the log loss
i hope this shows why RWKV-P could achieve such strong results
You're gonna need to explain it for mere mortals 😅
its just that the other algorithms in the benchmark are missing out on a ton of information
look at AVG, it optimizes on the same 5-way split as FSRS
FSRS is so much stronger than AVG bc it uses an actual memory model
now if MOVING-AVG already surpasses FSRS, what happens when we add a memory model on top of it? that's what RWKV-P represents
That still doesn't explain the difference in optimization and testing and whatnot
Like, I still don't get why the moving average is better
tbh its the other way around, you'd have to explain to the general audience what this 5-way split is and why we use it since the more natural benchmark is an n-way split
it's somewhat fake performance, occasionally certain users may decide that they're done for the day and just pass the remaining cards
...but what is n?
the size of the revlog
exactly, i've said that
Just say "Moving average optimizes after every review"
bro has mastered talking in memes
But then isn't it closer to FSRS with recency, just that you're really super aggressive on recency?
I mean, then it does not really show that it's better than FSRS, just that FSRS should maybe be more aggressive with how recency is weighted
Not quite
FSRS-5 recency doesn't optimize after every review, it just weighs reviews by recency. It's still "optimize on all reviews once". Alex's moving average optimizes after every single review
We could benchmark optimizing FSRS after every single review, but it would take an eternity
Unless we cut down the dataset by a factor of 100 or something
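To make "optimizes after every review" concrete, here is a minimal sketch of an online predictor with no memory model: a single logit that takes one gradient step on the log loss after each review. This is only an illustration of the idea described above, not the MOVING-AVG code from srs-benchmark; the class name, learning rate and starting probability are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class OnlineLogLossAverage:
    """Hypothetical baseline: one logit, updated after every single review."""

    def __init__(self, lr: float = 0.05, init_p: float = 0.9):
        self.lr = lr
        self.logit = math.log(init_p / (1.0 - init_p))

    def predict(self) -> float:
        return sigmoid(self.logit)

    def update(self, recalled: int) -> None:
        # For a Bernoulli outcome y, d(log loss)/d(logit) = p - y.
        self.logit -= self.lr * (self.predict() - recalled)


def run(outcomes, lr: float = 0.05):
    """Predict each review before seeing it, then learn from it."""
    model = OnlineLogLossAverage(lr=lr)
    total_loss = 0.0
    for y in outcomes:
        p = model.predict()  # prediction made immediately before the review
        total_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        model.update(y)      # one gradient step per review
    return total_loss / len(outcomes), model.predict()
```

Unlike the 5-way split, every prediction here is made with a model that has already seen all earlier reviews, which is the "n-way split" mentioned above.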
I mean I hit the optimize button every day and it didn't change for the past 60 days
If every optimization changes the params, that's a bit strange, no? Or is it much more aggressive recency?
Sometimes new RMSE is worse than before, in which case the old parameters are kept
But in this new way of doing things, you would still select this one even if the new RMSE is higher?
@polar maple
But a question for you too: what explains getting a worse RMSE after an optimization? Local minimum vs. global minimum?
The optimizer being stochastic, I guess
a problem is just how FSRS is benchmarked. It only optimizes 5 times per user rather than after every review (due to computational constraints, since we have 10k users to benchmark), whereas my moving average is not limited by such computational constraints, so i might as well let it optimize after each and every review
another thing is that the moving average does not try to predict the outcome of reviews ahead of time which would be important for scheduling purposes, it only tries to predict the outcome of a review immediately before it happens
FSRS mostly does do ahead of time predictions in the benchmark but due to how it was implemented, it can update its predictions 5 times
But then is it even comparable ? I mean, if FSRS could achieve better result in the simulation with more optimization being done, we're not really comparing the precision of prediction/scheduling with the same constraints ?
Also, for an end user like me, even when I trigger a re-optimization every day, my parameters have stayed the same for the past 2 months. It's in those situations that it would be interesting to see whether the moving avg really performs better, no? Then you would really be comparing the performance of both algos on a comparable basis
Otherwise it would mean we're trying to create the best algorithm for the simulator constraints, and not really for people that will actually use it.
that's right, i'd like to see either a smaller dataset for the benchmark where FSRS can be optimized more often or a version of FSRS that can update more often on the fly
many users already don't optimize often or at all, it is not necessarily unrepresentative
a possible way to optimize FSRS on the spot is to do something like a gradient step over the last 50 reviews
this way it's efficient enough to be done after every review
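A rough sketch of the "gradient step over the last 50 reviews" idea. To keep it short, this toy has a single trainable parameter (the log of one shared stability) instead of the full FSRS parameter vector, and it uses a finite-difference gradient rather than backpropagation; both are simplifications for illustration. The forgetting curve is the FSRS-4.5/FSRS-5 power curve; `WindowedStepper` and its defaults are made-up names.

```python
import math
from collections import deque

DECAY = -0.5
FACTOR = 19 / 81  # makes R(t = S) equal 0.9


def retrievability(t: float, s: float) -> float:
    return (1.0 + FACTOR * t / s) ** DECAY


def window_log_loss(log_s: float, window) -> float:
    """Mean log loss over the (elapsed_days, recalled) pairs in the window."""
    s = math.exp(log_s)
    total = 0.0
    for t, y in window:
        r = min(max(retrievability(t, s), 1e-6), 1 - 1e-6)  # avoid log(0)
        total -= y * math.log(r) + (1 - y) * math.log(1 - r)
    return total / len(window)


class WindowedStepper:
    """Toy model: one gradient step on the last N reviews after each review."""

    def __init__(self, log_s: float = math.log(10.0), lr: float = 0.5,
                 window_size: int = 50):
        self.log_s = log_s
        self.lr = lr
        self.window = deque(maxlen=window_size)  # old reviews fall out

    def observe(self, elapsed_days: float, recalled: int) -> None:
        self.window.append((elapsed_days, recalled))
        eps = 1e-4
        # Central finite difference stands in for a real backward pass.
        grad = (window_log_loss(self.log_s + eps, self.window)
                - window_log_loss(self.log_s - eps, self.window)) / (2 * eps)
        self.log_s -= self.lr * grad
```

Each call to `observe` touches at most 50 reviews, so the cost per review stays constant no matter how long the revlog gets.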
well, hopefully we have auto-optimisation in the future.
but still, people who care about benchmarks and numbers are optimising regularly so it's only fair that we try to see how well that performs (too).
Make a PR plz
still working on RWKV and some other stuff
Auto-optimization was discussed in the past. It is not an option for people who need to tweak the generated parameters after optimization. So if you want to implement it, there should be a toggle (Enable Auto-optimization).
Basically, with auto-optimization always enabled, the user can't preserve tweaked weights: every optimization produces new weights that need to be tweaked again, so you're stuck in a cycle.
Well, Alex has an interesting idea for optimizing parameters, so maybe you won't have issues in the future
I mean, maybe you won't need manual tweaking
actually im not too interested in it anymore
Why?
i'll explain what the idea was here since it was discussed in dms, the idea is to make an RWKV model predict FSRS params on the spot after every review for potentially better params and also for speed (RWKV can run efficiently on a cpu)
but this would still be strictly worse than just letting RWKV do card predictions directly; imo there is no world where RWKV would be introduced into Anki as a way to predict FSRS params, rather than just doing the scheduling itself
now the other part of the idea is to see how far FSRS formulas can be pushed to the limits but we already know the result of this in a certain sense, if you let FSRS look at the answers by optimizing on the test set, you still get a model that does slightly worse than LSTM
and this idea would take a while to implement so its pretty much a waste of time
copying from a previous message and added RWKV :
I wanted to see how expressive FSRS's formulas are, so I decided to train FSRS on the test set that it would be evaluated on (same 5-way split), and I did the same thing for LSTM
Total users: 100
Number of reviews: 2097825
LSTM (cheat): LogLoss (mean±std): 0.3303±0.1598
RWKV (normal): LogLoss mean: 0.3429
LSTM (normal): LogLoss (mean±std): 0.3546±0.1668
FSRS (cheat): LogLoss (mean±std): 0.3550±0.1698
FSRS (normal): LogLoss (mean±std): 0.3743±0.1767
Man...
Oh well
We could still try my teacher-student idea, though
Or you could find a way to make sure that RWKV can do scheduling properly and doesn't do anything weird, like p(recall) increasing as time passes or the interval for Good being longer than the interval for Easy, that kind of stuff
And then we could implement it in Anki instead of FSRS
Jarrett wouldn't be happy though 🤣
RWKV (non-P) predicts monotonically decreasing forgetting curves so the first part is automatically satisfied
the second part could be a layer on top by the scheduler
i'm still hoping to find some simple insights as to where FSRS goes wrong, maybe small changes to FSRS can largely close the gap
We would have to remove S-related stats, I assume? Since it has 3-4 different S values
you can still compute an S as the x where p(x) = 0.9
Would that be meaningful for RWKV, though?
nope
Probably not that much
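For reference, "S as the x where p(x) = 0.9" can be computed for any monotonically decreasing curve with a simple bisection. This is a generic sketch, not tied to RWKV; the function name, the 100-year cutoff and the tolerance are arbitrary choices.

```python
import math
from typing import Callable


def stability_from_curve(p: Callable[[float], float], target: float = 0.9,
                         hi: float = 1.0, max_days: float = 36500.0,
                         tol: float = 1e-6) -> float:
    """Return x (days) with p(x) ~= target, assuming p decreases from p(0) > target."""
    # Grow the bracket until the curve has dropped to the target or below.
    while p(hi) > target:
        hi *= 2.0
        if hi > max_days:
            return math.inf  # the curve never reaches the target in range
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `stability_from_curve(lambda x: math.exp(-x / 20.0))` recovers the 90% point of an exponential curve, whatever model produced it.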
To give some numbers, I would be very surprised if FSRS's RMSE can get below 4%
Maybe if you make some massive changes to D, idk
If we really, and I mean REALLY want every last bit of predictive accuracy, we're going to need a neural net, I'm certain
i forgot to mention, RWKV as trained right now also does same-day review predictions
i only filter those out when finding the stats for the benchmark
I wonder what Jarrett's and other people's reactions would be like if we announced "Oh, yeah, remember FSRS? We're not going to be using that anymore, we're going full black-box neural net now"
https://expertium.github.io/Algorithm.html
Btw, have you read either my article or the FSRS wiki's "Algorithm" entry?
personally i wouldn't mind a nn, but people want to customize all sorts of things and it would be impossible to customize RWKV beyond setting a desired retention
Desired Retention Is All You Need
(unironically)
i never read the specifics of FSRS-5, i did read Jarrett's paper that i assume is similar to FSRS though. I should probably finally learn what FSRS does lol
since RWKV has a similar RMSE to LSTM but a much better log loss, maybe the largest difference is just the shape of the forgetting curve. GRU-P was forced to use the same forgetting curve shape as FSRS
actually one of the first experiments i did on srs-benchmark was to make GRU-P learn the decay exponent and as expected it gave much better results than a fixed -0.5
You mean standard GRU? GRU-P doesn't have a fixed curve
yeah standard GRU, oops
ok nvm GRU-P doesn't use a forgetting curve so my whole point there is bogus, forget it
I'm not sure how you intend to use RWKV to improve FSRS. I mean, you can see that it's more accurate, and you can even look at specific cases where difference is especially large, but that doesn't tell you what a good formula should look like
the goal is to find systematic errors in FSRS. RMSE is actually a decent metric of this
RMSE (bins) only measures the bias of each bin, it doesn't really care about the specifics of individual predictions
information is lost when taking averages
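A small sketch of why a binned RMSE only sees per-bin bias. The benchmark's actual binning rule is not reproduced here; in this illustration the caller supplies an arbitrary bin key per review, and `rmse_bins` is a made-up name.

```python
import math
from collections import defaultdict


def rmse_bins(predictions, outcomes, bin_keys) -> float:
    """Weighted RMSE between mean prediction and mean outcome within each bin."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_pred, sum_outcome, count]
    for p, y, key in zip(predictions, outcomes, bin_keys):
        acc = sums[key]
        acc[0] += p
        acc[1] += y
        acc[2] += 1
    total = sum(acc[2] for acc in sums.values())
    weighted_sq = sum(
        acc[2] * (acc[0] / acc[2] - acc[1] / acc[2]) ** 2
        for acc in sums.values()
    )
    return math.sqrt(weighted_sq / total)
```

With a single bin, predicting 0.5 for both a pass and a fail scores exactly as well as predicting 1.0 and 0.0: the bin averages match in both cases, so the metric cannot tell an uninformative model from a perfect one. Log loss does distinguish them.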
Still, it doesn't tell you what a new formula for FSRS looks like
How do you go from "For these reviews RMSE is particularly big" to "This is the new formula for FSRS"?
i dont understand why you wonder that, this is just a normal part of data analysis, we just need to fit some curves
jarrett this whole time makes many plots of random things
i've made LSTM vs FSRS curve plots
make plot, make prediction, test it
Is it though? A NN might learn something more complex and schedule things at seemingly random R, but be more effective than a static or monotonically increasing DR.
For all we know, there may even be windows in which it is better to review. If you miss a window, it might be better to wait for another one than just do backlog as soon as possible.
We can add a "Dynamic desired retention" toggle, so that the user can switch manual control on/off
I meant that I can't think of anything that is both unrelated to desired retention and equally important
That's not possible with a monotonic forgetting curve
And I will sooner eat chalk than believe in a non-monotonic forgetting curve
That's what I mean. We make that assumption, but is it strictly true? If I don't do my reviews in the morning when I am awake, is it better to do them tired or wait until I am fresh the next day?
It could be all wiggly, not a nice slope.
That's a different question though, since now you're adding another variable - how tired you are
It's just really hard to capture the "true retrievability function"
Anyway, if we can make a NN that doesn't do anything weird with intervals, I'm on board with replacing FSRS, as long as no functionality is lost (minus D and S graphs, but whatever, that's not important)
I'm more worried about things like interval(Hard) < interval(Again) or the next interval being x100 times longer than the last one
Well, tbf, both are solvable
Just add some extra scheduling rules on top of the NN
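A minimal sketch of what such extra rules could look like: cap how much an interval may grow relative to the last one, and force Again <= Hard <= Good <= Easy. The function name, the growth cap of 10x and the 1-day floor are assumptions for illustration, not anything Anki does today.

```python
BUTTONS = ("again", "hard", "good", "easy")


def constrain_intervals(raw: dict, last_interval: float,
                        max_growth: float = 10.0,
                        min_interval: float = 1.0) -> dict:
    """Post-process raw model intervals (days) so they are ordered and bounded."""
    cap = max(min_interval, last_interval * max_growth)
    fixed = {}
    floor = min_interval
    for button in BUTTONS:
        # Never below the previous button's interval, never above the cap.
        value = min(max(raw[button], floor), cap)
        fixed[button] = value
        floor = value
    return fixed
```

For example, raw outputs of 5 / 2 / 3000 / 40 days after a 10-day interval come out as 5 / 5 / 100 / 100: Hard is lifted to Again's value, Good is capped at 10x the last interval, and Easy is lifted to match Good.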
The more I think about it, the more I think it's actually very desirable
- We can make R more accurate
- We won't have to show parameters, which means one less thing for users to worry about
- We can support proper same-day scheduling instead of the current mess
- We can throw in new input variables, like time of the day, workload, etc. Not just interval lengths and grades
- We can remove "Optimize", which means even less stuff for users to worry about
That's a whopping bonanza of advantages
I could see a situation where that might be desirable. Sometimes I accidentally reinforce an incorrect memory. You might want a quite long Again interval to deliberately forget a bit and make the memory more malleable.
It would be confusing, since Again is supposed to be "Fail" and Hard is supposed to be "Pass", so Again > Hard would be weird
Everyone is already used to Again < Hard < Good < Easy, and it makes sense intuitively
Sometimes intuition is wrong 🤷‍♂️
Sorry if that sounds mean, but no, it just means that the way you use answer buttons makes no sense
And you need to change your rating habits
I'm not saying I'm doing any weird manual adjustments to my rating right now. I'm just saying in some circumstances it may actually be more effective in terms of learning/study time to give a bigger interval to something you forgot, than something you remembered but struggled with.
I could also be completely wrong. I'm just saying I would not completely discard it as a possibility.
Even if it's better in some fringe cases, I still doubt that the gains from it are worth making buttons unintuitive
Anyway @polar maple thoughts on this?
I mean if we are being radical you could argue that an ideal scheduling algorithm might not statically schedule a card at the time you review it because its retrievability could be affected by cards you review later.
- we could hide it right now if you want
- short term scheduling is still difficult, i don't think desired retention is the right metric for short term scheduling
- auto optimize is not even allowed for FSRS even if you could easily do something like the mini gradient step idea
yes, RWKV-P uses the entire sequence as input
RWKV uses the entire sequence as well but it cannot reschedule cards. i can work on such rescheduling later
but RWKV-P represents the limit of what's possible when you use most available information
- I meant that since RWKV would be pre-trained on the 10k dataset, there would be no need to optimize it
RWKV right now maintains a hidden state at the card level, the siblings level, the deck level, the preset level, and the global level. the way it updates these is kind of equivalent to if we had auto-optimize in FSRS and I think this was unwanted due to syncing issues
b-w matrix should be under the memorised graph now if you update SSE
hope i didn't screw it up somehow 🙏
Where exactly is it meant to be? I might be dumb, but I cannot see it.
SSE v1.10.0 from source. I've tried both Anki 25.02 and Anki git main
Under a dropdown under the memorised graph
On Anki git main?
Nope in search stats extended
I mean Anki git main + SSE v1.10.0
Go to the memorised graph and it should be under there.
After you've run it
In search stats extended, not the simulator. The past memorised.
Ok, it's there. I was just confused when you said simulator earlier because I am dumb, and no other reason. ;p
I've edited it now no one has to know 🤫
@cosmic hedge the curse continues. The Y axis is really blurry for me on the B-W matrix. The Y axis is still fine for me on the SR Heatmap though.
Maybe it's something to do with the viewBox starting at x=-40 instead of 0?
Nope that works fine for the other graphs i think
If thats the problem I'd rather not reprogram it 🥲
Google didn't work, chatgpt didnt work. I'm stumped.
Intriguing. Try setting opacity="1" on the <g> for that axis.
It miraculously unblurs for me.
???????
I guess I'll just make those axis not translucent then 😂
The only sensible thing I can think of is somewhere there is a globally scoped bit of CSS with filter: blur() that matched your axis. I have no idea where it would be or why though.
That or just a really weird bug in chromium.
nope no filter: blur()
The raw (opacity="0.5") SVG renders just fine in Firefox, so it is something specific to Anki. It's just really mysterious.
It's great because it's also open-source.😎 Then I can try to use it in my work.
What's RWKV-P?
Does it consider all reviews of the collection when it predicts the P(recall) of the next review?
if you iterate over the revlog, RWKV-P predicts the outcome of the row that we are looking at and it has as input all the previous rows
btw i have uploaded MOVING-AVG here
https://github.com/1DWalker/srs-benchmark/blob/705bb5084a12f402722c14ea1d02b07d1ce135cf/other.py#L2646
Is it possible to draw the forgetting curve for a given card with RWKV-P?
nope. i have a version of RWKV that does curve prediction but i haven't trained it to adapt its prediction over time; in theory I could've made RWKV-P predict a curve but it would still be equivalent to just predicting a P directly since it knows delta_t.
If I want to train a model that can update its curve predictions in a reasonable manner, I would likely have to sample a random point between the last review and the current review as a point to update the prediction. if we represent this time interval as [0, 1] then RWKV is directly at 0 and RWKV-P is at 1. So RWKV-P not predicting curves is not necessarily important, it just says what kind of performance is possible at the right endpoint of the interval
also another reasonable time is to make RWKV predict a curve at the beginning of the day, before the reviews start, to lessen the impact of a user's mental state affecting a day's reviews
Fine. What about the simulator? Could we use RWKV-P as the simulation environment?
probably not well. the RWKV models rn use global-level information, akin to if we optimize FSRS params after every review. The simulation does not support this sort of thing and it would be very out of distribution if we try to mock it, i think
but we can always go back to the LSTM nn for the simulation environment
LSTM works at a per-card level so it would work well
i could also train a RWKV model that locks in its internal parameters and then works as a per-card model afterwards
the only thing LSTM makes difficult right now is that it uses the duration of review as an input. I could just remove that feature and then we would have something that would work for the simulator right away
Seems like the RWKV-P could capture the impact of interactions among cards.
Because it uses the entire sequence as input.
If we change the order of previous reviews of other cards before the next review, RWKV-P will give a different prediction, right?
yes
this was the goal of such a model, to use as much information as possible
the only information that I reject is the parent_id of the deck
rwkv uses note, deck, presets
sure, do you have any code for it that you used for fsrs, please link it
thanks
Maybe a dumb question, but since a neural network has a finite number of weights, does that mean that with too many cards in the revlog, the model would get worse at predicting new rows from the previous ones, if the number of previous rows is so huge that the weights can't map everything?
Personally I find the approach very nice, if it's doable on every user's computer... I mean, each Anki card having its own "little bubble of stability/difficulty/retrievability" without considering the others always felt like a limitation of all current schedulers. Having different cards potentially being dynamically linked is very, very nice
Based
Hi guys, I'm an FSRS noob. I have about ~1000 reviews and I've literally just activated FSRS and optimised it. I've set my learning steps to '10m 15m.'
So, with that context, my question is as follows:
I used to have 'easy decks.' These are collations of information I find easy to recall (to the point where I'm pressing easy for pretty much every card). I use Anki in preparation for my final exams in October so I make these decks just to ensure I remember all this content by the time that rolls around.
With the old SM-2, I simply just increased the graduating and easy intervals.
So, as you've probably guessed, my question is how can I do this on FSRS.
Side-note: my RMSE seems to be a 5.62%. Is this an issue? The guide says a lower value is better, and showcases a 2.03%.
Thanks in advance
If you want to increase/decrease interval lengths, adjust desired retention. Higher DR = shorter intervals
Also, I recommend reading the manual: https://docs.ankiweb.net/deck-options.html#fsrs
Don't worry about RMSE btw
So yeah, desired retention is the "lever" that you pull to steer FSRS
DR should be according to how well you want to recall the card, right? I still want to recall these easy decks well, so will changing it lower affect anything other than the interval length?
ahhh nevermind, i get it now
thanks for your help!
You'll be able to recall fewer cards when they are due
Desired retention is like "How many cards do you want to be able to recall when Anki shows them to you?"
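To see the "lever" in numbers, here is a small sketch that inverts the forgetting curve to get an interval from stability and desired retention. It assumes the FSRS-4.5/FSRS-5 power curve (decay -0.5, factor 19/81) and ignores everything Anki adds on top, such as fuzz, rounding and minimum/maximum intervals.

```python
DECAY = -0.5
FACTOR = 19 / 81  # makes the interval equal to S at 90% desired retention


def next_interval(stability: float, desired_retention: float) -> float:
    """Days until predicted recall probability drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)
```

For a card with a stability of 10 days, this gives 10 days at DR = 0.90, roughly 24 days at DR = 0.80 and roughly 4.6 days at DR = 0.95. The card's memory state is the same in all three cases; only the moment Anki shows it to you changes.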
yea lol it seems so obvious now
yeah it's possible, just like how LLMs don't do well with very long contexts. But Anki reviews have nowhere near the complexity of human language so it might not apply here. Instead i guess that RWKV just aggregates statistics and having more and more reviews would only benefit more
but for random interactions like "on weekend X the user learnt some new cards that were particularly difficult", RWKV would struggle on this more
Still working on it, but plotting Average Stability over Time gives quite good insights ... For those past 90 days, I stopped adding words, and I increased my DR bit by bit, so my review count doesn't drop that much, but I still wanted to see if my stability was increasing or not.
It's also insightful because it shows that even if your R is not dropping much and your Memorised curve is growing quickly, adding more and more words still tends to lower the card stability (and maybe not just because of new words, but also because older cards are being replaced in memory)
Ofc, If R is similar, and Work Load is bigger, then you can infer that stability is probably dropping, but it's a bit more direct like this
Maybe an "only mature cards" or "only cards first reviewed before ..." option would be interesting if you wanted to see how new cards affect overall stability.
That's a nice idea indeed. Having each vertical bar with a lighter and darker green if the stability contributions comes from mature or young
I need to refine the graph a bit too. Right now I use the "buckets" of stability, so if a card has stability 1.2, it counts as a "1". It doesn't cause too many issues, because the default Anki plugin gives me an average of 1.27 months and mine gives 36.5 days (so I guess 1.27 months = 30*1.27 = 38.1d)
Ideally I'd like also to add the median because it can be quite different, I have a median stability of 20d while my average is 38d
Very easy cards bend the avg ...
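A tiny sketch of the two effects mentioned above: a few very stable cards pull the mean far above the median, and flooring each stability into whole-day buckets biases the mean slightly downwards. The function name and the sample numbers are made up for illustration.

```python
import math
from statistics import mean, median


def stability_summary(stabilities):
    """Compare mean, median and the mean of floor-bucketed stabilities (days)."""
    return {
        "mean": mean(stabilities),
        "median": median(stabilities),
        "bucketed_mean": mean([math.floor(s) for s in stabilities]),
    }


summary = stability_summary([1.2, 3.9, 20.0, 25.5, 400.0])
```

Here the single 400-day card drags the mean to about 90 days while the median stays at 20, and the bucketed mean lands a little below the exact mean because every card loses its fractional part.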
(Those are the default graph right now, which is nice but doesn't give you a sense of evolution)
@bold terrace BTW Luc merged my SR Heatmap and released an update to SSE yesterday if you hadn't noticed.
I advise Log(S) otherwise all the small S cards get clumped together.
I saw it indeed 🙂 Wanted to share earlier a screenshot of mine just to show the same kind of holes appearing at some places 🙂
Allowed me to detect that I have a huge bunch of reviews waiting for me in May-June 🙂 Those ~80S 94-98% S
@quasi shadow
And i also made these where i aggregate users 1 to 500
I participate in a new repo today: https://github.com/asukaminato0721/visual_novel_recommendation_engine
😎
Wow, RWKV-P calibrates perfectly.
I wonder why the MOVING-AVG performs so well even if it doesn't have a model about memory.
same, but i expect it to do well from a calibration standpoint since i thought it would only predict a narrow range of values and it would exploit the naive RMSE pre-bins formula
but unexpectedly it predicts a wide range of probabilities like the other algorithms
so its not cheating in that way
For those who want to already test the Card Stability over Time, I have a PR and a local build
PR : https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/32
local build : (attachment)
I'd like to do the median, color-code based on how the avg is coming from young/mature later on, but I won't have time until probably next week-end
@quasi shadow bad news for SSP-MMC, if you give fixed DR the ability to make the last interval shorter to minimize effort to reach the arbitrary "3 years stability is treated as infinite stability" rule, then the gap is closed
is this with RWKV?
no this is with the FSRS memory model
but if SSP-MMC-FSRS cannot do better than a fixed DR then theres no point in adding a different memory model
What if we count the number of cards which reach the target stability at the end of simulation?
I guess SSP-MMC is still the best.
Btw, I already know that the optimization goal of SSP-MMC is not equivalent to maximizing the retention at the end of the simulation.
for this version i set memorized_cnt_per_day[today] = (card_table[col["stability"]] > s_max).sum() which i think should be counting what you want
Weird. Maybe the eps is not small enough to find the optimal policy?
i think if for the cost you only included cards that reached the target then SSP-MMC would do better
otherwise, maybe SSP-MMC is keeping many cards at low R and these cards dont reach the target
did i write the right expression to only include the costs for cards that reached the right target?
```
reached_target = card_table[col["stability"]] > s_max
memorized_cnt_per_day[today] = reached_target.sum()
cost_per_day[today] = card_table[col["cost"]][reached_target & (true_review | true_learn)].sum()
```
there has to be a mistake, otherwise the fixed IVL doesn't make sense
i see, i think i was just looking at the cost for the last review that brings the card to the target stability
cost_per_day[today] = card_table[col["cost"]][reached_target & (true_review | true_learn)].sum()
Yeah, this line has some problems.
We need to add a new col to the card table to record the total cost per card.
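A rough sketch of the "total cost per card" bookkeeping, using plain Python dicts as a stand-in for the simulator's card table. The event format and the function name are assumptions; the real simulator tracks this in array columns.

```python
def simulate_costs(daily_events, s_max: float):
    """daily_events: one list per day of (card_id, cost, stability_after) tuples.

    Returns (number of cards above s_max at the end,
             total cost spent on those cards over the whole simulation).
    """
    total_cost = {}  # running cost per card, across all of its reviews
    stability = {}   # latest stability per card
    for day in daily_events:
        for card_id, cost, stability_after in day:
            total_cost[card_id] = total_cost.get(card_id, 0.0) + cost
            stability[card_id] = stability_after
    reached = [c for c, s in stability.items() if s > s_max]
    return len(reached), sum(total_cost[c] for c in reached)
```

The difference from the one-line version is that the cost of every earlier review of a card is counted, not just the review that finally pushed it past the target.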
Btw, Expertium mentioned you here: https://github.com/open-spaced-repetition/fsrs-optimizer/pull/166
also "knowledge per minute" is slightly inaccurate since its also multiplied by the number of days in the simulation
it makes no sense that you would learn 863 items per minute
hey guys I just had a cursed idea: if one of the blockers for fsrs auto-optimize is people who have custom modifications to their params, can we just add a filter hook for fsrs param optimization so an addon can do the modification for them?
A hook could be useful for automated twiddling.
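For illustration, a filter hook here could be as simple as a list of callables that each receive the freshly optimized parameters and return a possibly modified copy. This is a hypothetical sketch: `ParamFilterHook` and `floor_last_two` are invented names, and this is not Anki's actual hooks API.

```python
from typing import Callable, List

ParamFilter = Callable[[List[float]], List[float]]


class ParamFilterHook:
    """Hypothetical filter hook: add-ons register functions that adjust params."""

    def __init__(self) -> None:
        self._filters: List[ParamFilter] = []

    def append(self, fn: ParamFilter) -> None:
        self._filters.append(fn)

    def __call__(self, params: List[float]) -> List[float]:
        for fn in self._filters:
            params = fn(list(params))  # each filter gets its own copy
        return params


def floor_last_two(params: List[float], minimum: float = 0.01) -> List[float]:
    """Example filter: keep the last two parameters from collapsing to zero."""
    return params[:-2] + [max(p, minimum) for p in params[-2:]]
```

An add-on would call `hook.append(floor_last_two)` once, and the optimizer would pass its result through `hook(params)` before saving, so manual tweaks survive every automatic run.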
I was under the impression that Dae's main worry was about there possibly being lots of sync conflicts caused by concurrent optimisation on different devices.
Clobbering hand crafted params can be simply prevented with an Auto/Manual toggle.
We will have native "Grade Now".
jarrett is the best thing that happened to us recently
I have replied just now.
It is a horrible idea. C'mon, Just add the Toggle Switch that enables Auto-optimization and grays out the weight input field (makes it inactive), so that user can not change values in it, but still can copy the weights.
Also, as rossgb mentioned, the main concern with Auto-optimization is sync conflicts.
Auto-optimization will not make the scheduler any better, it is copium.
I was previously for auto optimisation but now since the 0 problem at params w18 w19 has not yet been resolved, I am reluctant. I don't want it to go back to having zeros there. This is still an issue with just some makeshift bandaid put on it.
I'm guessing someone might have solved this for you in the meantime. but I'm pretty sure you might just be looking at your RMSE instead of your log loss (what FSRS optimizes for)
with the defaults your log loss goes up both times
I'm not very well acquainted with your problem sorry 🤷‍♂️
Yeah, we need something more like total knowledge divided by average time per card, not per day
opinions on "true retention" versus "retention rate"?
I think true retention is a horrible name
it's just some historical relic, don't see any reason for naming the stat as such
- the name is more confusing in some other languages which don't have the habit of adding weird adjectives to nouns to make a cool terminology
@quasi shadow I believe you said you will look into setting the last two params to some small values instead of zeros
So does it change the interval? Because if it does, the UI should reflect it
Yeah, I will test my idea this week.
It has the same effect as a normal review.
The UI doesn't show intervals
You can see the interval in the browser after you grade it. And it’s very easy to undo this action.
And it’s impossible to show the interval if you grade a bunch of cards.
Can you explain how it interacts with scheduler/load_balancer
Here it creates a new state: https://github.com/ankitects/anki/blob/eb1ed140223aca5fec34a2b6b821a9a93a5bf30c/rslib/src/scheduler/reviews.rs#L150
(let states = col.get_scheduling_states(card_id)?;) which contains all scheduler/load_balancer logic in the next_states methods. Then that new state goes to new_state: Some(new_state.into()), with new_state selected by grade rating and with new interval
Then this new state goes to revlog_partial:
let revlog_partial = updater.apply_study_state(current_state, answer.new_state)?;
self.add_partial_revlog(revlog_partial, usn, answer)?;
But where does this new state/interval become the new due for the card?
In the fn maybe_requeue_learning_card the entry is created with card.due
let entry = LearningQueueEntry {
    due: TimestampSecs(card.due as i64),
    id: card.id,
    mtime: card.mtime,
};
But when exactly does the new state/interval get to card.due?
It’s hard to explain. The way I understand the code is by inserting a lot of prints everywhere, doing a review and checking the log.
Ah, it is likely to be in the fn apply_review_state: https://github.com/ankitects/anki/blob/63c2a09ef6760890c03be4bd83f613c03c512d1f/rslib/src/scheduler/answering/review.rs#L12
fn apply_<insert_state_here>_state, to be precise
Same
Just debug by inserting print() everywhere 🤣
Classic 😄
I don't have access to my development machine at the moment, so I am reviewing the code directly on GitHub 💀
I was checking the PR for disabled load_balancer support. It seems to be ok, because scheduling logic happens in the (let states = col.get_scheduling_states(card_id)?;)
And after that new entry is just saved in the queue
I have an idea. What if we use the p of moving average and the forgetting curve to calculate the “stability”?
I wonder how the stability of a given card predicted by moving average changes over reviews.
Does it follow an intuitive memory pattern?
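A minimal sketch of that idea, assuming the FSRS-5 forgetting curve R = (1 + FACTOR * t / S) ^ DECAY with DECAY = -0.5 and FACTOR = 19/81 (the thread does not say which curve to use, so this choice is an assumption). `implied_stability` is a hypothetical helper name: it takes the moving-average prediction p and the elapsed days t, and solves the curve for S.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R = 0.9 exactly when t == S


def implied_stability(p: float, elapsed_days: float) -> float:
    """Stability S (in days) for which the curve predicts recall p after elapsed_days."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Invert R = (1 + FACTOR * t / S) ** DECAY for S.
    return FACTOR * elapsed_days / (p ** (1 / DECAY) - 1)


# Track how the implied stability moves as the moving average changes.
for p in (0.80, 0.90, 0.95):
    print(p, implied_stability(p, elapsed_days=10))
```

Applied review by review, this would give one implied "stability" per prediction, which could then be plotted against the card's review history.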
I don't think there's anything interesting to find, it really is just looking at the average of the recent reviews. I uploaded the raw file for the first 100 users, you can get a sense of how quickly/slowly the moving average changes to new reviews
https://raw.githubusercontent.com/1DWalker/srs-benchmark/refs/heads/moving-avg/MOVING-AVG_raw.jsonl
Yes this problem is extremely annoying. I thought Optimize should always keep params with better RMSE and not change it to worse
After optimizing (notice the 0 at w18 and 19)
but (forgive me for repeating myself) isn't the log loss the one that matters?
that at least explains to me why the last 2 values would optimize to 0.
Normally there is a check so that if the new RMSE is higher (worse), it won't use the new parameters, if I recall correctly
But maybe it has changed
There was some discussion about log loss being a better measurement
It's a new thing. If the last two parameters result in a situation where, after all the same-day reviews following a lapse, the next interval is longer than before, the last two params are set to zero.
For example: the card had an interval of 10 days -> you forgot the card and pressed Again -> you did a same-day review and pressed Good -> you did a same-day review and pressed Good -> the next interval is 15 days
However, setting to 0 is overkill. There should always exist small non-zero values that don't cause this issue. Hopefully, Jarrett will work on it
@cursive badge @cosmic hedge , I've tried something for the Card Stability over Time with Young/Mature contribution ...
Basically, it's not that the Stability of Young is 5.79 and Mature 30.48 in this example, but it represents the ratio of the young average and the ratio of the mature average to represent the average.
For example : Let say your Young AVG is 10, and Mature AVG is 10. Your Total AVG would still be 10, but since they have both a ratio of 1/1 of contribution, you would have YOUNG Contribution (Ratio) = 5, MATURE Contribution (Ratio) = 5, Stability Total = 10
It's a bit confusing at first but it can be very insightful to see if your stability is driven by young or mature card
But at the same time, I'm not sure how much info it brings, since in general younger cards will have low stability anyway, so unless you have 2x more young than mature cards, young avg stability should not really matter much
Wait, nevermind ... since it's an AVG, the amount of young card won't change anything .......
So I think the Young/Mature split is just useless 
I'm just impressed you managed to pull it off with my crap code-base 😅
RWKV curves tend to drop off quickly at the start. I believe this is from a failure to encode the card in memory properly. RWKV was trained to also predict same-day reviews, so these could also be from needing to anticipate failed re-learning steps, while FSRS just assumes that some relearning steps have already happened
Regarding the asymptote behavior, we know from the aggregate calibration graphs that FSRS does underpredict for low R, so this suggests that more likely than not RWKV is correct here. RWKV has near-perfect calibration
How many params does the RWKV forgetting curve use?
it uses a weighted sum of 4 power curves
def forgetting_curve(self, w, s, d, label_elapsed_seconds):
    return 1e-5 + (1 - 2*1e-5) * torch.sum(w * (1 + torch.max(torch.tensor(1.0), label_elapsed_seconds) / (1e-7 + s)) ** -d, dim=-1)
some of those numbers are for numerical stability
```
w: [0.01882943883538246, 0.10095643252134323, 0.4705328941345215, 0.4096812903881073], s: [2.923821449279785, 77058.578125, 32219614.0, 1296332928.0], d: [0.4262007474899292, 0.010785482823848724, 2.2498745918273926, 0.988442599773407]
w: [0.03417219966650009, 0.13965930044651031, 0.4500780999660492, 0.37609046697616577], s: [1.9724270105361938, 63864.07421875, 28334582.0, 950819136.0], d: [0.3487342596054077, 0.011508260853588581, 1.5801490545272827, 1.6873809099197388]
w: [0.004207565449178219, 0.24725909531116486, 0.34103667736053467, 0.40749669075012207], s: [5.856897830963135, 811459.875, 29277376.0, 7455286272.0], d: [0.36546364426612854, 0.020167982205748558, 3.5514843463897705, 0.28269582986831665]
w: [0.01884218119084835, 0.17774465680122375, 0.41338032484054565, 0.3900328278541565], s: [11.130600929260254, 97529.4140625, 25254084.0, 1843096832.0], d: [0.36693474650382996, 0.01851370558142662, 1.8968570232391357, 1.2613049745559692]
w: [0.015502666123211384, 0.2253720462322235, 0.25992316007614136, 0.4992022216320038], s: [7.5156378746032715, 88570.5625, 16159777.0, 4799276032.0], d: [0.3775652050971985, 0.017523538321256638, 2.102546215057373, 0.41862088441848755]
```
here is a sample of these values, each row corresponds to the first review in a user's revlog that has ~30 days stability. the stabilities are in seconds
so to me it seems that w[0] is the immediate dropoff in the curve and w[3] is the asymptote
w[1] and w[2] control the main shape of the curve
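For anyone who wants to poke at these numbers without PyTorch, here is a plain-Python sketch of the same weighted sum of four power curves. It mirrors the snippet above; the function name `rwkv_curve` and the choice of the first sample row are just for illustration.

```python
# First sample row from the messages above (s is in seconds).
W = [0.01882943883538246, 0.10095643252134323, 0.4705328941345215, 0.4096812903881073]
S = [2.923821449279785, 77058.578125, 32219614.0, 1296332928.0]
D = [0.4262007474899292, 0.010785482823848724, 2.2498745918273926, 0.988442599773407]

DAY = 86_400  # seconds per day


def rwkv_curve(w, s, d, elapsed_seconds):
    """Weighted sum of power curves, with the same numerical-stability constants."""
    t = max(1.0, elapsed_seconds)  # elapsed time is clamped to at least 1 second
    total = sum(wi * (1 + t / (1e-7 + si)) ** -di for wi, si, di in zip(w, s, d))
    return 1e-5 + (1 - 2 * 1e-5) * total


for days in (0, 1, 30, 365):
    r = rwkv_curve(W, S, D, days * DAY)
    print(f"{days:>3} d: R = {r:.4f}")
```

With this row the value at 30 days should land close to 0.9, which is consistent with the row being picked for roughly 30 days of stability. Printing each term separately is an easy way to see which component dominates at which time scale.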
Btw, I added moving average into my metric comparison.
It doesn't perform well in random sampling data.
So, MOVING-AVG really learnt something from the data from real users...?
Otherwise, it cannot calibrate so well as that.
lol i also tried this to see if MOVING-AVG was somehow doing well by definition
i think that somehow user's momentum or mood matters a lot
that, or the underlying scheduler (i assume SM-2) is able to consistently schedule similar cards together
and MOVING-AVG just becomes a calibration step
but i doubt sm-2 is any good so idk
https://pastebin.com/AAgBMbHK
i made this before, i think it is for user 58 for which RWKV/LSTM/FSRS does horribly at 0.65+ log loss but RWKV-P/MOVING-AVG does well at 0.33/0.42
ahead refers to RWKV, it has to predict the outcome of the review ahead of time, right after the previous review of this card. imm refers to RWKV-P which predicts the outcome of the review immediately before it happens
you can see how this user has long strings of 0.0s or 1.0s. At the end of the file you can see the imm column creep up and up. MOVING-AVG would also be able to exploit this behavior. So, the performance of RWKV-P and MOVING-AVG is fake in this sense, i don't really think this kind of knowledge is useful for a scheduler
actually it just depends on if the user is truthful or not. If the user is truthful and there are long strings of 0s or 1s then the scheduler should be able to adapt on the fly
otherwise if the user is not truthful then yeah it is fake performance
some users just want to pass all the remaining cards to get their day over with
And what is soooooo problematic about this behaviour that it was suddenly decided that, when it happens, the last 2 should be set to 0?
I cannot find the issue about it.
Could you open one for that?
I have drafted up a PR: https://github.com/open-spaced-repetition/fsrs-rs/pull/297
I don't think there is an issue specifically about it, just a comment somewhere
I hope you elaborate the current problem in a new issue or comment it below the PR.
Done
Your code is not crappy at all 🙂 And for now it's me that is adding a lot of plain-flat-logic when it should be refactored a bit haha. But I like to keep it that way until I'm happy with the result. I was thinking, maybe the ratio should not be a ratio of the average but a simple ratio of young/mature dividing the avg stability, so you can see the "volume" of reviews potentially impacting how the stability evolves, I'll try that a bit later
I was even thinking earlier about how clean and even predictable the avg stability increase is. I'm wondering if that increase is driven by sheer repetition volume, which would mean more repetition, even though it might not be "optimal" (in a review/time optimal way), might be how you build stability more quickly
Which would then be another justification of why a higher retention than the theoretical optimal one (in terms of knowledge/review) can be a good thing
Because the "Memorised" is just a view of "how much words you can have right at a certain point of time". But it does not take in consideration "for how long you will be able to keep them memorised", where stability is exactly that
So an optimal scheduling, might more often than not, be not only related to how much you memorized, but how high you were able to build stability.
Now the question is, how much to sacrifice one for the other ? The R*S, R*log(S), ....
But avg stability and DR are not even sufficient to really determine this. Indeed, the number of new/day, also impact the rate of Increase and even decrease of avg stability over time
For example, in my case, I was able to recover my "old avg stability" ~29d, when I stopped adding new words after around 30d of stopping adding new cards
Which can be explained partly by the volume of very-low stability cards that had to build over, but still, it's still a long time to recover a stability that was not that crazy in the first place
Isnt there a way to not just make it happen in the first place. Why should there ever be a need for small values instead of 0s. I am reviewing my cards just fine with default values there instead of 0s
I've explained this 3 times already
Basically, it detects whether using same-day reviews to adjust memory stability could result in a situation where your next interval after a lapse is longer than before. For example: 10 days -> you press "Again" -> you have an insane number of re-learning steps -> you do them, S increases -> next interval is 15 days
If the optimizer detects that your re-learning steps and parameters would result in that kind of problem, the optimizer will run for the second time, but the last two parameters will be "frozen", meaning that same-day reviews will have no impact on S
So if something like 10 days -> Again -> (your re-learning steps) -> 15 days can happen, the last two params will be set to 0
For example: the card had an interval of 10 days -> you forgot the card and pressed Again -> you did a same-day review and pressed Good -> you did a same-day review and pressed Good -> the next interval is 15 days
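A rough sketch of the check described above, assuming the FSRS-5 same-day update S' = S * exp(w17 * (G - 3 + w18)) with 0-indexed parameters (the thread counts them as w18/w19). All numbers are made up for illustration, and `inflates_interval` is a hypothetical name, not the optimizer's actual function. Comparing stability directly with the previous interval assumes a desired retention of 90%, where the interval is roughly equal to S.

```python
import math


def same_day_update(s: float, grade: int, w17: float, w18: float) -> float:
    """Assumed FSRS-5 same-day stability update (grade: 1=Again ... 4=Easy)."""
    return s * math.exp(w17 * (grade - 3 + w18))


def inflates_interval(prev_interval, post_lapse_s, grades, w17, w18) -> bool:
    """True if the re-learning steps push stability above the pre-lapse interval."""
    s = post_lapse_s
    for g in grades:
        s = same_day_update(s, g, w17, w18)
    return s > prev_interval


# 10 days -> Again -> Good -> Good, with hypothetical parameter values.
print(inflates_interval(10, 9.0, [3, 3], w17=0.5, w18=0.6))  # problem case
print(inflates_interval(10, 9.0, [3, 3], w17=0.0, w18=0.0))  # "frozen" params
```

With both parameters at zero the update multiplies S by 1, so same-day reviews cannot change stability at all, which is why zeroing them removes the problem but also switches the feature off entirely.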
bro 🤣
Yes that part I get that.
I call it the pass-fail-pass-fail trap
I am saying that wasn't a problem beforehand
What made it into a problem. It was working just fine
It doesn't help like at all
It was a problem for as long as FSRS-5 exists
But in what way is it a problem exactly. FSRS 5 was working just fine and still is
Now with 0 being practically set every time I optimize at w18, w19, FSRS 5 is basically switched off
somehow my new cards actually increased my mean stability 😂 (probably because my initial stabilities are 31.3 for good and 100 for easy)
The kind of situation where the median can help a bit 🙂
But it's funny that your AVG stability is around 60 but Good are 30 and Easy 100
Would mean you would fail a lot of those after a few reviews
and strange that then FSRS optimizer doesn't learn from it and make the initial stability lower
Your DR is at how much ?
Strange strange that it's so high
Because if you really do succeed them 80% of the time
those stabilities would get even higher
so the average being at 60d feels super low
Can you test those sequences with your parameters (and desired retention)?
1331333
1313333
3333133
4313333
i think its to do with the fact that re-did all my cards a while ago (hence all the re-introduced (I massively regret not just making new notes))
so some of the info i already know
yes that could explain
here
Yeaaaah got it
Basically the model learnt that if you know it already, well, you might as well not review it anymore
Personally I also sometimes create cards for what I already know, but I make sure when I review them I press Good for the ones "I just knew based on inference" and Easy for "the ones I know very well from before"
yeah i've only ever failed 26 cards i initialy did good on
And I also have a very large stability for first-easy
which makes sense because if going in i knew it who cares
I know with time my parameters evolved to make it less optimistic
I think once you'll fail some of those it will adapt
but if you don't after such a long time, it's OK I guess
I'm going to assume when I start encountering the cards I don't know from the start I'm just going to hit "again"
so Idk if it will affect it too much
It's just strange I have a better stability for Good than you
But I don't have such big initial stabilities
If you want to test, I did the split Young/Mature, but be aware that it's looking at stability >= 21, not interval so the ratio will be different than the one presented by anki
I think it would be better if Young/Mature was computed on stability and not interval
21d is "nothing" with a DR to 70% compared to 90%
(also I did not implement it for median, so only avg with this build)
(Hop quick implementation for the contribution ratio with median)
I updated the PR with this build https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/32
No it's the search stats extended from the almighty @cosmic hedge
it was a problem for my deck. here's a look into the issue:
- I have a new card. I learn it. ivl = 3d.
- I fail it after 3d. I relearn the card. ivl = 5d.
- I fail it after 5d. I relearn the card. ivl = 10d.
- I fail it after 10d. I relearn it. ivl = 14d.
- I fail it after 14d. I relearn it. I see next ivl is 21d. I open discord and start spamming Expertium's DMs with a long rant. Then there is an issue opened in the repo. The Jarrett solves it. Yay! Happy ending.
I unconsciously wrote "The Jarrett" 🤣
honestly agreed
That's not the same though
Your example has nothing to do with same-day reviews, whereas the "set last two parameters to 0" thingy is specifically to deal with same-day reviews
Unless by "relearn" you mean "it goes through a bunch of re-learning steps", in which case yeah
what other "relearn" do you know of?
please enlighten me
(bruh there's only one relearn)
Idk, I was thinking of something like what Sound plots
Woooo! Ross' graph mentioned ;p
I assume those really low R are suspended cards? You might want to set a custom search at the top to deck:current -is:suspended for nicer bins.
You mean you fail and right away the interval is longer than the previous one ? What’s your params ?
After going through the steps...
Because if that release was multiple steps, at least in your case you always go to higher stability which is already nice
The issue is solved though in the current ver.
Ah ok !
lets find out
By fixed you mean what happens now ? You fail and the interval is not reduced ?
Ross made it search if you click the squares so that's a quick way to see if you want 😄
The next interval is reduced.
we should probably think of it as a miscalculation on FSRS's part. plus, the changes made actually improved the metrics for my deck (although it got slightly worse for the 20k dataset).
i see
for my language learning
so what does this mean
am i winning
I see ! because indeed, based on params a failure can sometimes be drastic. In my case, for example, It can be easily a 18-30 factor
But it's not a bug in a sense that indeed, many cards in my deck behave like this
In my case, forgetting is a "very bad" incident 😄
brother I have no idea what are you talking about but more importantly, I think u have no idea what I'm talking about either
Maybe haha. I think you were complaining that you had to redo a lot of relearning steps after a fail
relearning not in a anki term, but in a "lot of reviews"
Not really. I had two steps for relearning. Making a relearning card go through them with two good ratings meant the stability got higher than it already was. Which meant now I had to recall a card 13 days later that I already just failed with a 7 day interval.
ye
if the interval keeps increasing after every relearning session, no way I'll ever pass such a card.
I made a presentation to a group of researchers at Cognitive Computational Neuroscience two days ago. Now I know a fun fact: they didn't do any research about "long-term memory" in the sense that Anki users would understand. In their term, the scope of "long-term memory" is several minutes to hours.🤣
😅 Their long-term memory research is my short-term memory research.
I thought my stability was low, but it turns out my memory is long as fuck boi. 😎
what was the presentation about?
My papers.
how did they respond? were they interested, or they were like "nah our research is good. LTM is a few hours"
Their professor is interested. The graduate students feel alien with my papers’ topic.
Anki is not very popular in China.
yea, not here either
people are kinda stuck with coaching/school and stupid traditional methods
guess it's the same with China too
I mean, Anki is not that ground breaking in the first place. Anki without the full suite of addon/integration is quite ... bland. I have a few colleagues using Anki, when I discussed with them about it, they were surprised how rich my cards were, because they just did the "Anki the normal way" and they get extremely bland cards, at a very high human cost
As I was saying in the yomitan discord, it really feels sometimes like Open Source software is tools made by devs for devs
Which you can embrace, and it gives us all the shiny things (like FSRS is doing with the simulator, parameter optimizer, etc ...)
Or you can try to streamline into "One way of using it" (duolingo-like, you boot up anki, you import a deck, you review, no integration whatsoever, no knowledge about what the scheduler is doing, etc)
For example, my wife is a math teacher and we already discussed Anki but it's quite clear it's not something that would really appeal to her students or even her
@unique salmon / @quasi shadow : Shouldn't difficulty be a bit more reactive to Good/Easy reviews ? It almost seems like the "Ease Hell" is even more pronounced with FSRS. But maybe it's perfectly normal if the prediction is better like this ?
Feels like you need 3 Easy to compensate one Again, and ... infinite number of Good 😄
Wild idea but please don't kill me too quickly : What if Difficulty were outside the realm of optimization, and more like an adjustment variable on a card-level basis ?
For example, if a card doesn't match the DR, the Difficulty would adjust to that, to bend the model for it ?
That's the weird part about difficulty - it works better like this for some incomprehensible reason. And making reversion to lower D more aggressive makes metrics worse
How exactly?
Let's try to build an example :
Your DR is 90%. You do 10 reviews, and you fail 3 of them. It means, your actual percentage of retention for that card is 70%. Which means, you have a delta of 20 over a margin of 30, so a 66% difficulty. (Harder than expected)
Your DR is 90%. You do 10 reviews, you fail 1. The delta is 0, the difficulty penalty is thus 0.
Your DR is 90%, you do 10, you fail 0. The delta is -10 over a margin of 10, so you have a -100% difficulty rating. (That word is easier than expected)
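The raw delta in these three examples can be sketched as below. Only the delta (DR minus observed retention) is computed; how to normalise it into a difficulty rating is left open, since the margins used above (30 in the first example, 10 in the third) are not fully specified. `retention_delta` is a hypothetical name.

```python
def retention_delta(desired_retention: float, reviews: int, fails: int) -> float:
    """Positive = card is harder than expected, negative = easier than expected."""
    if reviews <= 0 or not 0 <= fails <= reviews:
        raise ValueError("need reviews > 0 and 0 <= fails <= reviews")
    observed = (reviews - fails) / reviews
    return desired_retention - observed


# The three cases from the messages above, all with DR = 90% and 10 reviews.
for fails in (3, 1, 0):
    delta = retention_delta(0.9, reviews=10, fails=fails)
    print(f"fails={fails}: delta = {delta:+.2f}")
```

With only 10 reviews the observed retention is very noisy, so any real use would probably need far more history per card before the delta means much.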
Then the question is, how to bend the stability based on that difficulty rating. I don't know 😄
Thus, FSRS optimizes your "Average" Forgetting Curve, and Difficulty plays the role of the "Case-by-Case adjustment variable"
(As intended)
The benefit is, instead of moving D at every review, it would be adjusted based on the full history of that card. If over 50 reviews you have a 60% success rate on that card instead of your expected DR, something is smelly, right ?
Of course, optimization like moving average has to be considered
@quasi shadow I don't think this is compatible with how FSRS works, but maybe you have something to say anyway
max D means you are answering thoughtfully, min D means you are cheating and you should be pressing easy; pretty much D in current form is not a parameter that humans would interpret as difficulty
Maybe I should go back to experimenting with adding R to the D formula, though last time I did that the metrics didn't budge even a bit
I mean, this is my data : https://docs.google.com/spreadsheets/d/1Eysl4bocAg9KD3YpVCjR28ACkvpMa3D8eMP2x6fWieE/edit?usp=sharing
For each card, I just counted the amount of Success, the amount of Error. I didn't do anything to filter out the good after bad the same day, so of course it wont match exactly "True" Retention.
But still, we see the distribution of success rate looks like a normal distribution.
So you would expect more or less to have a difficulty to follow something like that, with card a bit more problematic, and some a bit less
Now the standard deviation is 7.8% ... So indeed, maybe even with no Difficulty handling, you get something good enough, and since the RMSE/logloss seems already quite good (logloss 0.49 and RMSE 3.360), I guess "it won't change much"
But, I think right now nothing is really changing much, so maybe handling those outliers could help
I read your blog here @unique salmon https://expertium.github.io/Algorithm.html
If I understand correctly from the "Changes in FSRS-5", Difficulty is still not based on R but only G right ? Maybe that could be useful then, since a "Fail" doesn't necessarily mean the card was more difficult than previously right ?
I mean, if we take back my example, for DR=90%, a fail every 20 review should even be a sign that the card is easier than expected. So the Delta D would have to take in account R and DR right ?
Theoretically, yes. I'm just saying that it doesn't seem to matter in practice. But it could also be that my implementation is bad
We can't take DR into account, just R, btw
DR doesn't exist in the training data
Hmm indeed
And I guess even with Training Data coming from Anki user, the DR is not stored anywhere
I think it could be useful, because when you think about it, that R is relative to other cards
Without DR, we can always look at R of a card, and the mean R of the dataset
Would not help if different decks have different DR though ....
As I said before, the difficulty is just a mean value for a distribution.
https://l-m-sherlock.notion.site/Personal-spaced-repetition-systems-cannot-eliminate-heterogeneity-135c250163a180e09d3dd605fc095e5e
I guess leech is more common than the feeling of “ease hell”.
Difficult cards are unlikely to become easy in most cases.
The distribution of param of mean reversion also supports this statement.
I think leeches might express themselves differently (Let say DR=80%) :
- Leeches : Your Stability won't go very high, even though your DR might be respected with great accuracy. Basically, you HAVE those 80% perfectly predicted, but with still very low stability.
- "Difficult" card : You seem to not be able to reach the DR, doing only 60-70% R instead of the Predicted R, and their stability will thus even be lower.
Then of course, saying that "Leeches are then also difficult" is also a valid way of expressing Difficulty.
Maybe the problem with "Difficulty" is how loosely defined it is (Is it related to stability ? inability to respect the DR ? Inability of having a stability converging, and having it all over the place ? etc etc)
Maybe Difficulty is then just a word we should stop using and instead refining it into different evaluation of why a card is not "satisfactory" 😅
In SuperMemo D is actually defined differently, based on "missed expectations" - difference between R and the real review outcome (with smoothing and shit, since the outcome is binary). I've tried that, but it didn't improve FSRS. I could try some more, but considering how many attempts at improving D have failed, I don't feel like doing it anymore, since 99% of the time my ideas don't work.
😂I found more traditional models about the memory.
As in P&A, however, PPE does not require a successful retrieval attempt to receive these gains
OK, they both are shit.😅
For anyone using the FSRS algorithm in Anki, I'd strongly advise against it because of multiple issues:
- Inescapable Ease Hell (default w[7] value is 0.0046, rendering mean reversion useless)
- Optimizing with bad learning habits will actually result in a HIGHER workload
Any thoughts?
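To put a number on the w[7] claim, here is a small sketch assuming mean reversion of the form D' = w7 * D_target + (1 - w7) * D and looking at the reversion term alone (the per-review grade adjustment is ignored, which is a simplification). Under that assumption the gap to the target shrinks by a factor of (1 - w7) per review. `reviews_to_halve_gap` is a hypothetical helper.

```python
import math


def reviews_to_halve_gap(w7: float) -> int:
    """Reviews needed for mean reversion alone to halve the distance to the target D."""
    if not 0.0 < w7 < 1.0:
        raise ValueError("w7 must be strictly between 0 and 1")
    # Gap after n reviews is gap0 * (1 - w7) ** n; solve (1 - w7) ** n <= 0.5 for n.
    return math.ceil(math.log(0.5) / math.log(1 - w7))


print(reviews_to_halve_gap(0.0046))
```

For the quoted default this comes out on the order of 150 reviews, so whatever moves D in practice is the grade-driven term rather than the reversion term.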
Maybe we need to introduce something like momentum into the formula.
As good an excuse as any to post my "difficulty time machine"
I always just blame my high difficulty cards on my card design, figured fsrs recognises them as "doomed".
with how currently difficulty is used in formulas:
- user uses 2 buttons / content is normal or hard: this will lead to most cards going to 10D
- user uses 2 buttons / content is easy: this will lead to most cards going as low as possible, so ~5D
These are the most common patterns and many people will fall into the 1st tier, which looks like difficulty hell but really isn't, as the variable name is a misnomer
When human tries to even rationalize words in some weird way
is that saying longer words are easier?
I guess if it's something like: 〇〇学園筆記試験過去問題集
What do you mean by momentum ? The previous R observed on a card-by-card basis or something else ?
No it seems they count char in the romaji
link ?
Scroll up 🙂
you can rescale the D in FSRS from 0-10 to 0-1 and it brings much more natural distribution but this doesn't improve the prediction
I might get crucified for saying that but maybe D should be more like a post-processing on top of the FSRS equation more than part of the FSRS equation.
Take a look at GPU, they'll often have different layers to be performant and precise instead of trying to have only one shader doing all the work
Difficulty could be like that, a variable not necessarily part of the optimized equation, but something that adjusts the prediction in real time based on actual specific feedback
I mean, FSRS is already quite precise, RMSE around 3-5% for most of us. So sometimes it might just be a matter of slightly shifting the prediction on a lower side, or higher side, to get a perfect match
If I take my own distribution of Success Rate (R for one Card over the whole revlog), sure, most cards are within an acceptable range (.70-.75), but for all those that are around [.50,.65], they clearly deviate from the distribution and while it might not have a big RMSE impact, being able to detect them and adjust their Stability would help them be more centered
Because of course, if you optimize D behaviour for the whole set, it'll be optimized to have an average effect helping the whole logloss/RMSE minimization
But, is it really what you want ? Or instead, would you like D to be, a compensating variable on a card-by-card basis, quicker to react to what's actually happening right now (instead of what was planned in the training model)
I mean, something like Straight Reward.
IMHO we should focus on having dynamically selected DRs
the predictions are already pretty good
well, I did see that ssp didn't interest anyone
Inspired by https://kuroahna.github.io/anki_srs_kai/guide/easeReward.html#algorithm
I added two extra params:
w[19] is the step ease reward
w[20] is the minimum consecutive successful reviews requ...
Yup I see the idea. On the specific example of SRS Kai, it still means the ease "reward" would only be computed on a grade-level, instead of an "history-level".
I'm not sure, for example, that an "Again" should always lead to a loss of ease. If you press Again exactly as often as predicted by your DR, to me, your card has a neutral difficulty.
I imitate Straight Reward in this PR.
Tell me if I read it wrong in the code, but isn't it only trying to compensate for cards with R>DR ?
Also, since you give a reward for a succession of good reviews, it means someone with DR=95% might get a lot of rewards, when in fact doing a longer chain of passes is not necessarily him outperforming the prediction ?
I really do think what would lead someone to have a bonus/malus reward should be his actual performance (Retention) based on expected prediction (Desired Retention)
Doing 10 "Good" in a row doesn't mean you're really that good if your DR was 99%
Doing 1 fail every 4 reviews is outperforming if you had your DR at 50%
Now in terms of "momentum", the question would then be : How that actual R should be computed ? The full history ? Only the last portion ? Excluding Same-Day Reviews ? Basically, how to define a Good average to compute actual R (moving average, filtered, global ...)
You see, it can change quite quickly, but those phases are still somewhat pronounced in my experience
But the problem is, to compute that, you'd need to store the "Predicted R" and then, for each averaging window, compare the average of the predicted R with the actual observed R
In my example, you see that D more or less works, since the yellow phase was at 99%, the red at 100%, the green at 96-97%
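A minimal sketch of that comparison, assuming the predicted R at review time has been stored next to each review outcome (1 = pass, 0 = fail). `windowed_gap` and the window size are hypothetical; whether same-day reviews are filtered out beforehand is left to the caller.

```python
def windowed_gap(predicted, outcomes, window):
    """Observed minus predicted retention for every sliding window of reviews."""
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    gaps = []
    for end in range(window, len(predicted) + 1):
        start = end - window
        observed_r = sum(outcomes[start:end]) / window
        predicted_r = sum(predicted[start:end]) / window
        gaps.append(observed_r - predicted_r)
    return gaps


# Made-up history: the model predicted 90% each time, the card was failed once.
print(windowed_gap([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 1], window=2))
```

A positive gap means the card is doing better than predicted over that window, a negative gap means worse. A short window reacts quickly but is noisy, while the full history is stable but slow to notice a change of phase.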
For difficult cards, the interval is short, so the user could reach a high streak in a month.
(if the cards turn out to be easy)
It's not related to DR-stuff. I just want to test the idea of Straight Reward.
Sure, fine !
I'm doing a quick experiment with the visualizer :
I do a succession of Good/Fail, I plot the Difficulty, and I alter the Desired Retention
Increasing/Decreasing DR doesn't change how D move when I enter a 1 or a 3
Here, I failed enough times to go to ~97% D, which seems to be my "neutral point".
I do 9 "Good" and only then, I go back to the last Difficulty of 95%
No matter the Desired Retention, it stays the same
Which means : D is somewhat "locked" to measure my performance based on a DR~90% around 95-99% D
Wow, the PR does improve the distribution of difficulty a little in my collection.
But the main metrics don't become better than before.
Yeaah and I'd also be cautious to look at what are those card with D<90%
in my case, it's a lot of very very very young card in terms of review number
This is mine for my main deck
If I check the 65-70D, I get this list of card :
Remark how they all have 0 lapses
BUT, having lapses should be perfectly fine in a model that predict your 80-90% DR
You should, lapse, 10-20% of your review count
Over 1350 cards with prop:lapses>3, I have only 2 with prop:d < 0.9
I have 180 cards with prop:lapses<3 and prop:reps>20, and 46 of them have difficulty >89%
Hmmmmmm
I'm doing
deck:current prop:lapses<=4 prop:reps>20
deck:current prop:lapses<=5 prop:reps>25
deck:current prop:lapses<=6 prop:reps>30
To find the "not too bad" ones.
The ones with, indeed, a number of lapses around ~(1-DR) times my number of reps
And they all seem to have difficulty in the 90-100% range.
Which means D might have worked as intended, but it's just that, yeah, plotting it with bars of 10% width won't give much info
After all, isn't it normal that Stability/Difficulty have the same kind of curve ?
Sorry for the monologue but I realize I might have had false expectations 😂
Still, I think the current D won't work when DR is too low. Your penalty from Again answers will completely erase all the D bonus you got from "Good" answers, since D variation is not related to DR
I've tried the idea with streaks before, so I will be very surprised if it improves metrics.
probably a stupid idea but what if instead of again resetting a streak, it multiplied it by (1-R) or something idk
I know everyone dislikes those perfect sequences of 3 then 1 fail, but I think it's quite insightful here: an 80% DR and a 90% DR perfect sequence
To me ... It seems... actually pretty good
The very non-intuitive part is the fact that for lower DR, your D will be higher
Because D variation is DR-independent, you get "screwed" a bit more
BUT, there are many good points :
- Lapse after lapse, you'll have shorter cycle, which might help you not fail the next one as fast as the previous
- Yeah, everything is still clamped to 90-100%, but it's not like reviews just get "ignored".
- If the D impact is small, maybe it's just because in my case there's not that much variation in it. Also, it's just a value; maybe a small variation might lead to a bigger stability change
Which is the case, since Difficulty being different between cycles is probably what explains why each new cycle would go to a lower Stability than the previous one, until a point where it would stagnate
Soooo ... Maybe we just want D to be super pretty, super centered around 50% D... but maybe we should trust the optimizer 😂
If I change w[4] to 99%, to start right away at 99% Difficulty, we can see the shape is completely different, and it will converge to a lower Difficulty with time and cycles
Sooooooooo, yeah, maybe the biggest culprit is the difficulty scale, NOT the difficulty itself
So... is this case real?
This is absolutely my case yes
I think we should find a method to detect them and draw the calibration graph on them.
If the calibration is poor, we may find a systematic weakness of FSRS.
Then we can try to fix it via systematic methods.
Anomaly detection 😄 ?
to be fair, I think it might be very simple to detect though. For example, how fast stability grew on those
I've also tried making f(D) mostly linear, but switch to a power function for extremely hard cards. That didn't help either.
But I guess I'll try adding R to D and report the results
Could you provide a dataset?
like w[2] (good initial stability) / w[0] (again initial stability), or very very low w[0]/w[1]
i'll DM you 🙂
This issue is mostly about the user perception where they expect difficulty to lower with time and the distribution of difficulty to be centered at ~50%. It doesn't matter if the prediction and optimization is best as is if the user feels that it looks wrong 😔
Sure but then it’s something that can be improved not by changing the scheduling but by improving how people can interpret and plot that value.
For example, everyone knows that looking at Card Stability with the “all” toggle will also lead to something very abrupt
So adding graphs default and options to reflect the same kind of observation with difficulty would solve the misconception
Right now that graph is a kind of “all” one and in comparison to Card Stability one it is probably even more clear and well distributed
I think by filtering out the very young one and doing some kind of zoom on 80-100% difficulty people would have a better sense of their difficulty distribution
It’s something we can try to implement in the stats plugin before proposing a PR in Anki itself
I still do believe that ideally D should have the same “neutral point” between different DR but it’s not a game breaker at all, it’s something that can be explained in legends, docs, blogs
also a food for thought, instead of blending easy with current response in mean reversion add additional parameters for reversion so that easy/good/hard have their separate optimizable parameters for reduction
in the case of studying Japanese, this is a pretty large amount of people. they dive into Japanese and use a core vocabulary deck of the 2000 most common words for example, and then it's natural as a complete beginner they won't know any words
Yes, and "easier" cards (with D <80%) only really happen when you start having words you can somewhat infer from others, which is absolutely not the case for your first 1-3K words
To me, difficulty in this case represents more how atomic/disconnected the cards are. (Pure) kanji might have an even more abrupt curve than mine
toggle button: "show full scale" with the default state being turned off.
wdys
Are we going to still gloss over this?
?
I mean pretend that it is not a problem that literally every other deck is rendering fsrs 5 useless post optimizing
(without manually resetting values for 100s of decks, which is far from practical)
Jarrett is working on a fix
Yes because my RMSE has definitely worsened. It has gotten from 4% to just shy of - %
6%
have u tried a lower number of step?
and lower time
It literally just gives me back the default value
Before that update (whatever update that was), that was absolutely not the case
Only the total number of steps matters
And as I was reviewing I became aware of a very noticeable drop in my reviewing performance prior to that update which is odd, since my reviewing habit is consistent for the past 2.5-3years on my Anki
@quasi shadow am I misunderstanding this code or does it only count "Good" and "Easy" for the streak?
This one is also strange - are you counting "Hard" as fail?
The initial value after the first review should be
    new_streak = torch.where(
        X[:, 1] > 1,
        torch.ones_like(state[:, 2]),
        torch.zeros_like(state[:, 2]),
    )
And then for all other reviews it should be
    new_streak = torch.where(
        X[:, 1] > 1,
        state[:, 2] + 1,
        torch.zeros_like(state[:, 2]),
    )
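For readers without PyTorch at hand, here is a plain-Python sketch of the same rule applied to one card at a time instead of a batch. The names `update_streak` and `streak_history` are made up for this illustration; ratings follow Anki's 1=Again, 2=Hard, 3=Good, 4=Easy. Starting from a streak of 0, the first review gives 1 or 0, which matches the "initial value" snippet.

```python
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def update_streak(streak: int, rating: int) -> int:
    """Any rating above Again extends the streak; Again resets it to 0.

    This mirrors the `X[:, 1] > 1` condition in the torch.where snippets,
    so Hard counts as a success here.
    """
    return streak + 1 if rating > AGAIN else 0


def streak_history(ratings):
    """Return the streak value after each review in `ratings`."""
    streak, history = 0, []
    for rating in ratings:
        streak = update_streak(streak, rating)
        history.append(streak)
    return history


# Good, Good, Again, Hard, Easy
print(streak_history([GOOD, GOOD, AGAIN, HARD, EASY]))
```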
Oh, I see, Straight Reward was made by a madman who counts "Hard" as "not success" but at the same time not as "fail". So in Straight Reward:
Easy = success
Good = success
Hard = not success
Again = fail
So I guess you faithfully imitated that, which is not a good idea
i'm sorry what
Why on Earth do you need RELU here?
Oh, that's just the world's weirdest way to do "add a reward if the streak is >= some parameter, don't add anything otherwise"
Still don't get why you use leaky RELU though
Instead of the regular one
Using torch.clamp(new_streak - self.w[20], min=0) instead of RELU would be a lot clearer, btw. Just to make the code more readable
Some time ago me and Jarrett agreed to stick to "2% relative improvement per parameter" rule. In other words, if you tweak FSRS and add new parameters, logloss and/or RMSE have to decrease by at least 2% (relatively) per each new parameter. This is just to avoid adding a crapton of new parameters and bloating FSRS for extremely marginal improvements. And I really doubt that what you suggested would be anywhere near 2%
Trust me, D is just that much of a bitch 🤣
D doesn't care what you or me think makes sense
Even something really obvious, like using optimizable parameters instead of Again=1, Hard=2, Good=3, Easy=4 doesn't do jack shit to improve metrics
Btw, I'm currently benchmarking some stuff related to using R in D, will share the results later. I'll try a very simple approach first that doesn't even require adding new parameters, and then I'll try redesigning D entirely
I bet 20$ neither will help
@cosmic hedge / @cursive badge , I'm tinkering with "Lapses" and trying to extract something useful from them. I came up with an "Avg Repetitions / Lapse" that could be useful to detect whether cards with more lapses no longer respect the DR (because they are so difficult that you lapse on them more than you should)
I came up with this (see attachment)
I have the feeling the second view is more useful since you see how many repetitions you can do per lapse, so in my case I can see that the more lapses I have, the lower the average retention (expressed here in repetitions/lapse)
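A rough sketch of how such a metric could be computed, assuming all we have per card is a `(reps, lapses)` pair. This is not the add-on's actual code; the function names are hypothetical. As a reference point, if every review failed independently with probability 1-DR, you would expect about 1/(1-DR) repetitions per lapse.

```python
from collections import defaultdict


def avg_reps_by_lapses(cards, divide_by_lapses=False):
    """Group cards by lapse count and average their repetitions.

    cards: iterable of (reps, lapses) pairs (hypothetical input format).
    divide_by_lapses: if True, average reps / lapses instead of reps.
    """
    groups = defaultdict(list)
    for reps, lapses in cards:
        if divide_by_lapses and lapses == 0:
            continue  # reps / 0 is undefined, so skip lapse-free cards
        groups[lapses].append(reps / lapses if divide_by_lapses else reps)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def expected_reps_per_lapse(dr: float) -> float:
    """Expected reviews per lapse if each review fails with probability 1 - dr."""
    return 1.0 / (1.0 - dr)


cards = [(10, 1), (20, 1), (30, 3)]  # made-up data
print(avg_reps_by_lapses(cards))
print(avg_reps_by_lapses(cards, divide_by_lapses=True))
print(expected_reps_per_lapse(0.9))
```

Groups whose reps/lapse sits well below `expected_reps_per_lapse(DR)` would be the ones that lapse more often than the DR suggests.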
Any hot ideas before I create a PR so we can improve it a bit more with time ?
Maybe a "Lapse Ratings" would be best suited, similarly to
not really something useful but
hope ross doesn't mind the ss
That's a good idea ! But I think it would have to be implemented with something more than simply a few computations/ratios based on card metrics
I don't know much about anomaly detection too
But I see the point
I followed this guide.
So as I said here
Without leaky_relu, if new_streak is less than self.w[20], the gradient is always zero for it.
Ah, ok
It's used to address issues of vanishing gradients.
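To make the gradient argument concrete, here is a framework-free sketch with hand-written derivatives. The threshold value stands in for `self.w[20]` and is invented; the slope 0.01 is the default `negative_slope` of PyTorch's leaky ReLU.

```python
def relu(x: float) -> float:
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    """Derivative of relu with respect to x (taken as 0 at x == 0)."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Derivative of leaky_relu with respect to x."""
    return 1.0 if x > 0 else negative_slope


threshold = 5.0  # stands in for self.w[20]; the value is made up
for streak in (2.0, 8.0):
    x = streak - threshold
    # Chain rule: d f(streak - threshold) / d threshold = -f'(x)
    print(streak, -relu_grad(x), -leaky_relu_grad(x))
```

With plain ReLU, every review whose streak is below the threshold contributes exactly zero gradient to the threshold parameter, so the optimizer gets no signal from those reviews. The leaky version keeps a small non-zero gradient there.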
I created the PR here, I merged both graph in one with a toggle "Divide Avg Repetition by Lapse"
https://github.com/Luc-Mcgrady/Anki-Search-Stats-Extended/pull/33
I attached a local build for those who want to try it (it also includes the Stability over Time)
Note : This currently includes the changes of #32. So we might need to merge that one first, but at least I don't have conflicts between the two in this one :)
I also added an option in the conf...
I'm going to have to rebase this aren't I 😅
I'll hold off on the "make it redder!!!" till you're done XD
I also wanted that bar default feature for a while so feel free to default it to "bar" because people tend to like the bars more
could be one of those cases where MOVING-AVG does better than FSRS, but it's a real problem if it persists even after FSRS is optimized on these newer reviews
Normally merging should be fine since the PR points to main 🙂
The average stability is already good enough I think, I'm working right now on making it more precise (for ex, using stability = 2.3 instead of 2 from the bin value)
buuut it won't change much
even if FSRS is perfect wouldn't you still expect a healthy spread of values due to randomness? So it's hard to tell from just this
yeah but i squash merge so i think that turns out poorly (?)
To some degree, yes. You can easily calculate the probability that a card will be failed n times in a row at DR=x%, it's just (1-x)^n.
For example, at DR=90%, the probability of failing a card 3 times in a row =(1-0.9)^3=0.1%. So if you have 1000 cards and DR=90%, on average one card will be failed 3 times in a row
(unless I'm bad at math)
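The arithmetic checks out. A minimal sketch, assuming every review happens exactly at R = DR and that failures are independent (the function names are made up):

```python
def p_fail_streak(dr: float, n: int) -> float:
    """Probability of failing n reviews in a row when each has recall probability dr."""
    return (1.0 - dr) ** n


def expected_cards_with_streak(cards: int, dr: float, n: int) -> float:
    """Expected number of cards (out of `cards`) that fail n times in a row."""
    return cards * p_fail_streak(dr, n)


print(p_fail_streak(0.9, 3))                      # about 0.001, i.e. 0.1%
print(expected_cards_with_streak(1000, 0.9, 3))   # about 1 card
```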
pretty much just look at the binomial distribution (divided by n) for what the spread is expected to be
Maybe we can find leeches like that, actually. If a card has been failed n times in a row at some DR level, we can calculate how likely that is. And if it's super unlikely (for example, <0.1%), then it's tagged as a leech
And yeah, we can do binomial distribution shenanigans to extend this to failures that didn't happen in a row
It's just that math is simpler if all failures are one after the other
This gets problematic if DR changes, though
Since Anki doesn't store it anywhere
It doesn't store DR at the time of the review in Card Info
And I'm not sure if binomial distribution math even works if p(success) changes
Yep, never use rebase or anything using rebase on a shared branch 😄
It might sound/look prettier, but it's rewriting history, different commit IDs, it's a mess
It's even called the "Golden Rule of Git" https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing
if the user changes DR then you just add up the different distributions per DR
Btw, any card that has been failed 6 times in a row would be considered a leech at any DR that is used in Anki, if we use 0.1% as a cutoff
BINOM.DIST(0;6;0.7;TRUE) = 0.073% (Excel)
0 successes, 6 trials, p=70%
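The Excel call can be reproduced with the standard library. This is only a sketch of the leech idea discussed here, under the assumption that every review has recall probability equal to DR; `looks_like_leech` and its 0.1% default cutoff are illustrative names and values taken from the discussion, not an existing Anki feature.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), like Excel's BINOM.DIST(k; n; p; TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_like_leech(successes: int, reviews: int, dr: float, cutoff: float = 0.001) -> bool:
    """Flag a card whose success count is improbably low for the given DR."""
    return binom_cdf(successes, reviews, dr) < cutoff


print(binom_cdf(0, 6, 0.7))          # 0.3 ** 6, about 0.000729 (0.073%)
print(looks_like_leech(0, 6, 0.7))   # below the 0.1% cutoff
print(looks_like_leech(0, 5, 0.7))   # 0.3 ** 5 is about 0.243%, above the cutoff
```

Because it uses the cumulative probability, the same check also covers failures that did not happen in a row (for example 1 success out of 9 reviews).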
Can you elaborate on how you would calculate it?
the user does 150 reviews at 90% DR.
the user does 20 reviews at 70% DR
the expected distribution is (Binomial(150, 90%) + Binomial(20, 70%)) / (150 + 20)
No, I mean, calculate the probability that a card has been failed k times out of n reviews at different DRs
can compute this by iterating over the exact number of failures at each specific DR
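One way to do that iteration is a small dynamic program over the reviews (the result is sometimes called a Poisson binomial distribution). This sketch assumes each review's true recall probability equals the DR in effect at that time, which the thread notes Anki does not store; function names are hypothetical.

```python
def failure_count_distribution(recall_probs):
    """Return dist where dist[k] = P(exactly k failures).

    recall_probs: one recall probability per review, e.g. the DR in effect
    at the time of each review (an assumption, see the note above).
    """
    dist = [1.0]  # before any review: 0 failures with probability 1
    for p in recall_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * p            # recalled: failure count unchanged
            nxt[k + 1] += prob * (1 - p)  # forgotten: one more failure
        dist = nxt
    return dist


def p_at_least_k_failures(recall_probs, k):
    """Probability of k or more failures across all the reviews."""
    return sum(failure_count_distribution(recall_probs)[k:])


# A card reviewed 5 times at DR=90% and 3 times at DR=70%
reviews = [0.9] * 5 + [0.7] * 3
print(p_at_least_k_failures(reviews, 4))
```

When all probabilities are equal, this reduces to the ordinary binomial distribution.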
the models work by averages over a distribution of possible cards. The forgetting curve is just an average of individual forgetting curves drawn from this distribution, and success/failures should in theory update on these individual cards rather than the average. For example, maybe the model expects a user to learn 1 hard card for every 9 easy cards. Then even after just 3 successes for a new card, the model can update its belief; it's highly likely that this card is one of those easy cards and can schedule a longer interval accordingly. So yeah it's reasonable for ease to update after every review
RWKV curves at a fixed stability show how much this underlying hidden distribution can affect the average distribution
shouldn't this become a different channel already at this point. we will have more interaction from people this way. who knows some random passer-by might get interested.
(deleted and reposted cuz didn't want to destroy Alex's message)
mods noticed me :blushed:
I'm not arguing about updating after every review, I'm saying that the overall performance of that specific card might be interesting to take into account (more than just the previous grade)
on another note, we must prioritize using log loss especially when developing models to account for average performance. e.g. if a card with 0.6 overall retention was just unlucky, we don't want to overcompensate when the 0.7 predictions were already correct. I've shown before that RMSE (bins) encourages overcompensation for mistakes so we cannot use it to measure improvement benefits for this
time to come up with an equivalent for log loss?
Sure. 0.02% improvement 🤣
Well, if we're being realistic, 0.001 improvement in absolute terms, I guess. Like from 0.327 to 0.326
So 0.001 improvement in logloss per parameter
Maybe 0.002, if we're being conservative
if you compare GRU-P-short to FSRS's RMSE difference and try to linearly map that to log loss then you get something like 0.00075 per 1% improvement
so i guess 0.0015 per 2% RMSE
Alright, let's say 0.0015 absolute improvement in logloss per new parameter
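Both acceptance rules from this thread fit in a few lines. This is a sketch: it assumes the per-parameter thresholds simply add up when several parameters are introduced, which the discussion does not spell out, and the function names are made up.

```python
LOGLOSS_PER_1PCT_RMSE = 0.00075                  # rough figure quoted above
LOGLOSS_PER_PARAM = 2 * LOGLOSS_PER_1PCT_RMSE    # 0.0015 per new parameter


def meets_relative_rule(old: float, new: float, new_params: int, per_param: float = 0.02) -> bool:
    """Earlier rule: the metric must drop by at least 2% (relative) per new parameter."""
    return (old - new) / old >= per_param * new_params


def meets_logloss_rule(old: float, new: float, new_params: int,
                       per_param: float = LOGLOSS_PER_PARAM) -> bool:
    """New rule: log loss must drop by at least 0.0015 (absolute) per new parameter."""
    return (old - new) >= per_param * new_params


print(meets_relative_rule(0.0500, 0.0485, new_params=1))  # 3% relative drop in RMSE
print(meets_logloss_rule(0.327, 0.326, new_params=1))     # 0.001 absolute drop
print(meets_logloss_rule(0.327, 0.325, new_params=1))     # 0.002 absolute drop
```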
@quasi shadow new rule just dropped 🤣
Having said that ... If GRU-WHATEVER allows you to translate your DR=80% model to DR=70% without a big loss of prediction precision ...
... 😄
Would open a lot of doors for future algorithms 🙂
Alex's RWKV would actually have a ton of advantages over FSRS
The more I think about it, the more I think it's actually very desirable
- We can make R more accurate
- We won't have to show parameters, which means one less thing for users to worry about
- We can support proper same-day scheduling instead of the current mess
- We can throw in new input features, like time of the day, workload, etc. Not just interval lengths and grades
- We can remove "Optimize", which means even less stuff for users to worry about
RWKV would be pre-trained on 10k users, so it wouldn't need further optimization
In that sense, it would be more like ChatGPT - it Just Works™ out of the box
reminder that it would still be optimizing, but it would just be optimizing on the spot
well it would be doing the equivalent to optimizing on the spot is what i should say
you wouldnt say that an llm is optimizing on the spot
and we don't know how well it translates DR=80% to DR=70%, the only thing we have is faith in its better log loss
Is it possible to test such a DR=80% into a DR=70% ?
I mean, evaluating how good an algorithm is at that
I never actually really understood how we actually test if a prediction is correct or not
I mean, you see the user entered "Again" when you predicted a 60% retention, how do you know if it's a good or not prediction ?
If it's "Again", predicted value should be as close to 0% as possible
If it's not "Again", predicted value should be as close to 100% as possible
And you use some math function to calculate the "distance" between the real review outcome (0 or 1) and the predicted value
And then the optimizer tweaks parameters to minimize that "distance"
If the algorithm always outputs 100% for every non-Again and always outputs 0% for every Again, then it will have a "distance" of 0, since predictions are exactly equal to real data
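Log loss (binary cross-entropy), the metric used throughout this thread, is one such "distance". A minimal sketch; the clipping constant `eps` is an implementation detail added here so that predictions of exactly 0 or 1 don't hit log(0).

```python
import math


def log_loss(predictions, outcomes, eps: float = 1e-15) -> float:
    """Mean binary cross-entropy.

    predictions: predicted recall probabilities in [0, 1]
    outcomes: 1 if the review was a pass (not Again), 0 if it was Again
    """
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outcomes)


# Predicted 60% and the user pressed Again: loss is -ln(0.4), about 0.92
print(log_loss([0.6], [0]))
# Predicted 90% and the user passed: loss is -ln(0.9), about 0.11
print(log_loss([0.9], [1]))
# Perfect predictions: loss is (almost exactly) zero
print(log_loss([1.0, 0.0], [1, 0]))
```

A single review never tells you whether "60%" was right; it only contributes a larger or smaller penalty. The prediction quality only shows up once the penalties are averaged over many reviews.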
Ok thanks ! I got it now 🙂 Couldn't guess that it's minimizing the cost between the prediction at again and 0%
some things to try:
- find gaps in the review history. if someone took a 1 week break then you would expect the retentions to be lower and we can measure the performance of the curve this way. This could be how a 80% DR gets shifted to 70% DR. If someone took a break for the year we can measure the far end of the curve.
- trust that the underlying reviews have a high variance in retention. this is probably true since I expect the underlying scheduler to have been SM-2. To illustrate this, if a perfect scheduler produced the data at 90% DR then the data is not enlightening at all, the perfect model would in turn just predict 90% all the time with no curve. But if the underlying scheduler that produced the data sucks then there will be plenty of data to work with already.
but i havent tried 1) yet and 2) is just an assumption
Oh wait I think there is a miscommunication from my part, what I mean is :
FSRS is good in my case at predicting my 80% DR. Very, very good. But if I ask FSRS to predict my 70%, in fact I'll be at 60%.
The forgetting curve doesn't really match how fast I forget between 80 and 70
It does predict 80 very well, but it's too optimistic for 70, and too pessimistic for 90
I do my review by descending retrievability, in general, I have around ~95-98% for all my DR=90%
~60% for all my DR=70%
We actually found that different curves are better at different retentions. And now we have no idea what to do with that information 😅
i think i'm talking about the same/similar thing. When you set a lower DR in FSRS it just multiplies all intervals by a constant factor, and to measure how well this does using the current data we can find gaps in the review history to simulate this effect
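A small sketch of why the factor is constant, using a power forgetting curve of the form R(t) = (1 + FACTOR * t / S) ** DECAY. DECAY = -0.5 is an assumption for illustration; interval rounding, fuzz and maximum-interval caps are ignored.

```python
DECAY = -0.5                      # assumed curve shape, for illustration only
FACTOR = 0.9 ** (1 / DECAY) - 1   # chosen so that R(t = S) = 0.9


def forgetting_curve(t: float, s: float) -> float:
    """Predicted retrievability after t days for stability s."""
    return (1 + FACTOR * t / s) ** DECAY


def interval_for(dr: float, s: float) -> float:
    """Days until predicted R drops to dr (the curve solved for t)."""
    return s / FACTOR * (dr ** (1 / DECAY) - 1)


def interval_multiplier(dr: float) -> float:
    """Interval as a multiple of stability; it does not depend on s."""
    return interval_for(dr, 1.0)


for s in (1.0, 10.0, 100.0):
    print(s, interval_for(0.9, s), interval_for(0.7, s), interval_for(0.7, s) / s)
```

Since `interval_for` is linear in `s`, switching from one DR to another rescales every interval by the same ratio `interval_multiplier(new_dr) / interval_multiplier(old_dr)`, whatever the card's stability.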
the best way of course is to run an actual experiment with users but we don't have the resources for that
Oh OK now I get the gap idea
But if your training set is with people that used SM2, don't you have "by default" gaps ? Since for example, some might have a too high ease factor and it creates gap that FSRS would have filled with reviews ?
Btw, we have no clue whether:
- We are misunderstanding our own data
- This is a quirk of the optimizer
- This is a fact about the forgetting curve: there is no universal shape, the shape has to depend on retention
yeah that's related to point 2), if the underlying scheduler that produced the data produces enough varied data then we don't have a problem at all
if we assume that the underlying schedule was SM-2 and not people choosing a lower DR with FSRS then a lower average retention speaks more as to the difficulty of the cards and the distribution of cards
and RWKV curves suggest that this might matter for the shape of the curve
Now that you mention it, it's true that the whole "Forgetting Curve drops faster in the beginning and slower at the end" might just not be true for some knowledge
It's not impossible that for some cases, you get more something like this
Typically, knowledge that "holds together" with 2-3 mnemonics, which could hold together short term but suddenly drops once they have all dropped
We could try passing D into the forgetting curve directly instead of using it only to calculate S. But that would screw up a lot of things, especially how we calculate S0 and the interpretation of S itself
interpret S as the point where retention is 0.9 and all problems are solved
i could try to come up with formulas that could allow this behaviour, but currently the nn models all cannot possibly replicate such a forgetting curve
I mean this
We would no longer be able to do this since the forgetting curve would also depend on D, so there is another dimension now
but i also don't think this is true, but possible in the sense that a user might continuously review their new cards outside of Anki in a way that keeps the memory strength high
can change that to an average instead
The interval here would also depend on D is what I'm trying to say
The main issue is S0, btw
Since it requires choosing a fixed shape of the curve to estimate
i wonder if this is due to mental fatigue as well. It seems that RWKV could in fact give longer intervals than FSRS would if you set the DR to be very low, but in practice i'm sure that you will run out of mental strength first. Mental energy might be very important as is seen by how MOVING-AVG is better than FSRS for more users than not
Another theory I had was that, by reducing DR and keeping the same load, you start to crank up the new cards/day, which creates a lot of interfering knowledge, which is also part of how you "forget things" (in this case, why you get them wrong)
I mean, take this example :
You remember 駅 as "Station". Now, you have also 訳, as "Reason". Next time you see 駅, you hesitate between "Reason" and "Station", and you get it wrong.
Can we consider you have forgotten the word ?
Or should we instead just say you got it wrong
forgotten != wrong, my point is
In this case, you could have a "long stable knowledge" that becomes a "very short stability one", not because of how you forget, but because you had oversimplified the knowledge in your brain by not having experienced similar items
You know, I tend to say that there is no such thing as "overthinking", but you are very good at convincing me that there is...
I mean, in #memes I would agree, but here I think we're not here for the memes haha
And even if we can't do anything about it, it's still interesting, because it shows that if you see in practice that the "Forgetting Curve" is a lie, it might only "seem like it" simply because "forgotten" is not super well defined
this reminds me that i want to test RWKV for this sort of behavior. Give a new card and look at its forgetting curve. If I add a couple of new cards, will the first card's forgetting curve significantly change? I'm sure that it will
And maybe one day, the successor of Anki, instead of having "Again/Hard/Good/Easy", could have "Forgot/Confused/Recalled/Slow Recall"
I'm sure users will find it intuitive to use and there will be no misunderstandings whatsoever
I think as long as the user keeps a consistent behaviour, it should be fine in this case
For example, I use "Easy" as "I already knew it", and FSRS adapted its value quite well to it
The issue is that you still have to mark an option as "good or bad", like "Hard" that is for some people "good" and others "bad"
Or you have something so flexible like neural network that it will compute for you if "Hard" was used as an Again or a Good 😄
It would be cool if we didn't have to treat the grades as binary when calculating the loss, but idk how to do that
We could make a neural net that outputs 4 different probabilities (one for each button), but idk if it would be advantageous in any way, and I anticipate all sorts of issues
RWKV-P predicts 4 probabilities, i believe it should help with improving gradient information
but idk how you would do this for a forgetting curve
yeah i'm even considering predicting the duration of the review
Yeah, that's the thing
It's not that predicting multiple probabilities is impossible, it's that there is no way to combine that with the "forgetting curve" approach
To be fair right now, for me a simple linear regression would be a good enough forgetting curve hahahaha
I don't play with DR anymore, PTSD from last time
I do increment it 1% every month more or less
https://arxiv.org/abs/1707.06887
the same idea is used in RL
In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying...
What if we have 3 forgetting curves - one for Hard, one for Good, and one for Easy? 🤣
And p(Again) is just 1 - p(Hard) - p(Good) -p(Easy)
Idk how we would use that Frankenstein monstrosity for scheduling, though
Oh, and their sum must always add up to 1, which is problematic
if we make certain assumptions then it should be doable
Yeaah ! .01 RMSE earned ! New params day baby !
😄
nice, you will learn 0.01% faster now
Well, it seems that in the past a "Hard" first was similar to an "Again" first
Now a Hard first is actually better !
Anki users tweaking 13456890 settings to learn flashcards 0.01% faster be like
(I am an Anki user)
I mean, we're geeks
I won't start to feel guilty for it at almost 34
(I had to double check, I thought I was already 34)
(I know my RMSE better than my age)
lol
Too bad no amount of Anki and parameter tweaking can help me get a gf
Well, I guess theoretically I could make my own dating app with my own algorithm, but that is not going to happen
We've been together for 10 years now, she's a math teacher, and I can tell you this : even she doesn't care in the slightest about Anki or FSRS
It's made for the geeks, for the geeks
We should just celebrate it together
give weights to hard, good, easy, such that the sum of the weight is <= 1
Me who has never kissed at the age of 28
just find an anki deck for that topic
gonna make my own anki, with blackjack and hookers
alternatively, maybe tweaking the variables less would help 🍃
IMO, but I'll just send 1 message about this topic otherwise this will go from FSRS to "#date-advice" very quickly, being a geek/overthinker is not that much an issue, and as long as you build an "healthy sense of self confidence" it's always great to see passionate people (as long as they are also able to open up to other's passion lol). I always add the "healthy sense of" because you might land on r/TheRedPill, wearing long coats and stuff 😂 .
maybe what he needs most is a long coat
Unironically this coat would go hard af
Yeaaah but (OK let's go off-topic a bit :D), you CAN FEEL people when they identify to those exterior things. It's almost like you see someone from Peaky Blinders coming to you
And the light, and the posture, and ryan gosling
the problem is you'd probably look like a dork wearing that 🍃
It's like those chinese clothes on Amazon model pictures vs on a random russian in the review section
kek
Fair enough. I'm not Ryan Gosling
none of us are
- Do a bit of gym, but don't build your whole identity around it
- Try to wear nice clothes, but don't become a parody of Karl Lagerfeld
- Try to be positive and nice, but don't be a boot-licker
- become a monk and join a monastery
Be a simple man
Well, maybe not like them
But you get the idea
"Have fun between boys"
"But not with the little ones"
- Wait for AI to become advanced enough that I can have an AI gf
"Or to change church every year, you'll be doomed"
you can do that today
I might have found on some private trackers some first AI porn
It might have been quite interesting
When I see posts on LinkedIn about how AI will change the world, I just think about how much they were talking about VR during Covid
So many VR Headsets sold for "VR Experiences beyond imagination"
With more install of DeoVR than Metaverse
The "fsrs4anki_scheduler.js", is it a file from the playbook or the anki github ? can't find it somehow
Ok, the step 2.2 gave me something compatible with anki though
so probably not necessary anymore to do that step
and just cc-cv them in deck options
Difficulty with 1% granularity
Let's be honest ... It doesn't really bring much much more value at all 😂
Guess it does show that theyre not all just 100%
I'm doing a difficulty time machine but because of ts-fsrs it's not 100% accurate
Telepathy haha
I was thinking "Maybe a Time machine would be better"
Doing an average over time, except if you reaaaally zoom on 90-100, you won't see anything
Did you try it?
no
If you look at mine you can see difficulty really starts going up
Maybe i should add an "average difficulty%" just at the bottom instead of as a seperate graph
I kinda like being able to see trend
Especially like stability when you see very little steps compounding
But it should be elsewhere
I started as a Pie but then I was like let's use a ScrollBar to configure bins=100
While I was doing the pies I remember the d3 tutorial had something like "never use pie charts" in it
But hey they make the addon look a little less like the bar-chart-fest that it really is XD
Yup 🙂
Anyway, enough for today
No big eureka for this time
I think D is probably something better left unseen
lol
I mean Difficulty right
By the way, Github Copilot is atrocious from time to time for JS ...
Full full hallucinations
Wait, I have a genius idea
@quasi shadow
Right now we estimate S0 like this:
```python
def loss(stability):
    y_pred = self.forgetting_curve(delta_t, stability)
    logloss = sum(
        -(recall * np.log(y_pred) + (1 - recall) * np.log(1 - y_pred))
        * count
    )
    l1 = np.abs(stability - init_s0) / 16 if not SECS_IVL else 0
    return logloss + l1

res = minimize(
    loss,
    x0=init_s0,
    bounds=((S_MIN, INIT_S_MAX),),
    options={"maxiter": int(sum(count))},
)
```
If we want to make decay depend on D, we could either just assume some fixed value of D...or estimate it from the data!
- The formula for converting D into decay is very simple. decay=-0.1×D. That's it. When D=1, decay=-0.1, the curve is flat. When D=10, decay=-1.0, the curve is steep.
- The modified minimize function should look like this:
    res = minimize(
        loss,
        x0=[init_s0, init_d0],
        bounds=((S_MIN, INIT_S_MAX), (1, 10)),
        options={"maxiter": int(sum(count))},
    )
  Now it will fit both S0 and D0 rather than just S0. So now we have a way to estimate D0 (kind of) directly from the data.
- In pretrain's `loss` you can do this:
    def loss(params):
        stability, difficulty = params[0], params[1]
        y_pred = self.forgetting_curve(delta_t, stability, difficulty)
- In the forgetting curve itself you can do this:
    def forgetting_curve(self, t, s, d):
        DECAY = -0.1*d
        FACTOR = 0.9 ** (1 / DECAY) - 1
        return (1 + FACTOR * t / s) ** DECAY
- Then you remove D from the formulas of S and pass D into the forgetting curve instead.
So now we have a flexible curve that can adapt to difficult material. Now we aren't just adapting S for difficult material, we are adapting the curve itself.
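A self-contained toy version of this proposal, to see the pieces working together. It is only a sketch: the coarse grid search `grid_fit` is a stand-in for scipy's `minimize` so that nothing beyond the standard library is needed, the L1 penalty from the current pretrain loss is left out, and the data at the bottom is synthetic.

```python
import math


def forgetting_curve(t: float, s: float, d: float) -> float:
    """Proposed curve: decay depends on difficulty d (1..10); R(t = s) stays 0.9."""
    decay = -0.1 * d
    factor = 0.9 ** (1 / decay) - 1
    return (1 + factor * t / s) ** decay


def loss(params, delta_t, recall, count) -> float:
    """Count-weighted log loss of the curve against observed recall rates."""
    s, d = params
    total = 0.0
    for t, r, c in zip(delta_t, recall, count):
        y = min(max(forgetting_curve(t, s, d), 1e-9), 1 - 1e-9)
        total += -(r * math.log(y) + (1 - r) * math.log(1 - y)) * c
    return total


def grid_fit(delta_t, recall, count):
    """Coarse grid search over (S0, D0); a stand-in for scipy's minimize."""
    best = None
    for s10 in range(1, 301):      # S0 from 0.1 to 30.0 in steps of 0.1
        for d in range(1, 11):     # D0 from 1 to 10
            s = s10 / 10
            value = loss((s, float(d)), delta_t, recall, count)
            if best is None or value < best[0]:
                best = (value, s, float(d))
    return best[1], best[2]


# Synthetic data generated from S0 = 3.0, D0 = 4 (not real review data)
delta_t = [1, 2, 3, 5, 8, 13]
recall = [forgetting_curve(t, 3.0, 4.0) for t in delta_t]
count = [100] * len(delta_t)
print(grid_fit(delta_t, recall, count))
```

Whatever the value of `d`, `forgetting_curve(s, s, d)` is 0.9, so S keeps its meaning as "the interval at which R = 90%". What changes with `d` is how quickly R falls once t goes past S.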
And then you tell me I overthink 😄
I think about technical things that improve FSRS. You think about "forgot != wrong". We are not the same.
Usually "we are not the same" is said ironically, but I am being 100% serious
You were getting into sorata level "energy and force are non-physical" crap
Actually, I'm not sure which one makes me want to say "shut up and calculate" more - your "forgotten != wrong" or sorata's "energy and force are non-physical"
Both make me want to say "Look guys, don't do this. Just don't. Stop. For your own good and for everyone else's good, stick to crunching numbers, please."
And before you say "But 'forgotten != wrong' actually makes sense because..." - stop. Sit down. Take a deep breath. Do it three times. Now...numbers. Focus on the numbers. Or do your reviews. Or go outside. Or do anything else but this.
I was just about to suggest you take a deep breath ahah. Breathe in, breathe out, everything is fine 🙂
You're enough @unique salmon ❤️
...same to you?
I'm not the one who says "Force and energy are mystical" or "Forgetting the card and getting the card wrong are different things"
you need to go outside though
Ok, fair 🤣
I do agree with sound that forgetting something completely and "is it A or B" are two distinct things
Oh no
like doesn't supermemo have like 3 grades of failure?
Don't trigger a new chain reaction 🥲
...I'm going to sleep
Good 😄
I think Anki needs two types of failure buttons: "how did I even get here" and "in a 50/50 I am wrong"
Yeaaah but in the end it doesn't matter that much since FSRS will just predict when you were unable to get it right (including both situations)
The main point was to describe why "forgetting" in Anki can sometimes be brutal
I have so many bloody cards that are stuck in forever-leech-mode cause I consistently lose the 50/50
Because it's not really forgetting, it's more like suddenly, you realize you learnt something not the "full way"
It's also a danger of doing too many reviews I think: the more reviews you do, the more your brain keeps only the discriminating features it needs to recognize something, so if you have only reviewed a subset of a learning domain, your brain will have reduced the patterns to something that is NOT sufficient
Once again, I remembered 駅 because it was "The R shape on the right"
Without even looking at the left after a few reviews
the solution obviously is just to add 20 more cards with that word on it
But then 訳 comes and now, I built dozens of reviews "recognizing the R shape"
Yes basically that's when I realized that core decks based on word frequency are not always that smart
ironically, just looking at the right side is how most people read :D
Cause the right side most of the time denotes the reading, and then you just read the words
I also find it very interesting that Japanese people are able to recognize kanji even if the center is blurred
It's like in their brain, the outside shape of a kanji is sufficient to recognize them
It's just generic pattern recognition
in the end confusing two kanji is also a non-issue, since you rarely ever read a Kanji in Isolation and out of Context
And then you get into "recognizing entire words by their shape" territory
Yeah sometimes it can be nasty things though
社会 vs 会社
If your brain learnt one by remembering the association of both
when the second comes, you're good to re-learn them a bit
That's why sometimes adding more cards can help remembering older ones
More rooms for connections I guess
And less room for too-simplistic/bad-pattern recognition
I can pretty much just read them to their sounds, and then the words are obvious
You see
Basically you build another recognition pattern, based on pronunciation/reading/etc
階段 vs. 段階 is a much meaner one imo
So learning is sometimes a lot of iteration over the same material
(But not simple bruteforced iteration, more like remodeling knowledge all the time, until you stabilize it)
I feel like some of the words I'm grinding right now I'll never truly stabilize
since they're so incredibly rare, I might never see them outside of Anki
That might be a challenge indeed
Like, the latest levels of WaniKani have started teaching historical names, and Kanji that appear only in that single name
I know in the first month I had trouble remembering じょうきょう (状況) until ONE TIME, I heard a character say it in an anime, and since then, their voice is associated with that word and its meaning
Same with とにかく that I always hesitated between "BTW" and "Anyway"
Until that character in Violet Evergarden said it 20 times per episode
yeah, having proper connections to the meaning of words is vitally important
And Anki can't easily provide that
though I do remember words I learned from my Grammar-Deck substantially better than the blank vocabs I learn
Cause I learned them in some other context...
I know some people do some contest of Kanji recognition, for those it's a bit difficult because it's really standalone stuff
I also now have the reverse problem this causes on FSRS
Since I obviously know a lot of words by heart by now, those always get a good rating
But since those get optimized together with the random other ones, those get pushed away too quickly now
aaaah indeed
yeah
Basically, that's why, even if it might sound a bit philosophical, even if FSRS had a perfect prediction function tomorrow ... it wouldn't really change much about Anki itself
So I can either just accept that I will never properly learn those words cause of that, or make my parameters harder and be shown the easy ones way too often
Currently opting for the last one, since when I just let it optimize as it pleased, it was actually harmful
There's a YouTuber who always insists that SRS can be super tempting, but in the long term falls short against more mind-mapping/contextual learning of concepts
SRS is just a helpful tool
Justin Sung
But to actually acquire a language, it's just not enough on its own
An extremely helpful one ! .... But still just a tool 😄
Yup
Though tbf, I did learn japanese entirely in Anki to a degree that I could watch native content
But that is very much just shoehorned into Anki, and kinda abuses the SRS system a bit
as indicated by this
hehe
For my vocab deck though
This is the whole jazz
But the problem is, sure I know a few words, but how sentences are built, meanings, nuances, I just don't learn them well with Anki
These past weeks I've been dedicating 30-60min per day to really pausing subs, analyzing, and my overall understanding has really improved
That's why my stats are so ridiculous on that deck
Sure you can !
But I think then you're basically trying to recreate the outside world in Anki
The JLab deck really works well
Generating sentences, Generating Audio, randomizing cloze ...
but you absolutely must stay on default parameters when using it with FSRS