FSRS Megathread | Anki | Page 15

polar maple Apr 27, 2025, 8:19 PM

#

mb the young cards have high RMSE and the right side of the graph corresponds to more young cards

unique salmon Apr 27, 2025, 8:21 PM

#

Also, how do you calculate RMSE? It's super high

#

80% RMSE can't possibly be correct

cosmic hedge Apr 27, 2025, 8:23 PM

#

I made it ages ago idk 😂

#

Maybe you can see wherever i screwed up

unique salmon Apr 27, 2025, 8:34 PM

#

cosmic hedge Maybe you can see wherever i screwed up

I would if I could find the code that calculates RMSE in your repo, but I can't 🤔

#

```python
df["loss"] = df.apply(lambda df: (df["y"] - df["r"]) ** 2, axis=1)
```
Ah, I see. You're calculating it in a very different way, the "normal" way

#

I suggest you do this instead:

1) https://github.com/open-spaced-repetition/srs-benchmark/blob/main/result/FSRS-6-recency.jsonl
2) Fetch RMSE(bins) based on user id
3) Use that instead

cosmic hedge Apr 27, 2025, 8:38 PM

#

unique salmon `df["loss"] = df.apply(lambda df: (df["y"] - df["r"]) ** 2, axis=1)` Ah, I see. ...

square the root, find the mean. thats all i figured it was 😭

#

```python
    loss = df_filtered["loss"].mean() ** 0.5
```
i find the mean here

cosmic hedge Apr 27, 2025, 8:38 PM

#

unique salmon I suggest you to do this instead: 1) https://github.com/open-spaced-repetition/s...

how would i get the fatigue values from that?

unique salmon Apr 27, 2025, 8:39 PM

#

cosmic hedge square the root, find the mean. thats all i figured it was 😭

Well, that's how normal people do it 🤣
https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Metric
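
A minimal sketch of why the "normal" per-review RMSE comes out so high while a binned RMSE does not. Everything here is illustrative: `rmse_plain` and `rmse_bins` are hypothetical names, and binning by predicted R into equal-width bins is a stand-in assumption, not the exact binning that the linked wiki page defines for RMSE(bins).

```python
import numpy as np

def rmse_plain(y, p):
    """Per-review RMSE: compares each 0/1 outcome with its predicted R."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

def rmse_bins(y, p, n_bins=20):
    """Binned RMSE: compares average outcome with average prediction per bin.

    Assumption: bins are equal-width bins over predicted R. The real
    RMSE(bins) metric uses its own binning scheme.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    bin_ids = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        gap = y[mask].mean() - p[mask].mean()
        total += mask.sum() * gap ** 2  # weight each bin by its review count
    return float(np.sqrt(total / len(y)))

# A perfectly calibrated predictor: it says 90% and 9 out of 10 reviews pass.
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
predicted = [0.9] * 10
print(rmse_plain(outcomes, predicted))  # large, even though the prediction is right
print(rmse_bins(outcomes, predicted))   # close to zero
```

The per-review version penalizes the unavoidable randomness of individual reviews, so it stays large even for a well-calibrated model. The binned version only measures the gap between predicted and observed averages.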

unique salmon Apr 27, 2025, 8:39 PM

#

cosmic hedge how would i get the fatigue values from that?

You don't. I'm just telling you where to get RMSE

cosmic hedge Apr 27, 2025, 8:39 PM

#

I might look at it later.

unique salmon Apr 27, 2025, 8:40 PM

#

https://tenor.com/view/perfectlysplendid-goodanswer-clapping-hands-excited-gif-18827657

cosmic hedge Apr 27, 2025, 8:40 PM

#

i did it the "simple" way in SSE as well btw so idk where the discrepancy from that might emerge

unique salmon Apr 27, 2025, 8:41 PM

#

cosmic hedge i did it the "simple" way in SSE as well btw so idk where the discrepancy from t...

Well, then you definitely should remove it from the add-on

cosmic hedge Apr 27, 2025, 9:06 PM

#

unique salmon Well, then you **definitely** should remove it from the add-on

"bad graph" is good enough for me

#

no ones ever gonna see that XD

#

well if they do they've been warned

#

I'll do this before i do any weird fatigue stuff if i even do any weird fatigue stuff

unique salmon Apr 27, 2025, 9:13 PM

#

cosmic hedge I'll do this before i do any weird fatigue stuff if i even do any weird fatigue ...

Btw, please show the code so that I can verify that the integral is calculated as intended

#

For example, in Python it looks kinda like this
```python
sum_of_avg_r_over_a_year[today] = average_f_power_forgetting_curve(card_table[col["delta_t"]], card_table[col["delta_t"]] + 365, card_table[col["stability"]], DECAY).sum()
```

cosmic hedge Apr 27, 2025, 9:17 PM

#

unique salmon For example, in Python it looks kinda like this ` sum_of_avg_r_over_a_yea...

https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b561556ad34c706c16/src/optimal_retention.rs#L577
https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b561556ad34c706c16/src/optimal_retention.rs#L26
You can change these formulas where "cards" is an array of cards at the end of the simulation

#

if you pseudocode it or chatgpt it it might save me a job 😂

#

as in save me the entire job

unique salmon Apr 27, 2025, 9:22 PM

#

cosmic hedge https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b56...

Oh lord, Rust... 😭

```rust
            for i in 0..delta_t {
                memorized_cnt_per_day[last_date_index + i] +=
                    power_forgetting_curve(w, (pre_sim_days + i) as f32, last_stability);
            }
        }
```
This is the part that needs to be changed...I think.
Instead of using "instant" R from the forgetting curve, we need to use average R over some period of time, aka the integral thingy.
I'm guessing `pre_sim_days` is delta_t?

cosmic hedge Apr 27, 2025, 9:26 PM

#

unique salmon Oh lord, Rust... 😭 ```rust for i in 0..delta_t { ...

nope you need to change it here
https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b561556ad34c706c16/src/optimal_retention.rs#L577

unique salmon Apr 27, 2025, 9:26 PM

#

```python
import numpy as np

def average_f_power_forgetting_curve(t1, t2, s, decay):

    def integral_power_forgetting_curve(t, s, decay):
        factor = 0.9 ** (1 / decay) - 1
        return (s / (factor * (decay + 1))) * np.power((1 + factor * t / s), (decay + 1))

    # Calculate F(t2) - F(t1) where F is the antiderivative
    integral = integral_power_forgetting_curve(t2, s, decay) - integral_power_forgetting_curve(t1, s, decay)

    # Divide it by the difference in time to get the average
    return integral / (t2 - t1)
```
Port that to Rust (and add an assertion that t2 > t1).
Then do this:
```rust
            for i in 0..delta_t {
                memorized_cnt_per_day[last_date_index + i] +=
                    average_f_power_forgetting_curve(w, (pre_sim_days + i) as f32, (pre_sim_days + i + time_offset) as f32, last_stability);
            }
        }
```
`time_offset` is 1/2/3/5/10/50 years, except in days

cosmic hedge Apr 27, 2025, 9:26 PM

#

you can calculate R using the stabilities of the cards

unique salmon Apr 27, 2025, 9:27 PM

#

cosmic hedge nope you need to change it here https://github.com/open-spaced-repetition/fsrs-r...

But that calls simulate, so I'm changing simulate

cosmic hedge Apr 27, 2025, 9:27 PM

#

unique salmon But that calls `simulate`, so I'm changing `simulate`

you don't need to change simulate it gives you a list of the states of the cards at the end

#

you're going for retention in the future from the end of the simulation right?

unique salmon Apr 27, 2025, 9:28 PM

#

cosmic hedge you're going for retention in the future from the end of the simulation right?

Yep

unique salmon Apr 27, 2025, 9:28 PM

#

cosmic hedge you don't need to change simulate it gives you a list of the states of the cards...

In cards?

cosmic hedge Apr 27, 2025, 9:28 PM

#

unique salmon In `cards`?

yep

#

also we already have the forgetting curve
https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b561556ad34c706c16/src/optimal_retention.rs#L234-L238

unique salmon Apr 27, 2025, 9:30 PM

#

cosmic hedge also we already have the forgetting curve https://github.com/open-spaced-repetit...

Yeah, but we need the integral

cosmic hedge Apr 27, 2025, 9:31 PM

#

unique salmon Yeah, but we need the integral

ah that's true

unique salmon Apr 27, 2025, 9:31 PM

#

https://github.com/ankitects/anki/issues/3926#issuecomment-2833639346
Btw, Dae approved "health check", but said that if it has a diagram, it should be embedded in deck options itself, rather than in a pop up

cosmic hedge Apr 27, 2025, 9:31 PM

#

unique salmon Yeah, but we need the integral

maybe do it with that function as a placeholder just until you get the maths worked out

unique salmon Apr 27, 2025, 9:32 PM

#

The math is worked out, just not ported to Rust

cosmic hedge Apr 27, 2025, 9:33 PM

#

unique salmon The math **is** worked out, just not ported to Rust

well then the placeholder will be easy to implement

polar maple Apr 27, 2025, 9:34 PM

#

unique salmon https://github.com/ankitects/anki/issues/3926#issuecomment-2833639346 Btw, Dae a...

any initial ideas for where we get the health check values from?

cosmic hedge Apr 27, 2025, 9:34 PM

#

polar maple any initial ideas for where we get the health check values from?

#1282005522513530952 message any thoughts on "loss vs trend line"

#

Idk if i like it but it might be ok XD

unique salmon Apr 27, 2025, 9:36 PM

#

polar maple any initial ideas for where we get the health check values from?

From the 10k benchmark, of course. Just have to bother Jarrett to implement the train set = test set version

cosmic hedge Apr 27, 2025, 9:38 PM

#

unique salmon From the 10k benchmark, of course. Just have to bother Jarrett to implement the ...

#1282005522513530952 message I'm assuming we would have to solve this problem 🤔

unique salmon Apr 27, 2025, 9:40 PM

#

cosmic hedge https://discord.com/channels/368267295601983490/1282005522513530952/136198620943...

We'll use RMSE, I'll make a sample size correction

polar maple Apr 27, 2025, 9:40 PM

#

hmm a problem with using average retention is that user A might have a bunch of cards at 25% and another bunch at 75% so it averages to 50%, and user B just has 50%, but user B's log loss will be expected to be higher just because that's the way it is
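
A small worked example of the point above, assuming both users have a perfectly calibrated predictor. Under that assumption the expected log loss at true retention p equals the binary entropy of p. The function name `expected_log_loss` is made up for this sketch.

```python
import math

def expected_log_loss(p):
    """Expected log loss of a perfectly calibrated prediction at retention p.

    This is the binary entropy of p, in nats.
    """
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# User A: half the cards at 25% retention, half at 75% (averages to 50%)
user_a = 0.5 * expected_log_loss(0.25) + 0.5 * expected_log_loss(0.75)

# User B: every card at 50% retention
user_b = expected_log_loss(0.50)

print(f"User A: {user_a:.4f}")
print(f"User B: {user_b:.4f}")
```

Both users have the same average retention, yet user B's expected log loss is higher even with a perfect model, so average retention alone cannot serve as the baseline for judging log loss.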

cosmic hedge Apr 27, 2025, 9:41 PM

#

polar maple hmm a problem with using average retention is that user A might have a bunch of ...

any ideas?

cosmic hedge Apr 27, 2025, 9:41 PM

#

unique salmon We'll use RMSE, I'll make a sample size correction

let me check if RMSE has this problem

polar maple Apr 27, 2025, 9:41 PM

#

i think a health check should be to check the difference between train/test scores based on something like the 5-way split

#

so if FSRS trained on the training set does not generalize well to the test set then this would be a problem that we can indicate to the user

unique salmon Apr 27, 2025, 9:41 PM

#

cosmic hedge let me check if RMSE has this problem

Log-loss is correlated with retention, RMSE (the FSRS kind) is correlated with n(reviews)

unique salmon Apr 27, 2025, 9:41 PM

#

polar maple i think a health check should be to check the difference between train/test scor...

No 5-way split in Anki

polar maple Apr 27, 2025, 9:42 PM

#

unique salmon No 5-way split in Anki

we should add it

unique salmon Apr 27, 2025, 9:42 PM

#

Nah

#

Screw it

polar maple Apr 27, 2025, 9:42 PM

#

the metrics are already meaningless without a train/test set

cosmic hedge Apr 27, 2025, 9:42 PM

#

unique salmon Log-loss is correlated with retention, RMSE (the FSRS kind) is correlated with n...

it does

polar maple Apr 27, 2025, 9:42 PM

#

i can write an algorithm that just memorizes the training data to get nearly perfect on the current metrics, yet this algorithm would be useless

unique salmon Apr 27, 2025, 9:42 PM

#

cosmic hedge it does

Are you sure you are using the right RMSE? From the .jsonl file?

cosmic hedge Apr 27, 2025, 9:43 PM

#

unique salmon Are you sure you are using the right RMSE? From the .jsonl file?

```python
x = [user[1]["true_retention"] for user in users]
y = [user[0]["metrics"]["RMSE"] for user in users]
```

unique salmon Apr 27, 2025, 9:43 PM

#

nope

#

Actually, we should remove that

#

It serves zero purpose

cosmic hedge Apr 27, 2025, 9:44 PM

#

unique salmon nope

what should i use?

unique salmon Apr 27, 2025, 9:44 PM

#

RMSE(bins)

cosmic hedge Apr 27, 2025, 9:44 PM

#

```python
fsrs_6 = load_jsonl("../srs-benchmark/result/FSRS-6.jsonl")
button_usage = load_jsonl("button_usage.jsonl")

users = list(zip(fsrs_6, button_usage))
```

cosmic hedge Apr 27, 2025, 9:44 PM

#

unique salmon RMSE(bins)

still a problem

unique salmon Apr 27, 2025, 9:45 PM

#

Well fuck me upside down and sideways then

polar maple Apr 27, 2025, 9:45 PM

#

@quasi shadow why no 5-way split in anki? Evaluate means nothing without a proper train/test split. If we have a train/test split then an idea for a health check would be to compare the metrics between the train set and the test set to directly evaluate for generalization

unique salmon Apr 27, 2025, 9:45 PM

#

polar maple @quasi shadow why no 5-way split in anki? Evaluate means nothing without...

Probably just for speed

#

Of Evaluate

polar maple Apr 27, 2025, 9:45 PM

#

we can make this tradeoff to make Evaluate actually mean something

unique salmon Apr 27, 2025, 9:46 PM

#

polar maple @quasi shadow why no 5-way split in anki? Evaluate means nothing without...

Also, if the health check is not based on the values that Evaluate actually displays, what's the point?

polar maple Apr 27, 2025, 9:46 PM

#

evaluate could include the train/test values, it doesn't have to remain as it is

unique salmon Apr 27, 2025, 9:47 PM

#

Oh yeah, let's include four values
Logloss (train)
RMSE bins (train)
Logloss (test)
RMSE bins (test)
Surely that will be less confusing and less information overload

#

Come on, man, we're not trying to make it good for data scientists, we're trying to make it good for the kind of person who thinks that "log-loss" means "lost reviews"

polar maple Apr 27, 2025, 9:48 PM

#

we can show only the test version

#

this makes the information actually accurate for once

unique salmon Apr 27, 2025, 9:49 PM

#

sigh
@quasi shadow do you want to implement the 5-way split in Anki?

#

Before the release of Anki 25.05

#

I have a feeling the answer is "no"

unique salmon Apr 27, 2025, 9:50 PM

#

cosmic hedge still a problem

I'll make a correction that depends on both retention and n(reviews)

polar maple Apr 27, 2025, 9:51 PM

#

imo it should be either remove Evaluate or add a 5-way split because right now the numbers shown on Evaluate are unreliable

cosmic hedge Apr 27, 2025, 9:51 PM

#

unique salmon I'll make a correction that depends on both retention and n(reviews)

have you looked at the notebook 😭 thats exactly what i'm trying to do

#

well you can build on it if you want

unique salmon Apr 27, 2025, 9:52 PM

#

back to this junk
```rust
    Ok(SimulationResult {
        memorized_cnt_per_day,
        review_cnt_per_day,
        learn_cnt_per_day,
        cost_per_day,
        correct_cnt_per_day,
        cards,
    })
}
```
I have no idea what cards look like, so I can't help here

cosmic hedge Apr 27, 2025, 9:52 PM

#

unique salmon I'll make a correction that depends on both retention and n(reviews)

here's the n(reviews) graph if you're interested

cosmic hedge Apr 27, 2025, 9:53 PM

#

unique salmon back to this junk ` Ok(SimulationResult { memorized_cnt_per_day, ...

https://github.com/open-spaced-repetition/fsrs-rs/blob/092c20bac7d9239a991ae5b561556ad34c706c16/src/optimal_retention.rs#L247-L257 card is here

unique salmon Apr 27, 2025, 9:54 PM

#

This is extremely awkward

I know how to implement the integral in Python
I don't know Rust
I don't know the simulator code very well
You don't know how to implement the integral in Python
You know Rust
You know the simulator code

unique salmon Apr 27, 2025, 9:54 PM

#

cosmic hedge here's the n(reviews) graph if you're interested

Is there some sort of smoothing?

cosmic hedge Apr 27, 2025, 9:54 PM

#

unique salmon Is there some sort of smoothing?

yeah

cosmic hedge Apr 27, 2025, 9:54 PM

#

unique salmon Is there some sort of smoothing?

w/o smoothing

unique salmon Apr 27, 2025, 9:55 PM

#

cosmic hedge w/o smoothing

Yep, that's more like what I've seen when plotting it 😄

#

`from statsmodels.nonparametric.smoothers_lowess import lowess`
Take this

#

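```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# sizes: n(reviews) per user, RMSE: RMSE(bins) per user (both 1-D NumPy arrays)
lowess_smooth = lowess(RMSE, sizes, it=3, frac=0.1, return_sorted=False)
lowess_smooth = np.asarray(lowess_smooth)

# Sort everything by size so the smoothed line is drawn left to right
sorter = np.argsort(sizes)
new_sizes = sizes[sorter]
RMSE = RMSE[sorter]
lowess_smooth = lowess_smooth[sorter]

plt.figure(figsize=(16, 8))
plt.scatter(new_sizes, RMSE, s=30, color="#1f77b4")
plt.plot(new_sizes, lowess_smooth, linewidth=5, label="LOWESS", color="darkorange")
```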

Something like this

#

sizes is n(reviews)

cosmic hedge Apr 27, 2025, 10:01 PM

#

unique salmon `from statsmodels.nonparametric.smoothers_lowess import lowess` Take this

there you go

#

i did this

```python
vals = lowess(x, y)
ax.plot([x[0] for x in vals], [x[1] for x in vals])
```

unique salmon Apr 27, 2025, 10:01 PM

#

cosmic hedge there you go

Plot both unsmoothed and smoothed data

#

Also, you got your axes wrong, lol

#

Axises

#

Erm, whatever

cosmic hedge Apr 27, 2025, 10:03 PM

#

unique salmon Also, you got your axes wrong, lol

yeah i did 😂

#

i'm just plotting what the function spits out though

#

and the shape looks right enough?

unique salmon Apr 27, 2025, 10:04 PM

#

cosmic hedge yeah i did 😂

https://tenor.com/view/u-wot-m8-gif-20998179

cosmic hedge Apr 27, 2025, 10:05 PM

#

unique salmon https://tenor.com/view/u-wot-m8-gif-20998179

oh yeah my axes were flipped XD

#

flipped colours sorry

unique salmon Apr 27, 2025, 10:05 PM

#

cosmic hedge oh yeah my axes were flipped XD

For unsmoothed data remove lines, leaving only dots
And bring the smoothed version to the front

unique salmon Apr 27, 2025, 10:06 PM

#

cosmic hedge flipped colours sorry

Yeah, just remove the lines for unsmoothed

#

Make them dots

cosmic hedge Apr 27, 2025, 10:06 PM

#

im feeling like luc gpt rn XD

unique salmon Apr 27, 2025, 10:06 PM

#

Good bot

cosmic hedge Apr 27, 2025, 10:06 PM

#

#

dots too big XD

unique salmon Apr 27, 2025, 10:07 PM

#

cosmic hedge

Make the LOWESS line thick and orange

#

And add these settings to lowess: it=3, frac=0.1

#

And just to make it look a little better
```python
plt.ylim([0, max(RMSE) * 1.025])
plt.xlim([0, max(sizes) * 1.025])
```

cosmic hedge Apr 27, 2025, 10:09 PM

#

unique salmon And just to make it look a little better ` plt.ylim([0, max(RMSE) * 1.025]) ...

cosmic hedge Apr 27, 2025, 10:10 PM

#

unique salmon And just to make it look a little better ` plt.ylim([0, max(RMSE) * 1.025]) ...

i think matplotlib pads it automatically

unique salmon Apr 27, 2025, 10:10 PM

#

cosmic hedge

https://tenor.com/view/yippee-happy-celebration-joy-confetti-gif-25557730

unique salmon Apr 27, 2025, 10:10 PM

#

unique salmon And add these settings to lowess: `it=3, frac=0.1`

Try these settings

cosmic hedge Apr 27, 2025, 10:11 PM

#

unique salmon https://tenor.com/view/yippee-happy-celebration-joy-confetti-gif-25557730

i wonder if this is what chatgpt feels like when i'm done with it

cosmic hedge Apr 27, 2025, 10:11 PM

#

unique salmon Try these settings

that is with those settings

unique salmon Apr 27, 2025, 10:11 PM

#

Ah, ok

#

Anyway, here's what Gemini spat out

📎 message.txt

#

For the integral

#

```python
def average_f_power_forgetting_curve(t1, t2, s, decay):

    def integral_power_forgetting_curve(t, s, decay):
        factor = 0.9 ** (1 / decay) - 1
        return (s / (factor * (decay + 1))) * np.power((1 + factor * t / s), (decay + 1))

    # Calculate F(t2) - F(t1) where F is the antiderivative
    integral = integral_power_forgetting_curve(t2, s, decay) - integral_power_forgetting_curve(t1, s, decay)

    # Divide it by the difference in time to get the average
    return integral / (t2 - t1)
```

This is like 6 lines of code

#

God damn Gemini

cosmic hedge Apr 27, 2025, 10:13 PM

#

unique salmon Anyway, here's what Gemini spat out

- comments this would also be like 6 lines of code XD

unique salmon Apr 27, 2025, 10:14 PM

#

```rust
use ndarray::Array1;

pub fn average_f_power_forgetting_curve(
    t1: &Array1<f64>,
    t2: &Array1<f64>,
    s: &Array1<f64>,
    decay: f64,
) -> Array1<f64> {
    let factor = 0.9_f64.powf(1.0 / decay) - 1.0;
    let exp = decay + 1.0;
    let den_factor = factor * exp;

    // Closure equivalent to the inner integral function
    let integral_calc = |t: &Array1<f64>| -> Array1<f64> {
        // Performs element-wise: (s / den_factor) * (1.0 + factor * t / s).powf(exp)
        (&s / den_factor) * (1.0 + factor * t / s).mapv(|base| base.powf(exp))
    };

    // Calculate integral difference and divide by time difference element-wise
    (integral_calc(t2) - integral_calc(t1)) / (t2 - t1)
}
```

cosmic hedge Apr 27, 2025, 10:15 PM

#

unique salmon ```rust use ndarray::Array1; pub fn average_f_power_forgetting_curve( t1: &...

yeah i can probably use this

#

not rn but provided it works

unique salmon Apr 27, 2025, 10:16 PM

#

You can verify it by trying t2 that is only slightly larger than t1, like 100.0001 and 100.0. It should give you a value close to the original forgetting curve

#

If you plug t1 into the forgetting curve function
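
The narrow-interval check described above can be sketched like this. The values of `s` and `decay` are illustrative, and `power_forgetting_curve` is written in the form implied by the antiderivative posted earlier, which is an assumption about the exact function in use.

```python
import numpy as np

def power_forgetting_curve(t, s, decay):
    # Instant R after t days; by construction R = 0.9 when t == s
    factor = 0.9 ** (1 / decay) - 1
    return (1 + factor * t / s) ** decay

def average_f_power_forgetting_curve(t1, t2, s, decay):
    assert np.all(t2 > t1), "t2 must be greater than t1"
    factor = 0.9 ** (1 / decay) - 1

    def antiderivative(t):
        return (s / (factor * (decay + 1))) * (1 + factor * t / s) ** (decay + 1)

    # Average value of R over [t1, t2]
    return (antiderivative(t2) - antiderivative(t1)) / (t2 - t1)

s, decay = 50.0, -0.2  # illustrative values
instant = power_forgetting_curve(100.0, s, decay)
narrow = average_f_power_forgetting_curve(100.0, 100.0001, s, decay)
print(f"R(100) = {instant:.8f}")
print(f"average R over [100, 100.0001] = {narrow:.8f}")
```

If the two printed numbers differ by more than rounding noise, the port of the antiderivative is probably wrong.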

#

I did this in Python

```python
integral_avg = average_f_power_forgetting_curve(t1, t2, s, decay)
print(f'Average R within the [t1, t2] range: {integral_avg:5f}')

# Brute force check that the integral version is correct
n_values = 500_000  # number of data points between t1 and t2 to be used for averaging
t_range = np.linspace(t1, t2, n_values)
r_values = power_forgetting_curve(t_range, s, decay)
brute_force_avg = np.mean(r_values)
print(f'Brute force calculation of average R within the [t1, t2] range: {brute_force_avg:5f}')
print(f'Brute force calculation agrees with integral calculation: {abs(brute_force_avg - integral_avg) < 1e-7}')
```

#

Just brute-force calculated the average of 500k points between t1 and t2

cosmic hedge Apr 27, 2025, 10:19 PM

#

I'll try just forgetting curve it into the future as well

#

seems like a proxy for maximising stability though

unique salmon Apr 27, 2025, 10:21 PM

#

cosmic hedge I'll try just forgetting curve it into the future as well

?

cosmic hedge Apr 27, 2025, 10:22 PM

#

unique salmon ?

just the non-integral version

unique salmon Apr 27, 2025, 10:22 PM

#

But that just gives you 70%

cosmic hedge Apr 27, 2025, 10:22 PM

#

maximise the number of cards memorised in a year's time, as if you stopped reviewing on the last day of the simulation

#

like that

unique salmon Apr 27, 2025, 10:23 PM

#

Ah

#

Nah, just use the integral

#

I specifically made it to calculate average R over time without brute-force calculating the average using a loop and a ton of data points

cosmic hedge Apr 27, 2025, 10:24 PM

#

unique salmon ```rust use ndarray::Array1; pub fn average_f_power_forgetting_curve( t1: &...

so t1 is an array of the last day that the cards were reviewed and t2 is an array of say [365]?

#

as in the days in the future to measure?

unique salmon Apr 27, 2025, 10:25 PM

#

cosmic hedge so t1 is an array of the last day that the cards were reviewed and t2 is an arra...

t1 = how many days have passed since this card's last review by the time the simulation ended
t2 = how many days have passed since this card's last review by the time the simulation ended + 365
Or + whatever number

#

t2 > t1
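
A sketch of how t1 and t2 would be built per card at the end of a simulation. The variable names (`sim_end_day`, `last_review_day`) and all numbers are hypothetical and are not taken from the simulator code.

```python
import numpy as np

def average_f_power_forgetting_curve(t1, t2, s, decay):
    assert np.all(t2 > t1), "t2 must be greater than t1"
    factor = 0.9 ** (1 / decay) - 1

    def antiderivative(t):
        return (s / (factor * (decay + 1))) * np.power(1 + factor * t / s, decay + 1)

    return (antiderivative(t2) - antiderivative(t1)) / (t2 - t1)

sim_end_day = 365.0                                 # day the simulation ended
last_review_day = np.array([360.0, 300.0, 100.0])   # last review of each card
stability = np.array([30.0, 90.0, 400.0])           # stability of each card
offset = 365.0                                      # look one year ahead

t1 = sim_end_day - last_review_day  # days since last review at simulation end
t2 = t1 + offset                    # same, one year later

avg_r = average_f_power_forgetting_curve(t1, t2, stability, decay=-0.2)
print(avg_r)        # average R per card over the following year
print(avg_r.sum())  # expected number of cards remembered, averaged over the year
```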

#

Btw, I have no idea what the hell is going on here
```rust
let integral_calc = |t: &Array1<f64>| -> Array1<f64> {
    // Performs element-wise: (s / den_factor) * (1.0 + factor * t / s).powf(exp)
    (&s / den_factor) * (1.0 + factor * t / s).mapv(|base| base.powf(exp))
};
```

#

So just by looking at it, I can't tell if Gemini messed up

cosmic hedge Apr 27, 2025, 10:27 PM

#

unique salmon t1 = how many days have passed since this card's last review by the time the sim...

so no fixed end date, just + 365 from when the cards were reviewed?

#

surely then it could be simplified like this

```rust
    (integral_calc(365)) / 365
```
?

unique salmon Apr 27, 2025, 10:28 PM

#

cosmic hedge so no fixed end date, just + 365 from when the cards were reviewed?

Not quite
To calculate R using the forgetting curve, you need t1: time since the last review
To calculate average R using the integral, you need t1: time since the last review, and t2: time since the last review + offset

bold terrace Apr 27, 2025, 10:28 PM

#

btw since the reverse power curve gets super flat quite quickly, the integral looks like a linear function

cosmic hedge Apr 27, 2025, 10:29 PM

#

unique salmon Not quite To calculate R using the forgetting curve, you need t1: time since the...

yeah but offsets constant right?

unique salmon Apr 27, 2025, 10:29 PM

#

Yep

#

You can rewrite it as t1 and t1+offset, if you want

#

Instead of t2 and t1

cosmic hedge Apr 27, 2025, 10:31 PM

#

unique salmon You can rewrite it as t1 and t1+offset, if you want

what im trying to say is you could then simplify away the t1?

#

to just offset?

unique salmon Apr 27, 2025, 10:31 PM

#

If t1=0, then it's as if the card has been reviewed just now, but that's not necessarily the case

cosmic hedge Apr 27, 2025, 10:31 PM

#

ahh right yeah

bold terrace Apr 27, 2025, 10:31 PM

#

#

I mean this is the integral function for S=5 from t=1 to 365 with decay -.2

#

Maybe using f(x)=x as approx of integral is good enough

unique salmon Apr 27, 2025, 10:32 PM

#

At this point it's genuinely simpler to implement the integral than to try to approximate it for no reason

#

It's not even slow or anything

#

Like, it shouldn't make CMRR much slower

bold terrace Apr 27, 2025, 10:33 PM

#

It adds complexity for a glorified f(x)=x

#

unique salmon Apr 27, 2025, 10:33 PM

#

cosmic hedge ahh right yeah

For example, if the card was reviewed 10 days ago, then t1=10 and t2=10+365

bold terrace Apr 27, 2025, 10:33 PM

#

Maybe I'm a bit mean

#

it's not necessarily f(x)=x

#

#

quite close to f(x)=x/2

unique salmon Apr 27, 2025, 10:36 PM

#

Man, leave the integral, honestly

#

There is no reason to try to approximate it

#

Even if it makes it 10 milliseconds faster, the simulations themselves take x1,000,000 more time

#

The bottleneck is the simulation, not the final calculation

#

Like, it's genuinely one minute vs 10 milliseconds or something

#

Hold on, let me time the integral

bold terrace Apr 27, 2025, 10:39 PM

#

It's not about making it faster but simpler

#

I think that was your main focus

unique salmon Apr 27, 2025, 10:39 PM

#

7 microseconds
And this is with shit-ass Python

#

Well, tbf, this is for one card

#

But still

unique salmon Apr 27, 2025, 10:40 PM

#

bold terrace It's not about making it faster but simpler

It's 6 lines of code, brah

bold terrace Apr 27, 2025, 10:40 PM

#

Mine is 1, 10 char

#

f(stability)=x/2+stability

#

#

Close enough 😆

#

Sorry

#

don't want to ruin your fun

#

but the double standard is excellent

#

"People don't care about simplicity"

#

"Let's introduce an average integral ... to approximate f(s)=s"

unique salmon Apr 27, 2025, 10:41 PM

#

Simplicity of the UI and simplicity of the math are very different things

#

Come on

bold terrace Apr 27, 2025, 10:41 PM

#

Well at least UI you can move it, the math you need to maintain it

unique salmon Apr 27, 2025, 10:42 PM

#

As long as the user sees simple UI, it doesn't matter what kind of horrors beyond human comprehension are happening in the backend

bold terrace Apr 27, 2025, 10:42 PM

#

😆

#

One day you'll understand IT

unique salmon Apr 27, 2025, 10:42 PM

#

And we already have the monstrosity that is the simulator

#

So I really don't see your point

#

Like, I could see advocating for simplicity before the simulator was implemented in Anki, but it's a bit too late to worry about simplicity now

bold terrace Apr 27, 2025, 10:43 PM

#

And since the function is even more gentle than a f(s)=s, I'm really curious how it will help with the CMRR

#

We saw yesterday sqrt(S) was too gentle

unique salmon Apr 27, 2025, 10:44 PM

#

bold terrace And since the function is even more gentle than a f(s)=s, I'm really curious how...

We'll have to wait for Luc

bold terrace Apr 27, 2025, 10:44 PM

#

f(S) was only strong enough when decay was high enough

unique salmon Apr 27, 2025, 10:44 PM

#

If it doesn't help, then screw CMRR, I guess

bold terrace Apr 27, 2025, 10:44 PM

#

With the new decay, I think the weight should take the decay into account in some way

#

Or not

unique salmon Apr 27, 2025, 10:44 PM

#

Btw @cosmic hedge enable loss aversion for CMRR, the time(Again)*2.5 thingy
NOT for the simulator, ONLY for CMRR

bold terrace Apr 27, 2025, 10:45 PM

#

But then it will be normal that the returned DR is the lowest bound, since in practice users seem to never go below a certain R

#

But we'll see !

#

Have to sleep, I'll dream about f(x)=x/2

cosmic hedge Apr 27, 2025, 10:46 PM

#

unique salmon Btw @cosmic hedge enable loss aversion for CMRR, the time(Again)*2.5 thi...

loss aversion is just gone now

unique salmon Apr 27, 2025, 10:47 PM

#

cosmic hedge loss aversion is just gone now

Just jam *2.5 somewhere in there 🤣

#

Idk, find time(Again)

cosmic hedge Apr 27, 2025, 11:14 PM

#

unique salmon Just jam *2.5 somewhere in there 🤣

i do not need to XD
You're lucky I apparently fail to value my own time...
its 0.94 for every deck I've tried it on btw

#

on that note i'm done for the night i still have cards to do XD

unique salmon Apr 27, 2025, 11:25 PM

#

cosmic hedge i do not need to XD You're lucky I apparently fail to value my own time... its 0...

So what you're saying is that we've got the opposite problem now 🤣

unique salmon Apr 27, 2025, 11:25 PM

#

cosmic hedge on that note i'm done for the night i still have cards to do XD

Thank you for your work

#

I highly recommend verifying that the integral works as intended via brute-force averaging, like here #1282005522513530952 message

cosmic hedge Apr 27, 2025, 11:31 PM

#

unique salmon I highly recommend verifying that the integral works as intended via brute-force...

```rust
pub struct Card {
    // "id" ignored by "simulate", used purely for hook functions (can be all be 0 with no consequence).
    // new cards created by the simulation have negative id's so use positive ones.
    pub stability: f32,
    pub last_date: f32,
}

pub fn average_f_power_forgetting_curve(
    learn_span: usize,
    cards: &[Card],
    decay: f32,
) -> f32 {
    let factor = 0.9_f32.powf(1.0 / decay) - 1.0;
    let exp = decay + 1.0;
    let den_factor = factor * exp;

    // Closure equivalent to the inner integral function
    let integral_calc = |card: &Card| -> f32 {
        // Performs element-wise: (s / den_factor) * (1.0 + factor * t / s).powf(exp)
        let t1 = card.last_date - learn_span as f32;
        let t2 = t1 + 365.;
        (card.stability / den_factor) * (1.0 + factor * t2 / card.stability).powf(exp) -
        (card.stability / den_factor) * (1.0 + factor * t1 / card.stability).powf(exp)
    };

    // Calculate integral difference and divide by time difference element-wise
    cards.iter().map(integral_calc).sum::<f32>()
}

fn main() {
    let val = average_f_power_forgetting_curve(10, &vec![Card {stability: 5., last_date: 5.}], -0.2);
    assert_eq!(val, 10.);
}
```
This... explains it

#

if you have time to burn paste that code here https://play.rust-lang.org/?version=stable&mode=debug&edition=2024
edit: I forgot to add the - on the decay XD
edit2: t1 and t2 are the wrong way around

unique salmon Apr 28, 2025, 12:02 AM

#

The output of average_f_power_forgetting_curve should be between 0 and 1 btw

#

I don't see division by (t2-t1)

cosmic hedge Apr 28, 2025, 12:04 AM

#

unique salmon I don't see division by (t2-t1)

doesn't affect the minima

#

t2-t1 will always be 365 or whatever the offset is
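
A sketch of why skipping the division does not change which option wins, as long as the offset is the same for every card. The three candidate end states below are made-up numbers and the helper names are hypothetical. The division still matters if the value is meant to be read as a retention between 0 and 1.

```python
import numpy as np

DECAY = -0.2  # illustrative value
FACTOR = 0.9 ** (1 / DECAY) - 1

def antiderivative(t, s):
    return (s / (FACTOR * (DECAY + 1))) * (1 + FACTOR * t / s) ** (DECAY + 1)

def summed_integral(t1, s, offset):
    # Sum over cards of the integral of R from t1 to t1 + offset (no division)
    return float(np.sum(antiderivative(t1 + offset, s) - antiderivative(t1, s)))

def summed_average(t1, s, offset):
    # Same quantity divided by the constant offset: sum of average R per card
    return summed_integral(t1, s, offset) / offset

offset = 365.0
# Hypothetical end-of-simulation states: desired retention -> (t1, stability)
candidates = {
    0.80: (np.array([20.0, 40.0, 5.0]), np.array([60.0, 150.0, 30.0])),
    0.85: (np.array([15.0, 30.0, 4.0]), np.array([80.0, 200.0, 45.0])),
    0.90: (np.array([10.0, 20.0, 3.0]), np.array([70.0, 180.0, 40.0])),
}

by_integral = max(candidates, key=lambda dr: summed_integral(*candidates[dr], offset))
by_average = max(candidates, key=lambda dr: summed_average(*candidates[dr], offset))
print(by_integral, by_average)  # the same candidate wins either way
```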

rotund summit Apr 28, 2025, 2:20 AM

#

in the absence of a functional CMRR are there any other tools we can use to find safe minimum DRs without manually gauging how our daily load/time spent changes/increases as I try incrementally decreasing my DR?

quasi shadow Apr 28, 2025, 2:43 AM

#

polar maple @quasi shadow why no 5-way split in anki? Evaluate means nothing without...

train/test split only makes sense when evaluating different models.

#

The current implementation only evaluates FSRS itself.

#

The train/test split could tell us the generalization capability among different models or ablation variants.

#

But when we only evaluate one model, it's not very helpful.

#

If we implement 5-way split, we will have five sets of parameters optimized on different training sets.

#

And they are all different from the parameters which the user actually uses.

#

What can we derive from the evaluation result with 5-way split?

#

And we have recency weighting. Should the 5-way split consider it?

cursive badge Apr 28, 2025, 2:54 AM

#

Wouldn't we need to have a consistent train/test split if we want an untainted Evaluate?
e.g. card.id mod 10 == 0 are never trained on, only used for evaluation.
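
A minimal sketch of that fixed holdout, assuming each review log entry is a dict with a `card_id` key (the data layout and function name are made up for illustration, not Anki's actual revlog schema):

```python
def split_by_card_id(revlogs, holdout_modulus=10):
    """Send every review of a card whose id is divisible by holdout_modulus to the test set."""
    train, test = [], []
    for entry in revlogs:
        if entry["card_id"] % holdout_modulus == 0:
            test.append(entry)   # never trained on, only used for evaluation
        else:
            train.append(entry)
    return train, test
```

Because membership depends only on the card id, a card stays on the same side of the split as the collection grows.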

quasi shadow Apr 28, 2025, 2:56 AM

#

Then FSRS cannot learn from these cards, and its accuracy would decrease on these cards.

cursive badge Apr 28, 2025, 2:57 AM

#

It might end up with worse actual results, but I cannot see how else you would have comparable numbers for a "health check".

polar maple Apr 28, 2025, 2:59 AM

#

quasi shadow A train/test split only makes sense when evaluating different models.

not true, train/test shows generalization performance, this is standard in data science

quasi shadow Apr 28, 2025, 2:59 AM

#

IMO, we can use the current metrics to predict the future metrics.

polar maple Apr 28, 2025, 2:59 AM

#

quasi shadow And we have recency weighting. Should the 5-way split consider it?

yes

quasi shadow Apr 28, 2025, 3:00 AM

#

polar maple not true, train/test shows generalization performance, this is standard in data ...

Of course I know it's standard. But I only care about it when I write papers.

polar maple Apr 28, 2025, 3:01 AM

#

hmm not sure what to say to that, if you purposefully want shoddy data science then be my guest

quasi shadow Apr 28, 2025, 3:02 AM

#

please answer my questions above

#

my professor didn't teach me about that

polar maple Apr 28, 2025, 3:03 AM

#

quasi shadow If we implement 5-way split, we will have five sets of parameters optimized on d...

the health check could be an FSRS training-in-progress check: for example, we can train on the first 4/5 of the revlogs and evaluate on the last 1/5 as a health check, and then just before finalizing the parameters we run another epoch where we include the last 1/5 so that we have full coverage
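
A rough sketch of that flow, under loud assumptions: reviews are dicts with `timestamp` and `recalled` keys, and the "model" is a toy stand-in that just predicts the mean recall rate, not FSRS. The final step here simply refits on all data instead of running one extra epoch.

```python
def chronological_holdout(revlogs, train_fraction=0.8):
    """Oldest reviews go to train, the most recent ones to test."""
    ordered = sorted(revlogs, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit(reviews):
    # Toy stand-in for the optimizer: one "parameter", the mean recall rate.
    return sum(r["recalled"] for r in reviews) / len(reviews)


def rmse(prediction, reviews):
    # Plain RMSE of a constant prediction against 0/1 outcomes.
    return (sum((r["recalled"] - prediction) ** 2 for r in reviews) / len(reviews)) ** 0.5


def health_check_then_finalize(revlogs, train_fraction=0.8):
    train, test = chronological_holdout(revlogs, train_fraction)
    if not train or not test:
        raise ValueError("not enough reviews to split")
    check_rmse = rmse(fit(train), test)  # health check on unseen, most recent data
    final_params = fit(train + test)     # final parameters use the full history
    return check_rmse, final_params
```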

cursive badge Apr 28, 2025, 3:04 AM

#

It feels like one of the biggest reoccurring problems with SRS is we are so starved for data 😅. We really need the mega AI to come along that is trained on such a stupid amount of data that it works well without much per-user data.

quasi shadow Apr 28, 2025, 3:05 AM

#

But the first set of parameters cannot stand for the final one. The health check only represents the health of the first set of parameters.

#

If it could stand for the final one, the final one could also stand for the future one.

cursive badge Apr 28, 2025, 3:06 AM

#

polar maple the health check could be an FSRS training-in-progress check: for example, we can train ...

If the test split is the last 1/5 couldn't that also really harm the training. I thought there was a very noticeable improvement with recency weighting.

quasi shadow Apr 28, 2025, 3:06 AM

#

If so, why not just evaluate the final one?

polar maple Apr 28, 2025, 3:07 AM

#

this is only a compromise so that all the data is used, the alternatives are to just use the first 4/5 parameters as the final set, or remove evaluate altogether because we cannot actually say anything about the performance on unseen data

#

e.g. even those big LLMs that you see going around do not train on the final test set before deployment, probably

cursive badge Apr 28, 2025, 3:07 AM

#

cursive badge If the test split is the last 1/5 couldn't that also really harm the training. I...

That's why I was suggesting something like card.id mod 10 == 0

polar maple Apr 28, 2025, 3:08 AM

#

cursive badge That's why I was suggesting something like `card.id mod 10 == 0`

this technically would still leak information but its better than nothing

cursive badge Apr 28, 2025, 3:11 AM

#

Now I'm confused. How would that "leak information" if they were not used for training?

#

N.B. I'm not a big data science person 😅

#

(also part of my reasoning for that split is it would stay mostly consistent over time as you add cards etc.)

#

I suppose you could also just add data marking cards as a "test" card so they stay consistent 🤔

polar maple Apr 28, 2025, 3:15 AM

#

cursive badge Now I'm confused. How would that "leak information" if they were not used for tr...

there is leakage through time, say you're trying to predict the y value on this graph, you can just look at the training set to find nearby x values to predict a reasonable y value

#

this is s&p 500

#

so if you have a model that cheats this way, it would be useless for future prediction

#

but FSRS's goal is to predict the future, not fit the past

#

so its similar

cursive badge Apr 28, 2025, 3:19 AM

#

I might just be dumb but I still cannot see it. If you isolate the revlogs of entire cards I cannot see how the model can cheat other than overall trends (e.g. you performed badly one day because you were tired).

polar maple Apr 28, 2025, 3:22 AM

#

cursive badge I might just be dumb but I still cannot see it. If you isolate the revlogs of en...

that's why i say that leakage is still possible, that's one of the ways it could be abused

#

but the leakage might be small enough that the metrics are still reliable, given that we are limited to parameters that are trained on a subset of the full revlog

#

parameters trained on card.id mod 10 != 0 would likely perform better than on first 9/10 of the revlog

#

since it would include the most recent information as well

quasi shadow Apr 28, 2025, 3:33 AM

#

OK, just implement the same method used by the SRS Benchmark.

#

Then we can compare the evaluation result with SRS Benchmark's result.

#

Everything is solved.

polar maple Apr 28, 2025, 3:34 AM

#

another pro: now recency is expected to improve the metrics

quasi shadow Apr 28, 2025, 4:18 AM

#

https://github.com/open-spaced-repetition/fsrs-rs/pull/326

GitHub

Feat/evaluate FSRS via time series split by L-M-Sherlock · Pull Re...

#

Done

#

🤣 90% AI-generated

quasi shadow Apr 28, 2025, 4:29 AM

#

polar maple another pro: now recency is expected to improve the metrics

would you mind reviewing it?

quasi shadow Apr 28, 2025, 5:26 AM

#

btw, it doesn't evaluate any parameters.

#

It only evaluates the "optimization" with given dataset.

#

And it's slower than training.

quasi shadow Apr 28, 2025, 5:47 AM

#

quasi shadow btw, it doesn't evaluate any parameters.

It means changing the parameters will not affect the evaluation result.

polar maple Apr 28, 2025, 5:56 AM

#

quasi shadow would you mind reviewing it?

i'm not comfortable enough with rust to give a good review

polar maple Apr 28, 2025, 5:56 AM

#

quasi shadow It means changing the parameters will not affect the evaluation result.

i think this is fine, change parameters at your own risk and use tools outside of Anki to really tinker around with your parameters

quasi shadow Apr 28, 2025, 6:16 AM

#

wait...

#

I forgot that we use evaluate here.

#

Is it worth keeping the previous evaluate?

cosmic hedge Apr 28, 2025, 6:27 AM

#

quasi shadow Is it worth keeping the previous `evaluate`?

Is it possible that the train/test split evaluate loss goes up when the current evaluate's loss goes down? Because if that happens, it could appear as if the parameters got worse after optimising, right?

#

That check didn't exist at the start anyway, so it would be fine, but just saying.

quasi shadow Apr 28, 2025, 6:34 AM

#

😂 Fine. I don't want to modify too much code, so I keep it.

bold terrace Apr 28, 2025, 7:21 AM

#

Something else to keep in mind: compared to a case like predicting the S&P 500, here we have way more constraints on how the model can adapt its predictions :

There's no up and down. The question is "by how much do we move S/D up when getting a review right", or down otherwise.
By nature, we can expect memory not to be super volatile, like being "5 times more potent on certain days" and "only half the perf the other day". So while splitting train/test makes sense, it might not change much here

quasi shadow Apr 28, 2025, 7:25 AM

#

https://github.com/ankitects/anki/pull/3962

GitHub

Feat/evaluate FSRS with time series split by L-M-Sherlock · Pull R...

#

OK, the prep work has been done for the health check idea.

#

😎 I leave the rest of work to @unique salmon

lapis hearth Apr 28, 2025, 8:19 AM

#

If Anki gets a short term memory model, I believe I can rest in peace finally...

unique salmon Apr 28, 2025, 8:56 AM

#

cursive badge It feels like one of the biggest reoccurring problems with SRS is we are so star...

@polar maple release the thing

unique salmon Apr 28, 2025, 8:58 AM

#

quasi shadow It means changing the parameters will not affect the evaluation result.

Wut

#

Oh, yeah, I get it

#

Right now Evaluate evaluates a set of parameters, but that's not the case with the 5-way split

#

Wait, now people will complain that changing parameters doesn't change the numbers in Evaluate 😭

#

God, this is such a pain

#

It would be so much easier and less confusing to just remove Evaluate

severe storm Apr 28, 2025, 9:07 AM

#

@polar maple may I ask what was the architecture of the neural network you trained that had better loss than fsrs

unique salmon Apr 28, 2025, 9:08 AM

#

cursive badge It feels like one of the biggest reoccurring problems with SRS is we are so star...

Go kick Alex's butt so he releases his neural net, which is exactly what you described

unique salmon Apr 28, 2025, 9:08 AM

#

severe storm @polar maple may I ask what was the architecture of the neural network ...

https://github.com/BlinkDL/RWKV-LM

GitHub

GitHub - BlinkDL/RWKV-LM: RWKV (pronounced RwaKuv) is an RNN with g...

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it'...

severe storm Apr 28, 2025, 9:09 AM

#

thank you

quasi shadow Apr 28, 2025, 9:09 AM

#

unique salmon Right now Evaluate evaluates a set of parameters, but that's not the case with t...

It actually splits the dataset into 6 sets.
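
One way to read "6 sets": cut the chronologically ordered reviews into six chunks and run five folds, each training on everything before a chunk and testing on that chunk. The sketch below follows that expanding-window idea (similar in spirit to scikit-learn's `TimeSeriesSplit`). Whether the PR slices the data exactly like this is an assumption, and the function name is made up.

```python
def time_series_folds(n_items, n_splits=5):
    """Yield (train_indices, test_indices) pairs over items sorted oldest-first."""
    chunk = n_items // (n_splits + 1)  # n_splits folds need n_splits + 1 chunks
    if chunk == 0:
        raise ValueError("not enough items for the requested number of splits")
    first_test_start = n_items - n_splits * chunk
    for start in range(first_test_start, n_items, chunk):
        train = list(range(0, start))             # everything older than the test chunk
        test = list(range(start, start + chunk))  # the next chunk in time
        yield train, test
```

Each fold optimizes its own parameters on `train`, which is why the result no longer depends on the parameters currently saved in the deck options.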

unique salmon Apr 28, 2025, 9:10 AM

#

unique salmon Wait, now people will complain that changing parameters doesn't change the numbe...

What do we do about this?

quasi shadow Apr 28, 2025, 9:10 AM

#

unique salmon https://github.com/BlinkDL/RWKV-LM

FSRS's versioning will catch up RWKV's🤣

quasi shadow Apr 28, 2025, 9:11 AM

#

unique salmon What do we do about this?

As Alex said, we cannot evaluate the parameters with unseen data.

unique salmon Apr 28, 2025, 9:16 AM

#

So we're making Evaluate even more confusing to the average user...

#

https://youtu.be/86P8SnEnF0g?si=A86xXwogbsq3OXho

quasi shadow Apr 28, 2025, 10:00 AM

#

For the sake of data science!

unique salmon Apr 28, 2025, 10:25 AM

#

cosmic hedge doesn't affect the minima

Fair
So how is it going?

cosmic hedge Apr 28, 2025, 10:27 AM

#

unique salmon Fair So how is it going?

haven't touched it today yet.

#

if you have any other ideas lmk

unique salmon Apr 28, 2025, 10:28 AM

#

Nope, just the integral with various values of offset (or whatever you wanna call it)

unique salmon Apr 28, 2025, 10:40 AM

#

cosmic hedge if you have any other ideas lmk

Oh, yeah, and make a draft PR for the new diagram for Evaluate

#

Still trying to figure out how to correct for both retention and n(reviews), though. Can't use LOWESS on 2D data FeelsBadAnki

unique salmon Apr 28, 2025, 10:45 AM

#

cosmic hedge i think matplotlib pads it automatically

I recommend updating the graphs here: https://github.com/Luc-Mcgrady/anki-10k-notebooks
Set frac=0.4 for LOWESS, to make it smoother

#

Man, I'm just giving you tons of work 😅

unique salmon Apr 28, 2025, 11:04 AM

#

Here's how RMSE (bins) depends on n(reviews)

#

And here's how it depends on retention

#

Now the question is - how the hell do I make a correction based on both...

#

Here's a 3D plot just because why not

bold terrace Apr 28, 2025, 2:57 PM

#

Hey, what's the rationale behind the second learning step ? I see it seems to match that of "Again then Good", but a second learning step would be a "Good then Good", right ? For intervals like 82m, it might also mean the second learning step will be done the next day, which I thought was not that ideal with FSRS if you want to let FSRS control your intervals, right ?

unique salmon Apr 28, 2025, 2:57 PM

#

bold terrace Hey, what's the rationale behind the second learning step ? I see it seems to mat...

@quasi shadow

quasi shadow Apr 28, 2025, 3:00 PM

#

bold terrace Hey, what's the rationale behind the second learning step ? I see it seems to mat...

If your first rating is good, anki will use the 2nd step.

#

https://docs.ankiweb.net/deck-options.html?highlight=learning#learning-steps

Deck Options - Anki Manual

Anki's user manual. Anki is a flashcard program that makes learning easier.

bold terrace Apr 28, 2025, 3:01 PM

#

Sure, I know the meaning of having 8h as a 2nd step, I'm not entirely sure why FSRS figured out it should be 8h. It feels like it's the "Again then Good" stability, but I don't really see the logic behind it

#

I mean, getting to that 2nd learning step might be any of those cases. Could be an Again or a Good first, right ?

polar maple Apr 28, 2025, 3:05 PM

#

unique salmon Go kick Alex's butt so he releases his neural net, which is exactly what you des...

to be clear there are currently no plans to release this nn in anki

bold terrace Apr 28, 2025, 3:05 PM

#

Would make more sense to take : {Again} {Good Then Again} no ? To see if the card is not a difficult one to learn, instead of {Again} {Again then Good}

unique salmon Apr 28, 2025, 3:05 PM

#

polar maple to be clear there are currently no plans to release this nn in anki

Yeah, but I just want to see funny benchmark numbers

quasi shadow Apr 28, 2025, 3:08 PM

#

bold terrace Sure, I know the meaning of having 8h as a 2nd step, I'm not entirely sure why F...

Because if your first rating is again, anki will use the 1st step. Then you grade good, and anki will use the 2nd step.

quasi shadow Apr 28, 2025, 3:08 PM

#

bold terrace Would make more sense to take : {Again} {Good Then Again} no ? To see if the car...

Good Then Again?

#

You will use the first step for this case.

bold terrace Apr 28, 2025, 3:10 PM

#

quasi shadow Because if your first rating is again, anki will use the 1st step. Then you grad...

Yeah but if you press "Again", then I agree, the stability to use is the one for "Again", here 119s.
But if you press "Good", and want to know what could be a second learning step, you'd like to take the stability of "The cards that were Good, but unfortunately didn't make it at the second step", right ?

#

But I guess it's subjective

#

On your side I think your point is : "If on avg, the guy that does Again->Good has an 8h stab, then the second step should have the 8h stab". Whereas in my case I'm more like "If the guy that presses Good has a stab of 2.46 days, and when things go wrong (Good -> Again), they have only a stab of 11min, then let's put the second learning step at 11min to see if the card survives that stability"

#

My point being : If I take your logic, then "Good then Again" would also put him in the first learning step, so let's use that "11.25m" for a first learning step

#

Ideally those steps should be more like "What kind of succession of good reviews, with how much space between them, would give the user a DR %-age chance to reach a 1d interval (so he has the chance to succeed at it the next day with the desired DR)"

#

That's why the {Good Then Again} makes more sense to me : It's the stability of those cards that might look good on paper (first step was a success), but guess what, he got the 2nd step wrong .... Well, next time he gets it right in the first step, let's wait for that interval to see if he remembers it now

#

Good (8h) -> Good : Makes sense, it's less optimal than using {Good Stability}, but at least we're sure he's not in a "Good then Again" situation.
Again (2m) -> Good (8h) : Makes sense, we use {Again Stability} as the interval for the first one, and then the {Again Then Good} stability.
Good (8h) -> Again (2m) -> Good (8h) : the **8h doesn't make much sense here**, the "Good then Again (12m)" should be used, because the guy just did exactly that, finishing with an "Again then Good".

But yeah, the whole learning steps setup, if hardcoded in the deck options, lacks the flexibility to really use those values. And leaving it empty means most of the time not even having learning steps (if Good/Hard have ~ >.5 stability)

clever cargo Apr 28, 2025, 3:31 PM

#

why is it "Good (2m)"? doesnt it immediately go over to the second step

bold terrace Apr 28, 2025, 3:32 PM

#

clever cargo why is it "Good (2m)"? doesnt it immediately go over to the second step

Oops

#

Thanks

cosmic hedge Apr 28, 2025, 3:33 PM

#

bold terrace Hey, what's the rationale behind the second learning step ? I see it seems to mat...

https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/fc3f44ba93a7fdf586150e615efcf0a7ece5e625/stats.py#L277 heres the code if it helps

bold terrace Apr 28, 2025, 3:34 PM

#

But then yeah even the Good -> Again doesn't make sense. If we know Good -> Again stability is 12min, having to make the user wait 8h (The Again then Good) feels off

cosmic hedge Apr 28, 2025, 3:35 PM

#

decay = -0.5? i'm guessing thats on purpose

bold terrace Apr 28, 2025, 3:35 PM

#

TBH the most logical would maybe even :
First Step : {Again}
Second Step : math.min(AtG, GtA)

unique salmon Apr 28, 2025, 3:37 PM

#

cosmic hedge decay = -0.5? i'm guessing thats on purpose

Oops
@quasi shadow this line needs to be updated for FSRS-6: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/fc3f44ba93a7fdf586150e615efcf0a7ece5e625/stats.py#L255

GitHub

fsrs4anki-helper/stats.py at fc3f44ba93a7fdf586150e615efcf0a7ece5e6...

An Anki add-on that supports Postpone & Advance & Load Balance & Easy Days & Disperse Siblings & Flatten - open-spaced-repetition/fsrs4anki-helper

polar maple Apr 28, 2025, 3:38 PM

#

i think there is reason to believe that the current FSRS forgetting curve shape doesn't work well for short stabilities

cosmic hedge Apr 28, 2025, 3:38 PM

#

bold terrace TBH the most logical would maybe even : First Step : {Again} Second Step : math....

```js
ratings = {
    1: "again",
    2: "hard",
    3: "good",
    4: "again-then-good",
    5: "good-then-again",
    0: "lapse",
}

Math.min(stability[2] * 2 - stability[1], stability[3], stability[4])
```
It's the min of good and again-then-good
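
Transcribed into Python with named keys, the quoted expression reads as below. This is only a restatement of the one line above, not the add-on's actual function. The Again, Good and Again-then-Good values in the usage note come from this thread, while the Hard value is a made-up placeholder.

```python
def second_learning_step(stability):
    """Smallest of: 2 * hard - again, good, again-then-good (all in the same unit)."""
    return min(
        stability["hard"] * 2 - stability["again"],
        stability["good"],
        stability["again-then-good"],
    )
```

For example, with stabilities in seconds of again = 119, good = 2.46 days, again-then-good = 8 h and a hypothetical hard = 20000, the again-then-good value (28800 s, i.e. 8 h) is the smallest of the three, which matches the 8h second step being discussed.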

polar maple Apr 28, 2025, 3:38 PM

#

you need some level of near-instant forgetting built into the curve

unique salmon Apr 28, 2025, 3:39 PM

#

I experimented a bit with FSRS for short-term reviews and what helped was subtracting an "instability term" from S that decays with time, so it would be like S=S-I, where I=f(time)
The metrics were still shit though

#

But it helped make them less shit

#

It would look like this

polar maple Apr 28, 2025, 3:43 PM

#

👍

bold terrace Apr 28, 2025, 4:18 PM

#

cosmic hedge ```js ratings = { 1: "again", 2: "hard", 3: "good", ...

Nice find ! And of hard * 2 - Again, for whatever it means

bold terrace Apr 28, 2025, 4:20 PM

#

polar maple you need some level of near-instant forgetting built into the curve

I agree, most people just don't realise that "Having read something" doesn't even mean having it memorized in any kind of form, so as long as you treat those spam'ed Again like you would treat any other kind of review, with the current model, you'll get screwed

#

There's also the coin-flip knowledge : The knowledge that might get a very high Stability as long as you get lucky enough with your wacky knowledge

#

Example : What year was X born ? Let's say you hesitate between 1990 and 1980

unique salmon Apr 28, 2025, 4:22 PM

#

bold terrace I agree, most people just don't realise that "Having read something" doesn't even...

Here are forgetting curves from Alex's 2.7 million parameters neural net
Notice how they fall off very sharply initially

#

But then again, this isn't useful for FSRS since we're not trying to model what's going on during same-day reviews
It's just experimental evidence that a "proper" forgetting curve looks different from what most (~all) people imagine

bold terrace Apr 28, 2025, 4:23 PM

#

Let's say your model is built on 90% healthy cards, with good knowledge of them, not much ambiguity.
So now, for that coin-flip question, you might get it right 4 times in a row (1 chance in 16), get a stability of 4 months because you did "Good-Good-Good-Good"...

#

But guess what, you never really knew it

#

So sometimes "very low stability" might mean :

1) You never even memorized it a single time
2) You never had anything truly substantial to remember, you just remembered it was A or B and most of the time you can get it right

#

And then you complain about Anki minimum review interval to be 10min

#

(For case 1)

#

But for case 2 however (the flip coin question), I have no idea how you could detect that

#

maybe with 2.7 million parameters ?

#

😄

#

I have some ideas though

#

That ratio of performance drop

#

For example that word I always hesitate between "shuusei" or "shousei"

#

How does it translate ? Lapses with non-increasing performance

#

I think those kinds of engineered features, fed to a NN like yours @polar maple , could really detect those

robust hill Apr 28, 2025, 4:28 PM

#

when is it best to do the first few optimizations of a deck with new deck options

#

after 5 days or after passing the daily reviews size 5x over?

unique salmon Apr 28, 2025, 4:29 PM

#

robust hill when is it best to do the first few optimizations of a deck with new deck option...

Whenever you want

#

There's no "too early"

robust hill Apr 28, 2025, 4:29 PM

#

well im worried that

#

if i do it too early

#

it will cook me

bold terrace Apr 28, 2025, 4:29 PM

#

@polar maple : If you want to use those engineered features, most of the logic is here : https://github.com/JSchoreels/anki-addon-leechdetector/blob/main/leechdetector/leech_detector.py#L42-L62

lapis hearth Apr 28, 2025, 5:46 PM

#

i want to have an in review leech detector

unique salmon Apr 28, 2025, 8:46 PM

#

@cosmic hedge @quasi shadow Alright, this was a massive pain, but I made a function that approximates RMSE as f(n_reviews, retention)
def func(a, b, c, d, e, f, g, x, y): return a / (np.power(x, b) + g) + c / (np.power(y, d) + f) + e
x is n_reviews/1000, y is retention
List with parameters (a, b, c, d, e, f, g): [0.16398, 0.73318, 0.018426, 10.0, 0.0, 0.35881, 1.6193]
(e ended up being 0, so we can remove it)
Two notes:

1) I divided n_reviews by 1000, just because
2) retention is calculated including same-day reviews

So now we can calculate normalized RMSE - take the user's real RMSE and divide it by the output of this function. If the ratio is >1, then the user's RMSE is greater than what would be expected for his retention and n(reviews). If the ratio is <1, then it's lower than what would be expected. Then we can calculate the percentiles of this normalized RMSE, and finally use that as cutoff values for the health check.
1st percentile of norm. RMSE=0.29912
50th percentile of norm. RMSE=0.9972
90th percentile of norm. RMSE=1.8269
99th percentile of norm. RMSE=3.3006
How to use this:

1) Take the user's normalized RMSE (which, again, is a ratio) and clamp it so that it's not lower than 0.29912 (1st percentile) and not higher than 3.3006 (99th percentile)
2) If it's <=50th percentile, display "Good" (green zone)
3) If it's between the 50th and 90th percentile, display "Acceptable" (yellow zone)
4) If it's >=90th percentile, display "Poor" (red zone)

One last issue - what do we do if it's in the red? The user will panic and be like "nyooo my fsrs is dying...". But I don't know what to display. If we display a list of possible reasons for poor performance, I doubt people will read it. The kind of person who misuses buttons or uses Anki in some dumb way is exactly the kind of person who won't read a list of possible explanations.
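
The procedure in the message above can be sketched in plain Python as follows. The formula, parameter values and percentile cutoffs are copied from the message; the function and constant names are made up, and `x ** b` stands in for `np.power(x, b)` so the sketch has no dependencies.

```python
# Fitted parameters (a, b, c, d, e, f, g) from the message above.
A, B, C, D, E, F, G = 0.16398, 0.73318, 0.018426, 10.0, 0.0, 0.35881, 1.6193

# Percentiles of normalized RMSE from the message above.
P1, P50, P90, P99 = 0.29912, 0.9972, 1.8269, 3.3006


def expected_rmse(n_reviews, retention):
    """RMSE expected for a user with this many reviews and this retention."""
    x = n_reviews / 1000  # the fit used n_reviews divided by 1000
    y = retention         # retention including same-day reviews
    return A / (x ** B + G) + C / (y ** D + F) + E


def normalized_rmse(rmse, n_reviews, retention):
    """Ratio of actual to expected RMSE, clamped to the 1st..99th percentile range."""
    ratio = rmse / expected_rmse(n_reviews, retention)
    return min(max(ratio, P1), P99)


def health_zone(ratio):
    if ratio <= P50:
        return "Good"        # green zone
    if ratio < P90:
        return "Acceptable"  # yellow zone
    return "Poor"            # red zone
```

Usage would be `health_zone(normalized_rmse(user_rmse, user_n_reviews, user_retention))`.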

polar maple Apr 28, 2025, 9:00 PM

#

unique salmon @cosmic hedge @quasi shadow Alright, this was a massive pain, bu...

i get that we are correcting for the correlation between RMSE and retention here but if we are doing this correction, why not just use log loss?

unique salmon Apr 28, 2025, 9:00 PM

#

polar maple i get that we are correcting for the correlation between RMSE and retention here...

It will require a correction too

#

In this case I don't think either of the two has any advantages, so we might as well flip a coin

polar maple Apr 28, 2025, 9:01 PM

#

ik, i thought originally you wanted to use RMSE because it didn't require a correction but we discovered yesterday that it also needs one

#

then in this case we might as well use log loss since it is a more accurate metric
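
For readers comparing the two metrics, here is a minimal sketch of both on per-review predictions. Note the assumption: this is plain RMSE, not the binned RMSE (bins) that FSRS reports, and the function names are illustrative.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under predicted recall probabilities."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or p = 1
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(outcomes)


def rmse(predictions, outcomes):
    """Root mean square error between predicted probabilities and 0/1 outcomes."""
    squared = sum((y - p) ** 2 for p, y in zip(predictions, outcomes))
    return math.sqrt(squared / len(outcomes))
```

Both depend on the user's retention, which is why either one needs a correction before being compared across users.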

cosmic hedge Apr 28, 2025, 9:18 PM

#

unique salmon @cosmic hedge @quasi shadow Alright, this was a massive pain, bu...

what does the distribution look like? (this graph)

unique salmon Apr 28, 2025, 9:18 PM

#

cosmic hedge what does the distribution look like? (this graph)

Of normalized RMSE?

cosmic hedge Apr 28, 2025, 9:18 PM

#

yeah

#

well of all the users - the RMSE normalising value(?) which is what i assume were going to be using?

#

how far is the average user from the "middle value" is what i'm trying to ask

#

ahh yeah you get the percentiles so it should probably be good anyway?

unique salmon Apr 28, 2025, 9:23 PM

#

cosmic hedge yeah

Black line is the median

unique salmon Apr 28, 2025, 9:24 PM

#

cosmic hedge ahh yeah you get the percentiles so it should probably be good anyway?

Yep, I described the procedure in the second half of my earlier message with the normalized RMSE function

unique salmon Apr 28, 2025, 10:18 PM

#

@cosmic hedge where integral

#

I need to know if we're axing CMRR or no

#

And Dae has been asking about it too

polar maple Apr 28, 2025, 10:28 PM

#

how about return 0.9 for CMRR

unique salmon Apr 28, 2025, 10:37 PM

#

Also, I have some things to say regarding the wording in Evaluate, but I guess it's a bit too early for that since Luc hasn't made a PR yet

#

I think we should rename it to "Evaluate FSRS" to make it clear that we are evaluating the algorithm, not the parameters
The text should say "Lower values of RMSE and log-loss indicate a better fit to your review history. The results do not depend on your current FSRS parameters". This makes it more clear what is going on

#

Still unsure what to do with people who will get "Poor"

#

I mentioned it on Github

bold terrace Apr 28, 2025, 10:48 PM

#

"The results do not depend on your current FSRS parameters" ?

#

I change my param, I press evaluate, the values change

unique salmon Apr 28, 2025, 10:48 PM

#

Not anymore 🙂

#

Alex and Jarrett want to follow The Path of The Data Scientist

tepid spoke Apr 28, 2025, 10:49 PM

#

That makes it kinda useless then, doesn't it? Like, it's to check how well your current parameters fit your collection.

bold terrace Apr 28, 2025, 10:49 PM

#

And in the road of the datascientist, the Evaluate would not Evaluate the parameters against the Test Sets ?

polar maple Apr 28, 2025, 10:49 PM

#

bold terrace And in the road of the datascientist, the Evaluate would not Evaluate the parame...

that's exactly what the new evaluate will do, however, i believe jarrett wants the parameters to still be trained on all data which leaves us with no Test set

#

so we will basically test FSRS on a train/test split, but the final parameters will still be from training on all of the available data

tepid spoke Apr 28, 2025, 10:50 PM

#

that's SUPER unintuitive, and better to just remove then

bold terrace Apr 28, 2025, 10:50 PM

#

Ok but in any case, you change the param, you potentially change the log loss no ?

#

If not, then the Evaluate use what params ?

polar maple Apr 28, 2025, 10:51 PM

#

bold terrace If not, then the Evaluate use what params ?

this is what Expertium is getting at, Evaluate will not evaluate on current parameters, rather it will evaluate how well the FSRS algorithm does as a whole

polar maple Apr 28, 2025, 10:51 PM

#

bold terrace Ok but in any case, you change the param, you potentially change the log loss no...

no change

unique salmon Apr 28, 2025, 10:52 PM

#

Yeah, the health check will be for FSRS as an algorithm, not for a specific set of params

cursive badge Apr 28, 2025, 10:52 PM

#

If Evaluate becomes a sort of self-test and doesn't look at your params I kind of want an "Automatic/Manual" toggle.

Automatic: you don't show (editable?) params but have an evaluate button.
Manual: you show params but don't have an evaluate button.

bold terrace Apr 28, 2025, 10:52 PM

#

Ok so if I understand correctly :
You press Evaluate -> It trains -> It gets params -> It evaluates on the test set ?

polar maple Apr 28, 2025, 10:52 PM

#

tepid spoke That makes it kinda useless then, doesn't it? Like, it's to check how well your ...

this has to do with a common problem in data science, just because a model fits training data well doesn't mean it will generalize well onto unseen data

bold terrace Apr 28, 2025, 10:52 PM

#

Why not, if params are present, skip the "train" and evaluate on the test set ?

bold terrace Apr 28, 2025, 10:53 PM

#

polar maple this has to do with a common problem in data science, just because a model fits ...

I understand that very well, I just don't understand why you can't evaluate params, either coming from an optimization or from your own hands

polar maple Apr 28, 2025, 10:53 PM

#

bold terrace I understand that very well, I just don't understand why you can't evaluate param...

i think that if you want to tinker with params in this way you should do it outside of anki

unique salmon Apr 28, 2025, 10:53 PM

#

Me waiting for Alex's course on machine learning

tepid spoke Apr 28, 2025, 10:53 PM

#

I just don't see what information I would gain from Evaluating FSRS as an algorithm against my deck.

unique salmon Apr 28, 2025, 10:53 PM

#

https://tenor.com/view/bill-hader-eating-popcorn-keith-morrison-dateline-snl-gif-25666169

bold terrace Apr 28, 2025, 10:53 PM

#

polar maple i think that if you want to tinker with params in this way you should do it outs...

Oh OK, I don't mind that

tepid spoke Apr 28, 2025, 10:54 PM

#

While with evaluating the params, I at least have SOME kind of idea if one set is better than the other, or one set is a complete miss somehow

bold terrace Apr 28, 2025, 10:54 PM

#

Shouldn't the params be hidden in some way ?

#

Would be extremely strange from a UX point of view that you could see and edit params, that they would have an effect on scheduling, but not on the Evaluate function

polar maple Apr 28, 2025, 10:55 PM

#

tepid spoke While with evaluating the params, I at least have SOME kind of idea if one set i...

i think this shouldn't be a problem, when you optimize parameters you only get new parameters if they are an improvement over the previous values in terms of training loss

bold terrace Apr 28, 2025, 10:56 PM

#

Or if shown, at least put in read-only

#

Having something that would alter scheduling but not evaluation feels extremely off

tepid spoke Apr 28, 2025, 10:56 PM

#

Why? That has zero merit. People who don't know what they do won't touch them anyway

#

And if you really WANT to modify them, you can anyway. You just made it more annoying.

bold terrace Apr 28, 2025, 10:57 PM

#

I'm thinking more about the people who know what the params are but wouldn't expect them to be used in one function (scheduling) and not in another 😆

unique salmon Apr 28, 2025, 10:58 PM

#

Just saying, all of this mess could have been avoided if people voted for removing Evaluate ¯_(ツ)_/¯
But we got 73% first-preference votes for the "health check"

#

https://tenor.com/view/impeachment-love-democracy-i-love-democracy-gif-15723806

tepid spoke Apr 28, 2025, 10:58 PM

#

Well, the vote with no word said that it would stop evaluating your parameters...

unique salmon Apr 28, 2025, 10:59 PM

#

I asked Jarrett "Do you want to implement train set = test set in the benchmark or the 5-way split in Anki?" and he chose the latter

bold terrace Apr 28, 2025, 10:59 PM

#

The choice of not linking Evaluate to the Parameters in the UI is just an arbitrary choice, not a real limitation

#

I just feel the whole topic is some kind of ego war more than anything

#

"I need to do the split I don't want to do? Then I won't let you evaluate it"

tepid spoke Apr 28, 2025, 10:59 PM

#

Evaluate seems entirely pointless if it does not give you a metric related to the current parameters.

cursive badge Apr 28, 2025, 10:59 PM

#

Well train=test in the benchmark is bad science.

bold terrace Apr 28, 2025, 10:59 PM

#

"You didn't want to hide Evaluate? I'll make it useless"

bold terrace Apr 28, 2025, 11:00 PM

#

cursive badge Well train=test in the benchmark is bad science.

Yeah I'm not arguing that

cursive badge Apr 28, 2025, 11:00 PM

#

(I meant Expertium's comment)

polar maple Apr 28, 2025, 11:00 PM

#

i think rossgb's card_id mod 10 == 0 might be a decent compromise

cursive badge Apr 28, 2025, 11:00 PM

#

I'm just slow 😅

unique salmon Apr 28, 2025, 11:01 PM

#

bold terrace "You didn't want to hide Evaluate? I'll make it useless"

Feel free to tell Jarrett to implement train = test in the benchmark as a new command or whatever

bold terrace Apr 28, 2025, 11:01 PM

#

I mean, let's just mark the cards used for training, and when the user clicks Optimize, it runs on the test set + all the new data

#

Re-optimized? Mark the new training set, and Evaluate will now work on All \ Training Set (everything except the training set)

unique salmon Apr 28, 2025, 11:02 PM

#

We need either train = test in the benchmark or the 5-way split in Anki. Either one will do

#

Because right now the numbers from Evaluate cannot be compared to the benchmark numbers

#

So either the benchmark has to be ankified or Anki has to be...benchmarkified

polar maple Apr 28, 2025, 11:02 PM

#

unique salmon So either the benchmark has to be ankified or Anki has to be...benchmarkified

the benchmark itself should never be changed in this way

unique salmon Apr 28, 2025, 11:03 PM

#

polar maple the benchmark itself should never be changed in this way

It could just be a separate command

cursive badge Apr 28, 2025, 11:03 PM

#

polar maple i think rossgb's `card_id mod 10 == 0` might be a decent compromise

I can see Jarrett's argument for the "self-test" version. A test split in normal use risks significantly worse performance when we have such a small dataset for an individual user.

unique salmon Apr 28, 2025, 11:03 PM

#

unique salmon It could just be a separate command

Then we can keep the current behavior of Evaluate

bold terrace Apr 28, 2025, 11:04 PM

#

cursive badge I can see Jarrett's argument for the "self-test" version. A test split in normal...

IMO this is not necessarily true. From my own anecdotal experience, once you reach ~10-20k reviews, the predictions don't change much anyway

#

Most people using anki for a year have way more than 10-20k

#

And youngsters should use default params for longer

#

I think there used to be a threshold you had to reach before being able to optimize

unique salmon Apr 28, 2025, 11:04 PM

#

bold terrace And youngsters should use default params for longer

Not THAT long. Not for 10k reviews

#

There's nothing wrong with optimizing much earlier

bold terrace Apr 28, 2025, 11:05 PM

#

Is there any threshold right now?

#

Or can you optimize with 10 reviews?

unique salmon Apr 28, 2025, 11:05 PM

#

Not really. Like 8 reviews for pretrain, 64 for full optimization

bold terrace Apr 28, 2025, 11:05 PM

#

IMO this is not great

unique salmon Apr 28, 2025, 11:05 PM

#

Except it has some wacky filters and whatnot

bold terrace Apr 28, 2025, 11:06 PM

#

when you see @tepid spoke's case, with 100d stability when he filters out cards with Hard/Easy as first review

unique salmon Apr 28, 2025, 11:06 PM

#

So you'll never figure out the exact number of reviews used for training

unique salmon Apr 28, 2025, 11:06 PM

#

bold terrace IMO this is not great

A long time ago another user helped me and Jarrett with it. We found that 64 reviews for full optimization is alright

#

Better than the defaults

polar maple Apr 28, 2025, 11:06 PM

#

do we have any performance metrics for low # of review collections?

bold terrace Apr 28, 2025, 11:06 PM

#

Ok, but maybe some rules like "At least a few Easy, Hard ..."?

tepid spoke Apr 28, 2025, 11:06 PM

#

hm? The 100.000 stability value appears when I filter out enough cards via "Ignore cards reviewed before"

bold terrace Apr 28, 2025, 11:07 PM

#

Because getting 100d stability just because certain cases were missing is a bit meh

unique salmon Apr 28, 2025, 11:07 PM

#

bold terrace Ok, but maybe some rules like "At least a few Easy, Hard ..."?

Mmmm, delicious two-button users' tears, yummers

#

https://tenor.com/view/homelander-the-boys-yummers-gif-730693117818784194

bold terrace Apr 28, 2025, 11:08 PM

#

Oh yes, I'm 99% using only again/good, but still thinking about the others

#

But I guess it's not that big an issue

#

Except if Anki uses FSRS by default and auto-optimizes with 30 reviews

#

But SM2 has the benefit of building a training set for FSRS lol

unique salmon Apr 28, 2025, 11:11 PM

#

Anyway, if anyone wants "Evaluate parameters" instead of "Evaluate FSRS", tell Jarrett to implement train = test in the benchmark

#

Because right now he's doing the exact opposite - implementing the 5-way split in Anki

#

Again, don't point fingers at me. I asked, he chose

polar maple Apr 28, 2025, 11:12 PM

#

if we implement card_id % 10 == 0 we can keep the current Evaluate and it wouldn't be so incorrect

bold terrace Apr 28, 2025, 11:12 PM

#

unique salmon Anyway, if anyone wants "Evaluate parameters" instead of "Evaluate FSRS", tell J...

Train = test is not related to "Evaluate parameters" vs "Evaluate FSRS". I don't know why this restriction would be there, @quasi shadow?

#

Why not just mark the cards that were used for training and keep Evaluate working on the test set?

polar maple Apr 28, 2025, 11:13 PM

#

@unique salmon btw to train on the test set it's pretty much just 1 line of code in other.py

unique salmon Apr 28, 2025, 11:14 PM

#

polar maple if we implement `card_id % 10 == 0` we can keep the current Evaluate and it woul...

That evaluates on ~10% of all cards then

polar maple Apr 28, 2025, 11:14 PM

#

unique salmon That evaluates on ~10% of all cards then

this is just a starting suggestion

#

is 10% too much? too little?

unique salmon Apr 28, 2025, 11:14 PM

#

Too little IMO

#

20-25% is good
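A minimal sketch of how the modulus maps to the held-out share, assuming the rule is "a card is a test card when `card_id % modulus == 0`". The helper names `is_test_card` and `held_out_fraction` are hypothetical and are not taken from Anki or the benchmark.

```python
def is_test_card(card_id: int, modulus: int = 10) -> bool:
    """Hypothetical rule: hold a card out when card_id % modulus == 0."""
    return card_id % modulus == 0


def held_out_fraction(card_ids: list[int], modulus: int) -> float:
    """Share of cards that the rule above would put into the test set."""
    held_out = sum(is_test_card(cid, modulus) for cid in card_ids)
    return held_out / len(card_ids)


# Evenly spread stand-in IDs. Real Anki card IDs are millisecond timestamps,
# so the actual share would only be approximately 1 / modulus.
card_ids = list(range(1, 1001))
for modulus in (10, 5, 4):
    print(modulus, held_out_fraction(card_ids, modulus))
```

Under this rule, a modulus of 4 or 5 would give the 20-25% range suggested above, and because the decision depends only on the card ID, the same cards stay held out across repeated runs.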

polar maple Apr 28, 2025, 11:15 PM

#

ok

unique salmon Apr 28, 2025, 11:15 PM

#

But the point is that we need some method Y that is used both in Anki and in the benchmark so that we gather data for the health check

#

If Anki uses method Y to calculate metrics and benchmark uses Z to calculate metrics, we can't do shit

unique salmon Apr 28, 2025, 11:16 PM

#

bold terrace Train=Test is not related to Evaluate Parameters=Evaluate FSRS, I don't know why...

This answers your question, I believe

cursive badge Apr 28, 2025, 11:17 PM

#

unique salmon But the point is that we need some method Y that is used both in Anki and in the...

Wouldn't the 5-way split benchmark values still be comparable even if we used the "mod card.id" in Anki? The issue is just that train=test ones are not.

unique salmon Apr 28, 2025, 11:17 PM

#

Whatever we do in Anki, we must also do in the benchmark so we can collect data that will be used to decide values for the health check

unique salmon Apr 28, 2025, 11:18 PM

#

cursive badge Wouldn't the 5-way split benchmark values still be comparable even if we used th...

Not quite
@polar maple I don't think it would?

cursive badge Apr 28, 2025, 11:19 PM

#

Something is going very weird with my keyboard. It is hard to type 😕

unique salmon Apr 28, 2025, 11:19 PM

#

Mod would choose cards randomly, whereas in the 5-way split they are not chosen randomly

cursive badge Apr 28, 2025, 11:24 PM

#

It might not be perfect. I was assuming that train=test could have very different values, but different methods that do not mix train and test would have similar RMSE/log loss.

#

My laptop keyboard is very unhappy.

polar maple Apr 28, 2025, 11:28 PM

#

cursive badge It might not be perfect. I was assuming that train=test could have very differen...

I think the mod 10 version would be expected to get a lower loss than a time split for a similar reason as the s&p500 example

#

for example if a new user does a bunch of new cards at day 1 and then reviews them at day 100, passing all of them

#

then a mod 2 split would easily get a zero loss

#

but splitting by time in half would get a high loss since it would basically be fsrs default params
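The scenario above can be sketched with a toy model. Everything here is an assumption made for illustration: the "model" is a single recall probability, and `DEFAULT_PREDICTION` stands in for "fsrs default params". It is not FSRS.

```python
import math

DEFAULT_PREDICTION = 0.9  # hypothetical stand-in for "default params"


def fit(train_rows):
    """Toy model: one recall probability learned from reviews that had a delay."""
    outcomes = [r["recalled"] for r in train_rows if r["elapsed"] > 0]
    return sum(outcomes) / len(outcomes) if outcomes else DEFAULT_PREDICTION


def log_loss(p, test_rows):
    p = min(max(p, 1e-7), 1 - 1e-7)  # clip to avoid log(0)
    scored = [r for r in test_rows if r["elapsed"] > 0]
    return sum(-math.log(p if r["recalled"] else 1 - p) for r in scored) / len(scored)


# 20 cards learned on day 1, all reviewed and passed on day 100.
rows = []
for card_id in range(20):
    rows.append({"card_id": card_id, "day": 1, "elapsed": 0, "recalled": 1})
    rows.append({"card_id": card_id, "day": 100, "elapsed": 99, "recalled": 1})

# Split by card ID parity ("mod 2"): the training half already contains
# day-100 passes, so the model predicts them almost perfectly.
mod_train = [r for r in rows if r["card_id"] % 2 == 1]
mod_test = [r for r in rows if r["card_id"] % 2 == 0]
mod_loss = log_loss(fit(mod_train), mod_test)

# Split by time: the first half holds only day-1 reviews, so there is
# nothing to learn from and the model falls back to the default.
by_time = sorted(rows, key=lambda r: r["day"])
half = len(by_time) // 2
time_loss = log_loss(fit(by_time[:half]), by_time[half:])

print(mod_loss, time_loss)
```

In this toy setup the mod split reports a loss near zero while the time split reports the loss of the default prediction, which is the gap described above.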

unique salmon Apr 28, 2025, 11:30 PM

#

Man, I'm just dreaming of a nice world where everyone voted to remove Evaluate...

#

https://tenor.com/view/unicorn-castle-rainbow-gif-20862503

polar maple Apr 28, 2025, 11:31 PM

#

my view is evaluate parameters is fine if we're not evaluating on the training set

unique salmon Apr 28, 2025, 11:33 PM

#

Well, I guess tomorrow we'll see Sound and Oromit debating with Jarrett

tepid spoke Apr 28, 2025, 11:33 PM

#

I do not have any kind of strong attachment to the Evaluate button

#

It just seems pointless to me if it doesn't evaluate the parameters

unique salmon Apr 28, 2025, 11:34 PM

#

Just Sound then 🤣

tepid spoke Apr 28, 2025, 11:34 PM

#

I did vote to keep it, since it's "nice enough to have"

#

but if it causes trouble like that, meh

unique salmon Apr 28, 2025, 11:34 PM

#

tepid spoke It just seems pointless to me if it doesn't evaluate the parameters

I think it's more useful to evaluate the algorithm itself, it's just that it's hard to do this in a way that isn't confusing as hell to the average user

tepid spoke Apr 28, 2025, 11:35 PM

#

The few times I used it was to check how the values looked for my manually tuned parameters

#

to make sure I didn't make a horrible mistake

cursive badge Apr 28, 2025, 11:35 PM

#

Maybe I'll dig out my Health Check PoC and see if I can create something I don't hate 😅

tepid spoke Apr 28, 2025, 11:35 PM

#

But that's such a niche case, it hardly matters

unique salmon Apr 28, 2025, 11:35 PM

#

Fair, in your case old Evaluate is more useful

tepid spoke Apr 28, 2025, 11:43 PM

#

With the Simulator being a thing now, it's a better cross-check anyway

quasi shadow Apr 29, 2025, 1:46 AM

#

unique salmon Well, I guess tomorrow we'll see Sound and Oromit debating with Jarrett

I don't want to debate. I'm convinced by Alex.

#

You cannot know the performance of any sets of parameters on unseen data (the future reviews) which is actually important in practice.

polar maple Apr 29, 2025, 5:28 AM

#

for a health check could we just compare FSRS-6 with adaptable params vs FSRS-6 with default params using the 5-way split? it seems that the proportion of users is significant (15.7% do better with default params)
https://github.com/open-spaced-repetition/srs-benchmark/blob/main/plots/Superiority-9999.png
@quasi shadow is FSRS-6 def params with the joint optimization params? if not then this value might be even higher than 15.7%

#

but this is exactly why a train/test split is important, likely 99.9% of users would have a better training loss with adaptable params but the actual benefit is not necessarily that high
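A rough sketch of that comparison, assuming the "5-way split" is a time-ordered, expanding-window split (an assumption, not confirmed here). The names `expanding_window_splits` and `health_check`, and the rule "healthy when optimized params beat defaults on average test loss", are hypothetical.

```python
def expanding_window_splits(n_reviews: int, n_splits: int = 5):
    """Yield (train_indices, test_indices) over chronologically ordered reviews."""
    fold = n_reviews // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last test fold absorbs any leftover reviews.
        test_end = n_reviews if k == n_splits else fold * (k + 1)
        yield list(range(train_end)), list(range(train_end, test_end))


def health_check(optimized_losses, default_losses) -> bool:
    """Hypothetical rule: compare mean test loss over the folds."""
    mean_optimized = sum(optimized_losses) / len(optimized_losses)
    mean_default = sum(default_losses) / len(default_losses)
    return mean_optimized < mean_default


splits = list(expanding_window_splits(12))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

The losses passed to `health_check` would be test losses, one per fold, so a parameter set that only looks good on its own training data would not pass.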

quasi shadow Apr 29, 2025, 6:05 AM

#

polar maple for a health check could we just compare FSRS-6 with adaptable params vs FSRS-6 ...

The FSRS-6 def params are the median params because we haven't found a method to generate reasonable default parameters.

quasi shadow Apr 29, 2025, 6:44 AM

#

polar maple for a health check could we just compare FSRS-6 with adaptable params vs FSRS-6 ...

I think it still doesn't solve the complaint.

#

Btw, what could we do if the health check's result is bad?

quasi shadow Apr 29, 2025, 6:48 AM

#

polar maple for a health check could we just compare FSRS-6 with adaptable params vs FSRS-6 ...

If I understand it correctly, we should use the default params if this kind of health check shows a bad result.

#

Even if the log loss of adaptable params is better than default params.

bold terrace Apr 29, 2025, 7:47 AM

#

unique salmon This answers your question, I believe

Hmmm, not really. Anki and the benchmark both using the same method (training/test) doesn't imply being unable to run Evaluate on parameters.
It's like saying that evaluate(optimize(data), test_data) implies that **evaluate(user_defined, test_data)** is not possible

quasi shadow Apr 29, 2025, 8:13 AM

#

bold terrace Hmmm not really. Anki and Benchmark using both the same method (Training/Test) d...

But is it beneficial to evaluate parameters?

#

If I understand it correctly, you want to know how well the user defined parameters perform on current data.

#

But the benchmarking method evaluates how well the optimization performs in the future.

#

Let's reframe this question: how well do the user-defined parameters perform? -> how well does the user's manual optimization perform?

#

If we want to compare the built-in optimization with the user's optimization fairly, the user should also not optimize the parameters based on the metrics from the test data.

#

But it's tricky.

#

It's tempting to optimize the parameters based on test data.

bold terrace Apr 29, 2025, 8:46 AM

#

quasi shadow Let's reframe this question: how well the user defined parameters perform? -> ho...

Ah OK, I now see a very good point that justifies your thinking. Here's what I think about it:

To be fully consistent, the user-defined parameters should indeed be evaluated against the same test set as the optimized ones. Otherwise users might feel they get better results with their own parameters when, in fact, the evaluation just ran on the training data, which is not great.

But if we mark the data that was used for training and only run Evaluate on the test set (excluding the training set), then both user-defined and optimized params will compete on the same "unseen" data.

#

**If doing such exclusion is difficult for now / not feasible within Anki**, then it would probably be better to slightly rework the menu to make clear that user-defined params are the user's responsibility. For example:

- Put a warning if the params have been changed by the user
- Instead of having an "Evaluate" button, just show the log loss/RMSE of the optimization, to make clear that it is computed only when optimizing and represents only the optimized parameters, not user-defined ones
- If we keep Evaluate for user-defined params and do not exclude the training set, show a warning that the user might get a better evaluation but it's "cheating", since there is no training/test split

unique salmon Apr 29, 2025, 9:12 AM

#

bold terrace **If doing such exclusion is difficult for now / not feasible within Anki**, the...

Evaluate button has to stay though

#

If we keep the current Evaluate behavior, then @quasi shadow needs to implement train = test in the benchmark. Alex said it would be easy

quasi shadow Apr 29, 2025, 10:10 AM

#

unique salmon If we keep the current Evaluate behavior, then @quasi shadow needs to im...

PR is welcome.

#

I’m training another model now.

unique salmon Apr 29, 2025, 10:49 AM

#

quasi shadow PR is welcome.

https://github.com/open-spaced-repetition/srs-benchmark/pull/210

GitHub

Train = test by Expertium · Pull Request #210 · open-spaced-repet...

#

@ashen light challenge accepted 😎

#

We'll see if Jarrett merges this PR or finds any issues

clever cargo Apr 29, 2025, 10:54 AM

#

unique salmon @ashen light challenge accepted 😎

extension to the challenge: merge it on gemini's review alone

quasi shadow Apr 29, 2025, 11:17 AM

#

unique salmon https://github.com/open-spaced-repetition/srs-benchmark/pull/210

I mean the result file should be included…

#

My device is busy.

unique salmon Apr 29, 2025, 11:54 AM

#

quasi shadow I mean the result file should be included…

Do you mean me running this code and giving you a .jsonl file?

quasi shadow Apr 29, 2025, 11:54 AM

#

yep, please include it in the PR

unique salmon Apr 29, 2025, 11:55 AM

#

Ok

#

🤣

#

welp

#

Hopefully that's easy to fix

#

Not this error again 😭

#

How many times have I gotten it...

#

If I could see dreams, I would see "not enough values to unpack" in my nightmares

#

@ashen light if AI wrote 60 lines of code and I wrote 1, does that count? 🤣

#

Alright, so now I have to run FSRS-6 on 10k users
See you guys in ~~30~~ 70 hours, lol

clever cargo Apr 29, 2025, 12:29 PM

#

unique salmon If I could see dreams, I would see "not enough values to unpack" in my nightmare...

you dont visually dream?

unique salmon Apr 29, 2025, 12:29 PM

#

clever cargo you dont visually dream?

Nope

clever cargo Apr 29, 2025, 12:29 PM

#

damn

#

is that what they call aphantasia or smth

unique salmon Apr 29, 2025, 12:30 PM

#

clever cargo is that what they call aphantasia or smth

No, that's inability to imagine things in your head

#

Like, imagine a spinning apple or something

#

I can do that

#

@wind palm @hasty fractal @cursive badge @cosmic hedge I hate to say this, but it's time to debate Evaluate. Again.
In order to implement the health check, the way the metrics are calculated has to be consistent between the benchmark and Anki, otherwise we can't collect the necessary data. Currently, that's not the case. There are 2 ways to fix this, both are technically doable:

1) Implement a training data/testing data split in Anki and instead of evaluating parameters, evaluate FSRS. What this means in practice is that the Evaluate numbers won't depend on your current FSRS parameters. This will also make Evaluate as slow as Optimize, since it has to do an optimization.
2) Implement train set = test set in the benchmark. Then I'll run FSRS-6 this way (I'm doing it right now, actually), and then we can keep the current behavior of Evaluate

So either Evaluate evaluates a specific set of parameters, like now, or it evaluates FSRS's ability to perform well on unseen data, like Jarrett and Alex want

#

Currently in favor of 1: Jarrett, Alex, Luc
Currently in favor of 2: Sound (kind of), Oromit
Currently in favor of "please god let's just get this over with just choose whichever": me

bold terrace Apr 29, 2025, 12:46 PM

#

3) Implement the train/test data split, but find a way to evaluate user-defined parameters in Anki (Evaluate) that would run only on the test set (all cards, excluding those marked as "trained_set" during Optimize), just like the result of Optimize

#

I'm for neither 1 nor 2

unique salmon Apr 29, 2025, 12:49 PM

#

bold terrace 3) Implement data/test data split, but find a way to evaluate user-defined param...

You can't. The whole point of the test set is that you do NOT optimize parameters on it. The moment you allow people to tweak parameters to see how it affects the metrics on the test set, it ceases to be a test set

bold terrace Apr 29, 2025, 12:52 PM

#

unique salmon You can't. The whole point of the test set is that you do NOT optimize parameter...

I'm not talking about optimizing the parameters on it, I'm talking about evaluating the cost function on it

#

Basically :
evaluate(optimize(train_set).parameters, test_set)
evaluate(user_defined.parameters, test_set)
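The two calls above can be sketched with a toy one-parameter model. This is an illustration only: the forgetting curve `0.9 ** (t / stability)`, the grid search in `optimize`, and the made-up review data are all assumptions, not FSRS or Anki code.

```python
import math


def predict(stability: float, elapsed_days: float) -> float:
    """Toy forgetting curve: 90% recall when elapsed_days == stability."""
    return 0.9 ** (elapsed_days / stability)


def evaluate(stability: float, reviews) -> float:
    """Mean log loss of the toy model on (elapsed_days, recalled) pairs."""
    total = 0.0
    for elapsed_days, recalled in reviews:
        p = min(max(predict(stability, elapsed_days), 1e-7), 1 - 1e-7)
        total += -math.log(p if recalled else 1 - p)
    return total / len(reviews)


def optimize(train_set) -> float:
    """Grid search over stability values from 0.5 to 100 days."""
    candidates = [0.5 * k for k in range(1, 201)]
    return min(candidates, key=lambda s: evaluate(s, train_set))


# Made-up reviews in chronological order: (elapsed_days, recalled).
reviews = [(1, 1), (3, 1), (7, 0), (7, 1), (14, 1),
           (14, 0), (30, 1), (30, 0), (30, 1), (60, 0)]
train_set, test_set = reviews[:8], reviews[8:]

optimized = optimize(train_set)
user_defined = 20.0  # picked by hand, without looking at test_set

print(evaluate(optimized, test_set))
print(evaluate(user_defined, test_set))
```

Both numbers are test losses only as long as `user_defined` was not chosen by watching `evaluate(..., test_set)`. Once someone tunes it against that output, the second call becomes a training loss in disguise.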

unique salmon Apr 29, 2025, 12:52 PM

#

bold terrace I'm not saying optimizing the parameters on it, I say evaluating the cost functi...

If manual tweaking is allowed, it still defeats the point

unique salmon Apr 29, 2025, 12:52 PM

#

bold terrace Basically : evaluate(optimize(train_set).parameters, test_set) evaluate(user_def...

Yes, that defeats the point of the test set

bold terrace Apr 29, 2025, 12:52 PM

#

Why though

unique salmon Apr 29, 2025, 12:53 PM

#

Test set is for evaluating how well the algorithm performs on data that it was not trained on

bold terrace Apr 29, 2025, 12:53 PM

#

Yep that's how it's used in

evaluate(optimize(train_set).parameters, test_set)
evaluate(user_defined.parameters, test_set)

unique salmon Apr 29, 2025, 12:53 PM

#

If you tweak the parameters to get lower logloss/RMSE, you "train" the algorithm on the test set

bold terrace Apr 29, 2025, 12:53 PM

#

Only to evaluate the output of the algorithm

bold terrace Apr 29, 2025, 12:54 PM

#

bold terrace **If doing such exclusion is difficult for now / not feasible within Anki**, the...

I addressed that point in the third bullet

#

Basically, if a user goes that far, he would also be able to just train his parameters on the full set 🤷, so he would be hacking his way around it anyway

#

At least with

evaluate(optimize(train_set).parameters, test_set)
evaluate(user_defined.parameters, test_set)

You let the user slightly tweak the optimized version, for example the initial stabilities, and see whether the result gets much worse

#

Of course, if he uses that door to train parameters on the whole set, shame on him

#

But by default it would not be the case

#

So the complexity comes down to: how do we make sure Evaluate doesn't cheat while parameters can still be modified for whatever reason? The answer is to hide the training set from it

#

I just imagine Dae's face with all those flags/parameters in the custom_data of the cards 😆

cosmic hedge Apr 29, 2025, 12:58 PM

#

unique salmon <@820710428081389599> <@1229405045674741790> <@347088848854974465> <@38806999266...

I'll """vote""" 1 #1282005522513530952 message because he's already decided.

unique salmon Apr 29, 2025, 12:58 PM

#

I guess we could add a stern warning to not tweak parameters manually. But then we'll have to implement a third method, like the mod card ID proposed by Alex, in both Anki AND the benchmark 😭

#

The third method of splitting data into train/test, I mean

bold terrace Apr 29, 2025, 12:59 PM

#

Shouldn't be that hard right

#

I mean there is certainly a part in the code where you divide the data into X / 1-X

#

you make it take a custom function "f_partition(card, card_index)", one would be based on "first 80%", and the other "mod N"
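A minimal sketch of that idea, assuming the partition function returns True for training cards. The names `PartitionFn`, `first_share`, `mod_n` and `split` are hypothetical and are not part of Anki, FSRS-rs or the benchmark.

```python
from typing import Callable, Sequence

# (card_id, card_index) -> True when the card belongs to the training set
PartitionFn = Callable[[int, int], bool]


def first_share(total_cards: int, share: float = 0.8) -> PartitionFn:
    """'First 80%' strategy: cards before the cutoff index are training cards."""
    cutoff = int(total_cards * share)
    return lambda card_id, card_index: card_index < cutoff


def mod_n(n: int = 5) -> PartitionFn:
    """'Mod N' strategy: cards whose ID is divisible by n are test cards."""
    return lambda card_id, card_index: card_id % n != 0


def split(card_ids: Sequence[int], f_partition: PartitionFn):
    """Apply one partition strategy and return (train, test) lists of card IDs."""
    train, test = [], []
    for index, card_id in enumerate(card_ids):
        (train if f_partition(card_id, index) else test).append(card_id)
    return train, test


card_ids = list(range(100, 110))
print(split(card_ids, first_share(len(card_ids))))
print(split(card_ids, mod_n(5)))
```

With this shape, the code that consumes the split does not need to know which strategy produced it, so Anki and the benchmark could share one strategy without changing anything downstream.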

unique salmon Apr 29, 2025, 1:00 PM

#

Jarrett already made his PR to Anki and FSRS-rs and I already made mine to the benchmark 🤣
God this is a mess

bold terrace Apr 29, 2025, 1:00 PM

#

You're a bit too intense, chill out and wait. No wonder Dae doesn't involve himself that much in those discussions

#

FSRS is already good enough, a few more days wont hurt

cosmic hedge Apr 29, 2025, 1:02 PM

#

2 buttons
"evaluate fsrs"
"evaluate current parameters" (buried somewhere)
god save us all

unique salmon Apr 29, 2025, 1:02 PM

#

https://tenor.com/view/office-no-gif-26506049

bold terrace Apr 29, 2025, 1:02 PM

#

cosmic hedge 2 buttons "evaluate fsrs" "evaluate current parameters" (buried somewhere) god ...

TBH not that bad of an idea

#

I mean

#

Maybe some people want to use the full training set

unique salmon Apr 29, 2025, 1:02 PM

#

That's confusing as hell

bold terrace Apr 29, 2025, 1:02 PM

#

Losing 20% of your reviews when you only have 100 is a lot

unique salmon Apr 29, 2025, 1:02 PM

#

Think of the average user trying to understand it

#

Serious question: why are we trying to expose more of FSRS rather than hiding it? Ideally, the only setting should be desired retention, that's it

bold terrace Apr 29, 2025, 1:03 PM

#

Or a switch: "Use a train/test split, or use the whole set for training? /!\ Using the whole set means the metrics no longer reflect the ability to predict unseen cards"

bold terrace Apr 29, 2025, 1:04 PM

#

unique salmon Serious question: why are we trying to *expose* more of FSRS rather than *hiding...

I mean, I'm not completely against removing parameters and Evaluate from the Deck Options screen, but people should be able to trust FSRS a bit more first

#

Given the gap between the ideal world of FSRS in the benchmark and what people observe, I don't blame people for not wanting to lose control

#

But I agree ideally it should be hidden

#

It's just not really mature enough for that

#

But anyway

#

Let the people actually coding decide 😆

#

And if I really want my

evaluate(optimize(train_set).parameters, test_set)
evaluate(user_defined.parameters, test_set)

I'll PR it in a few months 😆

ashen light Apr 29, 2025, 1:52 PM

#

unique salmon We'll see if Jarrett merges this PR or finds any issues

jarrett isn't you though

ashen light Apr 29, 2025, 1:53 PM

#

unique salmon @ashen light if AI wrote 60 lines of code and I wrote 1, does that coun...

I guess

unique salmon Apr 29, 2025, 1:55 PM

#

ashen light jarrett isn't you though

Yeah, but he would just approve my PR, without making changes

ashen light Apr 29, 2025, 1:58 PM

#

¯_(ツ)_/¯

#

I'm out of date on the latest FSRS meta, there's just too much talking

unique salmon Apr 29, 2025, 1:59 PM

#

ashen light I'm out of date on the latest FSRS meta, there's just too much talking

#1282005522513530952 message

ashen light Apr 29, 2025, 2:00 PM

#

I'm glad I'm no longer on the evaluate mailing list

#

but literally just stop talking about it till dae says it's reasonable, whatever it is yall are doing

unique salmon Apr 29, 2025, 2:01 PM

#

ashen light but literally just stop talking about it till dae says it's reasonable, whatever...

He already greenlit the "health check"

ashen light Apr 29, 2025, 2:01 PM

#

oh cool

unique salmon Apr 29, 2025, 2:06 PM

#

ashen light oh cool

But health check requires data, and data requires using the same method both in Anki and in the benchmark for calculating log loss and RMSE, so...here we are

ashen light Apr 29, 2025, 2:09 PM

#

well, have fun

polar maple Apr 29, 2025, 3:33 PM

#

@unique salmon
i hooked up FSRS-6 to optimize on the entire revlog for logloss & rmse (bins) separately but only on the reviews that are evaluated on in srs-benchmark,
logloss: https://pastebin.com/c9c1WniH
rmse (bins): https://pastebin.com/JCkZZZtA

unique salmon Apr 29, 2025, 3:34 PM

#

polar maple @unique salmon i hooked up FSRS-6 to optimize on the entire revlog for l...

Is this on each individual user or on their combined revlogs?

polar maple Apr 29, 2025, 3:34 PM

#

unique salmon Is this on each individual user or on their combined revlogs?

individual user and with no 5-way splits

#

it was to simulate how much choosing the best rmse (bins) params within anki might affect the result

#

but recently jarrett changed it to log loss

unique salmon Apr 29, 2025, 3:35 PM

#

I want you to try this on a combined revlog to see if the weirdness with parameters persists

polar maple Apr 29, 2025, 3:36 PM

#

how are the params weird?

unique salmon Apr 29, 2025, 3:36 PM

#

I mean w[16] becoming 1.0
I want to see if it persists with both loss functions

#

On a big combined revlog

polar maple Apr 29, 2025, 3:37 PM

#

we already know the answer for log loss

unique salmon Apr 29, 2025, 3:37 PM

#

polar maple we already know the answer for log loss

Time to try RMSE

polar maple Apr 29, 2025, 3:38 PM

#

for rmse (bins) even if its not 1.0 we still wouldn't use it

unique salmon Apr 29, 2025, 3:38 PM

#

polar maple for rmse (bins) even if its not 1.0 we still wouldn't use it

Why not? If those parameters outperform the current median parameters as default parameters, then why not?

polar maple Apr 29, 2025, 3:39 PM

#

unique salmon Why not? If those parameters outperform the current median parameters as default...

because rmse (bins) is a metric that shouldn't be directly optimized for

#

better to just fix w[16] to 1.5 and optimize logloss around it or something

unique salmon Apr 29, 2025, 3:40 PM

#

polar maple because rmse (bins) is a metric that shouldn't be directly optimized for

why

polar maple Apr 29, 2025, 3:40 PM

#

do i have to repeat my rants against rmse (bins)?

unique salmon Apr 29, 2025, 3:41 PM

#

https://tenor.com/view/ron-pearlman-the-goon-yes-yep-anchorman-gif-12449331

polar maple Apr 29, 2025, 3:43 PM

#

rmse (bins) does not push the model to predict as well as it can

polar maple Apr 29, 2025, 3:49 PM

#

bold terrace 3) Implement data/test data split, but find a way to evaluate user-defined param...

i would be fine with this, but you might get users who tweak their parameters a bit, see a huge drop in test loss, and then complain about the efficacy of FSRS

unique salmon Apr 29, 2025, 3:58 PM

#

polar maple i would be fine with this but you might get users who tweak their parameters a b...

We could add a "Do not tweak your FSRS parameters manually to try to bring the values of log loss and RMSE down" warning or something

polar maple Apr 29, 2025, 4:04 PM

#

also new parameters from an optimization are kept only if they improve on the training loss, it could still get worse on the test loss and evaluate would show a worse result

#

but if we insist that the new parameters should also do better on the test set then this is just training on the test set

unique salmon Apr 29, 2025, 4:10 PM

#

I feel like we're screwed either way

Evaluate evaluates parameters - people start tweaking parameters on the test set and then complaining that the optimizer is garbage because they have gotten "better" parameters manually
Evaluate evaluates FSRS - people start asking why changing their current parameters doesn't affect Evaluate

bold terrace Apr 29, 2025, 4:11 PM

#

polar maple i would be fine with this but you might get users who tweak their parameters a b...

But at least they could see that it's their tweaking that had the adverse effect 🙂

#

Another option : Only allow to change the initial stability parameters

unique salmon Apr 29, 2025, 4:15 PM

#

That's weird

bold terrace Apr 29, 2025, 4:15 PM

#

Would be interesting to see what people tweak

#

Personally I never tweaked anything, but I would imagine people only tweak initial stability?

#

Making the params read only would solve most issues I guess

#

If Evaluate is already a bit difficult to interpret, imagine those parameters !

#

Could be present in some "Get Troubleshooting parameters / logs" hidden somewhere

#

My whole point with evaluate(user_defined.parameters, test_set) was only about finding a way to keep parameter tweaking available for all users

#

Buuuut I don't personally think anyone should tweak them

#

and if they tweak initial stability, it's not that big a deal in terms of log loss/RMSE

quasi shadow Apr 29, 2025, 4:28 PM

#

unique salmon Why not? If those parameters outperform the current median parameters as default...

Because you can decrease the RMSE(bins) but increase the log loss at the same time.
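A toy illustration of that trade-off, as a sketch only. The binning below groups reviews by predicted probability, which is a simplification chosen for brevity (the RMSE(bins) used by FSRS is described on the wiki page about the metric and may bin differently). The helpers `log_loss` and `rmse_bins` and the sample data are made up for this example.

```python
import math

def log_loss(outcomes, predictions):
    """Mean negative log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(outcomes, predictions):
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(outcomes)

def rmse_bins(outcomes, predictions, n_bins=10):
    """Simplified binned RMSE: per bin, compare mean prediction with mean outcome."""
    bins = {}
    for y, p in zip(outcomes, predictions):
        key = min(int(p * n_bins), n_bins - 1)  # assumption: bin by prediction
        bins.setdefault(key, []).append((y, p))
    weighted = 0.0
    for items in bins.values():
        mean_y = sum(y for y, _ in items) / len(items)
        mean_p = sum(p for _, p in items) / len(items)
        weighted += len(items) * (mean_y - mean_p) ** 2
    return math.sqrt(weighted / len(outcomes))

# Made-up data: 5 easy cards (all recalled) and 5 hard cards (3 of 5 recalled).
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
constant = [0.8] * 10                      # same prediction for every review
discriminating = [0.95] * 5 + [0.55] * 5   # tells easy and hard cards apart

for name, preds in (("constant", constant), ("discriminating", discriminating)):
    print(name, round(log_loss(outcomes, preds), 4), round(rmse_bins(outcomes, preds), 4))
```

In this toy case the constant predictor gets a binned RMSE of essentially zero while having the higher log loss, so moving from the discriminating predictor to the constant one lowers RMSE(bins) and raises log loss at the same time.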

robust hill Apr 29, 2025, 5:05 PM

#

compute optimal retention be like: always 70% 🔥

unique salmon Apr 29, 2025, 6:10 PM

#

cosmic hedge 2 buttons "evaluate fsrs" "evaluate current parameters" (burried somewhere) god ...

Sorry for being annoying, but I really want to see the results of CMRR with the integral, with different values of offset or whatever you want to call it
We need to decide whether we're keeping CMRR or not

#

#1282005522513530952 message

wind palm Apr 29, 2025, 7:24 PM

#

unique salmon <@820710428081389599> <@1229405045674741790> <@347088848854974465> <@38806999266...

I don't understand how #1 would have any value for a user. If you're not evaluating their parameters, what's the point?
So I vote for #2.

[See also: I think the health-check seems silly, and I am dreading it being launched with insufficient testing, because it will be a support nightmare. So I won't be in favor of anything that would reduce the utility of Evaluate to make health-check work.]

unique salmon Apr 29, 2025, 7:24 PM

#

wind palm I don't understand how #1 would have any value for a user. If you're not evaluat...

@polar maple

robust hill Apr 29, 2025, 7:26 PM

#

the exam is in 50 days

#

i was thinking about it now

#

if i should make it a priority to have like

#

75% desired retention and try to crank out

#

1750 cards in the span of 10 days

#

because it takes around 10 seconds per card and i see them around 3.4 times to turn into young

#

almost like 16 hours of new cards
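The estimate can be checked with a few lines of arithmetic. The numbers are the ones quoted in the messages above; the variable names are just for this sketch.

```python
cards = 1750              # new cards to get through
seconds_per_review = 10   # roughly 10 seconds per card
reviews_per_card = 3.4    # views needed before a card turns young
days = 10

total_hours = cards * seconds_per_review * reviews_per_card / 3600
hours_per_day = total_hours / days
print(f"{total_hours:.1f} h in total, {hours_per_day:.2f} h per day")
```

That comes out at about 16.5 hours in total, or a little over an and a half hours per day across the 10 days, counting only the new cards and not the reviews that follow.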

#

also when exactly does cmrr give me more than 70%

#

😭

#

btw how is this graph linear?

you-dont-understand-retention-in-fsrs-v0-a2t3l6x9eh9e1.png

#

shouldnt it be exponential

#

ah nvm

unique salmon Apr 29, 2025, 7:32 PM

#

robust hill btw how is this graph linear?

It's not 100% linear
Also, it's different for different users with FSRS-6

robust hill Apr 29, 2025, 7:32 PM

#

can i make my own graph with my own parameters

#

i know there was a github link somewhere

unique salmon Apr 29, 2025, 7:32 PM

#

Nope

robust hill Apr 29, 2025, 7:32 PM

#

wasnt there one for this

#

the desired retention x workload

unique salmon Apr 29, 2025, 7:32 PM

#

Well, I can, but I never shared the code

unique salmon Apr 29, 2025, 7:32 PM

#

robust hill the desired retention x workload

That's a different one

robust hill Apr 29, 2025, 7:33 PM

#

is there a

#

average retrievability to desired retention workload

#

like

#

i want to see average retrievability to workload time

unique salmon Apr 29, 2025, 7:33 PM

#

https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/v5.3.3/fsrs4anki_optimizer.ipynb
But it doesn't support FSRS-6 yet

Google Colab

robust hill Apr 29, 2025, 7:33 PM

#

like this but for average retrievability

#

or not possible

unique salmon Apr 29, 2025, 7:34 PM

#

Yes, you can get a graph like that using the Google Colab optimizer, but again, for now it uses FSRS-5

#

And I don't see why you would want average R instead of DR on the x axis

robust hill Apr 29, 2025, 7:34 PM

#

just need it to explain something to a friend

#

do i follow the steps from the very beginning of the link

unique salmon Apr 29, 2025, 7:34 PM

#

Explain it using this graph, lol

robust hill Apr 29, 2025, 7:35 PM

#

unique salmon Apr 29, 2025, 7:35 PM

#

robust hill do i follow the steps from the very beginning of the link

Yeah, it spells everything out, just follow instructions

robust hill Apr 29, 2025, 7:35 PM

#

thank u

#

i am one of those special people who need step by step instructions for each step 🔥

polar maple Apr 29, 2025, 7:36 PM

#

wind palm I don't understand how #1 would have any value for a user. If you're not evaluat...

the problem is that the way we currently implement Evaluate is a big no-no in data science. Currently the parameters are evaluated on the same data that they were trained on, but the proper way would be to evaluate them on unseen data. You might be interested in Sound's 3) option

robust hill Apr 29, 2025, 7:40 PM

#

my question is

#

how do i choose the exact parameter i want it to make it

#

do i just have to upload the deck

#

instead of collection

unique salmon Apr 29, 2025, 7:40 PM

#

robust hill how do i choose the exact parameter i want it to make it

?

robust hill Apr 29, 2025, 7:40 PM

#

cause i sent in my collection

unique salmon Apr 29, 2025, 7:41 PM

#

You can upload a single deck, if that's what you're asking

robust hill Apr 29, 2025, 7:41 PM

#

well cause like my collection has a lot of deck options yk with different parameters

#

and i want to choose a specific deck option parameter

#

idk im trying to see

unique salmon Apr 29, 2025, 7:42 PM

#

robust hill and i want to choose a specific deck option parameter

I still don't know what you mean

robust hill Apr 29, 2025, 7:43 PM

#

i have a collection with 5 main deck options:
physiology
biochem
etc etc

#

i wanna see the graph for the physiology deck options only

#

not combination of everything

unique salmon Apr 29, 2025, 7:44 PM

#

Put all decks that have the "Physiology" preset into one big deck and export that

robust hill Apr 29, 2025, 7:44 PM

#

alright sounds good

#

#

my personal parameters 💀

wind palm Apr 29, 2025, 10:16 PM

#

polar maple the problem is that the way we currently implement Evaluate is a big no-no in da...

Hasn't it always been understood that this is bad science to a certain extent? We're asking FSRS to tell us how good of a job it is doing -- like a self-reflection grade. FSRS answers, "when I use this memory model [which I came up with by looking at your review history] on your review history, my predictions are wrong X% of the time." It's not good data science, but it is a good test of whether the model is matching the user (or at least whether FSRS thinks its model is matching the user).

Testing the user's parameters against another user's data seems less helpful. The answer back from FSRS would be, "when I use this memory model on someone else's review history, my predictions are wrong X% of the time." That seems like a measure of whether the model matches someone else. Why would a user care about that?

> You might be interested in Sound's 3) option
Is that this?

> Implement data/test data split, but find a way to evaluate user-defined parameters in Anki (Evaluate) that would run only on test set (All, excluding the cards marked as "trained_set" during the optimize), just like the result of optimize #1282005522513530952 message
Unfortunately, I have no idea what any of that means. 😅

bold terrace Apr 29, 2025, 10:37 PM

#

It means that whether or not the whole set is split into a training/test set, that is no reason not to evaluate user-defined parameters.

If the Evaluate button is doing : evaluate=evaluate(optimize(training_set).parameters, test_set), you can also do evaluate=evaluate(user_defined.parameters, test_set).

Of course, it means :

Having to store what was in the training set, to exclude it from Evaluate when done later.
Warning the user that he could get better log loss and/or RMSE by tweaking his parameters, but only because he would be breaking the whole "you train on the training set, you evaluate on the test set" rule
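The two calls above can be sketched with a deliberately tiny stand-in model. Everything here is an assumption for illustration: `optimize` and `evaluate` are toy functions (not the Anki or FSRS API), the "parameters" are a single constant recall probability, and the review outcomes are made up.

```python
import math

def optimize(reviews):
    """Toy optimizer: the constant recall probability that best fits these reviews."""
    mean = sum(reviews) / len(reviews)
    return min(max(mean, 0.01), 0.99)  # clamp so log() stays finite

def evaluate(p, reviews):
    """Log loss of the constant prediction p on 0/1 review outcomes."""
    return -sum(math.log(p) if y else math.log(1.0 - p) for y in reviews) / len(reviews)

training_set = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # made up, 80% recalled
test_set = [1, 1, 0, 1, 0]                     # made up, 60% recalled

optimized = optimize(training_set)   # fitted without looking at test_set
user_defined = 0.6                   # hand-tweaked while watching the test score

print("optimized on train, scored on train:", round(evaluate(optimized, training_set), 4))
print("optimized on train, scored on test: ", round(evaluate(optimized, test_set), 4))
print("user-defined, scored on test:       ", round(evaluate(user_defined, test_set), 4))
```

The hand-tweaked value scores better on the test set than the optimized one, but only because it was chosen with the test set in view, which is exactly what the warning would be about.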

bold terrace Apr 29, 2025, 10:43 PM

#

wind palm Hasn't it always been understood that this is bad science to a certain extent? W...

It still makes sense to split because without a split, the optimization might overfit what it sees but would fail miserably to generalize to new data. Maybe it's not that much of a problem with FSRS in the first place since the forgetting curve has well defined properties, but for any scheduling algorithm using things like neural networks, if you have enough parameters, you could get a very over-specific rule like (if the card has been reviewed 4 times, and the last one was on Saturday, the stability will be 6d), just because it saw one or two cards in that setup.

But if you split train/test for one, you need to split train/test for all, or you're not comparing models under the same set of rules.

#

For example, right now I trained and tested my parameters on one of my decks, and Evaluate gives:
Log loss: 0.4024, RMSE(bins): 3.15%. Smaller numbers indicate a better fit to your review history.

Now, I train it on a subset, my training set:
Log loss: 0.3512, RMSE(bins): 2.94%. Smaller numbers indicate a better fit to your review history.

I use it on my testing set :
Log loss: 0.4700, RMSE(bins): 12.68%. Smaller numbers indicate a better fit to your review history.
It performs much worse than the first result, which is a sign the first Evaluate was good only because the model trained on a non-representative class of cards

If my testing set had an optimization made on it directly, I could have gotten:
Log loss: 0.4138, RMSE(bins): 3.80%. Smaller numbers indicate a better fit to your review history.

So the difference between those 2 results shows that by optimizing and testing on the same set, I was able to get way better precision, but by cheating, since I know what I'll be tested on

unique salmon Apr 29, 2025, 10:52 PM

#

wind palm Hasn't it always been understood that this is bad science to a certain extent? W...

It's helpful though. I've made a function to predict RMSE given the number of reviews and average retention, and we can compare that approximate value to the real RMSE of the user to find out whether he is doing well for his "weight class"
It's like "For your height and weight, your blood pressure is pretty good", if that analogy helps
The real problem is: what do we tell users with "poor" "health"? If someone's RMSE is way higher than what is expected given their retention and n(reviews), what should Evaluate display?
Just "Poor"? Users will complain
A list of possible explanations and advice? Users won't read it and then will complain anyway

bold terrace Apr 29, 2025, 10:52 PM

#

(In this case I did the partitioning based on High/Low D so of course the diff is enormous, but if the partitioning is done smartly, like card.id % 10 or something, it should be hopefully less)

unique salmon Apr 29, 2025, 10:56 PM

#

Basically, Sound is saying "Let's use all cards whose ID ends with a zero for testing, the rest for training"
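A minimal sketch of that rule. The function names and the generated card IDs are hypothetical; only the "ID divisible by N goes to the test set" idea comes from the discussion.

```python
def is_test_card(card_id, n=10):
    """A card belongs to the test set when its ID is divisible by n (last digit 0 for n=10)."""
    return card_id % n == 0

def split_by_card_id(card_ids, n=10):
    """Deterministic split: no flag needs to be stored, the ID itself decides."""
    train = [c for c in card_ids if not is_test_card(c, n)]
    test = [c for c in card_ids if is_test_card(c, n)]
    return train, test

# Made-up IDs in the millisecond-timestamp style, 7 ms apart.
card_ids = [1675618557000 + i * 7 for i in range(1000)]
train_ids, test_ids = split_by_card_id(card_ids)
print(len(train_ids), "training cards,", len(test_ids), "test cards")
```

Because the rule depends only on the card ID, running it again later gives the same partition, so nothing extra has to be remembered between Optimize and Evaluate.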

bold terrace Apr 29, 2025, 10:57 PM

#

Having such a rule would also make it super easy to know what is part of the Training set and what's not, no need to flag 🤔

#

But I have no idea how the card fields are populated and if card.id mod N is enough to get well randomized partitions ...

unique salmon Apr 29, 2025, 10:58 PM

#

It's Unix timestamps

cursive badge Apr 29, 2025, 10:59 PM

#

I think card ID is the epoch ms it was created

unique salmon Apr 29, 2025, 10:59 PM

#

Yeah, it's milliseconds elapsed since 01.01.1970 or something

bold terrace Apr 29, 2025, 11:00 PM

#

maybe some chaos_function(card.id) mod N would be better

unique salmon Apr 29, 2025, 11:00 PM

#

The last digit is as good as random

bold terrace Apr 29, 2025, 11:00 PM

#

I guess yes

cursive badge Apr 29, 2025, 11:00 PM

#

It doesn't guarantee you have exactly x% sets, I just gave it as an example of one way to get a "stable" test set.

bold terrace Apr 29, 2025, 11:01 PM

#

I know they say to not create random function based on epoch because if you loop when generating those, you'll get very obvious patterns based on CPU cycles. But here we're talking human creation

unique salmon Apr 29, 2025, 11:01 PM

#

bold terrace maybe some `chaos_function(card.id) mod N` would be better

Nah, I don't think there are any patterns at the millisecond scale

cursive badge Apr 29, 2025, 11:01 PM

#

bold terrace I know they say to not create random function based on epoch because if you loop...

There may be a pattern because of multiple cards being generated at the same time from a single note.

#

Or from bulk importing notes/cards

bold terrace Apr 29, 2025, 11:04 PM

#

#

At least to me it looks pretty good

#

SELECT id % 20 AS mod_result, COUNT(*)
FROM cards GROUP BY mod_result

In SQL querier

tepid spoke Apr 29, 2025, 11:05 PM

#

hm, I wonder what the card IDs for my deck look like

#

Cause it initially came to life as import from a CSV file

bold terrace Apr 29, 2025, 11:05 PM

#

You could test it yes

#

Would be interesting to see

tepid spoke Apr 29, 2025, 11:05 PM

#

that import took less than a second

bold terrace Apr 29, 2025, 11:06 PM

#

50% of my cards are also one-shot imported though

#

But it's in millis, so even an import would normally be spread evenly

cursive badge Apr 29, 2025, 11:07 PM

#

I think you would have to be unlucky for the import loop to match up with the n you choose , but it could be possible.

bold terrace Apr 29, 2025, 11:07 PM

#

Yeah, with the 10k training set we could check whether any collection really diverges too much

tepid spoke Apr 29, 2025, 11:07 PM

#

1675618557059
1675618557215
1675618559833
1675618567127
1675618567137

#

are some example card IDs

#

so did it just count up when collisions happened?

#

The card IDs are in perfectly ascending order with the WaniKani sort ID

bold terrace Apr 29, 2025, 11:10 PM

#

sqlite3 ~/Library/Application\ Support/Anki2/User\ 1/collection.anki2 "SELECT id % 20 AS mod_result, COUNT(*) FROM cards GROUP BY mod_result;"

#

(The "User\ 1" part might need to be adapted obviously, or the path altogether)

unique salmon Apr 29, 2025, 11:11 PM

#

wind palm Hasn't it always been understood that this is bad science to a certain extent? W...

For the health check we have to compare the user's metrics to the values of other users, one way or another, to determine whether this user is doing relatively well or not. There is no absolute standard. Like, you can't say whether 5% RMSE with FSRS-6 is good or not without knowing the values for a ton of users

#

I think you want an absolute standard, not a relative one

#

You want a standard that does not depend on data from other users

#

But we can't do that. Well, we can, but it would be arbitrary
We could just say "RMSE above 10% is bad", without looking at RMSE from lots of users, but that would be kinda dumb

bold terrace Apr 29, 2025, 11:14 PM

#

LOL

#

I asked GPT

#

For a chaos function

#

he gave me

#

SELECT abs((id * 2654435761) % 4294967296) % 20 AS chaos_mod, COUNT(*) FROM cards GROUP BY chaos_mod;

#

Result ?

#

#

What the hell went wrong there 😆
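The result itself is not shown here, so the cause can only be guessed. One plausible suspect is the multiplication: a millisecond card ID times 2654435761 is far larger than a signed 64-bit integer can hold, so the product may not survive as an exact integer inside the query. As a hedged sketch, the same multiplicative hash can be tried in Python, where integers do not overflow. The function name and the sample IDs are made up; the constant and the two moduli are taken from the query above.

```python
from collections import Counter

MULTIPLIER = 2654435761   # constant from the query above
MODULUS = 4294967296      # 2**32, also from the query

def chaos_bucket(card_id, n=20):
    """Multiplicative hash of the card ID, reduced to one of n buckets."""
    return (card_id * MULTIPLIER) % MODULUS % n

# Made-up card IDs, one per millisecond.
sample_ids = [1675618557059 + i for i in range(2000)]
counts = Counter(chaos_bucket(c) for c in sample_ids)
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

Whether this spreads cards any better than plain `id % 20` would still have to be checked on real collections.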

unique salmon Apr 29, 2025, 11:15 PM

#

Google "hash function"

#

That's what you're looking for

bold terrace Apr 29, 2025, 11:15 PM

#

Are hashes strictly speaking chaos functions?

#

but it's true here I merely want those to be distributed

cursive badge Apr 29, 2025, 11:16 PM

#

Not intrinsically. But there must be some good, fast, uniform ones used for hash tables.

#

Cryptographic hash functions would probably be bad because they are deliberately slow.

bold terrace Apr 29, 2025, 11:18 PM

#

Boah

#

Might not be necessary

#

I see there's none builtin in sqlite

#

#

SELECT id % 20 AS mod_result, COUNT(*) AS count, ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM cards), 2) AS percent, ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM cards) - 5.0, 2) AS percent_deviation FROM cards GROUP BY mod_result;

#

For SQL, GPT is quite useful 😆

#

Well I'm not sure it was worth it to do a fancy 5-percent for percent_deviation LOL

#

I also see he hardcoded it

#

super clean code 😆

#

But yeah, seems mod N is more than good enough if we don't want to do flagging 🤷
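A sketch of the same check with the expected share computed from N instead of the hardcoded 5.0. It uses Python's built-in `sqlite3` module on an in-memory table that mimics only the `id` column of `cards`; the helper name `bucket_report` is made up, and the five IDs are the examples posted earlier, so the sample is far too small to say anything about a real collection.

```python
import sqlite3

def bucket_report(conn, n):
    """Return (bucket, count, percent, deviation from 100/n) for id % n."""
    total = conn.execute("SELECT COUNT(*) FROM cards").fetchone()[0]
    expected = 100.0 / n  # share each bucket would get if perfectly even
    rows = conn.execute(
        "SELECT id % ? AS bucket, COUNT(*) FROM cards GROUP BY bucket ORDER BY bucket",
        (n,),
    ).fetchall()
    return [
        (bucket, count, round(100.0 * count / total, 2),
         round(100.0 * count / total - expected, 2))
        for bucket, count in rows
    ]

example_ids = [1675618557059, 1675618557215, 1675618559833,
               1675618567127, 1675618567137]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO cards (id) VALUES (?)", [(i,) for i in example_ids])

for row in bucket_report(conn, 20):
    print(row)
```

To run it against a real collection, the in-memory connection would be swapped for a connection to a copy of the collection file, ideally with Anki closed.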

unique salmon Apr 29, 2025, 11:29 PM

#

wind palm Hasn't it always been understood that this is bad science to a certain extent? W...

Btw, while I would prefer removing Evaluate, I think the health check is a step in the right direction: instead of having to wrestle with completely abstract numbers, users will see a nice colorful scale that tells them in plain English whether their numbers are good or not

#

The numbers will still be displayed, just for reference

#

I am repeating myself, but the only real problem is what to do with people who fall in the red zone.

#

FSRS doesn't have any kind of "emergency mode" or whatever

#

Like, there is no secret button to fix your shit

#

Well, I guess "Remedy Hard Misuse" is a bit like that

#

My point is that it's inevitable that some people will have crappy numbers. What's the course of action then?

cursive badge Apr 29, 2025, 11:38 PM

#

Someone nice writes a "Reasons why your FSRS evaluation might be bad and what you can do about it" page to put in the manual

#

🤷‍♂️

cosmic hedge Apr 29, 2025, 11:56 PM

#

unique salmon Sorry for being annoying, but I really want to see the results of CMRR with the ...

left is years, all 0.94

#

what was wrong with jarrets solution for this with the cost from retention btw?

wind palm Apr 30, 2025, 12:02 AM

#

bold terrace It still make sense to split because without split, the optimization might overf...

I can tell you did a great job of explaining it, but unfortunately I still don't get it. It's my deficiency, not yours.
I can't even ask clarifying questions, because I just have no idea what you're explaining to me.

Let's see if we can get there without me understanding it. -- You've seen my reasons for wanting #2.

Is your #3 a better measure than the current method for the user of how well FSRS is working for them, is matching their memory curve, is predicting the appropriate time to study their cards?
Can we still describe it that same way in general terms -- how well it's working, matching, predicting?
Will your #3 run nearly as fast as Evaluate does now?

wind palm Apr 30, 2025, 12:09 AM

#

unique salmon It's helpful though. I've made a function to predict RMSE given the number of re...

> we can compare that approximate value to the real RMSE of the user to find out whether he is doing well for his "weight class"
Is that better than a simple lower is better? It feels like comparing it to a set scale is going to cause more trouble than number-goes-down=good, number-goes-up=bad.

> We could just say "RMSE above 10% is bad", without looking at RMSE from lots of users, but that would be kinda dumb
Are we still using the same "working definition" (not entirely mathematically accurate, blah, blah) of RMSE? So isn't "FSRS makes mistakes scheduling 10% of your cards (or 10% of the time)" objectively bad? I don't need to compare to anyone else's results to figure that out.

cursive badge Apr 30, 2025, 12:11 AM

#

wind palm > we can compare that approximate value to the real RMSE of the user to find out...

Part of the problem is RMSE going down doesn't necessarily = good if the optimiser is cheating (overfitting)

wind palm Apr 30, 2025, 12:15 AM

#

cursive badge Part of the problem is RMSE going down doesn't necessarily = good if the optimis...

😖 I thought the RMSE-cheating problem got solved ages ago. https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Metric

cursive badge Apr 30, 2025, 12:16 AM

#

wind palm 😖 I thought the RMSE-cheating problem got solved ages ago. <https://github.com...

I haven't looked at that (before my time active here) but I assume that is cheating in another way because Evaluate still has the problem I'm talking about.

wind palm Apr 30, 2025, 12:20 AM

#

cursive badge I haven't looked at that (before my time active here) but I assume that is cheat...

Is there a way to stop the optimizer from cheating-overfitting?
~~How can a user tell if their optimization has the cheating-overfitting problem?~~ Better question -- Can a user do anything to avoid falling prey to cheating-overfitting parameters?

cursive badge Apr 30, 2025, 12:26 AM

#

wind palm Is there a way to stop the optimizer from cheating-overfitting? ~~How can a use...

The user cannot do it easily. You can do something like Sound did where you manually split things for training, but that's not something we should expect a normal user to do.

polar maple Apr 30, 2025, 12:34 AM

#

wind palm 😖 I thought the RMSE-cheating problem got solved ages ago. <https://github.com...

this is outdated, the new version is still cheatable

wind palm Apr 30, 2025, 12:57 AM

#

polar maple this is outdated, the new version is still cheatable

The way folks talk about this, it sounds like it's impossible to tell the optimizer not to do this -- don't use this cheat, don't overfit. Is that really not possible?

cursive badge Apr 30, 2025, 1:03 AM

#

wind palm The way folks talk about this, it sounds like it's impossible to tell the optimi...

Maybe a slightly different framing will help:

Imagine I want to teach you how to do addition, but I can only do it by showing you lots of examples e.g. "45 + 22 = 67"
I give you the big book full of examples and let you try to figure out the rules yourself.

Now I want to test how well you learned by asking you questions.
I ask you questions from the book and you do really well so I think my job here is done!
Unfortunately you cheated, you just memorised the examples from the big book, you didn't actually understand addition.
If you later encounter addition problems that were not in the book you do really badly.

This is the overfitting problem. I've taught you to be very good at repeating what you have seen before, but not the general rules that will let you solve novel problems in the future.

Imagine instead I only gave you 4/5 of the book to learn from but kept the last 1/5 of it for myself.
If I later test you using only questions from my part of the book that you have never seen I can get a better idea of if you really understand addition because you cannot have memorised the answers.

The downside to splitting the book is that you will have fewer examples to learn from, so may find it more difficult to learn the rules of addition in the first place. I'll be better at evaluating your performance but your performance might actually be worse (than if you did not cheat with the full book).

This problem of splitting data into train/test possibly reducing performance is why some (Jarrett?) like the idea of the "5 way split" Evaluate as seen in the benchmark:

You keep optimising with all the data and just hope that there is not too much overfitting.

You can get an idea of how well FSRS works in general on your data (but not your specific parameters) by splitting your data into 5 parts then training and testing 5 times choosing a different part as the "test" data each time and average the results.

#

(N.B. I have not checked if this last part is exactly how the benchmark does it)
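A rough sketch of the "5 way split" idea described above, with the same caveat: this is not checked against the benchmark code. The model is a toy stand-in (one constant recall probability), `optimize`, `evaluate` and `five_way_evaluate` are hypothetical names, and the review outcomes are made up.

```python
import math

def optimize(reviews):
    """Toy optimizer: best constant recall probability, clamped away from 0 and 1."""
    mean = sum(reviews) / len(reviews)
    return min(max(mean, 0.01), 0.99)

def evaluate(p, reviews):
    """Log loss of the constant prediction p on 0/1 outcomes."""
    return -sum(math.log(p) if y else math.log(1.0 - p) for y in reviews) / len(reviews)

def five_way_evaluate(reviews, k=5):
    """Train on k-1 parts, test on the held-out part, rotate, and average the losses."""
    losses = []
    for fold in range(k):
        test = [y for i, y in enumerate(reviews) if i % k == fold]
        train = [y for i, y in enumerate(reviews) if i % k != fold]
        losses.append(evaluate(optimize(train), test))
    return sum(losses) / k

reviews = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 3   # made-up review outcomes

score = five_way_evaluate(reviews)   # how well the method works on this data
params_in_use = optimize(reviews)    # the parameters actually used come from all data
print(round(score, 4), round(params_in_use, 4))
```

The averaged score describes the method on this data rather than the specific parameters in use, which matches the caveat in the explanation above.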

polar maple Apr 30, 2025, 1:20 AM

#

wind palm The way folks talk about this, it sounds like it's impossible to tell the optimi...

rossgb's explanation is great. But just regarding RMSE (bins), it is cheatable in a different way than for what we mean when we are talking about Evaluate so RMSE isn't relevant in this context

cursive badge Apr 30, 2025, 1:21 AM

#

polar maple rossgb's explanation is great. But just regarding RMSE (bins), it is cheatable i...

The infamous RMSE-BINS-EXPLOIT. Best algorithm of them all 🤣

quasi shadow Apr 30, 2025, 2:04 AM

#

About the train/test split, here is a common practice: https://www.kaggle.com/c/home-data-for-ml-course/overview/frequently-asked-questions

Housing Prices Competition for Kaggle Learn Users

Apply what you learned in the Machine Learning course on Kaggle Learn alongside others in the course.

#

What’s the difference between a private and public leaderboard?
The Kaggle leaderboard has a public and private component to prevent participants from “overfitting” to the leaderboard. If your model is “overfit” to a dataset then it is not generalizable outside of the dataset you trained it on. This means that your model would have low accuracy on another sample of data taken from a similar dataset.

Public Leaderboard

For all participants, the same 50% of predictions from the test set are assigned to the public leaderboard. The score you see on the public leaderboard reflects your model’s accuracy on this portion of the test set.

Private Leaderboard

The other 50% of predictions from the test set are assigned to the private leaderboard. The private leaderboard is not visible to participants until the competition has concluded. At the end of a competition, we will reveal the private leaderboard so you can see your score on the other 50% of the test data. The scores on the private leaderboard are used to determine the competition winners. Getting Started competitions are run on a rolling timeline so the private leaderboard is never revealed.

#

😂 You can overfit to the public leaderboard by tuning your model based on the test score and get a bad rank in private leaderboard.

#

In the case of FSRS and @bold terrace's method #3, you can tune the parameters on the test set even if it isn't used for training, and may get a worse result in the future.

cursive badge Apr 30, 2025, 2:16 AM

#

You can also get into train-test-validation splits because you "taint" any data that you use to twiddle optimisation.
If you are doing good science™ the final evaluation must be on data that has never been used previously.

#

At least these are my memories of an undergraduate long ago 😅

quasi shadow Apr 30, 2025, 2:18 AM

#

So you could only evaluate the parameters in the next month with new data.

#

🤣

cursive badge Apr 30, 2025, 2:26 AM

#

This is why I got into nice deterministic simulations for my research. The evaluation was much simpler! 😅

quasi shadow Apr 30, 2025, 2:30 AM

#

In my view, the evaluation only makes sense when we search for a reproducible optimization method. Tuning the parameters by hand is unlikely to be reproducible.

#

😅 Feel unsatisfied about your parameters? Please challenge the SRS Benchmark!

cursive badge Apr 30, 2025, 2:33 AM

#

To be frank the moment you manually edit your params you are fully in "here be dragons" territory and should not expect any built-in help.

bold terrace Apr 30, 2025, 7:08 AM

#

Yeah agree and also wonder what people actually tweak. I'd make a bet that it's mostly the initial stability, but I have no proof

#

And I think most of the time, just to reduce the good/easy initial ones

0.1079, 0.8219, 3.3692, 31.2728, 7.2741, 0.4920, 2.0791, 0.0727, 1.3029, 0.2688, 0.8197, 1.8849, 0.0873, 0.3245, 2.3331, 0.0939, 3.2766, 0.7575, 0.3003, 0.0905, 0.1176
Log loss: 0.3512, RMSE(bins): 2.94%. Smaller numbers indicate a better fit to your review history.

If I tweak them because I fear long first intervals :

1.1079, 1.8219, 1.3692, 1.2728, 7.2741, 0.4920, 2.0791, 0.0727, 1.3029, 0.2688, 0.8197, 1.8849, 0.0873, 0.3245, 2.3331, 0.0939, 3.2766, 0.7575, 0.3003, 0.0905, 0.1176
Log loss: 0.3591, RMSE(bins): 4.21%. Smaller numbers indicate a better fit to your review history.

Soooo ... I'm not against making them read-only, and maybe, for people who actually tweak them, still allowing them to set the 4 initial stabilities? I don't know. I think Evaluate is useful to see how well the model is able to predict your stuff, and it's cool to copy your parameters into a visualizer and simulate some revlog, but I don't know how useful it is to tweak the parameters
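For reference, a small sketch that compares the two parameter lists above and shows which entries were changed. The first four FSRS parameters are the initial stabilities for Again, Hard, Good and Easy; the variable names are just for this example.

```python
optimized = [0.1079, 0.8219, 3.3692, 31.2728, 7.2741, 0.4920, 2.0791, 0.0727,
             1.3029, 0.2688, 0.8197, 1.8849, 0.0873, 0.3245, 2.3331, 0.0939,
             3.2766, 0.7575, 0.3003, 0.0905, 0.1176]
tweaked = [1.1079, 1.8219, 1.3692, 1.2728, 7.2741, 0.4920, 2.0791, 0.0727,
           1.3029, 0.2688, 0.8197, 1.8849, 0.0873, 0.3245, 2.3331, 0.0939,
           3.2766, 0.7575, 0.3003, 0.0905, 0.1176]

RATINGS = ["Again", "Hard", "Good", "Easy"]

# Indices where the hand-tweaked list differs from the optimized one.
changed = [i for i, (a, b) in enumerate(zip(optimized, tweaked)) if a != b]

for i in changed:
    print(f"w[{i}] initial stability after {RATINGS[i]}: "
          f"{optimized[i]} -> {tweaked[i]} days")
```

Only the four initial stabilities differ, with the Easy value going from about 31 days to just over 1 day, and that alone moved RMSE(bins) from 2.94% to 4.21% in the Evaluate results quoted above.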

#

But IMO the mod N way to partition Test/Training set seems so nice it would be cool to be able to test it 🙂

quasi shadow Apr 30, 2025, 7:27 AM

#

bold terrace And I think most of the time, just to reduce the good/easy initial ones 0.1079...

Maybe a better method is to provide different initial parameters. I can calculate the median parameters from collections with high retention and low retention. The latter would have small initial stability.

bold terrace Apr 30, 2025, 7:44 AM

#

quasi shadow Maybe a better method is to provide different initial parameters. I can calculat...

Do you know if it's because it converges to a different local minimum or is it just a matter of "optimization budget" as it was referenced earlier

#

But yeah, I'm completely curious to see how some kind of "clustering" can definitely help 🙂

#

I'm also wondering if the decay wouldn't be different between those 2 groups

quasi shadow Apr 30, 2025, 7:46 AM

#

https://l-m-sherlock.notion.site/The-History-of-FSRS-for-Anki-173c250163a1800aa28df6fab676521b

l-m-sherlock on Notion

The History of FSRS for Anki | Notion

Background:

bold terrace Apr 30, 2025, 7:46 AM

#

Since a low decay like .1 translates into "It takes a very looong time to get down to 60-70% DR", it might be that the group of users with high DR has a different way of approaching Anki (lots of exposure outside Anki) vs people who mainly use Anki (and don't get a lot of external exposure)

quasi shadow Apr 30, 2025, 7:46 AM

#

Finished!

bold terrace Apr 30, 2025, 7:51 AM

#

quasi shadow https://l-m-sherlock.notion.site/The-History-of-FSRS-for-Anki-173c250163a1800aa2...

lol ! The origin of being a "comment that rubbed you the wrong way"

#

It's funny how alienation can have different results on people 😅 . Some will shut themselves off, and others, me included, almost get motivated by it 😆

unique salmon Apr 30, 2025, 8:49 AM

#

cosmic hedge left is years, all 0.94

Huh
Have you tried multiplying by 1/(t2-t1), i.e. dividing by (t2-t1), just in case? Again, originally the average_forgetting_curve function is supposed to return a number between 0 and 1
This is really strange, I feel like the implementation is wrong somehow
Try dividing and if that still doesn't produce sensible results, show me the Rust code of the integral and I'll try my best to find the problem
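As a reference point for the normalization, here is a sketch of the average of the forgetting curve over an interval. It is not the Rust code under discussion. It assumes the power-law curve R(t) = (1 + factor * t / S) ** decay with factor = 0.9 ** (1 / decay) - 1 and a default decay of -0.5 (the fixed FSRS-5 value; with FSRS-6 the decay differs per user). The function names are hypothetical.

```python
def retrievability(t, stability, decay=-0.5):
    """Power forgetting curve; equals 0.9 when t == stability."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t / stability) ** decay

def average_retrievability(t1, t2, stability, decay=-0.5):
    """Integral of the curve from t1 to t2, divided by (t2 - t1). Requires decay != -1."""
    factor = 0.9 ** (1.0 / decay) - 1.0

    def antiderivative(t):
        return (stability / (factor * (decay + 1.0))
                * (1.0 + factor * t / stability) ** (decay + 1.0))

    return (antiderivative(t2) - antiderivative(t1)) / (t2 - t1)

avg = average_retrievability(0.0, 10.0, stability=10.0)
print(round(avg, 4))
```

Without the division by (t2 - t1) the integral is measured in days rather than being a probability, so it can land well outside the 0 to 1 range.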

unique salmon Apr 30, 2025, 8:58 AM

#

quasi shadow Finished!

I'll re-write some stuff and send you a .docx file later
I think you should write it in very simple layman terms and collapse everything technical, like how you collapsed "Background"

unique salmon Apr 30, 2025, 8:58 AM

#

bold terrace lol ! The origin of being a "comment that rubbed you the wrong way"

Yep, that was me 🤣

quasi shadow Apr 30, 2025, 9:02 AM

#

unique salmon I'll re-write some stuff and send you a .docx file later I think you should writ...

Due to the curse of knowledge, I don't know which terms are technical😂

unique salmon Apr 30, 2025, 9:04 AM

#

quasi shadow Due to the curse of knowledge, I don't know which terms are technical😂

Me and Gemini will handle that 🤣

quasi shadow Apr 30, 2025, 9:06 AM

#

Btw, what's the reason you created the FSRS Megathread?

#

I have forgotten it.

unique salmon Apr 30, 2025, 9:07 AM

#

quasi shadow Btw, what's the reason you create FSRS Megathread?

Just so that if people want to talk about FSRS, their messages won't be scattered across different channels and won't be drowned in a sea of other messages

quasi shadow Apr 30, 2025, 9:08 AM

#

make sense

clever cargo Apr 30, 2025, 9:08 AM

#

there probably should be an fsrs channel, to make searching easier

quasi shadow Apr 30, 2025, 9:08 AM

#

But it's still very hard to dig messages from discord.😂

unique salmon Apr 30, 2025, 9:09 AM

#

clever cargo there probably should be an fsrs channel, to make searching easier

I proposed that some time ago, but the mods were like "nah"

clever cargo Apr 30, 2025, 9:09 AM

#

https://tenor.com/view/peeporiot-peeporiot-havi-gif-23057486

Tenor

quasi shadow Apr 30, 2025, 9:09 AM

#

#

We will have 30k messages soon.

clever cargo Apr 30, 2025, 9:10 AM

#

we cant make threads in a thread and we cant search in a specific thread either

hasty fractal Apr 30, 2025, 9:49 AM

#

clever cargo there probably should be an fsrs channel, to make searching easier

this is the 1 millionth time someone said this, mods seem not to care though.

lapis hearth Apr 30, 2025, 10:41 AM

#

Is there actually a learning program like Anki that uses Neural Nets (or AI) as its scheduling algorithm

#

https://www.dekki.ai/

Dekki

AI-driven spaced repetition and flashcard generation. Create, study, and master material faster than ever.

#

I have found this but it seems sketchy

#

unique salmon Apr 30, 2025, 11:00 AM

#

lapis hearth https://www.dekki.ai/

A long time ago I contacted the Dekki guy and suggested that he submit his neural net for our benchmark. Well, he never did

lapis hearth Apr 30, 2025, 11:17 AM

#

has anyone had a good experience with it❓

lapis hearth Apr 30, 2025, 11:18 AM

#

unique salmon A long time ago I contacted the Dekki guy and suggested that he submit his neura...

So it does indeed have a neural net❓

#

Because the whole idea of letting the program do the work for you has sold me

unique salmon Apr 30, 2025, 11:18 AM

#

lapis hearth So it does indeed have a neural net❓

Yes

lapis hearth Apr 30, 2025, 11:27 AM

#

So what is holding Anki from using a neural-net as well

#

It seems the Dekki guy sees Anki as a competitor and does not want to reveal the workings behind his neural net

robust hill Apr 30, 2025, 11:28 AM

#

the master

#

what can a neural net even do

#

how much more is there that you can optimize

lapis hearth Apr 30, 2025, 11:28 AM

#

robust hill what can a neural net even do

It can notice weird patterns in your memory

#

Which would theoretically make it have a pseudo-short term memory model

#

But I dont know what I am talking about here

#

All I know is that it notices patterns which would otherwise not be easy to model by mathematical formulae

#

So I feel quite tempted by it

#

And then I asked if there are learning programs like it

#

with neural nets above all

robust hill Apr 30, 2025, 11:31 AM

#

somehow

#

im doing so well in this deck

unique salmon Apr 30, 2025, 11:41 AM

#

robust hill how much more is there that you can optimize

Let's pretend that Alex released his net

#

Notice how big of a jump it is compared to everything else in that table

clever cargo Apr 30, 2025, 11:46 AM

#

ye 2.7 million

unique salmon Apr 30, 2025, 11:48 AM

#

I meant log-loss, RMSE and AUC

#

Other models cannot get below 0.31 log-loss, this gets 0.27
Other models cannot get below 3.5% RMSE, this gets 1.4%
Other models cannot get above 0.73 AUC, this gets 0.82
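
For anyone unsure what two of these metrics measure, here is a minimal sketch of log-loss and AUC computed from predicted recall probabilities and pass/fail outcomes. The data is made up for illustration, and the function names are not taken from the benchmark code. The benchmark's RMSE is the binned variant, RMSE(bins), which is not reproduced here.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between outcomes (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def auc(y_true, p_pred):
    """Chance that a random successful review got a higher prediction than a random failed one."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical reviews: 1 = recalled, 0 = forgotten
y = [1, 1, 0, 1, 0]
p = [0.9, 0.8, 0.85, 0.7, 0.3]

print(f"log-loss = {log_loss(y, p):.4f}")
print(f"AUC      = {auc(y, p):.4f}")
```

Lower is better for log-loss, higher is better for AUC.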

lapis hearth Apr 30, 2025, 11:51 AM

#

So what is the hold up❓ The sync problem I get it but why when other programs like Dekki are doing it🥲

unique salmon Apr 30, 2025, 11:52 AM

#

@polar maple

#

Well, one of the holdups is that it doesn't have a forgetting curve 😅
I mean, it does, but not as a nice, simple formula. So you can get all kinds of weirdness, like the probability of recall increasing over time and whatnot
And it would be very difficult to calculate an interval that corresponds to a specific probability of recall, for scheduling purposes
And it would be difficult to ensure things like Again <= Hard <= Good <= Easy

#

The nice thing about FSRS is that predicting the probability of recall and scheduling the next interval are equally easy, but not with this
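
A minimal sketch of why both directions are easy with a closed-form curve. It assumes the power forgetting curve in the form used by the Rust snippet posted later in this thread, with `factor = 0.9^(1/decay) - 1` and `decay = -0.2`. The function names here are illustrative, not the real FSRS API.

```python
def factor_for(decay):
    """Scaling constant chosen so that R(t = S) = 0.9."""
    return 0.9 ** (1.0 / decay) - 1.0


def retrievability(t, s, decay=-0.2):
    """Predicted probability of recall t days after the last review, given stability s."""
    return (1.0 + factor_for(decay) * t / s) ** decay


def next_interval(desired_retention, s, decay=-0.2):
    """Invert the curve: the t at which R(t) equals desired_retention."""
    return s / factor_for(decay) * (desired_retention ** (1.0 / decay) - 1.0)


s = 25.0  # hypothetical stability in days
t = next_interval(0.8, s)
print(f"interval for 80% retention: {t:.2f} days")
print(f"R at that interval: {retrievability(t, s):.4f}")
```

Because the curve is monotonically decreasing, the inverse always exists and is unique. A neural net's output has no such guarantee, which is the difficulty described above.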

lapis hearth Apr 30, 2025, 12:03 PM

#

unique salmon Well, one of the holdups is that it doesn't have a forgetting curve 😅 I mean, ...

Well the thing is, memory is really weird

#

So weird memory = weird intervals = weird curves

#

And Dekki seems to be fine

#

I was just asking for examples of programs with neural nets and it does not seem to be a major con

robust hill Apr 30, 2025, 12:04 PM

#

how are there 2.7 million parameters

#

😭

#

so this means that neural net is going to make me a super genius

unique salmon Apr 30, 2025, 12:05 PM

#

I'll message the Dekki guy again, maybe he will participate in the benchmark

lapis hearth Apr 30, 2025, 12:07 PM

#

I really REALLY hope for Anki to have a Neural Net

#

The only way to come close to match the weirdness of the human memory

robust hill Apr 30, 2025, 12:08 PM

#

start coding

unique salmon Apr 30, 2025, 12:08 PM

#

...or not, Reddit just doesn't load chat

lapis hearth Apr 30, 2025, 12:09 PM

#

F***** me

robust hill Apr 30, 2025, 12:11 PM

#

is there a way to see average retrievability for a specific day

#

now i have this, only for today, but is there a way i can see what it would be like in 5 days or 6 days

#

if i didnt review the deck

unique salmon Apr 30, 2025, 12:12 PM

#

robust hill is there a way to see average retrievability for a specific day

@cosmic hedge sounds like someone wants a "Memorized Over Time" graph natively in Anki
(and that someone is not just me)

robust hill Apr 30, 2025, 12:12 PM

#

because my plan is to only do filtered decks for cards under 90% average retrievability the day before the exam

cosmic hedge Apr 30, 2025, 12:13 PM

#

robust hill is there a way to see average retrievability for a specific day

i can do that so retrievability/cards right

robust hill Apr 30, 2025, 12:13 PM

#

yes

#

so like

#

today its 93% but i want to see

#

if i dont do reviews, what would it be tomorrow, or in 5 days
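
A rough sketch of that projection outside Anki, assuming you can export each card's stability and days since last review. The card tuples below are hypothetical, and the curve is the power forgetting curve with decay = -0.2 discussed in this thread. This is not how Anki's simulator does it.

```python
def retrievability(t, s, decay=-0.2):
    """Power forgetting curve, scaled so that R(t = s) = 0.9."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    return (1.0 + factor * t / s) ** decay


def average_retrievability(cards, days_ahead=0, decay=-0.2):
    """cards: list of (stability_days, days_since_last_review); assumes no reviews happen."""
    total = sum(retrievability(elapsed + days_ahead, s, decay) for s, elapsed in cards)
    return total / len(cards)


def below_threshold(cards, threshold, days_ahead=0, decay=-0.2):
    """Cards whose projected retrievability falls under the threshold."""
    return [c for c in cards if retrievability(c[1] + days_ahead, c[0], decay) < threshold]


# Hypothetical deck: (stability, days since last review)
cards = [(5.0, 1.0), (20.0, 3.0), (90.0, 10.0), (300.0, 30.0)]

for days in (0, 1, 5):
    print(f"in {days} days: {average_retrievability(cards, days):.3f}")

print("under 90% in 5 days:", below_threshold(cards, 0.9, days_ahead=5))
```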

unique salmon Apr 30, 2025, 12:14 PM

#

unique salmon Huh Have you tried dividing by 1/(t2-t1) just in case? Again, originally the ave...

Also

cosmic hedge Apr 30, 2025, 12:14 PM

#

robust hill if i dont do reviews, what would it be tomorrow, or in 5 days

try the simulator with a review limit of 0

cosmic hedge Apr 30, 2025, 12:14 PM

#

unique salmon Also

i'll get to it but last time i tried it with 365 days it didnt do anything

robust hill Apr 30, 2025, 12:14 PM

#

does it use the parameters of the deck im in

#

or of the cards of whatever deck they are in

cosmic hedge Apr 30, 2025, 12:15 PM

#

robust hill does it use the parameters of the deck im in

the simulator runs on presets

unique salmon Apr 30, 2025, 12:15 PM

#

cosmic hedge i'll get to it but last time i tried it with 365 days it didnt do anything

Alright, but there is no way 0.94 for everything is correct. I ran it with the Python simulator (with a simplified config) and it sure as heck wasn't maxing out

cosmic hedge Apr 30, 2025, 12:15 PM

#

unique salmon Alright, but there is no way 0.94 for everything is correct. I ran it with the P...

i can try plot it at some point if you want?

robust hill Apr 30, 2025, 12:15 PM

#

#

lose 2 cards everyday somehow

#

doesnt seem right

#

shouldnt it be a lot more

cosmic hedge Apr 30, 2025, 12:16 PM

#

robust hill lose 2 cards everyday somehow

what settings did you use to simulate it?

unique salmon Apr 30, 2025, 12:16 PM

#

cosmic hedge i can try plot it at some point if you want?

Plot what?

robust hill Apr 30, 2025, 12:16 PM

#

cosmic hedge Apr 30, 2025, 12:16 PM

#

could you screenshot them?

cosmic hedge Apr 30, 2025, 12:16 PM

#

unique salmon Plot what?

the integral/workload

cosmic hedge Apr 30, 2025, 12:17 PM

#

robust hill

what does the reviews graph look like bc that is weird

robust hill Apr 30, 2025, 12:17 PM

#

0

#

cosmic hedge Apr 30, 2025, 12:19 PM

#

robust hill

what does your card stability graph look like?

robust hill Apr 30, 2025, 12:20 PM

#

unique salmon Apr 30, 2025, 12:21 PM

#

I tried the Python simulator with the integral over the next FIVE THOUSAND YEARS and still got 70%

cosmic hedge Apr 30, 2025, 12:21 PM

#

robust hill

try simulating more than 30 days

robust hill Apr 30, 2025, 12:22 PM

#

#

to 518 at the far right

cosmic hedge Apr 30, 2025, 12:22 PM

#

unique salmon I tried the Python simulator with the integral over the next FIVE THOUSAND YEARS...

```rs
pub fn average_f_power_forgetting_curve(
    learn_span: usize,
    cards: &[Card],
    decay: f32,
) -> f32 {
    let factor = 0.9_f32.powf(1.0 / decay) - 1.0;
    let exp = decay + 1.0;
    let den_factor = factor * exp;

    // Closure equivalent to the inner integral function
    let integral_calc = |card: &Card| -> f32 {
        // Performs element-wise: (s / den_factor) * (1.0 + factor * t / s).powf(exp)
        let t1 = card.last_date - learn_span as f32;
        let t2 = t1 + 365.;
        (card.stability / den_factor) * (1.0 + factor * t2 / card.stability).powf(exp) -
        (card.stability / den_factor) * (1.0 + factor * t1 / card.stability).powf(exp)
    };

    // Calculate integral difference and divide by time difference element-wise
    cards.iter().map(integral_calc).sum::<f32>()
}
```
if you want to check it

cosmic hedge Apr 30, 2025, 12:22 PM

#

robust hill

given your stabilities that seems accurate to me

robust hill Apr 30, 2025, 12:23 PM

#

i see

#

i guess i am doubting myself

#

can we bring back decimal desired retention 🙏

unique salmon Apr 30, 2025, 12:24 PM

#

cosmic hedge ```rs pub fn average_f_power_forgetting_curve( learn_span: usize, cards:...

Give me an example output using some S and some t1 and t2 and decay =-0.2

cosmic hedge Apr 30, 2025, 12:24 PM

#

robust hill

only slightly jealous

robust hill Apr 30, 2025, 12:25 PM

#

well

#

my desired retention is 80%

#

haha

cosmic hedge Apr 30, 2025, 12:26 PM

#

unique salmon Give me an example output using some S and some t1 and t2 and decay =-0.2

paste the code above this into here #1282005522513530952 message

unique salmon Apr 30, 2025, 12:26 PM

#

cosmic hedge paste the code above this into here https://discord.com/channels/368267295601983...

Already did

cosmic hedge Apr 30, 2025, 12:26 PM

#

unique salmon Give me an example output using some S and some t1 and t2 and decay =-0.2

wait i forgot to fix it if you copied it quickly copy it again

#

wait no

#

hold on

unique salmon Apr 30, 2025, 12:27 PM

#

I just want you to give me the output for some input S, t1, t2, decay so that I can verify the math

cosmic hedge Apr 30, 2025, 12:28 PM

#

unique salmon I just want you to give me the output for some input S, t1, t2, decay so that I ...

i changed it again it should work now

cosmic hedge Apr 30, 2025, 12:28 PM

#

unique salmon I just want you to give me the output for some input S, t1, t2, decay so that I ...

unique salmon Apr 30, 2025, 12:33 PM

#

cosmic hedge

wut

#

?

#

Is your t1 negative?

#

Or what is going on there?

#

I'm trying to figure out what this could mean, and I can't

#

t1 is just time since the last review of this card

#

And I can't reproduce your number, btw

#

Ok, yeah, so your t1 is negative

#

Though I doubt that's the reason why you're getting 94% every time

#

I have no idea how you're getting 94%
Let's try to do this as properly as possible:

No negative t1, it's the number of days since the last review

```python
def average_f_power_forgetting_curve(t1, t2, s, decay):
    if not t2 > t1:
        raise ValueError("t2 must be greater than t1")

    # Calculate F(t2) - F(t1) where F is the antiderivative
    integral = integral_power_forgetting_curve(t2, s, decay) - integral_power_forgetting_curve(t1, s, decay)
    print(f'Raw integral={integral:.5f}')

    # Divide it by the difference in time to get the average
    return integral / (t2 - t1)
```

Divide by t2-t1. If the integral is over the next 365 days, divide by 365. If it's over the next 1825 days, divide by 1825, etc. Aka ensure that the output is between 0 and 1
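
A self-contained version of the same idea that can be run to sanity-check the math. The body of `integral_power_forgetting_curve` is an assumption: it is the antiderivative implied by the Rust snippet above, `(s / (factor * exp)) * (1 + factor * t / s)^exp` with `exp = decay + 1`. The other two function names are illustrative. A midpoint-rule integration is included as an independent cross-check.

```python
def integral_power_forgetting_curve(t, s, decay=-0.2):
    """Antiderivative F(t) of R(t) = (1 + factor * t / s) ** decay (assumed form)."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    exp = decay + 1.0
    return (s / (factor * exp)) * (1.0 + factor * t / s) ** exp


def average_retrievability(t1, t2, s, decay=-0.2):
    """Mean of R(t) over [t1, t2]; t1 is days since the last review, never negative."""
    if not t2 > t1:
        raise ValueError("t2 must be greater than t1")
    integral = integral_power_forgetting_curve(t2, s, decay) - integral_power_forgetting_curve(t1, s, decay)
    return integral / (t2 - t1)


def numeric_average(t1, t2, s, decay=-0.2, steps=20_000):
    """Midpoint-rule estimate of the same average, used only as a cross-check."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    h = (t2 - t1) / steps
    total = sum((1.0 + factor * (t1 + (i + 0.5) * h) / s) ** decay for i in range(steps))
    return total * h / (t2 - t1)


# Hypothetical card: stability 100 days, just reviewed, averaged over the next year
closed = average_retrievability(0.0, 365.0, 100.0)
approx = numeric_average(0.0, 365.0, 100.0)
print(f"closed form = {closed:.6f}, numeric = {approx:.6f}")
```

If the two numbers agree and both sit between 0 and 1, the closed form and the division by `t2 - t1` are doing what they should.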

#

I just want to confirm that you get 94% even if everything is exactly as intended, no cutting corners

unique salmon Apr 30, 2025, 2:28 PM

#

lapis hearth F***** me

I sent them an email

#

But I'm like 90% sure they won't participate in Jarrett's benchmark

cosmic hedge Apr 30, 2025, 2:36 PM

#

unique salmon Ok, yeah, so your t1 is negative

now its 0.7 again 🎉
```rs
pub fn average_f_power_forgetting_curve(
    learn_span: usize,
    cards: &[Card],
    decay: f32,
) -> f32 {
    let factor = 0.9_f32.powf(1.0 / decay) - 1.0;
    let exp = decay + 1.0;
    let den_factor = factor * exp;

    let offset = 365. * 10.;
    // Closure equivalent to the inner integral function
    let integral_calc = |card: &Card| -> f32 {
        // Performs element-wise: (s / den_factor) * (1.0 + factor * t / s).powf(exp)
        let t1 = learn_span as f32 - card.last_date;
        let t2 = t1 + offset;
        (card.stability / den_factor) * (1.0 + factor * t2 / card.stability).powf(exp) -
        (card.stability / den_factor) * (1.0 + factor * t1 / card.stability).powf(exp)
    };

    // Calculate integral difference and divide by time difference element-wise
    cards.iter().map(integral_calc).sum::<f32>() / offset
}
```

#

so was the problem you had with Jarrett's cost by retention that the numbers were too arbitrary or something?

unique salmon Apr 30, 2025, 2:47 PM

#

cosmic hedge so was the problem you had with Jarrett's cost by retention that the numbers...

Yeah. Time per answer as a function of R would be nice, but his solution wasn't really that

#

We can do it properly though

unique salmon Apr 30, 2025, 2:48 PM

#

cosmic hedge now its 0.7 again 🎉 ```rs pub fn average_f_power_forgetting_curve( learn_sp...

But first I'd like you to try 1/5/10/50 years again with this code and report the values
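
For comparison, a small Python sketch of the same experiment on a made-up deck. It assumes the per-card averages are then averaged over the number of cards so the result stays between 0 and 1. The stabilities, elapsed days and function names are hypothetical, and none of this comes from the simulator itself.

```python
def average_over_horizon(s, elapsed, horizon_days, decay=-0.2):
    """Mean retrievability of one card over the next horizon_days, with no reviews."""
    factor = 0.9 ** (1.0 / decay) - 1.0
    exp = decay + 1.0

    def antiderivative(t):
        return (s / (factor * exp)) * (1.0 + factor * t / s) ** exp

    t1 = elapsed                 # days since the last review, never negative
    t2 = elapsed + horizon_days
    return (antiderivative(t2) - antiderivative(t1)) / horizon_days


def deck_average(cards, horizon_days, decay=-0.2):
    """Average the per-card means over the whole deck."""
    total = sum(average_over_horizon(s, e, horizon_days, decay) for s, e in cards)
    return total / len(cards)


# Hypothetical deck: (stability, days since last review)
deck = [(30.0, 3.0), (200.0, 20.0), (1000.0, 50.0)]

results = {years: deck_average(deck, 365.0 * years) for years in (1, 5, 10, 50)}
for years, value in results.items():
    print(f"{years:>2} years: {value:.3f}")
```

The longer the horizon, the lower the average, since retrievability only decays when nothing is reviewed. The result should never be pinned at the same value for every horizon.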

cosmic hedge Apr 30, 2025, 2:51 PM

#

unique salmon But first I'd like you to try 1/5/10/50 years again with this code and report th...

i just tried another deck with the 10 years one 🎉

#

i think basing anything off of what happens in 10 years time might be slightly insane already though 😂
