#FSRS Megathread

unique salmon
#

Too bad only, like, 4-5 people would enjoy it, cause barely anybody pays attention to this benchmark

polar maple
#

maybe we can go lower than 0.5% RMSE (bins) by combining RWKV-P + RMSE-BINS-EXPLOIT

robust hill
#

what if

#

lets see Expertium's algorithm metrics

polar maple
#

the problem is that we cannot exactly know if a good algorithm isn't cheating the metric a bit

unique salmon
#

If an algorithm doesn't use our binning formulas internally, it's fair

polar maple
#

FSRS is even directly optimized on RMSE (bins), direct cheating apparently

#

and when you guys find improvements in FSRS it is often in terms of % reduction in RMSE (bins), you are optimizing over the metric directly over time
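
For readers following along, here is a minimal Python sketch of a binned RMSE. It is not the benchmark's implementation: `bin_keys` is a hypothetical stand-in for whatever binning formula is used, and the function name is made up for illustration. It only shows that the metric compares the average prediction with the average outcome inside each bin.

```python
import math
from collections import defaultdict


def rmse_bins(predictions, outcomes, bin_keys):
    """Review-weighted RMSE between mean prediction and mean outcome per bin."""
    # bin key -> [sum of predictions, sum of outcomes, number of reviews]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for p, y, key in zip(predictions, outcomes, bin_keys):
        acc = sums[key]
        acc[0] += p
        acc[1] += y
        acc[2] += 1
    total = sum(acc[2] for acc in sums.values())
    squared = sum(
        acc[2] * (acc[0] / acc[2] - acc[1] / acc[2]) ** 2
        for acc in sums.values()
    )
    return math.sqrt(squared / total)


# Toy data: bin "a" is slightly underconfident, bin "b" is overconfident.
score = rmse_bins([0.9, 0.9, 0.8, 0.8], [1, 1, 1, 0], ["a", "a", "b", "b"])
```

Because only per-bin averages enter the score, a predictor that knows the bins can match each bin's average without ranking individual reviews well, which is the concern raised in this exchange.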

unique salmon
#

And it doesn't internally keep track of log-loss

#

So it's fair

polar maple
#

whatever algorithms do internally is their own business, we should make the metrics as robust as possible to anything that they try

robust hill
#

💀

#

the algorithms have more privacy than me

polar maple
#

@quasi shadow please give fsrs wiki modify permissions (I think i need permission to modify the page?), i'll write about RMSE (bins)

polar maple
#

and at least on anki it does compute RMSE (bins) to make sure that the new parameters do better on RMSE (bins) than before

#

so it also does compute bins

#

its just not in the way that you consider cheating, but the fact is that it does compute the bins

ashen light
#

this isn't loss and I am disappointed

unique salmon
#

How about we skip the forum and just ask Dae directly to remove "Evaluate"? 🤔

unique salmon
#

Feel free to replace "FSRS" with {algorithm_name}

#

The point is that if it doesn't calculate the loss internally, it can't be cheating

polar maple
#

FSRS computes log loss when doing internal gradient descent and also RMSE (bins) when updating parameters, so it is definitely cheating

#

in the same way that you claim that RMSE (bins) is cheating

polar maple
#

and also in anki, FSRS does not give new parameters unless it does better than the previous ones on RMSE (bins)
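
A minimal sketch of the acceptance check described in these messages, assuming it is a plain before/after comparison of RMSE (bins). The function and argument names are hypothetical and are not taken from the Anki source.

```python
def keep_new_parameters(old_rmse_bins: float, new_rmse_bins: float) -> bool:
    """Return True only when the candidate parameters score lower (better)."""
    return new_rmse_bins < old_rmse_bins


# Hypothetical values: one candidate lowers the metric, the other raises it.
accepted = keep_new_parameters(old_rmse_bins=0.0335, new_rmse_bins=0.0310)
rejected = keep_new_parameters(old_rmse_bins=0.0335, new_rmse_bins=0.0350)
```

Under this assumption the metric acts as a gate on the optimizer's output, even though it plays no part in the scheduling formulas themselves.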

polar maple
#

it's the entire optimization process that FSRS uses

#

without optimization FSRS doesn't learn

unique salmon
#

You cannot be serious man...
When FSRS calculates difficulty, it does not use log-loss/RMSE
When FSRS calculates stability, it does not use log-loss/RMSE
When FSRS calculates retrievability, it does not use log-loss/RMSE

robust hill
#

noo do not remove optimize

#

i love optimize

polar maple
#

to me RMSE-BINS-EXPLOIT isn't cheating and it shows that RMSE (bins) is unreliable

polar maple
#

@unique salmon how about we go back to plain old RMSE, no bins? nice and human interpretable

robust hill
#

i love them all

polar maple
#

fair enough, do we know if AUC is also correlated with retention? could show AUC instead of RMSE (bins)

unique salmon
#

I made an issue

robust hill
#

nooo

#

i love it

unique salmon
#

(screw forums because people on forums disagree with me 🤣)

robust hill
#

evaluate makes me feel like a scientist

robust hill
#

expertium if you delete evaluate button im going to delete you

#

no i wont mods dont punish me pls

unique salmon
#

Love me some AUC less than 0.5, lol

#

Love it when FSRS does worse than random 🤣

#

Zoomed in a bit

polar maple
#

seems usable

#

if AUC is less than 0.5, turn off FSRS
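
For context, AUC can be read as the probability that a randomly chosen successful review got a higher predicted recall probability than a randomly chosen failed one, so 0.5 is what a coin flip would score. A small pure-Python sketch (quadratic time, for illustration only; the function name is arbitrary):

```python
def auc(predictions, outcomes):
    """Fraction of (success, failure) pairs ranked correctly; ties count half."""
    positives = [p for p, y in zip(predictions, outcomes) if y == 1]
    negatives = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one success and one failure")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


well_ranked = auc([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0])
inverted = auc([0.3, 0.6, 0.8, 0.9], [1, 1, 0, 0])
constant = auc([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

A value below 0.5 means the predictions are ranked worse than chance: on average, reviews that failed were given higher recall probabilities than reviews that succeeded.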

#

@unique salmon btw ill take a look at nn D but i'll write my own code, won't make any promises on this

unique salmon
#

🤣

bold terrace
#

I agree with @polar maple , if a metric is exploitable, you can argue that an algorithm isn't deliberately trying to exploit it, but just by trying to minimize it, it might. A neural network could end up cheating like this without being specifically instructed to

#

But at the same time, with log loss you would over-privilege precision on shorter intervals, no ? Since the vast majority of reviews are 1-20d

#

You could argue that then having better precision for that vast majority should be the goal though

unique salmon
#

That would be evidence in favor of NN cheating

bold terrace
#

Not actionable ? Based on it, people could understand that something is wrong with their predictions, so they can adjust that

#

They can also see whether splitting different decks into different presets helps or not

bold terrace
#

So explain how to interpret those numbers

#

You’re going full dictatorship

#

Or just trolling but this is just ridiculous

robust hill
#

look at my fire rmse bins

robust hill
#

well i only have 1 day of reviews so thats probably why i have .38% 💀

unique salmon
#

And before you say "That is not what I said at all" - there is absolutely no way in hell we can make "Evaluate" intuitive

#

Like, none

#

The best we can do is "lower = better", which is what it already says

#

We could add the log-loss formula, but that would scare the average person even more

unique salmon
#

Here's your intuitive log-loss, lol

bold terrace
#

You’re playing the dumb troll but you know you bend truth to justify being smarter

#

You give people a “normal range of log loss” based on users in the 10k dataset and voila, case solved

unique salmon
#

...except that the benchmark uses a different procedure

bold terrace
#

But you know that very well you’re just playing pretend

bold terrace
#

You’re just playing the “I know better than people so people should not be able to judge by themselves”

unique salmon
#

The whole point of FSRS is (supposed to be) that it outsources tweaking to the computer

bold terrace
#

IMO you just get so attached to FSRS like it’s your baby that you just want to control its course

bold terrace
#

You unilaterally created a request to do something most of the people you asked told you not to

#

It tells more about you than average users or than about me

ashen light
#

the evaluate button is literally useless

#

the numbers mean literally nothing to anyone except 5 people

bold terrace
#

Or because you think redditors are smarter than the average user from Anki board ?

ashen light
#

even if it is explained, what actionable things can I do with that number?

#

like even if I knew exactly what the fuck "Log loss: 0.2826, RMSE(bins): 3.35%. " means, what do I do with that information
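
One way to read the log-loss figure, sketched in Python. This is the standard binary log-loss formula; the helper `typical_probability` is a made-up name for an interpretation aid and is not something Anki shows. `exp(-log_loss)` is the geometric mean of the probability the model assigned to what actually happened on each review.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (1 = recalled)."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)


def typical_probability(loss):
    """Geometric-mean probability assigned to the outcomes that occurred."""
    return math.exp(-loss)


quoted = typical_probability(0.2826)  # the log loss quoted in the message
```

For the quoted value of 0.2826 this comes out at roughly 0.75, meaning the model typically gave about a 75% probability to whatever actually happened. That still does not tell a user what to change, which is the point being argued here.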

bold terrace
#

FSRS would have to be removed too to be sure people are not screwed by it then, this is nonsense

ashen light
#

what imaginary problem will suddenly be resolved by the contents of the evaluate popup

unique salmon
#

As I wrote, the parameter field can be useful

ashen light
#

out of curiosity, has anyone outside of maybe that ismael guy actually posted truly shit numbers?

bold terrace
#

Also I’m sorry but the “help” I got before diving into understanding FSRS was, most of the time, “you understand nothing” …

ashen light
#

I'm pro hiding the parameter list too, maybe a "click here to copy info when asking for help" that puts it all in the clipboard for debugging help

bold terrace
#

Comparing my DR to my Actual Retention ? Being said I can’t …

ashen light
#

anyway from what I understand the actual argument is not to actually remove evaluate, but to put it somewhere that downplays its importance/relevance

#

(at least that seems to be david's idea)

#

I think the preset ui is just really bad at containing weird niche options

#

so everything just has the same weight to it no matter the importance

ashen light
#

IF IT WERE ME: multiple setting categories: core, extra, fringe

#

this is peak 'fringe' category

ashen light
#

this isn't about beginner/pro, this is about multiple tabs that may have the same categories but with different tiers of dumbshit

bold terrace
#

“You’re so wrong you’re beyond the point of being helped”

ashen light
#

two uis are a mess, but this is just one ui

#

¯_(ツ)_/¯

ashen light
#

that might have solved everything

bold terrace
#

But I was so wrong it was difficult to explain it to me

ashen light
#

just be less wrong yo

#

I bet rmse bins could solve this

unique salmon
#

...and then I am called a troll 😅

bold terrace
#

Then the same people are held up as examples of helping others, while they’re just getting their ego dose of feeling smarter than everyone

ashen light
#

all this talk yet no real world use of rmse bins found

bold terrace
#

Unfortunately those are ignored I guess

ashen light
#

they will be ignored by literally every anki user

#

I was roleplaying the average person using this app

ashen light
#

"I have to read the manual to understand what this number means and what to do with it? FINALLY I LOVE THIS FEATURE" - sound, probably

ashen light
#

@bold terrace did you move to sm2?

bold terrace
#

Followed by “you’re really a geek to have enabled FSRS and complain about that difference “

bold terrace
#

Reddit

unique salmon
#

Well, I can't ask random Anki users, so this is the best I've got

ashen light
#

@bold terrace maybe accept you're in the 0.001% of people in the dataset that is better served by sm2 and use it instead 🍃

#

can't dae like....scan all the data on ankiweb for decks that have the fsrs feature enabled?

bold terrace
#

500k view

ashen light
#

@bold terrace can you do an experiment for science and use sm2 for a few months?

bold terrace
#

The guy still tweaks SM2

ashen light
#

see how much better your numbers are?

#

it'll be a nice data point I think

ashen light
#

I have zero grasp on how it works and I just chose to not care to think about it and I'm doin fine

#

¯_(ツ)_/¯

bold terrace
#

Because you’re using Anki and developing in Anki for years 🤷

ashen light
#

that is quite an overstatement of my experience

#

fsrs didn't even exist the previous time I was using anki

bold terrace
#

Funny enough Anki is in a boom

#

SM2 is 6 times bigger than FSRS for whatever that means

ashen light
#

shitpost: reminder that sm is currently at version 11 and sm2 is basically 20 years old

#

🍃

unique salmon
#

Actually, no, 18

#

Actually, wait...

#

gimme a sec

#

I'm unsure if there exists SM-19 or no

#

SM-18 is definitely a thing

ashen light
#

man I am OUTDATED

ashen light
#

also how many "sm2" searches are looking for super mario 2

#

a nonzero amount thats for sure

bold terrace
#

I’m going a bit in the Ad Hominem territory but sometimes I feel Anki is not always used by the people to actually learn the stuff they want to learn but more because it became almost its own thing 😅

unique salmon
#

There is SuperMemo 19 (software), but unclear if it's using a new algo

#

I'll take that as "SM-18 is the latest algo"

ashen light
#

nooooooooo now I have to fill it out again

ashen light
#

fsrs needs priorities

#

so it can be compared to sm18

unique salmon
#

The issue is that nobody is going to give us data

#

Jarrett barely scrambled 16 or so collections of SM users

bold terrace
#

Yeah

ashen light
#

why has no one reverse engineered any of the sms past like 2

bold terrace
#

On that note at least FSRS community is open about data

ashen light
#

back in the dark ages of sm-5

unique salmon
#

Also, FSRS sorta-kinda counts as "reverse engineered SM"

unique salmon
# bold terrace

Priorities don't affect the calculation of the probability of recall though, unless I misunderstand how SuperMemo works

ashen light
#

@bold terrace would priorities solve your problem

bold terrace
#

Holy cow

#

There are worse overthinkers out there

#

Using SM to forget things

unique salmon
#

From articles about algorithms and math to...uhhh...some vague stuff about the brain of a certain political leader and about Elongated Muskrat

#

He's a bit of an odd fella

ashen light
#

"can supermemo be used to forget things"

bold terrace
#

Investing and Vtubers too ?

ashen light
#

I read the list before I realized you ALSO pointed that one out

#

I mean technically I'm using anki for that purpose

#

the idea is to just fill my brain with so much garbage it pushes other things out

#

I realized I was spending way too much on alcohol

#

and anki seemed like a cost effective replacement long-term

ashen light
#

I'm a craft beer weenie

polar maple
#

personally i get a little dopamine rush when i see that parameters have changed and also the metrics look better

ashen light
#

I think anki needs more visual flair, like you hit the optimize button and there's a graphic that shows the numbers moving

#

animated bar graphs going up or down

unique salmon
#

Adds zero utility, but at least it's fun 🤣

#

Actually, nvm, it would be a nightmare if you have 20 presets and use "Optimize all presets"

#

Would be like getting ads

ashen light
#

nah its one big set of graphs all animating at once

#

this is how we get the young users and vc funding

#

brb gonna make my own anki

#

called wanki

#

with this shit

#

it'll be great

unique salmon
#

DerIshmaelite watching 260 graphs going down

bold terrace
#

Wanna attract more people ? Preinstall decks 😂

#

Things like yomitan or Anki, where you have to install external decks or dictionaries, are a no-go for most people

unique salmon
#

Oh, yeah, configuring dictionaries for yomitan is a huge filter

bold terrace
#

Anki feels quite hacky

#

I mean configuring the css of your card …

ashen light
#

coca cola presents: basic spanish vocab

bold terrace
#

Defining the data type of your card fields …

#

Even the difference between a card and a note

#

I know more or less FSRS but I’m still afraid of using “cloze”

bold terrace
#

Let’s document cards, notes and cloze with UML diagrams

unique salmon
#

RMSE (bins) I mean

ashen light
#

on a more serious note: the fact that you gotta use a website then copypaste like a number to get decks/addons is mega jank

#

like how is that not built in still

bold terrace
#

Some graphical editor would help many people

bold terrace
#

WYSIWYG

ashen light
#

@bold terrace glad to hear you are working on this feature I expect to see a PR soon

polar maple
#

RMSE (bins) can be made to be uncheatable by actually 5-way splitting it properly, but it would no longer make sense to use it on algorithms that adapt on the fly like RWKV

ashen light
#

but think of the tryhards who don't want a wysiwyg experience

#

I need layers of js in my templates and dae refused to let me add a separate scripting field

ashen light
#

ui complexity oh nooooooo

bold terrace
#

Reddit, old bbcode forums, …

#

Two tabs, easy win

ashen light
#

ui complexityyyyyyyyyy

unique salmon
#

I'm trying to visualize how to hide "Evaluate" in a way that isn't jank and doesn't require lots of scrolling

bold terrace
#

Come on

ashen light
#

but now we got tabs and css radio buttons

unique salmon
#

kek

ashen light
#

maybe opacity: 60%

#

possibly more seriously: is there a good drop-in wysiwyg

bold terrace
#

I mean

bold terrace
#

Come on

#

Be slightly honest for one minute

#

Preview/Markup, UI Complexity compared to this ?

ashen light
#

where even is that

bold terrace
#

come on

#

You can do a bit better can’t you

ashen light
#

I mean thats peak ui performance right there

bold terrace
#

The only one that you need to pay

ashen light
#

oh I don't use ios

#

cause I don't pay for things

unique salmon
#

You know what would make it even better? Smaller button + less opacity

#

You know what would make it even better? Make it even smaller and decrease opacity further!

#

Perfection

ashen light
#

[insert that vince mcmahon meme here using these images]

bold terrace
#

Don’t know if I’ll be able to use Anki with CMRR outliving Evaluate

#

At least get rid of that with it

unique salmon
#

Jarrett wants to remove CMRR

bold terrace
#

The fact it will be improved or removed ?

unique salmon
#

And I am like "LUUUUC! SAVE US!"

unique salmon
#

Except that I can see the benefits of CMRR, but have to think of rare edge cases to maybe come up with some questionable benefits of Evaluate

ashen light
#

the people involved think its easier to remove than fix

#

unless you're gonna do the work making it better, its probably gone

bold terrace
#

Ad hominem but maybe changing that could help you bond with real people 😅

unique salmon
#

Aka we'll see what the average user thinks

ashen light
#

I think we burn down all metrics, remove everything except "enable fsrs" and an "optimize" button

unique salmon
#

Or at least the average r/Anki guy

ashen light
#

maybe "optimize" and "optimize with reschedule" and get rid of that stupid fucking toggle

unique salmon
#

Actually, you forgot DR

#

Optimization can be made automatic

#

Well...uhhh...

#

😅

ashen light
#

how do you handle auto optimize and rescheduling on optimize

#

auto-optimize to me implies no rescheduling

unique salmon
#

ahem
Anyway, this is peak
(David's pic btw)

ashen light
#

100% agree

#

those nerd numbers can go be an addon

bold terrace
#

TBH I wouldn’t mind those options to be in FSRS helper and leave the normal use case being DR only

ashen light
#

https://i.imgflip.com/9qx5v1.jpg anyway I quickly used some garbage website to make this garbage meme, its not even high enough res to see anything and I didn't even have a 4th image

bold terrace
#

I’m against removing evaluate but moving it I don’t mind

ashen light
#

I was hitting "I'm spending too much time on this" territory and so you get this half-finished thing

bold terrace
#

But moving it would mean moving it with the optimize

ashen light
#

actually david's image should be in the last slot

unique salmon
# bold terrace I’m against removing evaluate but moving it I don’t mind

Btw, I expanded my Github comment a bit

Moving "Evaluate" somewhere else is nice in theory, but I can't think of a good implementation. If we put it in "Advanced", it will be awkward (scrolling back and forth between sections) and unclear that this button is related to FSRS in the first place, unless it says "Evaluate FSRS parameters" or something. Maybe we could collapse it, but again, I can't visualize a good implementation.

ashen light
#

if only it had enough pixels to actually see anything

bold terrace
#

While someone still has to pay his rent with his anki videos

unique salmon
#

Ok but this kinda true

unique salmon
#

Now THIS is the average Anki user

#

The perfect specimen

bold terrace
#

When you think about it

#

I knew Anki through r/LearnJapanese

unique salmon
#

I have no idea why, honestly

bold terrace
#

but actively following r/Anki is not something you do as an average guy

unique salmon
#

I mean, I can think of a few explanations, such as not optimizing parameters and Hard misuse, but still

polar maple
#

but who would complain about TR>DR?

cursive badge
#

I've only just caught up 😅
Evaluate can be useful to decide if splitting presets is a good idea. You don't need to understand what the numbers mean, just before > after = good
You might be able to make nice UI that tells you without showing the numbers, but until then having the numbers is useful.

bold terrace
#

FSRS on the other hand is "expected retention"-based. You don't get "higher prediction" at every lapse per se, you might get higher D and lower I, but the prediction will still be a constant DR (not increasing R)
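
A short sketch of why the prediction at the due date stays at DR. It assumes the FSRS-4.5/FSRS-5 forgetting curve, R(t, S) = (1 + (19/81) * t / S) ^ (-0.5), as I understand it; the constants and function names below are assumptions for illustration and are not copied from the FSRS source.

```python
DECAY = -0.5
FACTOR = 19 / 81  # assumed constant, chosen so that R(S, S) = 0.9


def retrievability(elapsed_days, stability):
    """Predicted probability of recall after elapsed_days."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability, desired_retention):
    """Days until predicted recall drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


# Hypothetical stabilities, e.g. before and after a lapse lowers stability.
for stability in (20.0, 5.0):
    interval = next_interval(stability, 0.9)
    at_due = retrievability(interval, stability)
    print(f"S={stability}: interval={interval:.1f}d, R at due={at_due:.2f}")
```

Under these assumptions a lapse shortens the interval, but the predicted R on the due date is DR by construction, which matches the point made in the message.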

bold terrace
#

I wanted to switch one deck to SM2, and one to FSRS, but it's a global setting

#

so I need 2 user profiles

#

and I'm a bit lazy for that now

#

And IMO FSRS is superior at both tasks (precision AND increasing retention) if you know how to tackle it (which is extremely simple : Increase your DR, whether globally through Deck Options, or through Filtered Decks)

unique salmon
#

Time for another spaced repetition civil war

bold terrace
#

SM2 is "let's calibrate your ease factor for each card" which is less efficient than letting FSRS guess the profile of your card in 4-5 reviews

bold terrace
#

And I know I can sound contradictory, but my point is : FSRS is truly the way forward, but we just need a way to hold the hand of the user

unique salmon
#

Remove Evaluate side: me, David, sorata, jake
Keep Evaluate side: Sound, Danika, rossgb

bold terrace
#

Seems you forgot @cursive badge , and all people in the Anki forum that were too dumb for this poll 1h ago 😄

ashen light
#

ultimately none of our opinions matter on the topic

#

since daes gonna do what daes gonna do

unique salmon
#

Dae giveth and Dae taketh away

cursive badge
#

Well maybe I'm going to make my own SRS app with blackjack and hookers 😝

cursive badge
#

And 0 users because I'll probably get bored and go back to Anki ;p

ashen light
#

how would you make anki clean and simple

unique salmon
#

Unironically, why not just keep "Evaluate" in the Helper Add-on?

bold terrace
#

Maybe a full js memory-vore thing

ashen light
#

thats like....the least complicated thing about it

bold terrace
#

Or some kind of full-python stuff

unique salmon
#

We could just hide the button, but keep the underlying code to be accessed by the add-on

ashen light
#

anki used to be 90% python

bold terrace
#

The rust-cool boys entered the chat ?

#

Rust sounds super super cool

cursive badge
#

I still dream of WASM/JS runtime.

ashen light
#

rust was done to allow the code to be used on all platforms

bold terrace
#

But just like Ruby was before it

unique salmon
#

Although Jarrett said that he doesn't plan to add any new features to the add-on FeelsBadAnki

ashen light
#

can't really have a python backend running on ios

bold terrace
#

oh

#

Python and iOS don't work well together ?

ashen light
#

there were real reasons for the rust move

ashen light
#

and the current multi-lang architecture

#

java could have been an option!

#

point is ankidroid is able to have feature-parity trivially due to the rust backend, before that it was constantly year(s) behind in terms of compatibility

bold terrace
#

Also personally, in my little dream SRS, no concept of Notes/Cards, no cloze, searching deck and addon from within the app, just "Again/Good" by default

ashen light
#

how do you handle cases where multiple cards naturally fall out of the same source data

ashen light
#

note/cards is like one of the most convenient things, its just the ux is garbage

bold terrace
#

I don't know, I feel notes/cards is a good idea theoretically but in practice I don't think I could transform my Vocabulary deck into a Sentence deck just by mapping different fields in my front

#

The quality of the card would suffer

ashen light
#

my own system would be very frustrating without it, especially as I bury my siblings

unique salmon
#

I think notes vs cards is a necessary evil: it's confusing, but the alternative is having to make a lot more cards, not being able to edit all of them at once, and not having "bury siblings"

bold terrace
#

For ex, I think people doing JP->EN and EN->JP from the same notes will always struggle with synonyms

#

Because you might need some context around the JP term to know which one you want in the EN->JP answer, and context for the EN in the JP->EN

unique salmon
#

Same goes for deck vs preset: a necessary evil. Otherwise you will have to either

  1. configure each deck separately
    OR
  2. have the same settings for all decks

cursive badge
#

It would probably be terrible UX, but I kind of wish cards were decoupled more from notes. I like the idea of there being an explicit knowledge graph where cards can be based on multiple nodes.

bold terrace
#

Oh !

cursive badge
#

It's probably something that would work better hidden behind the scenes in something like Duolingo

bold terrace
#

@cursive badge you got me there, I was about to point out "F'cking card links"

#

Right now I have a field that helps me tremendously : "Confusion". When I confuse card A with card B, I add "B" in "Confusion" of card "A"

#

So at every review of A, I also reinforce my ability to dissociate A and B

#

Having things completely atomic in Anki is not that great IMO

#

Being able to relate cards together would be a killer feature

ashen light
#

what types of relations

#

how would relations affect reviews

bold terrace
#

"Synonyms", "Antagonist", "Different form"...

unique salmon
#

FSRS could take that into account in theory

ashen light
#

what do those relations do in terms of actually reviewing

bold terrace
#

Let's say you got a word wrong because you mixed it up with another, Anki could then alternate those 2 over the next few days

unique salmon
#

We tried that with siblings, but the improvement was too small

ashen light
#

and people are gonna....draw lines between cards?

#

whats the ui for this

bold terrace
#

Typical problematic scenario : A and B are very similar.
Day 1 : Review A (i=5)
Day 5 : Review A (i=10)
Day 15 : Review A (i=20)
Day 20 : Review B, got it wrong because you thought it was A
Day 21 : Review B, got it right
Day 26 : Review B, got it right
...
Day 35 : Review A, got it wrong because now you thought it was B.
...
Day 65 : Review B, got it wrong because now you thought it was A

#

You do that "confusion dance" until A and B fall in the same day bin

#

With relationships, confusing B could reduce the interval of A, or sync it to B, so you always review them together, to be sure you can differentiate them
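
A rough Python sketch of the "sync it to B" idea, purely hypothetical: neither Anki nor FSRS has a `confusable_with` link, and the `Card` class and `on_confusion` function are invented here to illustrate the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    name: str
    due_day: int
    confusable_with: set = field(default_factory=set)


def on_confusion(cards, failed_name, today, relearn_delay=1):
    """Reschedule a failed card and pull its linked cards onto the same day."""
    failed = cards[failed_name]
    failed.due_day = today + relearn_delay
    for other_name in failed.confusable_with:
        other = cards[other_name]
        # Only ever pull a linked card earlier, never push it later.
        other.due_day = min(other.due_day, failed.due_day)


# The scenario above: A is due on day 35, B is failed on day 20.
cards = {
    "A": Card("A", due_day=35, confusable_with={"B"}),
    "B": Card("B", due_day=20, confusable_with={"A"}),
}
on_confusion(cards, "B", today=20)
```

In this sketch both A and B end up due on day 21, so the two confusable cards are seen together instead of waiting for their schedules to line up by chance.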

cursive badge
#

A dream feature for me would be to map answers to the knowledge graph and automate the creation of "scaffolding" cards when interference is detected.

#

The Math Academy FIRe stuff would be fun too.

bold terrace
#

Yuup

ashen light
#

the math academy stuff is carefully crafted decks and relations though?

bold terrace
#

But IMO, this kind of graph might be easier to do if you do a tool specialized to a domain (math, japanese ...)

ashen light
#

like how much work goes into that

bold terrace
#

if you do it agnostic, then it's a bit difficult to build all that model yourself

#

With a domain-specific approach, you could even use "community defined relationship"

cursive badge
#

It would be really hard to create a good UX. That's why I said it makes more sense for a Duolingo-like. I can dream though.

bold terrace
#

New users would directly benefit from some relationship like "Synonyms" etc

ashen light
#

finally the peanut gallery can affect my anki reviews

unique salmon
#

Model: FSRS-5-siblings
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3270±0.1525
FSRS-5-dev RMSE(bins) (mean±std): 0.0507±0.0325

Model: FSRS-5
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3276±0.1526
FSRS-5 RMSE(bins) (mean±std): 0.0518±0.0333

FeelsBadAnki
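
To put the two result blocks side by side, a small sketch that computes the relative change. It assumes "Weighted average by reviews" means each user's metric is weighted by that user's review count; `weighted_mean` and `relative_improvement` are illustrative names, not functions from the benchmark code.

```python
def weighted_mean(values, weights):
    """Average per-user metrics, weighting each user by review count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def relative_improvement(baseline, candidate):
    """Fraction by which the candidate lowers the baseline metric."""
    return (baseline - candidate) / baseline


# Means copied from the two result blocks above (FSRS-5 vs. siblings variant).
rmse_gain = relative_improvement(0.0518, 0.0507)
logloss_gain = relative_improvement(0.3276, 0.3270)
print(f"RMSE (bins): {rmse_gain:.1%}, log loss: {logloss_gain:.2%}")
```

That works out to roughly a 2% relative reduction in RMSE (bins) and about 0.2% in log loss, both far smaller than the reported standard deviations.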

unique salmon
#

For the record, a neural net didn't do much better either

#

Though maybe Alex's nets are smarter

bold terrace
#

here we're not really talking JP->EN vs EN->JP, but more like mesh of networks, with semantic relationship

polar maple
#

it would be cool to get a dataset with actual information on the cards

bold terrace
#

Also, the precision would not necessarily be the first benefit

#

The fact that B entered the chat even changes the initial prediction of A

#

so prediction could change very organically based on what other things happen

#

"A card C has been introduced ? It has "Synonyms" links with cards A and B ? Let's sync their recall timing !"

#

You could even create "routes" of learning in that network

#

bulk things together, etc

ashen light
#

what I have problems with some other person might just get

bold terrace
#

I had cards with >100 reviews I kept getting wrong until I did the work to manually do that outside Anki

#

I think Anki is an extremely inefficient way to learn even vocabulary

ashen light
#

and identifying and doing the work outside anki would be faster than creating some sort of relation graph in anki

bold terrace
#

If you consider that a language like Japanese has 500k words, and let's say you need to know 50k words, at a rate of 8 new/day, it's 17 years !
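
The arithmetic behind that estimate, as a quick check (assuming new cards are added every single day with no breaks; the function name is just for illustration):

```python
def years_to_learn(words, new_per_day, days_per_year=365.25):
    """Years needed to introduce `words` cards at a fixed daily rate."""
    return words / new_per_day / days_per_year


years = years_to_learn(50_000, 8)  # 6250 days of new cards
print(f"{years:.1f} years")
```

50,000 words at 8 per day is 6,250 days, which is a little over 17 years, so the figure in the message holds.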

bold terrace
#

Before I had that mindset : "Every word I know should be in Anki so I'll be able to really track my progress"

#

Now I realize how infeasible it is

ashen light
#

every word should be in anki because anki is a grind

#

and are you grinding if you don't add everything you don't know to it?

#

how else are you gonna push the unwanted memories out

bold terrace
#

That's the problematic mindset indeed

#

BUT

ashen light
#

anki is a form of penance

bold terrace
#

In fact, many words overlap, so finding ways to batch them is really a nice trick to achieve higher new/day without having to wait to master 20k words before it goes faster

#

I batch my next new card by common kanji now

ashen light
#

at least have the kanji for orange there

#

its so left out

bold terrace
#

And to find the next batch, I added a sorting order "Unseen Card" in Kanji Grid (By Kuuuube)

#

so I take a mastered-kanji for which I have a lot of cards to learn

#

and I can learn new words way more easily

ashen light
#

not even 100%

bold terrace
#

IMO grinding better logloss/RMSE prediction already hit a threshold of diminishing returns

ashen light
#

you need to grind harder clearly

cursive badge
#

As a side topic: I reduced my DR from 90%->80% last month because I was struggling. It halved my reviews but somehow my TR for the month only dropped to 87% 😅

bold terrace
#

I dropped from 80->70% in December, went from 77% to 55%

#

never again

bold terrace
#

#Team90DRforevernow

cursive badge
#

Well, the helper addon version.

bold terrace
#

CMRR is full of lies and shadows

#

@cursive badge : But you still do higher DR for mature words no ?

#

So it pulls the retention higher ?

#

I remember you said you had Filtered Decks for mature

unique salmon
#

CMRR be like:
This user has a deck with exactly 10*days_to_simulate cards in it. And he can learn an infinite number of new cards, but only as long as the overall studying time does not exceed 30 minutes

#

This is why I asked Luc to just hook it up to the simulator config

bold terrace
#

CMRR be like : return 0.70

#

What about ... 0.70

bold terrace
#

Gotta take some 0.70 ?

ashen light
#

I like how it just returns the lowest valid value, "I mean it is the minimum recommended value"

unique salmon
#

CMRR was intended to have as few settings as possible, but as a result you get this "The user has no learned cards, deck size=10*days_to_simulate, Easy Days don't exist, fuzz doesn't exist, sort order doesn't exist, new card limit is infinite" shit

ashen light
#

sounds a bit halfbaked

unique salmon
#

This is why I asked Luc to just hook it up to the simulator config

bold terrace
unique salmon
#

luc plz deliver

#

prayge

bold terrace
#

So it's easier to have a full backlog of 30% Retention than to maintain cards at 80% R

polar maple
cursive badge
#

Secretly I was one of the low decay weirdos all along 😮

#

I'm definitely not remembering this stuff in 100 years though, so maybe not that weirdly low ;p

bold terrace
#

Also IMO the default graph selection of Anki should be a bit more useful than "What was your Retention at 9PM".

  • Stability over Time
  • Memorized over Time f(R,S), not just sum(R)
  • Daily Load profiling (by lapse, repetitions...)
unique salmon
unique salmon
bold terrace
#

Like, my R is around 90% between noon and 6PM and very low after 8PM ...

No shit sherlock, I had DR=70/80% when I was doing my reviews at night

bold terrace
#

f(R,S) means : a function depending on R and S

unique salmon
#

Yep

bold terrace
#

integral based or not, doesn't matter

#

you create fights where there is none

unique salmon
#

Think of one that still has an intuitive interpretation

#

Oh, the integral one does, btw

bold terrace
#

exactly

unique salmon
#

R*(1-exp(-S)) or whatever doesn't

bold terrace
#

whatever

unique salmon
#

Or R×sqrt(S) or R×ln(S)

bold terrace
#

Damn you really misunderstand everything 🥲

#

By F(R,S) I just mean any kind of function taking into account R and S

#

so if you like your integral so be it

#

I Don't care xD

#

Go jerk on vtubers

unique salmon
#

There is a difference between "it has desirable mathematical properties" and "it can be explained in one sentence with <20 words to an average user"

#

You want something with desirable mathematical properties, but in this case it makes it very hard to come up with something that also has a simple, intuitive interpretation

#

And not just "higher number = better"

unique salmon
#

I was planning to use the integral for CMRR

cursive badge
#

Ross like line go up. Line go up make Ross feel good.

bold terrace
#

sum(R) should be called "Expected Score at a test" IMO 😛

#

If you have 50% R for all your cards, you might be able to get 50% on a test with the same constraints as in Anki

#

So yeah, sum(R) can make sense

#

But c'mon, please people, have a bit more self-love than just trying to get x-% at a test 🥲

#

Try to remember it more than one day 😄

bold terrace
#

Soooo if all you see is a retention line that sticks at DR for months

#

you might feel you're getting nowhere

#

but in fact, Stability might increase, workload might be dropping ...

#

all those tiny positive things need to be brought up 🙂

#

That's also why I think sum(R) is a bit silly : If your DR is 80, then sum(R), without adding card ........... will always be somewhere around N*90% (N your number of active card)

cursive badge
#

Kind of sad if you are looking for progress though.

bold terrace
#

Yeah to me sum(R) is really just a "test score estimate" in some way

#

Well not even an estimate 😛

#

You know, if you're at 100% R for 50% of your deck (and 0% for the rest), Memorized will tell you 50% of N, but in practice you'll most definitely miss your test if the grading condition is 50% good answers

#

(if you're not lucky and pick questions you were at 0% in your deck)

#

So sum(R) is like "Your estimated score test result, if the test consist of all the cards of your deck" lol
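(A minimal sketch of that "expected score" reading of sum(R). The two decks are made up for illustration, and the variance line is just the standard Poisson binomial variance, not something Anki shows.)

```python
def expected_correct(retrievabilities):
    """sum(R): expected number of correct answers if every card is tested once."""
    return sum(retrievabilities)


def variance_correct(retrievabilities):
    """Variance of the number of correct answers, treating cards as independent."""
    return sum(r * (1.0 - r) for r in retrievabilities)


# Hypothetical deck A: half the cards known perfectly, half not at all.
deck_a = [1.0] * 50 + [0.0] * 50
# Hypothetical deck B: every card at R = 50%.
deck_b = [0.5] * 100

score_a, score_b = expected_correct(deck_a), expected_correct(deck_b)
spread_a, spread_b = variance_correct(deck_a), variance_correct(deck_b)
```

Both decks give the same sum(R), so "Memorized" can't tell them apart, even though a test that samples only a few cards would behave very differently on each.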

cursive badge
#

Maybe I need a Stability over time heatmap to show progress.

unique salmon
#

You can plot sum(S), yes. Though I can't think of a nice interpretation

#

Average S is a bit better in that sense

cursive badge
bold terrace
#

Yeah average S is nice, median too but is a bit less smooth in practice

cursive badge
#

(N.B. the cut-off at 21 days is because of the filtered decks Sound was talking about earlier)

bold terrace
#

I'm not entirely sure what the due dimension brings to the table though 😄

#

Could be useful to find "spike" of workload, but those are often the 1-5d stability that are accounted only for one rep in those things

#

I need to redo it, but Stability over Reps is the most depressing thing I ever plotted

#

It's ... a declining function

#

The more you rep, the less the average stability

#

It's at that point I thought : Ok now my focus is to find where my workload goes XD

#

My higher lapses, represent 10% of workload for each slice of 5% cards 🥲

#

54% of my workload for 33% of my higher lapse cards 🥲

cursive badge
#

I have not found the "due in" version very useful. It just lets you see a bit more of what is happening in "Future Due".
A version looking into the past would let you see how the card stabilities changed over time (hopefully lots of them increasing).

bold terrace
#

So now I even consider reducing my lapse normal -> hard to 4 and my hard to suspend to 8 lol

#

The overall idea would be : Discover as many easy words as possible, taking the low hanging fruits, and then build the more difficult words based on them

#

I have 1157 cards I never lapsed a single time xD

#

Over 3000

#

So when I see a word like 躱す (To dodge), that I reviewed 62 times in 4 months for a current stability of 4d... I'm like ... ok maybe I should just postpone it

cursive badge
#

One of my worst is 靖 (109 reviews, 23 lapses). I kept on confusing it with 情.

bold terrace
#

Yeah for those I really think brute force is not the key

#

So either you take time to really analyze it, or you just suspend it

cursive badge
#

I finally got it after noticing the issue and spending some extra effort. Now the thing that usually gets me is answering "peace" instead of "peaceful".

ashen light
#

I think you just aren't brute forcing hard enough

unique salmon
#

bro just 500 more reviews and i will remember it bro

ashen light
#

maybe meditate on the card for an hour

#

new preset: brute force. 20 learning steps and they're all 15 minutes

unique salmon
#

Though, finding the exact user among millions of users is a problem

ashen light
#

its me

#

I'm doing it

unique salmon
#

where poisson binomial jake

#

jake

#

we need to cook code

#

where leech detector

bold terrace
bold terrace
#

Mother of all leeches

ashen light
#

one day

bold terrace
#

What was their stability ?

ashen light
#

I'm waiting for dae to sign off on it before I touch code

cosmic hedge
#

my screenshot software crashed 😭

cosmic hedge
bold terrace
#

Nice

#

And the words ?

cosmic hedge
#

i mean theres more than 3 XD

#

精進 is the top one

#

yeah I need a leech detector now tbf 😅

cursive badge
bold terrace
#

Lapse>100 might already give you some insights 😂

sick moth
cursive badge
#

I still think there needs to be a component of looking at study time and passed intervals in a leech detector. The Poisson Binomial stuff is interesting, but I don't think it fully captures "leechness" on its own.
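(For anyone who hasn't seen the Poisson binomial stuff: a rough sketch of the idea, not the actual proposed detector. It assumes each review lapses independently with probability 1 - R, where R is the predicted retrievability at that review. The function names and the example numbers are made up.)

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent trials with different probabilities."""
    pmf = [1.0]  # zero trials: certainly zero successes
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial did not happen
            nxt[k + 1] += mass * p       # this trial happened
        pmf = nxt
    return pmf


def prob_at_least(probs, k):
    """Probability of seeing k or more successes."""
    return sum(poisson_binomial_pmf(probs)[k:])


# Hypothetical card: 10 reviews, each predicted at R = 90%, so lapse probability 0.1 each.
lapse_probs = [0.1] * 10
# How surprising would 5 or more lapses be if the predictions were right?
surprise = prob_at_least(lapse_probs, 5)
```

A very small probability means the card lapsed far more often than the model expected, which is one possible signal of a leech. It says nothing about study time or passed intervals, which is the gap being pointed out above.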

unique salmon
#

Now let's see how Dae reacts to my issue 😅

cursive badge
#

Maybe the solution is to remove all the advanced stuff but leave the APIs for an "Advanced Mode" addon that puts them back in. Then the burden of maintaining the separate UI is offloaded to the Addon maintainers instead of core Anki / Dae.

ashen light
#

thats something I've been thinking on how it would sort of work, like anki could give a handful of hooks into say fsrs logic and addons could use that in a variety of ways

#

imagine: fsrs auto-optimize addon

#

conflicts are your own problem 🍃

#

the issue with this though is that mobile users get fucked

cursive badge
#

My kingdom for a [cross platform addon system] 😂

ashen light
#

if only apple didn't explicitly forbid them

cursive badge
#

We just have to get the EU to harass them harder until they submit. I saw something about them having caved on emulators recently because of EU pressure.

ashen light
#

see you in 5 years

cursive badge
#

Just in time for the Svelte migration to be finished! 😂

ashen light
#

🍃

quasi shadow
polar maple
#

or none, expertium wants to remove the evaluate button

quasi shadow
#

Expertium wants the metric to be human-readable, so we have RMSE(bins).

quasi shadow
polar maple
quasi shadow
quasi shadow
#

Now you can edit it.

polar maple
#

ok thanks

quasi shadow
#

@unique salmon pretraining decay doesn't work well.

Model: FSRS-rs-dev
Total number of users: 345
Total number of reviews: 10524780
Weighted average by reviews:
FSRS-rs-dev LogLoss (mean±std): 0.3275±0.1511
FSRS-rs-dev RMSE(bins) (mean±std): 0.0479±0.0311
FSRS-rs-dev AUC (mean±std): 0.7184±0.0831

Weighted average by log(reviews):
FSRS-rs-dev LogLoss (mean±std): 0.3496±0.1618
FSRS-rs-dev RMSE(bins) (mean±std): 0.0620±0.0389
FSRS-rs-dev AUC (mean±std): 0.7078±0.0884

Weighted average by users:
FSRS-rs-dev LogLoss (mean±std): 0.3517±0.1638
FSRS-rs-dev RMSE(bins) (mean±std): 0.0640±0.0399
FSRS-rs-dev AUC (mean±std): 0.7071±0.0909

parameters: [0.2027, 1.0535, 2.8078, 15.9455, 6.9865, 0.5577, 2.2141, 0.0069, 1.5326, 0.1223, 1.0383, 1.8223, 0.1175, 0.3022, 2.2859, 0.2162, 3.0055, 0.79, 0.2611, 0.1427, 0.2029]

Model: FSRS-rs
Total number of users: 345
Total number of reviews: 10524780
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3273±0.1509
FSRS-rs RMSE(bins) (mean±std): 0.0479±0.0309
FSRS-rs AUC (mean±std): 0.7187±0.0838

Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3489±0.1607
FSRS-rs RMSE(bins) (mean±std): 0.0615±0.0374
FSRS-rs AUC (mean±std): 0.7079±0.0888

Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3509±0.1624
FSRS-rs RMSE(bins) (mean±std): 0.0634±0.0382
FSRS-rs AUC (mean±std): 0.7072±0.0913

parameters: [0.216, 1.1977, 2.8019, 15.7018, 6.9865, 0.5514, 2.2311, 0.007, 1.533, 0.1272, 1.0386, 1.8204, 0.1162, 0.2988, 2.2863, 0.2181, 3.0072, 0.8048, 0.2625, 0.1379, 0.1914]
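(Side note on the three "Weighted average by ..." blocks above: a minimal sketch of how the same per-user metric can be averaged with different weights. The two users and their numbers are invented for illustration and are not from the benchmark.)

```python
import math


def weighted_mean(values, weights):
    """Weighted average of per-user metric values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical per-user log loss and review counts.
log_loss_per_user = [0.30, 0.40]
reviews_per_user = [1000, 10]

by_reviews = weighted_mean(log_loss_per_user, reviews_per_user)
# The base of the logarithm cancels out in a weighted mean, so natural log is fine here.
by_log_reviews = weighted_mean(log_loss_per_user, [math.log(n) for n in reviews_per_user])
by_users = weighted_mean(log_loss_per_user, [1.0] * len(log_loss_per_user))
```

Weighting by reviews lets heavy users dominate, weighting by users treats everyone equally, and log(reviews) sits in between.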
quasi shadow
cosmic hedge
quasi shadow
quasi shadow
#

Good News: FSRS now outperforms SM-17 significantly.

bold terrace
#

People complaining about long intervals, meanwhile my hard deck : 50 consecutive good rating, stability 28 lol

#

4 reviews for my normal deck to get to 25d stability

#

Hard : logloss 0.4353, RMSE 4.34%
Normal : logloss 0.3579, RMSE 3.29%
Merged : logloss 0.4203, RMSE 3.39%

#

But funnily enough, a mistake in the normal one is punished more in terms of interval (but not in reps to recover) than in the hard one

unique salmon
#

You should add AUC btw, just for the sake of consistency with the other benchmark

#

Actually, once your PR is merged I'll make another one just to re-write some stuff in readme

quasi shadow
#

If everything goes well, the benchmark will be done tomorrow.

unique salmon
#

@quasi shadow would you move "Evaluate" to the Helper add-on if Dae was ok with it?

quasi shadow
#

Nope

#

Just use the notebook optimizer.

unique salmon
#

It's better than removing it entirely

quasi shadow
#

yeah

unique salmon
#

Then Sound won't complain 🤣

robust hill
#

if you remove my evaluate button

cursive badge
#

Just let Jarrett retire 😂

unique salmon
robust hill
#

yes

bold terrace
#

If R=50%, would the cost of a "Good" be equal to the cost of an "Again" ?

unique salmon
bold terrace
#

Ah no no in terms of optimization, to reduce logloss and RMSE

#

I was wondering if someone is strange enough to put as DR, 50%

#

What would be all the implications

unique salmon
#

Ah, yes. At R=50% logloss is the same for any grade

bold terrace
#

Could it be that the closer to 50% you are, the less precise FSRS could be then ?

#

For example for DR=60%, the cost of a 3 or a 1 would be much more similar, so the optimization result might not be a model that targets 60%, but a model "that just doesn't really care" 🤔

#

Of course depends on the subject, getting 50% if the questions are "Yes/No" would be different than "Type the year when this happened"

#

The "endgoal" question being : "Isn't it because of that that higher DRs like 90-95% could just be easier for FSRS to predict than lower ones like 70%?"

unique salmon
#

If your true R is 50% all the time, FSRS would still do its best to adapt to predict that. So it wouldn't "not care", it has to care to accurately predict R=50% all the time

#

It's not like it won't be penalized. If your true R=50% all the time and FSRS predicts 40%, it would be penalized
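(A minimal sketch of both points, assuming log loss is the usual binary cross-entropy where Again counts as a fail and Hard/Good/Easy count as a pass. Function names are mine.)

```python
import math


def log_loss(p_pred, recalled):
    """Binary cross-entropy for a single review with predicted recall probability p_pred."""
    return -math.log(p_pred) if recalled else -math.log(1.0 - p_pred)


def expected_log_loss(p_pred, p_true):
    """Average log loss when the true recall probability is p_true."""
    return p_true * log_loss(p_pred, True) + (1.0 - p_true) * log_loss(p_pred, False)


# At a predicted R of 50%, a pass and a fail cost exactly the same.
cost_pass = log_loss(0.5, True)
cost_fail = log_loss(0.5, False)

# But if the true R is 50%, predicting 40% is still worse on average than predicting 50%.
honest = expected_log_loss(0.5, 0.5)
off_by_ten = expected_log_loss(0.4, 0.5)
```

So equal per-review cost at 50% doesn't mean the optimizer stops caring: the expected loss is still lowest when the prediction matches the true R.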

bold terrace
#

Yep you would get very volatile parameters only if the answers were a coin flip "Yes / No"

#

(By nature you'd have a R of 50%)

#

For the DR=95% vs 70% though

robust hill
#

what if ur true R is at 2%

unique salmon
unique salmon
#

@polar maple ok man, I didn't want to bother you, but I REALLY hope you can release RWKV soon. My benchmarking article has been in the making for months and I want to finish it 😅

#

https://docs.google.com/forms/d/1Uy8zr9QOS6u-oLVRwVCuQfyFUiSwKxt9pWEvlTGdn9k/viewanalytics
Regarding Evaluate, things may change as I gather more responses, but so far a very strange pattern emerges: there are a lot of users who use Evaluate without understanding where the numbers come from or even what values are sane.
As of right now:

  1. ~60% of users use Evaluate regularly
  2. Only ~15% of users can give a range of log-loss/RMSE values that are good. Out of the remaining 85%, most don't know what values are reasonable at all, not even roughly
  3. ~88% of users don't know the math behind the metrics
  4. Yet only ~40% of users are confident that their Anki routine would not be negatively affected by the removal of Evaluate

That's...strange. It means that a lot of users are using Evaluate on a regular basis without knowing what values are good or how they are calculated, and those users feel like removing numbers that they don't understand would (somehow) disturb their way of using Anki.

@bold terrace thoughts?

#

TLDR: 85-90% users have no idea what the numbers mean or what values are good, but 60% of users use Evaluate regularly anyway

#

I'm not sure how to reconcile these two facts

polar maple
hasty fractal
#

and anki is educational software, we gotta remove all porn!

#

ah wait, u didn't give them enough options. I personally don't fit into any of those groups.

#

I used evaluate a lot at one time (for presets) but have completely stopped using it.

#

I only use it now if a new update comes (stats porn).

cursive badge
# unique salmon I'm not sure how to reconcile these two facts

I think you are still not considering that knowing exactly what the numbers mean / how they are calculated does not matter.
Knowing that number goes down = good is enough for people to:

  • See that FSRS is improving over time
  • Check if splitting / reorganising Presets is worth it

Knowing what range is "good" would be useful, but could be replaced with a simple traffic light Good/Ok/Bad (if we actually know what ranges are "good").

unique salmon
#

So we're still left with a situation where the majority of users don't know what values are good, but keep using Evaluate anyway

cursive badge
#

To be fair I don't know what ranges are technically "good". If I created a new preset and saw it had massively larger values than existing presets it still helps me know something might be wrong, even if I don't know exactly what range I should be expecting.

unique salmon
#

I like the idea of a "health check", but Evaluate is really poorly suited for that. Evaluate is like a health check that tells you "You have fatal organ failure" when it's already too late AND doesn't tell you which organs are shutting down or why

cursive badge
#

In an ideal world I do agree that they would be debugging/advanced values and we would have nice "Health Check" tools.

unique salmon
#

We don't have good tools to diagnose Hard misuse

#

sadge

cursive badge
#

Is it even possible to detect Hard misuse?

#

As in ever

unique salmon
#

I've proposed detecting it based on one of parameters, but Jarrett said it's a bad idea

bold terrace
bold terrace
cursive badge
#

I assumed it was impossible to detect Hard misuse because you are effectively lying and we have no way of knowing the objective truth apart from what the user tells us.

unique salmon
#

Yep

bold terrace
#

However, if there is something I think could be simplified ... or even ... removed... would be to use both logloss and RMSE in the screen. You know lower is better, but what if RMSE goes down but not logloss, etc

unique salmon
#

We can kind of assume that the user is misusing Hard if FSRS decided to set their SInc(Hard) to 1 aka S doesn't increase with Hard, but again, Jarrett said it wouldn't work well

bold terrace
#

Even to this day, something I'm like "Ok now I have 0.40 logloss instead of 0.60, but I get a bigger RMSE by splitting the deck... so what do I choose ? Lower logloss ? Lower RMSE ?"

polar maple
#

have we tried something like treating 'hard' as 'again' and checking if the metrics look better after?

bold terrace
#

ah shit can't ignore those

cursive badge
bold terrace
#

Definitely more something for the addon though ?

unique salmon
#

The best solution is to have a "I use Hard as fail" toggle

#

That's it

#

Simple and no false positives/negatives

bold terrace
#

Wouldn't hurt either

#

at least the user can check how different results would be

unique salmon
#

It would require maintaining two versions of FSRS though, that sucks

bold terrace
#

I'm fucking surprised at 97% FSRS though

polar maple
#

apparently 'Remedy Hard Misuse' just does this Hard -> Again relabelling, why isn't this just automatically done?
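(Rough sketch of what such a relabelling would look like on a plain list of ratings, assuming Anki's usual 1-4 encoding for Again/Hard/Good/Easy. This is not the add-on's code, and re-running the optimizer to compare metrics before/after is left out.)

```python
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def relabel_hard_as_again(ratings):
    """Return a copy of the ratings where every Hard is treated as Again."""
    return [AGAIN if r == HARD else r for r in ratings]


def pass_rate(ratings):
    """Fraction of reviews that count as a pass (anything other than Again)."""
    return sum(r != AGAIN for r in ratings) / len(ratings)


# Hypothetical review history of someone who presses Hard when they forgot.
history = [GOOD, HARD, GOOD, AGAIN, HARD, EASY, GOOD, HARD]
remedied = relabel_hard_as_again(history)

rate_before = pass_rate(history)
rate_after = pass_rate(remedied)
```

The relabelling itself is trivial; the hard part is knowing whether the user actually means "fail" when they press Hard.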

cursive badge
unique salmon
bold terrace
#

I'm really curious how much this fit "a more broad" population like the 500k people that watched the "anki introduction" where the guy still tweak SM2 in 2024

robust hill
unique salmon
#

That's like making a survey asking "What's you favorite anime?" and being surprised that 95% of participants watch any anime at all

cursive badge
bold terrace
bold terrace
cursive badge
bold terrace
#

That would be DOPE

bold terrace
#

Wait

#

We could take the 10K user dataset

unique salmon
#

I hope nobody will interpret this article as “It’s ok to use review time to automatically select the answer button for the user”.

  1. Time to answer varies not only between different people but also between different types of material. So Anki will have to estimate what time corresponds to Again-Hard-Good-Easy for this specific user and for this specific material.

  2. average_t(Again) > average_t(Hard) > average_t(Good) > average_t(Easy) is true only for 40% of users.

  3. There will be outliers if the user went to the toilet or got distracted by a phone call or something.

It’s WAY easier to just use self-reported grades. There are a lot of arguments about using 2 vs 4 buttons, and those arguments will likely last as long as Anki itself, but using time as a proxy for the answer button will be worse than either of those options. Using time as a proxy will work reliably only for about 40% of users, will be prone to outliers, and the exact cutoffs will have to be adjusted for each user individually and for different decks.

Compare that to just asking the user to click a button.

bold terrace
#

Force all <X sec to be "hard', all >Y to be "easy", run optimization on it, and see if FSRS fit better ?

#

That would be proof

unique salmon
#

In case you are confused: for example, Again > Hard > Good > Easy means “Average time for ‘Again’ is greater than the average time for ‘Hard’, which in turn is greater than the average time for ‘Good’, which in turn is greater than the average time for ‘Easy’”. But that’s too long, so I just wrote it as Again > Hard > Good > Easy.
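(If you want to check that ordering on your own data, here is a minimal sketch. It assumes you have (grade, seconds) pairs with Anki's 1-4 grade encoding; the helper names and the toy data are hypothetical.)

```python
from statistics import mean


def mean_time_by_grade(reviews):
    """reviews: iterable of (grade, seconds) with grade 1=Again, 2=Hard, 3=Good, 4=Easy."""
    buckets = {}
    for grade, seconds in reviews:
        buckets.setdefault(grade, []).append(seconds)
    return {grade: mean(times) for grade, times in buckets.items()}


def follows_again_hard_good_easy(reviews):
    """True if average time strictly decreases from Again to Easy and all four grades are used."""
    means = mean_time_by_grade(reviews)
    if set(means) != {1, 2, 3, 4}:
        return False
    return means[1] > means[2] > means[3] > means[4]


# Toy data: one user who fits the pattern, one who gives up quickly on Again.
fits = [(1, 12.0), (2, 9.0), (3, 5.0), (4, 2.0), (3, 6.0)]
does_not_fit = [(1, 3.0), (2, 9.0), (3, 5.0), (4, 2.0)]
```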

polar maple
cursive badge
bold terrace
#

And see how well it improve their rating

#

their prediction*(

#

Also

#

Hard > Good > Easy is the only thing we need to take into account

unique salmon
#

Using both time and grades would be neat, but idk how to do it in practice with FSRS

bold terrace
#

so we have the blue, orange part that match

#

60% user fit perfectly !

unique salmon
#

I tried it once (time + number of reviews done on that day) and it didn't do shit

#

So either I'm dumb or it's just hard to do

bold terrace
#

Well it's true that time taken to answer, is already somewhat captured in the Retention info... So long answers already weight more on the "fail" side

#

And, since people didn't use hard/good/easy themselves, fitting a model on those "faked entries" in the benchmark means if they press "Good" for everything, they won't benefit from it

#

So we'd need to take users that already respect that pattern

#

can't do that on people not using Hard/Easy consistently in the first place like me

#

But if an Addon was forcing those Hard/Easy, and the user was just pressing "Good", it would solve that

unique salmon
#

Going back to Evaluate

  1. 66% of users who use FSRS use Evaluate regularly
  2. Only 23% of them can give ranges of sane values
  3. Only 21% of them know the math (that's actually surprisingly high, I thought it will be like 2%)
  4. Only 30% of them believe that removing Evaluate will not be bad for them
bold terrace
#

But FSRS optimizer wouldn't have to change at all since the time info is captured in those 3 values

polar maple
#

iirc i did small tests with LSTM and excluding duration information affected log loss by ~0.001 and treating hard = good = easy affected log loss by ~0.003

bold terrace
#

Hope crushed

unique salmon
#

Speaking of which
binary means "hard = good = easy"

#

I'm surprised by how not-shit it is

#

And better than FSRS-5 btw

cursive badge
#

I still suspect time-to-flip could be useful.

unique salmon
#

So FSRS-6 with "pretend that Hard = Good = Easy" is still better than FSRS-5

#

(marginally)

#

Maybe 4 buttons really are placebo

bold terrace
unique salmon
#

Anything that involves the content of the card is a "no"

#

Only soulless numbers and IDs

bold terrace
#

What do you think @polar maple 😄 ?

polar maple
#

yeah we just don't have the info available to us

unique salmon
#

As in "you LITERALLY can't", not "Expertium is telling you it's bad"

bold terrace
#

aah

#

User Card data is not shared ?

cursive badge
#

Not in the benchmark dataset

polar maple
#

the dream is that we get some vocabulary deck data so i can throw it at a nn to let it figure it out

#

use some word/sentence embedding nn to encode the card info

bold terrace
#

Would 74K reviews suffice ?

polar maple
#

prob not

unique salmon
bold terrace
#

Sad, there are few things that could be useful while being anonymous (glossary, front, ...)

#

I even started noting the words I confused with other when I did

#

Imagine this on NN

unique salmon
cursive badge
bold terrace
unique salmon
#

Deck1::Subdeck2::NuclearCodes

bold terrace
#

For example if you Remove "Screenshot" and "Sentence" in mine, you won't see my dirty talk

polar maple
cursive badge
#

I'm assuming Dae does not want to hand check 10k users worth of decks for sensitive data.

bold terrace
ashen light
#

just go on r/anki and ask for volunteers

#

won't get 10k but hey you might get 100

bold terrace
#

I'm sure 90% of people don't even mine their own card

ashen light
#

make the fsrs helper addon have a button to upload a deck to allow science to be done on it

bold terrace
#

they just download a shared core deck

cursive badge
#

Have an "I donate my decks to science" setting inside Anki ;p

ashen light
#

insidious: make the fsrs helper addon just do that with no prompting 🍃

bold terrace
#

With shared decks, inferring card relationships would even be easier since we'd have a huge amount of data

ashen light
#

you'll kill your reputation but who cares when you have all that fresh real data

bold terrace
ashen light
#

yeah but they have money

bold terrace
#

Because they didn't care about their reputation first

ashen light
#

I mean when you control the flow of info you can just hide flows that make you look bad

#

¯_(ツ)_/¯

ashen light
#

what I'm hearing is we need fsrs incorporated first, then we can use it to steal all the decks 🍃

#

hahah you weren't even memeing about nuclear codes

bold terrace
cursive badge
#

I really want WaniKani to donate their dataset to science.

#

They have a massive dataset and all the "cards" already have nice links showing how they are related.

bold terrace
#

Never used it

#

But when I read that I'm like maybe it's not too late

cursive badge
#

Unfortunately the WK SRS is terrible 😦

#

They just use fixed intervals. Not even SM2 levels of adapting to the user 😦

unique salmon
#

Excerpt
p_recall,timestamp,delta,user_id,learning_language,ui_language,lexeme_id,lexeme_string,history_seen,history_correct,session_seen,session_correct
1.0,1362076081,27649635,u:FO,de,en,76390c1350a8dac31186187e2fe1e178,lernt/lernen<vblex><pri><p3><sg>,6,4,2,2
0.5,1362076081,27649635,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,4,4,2,1
1.0,1362076081,27649635,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,5,4,1,1
0.5,1362076081,27649635,u:FO,de,en,0cf63ffe3dda158bc3dbd55682b355ae,frau/frau<n><f><sg><nom>,6,5,2,1
1.0,1362076081,27649635,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,4,4,1,1
1.0,1362076081,27649635,u:FO,de,en,56429751fdaedb6e491f4795c770f5a4,der/der<det><def><m><sg><nom>,4,3,1,1
1.0,1362076081,27649635,u:FO,de,en,1bacf218eaaf9f944e525f7be9b31899,kind/kind<n><nt><sg><nom>,4,4,1,1
1.0,1362082032,444407,u:dDwF,es,en,73eecb492ca758ddab5371cf7b5cca32,bajo/bajo<pr>,3,3,1,1
1.0,1362082044,5963,u:FO,de,en,76390c1350a8dac31186187e2fe1e178,lernt/lernen<vblex><pri><p3><sg>,8,6,6,6
0.75,1362082044,5963,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,6,5,4,3
0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
0.8,1362082044,5963,u:FO,de,en,0cf63ffe3dda158bc3dbd55682b355ae,frau/frau<n><f><sg><nom>,8,6,5,4
0.8,1362082044,5963,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,5,5,5,4
1.0,1362082044,5963,u:FO,de,en,56429751fdaedb6e491f4795c770f5a4,der/der<det><def><m><sg><nom>,5,4,5,5
1.0,1362082044,5963,u:FO,de,en,1bacf218eaaf9f944e525f7be9b31899,kind/kind<n><nt><sg><nom>,5,5,3,3
1.0,1362082130,77,u:dDwF,es,en,73eecb492ca758ddab5371cf7b5cca32,bajo/bajo<pr>,5,5,1,1
0.0,1362082194,150,u:FO,de,en,84920990d78044db53c1b012f5bf9ab5,das/das<det><def><nt><sg><nom>,10,9,1,0
1.0,1362082194,150,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,15,13,1,1

#

No idea what some of these mean, but whatever

polar maple
#

what could p_recall be?

unique salmon
#

No, that's the easiest one 🤣

polar maple
#

it's not 0/1

#

did they measure over a session or over a set of users or something

#

for the same item

#

or did they include their HLR predictions into the dataset itself

unique salmon
#

Per session, it seems

polar maple
#

unlucky

#

per session data is not useful

unique salmon
#

0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
If session_seen=9 and session_correct=8, that gives us 0.888888888889

#

So yeah, checks out
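(Same check in code, as a minimal sketch: it parses three rows copied from the excerpt and confirms p_recall equals session_correct / session_seen up to the rounding in the file.)

```python
import csv
import io

EXCERPT = """
p_recall,timestamp,delta,user_id,learning_language,ui_language,lexeme_id,lexeme_string,history_seen,history_correct,session_seen,session_correct
0.5,1362076081,27649635,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,4,4,2,1
0.75,1362082044,5963,u:FO,de,en,7dfd7086f3671685e2cf1c1da72796d7,die/die<det><def><f><sg><nom>,6,5,4,3
0.888888888889,1362082044,5963,u:FO,de,en,35a54c25a2cda8127343f6a82e6f6b7d,mann/mann<n><m><sg><nom>,6,5,9,8
"""

rows = list(csv.DictReader(io.StringIO(EXCERPT.strip())))

# Compare the stored p_recall with the per-session correct rate for each row.
mismatches = [
    row
    for row in rows
    if abs(float(row["p_recall"]) - int(row["session_correct"]) / int(row["session_seen"])) > 1e-9
]
```

An empty `mismatches` list means p_recall really is just the within-session fraction for these rows, not a single 0/1 review outcome.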

polar maple
#

yeah this isn't usable for us

unique salmon
# polar maple or did they include their HLR predictions into the dataset itself

Btw, I find it funny that Duolingo reports lower AUC on their own dataset than we do on ours
https://github.com/open-spaced-repetition/srs-benchmark
HLR (3 parameters): LogLoss 0.41±0.012, RMSE(bins) 0.105±0.0030, AUC 0.633±0.0050
https://github.com/duolingo/halflife-regression/blob/master/settles.acl16.pdf


polar maple
#

maybe they have a bug in their implementation

#

jk but 0.538 is very low

unique salmon
#

I'm not joking when I'm telling Jarrett to contact Duolingo and just straight up tell them "HLR sucks, use FSRS instead"

south lodge
#

Is there enough outcome reporting to make that claim convincing?

unique salmon
#

Considering that we have a dataset with ~700 million reviews and Duolingo thought that 13 million reviews was good enough for their paper - yes

south lodge
#

Review count and external testing results might not necessarily correlate with each other

bold terrace
#

Well, maybe right now US-China relations are not that great for international hiring though

quasi shadow
polar maple
#

50.1% wow

quasi shadow
#

Should I remove L2 regularization when changing the default value?

#

😅I guess we don't need to check the distribution.

#

After I change the default value of w[11] from 1.8 to 4.0, the median of optimized value of w[11] is 3.77.

#

Notice that the median values of w[12] and w[13] are also changed significantly.

#

When w[12] increases, S_fail decreases.

#

When w[13] decreases, S_fail decreases, too.

#

When w[11] increases, S_fail increases.

#

So, in some degree, the changes of w[12] and w[13] compensate the change of w[11].

quasi shadow
#

They use the in-day correct rate of a word as the P(recall).

#

It assumes the trials of the same word in the session are iid but it's not true.

#

This one is also problematic when p = 0 or 1.

#

😅 I don't know whether they were aware of these problems. But the paper was accepted. That's why I think peer review makes no sense when the reviewers know nothing about the niche domain.

south lodge
#

Note that to prevent computational overflow and underflow errors, we bound p̂_Θ ∈ [0.0001, 0.9999] and ĥ_Θ ∈ [15 min, 9 months] in practice.
(fwiw, in A.3)
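(A minimal sketch of why those bounds matter for the p = 0 or 1 problem. It assumes the HLR relation p = 2^(-Δ/h), so h = -Δ / log2(p), and applies the quoted bounds to an observed p. Treating a month as 30 days and clamping observed values rather than predictions are my assumptions, not something the quote states.)

```python
import math

P_MIN, P_MAX = 0.0001, 0.9999
H_MIN_DAYS = 15 / (60 * 24)  # 15 minutes, expressed in days
H_MAX_DAYS = 9 * 30          # 9 months, assuming 30-day months


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def half_life_days(delta_days, p_recall):
    """Half-life implied by p = 2^(-delta/h), with p and h clamped to the bounds above."""
    p = clamp(p_recall, P_MIN, P_MAX)
    h = -delta_days / math.log2(p)
    return clamp(h, H_MIN_DAYS, H_MAX_DAYS)


# Without clamping, p = 1 would divide by log2(1) = 0 and p = 0 would take log2(0).
h_perfect = half_life_days(1.0, 1.0)
h_zero = half_life_days(1.0, 0.0)
h_half = half_life_days(2.0, 0.5)
```

The clamp keeps the numbers finite, but it doesn't fix the underlying issue that a session fraction of exactly 0 or 1 carries very little information about the true half-life.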

quasi shadow
unique salmon
quasi shadow
#

the result is similar

west whale
hasty fractal
#

I thought Jarrett will stop at FSRS-5 lol.

#

Lunar new year is long gone, Jarrett is still here.

bold terrace
#

Can't wait for it to reach Anki 🥲 Good job

quasi shadow
#

😅 That's why I planned to release FSRS-5.5.

#

But I accepted more improvement ideas so we have FSRS-6.

#

@cosmic hedge I have a problem.

#

After I modify the Easy Days Config in the Options screen, the Easy Days Config in the Simulator screen doesn't stay in sync with it.

clever cargo
# quasi shadow @cosmic hedge I have a problem.

try this

diff --git a/ts/routes/deck-options/SimulatorModal.svelte b/ts/routes/deck-options/SimulatorModal.svelte
index 1b587c985..afd0d1eb2 100644
--- a/ts/routes/deck-options/SimulatorModal.svelte
+++ b/ts/routes/deck-options/SimulatorModal.svelte
@@ -178,7 +178,7 @@ License: GNU AGPL, version 3 or later; http://www.gnu.org/licenses/agpl.html
         );
     }
 
-    let easyDayPercentages = [...$config.easyDaysPercentages];
+    $: easyDayPercentages = [...$config.easyDaysPercentages];
 </script>
 
 <div class="modal" class:show={shown} class:d-block={shown} tabindex="-1">
quasi shadow
#

It works!

#

Thank you.

cosmic hedge
robust hill
#

when fsrs 6 coming

#

how do i put it inside my anki

lapis hearth
robust hill
#

noooooo

lapis hearth
#

But dae takes 10 working days to respond to smth

#

and 10 more days to make a new build

#

Could someone bring this to daes attention.

quasi shadow
#

My colleagues have finished the refactoring of our App's scheduling module recently, so I will take over the rest of work (refactoring the long-term scheduling algorithm). So I won't have time to improve FSRS in the next several months.

robust hill
#

damn

#

went out with a bang i see

unique salmon
# cosmic hedge I remember removing this on purpose for some reason https://github.com/ankitect...

Btw, other than CMRR, there's this: https://forums.ankiweb.net/t/desired-retention-ui-overhaul/57678/33?u=expertium
But it hasn't got an explicit ok from Dae

And there's also this: https://forums.ankiweb.net/t/ideas-to-make-deck-preset-interactions-more-clear/58773/5?u=expertium
Which also hasn't got an explicit ok from Dae
FeelsBadAnki

robust hill
#

dont worry

#

@lapis hearth will replace you

lapis hearth
#

If I could I would

#

But I dont

unique salmon
#

The realest answer (from my survey on Evaluate)

robust hill
#

also question

#

why doesnt hard count as a good for first time cards?

#

as fulfilling one of the learning steps

#

curious why it was that way

unique salmon
#

Because learning steps are shit

robust hill
#

alright

unique salmon
#

I've said this many times - the whole thing with learning steps shouldn't exist in the first place

robust hill
#

how should it work then

unique salmon
#

It's a mess

unique salmon
robust hill
#

i see

#

well

#

doesnt that mean when u learn new things u have to be on anki the whole day

#

if u learn it in the morning

#

does retention decrease equally

#

e.g. over 8 waking hours vs 8 sleeping hours

unique salmon
#

That's a very good question. I don't know 🤷‍♂️

robust hill
#

no and the reason is

#

sleep is when memory consolidation begins (oversimplified)

#

boom cooked

#

unfortunately i can't bring myself to do new cards at the end of the day 💔

#

billions must nap

robust hill
#

@unique salmon after starting a new deck

#

with lets say 30 new cards a day

#

when would you recommend to start the first optimization

#

i use default fsrs parameters

#

havent optimized yet

unique salmon
#

Whenever you want, really

robust hill
#

alright

#

sounds good

unique salmon
#

😭

#

I swear, if FSRS only had one toggle, people would still find ways to not use it properly

#

So this guy didn't realize that optimization is a thing AND he also didn't realize that he can control interval lengths by adjusting desired retention

ashen light
#

maybe the problem is that people have to hit secret hidden buttons

unique salmon
ashen light
#

its in a corner with like 50 other things to also look at

unique salmon
#

Like, I can see not realizing that DR affects interval lengths if you have never changed DR, but not realizing that optimization is a thing...
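
To make the DR point concrete, here is a rough sketch of how desired retention turns into an interval. It assumes the FSRS-4.5/FSRS-5 forgetting curve with fixed constants (`DECAY = -0.5`, `FACTOR = 19/81`). FSRS-6 makes the decay trainable, so real numbers will differ. The function names are made up for illustration and are not the actual Anki or fsrs-rs API.

```typescript
// Constants of the FSRS-4.5 / FSRS-5 power forgetting curve (assumed here).
const DECAY = -0.5;
const FACTOR = 19 / 81; // chosen so that retrievability is 0.9 when elapsed == stability

// Probability of recall after `elapsedDays` for a card with the given stability.
function retrievability(elapsedDays: number, stability: number): number {
    return Math.pow(1 + (FACTOR * elapsedDays) / stability, DECAY);
}

// Interval (in days) at which retrievability drops to `desiredRetention`.
// This is just the forgetting curve solved for the elapsed time.
function nextInterval(desiredRetention: number, stability: number): number {
    return (stability / FACTOR) * (Math.pow(desiredRetention, 1 / DECAY) - 1);
}

// Same card (stability = 10 days), three different desired retention settings.
const stability = 10;
const atDr80 = nextInterval(0.8, stability);
const atDr90 = nextInterval(0.9, stability); // equals the stability itself
const atDr95 = nextInterval(0.95, stability);
```

Lower DR gives longer intervals and higher DR gives shorter ones, all from the same memory state. That is why changing DR is the intended way to control interval lengths.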

#

We need an interactive tutorial so bad, man

ashen light
#

no one would use it

#

or, the type of person who would is also the type to not have this problem in the first place

#

whats needed is basically an exam when you open anki the first time, you gotta answer a bunch of questions that shows you read the manual

#

only then can you use the program

unique salmon
#

kek

#

Just make the interactive tutorial unskippable

ashen light
#

have fun implementing such a thing

soft skiff
#

Hi guys, how many new cards can I learn every day? What is the upper limit of human cognition?

unique salmon
robust hill
#

lol

#

thats amazing