#Handball Match Goal Predictor


dim phoenix
#

Hey guys. A few weeks ago I started working on "something" whose goal is to predict the number of goals in a handball match. It started as something for PRE Game, meaning, predict before the game starts, but I left it, since I was literally getting averages from past results, and that clearly wouldn't work (I'll probably work on that when I learn more about ML).

So, now I'm trying to come up with "something" that can predict the number of goals of an in-play match. I've been talking with @rolandodata and @Big Bux Chungus
Here you have a quick access to some of the conversations:

  • #💬・general-discussions message
  • #💬・general-discussions message (large conversation, take your time scrolling)

Okay, so, I'm a bit confused on what I want, it looks like I just need to try different things, tune them, and see if they work, so maybe instead of thinking what's the thing that can work the best, just start with something and see if it works. Having said this, what should I start with?

Like, what are the different things I could try? I'm lost to be honest, I also don't have any knowledge regarding ML, it'll be my first project. I'm a CS student but I'm only finishing second year.

Help is SUPER appreciated.

#

PS: Right now I'm trying to plot the line that describes the goals, and I'm also trying to correctly calculate Poisson predictions for both over and under.

For the over prediction dataframe, I'm trying to calculate the number of goals at which the probability of seeing fewer goals is 1 - expected_winrate

While for the under prediction dataframe, I'm trying to calculate the number of goals at which the probability of seeing more goals is 1 - expected_winrate

But I'm not seeing success in the results of my dataframes, might need to dig a bit deeper and see what's going on.

#

Anyway, this PS is more of a side note to the main goal (which I describe in the first message).

fading flare
#

Maybe I misunderstand you, but are you trying to use poisson to make a range of values? Like 95% confidence interval?

dim phoenix
#

This is what I got

#

Makes sense?

#

but idk if this will work long term... I want to learn how to build something good.

fading flare
#

No, the distribution is too narrow at the beginning

#

It should be really wide at the beginning and then start narrowing as you get more and more information

dim phoenix
#

Anyways, this is what I have now, and again, I doubt it'll perform well.

So I would like to hear from you, what can I do?

I still need to read the book you recommended, couldnt start yet, uni takes me a lot of time, but I definitely can find some time.

dim phoenix
#

This is a different example, clearly not on point, too simple to be good.

dim phoenix
#

I suppose the solution is to use MAP with a good prior, as you said? So the prediction accounts for matches that start with a really small number of goals?

fading flare
#

No no, what I mean is that your calculation must be wrong because it is too narrow in the beginning

#

And it doesn't change in width at all

#

You are doing some sort of f(x)+- C

dim phoenix
#

yeah it looks like that

fading flare
#

But it should be roughly f(x, N) +- C/sqrt(N)

dim phoenix
#

strange... I'm using scipy.stats.poisson

#

I'm working in python

#
# time is an int between 0 and 3600
goal_rate = (cumulative_goals_at_time / (time if time > 0 else 1))
total_predicted_goals = goal_rate * match_duration
over_pred = poisson.ppf(prob_for_over, total_predicted_goals)
fading flare
#

It's a wrong model I think

dim phoenix
fading flare
#

Think about it, at the very very end, the range should be zero

#

But yours isn't

dim phoenix
#

yeah

#

you are right

#

So lets imagine I fix this. I'm still left with something that's far from enough

fading flare
#

You are left with something that works

#

compared to what you have

dim phoenix
#

I mean, what I have rn doesnt work, so whatever works will be way better, but still not enough yet.

#

idk.. I want to have something solid good

fading flare
#

Suppose time t1 has passed and you expect the match to run until t2. You have observed n events so far. You assume that P(N events in dt) = Poisson(lambda * dt). You can fit your Poisson distribution using MLE (fit function in scipy; for a constant rate the MLE is simply lambda = n / t1). Then compute n + Poisson(lambda * (t2 - t1))
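
A minimal sketch of that baseline, assuming a constant goal rate and time measured in seconds. The function names are made up for illustration, and the Poisson quantile is computed by hand so the example needs only the standard library. Note that plugging in the MLE ignores the uncertainty in lambda itself, so the early intervals will still be too narrow.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(K = k) for K ~ Poisson(mu), evaluated in log space."""
    if mu <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_quantile(q: float, mu: float, max_k: int = 10_000) -> int:
    """Smallest k such that P(K <= k) >= q."""
    cdf = 0.0
    for k in range(max_k + 1):
        cdf += poisson_pmf(k, mu)
        if cdf >= q:
            return k
    return max_k


def mle_final_score_interval(n: int, t1: float, t2: float, level: float = 0.95):
    """Interval for the final count: n + Poisson(lambda_hat * (t2 - t1))."""
    lam_hat = n / t1                  # MLE of a constant rate; needs t1 > 0
    mu = lam_hat * (t2 - t1)          # expected goals still to come
    tail = (1.0 - level) / 2.0
    return n + poisson_quantile(tail, mu), n + poisson_quantile(1.0 - tail, mu)
```

At the final whistle (t1 == t2) the expected remaining count is zero, so the interval collapses to (n, n), which is the behaviour asked for earlier in the thread.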

#

Lol first build a baseline.

dim phoenix
fading flare
#

One of the rules of ML is to always start from a baseline model.

#

Even if it is the absolute dumbest model.

#

Send me one set of goal times

dim phoenix
#

ok

#

want a pastebin?

fading flare
#

sure

dim phoenix
#

to avoid spam

fading flare
#

these are 95% intervals using MLE

#

basically the range of expected number of home/away goals based on the information you have up to that point

#

like if 5 minutes in I told you the number

#

what would be your guess

#

for the final score

dim phoenix
#

I see

#

understood.

#

Okay I want to understand the whole process behind this and then come back to it.

#

Like I'll read your code and once I get what's going on (in code, I do understand what you explained), I'll talk again

#

wow why didnt I join the ml world before 😭

#

Thanks a lot for your help!!!!!!!

#

is it intelligent to have a prediction for home and another for away?

#

or is it the same

#

my guess, its the same

#

since you are looking for events, and not who was the author of such event

fading flare
#

I don't actually know what those two numbers are exactly.

#

I assumed it was for 2 teams

dim phoenix
#

Yeah, so I sent you a list of dicts which gave the home score and the away score. You separately predicted for home and for away.

What if I just gave you the cumulative score? Will the prediction be the same?

fading flare
#

I don't know the domain of sports at all (haha)

fading flare
dim phoenix
#

cumulative is home + away

#

{'timestamp': 367, 'current_home_score': 3, 'current_away_score': 1}

cumulative is 3 + 1 = 4

#

the goal is to predict the number of goals at the end of the match. In total

fading flare
#

Ah hmmmm

#

Well, it would make as much sense to model the sum as a poisson as it does for each individual component.

#

One rate of goals instead of two.

#

This is 2 chapters of the book I recommended btw.

#

Since it uses conjugate priors for MAP.

dim phoenix
#

alright, time to read then

dim phoenix
#

bro understanding whats going on in the code is wild 😭

dim phoenix
#

I've a few questions.

  • Is it intelligent to reset alpha and beta when the second half starts?
  • Do we want to use past data to (I guess) somewhat influence the value of lambda? this would be MAP, right?
  • Do the initial values for ALPHA and BETA influence the prediction as time passes? I'm interested in predicting accurately for the second half more specifically - or as the second half develops. If so, does it make sense to calculate ALPHA and BETA using past data?
  • When picking the X% credible intervals, you are expecting to be successful at your predictions, X% of the times?
  • Will it help to make predictions for team 1 and team 2 separately? Maybe if one team is superior, we can expect more goals from them? Perhaps we can look at past data and be more precise in our predictions?
#

Wow this is a lot at once. Super interesting, SUPER.

fading flare
#

Do we want to use past data to (I guess) somewhat influence the value of lambda?
If you mean past matches, then yes, that would be a great improvement. Good teams presumably score more in all matches they play.

this would be MAP, right?
Kind of, MAP just means that you are using some kind of prior to influence your predictions, not specifically the past. Could be your expert knowledge. Also, in the last example I posted I am actually using something even better - conjugate priors + Bayesian inference.

dim phoenix
fading flare
#

are beta and alpha the parameters that control the shape and rate of the gamma distribution?
Yes

If so, assuming that each half could have a different pace, is it intelligent to reset them when the second half starts?
Well... that's complicated. Only if you think that information about the first half doesn't influence your predictions about the second half at all.

#

Do the initial values for ALPHA and BETA influence the prediction as time passes?
Yes, though a lot more at the beginning when you don't have any other information.

dim phoenix
fading flare
#

When picking the X% credible intervals, you are expecting to be successful at your predictions, X% of the times?
Yes.

dim phoenix
fading flare
#

Will it help to make predictions for team 1 and team 2 separately?
Yes.
Maybe if one team is superior, we can expect more goals from them?
Definitely.
Perhaps we can look at past data and be more precise in our predictions?
Yes.

dim phoenix
#

ahaha cool

#

So apart from learning the theory behind all of this, which Ill learn in the book you shared, where can I learn how to implement my thoughts?

#

Like once I understand what I want, where can I learn where to implement it.

fading flare
#

The issue is that there are like 10 different libraries you might need to be using (not all at the same time)

#

So there is not a single place where you can learn all of them.

dim phoenix
#

Gotchu

fading flare
#

The search terms are "probabilistic programming", "bayesian modeling"

#

PyMC3, Pyro (python library from uber), PyStan (and Stan itself) are all good tools

dim phoenix
#

Like I've started to understand that a plain Poisson distribution won't be enough. A small number of goals early on implies a "smaller" prediction, when most of the time the number of events later in the game compensates. Makes sense?

fading flare
#

Mmm not sure what you mean. Poisson can have different mu/lambda parameter depending on how far you are into the game.

dim phoenix
#

I guess here is where variance comes up? That's the parameter that accounts for some randomness, right?

fading flare
#

Again not sure what you mean. Basically, if you have a fixed rate of events, Poisson is the right distribution to use. But we don't actually know the rate of events, we can only estimate it. You can estimate it by dividing #goals / time but that's inaccurate in a couple ways - when time is low, this will be biased towards zero; you are not taking into account the inherent uncertainty (variance) that you have when estimating the rate.

#

When you take both of those issues into account, you have to come up with some sort of P(rate | data), and then calculate P(#remaining goals | rate)

#

Extra "variance" comes from the P(rate | data) term, the second term is technically still Poisson.
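
To make the two terms concrete, here is a sketch under stated assumptions: a Gamma(alpha, beta) prior on the rate (shape alpha, rate beta) and a constant-rate Poisson model. With that conjugate pair, P(rate | data) is Gamma(alpha + n, beta + t), and averaging the Poisson term over it gives a negative binomial for the remaining goals. The default prior values (alpha=1, beta=60 seconds, so a prior mean of one goal per minute) and the function names are hypothetical.

```python
import math


def negbin_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) for a negative binomial with shape r and success probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def negbin_quantile(q: float, r: float, p: float, max_k: int = 100_000) -> int:
    """Smallest k such that P(K <= k) >= q."""
    cdf = 0.0
    for k in range(max_k + 1):
        cdf += negbin_pmf(k, r, p)
        if cdf >= q:
            return k
    return max_k


def predictive_interval(n, t, t_end, alpha=1.0, beta=60.0, level=0.95):
    """Credible interval for the final count after n goals in t seconds."""
    remaining = t_end - t
    if remaining <= 0:
        return n, n                     # match over: nothing left to predict
    r = alpha + n                       # posterior shape
    rate = beta + t                     # posterior rate
    p = rate / (rate + remaining)       # negative binomial success probability
    tail = (1.0 - level) / 2.0
    return (n + negbin_quantile(tail, r, p),
            n + negbin_quantile(1.0 - tail, r, p))
```

With a weak prior the interval is very wide at kickoff, narrows as goals and time accumulate, and collapses to a single number at the end of the match.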

dim phoenix
dim phoenix
fading flare
#

Bookmakers take into account previous games and even for new teams they probably model it with some prior.

#

You were using an estimator for the rate that was probably biased towards zero. Plus the width of your distribution never changes, so there might be more critical issues with the code.

fading flare
#

Note how here the center of the shaded regions doesn't start close to zero.

dim phoenix
#

Right, that was wrong, but I wanted to show how the rate might be small at the beginning, so our prediction might be small too, when we should take into account that this might reverse. But now that I think of it, it sounds like MAP. Right?

fading flare
#

Yes

#

Like if I told you that it took players 3 minutes until the first goal, would you immediately say "the rate is 1/3 goals per minute so we will see exactly 20 goals"

#

This is obviously wrong but there is more than one issue with this statement

#

"the rate is 1/3 goals per minute so we will see Poisson(20) goals" is still wrong

#

because the first part is false - you don't really know that the rate is 1/3, that's just the argmax_{rate} P(data | rate) (MLE) estimate for a particular model of the world that you have (fixed rate of goals over time)

#

rate could be much higher and you just weren't lucky enough

#

or lower and you just happened to see the first goal early

#

plus MLE is just plain wrong for this kind of downstream calculation - you don't maximize f(g(x)) by computing f(argmax g(x)). something similar, except more complicated is happening here as well

#

finally your model can be very wrong to begin with. maybe the very beginning of the game works completely different from the rest of the game

#

Anyway, the point is that the statement "the rate is 1/3 goals per minute so we will see 20 goals" is packed full of dozens of different assumptions and small logical errors. You can correct for most of them. Bookmakers probably do exactly that.

#

It is still useful - it is a crude approximation that gives you a ballpark estimate quickly without much thought. It will be order of magnitude correct.

#

And btw, even MAP is a crude approximation and has almost all of the same issues. It just takes into account the prior, calculating argmax_{rate} P(rate | data) = argmax_{rate} P(data | rate) P(rate), but the rest of the problems don't disappear.

#

The only real way to do everything correctly and in a scalable way is to use probabilistic programming.

#

Conjugate priors method works only on very small and simple problems. The nice thing about it is that it is analytically tractable.

#

It has a book as well

#

I suggest you borrow it from your library or support the author by buying it :)

dim phoenix
dim phoenix
#

I want to start coding but I don't even know what I have to do

fading flare
#

Then iterate.

fading flare
dim phoenix
#

understand it, then apply MAP, and start from scratch

#

makes sense

#

okay thanks once more, really, I super appreciate your help

#

super appreciate it

fading flare
dim phoenix
#
import numpy as np  # required by np.mean and np.var below

def set_parameters(self, home_data: list[list[dict]], away_data: list[list[dict]]) -> None:

    # Process home data
    home_goal_rates = []
    for match in home_data:
        if len(match) == 0:
            continue
        total_goals = match[-1]['score']  # Final score in the match
        total_time = match[-1]['timestamp']  # Assuming match starts at time 0
        if total_time == 0:
            continue  # Avoid division by zero
        goal_rate = total_goals / total_time
        home_goal_rates.append(goal_rate)

    # Compute mean of home goal rates
    if not home_goal_rates:
        raise ValueError("No valid home goal rates available to set parameters.")

    mean_rate_home = np.mean(home_goal_rates)

    # Compute variance of home goal rates
    if len(home_goal_rates) > 1:
        variance_rate_home = np.var(home_goal_rates, ddof=1)  # Sample variance
    else:
        variance_rate_home = 1e-6  # Small positive value to avoid division by zero

    # Estimate prior parameters for home team
    self.beta_prior_home = mean_rate_home / variance_rate_home
    self.alpha_prior_home = mean_rate_home * self.beta_prior_home

    # Process away data
    away_goal_rates = []
    for match in away_data:
        if len(match) == 0:
            continue
        total_goals = match[-1]['score']
        total_time = match[-1]['timestamp']
        if total_time == 0:
            continue
        goal_rate = total_goals / total_time
        away_goal_rates.append(goal_rate)

    # Compute mean of away goal rates
    if not away_goal_rates:
        raise ValueError("No valid away goal rates available to set parameters.")

    mean_rate_away = np.mean(away_goal_rates)

    # Compute variance of away goal rates
    if len(away_goal_rates) > 1:
        variance_rate_away = np.var(away_goal_rates, ddof=1)
    else:
        variance_rate_away = 1e-6  # Small positive value to avoid division by zero

    # Estimate prior parameters for away team
    self.beta_prior_away = mean_rate_away / variance_rate_away
    self.alpha_prior_away = mean_rate_away * self.beta_prior_away

    print("Mean Rate Home:", mean_rate_home)
    print("Variance Rate Home:", variance_rate_home)
    print("Mean Rate Away:", mean_rate_away)
    print("Variance Rate Away:", variance_rate_away)
#

I'm trying to code a function to get the values for prior alpha and beta, but I'm getting extremely high alphas and betas, at least when I only have 1 match of prior data.

Mean Rate Home: 0.008611111111111111
Variance Rate Home: 1e-06
Mean Rate Away: 0.006944444444444444
Variance Rate Away: 1e-06


print(mle.alpha_prior_home) # 74.15123456790124
print(mle.alpha_prior_away) # 48.2253086419753
print(mle.beta_prior_home) # 8611.111111111111
print(mle.beta_prior_away) # 6944.444444444444

This doesnt seem right?

#

I read that for these cases I could reconsider the statistical model (use the Poisson Distribution directly) or implement minimum mean and variance values, but not sure what's theoretically right.

#

I do understand why the variance would be extremely low (it should actually be 0) when you only have 1 match of data, but it could even be super small with 2 matches of data, so perhaps I want to change the way I calculate these values?

dim phoenix
#

Or should I incorporate prior knowledge in a bayesian framework (o1 recommendation)

fading flare
dim phoenix
#

Been investigating about Bayes theorem, I think I somewhat understand it. It sounds like MAP, right?

#

It's using prior data, to come up with a hypothesis?

dim phoenix
#

So what can I do? I do want to have a prior, but when variance tends to 0, I get super high values for both beta and alpha.

fading flare
dim phoenix
#
beta = mean / variance
alpha = mean^2 / variance
#

It's super tricky. If you give one match of prior, variance will always be so small that beta and alpha will be extremely high, but even with two or three games of prior you can find yourself in that situation.
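
One possible way around the exploding values, offered as a sketch and not as the method used elsewhere in this thread: skip the method of moments and treat past matches as discounted pseudo-observations in the same Gamma-Poisson update. Total goals are added to alpha and total playing time to beta, so a single match can never produce a near-zero variance. The `weight`, `alpha0` and `beta0` values are hypothetical, and the example match assumes the 0.00861 goals-per-second rate above came from 31 goals in 3600 seconds.

```python
def gamma_prior_from_matches(matches, weight=0.25, alpha0=1.0, beta0=120.0):
    """Build Gamma(alpha, beta) prior parameters from past matches.

    matches: list of (final_goals, duration_seconds) for one team.
    weight:  how much each past goal/second counts relative to live data.
    alpha0, beta0: weak base prior used when there is no history at all.
    """
    total_goals = sum(goals for goals, _ in matches)
    total_time = sum(duration for _, duration in matches)
    alpha = alpha0 + weight * total_goals
    beta = beta0 + weight * total_time
    return alpha, beta


# One past match: 31 goals in 3600 seconds (assumed from the rate printed above).
alpha, beta = gamma_prior_from_matches([(31, 3600)])
prior_mean = alpha / beta            # close to 31 / 3600
prior_variance = alpha / beta ** 2   # stays finite and non-degenerate
```

The weight sets how confident the prior is: a larger weight makes past matches count for more, so the live data moves the estimate more slowly.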

#

Yesterday, I was stuck all day trying to figure out how to determine the values for alpha and beta.

dim phoenix
#

not using confidence interval yet, not sure if it's needed? idk...

#

Do you see something off? I can share code.

dim phoenix
#

Is a confidence interval required?

Hmmmm

dim phoenix
#

Another example of prediction with prior, bookmaker was way off.

#

What should I do next?

fading flare
fading flare
dim phoenix
#

I was thinking how this would perform, and thought of coding the following strategy:

  • if timestamp = 0 and our prediction is X% away from bookmaker, place a bet
  • if there are no, or almost no goals in a 2-3 minute window, and the bookmaker line offer dropped, and our prediction is higher than the book's, bet on over
  • if there are lots of goals in a 2-3 minute window, and the bookmaker line increase, and our prediction is lower than the book's, bet on under

makes sense? idk though...

#

😄

fading flare
#

Well maybe you should come up with some numeric metrics for being "better than the bookmaker"

#

E.g. how far are you from the true answer vs bookmaker

#

Something like "median absolute error at minute 3 across all games"

#

Since you are dealing with a sophisticated opponent that can also learn, you should probably set up realistic backtesting. Do you have dates for all matches?

#

Sort all matches by date, then for a match at time t0 use only information from time t < t0 to train your model (e.g. any priors)

#

You can also use "will I win money with this strategy" as your success metric. Compute the probability of a successful bet across all games
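
A rough sketch of the date-ordered backtest and the error metric described above. The record fields (`date`, `final_total`, `book_line`) and the `predict` callback are assumptions made for illustration; dates are ISO strings so they sort correctly as text.

```python
from statistics import mean, median


def walk_forward_errors(matches, predict):
    """Compare model and bookmaker using only information from earlier dates.

    matches: list of dicts with 'date', 'final_total' and 'book_line'.
    predict: function taking the list of earlier matches, returning a total.
    Returns (median absolute error of the model, same for the bookmaker).
    """
    ordered = sorted(matches, key=lambda m: m["date"])
    model_errors, book_errors = [], []
    for match in ordered:
        # Strictly earlier matches only, so nothing leaks from the future.
        history = [m for m in ordered if m["date"] < match["date"]]
        if not history:
            continue
        model_errors.append(abs(predict(history) - match["final_total"]))
        book_errors.append(abs(match["book_line"] - match["final_total"]))
    return median(model_errors), median(book_errors)


def average_of_past_totals(history):
    """Deliberately dumb baseline model: mean final total of earlier matches."""
    return mean(m["final_total"] for m in history)


example = [
    {"date": "2024-01-03", "final_total": 60, "book_line": 58.5},
    {"date": "2024-01-01", "final_total": 56, "book_line": 57.5},
    {"date": "2024-01-02", "final_total": 62, "book_line": 59.5},
]
model_mae, book_mae = walk_forward_errors(example, average_of_past_totals)
```

The same loop can be extended to record whether a simulated bet would have won, which gives the money-based metric instead of the error-based one.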

#

As for improvements to the model, maybe you could model some more subtle effects like teams scoring a bit less when away, teams scoring less against stronger opponents, differences in the number of goals as game progresses (either due to players getting tired or riled up), "non-linear" effects where a team is more likely to score after a short series of failures but less likely after a long. Effects of team composition. Etc

dim phoenix
#

Hey, I'm back.

Been investigating more, talking with a guy who already works in sports prediction, and as time passes I'm learning more and more. Anyway, there are still lots of things to learn.

So, I was somewhat happy with my initial way of predicting the goals for a match, but it wasn't really a model, I calculated BETA & ALPHA at the beginning of the match, and then I just updated it as I observed data. So, I didn't backtest the code yet, in order to say if it'll work or not, but what I can say is that sometimes the line offered by the bookmaker was like 10% (or more) away from my code's prediction. So well this could be an opportunity to take advantage of.

Well, this leaves me talking again about the different scenarios I should hardcode, I guess: you code your strategy, which could use indicators like:

  • predictions of book and code being X% away at timestamp 0 (initial of match)
  • no goals or lots of goals in a time window of a to b minutes

So I believe it's not a bad idea to do this (hardcode strategies), but only because I haven't heard of other ways of using the predictions of my code.

I was also thinking about using prior data in more depth, not only to calculate BETA and ALPHA for my Gamma distribution. My friend mentioned that some leagues have slow first halves and faster second halves, so perhaps it's possible to model goal rates across different minutes of the match? Take, let's say, 10 matches for home, model the goal distribution across the different minutes of the match, and when our prediction differs by X% from the bookmaker's prediction, take advantage of the opportunity.

#

I also had other questions like:

  • Do recent matches have more influence on the current match? How can I code weights 🤷‍♂️
  • How many matches should I use as prior data? Why?
  • Assuming I want to predict for each team separately: teams will score more or less goals depending on the quality of the opponent, how can I take this into account when predicting?
#

And there was something else I wanted to ask but I forgot lol

fading flare
fading flare
#

How many matches should I use as prior data? Why?
All of them. Why would you throw away information.

#

Assuming I want to predict for each team separately: teams will score more or less goals depending on the quality of the opponent, how can I take this into account when predicting?
Good question. Maybe take a look at TrueSkill paper and https://www.youtube.com/watch?v=veiLCvcLIg8 (he talks about modeling scores for teams a bit)

fading flare
#

It is very unlikely that a team suddenly becomes much better than it was and unlikely that it becomes much worse.

#

And then goal_rate(T1, T2, t) = f(skill_T1(t), skill_T2(t)) where f is some function that monotonically increases with skill_T1 and monotonically decreases with skill_T2

#

it's not straightforward to write something down immediately

#

you could model early game / midgame / endgame effects as some sort of slowly varying function as well

#

or as 3 coefficients

#

as a multiplier maybe, so you know it is close to 1 but maybe a bit more goals in the first 3rd of the game

#

this multiplier or slowly varying function can be shared across all games as well
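
A small sketch of what such a rate function could look like, purely as an illustration. It assumes one particular choice of f (a log-linear form, which is monotonic in both skills) and three made-up multipliers for the thirds of the match; in a real model all of these would be parameters to infer, not constants.

```python
import math

# Hypothetical multipliers for the first, middle and last third of a match.
PHASE_MULTIPLIERS = (1.05, 1.00, 0.95)


def goal_rate(skill_t1, skill_t2, base_log_rate, multiplier=1.0):
    """Goals per second for team 1: rises with skill_t1, falls with skill_t2."""
    return multiplier * math.exp(base_log_rate + skill_t1 - skill_t2)


def expected_remaining_goals(t_now, skill_t1, skill_t2, base_log_rate,
                             duration=3600.0):
    """Integrate the piecewise-constant rate from t_now to the final whistle."""
    third = duration / 3.0
    total = 0.0
    for phase, multiplier in enumerate(PHASE_MULTIPLIERS):
        start, end = phase * third, (phase + 1) * third
        seconds_left = max(0.0, end - max(start, t_now))
        total += seconds_left * goal_rate(skill_t1, skill_t2,
                                          base_log_rate, multiplier)
    return total
```

Because the multipliers are shared across games, every match contributes to estimating them, while the skills stay specific to each team.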

#

just learn either Stan or Pyro

#

this stuff will become progressively more difficult to do manually, so you need the right tool here for complex modeling

#

you also need a hierarchical model ideally so that you can easily add new teams and make sane predictions for them

fading flare
#

Not... the most amazing answers from chatgpt but some good search terms and some good ideas

#

this reads a bit like a summary but it's good

dim phoenix
dim phoenix
#

Oh btw! Important note! Maybe you come up with an idea because of what I'll say.

Bookmakers have a prediction when the match starts, let's say 60 goals. Well, 60 goals in a 60-minute match is a rate of 1 goal per minute. So let's say in 10 minutes we have 20 goals (double the rate): the line offered by the book will now be current_goals + remaining_time * initial_expected_goal_rate, so for my example it would be: 20 + 50 * 1 = 70
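
The rule as described here, written out as a tiny function. This only encodes the description in this message, not a verified account of how any bookmaker actually sets lines; the function and parameter names are made up.

```python
def book_line(current_goals, minutes_played, opening_total, match_minutes=60):
    """Live line = goals so far + remaining minutes * opening goal rate."""
    initial_rate = opening_total / match_minutes
    remaining_minutes = match_minutes - minutes_played
    return current_goals + remaining_minutes * initial_rate


line = book_line(current_goals=20, minutes_played=10, opening_total=60)
```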

#

Just a comment, I didnt know about this hehe

#

Read the chatgpt convos, it's so much stuff, I would try it all but I dont have time for all of it rn, I will in around a month! So in the meantime where should I specialize?

I think I'll first backtest what I've now.

#

Please dont delete the chatgpt convos

fading flare
#

as some sort of gaussian process / random process

dim phoenix
#

is this in the book you shared?

#

I think I'll leave this thread dormant till I'm done reading the book, to be honest, I havent even started.

fading flare
#

It has a chapter on Gaussian Processes

dim phoenix
#

I'm trying to work on something I dont exactly understand.

fading flare
#

it has soooo much good stuff tbh

dim phoenix
#

what do you think? I should stop with the coding process a bit and focus more on understanding everything behind modelling and stuff, to decide what might be best for me later, right?

fading flare
#

It depends whether you are willing to sacrifice time to learn things properly

#

I would start with the first chapter and see how much of it you understand

dim phoenix
#

yes I'm willing to sacrifice my time

#

I "SOMEWHAT" always did

fading flare
#

then I would make a detour and learn about automatic differentiation and implement it from scratch

dim phoenix
#

like rn I might be able to read a bit per day for 2-3 weeks, and when december starts I'm done with my exams and I've 4 months of vacation

fading flare
#

the reason being that it will be sooo helpful for solving real problems quickly

dim phoenix
#

detour?

#

automatic differentiation like calculus? isnt there a lib that can differentiate functions for you?

fading flare
#

yes

#

yes there is

#

but knowing how it works is incredibly useful

dim phoenix
#

alr

#

yeah I'm willing to do everything tbh

#

I enjoy learning

#

so I'm not sacrificing much tbh

fading flare
#

maybe I'll join you in reading the book lol

dim phoenix
#

haha nice

fading flare
#

I want to compile a solutions manual

dim phoenix
#

I wanna get it physically but it's kinda expensive

#

150usd in my country

fading flare
dim phoenix
#

double the us price

fading flare
#

my solutions to the first chapter's problems

dim phoenix
fading flare
#

lol

dim phoenix
#

well

#

lots of things to learn

#

can start slowly at least

#

and in vacations, 24/7

fading flare
#

the reason to learn AD early is that it's so amazingly useful. not just for deep learning, but for everything. same thing with some other algorithms like bayesian optimization or markov chain monte carlo but to a slightly lesser extent

#

and an implementation can be written in any language in < 100 LOC

#

it's a detour in a sense that it has no direct relevance to what you are doing right now haha

dim phoenix
#

like AD is literally coding functions that can differentiate a function?

#

I'm saying literally as if it was easy lol

fading flare
#

yes

dim phoenix
#

is there a marketplace for knowledge?

fading flare
#

There are a lot of things you can do, like differentiate a function with respect to all its arguments or only some of them, find the second derivative with respect to an argument, etc. If the arguments change with respect to some other parameter t, you can calculate the derivative of the result with respect to t. You can even find a symbolic representation for all of these.
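
For a feel of how small an implementation can be, here is a sketch of forward-mode automatic differentiation with dual numbers. This is one of several ways to do AD (reverse mode is the variant used for deep learning) and it only supports the handful of operations defined here. The example differentiates a Poisson log-likelihood with respect to the rate.

```python
import math


class Dual:
    """A value together with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def lift(x):
        """Wrap plain numbers as constants (derivative zero)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)          # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__


def log(x):
    x = Dual.lift(x)                      # chain rule: d log(u) = du / u
    return Dual(math.log(x.value), x.deriv / x.value)


def derivative(f, x):
    """Derivative of a one-argument function f at the point x."""
    return f(Dual(x, 1.0)).deriv


def poisson_log_likelihood(lam, n=20, t=60):
    """Log-likelihood (up to a constant) of n goals in t minutes at rate lam."""
    return n * log(lam) - lam * t
```

The derivative of the log-likelihood is n / lam - t, which is zero at lam = n / t; evaluating `derivative(poisson_log_likelihood, 1 / 3)` should therefore give a value very close to zero, matching the MLE discussed earlier.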

dim phoenix
#

maybe like an injection or a pill, you know

fading flare
#

I wish

dim phoenix
dim phoenix
#

While working on a simple backtesting module, to read different metrics, see how bad my work is, I sometimes plot matches that other guys with models send.

So there's this guy who I know is good.

#

And for this match, after minute 40, when the book over/under offer was at 71.5/72.5, he said there was value in the under.

Right here.

As you can see, first off my model would initially think there was value on the over, since my prediction is like 5 points higher than the book's prediction. But leaving that aside for a moment, what do you think this guy's model saw around that minute?

The only thing I can imagine is maybe lots of goals, more than usual, but that wouldnt tell you there is value on the under, since there is randomness in sports and you could have matches with many more goals than you expected.

I dont know. What do you think?

fading flare
#

Maybe he is just sampling

#

It is odd how many predictions he makes in sudden bursts

#

Maybe he has some information about the game you don't?

dim phoenix
dim phoenix
#

my code is 💩 , I'm not implementing models as such, I'm not training a model and cross validating, etc, I'm simply coming up with a beta and alpha for each team and making predictions

fading flare
fading flare
#

My point is that there is very little information you can possibly incorporate into the prediction. The dynamics of the game can't actually affect the average rate too much I think

#

Or there is little information in each individual goal

dim phoenix
dim phoenix
#

===== Backtest Results =====
Total Bets: 164
Total Wins: 70
Total Losses: 94
Average Odds: 1.84
Average Stake: 1.00
Total Profit: 59.05
Total Loss: 94.00
Yield (Profitability): 0.36
Correct Calibration: 8 / 164
Expected Value (EV): -34.95
Sharpe Ratio: -0.2328
Maximum Drawdown: 38.15
Total Analyzed Matches: 10,392
#

🫤

#

60% C.I
Betting when the book's line is outside my C.I range

#

Horrible results...
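
A quick sanity check of those numbers, as a sketch: with a stake of 1.00 per bet the net result is 59.05 - 94.00 = -34.95, which is about -0.213 per unit staked, and the win rate (about 0.427) sits below the roughly 0.543 needed to break even at average odds of 1.84. The break-even figure is only an approximation, since odds differ from bet to bet. The reported yield of 0.36 seems to match gross profit divided by bets (59.05 / 164) and not net profit, but that is a guess about how the report was computed.

```python
def summarize_backtest(total_bets, wins, gross_profit, gross_loss,
                       average_odds, stake=1.0):
    """Recompute headline metrics from the backtest totals."""
    net = gross_profit - gross_loss
    staked = total_bets * stake
    return {
        "net": net,
        "roi": net / staked,                        # net result per unit staked
        "win_rate": wins / total_bets,
        "break_even_win_rate": 1.0 / average_odds,  # approximate, odds vary
    }


summary = summarize_backtest(164, 70, 59.05, 94.00, 1.84)
```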

dim phoenix
#

What can I do?

dim phoenix
#

Finally vacation, starting to read Bishop Pattern Recognition

fading flare
#

Let me know if you have any questions

#

Also if you want to help make a solutions manual hahaha

dim phoenix
#

Cool!!!!!

#

How do you recommend to study this? Personally, for uni, when I have to study math, I read the textbook and make a summary while reading it.

#

But maybe you suggest a better approach for this?

fading flare
#

Sure, do that

#

Solve exercises, make notes

#

It's the best way

dim phoenix
fading flare
#

I can upload my solutions to exercises from the first chapter

#

but they are all good, so I highly recommend

dim phoenix
#

Yeah I'll try to solve them! But I'm on page 23, and exercises start on page 58 😄

#

wow I'm so excited

dim phoenix
#

how important is it to note these functions?

#

Like, I do understand the goal of that function, but is it necessary to note it?

#

I'm writing a summary, that's why I ask this

fading flare
#

Capture the main idea

#

What were you going to do, copy the whole plot?

dim phoenix
#

Like I'm writing down the things I consider the most important

#

but it's taking me a while to read/understand

#

I'm more "studying" than just reading

#

but thats ok, its better

dim phoenix
#

omg this is a neverending book lmao

#

So I've a question, I'm willing to read and study it all, it'll take me around 2 months.

Now the question is, is it necessary? should I pick specific chapters of the book for my goal?

I want an honest answer.

#

On one hand I super want to know what can help me build a working model or whatever it is. So I want to go somewhat straight to the point (will save me maybe 1 month?)

If I read the whole book, it will probably take me an extra month, but I'll know almost all of it.

#

So for example, I read polynomial curve fitting and gaussian distribution, it's an introduction so definitely important.

Maybe I can come up with a function that describes the goal rate? Maybe for gaussian I can also do something related to goal rates?

But I don't know. Ummm, what do you recommend?

#

Sorry I'm just eager to start working on something, but my problem is I've nothing to work with yet since I dont know my options 🤣

fading flare
#

You could try to describe goal rate with a gaussian, sure (apart from goal rate always being > 0). The question is what are the inputs then.

#

Maybe something closer to what you might want is Statistical Rethinking. It is more practical and focuses more on bayesian methods.

dim phoenix
#

But I should continue to read the book I assume (?)

dim phoenix
#

⁉️

fading flare
#

Read/watch both. I don't have a good answer for the question of "what is the minimal thing I should learn to get to the results quicker" since it is really hard to tell what you need and there are limits to my knowledge as well.

#

If you want to understand bayesian methods and most of ML, those books are probably good. But I can't tell you "read this chapter but not this one".

dim phoenix
#

haha yeah you are right

#

I'll continue reading

#

thanks again

dim phoenix
#

I coded a train, and retrain module for my model

#

basically I retrain every time I've new data, but it takes a lot of time to do so

#

On the other hand, this is a trace_plot for 2 months of training, only for one league, what can I interpret from this?

fading flare
#

depends on the meaning of the parameters

#

I am a bit concerned about this ^

#

but everything else looks great

dim phoenix
fading flare
#

not sure there are enough samples

#

it seems like this parameter is bimodal possibly

dim phoenix
#

does this look better?

#

btw I changed my code and now training doesnt take much!!!

fading flare
#

yes, much much better

#

you can tell that the model actually converges correctly

fading flare
#

But I also suggest looking into documentation for Rhat and checking that it looks okay
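
For intuition about what Rhat measures, here is a sketch of the classic Gelman-Rubin formula, which compares the variance between chains with the variance within them. Libraries typically report more refined versions (for example split or rank-normalized variants), so treat this as an illustration and use the library's own diagnostic in practice. The simulated chains are made up.

```python
import math
import random
from statistics import mean, variance


def rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                   # B
    pooled = (n - 1) / n * within + between / n           # variance estimate
    return math.sqrt(pooled / within)


random.seed(0)
# Four chains exploring the same distribution: Rhat should be close to 1.
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Two chains stuck in different modes: Rhat ends up far above 1.
stuck = [[random.gauss(0, 1) for _ in range(1000)],
         [random.gauss(3.5, 1) for _ in range(1000)]]
```

Values well above 1 mean the chains disagree about where the parameter lives, which is what a bimodal trace plot shows visually.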

dim phoenix
#

what if my plots are horrible, my rhat is too high, but I beat the book?

dim phoenix
#

wtf is this 🤣

fading flare
#

Looks like your model is a bit unconstrained so there are two possible solutions

#

Probably

fading flare
#

Lol but you should probably address it nevertheless

#

The problem is most likely due to multiple modes that different chains converge to

#

You described your problem in terms of differences of two variables, perhaps x-y, and two solutions are possible: x=3.5, y=0 and x=0, y=-3.5
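
A toy illustration of that situation, assuming a Gaussian likelihood that sees x and y only through their difference (the actual model in this thread is not shown, so this is a guess at its structure). Both solutions fit the data equally well, so different chains can settle on either one. A common remedy is to add a constraint such as x + y = 0, or to pin one of the two parameters.

```python
import math


def log_likelihood(x, y, observed_diffs, sigma=1.0):
    """Gaussian log-likelihood that depends on x and y only through x - y."""
    diff = x - y
    return sum(
        -0.5 * ((d - diff) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for d in observed_diffs
    )


def center(x, y):
    """Apply a sum-to-zero constraint without changing x - y."""
    shift = (x + y) / 2.0
    return x - shift, y - shift


data = [3.2, 3.9, 3.4, 3.5]                 # made-up observed differences
ll_a = log_likelihood(3.5, 0.0, data)       # first solution
ll_b = log_likelihood(0.0, -3.5, data)      # second solution, same fit
```

After centering, both solutions map to the same point (1.75, -1.75), so the ambiguity between chains disappears.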

dim phoenix
#

gotchu