Handball Match Goal Predictor | Learn AI Together | Page 1

dim phoenix Oct 13, 2024, 1:08 AM

#

Hey guys. A few weeks ago I started working on "something" whose goal is to predict the number of goals in a handball match. It started as a pre-game tool, meaning it would predict before the game starts, but I dropped that, since I was literally just averaging past results, and that clearly wouldn't work (I'll probably come back to it when I learn more about ML).

So, now I'm trying to come up with "something" that can predict the number of goals of an in-play match. I've been talking with @rolandodata and @Big Bux Chungus.
Here are quick links to some of the conversations:

#💬・general-discussions message
#💬・general-discussions message (large conversation, take your time scrolling)

Okay, so, I'm a bit confused on what I want, it looks like I just need to try different things, tune them, and see if they work, so maybe instead of thinking what's the thing that can work the best, just start with something and see if it works. Having said this, what should I start with?

Like, what are the different things I could try? I'm lost to be honest, I also don't have any knowledge regarding ML, it'll be my first project. I'm a CS student but I'm only finishing second year.

Help is SUPER appreciated.

#

PS: Right now I'm trying to plot the line that describes the goals, and I'm also trying to correctly calculate Poisson predictions for both over and under.

For the over prediction dataframe, I'm trying to calculate the number of goals at which the probability of seeing fewer goals is 1 - expected_winrate.

For the under prediction dataframe, I'm trying to calculate the number of goals at which the probability of seeing more goals is 1 - expected_winrate.

But I'm not seeing success in the results of my dataframes, might need to dig a bit deeper and see what's going on.

#

Anyways, this PS is more of a side comment on the main goal (which I describe in the first message).

fading flare Oct 14, 2024, 2:19 AM

#

Maybe I misunderstand you, but are you trying to use poisson to make a range of values? Like 95% confidence interval?

dim phoenix Oct 14, 2024, 10:37 PM

#

fading flare Maybe I misunderstand you, but are you trying to use poisson to make a range of ...

Well, right now I am

#

This is what I got

#

Makes sense?

#

but idk if this will work long term... I want to learn how to build something good.

fading flare Oct 14, 2024, 11:04 PM

#

No, the distribution is too narrow at the beginning

#

It should be really wide at the beginning and then start narrowing as you get more and more information

dim phoenix Oct 14, 2024, 11:27 PM

#

fading flare No, the distribution is too narrow at the beginning

Yes, and it kinda makes sense that it's like that.

I'm interested in being precise in the second half

#

Anyways, this is what I have now, and again, I doubt it'll perform well.

So I would like to hear from you, what can I do?

I still need to read the book you recommended, couldnt start yet, uni takes me a lot of time, but I definitely can find some time.

dim phoenix Oct 15, 2024, 12:12 AM

#

This is a different example, clearly not on point, too simple to be good.

dim phoenix Oct 15, 2024, 12:28 AM

#

I suppose, use MAP with a good prior as you said, is the solution? You have the prediction take into account matches where you start with really small number of goals?

fading flare Oct 15, 2024, 12:29 AM

#

No no, what I mean is that your calculation must be wrong because it is too narrow in the beginning

#

And it doesn't change in width at all

#

You are doing some sort of f(x)+- C

dim phoenix Oct 15, 2024, 12:30 AM

#

yeah it looks like that

fading flare Oct 15, 2024, 12:30 AM

#

But it should be roughly f(x, N) +- C/sqrt(N)

dim phoenix Oct 15, 2024, 12:30 AM

#

strange... I'm using scipy.stats.poisson

#

I'm working in python

#

from scipy.stats import poisson

# time is an int between 0 and 3600
goal_rate = cumulative_goals_at_time / (time if time > 0 else 1)
total_predicted_goals = goal_rate * match_duration
over_pred = poisson.ppf(prob_for_over, total_predicted_goals)

fading flare Oct 15, 2024, 12:32 AM

#

It's a wrong model I think

dim phoenix Oct 15, 2024, 12:32 AM

#

rofl

fading flare Oct 15, 2024, 12:33 AM

#

Think about it, at the very very end, the range should be zero

#

But yours isn't

dim phoenix Oct 15, 2024, 12:33 AM

#

yeah

#

you are right

#

So lets imagine I fix this. I'm still left with something that's far from enough

fading flare Oct 15, 2024, 12:36 AM

#

You are left with something that works

#

compared to what you have

dim phoenix Oct 15, 2024, 12:38 AM

#

fading flare compared to what you have

haha right, but still bad! right?

#

I mean, what I have rn doesnt work, so whatever works will be way better, but still not enough yet.

#

idk.. I want to have something solid good

fading flare Oct 15, 2024, 12:41 AM

#

Suppose time t1 has passed and you expect the match to run until t2. You have observed n events so far. You assume that P(N events in dt) = Poisson(lambda * dt). You can fit your poisson distribution using MLE (fit function in scipy). Then compute n + Poisson(lambda * (t2 - t1))
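
A minimal sketch of the baseline described in the message above, assuming a constant goal rate and time measured in seconds. The function name `mle_final_goal_interval` and the example numbers are illustrative and are not taken from the script attached later in the thread. For a Poisson process the maximum-likelihood rate is just goals divided by elapsed time, so no numerical fit is needed here.

```python
from scipy.stats import poisson


def mle_final_goal_interval(n_goals, t_elapsed, t_end, level=0.95):
    """Interval for the final goal count under a constant-rate Poisson model.

    n_goals: goals observed so far; t_elapsed and t_end are in seconds.
    """
    if t_elapsed <= 0:
        raise ValueError("need some elapsed time to estimate a rate")
    rate = n_goals / t_elapsed                       # MLE of lambda
    mu_remaining = rate * max(t_end - t_elapsed, 0)  # expected goals still to come
    if mu_remaining == 0:
        return n_goals, n_goals                      # nothing left to predict
    tail = (1.0 - level) / 2.0
    low = n_goals + poisson.ppf(tail, mu_remaining)
    high = n_goals + poisson.ppf(1.0 - tail, mu_remaining)
    return int(low), int(high)


mid = mle_final_goal_interval(30, 1800, 3600)   # half time, 30 goals
late = mle_final_goal_interval(50, 3000, 3600)  # 10 minutes left, 50 goals
end = mle_final_goal_interval(60, 3600, 3600)   # final whistle
print(mid, late, end)
```

The interval narrows as the match progresses and collapses to the observed score at the end. With zero goals observed the estimated rate is zero and the interval collapses to (0, 0), which is one reason this only serves as a baseline.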

#

Lol first build a baseline.

dim phoenix Oct 15, 2024, 12:41 AM

#

fading flare Lol first build a baseline.

yeah sorry

fading flare Oct 15, 2024, 12:42 AM

#

One of the rules of ML is to always start from a baseline model.

#

Even if it is the absolute dumbest model.

#

Send me one set of goal times

dim phoenix Oct 15, 2024, 12:43 AM

#

ok

#

want a pastebin?

fading flare Oct 15, 2024, 12:44 AM

#

sure

dim phoenix Oct 15, 2024, 12:44 AM

#

to avoid spam

#

https://pastebin.com/KWwGtX4f

fading flare Oct 15, 2024, 12:53 AM

#

#

these are 95% intervals using MLE

#

📎 message.py

#

basically the range of expected number of home/away goals based on the information you have up to that point

#

like if 5 minutes in I told you the number

#

what would be your guess

#

for the final score

dim phoenix Oct 15, 2024, 12:56 AM

#

I see

#

understood.

#

Okay I want to understand the whole process behind this and then come back to it.

#

Like I'll read your code and once I get what's going on (in code, I do understand what you explained), I'll talk again

#

wow why didnt I join the ml world before 😭

#

Thanks a lot for your help!!!!!!!

#

is it intelligent to have a prediction for home and another for away?

#

or is it the same

#

my guess, its the same

#

since you are looking for events, and not who was the author of such event

fading flare Oct 15, 2024, 1:12 AM

#

I don't actually know what those two numbers are exactly.

#

I assumed it was for 2 teams

dim phoenix Oct 15, 2024, 1:22 AM

#

Yeah, so I sent you a list of dicts which gave the home score and the away score. You separately predicted for home and for away.

What if I just gave you the cumulative score? Will the prediction be the same?

fading flare Oct 15, 2024, 1:28 AM

#

I don't know the domain of sports at all (haha)

#

#

fading flare Oct 15, 2024, 1:30 AM

#

dim phoenix Yeah, so I sent you a list of dicts which gave the home score and the away score...

Is cumulative score = home - away?

dim phoenix Oct 15, 2024, 1:30 AM

#

cumulative is home + away

#

{'timestamp': 367, 'current_home_score': 3, 'current_away_score': 1}

cumulative is 3 + 1 = 4

#

the goal is to predict the number of goals at the end of the match. In total

fading flare Oct 15, 2024, 1:31 AM

#

Ah hmmmm

#

Well, it would make as much sense to model the sum as a poisson as it does for each individual component.

#

One rate of goals instead of two.

#

(partly generated by o1, but it looks okay to me)

📎 message.py

#

This is 2 chapters of the book I recommended btw.

#

Since it uses conjugate priors for MAP.
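
A rough sketch of the conjugate-prior idea mentioned above, not a copy of the attached message.py (whose contents are not shown in this thread). It assumes a Gamma(alpha, beta) prior on a single total goal rate, with beta as a rate parameter in seconds, and uses the standard result that a Poisson count with a Gamma-distributed rate is negative binomial. The prior values `ALPHA0` and `BETA0` are made up for illustration.

```python
from scipy.stats import nbinom


def gamma_poisson_interval(n_goals, t_elapsed, t_end,
                           alpha_prior, beta_prior, level=0.95):
    """Credible interval for the final total with a Gamma prior on the rate.

    beta is a rate parameter in seconds, so alpha / beta is goals per second.
    """
    alpha_post = alpha_prior + n_goals   # conjugate update: add observed goals
    beta_post = beta_prior + t_elapsed   # ... and observed playing time
    remaining = max(t_end - t_elapsed, 0)
    if remaining == 0:
        return n_goals, n_goals
    # Integrating the Poisson over the Gamma posterior gives a negative binomial.
    p = beta_post / (beta_post + remaining)
    tail = (1.0 - level) / 2.0
    low = n_goals + nbinom.ppf(tail, alpha_post, p)
    high = n_goals + nbinom.ppf(1.0 - tail, alpha_post, p)
    return int(low), int(high)


# Hypothetical prior: about 1 goal per minute in total, worth 10 minutes of data.
ALPHA0, BETA0 = 10.0, 600.0
early = gamma_poisson_interval(4, 367, 3600, ALPHA0, BETA0)
late = gamma_poisson_interval(50, 3000, 3600, ALPHA0, BETA0)
print(early, late)
```

Unlike the plain MLE version, the early interval is wide rather than centred near zero, because the prior keeps the rate estimate away from extreme values when little has been observed.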

dim phoenix Oct 15, 2024, 1:50 AM

#

alright, time to read then

dim phoenix Oct 15, 2024, 10:39 PM

#

bro understanding whats going on in the code is wild 😭

dim phoenix Oct 15, 2024, 10:55 PM

#

I've a few questions.

Is it intelligent to reset alpha and beta when the second half starts?
Do we want to use past data to (I guess) somewhat influence the value of lambda? This would be MAP, right?
Do the initial values for ALPHA and BETA influence the prediction as time passes? I'm interested in predicting accurately for the second half more specifically - or as the second half develops. If so, does it make sense to calculate ALPHA and BETA using past data?
When picking the X% credible intervals, are you expecting your predictions to be right X% of the time?
Will it help to make predictions for team 1 and team 2 separately? Maybe if one team is superior, we can expect more goals from them? Perhaps we can look at past data and be more precise in our predictions?

#

Wow this is a lot at once. Super interesting, SUPER.

fading flare Oct 16, 2024, 2:44 PM

#

dim phoenix I've a few questions. - Is it intelligent to reset alpha and beta when the secon...

Is it intelligent to reset alpha and beta when the second half starts?
I don't get the question, rephrase please.

#

Do we want to use past data to (I guess) somewhat influence the value of lambda?
If you mean past matches, then yes, that would be a great improvement. Good teams presumably score more in all matches they play.

this would be MAP, right?
Kind of. MAP just means that you are using some kind of prior to influence your predictions, not specifically the past. It could be your expert knowledge. Also, in the last example I posted I am actually using something even better - conjugate priors + Bayesian inference.

dim phoenix Oct 16, 2024, 2:48 PM

#

fading flare > Is it intelligent to reset alpha and beta when the second half starts? I don't g...

are beta and alpha the parameters that control the shape and rate of the gamma distribution?

If so, assuming that each half could have a different pace, is it intelligent to reset them when the second half starts?

fading flare Oct 16, 2024, 2:49 PM

#

are beta and alpha the parameters that control the shape and rate of the gamma distribution?
Yes

If so, assuming that each half could have a different pace, is it intelligent to reset them when the second half starts?
Well... that's complicated. Only if you think that information about the first half doesn't influence your predictions about the second half at all.

#

Do the initial values for ALPHA and BETA influence the prediction as time passes?
Yes, though a lot more at the beginning when you don't have any other information.
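
To illustrate the answer above, here is a small sketch assuming the Gamma-Poisson setup discussed in this thread, where the posterior mean of the rate is (alpha + goals) / (beta + elapsed time). The two priors and the "exactly one goal per minute" observations are invented for the example.

```python
def posterior_mean_rate(alpha_prior, beta_prior, n_goals, t_elapsed):
    """Posterior mean of the goal rate for a Gamma prior and Poisson counts."""
    return (alpha_prior + n_goals) / (beta_prior + t_elapsed)


# Two hypothetical priors (rates in goals per second).
slow_prior = (5.0, 600.0)    # mean 0.5 goals per minute
fast_prior = (15.0, 600.0)   # mean 1.5 goals per minute

gaps = []
for minute in (0, 5, 15, 30, 45):
    t = minute * 60
    n = minute  # pretend we observed exactly 1 goal per minute so far
    slow = posterior_mean_rate(*slow_prior, n, t)
    fast = posterior_mean_rate(*fast_prior, n, t)
    gaps.append(fast - slow)
    print(f"minute {minute:2d}: slow={slow * 60:.2f}/min fast={fast * 60:.2f}/min")
```

The gap between the two estimates shrinks as observed time accumulates, so the choice of initial ALPHA and BETA matters most early on and progressively less in the second half.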

dim phoenix Oct 16, 2024, 2:50 PM

#

fading flare > Do we want to use past data to (I guess) somewhat influence the value of lambd...

So would it be another improvement to calculate goals for each team specifically, and for each half separately?

fading flare Oct 16, 2024, 2:50 PM

#

dim phoenix - So would it be another improvement to calculate goals for each team specifical...

Sure. The issue though is that I would not discard all of your information from the first half but try to model the second half somehow in terms of it.

#

When picking the X% credible intervals, are you expecting your predictions to be right X% of the time?
Yes.

dim phoenix Oct 16, 2024, 2:51 PM

#

fading flare > Do we want to use past data to (I guess) somewhat influence the value of lambd...

Ok, will investigate more on conjugate priors + bayesian inference

fading flare Oct 16, 2024, 2:51 PM

#

Will it help to make predictions for team 1 and team 2 separately?
Yes.
Maybe if one team is superior, we can expect more goals from them?
Definitely.
Perhaps we can look at past data and be more precise in our predictions?
Yes.

dim phoenix Oct 16, 2024, 2:52 PM

#

ahaha cool

#

So apart from learning the theory behind all of this, which I'll learn from the book you shared, where can I learn how to implement my ideas?

#

Like, once I understand what I want, where can I learn how to implement it?

fading flare Oct 16, 2024, 2:59 PM

#

Maybe https://dataorigami.net/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/#contents would be helpful.

Bayesian Methods for Hackers

Bayesian Methods for Hackers : An intro to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view.

#

The issue is that there are like 10 different libraries you might need to use (not all at the same time)

#

So there is not a single place where you can learn all of them.

dim phoenix Oct 16, 2024, 3:00 PM

#

Gotchu

fading flare Oct 16, 2024, 3:00 PM

#

The search terms are "probabilistic programming", "bayesian modeling"

#

PyMC3, Pyro (Python library from Uber), PyStan (and Stan itself) are all good tools

dim phoenix Oct 16, 2024, 3:02 PM

#

I'm starting to understand that a plain Poisson distribution won't be enough. Usually, a small number of early goals implies a "smaller" prediction, when most of the time the number of events catches up later in the game. Makes sense?

fading flare Oct 16, 2024, 3:04 PM

#

Mmm not sure what you mean. Poisson can have different mu/lambda parameter depending on how far you are into the game.

dim phoenix Oct 16, 2024, 3:05 PM

#

I guess this is where variance comes in? That's the parameter that accounts for some randomness, right?

fading flare Oct 16, 2024, 3:09 PM

#

Again not sure what you mean. Basically, if you have a fixed rate of events, Poisson is the right distribution to use. But we don't actually know the rate of events, we can only estimate it. You can estimate it by dividing #goals / time but that's inaccurate in a couple ways - when time is low, this will be biased towards zero; you are not taking into account the inherent uncertainty (variance) that you have when estimating the rate.

#

When you take both of those issues into account, you have to come up with some sort of P(rate | data), and then calculate P(#remaining goals | rate)

#

Extra "variance" comes from the P(rate | data) term, the second term is technically still Poisson.
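
A simulation sketch of the two-step calculation described above: draw a rate from P(rate | data), then draw the remaining goals from a Poisson with that rate. It assumes the posterior for the rate is a Gamma distribution, and the parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior for the rate after 10 minutes: Gamma(alpha, rate=beta).
alpha_post, beta_post = 10.0, 600.0   # mean of 1 goal per minute, in goals per second
remaining = 3000.0                    # seconds left to play

# Step 1: draw plausible rates from P(rate | data).
rates = rng.gamma(shape=alpha_post, scale=1.0 / beta_post, size=200_000)
# Step 2: draw remaining goals from P(remaining goals | rate), which is Poisson.
goals = rng.poisson(rates * remaining)

plugin_mean = alpha_post / beta_post * remaining   # Poisson with one fixed rate
print("mean:", goals.mean(), "variance:", goals.var())
print("fixed-rate Poisson variance:", plugin_mean)
```

A Poisson with a fixed rate has variance equal to its mean, so the gap between the two printed variances is the extra spread that comes from not knowing the rate.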

dim phoenix Oct 16, 2024, 3:10 PM

#

dim phoenix This is a different example, clearly not on point, too simple to be good.

Assume the violet line describes a well programmed poisson distribution

Look at the blue and green line (im using the bookmakers line to take that as "another prediction")

This game started with a small number of events compared to the bookmakers prediction (green line).

dim phoenix Oct 16, 2024, 3:11 PM

#

fading flare Mmm not sure what you mean. Poisson can have different mu/lambda parameter depen...

So I would be wrong on the prediction.

Thats what I meant in the message above.

fading flare Oct 16, 2024, 3:13 PM

#

Bookmakers take into account previous games and even for new teams they probably model it with some prior.

#

You were using an estimator for the rate that was probably biased towards zero. Plus the width of your distribution never changes, so there might be more critical issues with the code.

fading flare Oct 16, 2024, 3:19 PM

#

fading flare

Note how here the center of the shaded regions doesn't start close to zero.

dim phoenix Oct 16, 2024, 3:19 PM

#

Right, that was wrong, but I wanted to show how the rate may be small at the beginning, so our prediction might be small too, when we should take into account that this might reverse. But now that I think of it, it sounds like MAP. Right?

fading flare Oct 16, 2024, 3:20 PM

#

Yes

#

Like if I told you that it took players 3 minutes until the first goal, would you immediately say "the rate is 1/3 goals per minute so we will see exactly 20 goals"

#

This is obviously wrong but there is more than one issue with this statement

#

"the rate is 1/3 goals per minute so we will see Poisson(20) goals" is still wrong

#

because the first part is false - you don't really know that the rate is 1/3, that's just the argmax_{rate} P(data | rate) (MLE) estimate for a particular model of the world that you have (fixed rate of goals over time)

#

rate could be much higher and you just weren't lucky enough

#

or lower and you just happened to see the first goal early
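
To put numbers on how uncertain that 1/3 is, here is a small sketch. It assumes a flat prior on the rate (an assumption made only for this illustration), in which case the posterior after one goal in 3 minutes is a Gamma distribution with shape 2 and rate parameter 3, in goals per minute.

```python
from scipy.stats import gamma

# One goal observed after 3 minutes. With a flat prior on the rate (an assumption),
# the posterior is Gamma(shape=1 + 1, rate=3), with the rate in goals per minute.
shape, rate_param = 2.0, 3.0
posterior = gamma(a=shape, scale=1.0 / rate_param)

mle_rate = 1.0 / 3.0
low, high = posterior.ppf([0.025, 0.975])
print(f"MLE rate: {mle_rate:.2f} goals/min -> {mle_rate * 60:.0f} goals in 60 min")
print(f"95% interval for the rate: {low:.2f} to {high:.2f} goals/min")
print(f"same interval as goals in 60 min: {low * 60:.0f} to {high * 60:.0f}")
```

The interval spans more than an order of magnitude, which is why a single early observation says very little about the final score.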

#

plus MLE is just plain wrong for this kind of downstream calculation - you don't maximize f(g(x)) by computing f(argmax g(x)). something similar, except more complicated is happening here as well

#

finally your model can be very wrong to begin with. maybe the very beginning of the game works completely different from the rest of the game

#

Anyway, the point is that the statement "the rate is 1/3 goals per minute so we will see 20 goals" is packed full of dozens of different assumptions and small logical errors. You can correct for most of them. Bookmakers probably do exactly that.

#

It is still useful - it is a crude approximation that gives you a ballpark estimate quickly without much thought. It will be order of magnitude correct.

#

And btw, even MAP is a crude approximation and has almost all of the same issues. It just takes into account the prior, calculating argmax_{rate} P(rate | data) = argmax_{rate} P(data | rate) P(rate), but the rest of the problems don't disappear.

#

The only real way to do everything correctly and in a scalable way is to use probabilistic programming.

#

Conjugate priors method works only on very small and simple problems. The nice thing about it is that it is analytically tractable.

#

Ohh, I remember another great course on the topic - https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus

Richard McElreath, Statistical Rethinking 2023 - 01 - The Golem of Prague (YouTube). Full course details at https://github.com/rmcelreath/stat_rethinking_2023

#

It has a book as well

#

https://github.com/rmcelreath/stat_rethinking_2024

#

https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf (but this is 1st edition I think)

#

https://github.com/Booleans/statistical-rethinking/blob/master/Statistical Rethinking 2nd Edition.pdf

#

I suggest you borrow it from your library or support the author by buying it :)

dim phoenix Oct 16, 2024, 6:52 PM

#

miyazaki_pain

dim phoenix Oct 16, 2024, 6:54 PM

#

fading flare The only real way to do everything correctly and in a scalable way is to use pro...

And what about that? Are you suggesting I try to solve this problem using probabilistic programming?

#

I want to start coding but I dont even know what I have to do tiredorange

fading flare Oct 16, 2024, 6:56 PM

#

dim phoenix And what about that? Are you suggesting me to try solve this problem using proba...

Start with the simplest solution.

#

Then iterate.

fading flare Oct 16, 2024, 6:57 PM

#

dim phoenix And what about that? Are you suggesting me to try solve this problem using proba...

Only once you understand what it means.

dim phoenix Oct 16, 2024, 7:01 PM

#

fading flare Start with the simplest solution.

so a simple poisson distribution

#

understand it, then apply MAP, and start from scratch

#

makes sense

#

okay thanks once more prayge really I super appreciate your help

#

super appreciate it

fading flare Oct 16, 2024, 7:02 PM

#

dim phoenix understand it, then apply MAP, and start from scratch

MLE => MAP => Conjugate Priors => Laplace Approximation => Probabilistic Programming & MCMC

dim phoenix Oct 16, 2024, 11:18 PM

#

fading flare MLE => MAP => Conjugate Priors => Laplace Approximation => Probabilistic Program...

Nice roadmap!

Thanks again!

dim phoenix Oct 17, 2024, 2:31 PM

#

def set_parameters(self, home_data: list[list[dict]], away_data: list[list[dict]]) -> None:

    # Process home data
    home_goal_rates = []
    for match in home_data:
        if len(match) == 0:
            continue
        total_goals = match[-1]['score']  # Final score in the match
        total_time = match[-1]['timestamp']  # Assuming match starts at time 0
        if total_time == 0:
            continue  # Avoid division by zero
        goal_rate = total_goals / total_time
        home_goal_rates.append(goal_rate)

    # Compute mean of home goal rates
    if not home_goal_rates:
        raise ValueError("No valid home goal rates available to set parameters.")

    mean_rate_home = np.mean(home_goal_rates)

    # Compute variance of home goal rates
    if len(home_goal_rates) > 1:
        variance_rate_home = np.var(home_goal_rates, ddof=1)  # Sample variance
    else:
        variance_rate_home = 1e-6  # Small positive value to avoid division by zero

    # Estimate prior parameters for home team
    self.beta_prior_home = mean_rate_home / variance_rate_home
    self.alpha_prior_home = mean_rate_home * self.beta_prior_home

    # Process away data
    away_goal_rates = []
    for match in away_data:
        if len(match) == 0:
            continue
        total_goals = match[-1]['score']
        total_time = match[-1]['timestamp']
        if total_time == 0:
            continue
        goal_rate = total_goals / total_time
        away_goal_rates.append(goal_rate)

    # Compute mean of away goal rates
    if not away_goal_rates:
        raise ValueError("No valid away goal rates available to set parameters.")

    mean_rate_away = np.mean(away_goal_rates)

    # Compute variance of away goal rates
    if len(away_goal_rates) > 1:
        variance_rate_away = np.var(away_goal_rates, ddof=1)
    else:
        variance_rate_away = 1e-6  # Small positive value to avoid division by zero

    # Estimate prior parameters for away team
    self.beta_prior_away = mean_rate_away / variance_rate_away
    self.alpha_prior_away = mean_rate_away * self.beta_prior_away

    print("Mean Rate Home:", mean_rate_home)
    print("Variance Rate Home:", variance_rate_home)
    print("Mean Rate Away:", mean_rate_away)
    print("Variance Rate Away:", variance_rate_away)

#

I'm trying to code a function to get the values for prior alpha and beta, but I'm getting extremely high alphas and betas, at least when I only have 1 match of prior data.

Mean Rate Home: 0.008611111111111111
Variance Rate Home: 1e-06
Mean Rate Away: 0.006944444444444444
Variance Rate Away: 1e-06


print(mle.alpha_prior_home) # 74.15123456790124
print(mle.alpha_prior_away) # 48.2253086419753
print(mle.beta_prior_home) # 8611.111111111111
print(mle.beta_prior_away) # 6944.444444444444

This doesnt seem right?

#

I read that for these cases I could reconsider the statistical model (use the Poisson Distribution directly) or implement minimum mean and variance values, but not sure what's theoretically right.

#

I do understand why the variance would be extremely low (it should actually be 0) when you only have 1 match of data, but it could even be super small with 2 matches of data, so perhaps I want to change the way I calculate these values?

dim phoenix Oct 17, 2024, 3:26 PM

#

Or should I incorporate prior knowledge in a bayesian framework (o1 recommendation)

fading flare Oct 17, 2024, 4:33 PM

#

dim phoenix Or should I incorporate prior knowledge in a bayesian framework (o1 recommendati...

Incorporating a prior is not a bad idea

fading flare Oct 17, 2024, 4:33 PM

#

dim phoenix I do understand why the variance would be extremely low (it should actually be 0...

And that

dim phoenix Oct 17, 2024, 5:36 PM

#

Been investigating about Bayes theorem, I think I somewhat understand it. It sounds like MAP, right?

#

It's using prior data, to come up with a hypothesis?

dim phoenix Oct 17, 2024, 7:15 PM

#

fading flare And that

So what can I do? I do want to have a prior, but when variance tends to 0, I get super high values for both beta and alpha.

fading flare Oct 17, 2024, 11:38 PM

#

dim phoenix Been investigating about Bayes theorem, I think I somewhat understand it. It sou...

Mmm related.

fading flare Oct 17, 2024, 11:39 PM

#

dim phoenix So what can I do? I do want to have a prior, but when variance tends to 0, I get...

You need to write out what you are actually doing in terms of math. Because right now you are kind of faking those alpha and beta parameters by computing the mean and the variance.

dim phoenix Oct 18, 2024, 1:58 PM

#

fading flare You need to write out what you are actually doing in terms of math. Because righ...

Aren't beta and alpha calculated based on mean and variance?

#

beta = mean / variance
alpha = mean^2 / variance

#

It's super tricky. If you give one match of prior, variance will always be so small that beta and alpha will be extremely high, but even with two or three games of prior you can find yourself in that situation.
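
One possible way around the exploding values, sketched here as an alternative to moment matching rather than as the method used anywhere in this thread: start from a weak Gamma prior and apply the conjugate update once per past match, adding goals to alpha and playing time to beta. The starting values `alpha0`, `beta0` and the `weight` discount are arbitrary illustrative choices, and the approach assumes the team's rate is the same across the past matches.

```python
def prior_from_matches(matches, alpha0=1.0, beta0=120.0, weight=1.0):
    """Build Gamma(alpha, beta) prior parameters from past matches.

    matches: iterable of (goals, duration_seconds) for one team.
    alpha0, beta0: a weak starting prior (arbitrary illustrative values).
    weight: 0..1 discount for how much each past match counts.
    """
    alpha, beta = alpha0, beta0
    for goals, duration in matches:
        if duration <= 0:
            continue                 # skip empty or broken records
        alpha += weight * goals      # goals add to the shape
        beta += weight * duration    # playing time adds to the rate parameter
    return alpha, beta


# One past match: 31 goals in 3600 seconds (about 0.0086 goals per second).
alpha, beta = prior_from_matches([(31, 3600)])
print(alpha, beta, alpha / beta)
```

With a single match this stays finite and no variance estimate is needed. A `weight` below 1 makes the prior looser, which may be sensible because one past match is unlikely to be worth a full hour of evidence about today's game.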

#

Yesterday, I was stuck all day trying to figure out how to determine the values for alpha and beta.

dim phoenix Oct 19, 2024, 6:46 PM

#

#

not using confidence interval yet, not sure if it's needed? idk...

#

Do you see something off? I can share code.

dim phoenix Oct 19, 2024, 7:46 PM

#

Is a confidence interval required?

Hmmmm

dim phoenix Oct 19, 2024, 9:18 PM

#

Another example of prediction with prior, bookmaker was way off.

#

What should I do next?

fading flare Oct 20, 2024, 4:32 AM

#

dim phoenix Another example of prediction with prior, bookmaker was way off.

Looks pretty good

fading flare Oct 20, 2024, 4:33 AM

#

dim phoenix What should I do next?

Idk, depends on your goal :)

dim phoenix Oct 20, 2024, 2:40 PM

#

fading flare Idk, depends on your goal :)

Well my goal is to beat the bookmaker I guess 😄

#

I was thinking how this would perform, and thought of coding the following strategy:

if timestamp is 0 and our prediction is X% away from the bookmaker's, place a bet
if there are no, or almost no, goals in a 2-3 minute window, the bookmaker's line has dropped, and our prediction is higher than the book's, bet on over
if there are lots of goals in a 2-3 minute window, the bookmaker's line has increased, and our prediction is lower than the book's, bet on under

makes sense? idk though...

#

😄

fading flare Oct 20, 2024, 3:00 PM

#

Well maybe you should come up with some numeric metrics for being "better than the bookmaker"

#

E.g. how far are you from the true answer vs bookmaker

#

Something like "median absolute error at minute 3 across all games"

#

Since you are dealing with a sophisticated opponent that can also learn, you should probably set up realistic backtesting. Do you have dates for all matches?

#

Sort all matches by date, then for a match at time t0 use only information from time t < t0 to train your model (e.g. any priors)
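
A minimal sketch of the walk-forward idea above combined with the median absolute error metric. The record keys (`date`, `goals_at_minute`, `final_goals`), the data, and the deliberately simple predictor (goals so far plus the historical average rate for the time left) are all hypothetical stand-ins for whatever model is being tested.

```python
from statistics import median


def walk_forward_errors(matches, minute=3, match_minutes=60):
    """Absolute errors at a given minute, using only earlier matches as the prior.

    matches: list of dicts with hypothetical keys 'date', 'goals_at_minute'
    (goals scored by `minute`) and 'final_goals'.
    """
    ordered = sorted(matches, key=lambda m: m["date"])
    errors = []
    for i, match in enumerate(ordered):
        history = ordered[:i]            # strictly earlier matches only
        if not history:
            continue                     # no prior available for the first match
        total = sum(m["final_goals"] for m in history)
        prior_rate = total / (len(history) * match_minutes)   # goals per minute
        prediction = match["goals_at_minute"] + prior_rate * (match_minutes - minute)
        errors.append(abs(prediction - match["final_goals"]))
    return errors


# Made-up data, for illustration only.
matches = [
    {"date": "2024-09-01", "goals_at_minute": 3, "final_goals": 60},
    {"date": "2024-09-08", "goals_at_minute": 2, "final_goals": 54},
    {"date": "2024-09-15", "goals_at_minute": 4, "final_goals": 63},
]
errors = walk_forward_errors(matches)
print("median absolute error at minute 3:", median(errors))
```

Running the same loop with the bookmaker's line in place of `prediction` would give a directly comparable number.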

#

You can also use "will I win money with this strategy" as your success metric. Compute the probability of a successful bet across all games

#

As for improvements to the model, maybe you could model some more subtle effects like teams scoring a bit less when away, teams scoring less against stronger opponents, differences in the number of goals as game progresses (either due to players getting tired or riled up), "non-linear" effects where a team is more likely to score after a short series of failures but less likely after a long. Effects of team composition. Etc

dim phoenix Oct 27, 2024, 11:32 PM

#

Hey, I'm back.

Been investigating more, talking with a guy who already works in sports prediction, and as time passes I'm learning more and more. Anyways, there are still lots of things to learn.

So, I was somewhat happy with my initial way of predicting the goals for a match, but it wasn't really a model: I calculated BETA & ALPHA at the beginning of the match, and then I just updated them as I observed data. I haven't backtested the code yet, so I can't say whether it'll work or not, but what I can say is that sometimes the line offered by the bookmaker was 10% (or more) away from my code's prediction. So this could be an opportunity to take advantage of.

This leaves me, again, thinking about the different scenarios I should hardcode: you code your strategy, which could use indicators like:

predictions of book and code being X% away at timestamp 0 (initial of match)
no goals or lots of goals in a time window of a to b minutes

So I believe it's not a bad idea to do this (hardcode strategies), but only because I haven't heard of other ways of using the predictions of my code.

I was also thinking about using prior data in more depth, not only to calculate BETA and ALPHA for my Gamma distribution. My friend mentioned that some leagues have slow first halves and faster second halves, so perhaps it's possible to model goal rates across different minutes of the match? Take, let's say, 10 home matches, model the goal distribution across the different minutes of the match, and when our prediction differs by X% from the bookmaker's prediction, take advantage of the opportunity.

#

I also had other questions like:

Do recent matches have more influence on the current match? How can I code weights 🤷‍♂️
How many matches should I use as prior data? Why?
Assuming I want to predict for each team separately: teams will score more or less goals depending on the quality of the opponent, how can I take this into account when predicting?

#

And there was something else I wanted to ask but I forgot lol

fading flare Oct 28, 2024, 5:08 AM

#

dim phoenix Hey, I'm back. Been investigating more, talking with a guy that already works w...

you might want to take a look into hierarchical distribution modeling

fading flare Oct 28, 2024, 5:09 AM

#

dim phoenix I also had other questions like: - Do recent matches have more influence on the ...

Do recent matches have more influence on the current match?
You could code that as some sort of Gaussian process

#

How many matches should I use as prior data? Why?
All of them. Why would you throw away information.

#

Assuming I want to predict for each team separately: teams will score more or less goals depending on the quality of the opponent, how can I take this into account when predicting?
Good question. Maybe take a look at TrueSkill paper and https://www.youtube.com/watch?v=veiLCvcLIg8 (he talks about modeling scores for teams a bit)

fading flare Oct 28, 2024, 5:14 AM

#

fading flare > Do recent matches have more influence on the current match? You could code tha...

To explain this in a bit more detail: you can assume that the skill of a team changes over time slowly. skill(t+1) = skill(t) + small noise

#

It is very unlikely that a team suddenly becomes much better than it was and unlikely that it becomes much worse.

#

And then goal_rate(T1, T2, t) = f(skill_T1(t), skill_T2(t)) where f is some function that monotonically increases with skill_T1 and monotonically decreases with skill_T2
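
A toy sketch of the two formulas above. The random-walk noise level, the exponential form chosen for f, and `base_rate` are all assumptions made for illustration; the thread does not fix any of them.

```python
import numpy as np


def simulate_skill(n_steps, start=0.0, drift_sd=0.05, seed=0):
    """Random walk: skill(t+1) = skill(t) + small Gaussian noise."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, drift_sd, size=n_steps - 1)
    return start + np.concatenate(([0.0], np.cumsum(steps)))


def goal_rate(skill_scoring, skill_opponent, base_rate=0.5):
    """One possible f: rises with the scoring team's skill, falls with the opponent's.

    base_rate is goals per minute between two evenly matched teams (made up).
    """
    return base_rate * np.exp(skill_scoring - skill_opponent)


team_a = simulate_skill(20, start=0.2, seed=1)
team_b = simulate_skill(20, start=-0.1, seed=2)
rates_a = goal_rate(team_a, team_b)   # team A's scoring rate in each of 20 matches
print(rates_a.round(3))
```

In a real model the skills would be inferred from match results rather than simulated, which is where tools such as Stan or Pyro come in.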

#

it's not straightforward to write something down immediately

#

you could model early game / midgame / endgame effects as some sort of slowly varying function as well

#

or as 3 coefficients

#

as a multiplier maybe, so you know it is close to 1 but maybe a bit more goals in the first 3rd of the game

#

this multiplier or slowly varying function can be shared across all games as well
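
A small sketch of the "3 coefficients" variant, assuming the match is split into equal-length phases and each phase scales a base rate by its own multiplier. The default multiplier values are invented; in practice they would be estimated from data and could be shared across all games.

```python
def expected_remaining_goals(rate, t_elapsed, t_end=3600.0,
                             multipliers=(1.1, 1.0, 0.9)):
    """Expected goals left when the base rate is scaled per phase of the match.

    rate: base goals per second; multipliers: one factor per equal-length phase
    (the default values are invented for illustration).
    """
    phase_len = t_end / len(multipliers)
    total = 0.0
    for i, factor in enumerate(multipliers):
        start, stop = i * phase_len, (i + 1) * phase_len
        overlap = max(0.0, stop - max(start, t_elapsed))  # seconds of this phase left
        total += rate * factor * overlap
    return total


print(expected_remaining_goals(1 / 60, t_elapsed=0))      # whole match
print(expected_remaining_goals(1 / 60, t_elapsed=1800))   # second half only
```

The result can be used as the Poisson mean for the remaining goals in place of rate times remaining time.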

#

just learn either Stan or Pyro

#

this stuff will become progressively more difficult to do manually, so you need the right tool here for complex modeling

#

you also need a hierarchical model ideally so that you can easily add new teams and make sane predictions for them

fading flare Oct 28, 2024, 7:51 AM

#

https://chatgpt.com/share/671f426f-6f58-800d-a1a5-cf9a134e0565

#

Not... the most amazing answers from chatgpt but some good search terms and some good ideas

#

https://chatgpt.com/share/671f45ee-e3a8-800d-81d8-0ae732971a5d

#

this reads a bit like a summary but it's good

dim phoenix Oct 29, 2024, 6:12 PM

#

fading flare you might want to take a look into hierarchical distribution modeling

will look into it! thanks for the recommendation

dim phoenix Oct 29, 2024, 6:14 PM

#

fading flare > How many matches should I use as prior data? Why? All of them. Why would you t...

Well, I'm thinking, as of now I use prior data to calculate the initial alpha and beta. If I use, let's say, 5 years of data, predictions might not be more accurate, since teams change a lot, and the most recent matches better describe the current "form" of each team, right?
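
One simple way to let recent matches count more while still using all the data is to down-weight old matches. The sketch below assumes alpha and beta are the shape and rate of a Gamma prior on a Poisson goal rate (the thread does not spell this out), and uses a hypothetical half-life in matches. It is a cruder stand-in for explicitly modeling how skill changes over time.

```python
def weighted_gamma_prior(goals_per_match, half_life=20, alpha0=1.0, beta0=1.0):
    """Build Gamma(alpha, beta) from past matches, ordered oldest -> newest.

    Each match contributes a weight that halves every `half_life` matches
    of age, so old seasons still count, just less.
    """
    n = len(goals_per_match)
    alpha, beta = alpha0, beta0
    for i, goals in enumerate(goals_per_match):
        age = n - 1 - i                # 0 for the most recent match
        weight = 0.5 ** (age / half_life)
        alpha += weight * goals        # weighted goal count
        beta += weight                 # weighted number of matches
    return alpha, beta

alpha, beta = weighted_gamma_prior([20] * 30 + [30] * 30)
print(alpha / beta)  # prior mean goals per match, pulled toward recent form
```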

#

Oh btw! Important note! Maybe you come up with an idea because of what I'll say.

Bookmakers have a prediction when the match starts, let's say 60 goals. Well, 60 goals in a 60-minute match is a rate of 1 goal per minute. So let's say after 10 minutes we have 20 goals (double the rate): the line offered by the book will now be current_goals + remaining_time * initial_expected_goal_rate, so for my example it would be 20 + 50 * 1 = 70
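
The line formula from this message as a small function. The name `live_line` and its parameters are made up here; the arithmetic is exactly the one described above.

```python
def live_line(current_goals, minutes_played, pregame_total, match_length=60):
    """current_goals + remaining_time * initial_expected_goal_rate."""
    initial_rate = pregame_total / match_length   # goals per minute before kickoff
    remaining_time = match_length - minutes_played
    return current_goals + remaining_time * initial_rate

print(live_line(current_goals=20, minutes_played=10, pregame_total=60))  # 70.0
```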

#

Just a comment, I didn't know about this hehe

#

Read the ChatGPT convos, it's so much stuff. I would try it all but I don't have time for all of it rn, I will in around a month! So in the meantime, where should I specialize?

I think I'll first backtest what I have now.

#

Please don't delete the ChatGPT convos prayge

fading flare Oct 29, 2024, 6:32 PM

#

dim phoenix Well, I'm thinking, as of now I use prior data to calculate the initial alpha an...

Yes, that's why you should be modeling "skill drift"

#

as some sort of gaussian process / random process

dim phoenix Oct 29, 2024, 6:34 PM

#

is this in the book you shared?

#

I think I'll leave this thread dormant till I'm done reading the book. To be honest, I haven't even started.

fading flare Oct 29, 2024, 6:34 PM

#

It has a chapter on Gaussian Processes

dim phoenix Oct 29, 2024, 6:34 PM

#

I'm trying to work on something I don't exactly understand.

fading flare Oct 29, 2024, 6:35 PM

#

it has soooo much good stuff tbh

dim phoenix Oct 29, 2024, 6:37 PM

#

what do you think? I should stop with the coding process a bit and focus more on understanding everything behind modelling and stuff, to decide what might be best for me later, right?

fading flare Oct 29, 2024, 6:38 PM

#

It depends whether you are willing to sacrifice time to learn things properly

#

I would start with the first chapter and see how much of it you understand

dim phoenix Oct 29, 2024, 6:39 PM

#

yes I'm willing to sacrifice my time

#

I "SOMEWHAT" always did

fading flare Oct 29, 2024, 6:39 PM

#

then I would make a detour and learn about automatic differentiation and implement it from scratch

dim phoenix Oct 29, 2024, 6:40 PM

#

like rn I might be able to read a bit per day for 2-3 weeks, and when december starts I'm done with my exams and I've 4 months of vacation

fading flare Oct 29, 2024, 6:40 PM

#

the reason being that it will be sooo helpful for solving real problems quickly

dim phoenix Oct 29, 2024, 6:40 PM

#

detour?

#

automatic differentiation like calculus? isn't there a lib that can differentiate functions for you?

fading flare Oct 29, 2024, 6:40 PM

#

yes

#

yes there is

#

but knowing how it works is incredibly useful

dim phoenix Oct 29, 2024, 6:41 PM

#

alr

#

yeah I'm willing to do everything tbh

#

I enjoy learning

#

so I'm not sacrificing much tbh shrugCat

fading flare Oct 29, 2024, 6:42 PM

#

maybe I'll join you in reading the book lol

dim phoenix Oct 29, 2024, 6:42 PM

#

haha nice

fading flare Oct 29, 2024, 6:42 PM

#

I want to compile a solutions manual

dim phoenix Oct 29, 2024, 6:42 PM

#

I wanna get it physically but it's kinda expensive

#

150usd in my country

fading flare Oct 29, 2024, 6:43 PM

#

dim phoenix Oct 29, 2024, 6:43 PM

#

double the us price

fading flare Oct 29, 2024, 6:43 PM

#

my solutions to the first chapter's problems

dim phoenix Oct 29, 2024, 6:43 PM

#

miyazaki_pain

fading flare Oct 29, 2024, 6:43 PM

#

lol

dim phoenix Oct 29, 2024, 6:44 PM

#

well

#

lots of things to learn

#

can start slowly at least

#

and in vacations, 24/7 sunglas

fading flare Oct 29, 2024, 6:50 PM

#

the reason to learn AD early is that it's so amazingly useful. not just for deep learning, but for everything. same thing with some other algorithms like bayesian optimization or markov chain monte carlo but to a slightly lesser extent

#

and an implementation can be written in any language in < 100 LOC
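
To give a feel for how small such an implementation can be, here is a sketch of forward-mode AD with dual numbers. Forward mode is only one flavour of AD (deep learning libraries mostly use reverse mode), and the `Dual` class, `derivative` helper and `rate` example are all illustrative names, not from any library.

```python
import math

class Dual:
    """A value together with its derivative with respect to one input."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        # Plain numbers are constants: their derivative is 0.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def exp(x):
    """exp with the chain rule applied to the derivative part."""
    x = Dual.lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.der)

def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).der

def rate(skill):
    # Toy goal-rate function: 0.5 * exp(skill - 0.3)
    return exp(skill - 0.3) * 0.5

print(derivative(rate, 0.3))  # 0.5
```

Each operator just applies one differentiation rule, which is why adding more functions (log, division, powers) stays mechanical.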

#

it's a detour in a sense that it has no direct relevance to what you are doing right now haha

dim phoenix Oct 29, 2024, 6:57 PM

#

like AD is literally coding functions that can differentiate a function?

#

I'm saying literally as if it was easy lol

fading flare Oct 29, 2024, 6:58 PM

#

yes

dim phoenix Oct 29, 2024, 6:59 PM

#

is there a marketplace for knowledge?

fading flare Oct 29, 2024, 6:59 PM

#

There are a lot of things you can do, like differentiate with respect to all of a function's arguments or only some of them, find the second derivative with respect to an argument, etc. If the arguments are changing with respect to some other parameter t, you can calculate the derivative of the result with respect to t. You can even find symbolic representations for all of these.

dim phoenix Oct 29, 2024, 7:00 PM

#

maybe like an injection or a pill, you know

fading flare Oct 29, 2024, 7:00 PM

#

I wish

dim phoenix Oct 29, 2024, 9:30 PM

#

Can I ask for a favor?

Could you export both chats you had with o1 and send them here? I wanna keep em

https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history-and-data

dim phoenix Nov 1, 2024, 11:31 AM

#

While working on a simple backtesting module (to read different metrics and see how bad my work is), I sometimes plot matches that other guys with models send.

So there's this guy who I know is good.

#

And for this match, after minute 40, when the book over/under offer was at 71.5/72.5, he said there was value in the under.

Right here.

As you can see, first off my model would initially think there was value on the over, since my prediction is like 5 points higher than the book's prediction. But leaving that aside for a moment, what do you think this guy's model saw around that minute?

The only thing I can imagine is maybe lots of goals, more than usual, but that wouldn't tell you there is value on the under, since there is randomness in sports and you could have matches with many more goals than you expected.

I don't know. What do you think?

fading flare Nov 1, 2024, 9:08 PM

#

Maybe he is just sampling

#

It is odd how many predictions he makes in sudden bursts

#

Maybe he has some information about the game you don't?

dim phoenix Nov 1, 2024, 9:11 PM

#

fading flare Maybe he is just sampling

what do you mean?

dim phoenix Nov 1, 2024, 9:11 PM

#

fading flare Maybe he has some information about the game you don't?

his model is quantitative, not fundamental

#

my code is 💩 , I'm not implementing models as such, I'm not training a model and cross validating, etc, I'm simply coming up with a beta and alpha for each team and making predictions

fading flare Nov 1, 2024, 9:25 PM

#

dim phoenix my code is 💩 , I'm not implementing models as such, I'm not training a model an...

works surprisingly well considering

fading flare Nov 2, 2024, 12:43 AM

#

My point is that there is very little information you can possibly incorporate into the prediction. The dynamics of the game can't actually affect the average rate too much I think

#

Or there is little information in each individual goal

dim phoenix Nov 2, 2024, 2:25 PM

#

fading flare My point is that there is very little information you can possibly incorporate i...

so I should perhaps think about incorporating other models? I'll keep my code the way it is for the moment, unless a simple-to-implement modification can increase the accuracy of my "model"

I don't want to code much more before reading Bishop's book

dim phoenix Nov 5, 2024, 10:26 AM

#


===== Backtest Results =====
Total Bets: 164
Total Wins: 70
Total Losses: 94
Average Odds: 1.84
Average Stake: 1.00
Total Profit: 59.05
Total Loss: 94.00
Yield (Profitability): 0.36
Correct Calibration: 8 / 164
Expected Value (EV): -34.95
Sharpe Ratio: -0.2328
Maximum Drawdown: 38.15
Total Analyzed Matches: 10,392
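
A quick sanity check of how these numbers relate, using only the figures printed above and the flat 1.00 stake. Note that the reported Yield of 0.36 looks like gross profit divided by the number of bets (59.05 / 164 ≈ 0.36) rather than net return on stake. The break-even rate from average odds is only an approximation, since individual bets have different odds.

```python
total_bets, wins = 164, 70
avg_odds = 1.84
gross_profit, gross_loss = 59.05, 94.00   # "Total Profit" and "Total Loss" above

net = gross_profit - gross_loss           # matches the reported EV line
roi = net / total_bets                    # net return per unit staked
win_rate = wins / total_bets
break_even = 1 / avg_odds                 # win rate needed at these odds

print(f"net={net:.2f} roi={roi:.1%} win_rate={win_rate:.1%} break_even={break_even:.1%}")
# net=-34.95 roi=-21.3% win_rate=42.7% break_even=54.3%
```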

#

🫤

#

60% C.I
Betting when the book's line is outside my C.I range
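
A sketch of this kind of rule, assuming the total goals are treated as Poisson with the model's expected total as the mean and a central 60% interval. The actual interval in the backtest may be built differently; `bet_signal` and the helpers are hypothetical names.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def poisson_quantile(p, lam):
    """Smallest k with P(X <= k) >= p."""
    k = 0
    while poisson_cdf(k, lam) < p:
        k += 1
    return k

def bet_signal(expected_total, book_line, level=0.60):
    """Return 'over', 'under' or None depending on where the line falls."""
    tail = (1 - level) / 2
    low = poisson_quantile(tail, expected_total)
    high = poisson_quantile(1 - tail, expected_total)
    if book_line < low:
        return "over"    # book's line is below the whole interval
    if book_line > high:
        return "under"   # book's line is above the whole interval
    return None          # line is inside the interval: no bet

print(bet_signal(expected_total=60, book_line=45.5))
```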

#

Horrible results...

dim phoenix Nov 5, 2024, 12:33 PM

#

What can I do?

dim phoenix Dec 10, 2024, 11:21 AM

#

Finally vacation, starting to read Bishop Pattern Recognition

fading flare Dec 10, 2024, 11:49 AM

#

Let me know if you have any questions

#

Also if you want to help make a solutions manual hahaha

dim phoenix Dec 10, 2024, 12:04 PM

#

Cool!!!!!

#

#

How do you recommend studying this? Personally, for uni, when I have to study math, I read the textbook and make a summary while reading it.

#

But maybe you suggest a better approach for this?

fading flare Dec 10, 2024, 12:15 PM

#

Sure, do that

#

Solve exercises, make notes

#

It's the best way

dim phoenix Dec 10, 2024, 12:18 PM

#

fading flare Dec 10, 2024, 1:02 PM

#

I can upload my solutions to exercises from the first chapter

#

but the exercises are all good, so I highly recommend doing them

dim phoenix Dec 10, 2024, 1:08 PM

#

Yeah I'll try to solve them! But I'm on page 23, and exercises start on page 58 😄

#

wow I'm so excited

#

kekw

dim phoenix Dec 11, 2024, 10:52 PM

#

how important is it to note these functions?

#

Like, I do understand the goal of that function, but is it necessary to note it?

#

I'm writing a summary, that's why I ask this

fading flare Dec 12, 2024, 2:47 AM

#

Capture the main idea

#

What were you going to do, copy the whole plot?

dim phoenix Dec 12, 2024, 12:50 PM

#

fading flare What were you going to do, copy the whole plot?

nono

#

Like I'm writing down the things I consider the most important

#

but it's taking me a while to read/understand

#

I'm more "studying" than just reading

#

but that's ok, it's better

#

😬

dim phoenix Dec 13, 2024, 10:33 PM

#

omg this is a neverending book lmao

#

So I've a question, I'm willing to read and study it all, it'll take me around 2 months.

Now the question is, is it necessary? should I pick specific chapters of the book for my goal?

I want an honest answer.

#

On one hand I super want to know what can help me build a working model or whatever it is. So I want to go somewhat straight to the point (will save me maybe 1 month?)

If I read the whole book, it will probably take me an extra month, but I'll know almost all of it.

#

So for example, I read polynomial curve fitting and gaussian distribution, it's an introduction so definitely important.

Maybe I can come up with a function that describes the goal rate? Maybe for gaussian I can also do something related to goal rates?

But I don't know. Ummm, what do you recommend?

#

Sorry, I'm just eager to start working on something, but my problem is I have nothing to work with yet since I don't know my options 🤣

fading flare Dec 14, 2024, 4:19 AM

#

You could try to describe goal rate with a gaussian, sure (apart from goal rate always being > 0). The question is what are the inputs then.

#

Maybe something closer to what you might want is Statistical Rethinking. It is more practical and focuses more on bayesian methods.

dim phoenix Dec 14, 2024, 12:45 PM

#

?

#

But I should continue to read the book I assume (?)

dim phoenix Dec 15, 2024, 1:42 AM

#

⁉️

fading flare Dec 15, 2024, 8:11 AM

#

Read/watch both. I don't have a good answer for the question of "what is the minimal thing I should learn to get to the results quicker" since it is really hard to tell what you need and there are limits to my knowledge as well.

#

If you want to understand bayesian methods and most of ML, those books are probably good. But I can't tell you "read this chapter but not this one".

dim phoenix Dec 15, 2024, 1:04 PM

#

haha yeah you are right

#

I'll continue reading

#

thanks again

dim phoenix Dec 29, 2024, 8:37 PM

#

I coded a train and retrain module for my model

#

basically I retrain every time I have new data, but it takes a lot of time to do so

#

On the other hand, this is a trace_plot for 2 months of training, only for one league, what can I interpret from this?

fading flare Dec 30, 2024, 3:30 AM

#

depends on the meaning of the parameters

#

I am a bit concerned about this ^

#

but everything else looks great

dim phoenix Dec 30, 2024, 2:12 PM

#

fading flare I am a bit concerned about this ^

why are you concerned about that?

fading flare Dec 30, 2024, 4:40 PM

#

not sure there are enough samples

#

it seems like this parameter is bimodal possibly

dim phoenix Dec 31, 2024, 9:12 PM

#

does this look better?

#

btw I changed my code and now training doesn't take long!!!

fading flare Jan 1, 2025, 5:36 PM

#

yes, much much better

#

you can tell that the model actually converges correctly

fading flare Jan 1, 2025, 5:55 PM

#

But I also suggest looking into documentation for Rhat and checking that it looks okay
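
For intuition, this is roughly what Rhat measures: how much the chains disagree with each other compared to how much each chain varies internally. The sketch below is the classic (non-split) Gelman-Rubin formula on made-up chains; the number your library reports may come from a more robust variant, so treat this only as an illustration.

```python
import random
import statistics

def rhat(chains):
    """Classic Gelman-Rubin statistic for equally long chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5

rng = random.Random(1)
# Four chains sampling the same distribution: Rhat should be close to 1.
good = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Two chains stuck around 0 and two around 3.5: Rhat is far above 1.
bad = [[rng.gauss(mu, 1.0) for _ in range(500)] for mu in (0.0, 0.0, 3.5, 3.5)]

print(rhat(good), rhat(bad))
```

Values close to 1 mean the chains agree; values well above 1 suggest they settled in different places.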

dim phoenix Jan 7, 2025, 9:48 PM

#

what if my plots are horrible, my rhat is too high, but I beat the book? think_fade

dim phoenix Jan 8, 2025, 2:01 AM

#

wtf is this 🤣

fading flare Jan 9, 2025, 3:32 PM

#

Looks like your model is a bit unconstrained so there are two possible solutions

#

Probably

fading flare Jan 9, 2025, 3:44 PM

#

dim phoenix what if my plots are horrible, my rhat is too high, but I beat the book? <:think...

If you consistently beat them when backtesting, who cares about rhat cohle_smoke harold

#

Lol but you should probably address it nevertheless

#

The problem is most likely due to multiple modes that different chains converge to

#

You perhaps described your problem in terms of the difference of two variables, x-y, so two solutions are possible: x=3.5, y=0 and x=0, y=-3.5
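
A small illustration of this point, assuming a Poisson likelihood whose log rate is x - y (the actual model in this thread is not shown, so this is only a stand-in). Both solutions give exactly the same likelihood, so the data cannot choose between them. A sum-to-zero constraint is one common way to pin the shift down; a zero-centered prior on the skills is another.

```python
import math

def poisson_loglik(goals, x, y):
    """Log-likelihood of observed goals when the log rate is x - y."""
    log_rate = x - y
    return sum(g * log_rate - math.exp(log_rate) - math.lgamma(g + 1)
               for g in goals)

goals = [31, 35, 33]                      # made-up observations
a = poisson_loglik(goals, 3.5, 0.0)
b = poisson_loglik(goals, 0.0, -3.5)
print(a == b)                             # True: only x - y matters

def center(skills):
    """Sum-to-zero constraint: subtract the mean skill from every team."""
    mean = sum(skills.values()) / len(skills)
    return {team: s - mean for team, s in skills.items()}

# Both solutions collapse to the same centered values.
print(center({"A": 3.5, "B": 0.0}))
print(center({"A": 0.0, "B": -3.5}))
```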

dim phoenix Jan 9, 2025, 7:48 PM

#

gotchu
