#What's the most accurate way to calculate Hunter win rates

167 messages · Page 1 of 1 (latest)

placid jasper
#

I've been analyzing hunter statistics in Supervive and encountered an interesting dilemma in calculating win rates. In this game, multiple teams in the same match can choose identical hunters, which creates a unique challenge for statistical analysis. Should we calculate win rates based on total picks or total matches? Let's explore both approaches.

The Two Approaches

Method 1: Ranking by Wins/Picks

This method calculates win rate by dividing the number of wins by the total number of times a hunter was picked.

Key characteristics:

  • Focuses on individual performance regardless of pick rate
  • Shows more balanced win rates (10-13% range)
  • Top performers: Oath (12.7%), Zeph (11.8%), Brall (11.6%)

Method 2: Ranking by Wins/Matches

This method calculates win rate by dividing the number of wins by the total number of matches played, rather than total picks.

Key characteristics:

  • Win rate = (Number of Wins / Total Matches) × 100
  • Different from Method 1's Wins/Picks calculation
  • Top performers with win rates by matches: Hudson (11.6% in 6,046 matches), Kingpin (10.8% in 5,935 matches), Shrike (10.6% in 6,041 matches)
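A minimal sketch of the two denominators, assuming a hypothetical record format in which each match is a list of (hunter, placement) pairs, one pair per team that picked the hunter. The function name and the toy data are illustrative, not the site's actual code.

```python
from collections import defaultdict


def win_rates(matches):
    """Return {hunter: (wins_per_pick, wins_per_match)}.

    `matches` is a hypothetical format: one list per match, holding a
    (hunter, placement) pair for every team that picked that hunter.
    """
    picks = defaultdict(int)      # Method 1 denominator
    appeared = defaultdict(int)   # Method 2 denominator
    wins = defaultdict(int)
    for match in matches:
        for hunter, placement in match:
            picks[hunter] += 1
            if placement == 1:
                wins[hunter] += 1
        # A match counts once per hunter, however many teams picked it.
        for hunter in {h for h, _ in match}:
            appeared[hunter] += 1
    return {h: (wins[h] / picks[h], wins[h] / appeared[h]) for h in picks}


toy_matches = [
    [("Joule", 1), ("Joule", 4), ("Hudson", 2)],
    [("Joule", 3), ("Hudson", 1)],
]
rates = win_rates(toy_matches)
print(rates["Joule"])   # wins/picks = 1/3, wins/matches = 1/2
```

In the toy data Joule is picked three times across two matches and wins once, so the two methods already disagree on her win rate.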

Real Data Comparison (Recent Analysis)

Method 1

+---------+-------+-------+-------+--------+---------+
| Hunter  | Top 1 | Top 4 | Plcmt | Picks  | Matches |
+---------+-------+-------+-------+--------+---------+
| Oath    | 12.7% | 44.3% | 5.2   | 8,612  | 4,855   |
| Zeph    | 11.9% | 44.0% | 5.2   | 11,828 | 5,472   |
| Hudson  | 11.8% | 42.1% | 5.3   | 20,204 | 6,155   |
| Brall   | 11.7% | 42.5% | 5.3   | 11,154 | 5,295   |
| Celeste | 10.8% | 43.9% | 5.3   | 10,973 | 5,334   |
| Kingpin | 10.7% | 41.7% | 5.4   | 19,107 | 6,071   |
| Bishop  | 10.7% | 44.4% | 5.2   | 7,293  | 4,375   |
| Void    | 10.7% | 42.7% | 5.3   | 7,680  | 4,521   |
| Ghost   | 10.6% | 41.7% | 5.4   | 11,093 | 5,402   |
| Elluna  | 10.6% | 43.4% | 5.3   | 17,098 | 6,092   |
+---------+-------+-------+-------+--------+---------+

Method 2

+---------+-------+--------+-------+--------+---------+
| Hunter  | Top 1 | Top 4  | Plcmt | Picks  | Matches |
+---------+-------+--------+-------+--------+---------+
| Hudson  | 38.9% | 138.1% | 5.3   | 20,204 | 6,155   |
| Kingpin | 33.7% | 131.2% | 5.4   | 19,107 | 6,071   |
| Shrike  | 29.7% | 120.0% | 5.4   | 17,339 | 6,098   |
| Elluna  | 29.6% | 121.8% | 5.3   | 17,098 | 6,092   |
| Joule   | 26.6% | 107.5% | 5.3   | 14,847 | 5,869   |
| Zeph    | 25.6% | 95.0%  | 5.2   | 11,828 | 5,472   |
| Felix   | 25.4% | 101.6% | 5.4   | 14,313 | 5,901   |
| Brall   | 24.6% | 89.4%  | 5.3   | 11,154 | 5,295   |
| Oath    | 22.5% | 78.6%  | 5.2   | 8,612  | 4,855   |
| Celeste | 22.2% | 90.2%  | 5.3   | 10,973 | 5,334   |
+---------+-------+--------+-------+--------+---------+
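The two tables are linked by the picks-to-matches ratio: a per-match rate is the per-pick rate multiplied by the average number of copies per match. A quick check with the Hudson and Oath rows follows; small gaps against the Method 2 table are expected because the inputs are already rounded.

```python
def per_match_rate(rate_per_pick, picks, matches):
    """Convert a wins/picks percentage into a wins/matches percentage."""
    return rate_per_pick * picks / matches


# (Top 1 %, Top 4 %, picks, matches) copied from the Method 1 table.
method1_rows = {
    "Hudson": (11.8, 42.1, 20204, 6155),
    "Oath": (12.7, 44.3, 8612, 4855),
}

converted = {}
for hunter, (top1, top4, picks, matches) in method1_rows.items():
    converted[hunter] = (
        per_match_rate(top1, picks, matches),
        per_match_rate(top4, picks, matches),
    )
    print(hunter, round(picks / matches, 2), converted[hunter])
```

Hudson averages about 3.3 copies per match, which is why his per-match Top 4 figure can exceed 100%.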

Discussion Questions

  1. Which metric better reflects a hunter's true strength?
  2. Should popularity influence balance decisions?
  3. How should we interpret win rates for less-picked hunters?
  4. What's more important for balance: individual success rate or overall game impact?

Note: All data is from ranked squad matches

*Source: https://supervive.app/hunters

bronze berry
#

I had a chat with bchang in the past about this

#

There's a bunch of ways you can look at data depending on your sample size

#

What you need to see is average hunter placement per number of hunters in the lobby

#

The greater the number of duplicates in the lobby the more their placement is weighted towards average placement

#

E.g. with every team having a joule every game, a joule will win every single game, but this obviously doesn't indicate that joule is op

#

And if every team has a joule, her average placement is 5.5, but again this doesn't indicate she's average
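The degenerate lobby described above is easy to check by hand, assuming 10 teams per lobby: if every team picks Joule, her placements are exactly 1 through 10.

```python
placements = list(range(1, 11))  # 10 teams, all on the same hunter

wins = placements.count(1)
wins_per_pick = wins / len(placements)   # 1 win over 10 picks
wins_per_match = wins / 1                # 1 win over 1 match
average_placement = sum(placements) / len(placements)

print(wins_per_pick, wins_per_match, average_placement)  # 0.1 1.0 5.5
```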

placid jasper
#

@bronze berry Yes, Method 1 is the data that accounts for the duplicates you're describing.

bronze berry
#

The ideal solution is we compare like-to-like always - e.g. we compare Hudson average placement for 1 Hudson in the lobby Vs joule average placement for 1 joule in the lobby, and then for 2 in the lobby, etc. - e.g. plot a graph of average placement against duplicates and compare graphs between hunters

#

The issue with this method is data gets less useful the smaller your sample size

#

If I only have 100 games of exactly 6 Hudsons in a lobby for a given MMR band, the average placement stat is likely too variable to draw a conclusion from it
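To put a rough number on "too variable": the standard error of an average shrinks with the square root of the number of games. The per-game standard deviation of 2.0 placements below is a made-up illustrative value, not measured data.

```python
import math


def standard_error(per_game_std, games):
    """Standard error of an average placement over `games` independent games."""
    return per_game_std / math.sqrt(games)


ASSUMED_STD = 2.0  # hypothetical spread of placement per game

for games in (100, 1000, 10000):
    print(games, round(standard_error(ASSUMED_STD, games), 3))
```

With these assumed numbers, 100 games leaves an uncertainty of about ±0.2 placements, which is as large as the gaps between hunters in the aggregate tables (roughly 5.2 to 5.4).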

placid jasper
#

Well, the sample includes every game played by the top 100 users

bronze berry
placid jasper
#

You mean you want to see the total picks to calculate manually?

bronze berry
#

I want to see average placement per # duplicates

#

So each hunter has 10 average placement stats, one for each duplicates count

#

The issue with the data you provided is it's all combined

#

Which makes it much less useful

#

Like

#

What's the difference in 40% pickrate

#

Between a hunter getting picked 10 times in 40% of lobbies

#

Vs 100% of lobbies have 4 picks

#

You can't tell the difference

#

In the former, joule comes first place in 40% of all games, and there's no data for the other 60%, so she either has a 40% winrate or a 100% winrate depending on how that gets calculated, but she'll always have a 5.5 average placement

#

If she's picked 4 times every lobby, she's going to get pulled towards 5.5 average placement but not as heavily
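The ambiguity can be made concrete with two made-up sets of 100 lobbies that produce the same aggregate pick count. Only the per-lobby duplicate counts tell them apart.

```python
from collections import Counter

# Copies of Joule in each of 100 hypothetical lobbies.
scenario_a = [10] * 40 + [0] * 60   # all 10 teams pick her, in 40% of lobbies
scenario_b = [4] * 100              # 4 teams pick her in every lobby

for name, lobbies in (("A", scenario_a), ("B", scenario_b)):
    total_picks = sum(lobbies)
    pick_rate = total_picks / (10 * len(lobbies))  # share of all team slots
    print(name, total_picks, pick_rate, dict(Counter(lobbies)))

# In scenario A Joule wins every lobby she appears in, by construction.
wins_a = sum(1 for copies in scenario_a if copies == 10)
games_present_a = sum(1 for copies in scenario_a if copies > 0)
win_rate_all_games = wins_a / len(scenario_a)        # 0.4
win_rate_games_present = wins_a / games_present_a    # 1.0
```

Both scenarios report 400 picks and a 40% pick rate, yet scenario A carries no balance information at all.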

#

Anyway ultimately an issue is we do likely need to combine the # duplicates stats in the end, because the sample size will get pretty low per individual stat

#

So what you likely need to do is try to un-weight the duplicates problem by assigning significance to the data

#

For example you have 0 significance in average winrate when there are 10 picks because it gives no info

#

And there's very high significance when there's only 1 in a lobby

#

This might have gone a little too into detail, I'm sorry

#

I guess the overall point I'd make is the given stats we see on websites currently are actually pretty unhelpful for seeing hunter balance

#

The best you can do with this data is look at top 1 & top 4 rate per hunters/games probably

#

It's not gonna end up being too helpful though

#

You just can't unpick the biases in the data like this

#

Hope that helps @placid jasper !

placid jasper
#

Thank you, I'm trying to get avg placement now

#

Ok, I've just updated the data. I added Placement and Picks

bronze berry
#

Ok so you now have total average placement and total average picks

#

That's slightly better data but notice that basically every hunter pulls to 5.5? This is because worse hunters have average performance with low duplicates per match, and better hunters get more duplicates per match, pulling the average to 5.5 by there just being more teams with the hunter

#

You really need to look at average placement per number of picks per game

#

So you need 10 columns - average placement for games with 1 pick, average placement for games with 2 picks, etc.

#

For clarity I would just plot a graph showing all hunters average placement per # duplicates
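A minimal sketch of that breakdown, assuming a hypothetical format where each match is a list of (hunter, placement) pairs, one per team. It yields one average per (hunter, duplicate count) bucket plus the number of games behind it, so thin buckets stay visible.

```python
from collections import Counter, defaultdict


def placement_by_duplicates(matches):
    """Return {(hunter, copies): (average_placement, games)}."""
    total = defaultdict(float)   # summed placements per bucket
    picks = defaultdict(int)     # number of placements per bucket
    games = defaultdict(int)     # number of matches per bucket
    for match in matches:
        copies = Counter(hunter for hunter, _ in match)
        for hunter, placement in match:
            key = (hunter, copies[hunter])
            total[key] += placement
            picks[key] += 1
        for hunter, n in copies.items():
            games[(hunter, n)] += 1
    return {key: (total[key] / picks[key], games[key]) for key in total}


toy_matches = [
    [("Joule", 2), ("Hudson", 1), ("Hudson", 7)],
    [("Joule", 4), ("Hudson", 3)],
    [("Joule", 1), ("Joule", 9)],
]
table = placement_by_duplicates(toy_matches)
for (hunter, copies), (avg, n_games) in sorted(table.items()):
    print(f"{hunter:<7}| {copies:^10} | {avg:^13.1f} | {n_games:>5}")
```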

placid jasper
#

Do you mean like this:

Hunter | Picks/Game | Avg Placement | Games
-------------------------------------------
Joule  |     1      |      3.2      |   100
Joule  |     2      |      4.1      |   150
Joule  |     3      |      4.8      |   200
Joule  |     4      |      5.2      |   180
...
Hudson |     1      |      2.8      |    80
Hudson |     2      |      3.9      |   120
Hudson |     3      |      4.5      |   160
...
bronze berry
#

But as mentioned you'll likely start to run into sample size issues

#

Yes

#

That's perfect

#

As you see though you get to around 100 games sample size per data point here

#

It's just about enough to work with, but you'll need to be careful not to narrow the MMR bands too much when you look at winrate data

#

Going too low sample size per data point is going to make the data pretty poor quality

placid jasper
#

Yes, if you also split the data by picks/game in the table, the sample needs to be about 10x larger. Otherwise the game count for each picks/game bucket would be too small

bronze berry
#

Mhm

#

This data looks like it's workable though

#

If you can, idk what you're using to query the data, can you plot the data on one graph so we can see?

placid jasper
#

It's simple.
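Method 1

$stat->top1 / $stat->total_picks,            // Top 1 rate: wins per pick
$stat->total_placement / $stat->total_picks, // average placement per pick
$stat->total_picks,
$stat->total_matches,

Method 2

$stat->top1 / $stat->total_matches,          // Top 1 rate: wins per match the hunter appeared in
$stat->total_placement / $stat->total_picks, // average placement per pick (same as Method 1)
$stat->total_picks,
$stat->total_matches,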
bronze berry
#

Sorry I'm on my phone

#

You should be able to get a graph kinda like this

placid jasper
#

Interesting

bronze berry
#

You can honestly probably cut off lobbies with 8+ picks because it'll be so statistically insignificant

#

Since it'll always end up so close to 5.5

placid jasper
#

Still need 10x sample for that

bronze berry
#

Yea

placid jasper
#

Maybe we can find another way to find the right statistics without dividing by Picks/Game too

bronze berry
#

Well what you need to do if you want to combine data is to find a relevance formula

placid jasper
#

If we add a # picks dimension to the plot, the data will be spread too thin

bronze berry
#

You can combine the data from games with 1 pick Vs games with 2 picks

#

But you need to understand that games with 2 picks are going to pull more towards 5.5 than games with 1 pick

#

So you need to multiply the variance from 5.5 if that makes sense
Or just count it less in the weighting

#

This is the distribution of average placement per # of duplicates

#

Assuming every hunter had a truly random winrate

#

The greater the picks per lobby the lower the variance between hunters
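That tightening can be checked exactly under the stated assumption that placements are random: enumerate every way n teams can occupy the 10 placements and measure the spread of their average. The closed form in the comment is the standard finite-population result for sampling without replacement.

```python
from itertools import combinations
from statistics import pstdev

TEAMS = 10  # teams per squad lobby


def null_spread(copies):
    """Std dev of the average placement of `copies` randomly placed teams."""
    averages = [
        sum(combo) / copies
        for combo in combinations(range(1, TEAMS + 1), copies)
    ]
    return pstdev(averages)


for n in range(1, TEAMS + 1):
    # Closed form: sqrt((TEAMS**2 - 1) / 12 / n * (TEAMS - n) / (TEAMS - 1))
    print(n, round(null_spread(n), 3))
```

The spread falls from about 2.87 placements with one copy to exactly 0 with ten, where the average is forced to 5.5.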

placid jasper
#

That graph will definitely give more insights

#

But if you don't want to draw a graph and only want a simple table, what about the weight system you mentioned? Like 1 pick: weight 1, 2 picks: weight 0.5, 3 picks: weight 0.33

#

This means data from lobbies with 8+ picks will carry less weight in the statistics
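A sketch of that table-only alternative, using the 1/n weights proposed above. The bucket averages are copied from the illustrative Joule rows earlier in the thread, so the output is an example of the mechanics only, not a real result.

```python
def weighted_placement(bucket_averages, weight):
    """Combine per-duplicate-count averages into one number.

    `bucket_averages` maps copies-in-lobby -> average placement.
    `weight` maps copies-in-lobby -> relative weight.
    """
    total_weight = sum(weight(n) for n in bucket_averages)
    weighted_sum = sum(weight(n) * avg for n, avg in bucket_averages.items())
    return weighted_sum / total_weight


def inverse_weight(copies):
    return 1 / copies  # 1, 0.5, 0.33, ... as proposed in the thread


# Illustrative averages from the example table earlier in the thread.
joule = {1: 3.2, 2: 4.1, 3: 4.8, 4: 5.2}
plain_mean = sum(joule.values()) / len(joule)
print(round(weighted_placement(joule, inverse_weight), 2), round(plain_mean, 2))
```

Passing the weight as a function makes it easy to swap in other candidate schemes and compare how much the ordering of hunters changes.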

bronze berry
#

Yeah you can do that
The issue is picking "fair" weight values

placid jasper
#

Do you think the weight for 2 duplicate picks shouldn't always be 0.5? Do we need to find a "fairer" weight that considers the social situation or the balance of the game?

bronze berry
#

It would need to probably fall off exponentially

#

But I dunno exactly what the parameters of the exponential drop-off should be

#

Other than 10 should map to 0

placid jasper
#

So like you think 2 # pick should be more than 0.5

#

Kind of 1.0, 0.7, 0.4, 0.35, 0.3, 0.2

#

Ah it's opposite

#

Kind of 1.0, 0.4, 0.25, 0.2, 0.18

bronze berry
#

I don't think I can answer this very well sadly :/ this is about the limit of my stats knowledge

#

I would suggest trying a couple weightings and seeing what looks about right to start with

#

And what ends up being helpful

#

I'm sure there is a method to weight the values correctly

#

I'm just not 100% sure what it is though

#

This is also sadly the same issue TC has with looking at winrate data btw haha

#

Maybe @verbal light could help you out further in how to consolidate average placement per picks into a single value for each hunter in order to preserve sample size

placid jasper
#

Nice

#

Fun talk

#

I've just added the avg placement too in the website though

bronze berry
#

👍

#

Which website?

placid jasper
bronze berry
#

Yeah even this stat is looking like it reflects how strength feels much better than the other stats

#

Even without weighing

#

Zeph oath and Brall end up very low

#

Shrike void etc end up high

#

As in

#

"lower" is "closer to 1st place"

placid jasper
#

Nice

#

Weighted scoring takes a lot of work because it has to be calculated for each match individually

bronze berry
#

Yeah I bet 😨

verbal light
#

Thanks for the ping, super interesting convo!

TL;DR is that yes, when we look at data to assess power, we usually use some sort of weighted placement. There's a lot of different ways to think about weighted placements, but all of them basically revolve around "cancelling out" mirror placements (e.g. if a Myth places 1st and 10th in a Squads game, it's as if there was no Myth in the game at all)

The tricky parts are that

  1. It's possibly incorrect to assume placements are mirrored like that (A 1st should be weighted higher positively in magnitude than a 10th is measured negatively)
  2. From a competitive perspective, this doesn't account for other ways to gain points (e.g. kills)
  3. At some point, if a character is really strong and popular, there might be so much cancelling out that we don't have enough data to get an accurate read of their power (this is less an issue now with our larger sample size)

The good news is that most weighting schemes are highly correlated with each other. As for weighing combat power, one thing I've been considering is measuring power based on whatever our ranked point gain criteria is (aka, "Average Ranked Points gained/lost per game"), which is in theory, our "objective" measure of performance in a competitive setting. It's too early to do that right now (especially because we're planning some significant changes to Ranked in January), but I don't think we're too far off.
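One possible reading of the "cancelling out" idea from the TL;DR, sketched under assumptions: measure each placement as a signed distance from the lobby midpoint of 5.5 and sum per hunter within a match, so a 1st and a 10th net to zero. This illustrates the idea only and is not the studio's actual metric.

```python
MIDPOINT = 5.5  # average of placements 1..10


def net_deviation(placements):
    """Sum of signed distances from the midpoint; negative is better."""
    return sum(p - MIDPOINT for p in placements)


print(net_deviation([1, 10]))      # mirrored pair cancels: 0.0
print(net_deviation([1, 2, 10]))   # -3.5, same as a lone 2nd place
print(net_deviation([2]))          # -3.5
```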

bronze berry
#

I didn't think too hard about how the scaling is worked out, I just kinda picked something that I think works? Although I'm not very confident in it because I think at n=10 it should technically have to scale infinitely?

#

and whatever is scaling the distribution should also scale the error - since error would also propagate through the normal distribution

#

for values where the error is higher, their values should be taken into consideration less

#

i.e. at 10 teams with 10 picks, the scale should be infinite, error therefore is infinite, so it should be considered by an amount of 1/infinity, aka 0 consideration

#

whereas 10 teams with 1 pick, the scale should be 1, therefore error is just the normal error, highest consideration

verbal light
#

To double check - what is distribution of the graph you sketched (with different colored lines) trying to describe?

bronze berry
#

the probability density function of the average placement for a hunter across # of times picked in a lobby

#

I think at least haha

verbal light
#

gotcha - so colors are # of times hunter is picked, higher number of times picked = more peaked around 5.5
if hunter is picked a lot, their "allowed deviation" from 5.5 is lower, so even small deviation would mean character's average placement is too high/low compared to expectations

did i interpret what you were saying correctly?

bronze berry
#

oh the scale should be this probably

#

at n=10 you get 4.5/(11/2-5.5) -> 4.5/0, and at n=1 you get a scale of 1

#

so the weightings should be 1/this, so

#

i'm surprised the weightings would be linear not exponential tbh, but it does kinda make sense, since gaussians transform linearly?

#

I think at this point I'd need to actually check this against a set of data to see whether it looks right

#

So i guess what i would do is:
For each hunter, calculate average placement per # times picked in a match
For each of these 10 values, scale their value about the value 5.5 by 4.5/((n+1)/2-5.5)
the final value is the sum of these values, weighted by the inverse of the scalar amount - ((n+1)/2-5.5)/4.5
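The three steps above can be sketched as follows. The sign of the scale is taken so that n=1 gives a scale of 1, as stated earlier in the thread, which makes the scale 9/(10-n) and the weight (10-n)/9; (n+1)/2 is the best average n copies can reach, since at most they take places 1 through n. Dividing by the total weight is an added assumption, and the bucket averages are the illustrative Joule rows from the example table.

```python
MIDPOINT = 5.5
TEAMS = 10


def scale(copies):
    """Stretch factor that maps the n-copy range back onto the 1-copy range."""
    return 4.5 / (MIDPOINT - (copies + 1) / 2)   # equals 9 / (10 - n)


def normalized_placement(bucket_averages):
    """Weighted combination of stretched per-bucket averages.

    Buckets with all 10 teams on the hunter carry no information
    (weight 0, infinite scale) and are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for copies, average in bucket_averages.items():
        if copies >= TEAMS:
            continue
        c = scale(copies)
        stretched = MIDPOINT + c * (average - MIDPOINT)
        weight = 1 / c                     # equals (10 - n) / 9
        weighted_sum += weight * stretched
        weight_total += weight
    # Dividing by the total weight is an added assumption, so the result
    # stays on the 1..10 placement scale.
    return weighted_sum / weight_total


# Illustrative bucket averages from the example table in the thread.
joule = {1: 3.2, 2: 4.1, 3: 4.8, 4: 5.2}
print(round(normalized_placement(joule), 2))
```

A real version would probably also need to account for how many games sit behind each bucket, which this sketch ignores.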

#

so ig the distribution for n=5 looks something like this?

#

where the red line would be the expected distribution for 5 hunters being picked, and the green line is how you would stretch that back out to 10 to be "equivalent" to the 1 hunter picked case

#

I'm probably not explaining myself too well 😅 I've not done stats work in a long time and I'm not quite sure what language to use to explain this kinda thing easily haha

verbal light
#

You're explaining it well I think! I'm trying to better understand how you reached the normal distribution plots - can you explain how you got to the value of c here? (denominator specifically)

bronze berry
#

(n+1)/2 is how much "tighter" the distribution ends up per number of picks in a lobby

#

and its trying to transform the "tighter" distribution back onto a distribution which reaches between 1 and 10 (the case for n=1)

#

f((x+u(c-1))/c) is just applying that transformation without adjusting the mean

#

since the mean should always stay 5.5

verbal light
#

oh woops i didn't see the negative sign in the weighting factor - thanks!

bronze berry
#

it should be the same as doing this

#

just a normal distribution with a new variance calculated by scaling the old variance
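Written out, under the assumption that f is the density of the n-copy average placement with mean u = 5.5 and that c is the stretch factor, the transformation and its effect on a normal distribution are:

```latex
g(x) = \frac{1}{c}\, f\!\left(\frac{x + u\,(c-1)}{c}\right)
     = \frac{1}{c}\, f\!\left(u + \frac{x-u}{c}\right),
\qquad
X \sim \mathcal{N}(u, \sigma^2)
\;\Longrightarrow\;
u + c\,(X - u) \sim \mathcal{N}(u, c^2\sigma^2).
```

The 1/c prefactor keeps the stretched density normalized; the mean stays at u and only the variance changes, by a factor of c².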

verbal light
#

yeah - normalization like this is probably the purest way of calculating weighted average placement

#

I'm not sure how we would take this normalized placement and combine it with whatever importance we put on kills/elims

#

I guess we could apply similar normalization on expected RP gains (instead of placement)

bronze berry
#

Yeah you'd probably have to do that, although since kills aren't affected by how many of a hunter is picked in a lobby it might not work the same

#

But it likely works out to a very good representation nonetheless, since you would still expect a normal distribution of ranked points, which tightens as the number of picks in a lobby increases

verbal light
#

Yeah I think similar to placement, if there are 10 hudsons in a game, there's a limit to how many kills the 10th place hudson can get

#

Either way - I think the normalization function seems sound, should be pretty easy for me to try and apply it just to see what it looks like!

#

Very early on, DemApples suggested something similar, I might even have a version of it from back then

bronze berry
#

Oh thats funny! What made you choose to not go with that in the end then?

#

I'm not sure what the advantages/disadvantages of different methods might be

verbal light
#

I think one of the primary reasons is how readily interpretable something is to the average person/player/designer

The more normalizations/weighting you do, it's less intuitive how to understand and utilize the data

#

I think we also found that the normalized values were more or less similar (in terms of ordering of hunter power) to just the "remove the dupes and calculate" method

bronze berry
#

Oh i see! So its just easier to work with and understand what its telling you?

#

And yeah, from what I've seen of the data, it seems like pretty much however you look at the data, the same outliers are still the same outliers

#

Maybe the important takeaway is that having lots of different ways to look at the same data helps view balance from different angles - average placement is always going to paint a false picture if a hunter is doing something strange like a 50/50 coinflip for coming 1st or 10th

verbal light
#

(that being said, you've made me curious so I'm looking at applying your suggestion to see what things look like)

worldly spruce
verbal light
# bronze berry let me know how it goes o:

Gave it a whirl -

  • Left side is average placement for all games where that particular hunter only showed up 3 or fewer times
  • Right hand is average placement using the normalization/scalar you suggested
  • Both are sorted from lowest placement (most powerful) to highest placement
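The left-hand method (only count games where the hunter appears 3 or fewer times) might be sketched as follows, assuming a hypothetical per-match list of (hunter, placement) pairs. The threshold of 3 comes from the description above; the function name and toy data are illustrative.

```python
from collections import Counter, defaultdict


def placement_low_duplicates(matches, max_copies=3):
    """Average placement per hunter, ignoring lobbies with too many copies."""
    total = defaultdict(float)
    count = defaultdict(int)
    for match in matches:
        copies = Counter(hunter for hunter, _ in match)
        for hunter, placement in match:
            if copies[hunter] <= max_copies:
                total[hunter] += placement
                count[hunter] += 1
    return {hunter: total[hunter] / count[hunter] for hunter in total}


toy_matches = [
    [("Hudson", 1), ("Hudson", 5), ("Hudson", 8), ("Hudson", 10), ("Oath", 3)],
    [("Hudson", 2), ("Oath", 6)],
]
averages = placement_low_duplicates(toy_matches)
# Sort from lowest placement (most powerful) to highest.
print(sorted(averages.items(), key=lambda item: item[1]))
```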

Overall, results look pretty similar, not just in valence but also in magnitude; some deviations here and there but not by much

(Purposely left numbers out and used a specific sample of data because... I don't want this to become a "Is X Hunter OP?" discussion lol)

bronze berry
#

Oh wow yeah that correlates very well!

#

it looks like its just scaled in a little

#

and the median hunters like #7 to #12 or so I would expect to be mostly sorted by noise rather than there being many real differences anyway

#

so I'm not too surprised they're a little different

#

how interesting!

verbal light
#

Yeah - now that I've got it stood up, I might just keep it in our usual dashboards for a little while, just to see if/when there are any deviations

With the soon-to-come ranked changes, we're probably going to pretty dramatically shift how we assess hunter power (to care meaningfully more about combat in addition to placement), so this will be nice to have in our back pocket to check against whatever changes we make to how we interpret the data

#

Thanks for suggesting this/walking me through your math, this was super fun!