Board 8 > Adjusting the contest scoring system based on prediction percentages.

NFUN
05/19/20 10:59:31 AM
#1:


ZeldaTPLink posted...

Greetings!

In the past few weeks, both on this board and on Discord, I've noticed a discussion about whether the current points scheme of the contests is fair. I refer to the system that awards points for each round in a geometric progression, as follows:

Round 1 - 1
Round 2 - 2
Round 3 - 4
Round 4 - 8
Round 5 - 16
Round 6 - 32
Round 7 - 64

The argument I've often seen against this system is that it places too much weight on later matches, in the sense that someone who did very well in early rounds can lose to someone who did not but who got a single great upset in the later rounds. We have seen that in this contest with the Skyrim vs Witcher match, whose 32-point prize made entire early rounds pointless.

This system does follow a mathematical logic, which is that, assuming each game has an equal chance of winning any match, the probability of a given game winning a match goes down by half each round. Therefore, it makes sense for that match to be worth double the points. But this logic skips over the fact that not all matches are created equal: while Rocket League vs DBZ was a hotly debated match, Dark Souls beating Hotline Miami and then beating the winner of that debated match was all but a foregone conclusion. Even the most casual bracket makers would overwhelmingly agree on that. Yet, the Rocket League vs DBZ match is only worth 1 point, while the winner vs Dark Souls is worth 2.
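The logic behind the doubling can be checked with a few lines of Python. This is only a minimal sketch: it assumes, as the argument above does, that every match is a coin flip, and the variable names are just for illustration.

```python
# Assumption: every match is a 50/50 coin flip, as in the argument above.
ROUNDS = range(1, 8)

# Current scheme: 1, 2, 4, ..., 64 points per correct pick.
points = [2 ** (r - 1) for r in ROUNDS]

# Chance that the entrant you picked survives to, and wins, its round-r match.
p_correct = [0.5 ** r for r in ROUNDS]

# Expected points earned per pick in each round.
expected = [pts * p for pts, p in zip(points, p_correct)]

for r, pts, p, e in zip(ROUNDS, points, p_correct, expected):
    print(f"Round {r}: {pts} pts x {p} = {e} expected")
```

Under that assumption every round comes out at the same 0.5 expected points per pick, which is the sense in which doubling is "fair". Once matches stop being coin flips, that balance no longer holds.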

Arguments in favor of this system include its simplicity, and the fact that it's based on systems used in bracket contests for real-life sports tournaments. However, a sports tournament can have much more variance than a bracket contest, since it depends on human performance in the moment, while the tastes of a gaming community are much easier to predict based on sales, reviews, overall word of mouth, and previous contests. Hotline Miami will never beat Dark Souls, except with a rally, and rallies big enough to flip such a match around are extremely rare.

Based on this argument, I had an idea to calculate what would be a fairer scoring scheme. Turns out we actually have data on the difficulty of predicting matches for each round, and it's the prediction percentages available in the Contests Stats page! Using that data, I've made a formula to calculate the ideal points for each round. I'm setting Round 1 with 1 point as the standard, then calculating the score for later rounds by dividing the average prediction % of Round 1 by the average prediction % of each round.
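For anyone who wants to redo the calculation, the formula can be sketched in Python roughly like this. The function name is made up for illustration, and the percentages in the example are placeholders, not real numbers from the Contest Stats page.

```python
def round_points(avg_pct_by_round):
    """Points per round, with Round 1 fixed at 1 point.

    avg_pct_by_round: average prediction % for each round, Round 1 first.
    """
    round1_pct = avg_pct_by_round[0]
    # Each round is worth (Round 1 average %) / (that round's average %).
    return [round1_pct / pct for pct in avg_pct_by_round]


# Placeholder percentages, NOT real contest data.
example_pcts = [70.0, 56.0, 35.0]
print(round_points(example_pcts))  # [1.0, 1.25, 2.0]
```

A round that half as many brackets get right as Round 1 ends up worth 2 points, and so on.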

DISCLAIMER: the goal of this topic is not to question the merits of any contest winners or winners of any side contest, of this year or any other year. Everyone who has won contests here made a great bracket. The goal is to propose an improvement to the scoring system so that future contests award points based on a more accurate measure of the relative difficulty of each match.

So let's get to the numbers already. Based on the formula I just explained, here is what the points for the recent Game of the Decade should look like:

(note: decimals appear with commas instead of points because that's the Brazilian standard and it's what my Excel is set to. Just pretend you are seeing points instead)



That's very different from a geometrical progression, huh?

Looking closely, Rounds 1 to 4 seem to follow something similar to an arithmetic progression, with each round adding 0.3 points. Round 5 onwards is where it gets wonky, though. I suspect one reason is the BotW effect: as BotW starts making up a larger fraction of the round, the round itself becomes easier to predict. Another thing is that this year, Round 4 was where most of the debated matches happened, while Round 5 and beyond were fairly chalky, with the exception of the Skyrim matches.

Still, that doesn't give us an accurate representation of what the difficulty for each round should be, so I decided to dig deeper. I made the same analysis for a few other contests. For the sake of simplicity, I restricted myself to contests that have 128 entries, 1v1 matches, and 7 rounds. That means GotD1, Character Battle 2010 and Best Game Ever 2015. Here are the results.



BGE3 is also pretty wonky in the later rounds. That said, the first 4 rounds do show a good degree of consistency.

That 77 in the finals has an obvious reason: Undertale. While this means we can't really take that score as our standard, it does offer a good perspective. 77 is not much above 64, so this shows what it takes for a finals match to actually be worth the 64 points we normally award it: a turbofodder indie game almost nobody had heard of getting a Tumblr-fueled rally and winning 7 upsets in a row until it beats Ocarina of Time and wins the contest. Not something that is too likely to happen again, imo. And even then, previous rounds are way below their normal awarded points in difficulty, thanks to being populated by obviously strong games instead of fodder.

Round 5 is worth more than Round 6, and the reason for that is that R6 consisted of Undertale and Ocarina of Time, while R5 was Undertale, Ocarina, Melee and Super Mario RPG, so on average, R5 has more crazy upsets, including two mega rallies.



The first Game of the Decade gives us the smoothest results. Rounds 6 and 7 feel like they spiked a bit more than usual, but hey, those are the last two rounds so maybe they should do that! And it's still a lot less than 32 and 64. This can also be explained by the fact that this contest is famously one of the least predictable ones we've had, with legendary results such as Brawl beating Melee and then losing to Majora's Mask. Also, rounds 1 to 3 seem pretty similar to the two previous charts, while 4 and 5 go a bit higher.



I didn't think I would see a contest chalkier than GotD2, but here we are. The finals are only twice as hard to predict as the average Round 1 match, which makes sense because, well, it's Link > Cloud. Although there are crazy results here and there (e.g. Charizard), they get dampened by the majority of the bracket being a standard Noble Nine, 1v1, no items, final destination story. This is also the easiest Round 4 of the pack, being even easier than Round 3 somehow.

This makes some sense if you consider that this is the 8th character battle in a time span of 9 years. There was a ton of data to make predictions from, such that a quick read of the board or the wiki could give someone an idea of what is likely to win here.

(on an unrelated note, this does give us a good idea of what will happen if the next Character Battle doesn't make any big innovation in terms of what characters are in, such as an All Fictional bracket. Expect that contest to be even more predictable and have fewer upsets than this year's Game of the Decade).

@ZeldaTPLink

---
You shine, and make others shine just by being near them.
NFUN
05/19/20 11:00:43 AM
#2:




ZeldaTPLink posted...
Conclusions:

On one hand, this research failed to provide a reliable measure of what scoring system will most accurately reflect the difficulty of each round, due to not having a ton of contests to pick from (I could look at other contests, but then I'd have to make arbitrary adjustments for the different bracket sizes). On the other hand, I believe it gave us a great sense of scale: we can see that even in particularly unpredictable contests, the current system still gives way too much weight to late rounds compared to their actual prediction difficulty. A finals match, in order to be actually worth 64 points, should have a prediction rate of 1.56% (assuming Round 1 had 100%, otherwise it should be lower), the semifinals should have 3.13% on average, and so on. And what we usually see instead for later rounds are prediction %s in the double digits. And this is all taking into consideration the fact that I'm using data for the overall brackets submitted, not just gurus or B8.
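The 1.56% and 3.13% figures come from turning the formula around: the prediction rate a round would need is the Round 1 rate divided by the points awarded. A minimal sketch, with a made-up function name and the same best-case assumption that Round 1 sits at 100%:

```python
def break_even_pct(points, round1_pct=100.0):
    """Average prediction % a round needs to justify `points`,
    relative to Round 1 being worth 1 point.

    round1_pct defaults to 100, the best-case assumption used above.
    """
    return round1_pct / points


# Current scheme, Round 1 through Round 7.
for rnd, pts in enumerate([1, 2, 4, 8, 16, 32, 64], start=1):
    print(f"Round {rnd}: {pts} pts needs {break_even_pct(pts):.4f}%")
```

With any realistic Round 1 average below 100%, every one of those thresholds drops further.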

If I had to take a guess at an actual system, my instinct is to take GotD1 as the standard, since it's the one that looks the most neat. I'll then multiply all the numbers by 10 and do some rounding, to get more manageable numbers:

Round 1 - 10
Round 2 - 13
Round 3 - 18
Round 4 - 28
Round 5 - 40
Round 6 - 100
Round 7 - 180

If you think the last two rounds are too high compared to the first one, then you should assume it's because Majora and Brawl getting to the finals is a crazier result than average. In that case, you could settle for something chalkier, and reduce those numbers a bit. I made that adjustment, and also did a little more rounding for early matches.

Round 1 - 10
Round 2 - 15
Round 3 - 20
Round 4 - 30
Round 5 - 45
Round 6 - 75
Round 7 - 120
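To get a feel for how much this shifts the weight between rounds, here is a rough Python sketch that totals the points on offer per round for a 128-entry bracket. The helper names are only illustrative; the point values are the current scheme and the adjusted list above.

```python
MATCHES_PER_ROUND = [64, 32, 16, 8, 4, 2, 1]  # 128-entry, 1v1 bracket

current = [1, 2, 4, 8, 16, 32, 64]
proposed = [10, 15, 20, 30, 45, 75, 120]  # adjusted list above


def round_totals(points_per_match):
    """Total points on offer in each round."""
    return [n * pts for n, pts in zip(MATCHES_PER_ROUND, points_per_match)]


def finals_share(points_per_match):
    """Fraction of all available points that rides on the final."""
    totals = round_totals(points_per_match)
    return totals[-1] / sum(totals)


print(round_totals(current))   # [64, 64, 64, 64, 64, 64, 64]
print(round_totals(proposed))  # [640, 480, 320, 240, 180, 150, 120]
print(f"{finals_share(current):.1%} vs {finals_share(proposed):.1%}")
```

Under the current scheme every round is worth the same 64 points in total, so the single finals match carries as much as all of Round 1. With the adjusted numbers, Round 1 as a whole outweighs the final by a wide margin.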

So, what do you think? Do you agree with this analysis? Do you have a better idea of what system would work best? Give me your opinions, and thanks for reading this!


---
You shine, and make others shine just by being near them.
Steiner
05/19/20 11:02:36 AM
#3:


in conclusion that's why ulti came crying to me like a baby about not being able to chat to his discord pals who hate him

---
Advokaiser makes me feel eternal. All this pain is an illusion.
NFUN
05/19/20 11:03:06 AM
#4:


steinerpls

---
You shine, and make others shine just by being near them.
Mr Lasastryke
05/19/20 11:04:35 AM
#5:


oh no the drama is in this topic too

time for a third one

---
Geothermal terpsichorean ejectamenta
Tom Bombadil
05/19/20 11:10:49 AM
#6:


click click click

---
https://imgtc.com/i/uWMMlnN.png
Radiant wings as the skies rejoice, arise, and illuminate the morn.
ZeldaTPLink
05/19/20 11:12:50 AM
#7:


what is this thread for, specifically

---
There is only one Guru of the Decade, and his name is azuarc. Congratulations!
Emeraldegg
05/19/20 11:19:11 AM
#8:


I think cause nfun has ulti blocked so ulti shouldn't be allowed in here
---
I'm a greener egg than the eggs from dr. seuss
NFUN
05/19/20 11:40:11 AM
#9:


ZeldaTPLink posted...
what is this thread for, specifically
what egg said plus it isn't half drama so it'd be less excruciating to read

---
You shine, and make others shine just by being near them.