Matching by Numbers

One of the things we’ve seen from past analyses is that TV seems to be the best matching criterion among the variables we have available to analyze – but is it really the best matching system? Can it be improved by somehow incorporating other variables?

First off, it’s important to declare what we mean by “best”, and in this case it’s “most even match-up” in terms of the match outcome. We basically want a system that matches two teams such that they have as equal a chance of victory as possible, statistically. I’ve been told this isn’t necessarily the most important thing… but let’s assume that, for the most part, people tend to enjoy games where they are challenged, but not thoroughly disadvantaged.

Now, another thing new to these analyses is that we’ve got match-level data from the OCC league, a perpetual-play scheduled-match league NOT matched by TV. This will give us some extra perspective, so we will be looking at both FUMBBL Box data (TV-based MM) and OCC data (separately) to see what differences and similarities we might find in using various variables for match outcome prediction.

Before continuing, let me state that for these analyses I use “mirror matches” which is to say any given match is actually two datapoints – each one being the perspective of one team in that match. Why do this rather than treat each match as a single datapoint? A given match has different results for each of the teams involved, so we can look at outcomes from the standpoint of losers or winners, and we can create more detailed models with greater ease. Ok… onward.
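The mirror-match expansion above can be sketched as follows. The column names (`team_a`, `score_a`, etc.) are hypothetical – the real datasets presumably use their own schemas – but the idea is just that each match produces two rows, one per team, with the outcome sign flipped:

```python
# "Mirror match" expansion: each recorded match becomes two datapoints,
# one from each team's perspective. Field names are hypothetical.

def sign(x):
    return (x > 0) - (x < 0)

def mirror(matches):
    rows = []
    for m in matches:
        # Perspective of team A
        rows.append({"team": m["team_a"], "opp": m["team_b"],
                     "outcome": sign(m["score_a"] - m["score_b"])})
        # Perspective of team B: same match, outcome flipped
        rows.append({"team": m["team_b"], "opp": m["team_a"],
                     "outcome": sign(m["score_b"] - m["score_a"])})
    return rows

matches = [{"team_a": "Orcs", "team_b": "Elves", "score_a": 2, "score_b": 1}]
print(mirror(matches))
```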

Let’s first recap our comparison of TV difference and games-played difference in terms of predicting (correlating with) match victory:

Box TV – r = 0.085, p < 0.01, n = 137448
Box GP – r = 0.075, p < 0.01, n = 137448

OCC TV – r = 0.167, p < 0.01, n = 22990
OCC GP – r = -0.038, p < 0.01, n = 26450

As seen previously, TV is a better predictor of match outcome even in a TV-matched environment (which, to some degree, is already controlling for that variable), but this becomes even more obvious when we look at a non-TV-matched environment. Games played actually correlates negatively with match outcome in OCC… so… let’s just toss that puppy out for now.

One thing the OCC data does have is a team’s ELO rating at the time of a match, so let’s look at ELO difference as a predictor:

OCC ELO – r = 0.285, p < 0.01, n = 26450

Well then… looks like ELO is better than TV at predicting outcome in the non-TV-matched environment. Maybe that’s the answer? Let’s look at some more candidates…

One thing that has been speculated is that Fan Factor could be incorporated in some way. Given that FF is already part of TV, there will certainly be some interaction between the two. Complicating matters, FF at time of match is not in any of the datasets, and not much has been done with FF thus far. Let’s assume, for the sake of analysis, that most teams will not buy FF at creation. The Box data includes each match’s change in FF, so we can estimate a team’s FF going into a match by applying those changes as a running total.
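That estimation step can be sketched as a simple running sum. The starting FF of 0 is the stated assumption (no FF bought at creation), and the input here is just one team’s per-match FF deltas in match order:

```python
# Estimate a team's FF at each match by running-summing the recorded
# FF changes from its previous matches. Starting FF of 0 is assumed.

def estimated_ff(ff_changes):
    """ff_changes: one team's per-match FF deltas, in match order.
    Returns the estimated FF going *into* each match."""
    ff, out = 0, []
    for delta in ff_changes:
        out.append(ff)      # FF at kickoff of this match
        ff += delta         # apply this match's change for the next one
    return out

print(estimated_ff([1, 0, -1, 2]))  # → [0, 1, 1, 0]
```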

Box FF – r = 0.175, p < 0.01, n = 137448

So, using estimated FF, we’re seeing that the difference in FF between two teams is a much stronger predictor of match outcome than TV in a TV-matched environment. FF is included in TV (at 10k per FF), so can we make TV “better” by increasing the amount of TV contributed by FF? Turns out we can! I’ll save you the progression, but the per-FF cost that makes TV best predict match outcome is 60k per FF. The resulting “TV Plus” difference, applied to Box, gives us:

Box TV+ – r = 0.187, p < 0.01, n = 137448

That’s even better, especially when you consider the very low granularity of the match outcome variable (there are only 3 possible states).

But let’s think about what FF is for a second. FF has a minimum value of 0 and a maximum value of 18, and there is only a *chance* of FF changing each match. What if we cut out these minimums and maximums and removed the random element… might we have a variable that is even better? Enter WF, or “Win Factor”, which is simply “wins – losses” for the team at the time of the match:

Box WF – r = 0.285, p < 0.01, n = 137448

Cripes, that’s quite a jump! It’s better than relative FF, and even better than optimizing the cost of FF’s contribution to TV. So then, let’s see about using WF as part of TV to improve that further. The optimal cost per WF ends up being 30k:

Box TV++ – r = 0.291, p < 0.01, n = 137448
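Spelled out, TV++ as used here is TV with FF bumped from its 10k baseline to 60k per point, plus WF at 30k per point. This is my reading of the construction described above, in Box-scale thousands:

```python
# TV++ as described in the text: TV (which already includes FF at 10k
# per), with FF bumped to 60k per and WF added at 30k per.

def tv_plus_plus(tv, ff, wins, losses):
    """tv, ff, wins, losses: a team's values going into the match."""
    wf = wins - losses                  # "Win Factor"
    return tv + (60 - 10) * ff + 30 * wf

print(tv_plus_plus(tv=1000, ff=4, wins=6, losses=2))  # → 1320
```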

That appears to be the best we can get. We notice, however, that TV++ is not all that much better than relative WF. Let’s look at TV++ and WF for teams at or below the average number of games played, which is 10:

Box TV++ <= 10GP – r = 0.348, p < 0.01, n = 70058
Box WF <= 10GP – r = 0.348, p < 0.01, n = 70058

In the range where the most games are played, WF and TV++ have equal power in terms of predicting match outcome. Let’s look at matches where TV was 1250 or less (a recent subset used in the discussion of minmaxing):

Box TV++ <= 1250TV – r = 0.401, p < 0.01, n = 51549
Box WF <= 1250TV – r = 0.409, p < 0.01, n = 51549

WF is stronger than a combination of WF and TV in this case. There’s certainly some interaction, though, since WF will correlate with FF, FF is a part of TV, and the environment we’re looking at is, to a degree, controlling for TV. Let’s look at OCC data, calculating both TV++ and WF:

OCC TV++ – r = 0.193, p < 0.01, n = 26450
OCC WF – r = 0.407, p < 0.01, n = 26450

In OCC, TV++ is not that great a predictor (well, it’s better than TV alone, or games played), being less predictive than ELO. WF, on the other hand, is stronger than anything… and in Box, WF is also highly predictive (more so than TV, adjusted or otherwise).

ERRATA: I calculated TV++ wrong when I did these tests, having not noticed that OCC uses a different scale for TV than the Box data does. TV++ is actually superior to ELO in OCC at the 30k mark, and superior even to WF/zSum at the 90k mark.

So, that pretty much gives us our conclusion. We can greatly improve perpetual-play matchmaking by matching on the variable we’ve found to be most predictive of match outcome:

WF = wins – losses
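As a final sketch, here is one possible (entirely hypothetical) way a matchmaker could act on this: sort the waiting queue by WF and pair adjacent teams, so each pairing minimizes the WF gap within it. The teams and records are made up:

```python
# One possible way to put the conclusion into practice: pair queued
# teams by closest WF. Teams and records below are made up.

def wf(team):
    return team["wins"] - team["losses"]

def pair_by_wf(queue):
    """Sort the queue by WF and pair adjacent teams."""
    ordered = sorted(queue, key=wf)
    return [(ordered[i]["name"], ordered[i + 1]["name"])
            for i in range(0, len(ordered) - 1, 2)]

queue = [
    {"name": "A", "wins": 5, "losses": 1},   # WF  4
    {"name": "B", "wins": 2, "losses": 2},   # WF  0
    {"name": "C", "wins": 1, "losses": 4},   # WF -3
    {"name": "D", "wins": 6, "losses": 3},   # WF  3
]
print(pair_by_wf(queue))  # → [('C', 'B'), ('D', 'A')]
```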