College Basketball Rankings

My first system was not very good, but it was a decent starting point and provided the basis for the systems to come. I used it to enter (and sometimes win) small bowl pick'em pools. Just recently, in the summer of 2014, I got the idea to do the same in basketball. I have been picking brackets since 2005 as well, and soon after that I wrote a program that creates brackets for me by flipping "smart coins" based on win probabilities. Until recently, the win probabilities were either estimated or simply pulled from historical seed upset probabilities. The idea of using actual power rankings, from which better probabilities can be calculated, was appealing. This idea has now been realized here.
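As a rough illustration of the "smart coin" idea (a sketch, not the actual program), each game can be decided by a weighted coin flip, with the winners advancing round by round. The names `pick_winner`, `simulate_bracket`, and `win_prob` are hypothetical, and the sketch assumes the field is listed in bracket order and its size is a power of two.

```python
import random


def pick_winner(team_a, team_b, win_prob, rng):
    """Flip a "smart coin": team_a wins with probability win_prob(team_a, team_b)."""
    return team_a if rng.random() < win_prob(team_a, team_b) else team_b


def simulate_bracket(teams, win_prob, rng=None):
    """Play out a single-elimination bracket and return the winners of each round.

    teams must be in bracket order (adjacent teams meet in the first round).
    """
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("number of teams must be a power of two")
    rng = rng or random.Random()
    field = list(teams)
    rounds = []
    while len(field) > 1:
        # Pair off neighbors and flip one smart coin per game.
        field = [
            pick_winner(field[i], field[i + 1], win_prob, rng)
            for i in range(0, len(field), 2)
        ]
        rounds.append(field)
    return rounds
```

Here `win_prob` could come from historical seed upset rates or from power ratings; running the simulation many times with different seeds gives a different plausible bracket each time.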

There also exist human polls, tendencies, and wisdoms. The following jargon has been thrown around constantly on all classes of sports networks:

- "Team X has a problem with finishing games."

- "It was ugly, but all that matters is they got the W today."

- "Team X will stay #1 until they are beaten."

- "Defense wins championships."

- "Team X's tough nonconference schedule will have them ready for conference play."

- "Team X is in for a shock the first time they play an opponent with a pulse."

- "I like Team X in this matchup, coming off a bye week."

- "This is a trap game for Team X, going into their big showdown next week."

- "Team X might struggle today after the big physical and emotional game last week."

There is probably some measure of truth in all of the above statements. The question is, how much truth is there, and how can it be quantified? One truth of which we can all be certain is that no system can possibly be 100% accurate. Sports are too random, and their outcomes depend on too many variables that simply cannot be predicted. This leads to results which don't make sense and seemingly cannot be explained. The goal of my system is to take into account as many factors as I can think of that might have an impact on a game, and to combine and weight them so that the system does the best possible job of predicting games in past seasons. If enough past seasons are included in the analysis, the randomness starts to fall out and real patterns can start to emerge. My approach is therefore empirical: out of all these ranking methods and wisdoms, which ones seem to hold true over history and which ones are myths?

All three components are run through an iterative process which adjusts them according to strength of schedule until the ratings stabilize. Then a linear combination of the three components is taken as the final rating. The weights of the components are chosen so that the historical point spreads predicted by that combination are as close as possible to the actual ones. One might say that instead of being purely based on mathematics, it is more of a brute-force historical comparison.
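The two steps described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not my actual code: the input formats, the damping factor, the 0-to-1 weight grid, and the squared-error measure are all assumptions, and the components are treated generically. `sos_adjust` and `fit_weights` are hypothetical names.

```python
from itertools import product


def sos_adjust(games, damping=0.5, tol=1e-9, max_iter=10000):
    """Adjust one component for strength of schedule until it stabilizes.

    games: (team, opponent, value) rows, with every game listed once from
    each side (value negated for the opponent). Hypothetical input format.
    """
    teams = sorted({t for t, _, _ in games} | {o for _, o, _ in games})
    ratings = {t: 0.0 for t in teams}
    for _ in range(max_iter):
        totals = {t: 0.0 for t in teams}
        counts = {t: 0 for t in teams}
        for team, opp, value in games:
            # A result counts for more when the opponent is rated highly.
            totals[team] += value + ratings[opp]
            counts[team] += 1
        updated = {}
        for t in teams:
            target = totals[t] / counts[t] if counts[t] else 0.0
            # Damping keeps the iteration from bouncing between two states.
            updated[t] = (1 - damping) * ratings[t] + damping * target
        mean = sum(updated.values()) / len(updated)
        updated = {t: r - mean for t, r in updated.items()}  # average stays 0
        change = max(abs(updated[t] - ratings[t]) for t in teams)
        ratings = updated
        if change < tol:
            break
    return ratings


def fit_weights(component_diffs, actual_margins, steps=10):
    """Brute-force the component weights that best reproduce past margins.

    component_diffs: per game, a tuple of (team - opponent) rating
    differences, one entry per component.
    """
    grid = [i / steps for i in range(steps + 1)]
    n_components = len(component_diffs[0])
    best_weights, best_error = None, float("inf")
    for weights in product(grid, repeat=n_components):
        error = sum(
            (sum(w * d for w, d in zip(weights, diffs)) - actual) ** 2
            for diffs, actual in zip(component_diffs, actual_margins)
        )
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights
```

The grid search is deliberately crude: it tries every weight combination against historical games and keeps whichever one would have predicted the actual margins best.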

My system currently does not use anything in its calculations besides final scores, venues, and the dates of the games. The date is important because more recent results are valued higher than distant results. Early in the season, the system also uses some data from previous seasons in its ratings. The basketball system only uses D1 results, and the football system only uses FBS results (other games are still listed in the win/loss column, but have no effect on the rankings).
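One common way to value recent results more highly is an exponential decay on the age of each game. The sketch below assumes that form and a 30-day half-life purely for illustration; the actual weighting scheme is not specified here, and `recency_weight` and `weighted_mean_margin` are hypothetical names.

```python
from datetime import date


def recency_weight(game_date, as_of, half_life_days=30.0):
    """Weight that halves for every half_life_days of age (assumed form)."""
    age_days = (as_of - game_date).days
    return 0.5 ** (age_days / half_life_days)


def weighted_mean_margin(games, as_of, half_life_days=30.0):
    """Average scoring margin with recent games counting for more.

    games: list of (game_date, margin) pairs for one team.
    """
    weights = [recency_weight(d, as_of, half_life_days) for d, _ in games]
    weighted_sum = sum(w * margin for w, (_, margin) in zip(weights, games))
    return weighted_sum / sum(weights)
```

For example, with a 30-day half-life a game played 30 days ago counts half as much as one played today.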

I'd like to add that I think the idea of providing a single list of rating numbers for each team and ranking them that way (and using the numbers to provide predictions) is inherently flawed. While I do provide such numbers on my site and allow their use in making point spread estimates, I need only refer you to the above list of "human network" statements to debunk it. Factors such as emotional wins, trap games, and just plain rock/paper/scissors situations involving team play styles make such a transitive, single list impossible. In addition to all of this, it is just not true that if Team A beats B by 10 points and B beats C by 10 points, then A should beat C by 20 points. The result is typically inflated to become a 40-point blowout instead. There is also a limit on the upper end, as transitive arguments like this might suggest the #1 team would beat the worst team by 150 points, which obviously just does not happen because of subbing in the 2nd string, etc. The solution is to post a set of ratings which does its best job, but to have a separate predictor system which uses different logic. This is currently a work in progress, and I hope to have a prototype ready by Fall of 2016.
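The rock/paper/scissors problem can be made concrete with a tiny example. The results below are made up for illustration: three teams that each beat the next one by 5 points. Whatever numbers are placed in a single rating list, the predicted margins around the cycle cancel out, so the list cannot reproduce all three results.

```python
def predicted_margins(ratings, matchups):
    """Predicted margin for each (team, opponent) pair from one rating list."""
    return [ratings[a] - ratings[b] for a, b in matchups]


# Hypothetical rock/paper/scissors results: each team beats the next by 5.
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
actual = [5, 5, 5]

ratings = {"A": 7, "B": 3, "C": -2}  # any numbers at all will do
predicted = predicted_margins(ratings, cycle)

# Around a closed cycle the rating differences always cancel,
# so the predictions sum to 0 while the actual margins sum to 15.
cycle_total = sum(predicted)
```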

The idea is what I call a "layering" approach. This approach would use some combination of the above components to create a "base" rating from which to start, and then apply a different rating approach using those values as the strength of schedule component. For example, I could construct a reasonable base rating, then as a final layer give non-zero weights only to the last N games played. Another promising "layer" is to give higher weight to games against similar opponents, to mitigate the damage done by 50 and 60 point blowouts against bad teams. Overall I have dozens of ideas for layers that can be added, as well as the idea of applying several layers to the end result.
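Since the predictor is still a work in progress, the following is only a sketch of what two such layers might look like, assuming base ratings have already been computed. The function names, the Gaussian-shaped similarity weight, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import math


def last_n_weights(num_games, n):
    """Non-zero weight only for the n most recent games (oldest game first)."""
    return [1.0 if i >= num_games - n else 0.0 for i in range(num_games)]


def similarity_weight(team_base, opp_base, scale=10.0):
    """Near 1 for opponents with a similar base rating, small for mismatches."""
    return math.exp(-(((team_base - opp_base) / scale) ** 2))


def layered_rating(team_base, games, weights):
    """Re-rate a team using base ratings as the strength of schedule.

    games: list of (opponent_base_rating, margin) pairs, oldest game first.
    Falls back to the base rating when every weight is zero.
    """
    total = sum(weights)
    if total == 0:
        return team_base
    weighted = sum(w * (opp + margin) for w, (opp, margin) in zip(weights, games))
    return weighted / total
```

Layers could be combined by multiplying their weights game by game, for example `last_n_weights` times `similarity_weight`, before calling `layered_rating`.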

Ken Pomeroy - For being an early inspiration to me for computer ranking systems, and for fighting the good fight for public acceptance of advanced metrics. Down with RPI!

Ken Pomeroy - For an updated game database for basketball.

Kenneth Massey - For operating a great site and providing a nice introduction to computer ranking systems in general. His Master's thesis, available on his site, was very thought-provoking and a good read for anyone interested in this stuff. He also provides easily readable football scores which I use for my rankings.

College Basketball Ranking Composite - Operated by Kenneth Massey, this is a nice place to view and compare different basketball rankings, both computer and human poll based (including my own). There is also a Football Composite.

Bracket Matrix - For hosting my automated basketball bracket predictions (and also just for being awesome and existing).

My Dad, Tom Wilson, for encouraging me in the three passions which collide in making these rankings possible - sports, math, and programming. Thanks to him also for hosting this web domain. BuzzPlugg itself serves as a forum for conversation about beers - see what beers other people like and chime in on one you've had recently that hit the spot! "Plugg" the ones you like, "Nixx" the ones you don't, and the system tracks your favorites. It is all totally free!

James Howell, for providing historical football scores (which are badly needed to do the kind of analysis I need to do!)

I will probably think of others to put here later!

madbuttonmasher@gmail.com