I believe that the Massey Ratings are the most scientific and full-featured system
available for analyzing the performance of members of a competitive league.
Future improvements are likely as the model is refined further to reflect the
nature of sports more accurately.
The second goal is to account for the differences in schedule. When there is a large
disparity in schedule strength, win-loss records lose their significance. The computer
must evaluate games involving mismatched opponents, as well as contests between
well-matched teams.
It is necessary to achieve a reasonable balance between rewarding teams for wins,
convincing wins, and playing a tough schedule. This issue is difficult to resolve,
and rating systems exist that are based on each of the extremes.
Similarly, a team's
Defense power rating reflects the ability to prevent its opponent from scoring.
An average defense will be rated at zero. Positive or negative defensive ratings
would respectively lower or raise the opponent's expected score accordingly.
It should be emphasized that the Off/Def breakdown is simply a post-processing
step, and as such has no bearing on the overall rating. A consequence of this
is that the Off/Def ratings may not always match actual production numbers.
A team that routinely wins close games may have somewhat
inflated Off/Def ratings to reflect the fact that they are likely to play well when
they have to. Winning games requires not just the ability to score points,
but also teamwork, mental strength, and consistency.
The Off/Def breakdown is simply an estimate of how much of
a team's strength can be attributed to good offensive and defensive play
respectively.
The Parity column measures how well matched the teams in a conference are.
A value of 1 indicates perfect parity - there is complete balance from top to
bottom. In contrast, a parity near 0 indicates that there is great disparity
between the good and bad teams in the conference.
The ratings are totally interdependent, so that a team's
rating is affected by games in which it didn't even play. The solution therefore
effectively depends on an infinite chain of opponents, opponents' opponents,
opponents' opponents' opponents, etc. The final ratings represent a state of
equilibrium in which each team's rating is exactly balanced by its good and bad
performances.
Goals
The first challenge for any computer rating system is to account for the variability
in performance. A team will not always play up to its full potential. Other random
factors (officiating, bounce of the ball) may also affect the outcome of a game.
The computer must somehow eliminate the "noise" which obscures the true strength of a team.
Inputs
Only the score, venue, and date of each game are used to
calculate the Massey ratings. Stats such as rushing yards, rebounds, or field-goal
percentage are not included. Nor are game conditions such as weather,
crowd noise, day/night, or grass/artificial turf. Overtime games are not
treated any differently.
Finally, neither injuries nor psychological factors like motivation
are considered. While none of these are analyzed explicitly, they may be
implicitly manifested through the game scores.
Predictions
The Massey Ratings are designed to measure past performance, not necessarily to
predict future outcomes.
Rating
The overall team rating is a merit-based quantity, and is the result
of applying a Bayesian win-loss correction to the power rating.
Power
In contrast to the overall rating, the Power is a better measure of
potential and is less concerned with actual wins and losses.
Offense / Defense
A team's Offense power rating essentially measures the ability to
score points. This does not distinguish how points are scored, so good defensive
play that leads to scoring will be reflected in the Offense rating.
In general, the offensive rating can be interpreted as the number of points a team would
be expected to score against an average defense.
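Under this interpretation, a minimal sketch (the specific numbers here are assumptions for illustration, not published ratings):

```python
# Hypothetical Off/Def example: since an average defense is rated zero
# and a positive defense lowers the opponent's expected score, team A's
# expected score against team B is A's offense minus B's defense.
off_A = 28.0   # A expects 28 points against an average defense
def_B = 4.0    # B's defense takes 4 points away from an average opponent

expected_A_score = off_A - def_B   # 24.0 expected points for A vs. B
```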
Home Advantage
Each team's home advantage is estimated based on the difference in performance
when at home and on the road. Ratings and schedule strength both depend on where
the games are played.
Schedule
The difficulty of each team's schedule is measured in the Sched column.
It depends on the quality of each opponent, adjusted for the home-field
advantage. Note that
the schedule strength only represents games played to date.
Standard Deviation
As was mentioned before, the Massey model will in some sense minimize
the unexplained error (noise). Upsets will occur and it is impossible (and also
counter-productive) to get an exact fit to the actual game outcomes. Hence,
I publish an estimated standard deviation. About 68% of observed game results
will fall within one standard deviation of the expected ("average") result.
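These quantities follow directly from the normal CDF. A small sketch (the expected margin and standard deviation below are made-up numbers):

```python
from math import erf, sqrt

def normal_cdf(x):
    """CDF of a standard normal random variable."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical game: the model expects team A to win by 7 points,
# with an estimated standard deviation of 12 points.
expected_margin, sd = 7.0, 12.0

# Chance the observed result lands within one sd of the expected result
within_one_sd = normal_cdf(1.0) - normal_cdf(-1.0)   # about 68%

# Chance of an upset: the actual margin comes out below zero
upset_prob = normal_cdf((0.0 - expected_margin) / sd)
```

So even a 7-point favorite loses a noticeable fraction of the time, which is why an exact fit to game outcomes is neither possible nor desirable.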
Conference Ratings
Below the team ratings, you will find a listing of the leagues, conferences, and
divisions. The win / loss records include only inter-conference games. For
conference games, the win / loss percentage will always be 50%, so it is
not beneficial to include them. A conference's rating is determined by averaging
the ratings of its members.
Preseason Ratings
Preseason ratings are typically derived as a weighted average of
previous years' final ratings. As the current season progresses, their
effect gets damped out completely. The only purpose preseason ratings
serve is to provide a reasonable starting point for the computer.
Mathematically, they guarantee a unique solution
to the equations early in the season when not enough data is available yet.
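A weighted average of this kind might look like the following; the weights and the damping scheme are assumptions for illustration, not Massey's actual formula:

```python
# Hypothetical preseason blend: weighted average of the last three
# seasons' final ratings, most recent first (weights are assumed).
past_finals = [8.2, 6.5, 4.1]
weights     = [0.6, 0.3, 0.1]
preseason = sum(r * w for r, w in zip(past_finals, weights))

def prior_weight(games_played, k=10.0):
    """Assumed damping: the preseason prior's influence shrinks toward
    zero as real games accumulate (k is a hypothetical constant)."""
    return k / (k + games_played)
```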
Ratings Overview
In essence, each game "connects" two teams via an equation. As more games are played,
eventually each team is connected to every other team through some chain of games.
When this happens, the system of equations is coupled and a computer is necessary to
solve them simultaneously.
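The coupled, equilibrium nature of the solution can be illustrated with a toy iteration. This is not Massey's production algorithm, just a sketch of the idea: each rating is repeatedly pulled toward the average of (opponent's rating + point margin) over its games, so every rating ends up depending on the whole chain of opponents' opponents.

```python
# (winner, loser, winner points, loser points) for a tiny schedule
games = [("A", "B", 30, 14), ("B", "C", 27, 24), ("C", "D", 45, 21)]
teams = sorted({t for g in games for t in g[:2]})
ratings = {t: 0.0 for t in teams}

for _ in range(200):
    new = {}
    for t in teams:
        vals = [(ratings[b] + (pa - pb)) if t == a else (ratings[a] + (pb - pa))
                for a, b, pa, pb in games if t in (a, b)]
        # damped update keeps the iteration from oscillating
        new[t] = 0.5 * ratings[t] + 0.5 * sum(vals) / len(vals)
    mean = sum(new.values()) / len(new)        # re-center: league average 0
    ratings = {t: v - mean for t, v in new.items()}
```

At convergence each rating is exactly balanced against its schedule, which is the equilibrium described above. Note that team D's rating is influenced by the A-B game even though D played in neither.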
Game Outcome Function (GOF)
Given the score of a game, GOF(pA,pB) assigns a number between 0 and 1 that
estimates the probability that team A would win a rematch under the same
conditions. Based on previous experience, it seems reasonable to distinguish
between a 10-0 win and a 50-40 win. A close high-scoring game is likely to have
more variance and is less likely to be dominated by either team, while a low-scoring
game may indicate a defensive struggle or poor game conditions, in which case a
small deficit is more difficult to overcome. Sample GOF values
are listed below:
A's points (pA) | B's points (pB) | GOF(pA,pB) |
30 | 29 | 0.5270 |
10 | 9 | 0.5359 |
27 | 24 | 0.5836 |
27 | 20 | 0.6924 |
50 | 40 | 0.7292 |
10 | 0 | 0.8548 |
30 | 14 | 0.8786 |
45 | 21 | 0.9433 |
45 | 14 | 0.9823 |
30 | 0 | 0.9920 |
56 | 3 | 0.9998 |
Each game score is plugged into a GOF that outputs the estimated probability that team A would win if the game were played again under the same conditions. This is independent of any other information since it involves only that one game in isolation. For example, it may be determined that the winner in a 30-14 game has an 88% chance of winning a rematch, while a 27-24 winner only has a 58% chance of winning again.
Notice that a diminishing returns principle is manifested in
this GOF. There is some advantage to winning "comfortably," but
limited benefit to running up the score.
A team will not be penalized just for playing a weak opponent (although it becomes much
harder to improve its rating by blowing someone out).
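The exact form of the GOF is not published. Purely as an illustration, a hypothetical function with the same qualitative character (margin matters, total score matters, and running up the score has diminishing returns) could be:

```python
from math import erf, sqrt

def gof(pa, pb, c=1.9):
    """Hypothetical game outcome function -- NOT the actual Massey GOF.
    The margin is normalized by the square root of the total score, so a
    10-0 win outranks a 50-40 win, and blowout margins saturate toward 1
    (diminishing returns). The constant c is an assumed tuning value."""
    z = (pa - pb) / (c * sqrt(pa + pb + 1.0))   # +1 guards against 0-0
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

With c = 1.9 this sketch lands near several of the sample values above (e.g. gof(45, 14) ≈ 0.98 and gof(27, 24) ≈ 0.59), though it is looser for low-scoring shutouts like 10-0.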
Let p = Prob(A beats B) = F(rA,rB,hA,hB), where rA,hA and rB,hB are ratings and home advantages
of teams A and B respectively. F is a function of rA,rB,hA,hB that is based on the
CDF of a normal random variable.
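A minimal sketch of such an F, with A assumed to be the home team; the standard deviation and the additive way the home edge enters are assumptions, since the document only says F is based on a normal CDF:

```python
from math import erf, sqrt

def win_prob(rA, rB, h_home, sd=12.0):
    """Sketch of F: Prob(A beats B) with A at home, from the CDF of a
    normal random variable applied to the rating difference. The sd
    value and the additive home-advantage term are assumptions."""
    diff = (rA + h_home) - rB
    return 0.5 * (1.0 + erf(diff / (sd * sqrt(2.0))))
```

For equal teams on a neutral floor this returns exactly 0.5, and the probability rises smoothly with the rating gap.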
All the game scores are translated to a scale from 0 to 1 by the GOF.
Let g = GOF(pA,pB), where pA and pB are the points actually
scored by teams A and B in a particular game.
A nonlinear function of the teams' ratings is formed by multiplying terms that look like:
p^g * (1-p)^(1-g).
Here ^ denotes an exponent. Also note that 0 <= p,g <= 1.
By maximizing the resulting function, maximum likelihood estimates (MLE) are obtained for
the ratings and home advantages. The optimization problem may be solved with standard
techniques such as Newton's method.
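A toy version of this fit, heavily simplified: three teams, g values assumed already computed by a GOF, and plain numeric gradient ascent standing in for Newton's method:

```python
from math import erf, log, sqrt

def phi(x):                       # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# (team A, team B, g) where g is the GOF value for that game (assumed)
games = [("A", "B", 0.88), ("B", "C", 0.58), ("A", "C", 0.95)]
teams = ["A", "B", "C"]

def log_lik(r):
    """Log of the product of p^g * (1-p)^(1-g) over all games."""
    ll = 0.0
    for a, b, g in games:
        p = min(max(phi(r[a] - r[b]), 1e-9), 1.0 - 1e-9)
        ll += g * log(p) + (1.0 - g) * log(1.0 - p)
    return ll

r = {t: 0.0 for t in teams}
for _ in range(500):
    for t in teams:               # numeric partial derivative in r[t]
        bumped = dict(r); bumped[t] += 1e-5
        r[t] += 0.1 * (log_lik(bumped) - log_lik(r)) / 1e-5
    mean = sum(r.values()) / len(r)
    r = {t: v - mean for t, v in r.items()}   # pin league average at 0
```

The recovered ordering (A above B above C) falls out of the g values alone; a production solver would use Newton's method on the analytic gradient rather than finite differences.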
Preseason ratings may be implemented via prior distribution factors in the
optimization function. Their importance diminishes as the season progresses,
and they are negligible by the end of the year. A strong prior distribution must be used to
compensate for the lack of sufficient single-season data for the home advantages.
Time weighting is a debatable practice; however, I believe that more recent
games are generally better indications of a team's true strength.
An exponential decay based time weighting is applied by premultiplying
g by some weight w.
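For example, a weight of this form (the half-life here is a hypothetical tuning choice, not a published Massey constant):

```python
def time_weight(days_ago, half_life=60.0):
    """Exponential-decay weight w applied to a game's GOF value g.
    The 60-day half-life is an assumed parameter for illustration."""
    return 0.5 ** (days_ago / half_life)

# a 60-day-old game counts half as much as one played today
weighted_g = time_weight(60) * 0.88
```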
The MLE ratings are used to create
a prior distribution, which encodes the estimate of a team's
strength based on looking at its game scores alone.
The Bayesian correction is computed as an expected value using
the actual wins and losses (and who they were
against), combined with the prior distribution. This helps account
for the possibility of correlated performances (a team playing up or
down to its opponent).
The advantage of the Bayesian approach is that it rewards teams that
win consistently, no matter how they do it. The more games a team wins, the
more confident the computer can be that scores are not so important.
Ratings are less likely to be negatively impacted by beating a poor team.
Furthermore, games involving well-matched opponents will naturally be given
priority in determining the overall ratings.
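One way to sketch such a correction (everything here is an assumption-laden toy, not the production method): put a normal prior on a team's strength centered at its MLE rating, weight each candidate strength by the likelihood of the actual win-loss results, and take the posterior mean numerically.

```python
from math import erf, exp, sqrt

def phi(x):   # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def corrected_rating(mle_rating, prior_sd, results, game_sd=12.0):
    """Posterior-mean rating on a grid. `results` is a list of
    (opponent_rating, won) pairs; all constants are assumed values."""
    grid = [mle_rating + prior_sd * (i / 100.0) for i in range(-400, 401)]
    post_mean = norm = 0.0
    for r in grid:
        w = exp(-0.5 * ((r - mle_rating) / prior_sd) ** 2)   # prior density
        for opp, won in results:
            p = phi((r - opp) / game_sd)                     # P(win) at r
            w *= p if won else (1.0 - p)
        post_mean += r * w
        norm += w
    return post_mean / norm
```

In this sketch, a team whose scores alone suggest an average rating gets nudged upward by an actual win over a strong opponent, which is the sense in which consistent winners are rewarded regardless of how they win.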
Calculate Ratings
Each team's gametime performance
is assumed to be normally distributed about a certain mean (its rating).
The probability that team A would defeat team B is then determined from
the cumulative distribution function (CDF) associated with a normal random
variable.
Bayesian Correction
The results obtained by the MLE will be predictive in nature since they are
based entirely on the scores of games and contain no provision for teams
that win, but don't always win big. Other teams will tend to perform in
a way that is highly correlated with the strength of their opponent.
Differences in style, coaching philosophy, and performance in close games can
easily be overlooked if we look at scores alone.
Postprocessing
After the ratings and home advantages have been determined, the following additional steps
are taken:
Output
My code implements a markup language that allows me to generate multiple web pages
automatically.