Frequently Asked Questions


Give me a layman's description of your rating system.

First, to avoid confusion, be aware that I publish two different sets of rankings:

  • the "Massey Ratings", which utilize actual game scores and margins in a diminishing returns fashion
  • the BCS compliant version, which does not use the actual score
I will summarize the latter, since that is the one relevant to college football fans.

Massey's BCS ratings are the equilibrium point for a probability model applied to the binary (win or loss) outcome of each game. All teams begin the season rated the same. After each week, the entire season is re-analyzed so that the new ratings best explain the observed results. Actual game scores are not used, but homefield advantage is factored in, and there is a slight de-weighting of early season games. Schedule strength is implicit in the model, and plays a large role in determining a team's rating. Results of games between well-matched opponents naturally carry more weight in the teams' ratings. The final rating is essentially a function of the team's wins and losses relative to the schedule faced.
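To make the idea of a win/loss probability model concrete, here is a minimal Python sketch. It is not the actual Massey BCS model, whose details are not given on this page. As an assumption it uses a simple logistic (Bradley-Terry style) win probability with a fixed home-field term, and it omits the de-weighting of early season games. The names `fit_win_loss`, `home_adv`, `prior`, `lr`, and `steps` are hypothetical; the `prior` penalty is an added assumption that keeps undefeated teams from drifting to infinity, and the step size `lr` may need tuning for long schedules.

```python
import math

def fit_win_loss(games, home_adv=0.0, prior=1.0, lr=0.1, steps=2000):
    """Fit ratings from binary outcomes only (illustrative sketch).

    games: list of (home_team, away_team, home_won) tuples.
    Returns a dict mapping team -> rating.
    """
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    r = {t: 0.0 for t in teams}  # all teams begin rated the same
    for _ in range(steps):
        # Gradient of the penalized log-likelihood; the -prior * r term
        # is an assumed ridge penalty, not something stated in the FAQ.
        grad = {t: -prior * r[t] for t in teams}
        for h, a, home_won in games:
            # Assumed logistic win probability for the home team.
            p = 1.0 / (1.0 + math.exp(-(r[h] - r[a] + home_adv)))
            err = (1.0 if home_won else 0.0) - p
            grad[h] += err
            grad[a] -= err
        for t in teams:
            r[t] += lr * grad[t]  # plain gradient ascent step
    return r

# Tiny made-up season: A beat B and C, B beat C.
games = [("A", "B", True), ("A", "C", True), ("B", "C", True)]
ratings = fit_win_loss(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))
```

Because every game enters the same likelihood, a win over a strong opponent moves a rating more than a win over a weak one, which is how schedule strength becomes implicit in a model of this kind.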

How are your BCS ratings different than the others?

This question is ridiculously common. All of the BCS computer rankings are based on wins and losses relative to schedule faced. Most of the differences can be attributed to the particular mathematical model used to generate the ratings. There is no tidy term in a one line formula I can point to as the difference between mine and the others. Here are some small data-related differences:

  • does the rating utilize homefield?
  • does the rating start every team at zero, or with a preseason value?
  • does the rating weight more recent games more?
  • does the rating include all teams, or just FBS?
Overall, the BCS computer rankings probably correlate more than a random selection of six human poll ballots would.

Give a quick bio / resume

Somebody has given me a Wikipedia page. I can't vouch that it is 100% accurate, but it's a good place to start.

Kenneth Massey is a professor of mathematics at Carson-Newman College in Jefferson City, TN. He received his B.S. from Bluefield College and a master's degree (ABD) in mathematics from Virginia Tech. His research involved Krylov subspaces in the field of numerical linear algebra.

Kenneth is a partner with Limfinity consulting, and produces the Massey Ratings, which provide objective team evaluation for professional, college, and high school sports. His college football ratings have been a component of the Bowl Championship Series since 1999. Massey has also worked with highschoolsports.net since 2008.

Where can I get archived ratings?

Unfortunately I do not have a rating archive system in place. Until that can be implemented, you can use the "Last Week" links from the football and basketball ranking comparison pages.

How do you evaluate a computer rating to know which one is the best?

By nature, a rating system tries to explain past results according to some model of the probabilistic and dynamic aspects of competitive sports. Therefore a rating system may by definition be the "best" according to the objective function it tries to optimize, e.g. to minimize the MSE between actual and model scores (in hindsight).
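As an illustration of such an objective function, the following Python sketch computes the mean squared error between actual margins and the margins a set of ratings implies in hindsight. It assumes the simplest possible model, in which the expected home margin is the rating difference plus a home-field constant; the function name `mse` and the parameter `home_adv` are hypothetical.

```python
def mse(ratings, games, home_adv=0.0):
    """Mean squared error between actual and model margins.

    ratings: dict mapping team -> rating.
    games: list of (home_team, away_team, home_margin) tuples.
    Assumes model margin = rating difference + home_adv.
    """
    errors = [
        (margin - (ratings[home] - ratings[away] + home_adv)) ** 2
        for home, away, margin in games
    ]
    return sum(errors) / len(errors)

# Made-up example: A is rated 5 points better than B.
print(mse({"A": 5.0, "B": 0.0}, [("A", "B", 7), ("B", "A", -3)]))
```

A system that minimizes this quantity is "best" only with respect to this particular objective; a different objective would crown a different system.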

Depending on the objective, data used, and weightings, some systems may be designed to "predict" future outcomes, instead of merely "retro-dicting" past outcomes. This is a valid approach if you accept the maxim that past results are an indicator of future performance.

On the college football and basketball comparison pages, I compute two metrics: correlation to consensus and ranking violation percentage. These are not meant to assess the quality of a rating system, but only to provide a crude reference point for comparing and contrasting different systems.
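The exact definitions used on the comparison pages are not given here. As an assumption, the Python sketch below treats the ranking violation percentage as the share of games in which the winner is ranked below the loser, and measures correlation to consensus with the Spearman rank correlation (valid when there are no ties). The function names are hypothetical.

```python
def violation_pct(rank, results):
    """Percentage of games won by the lower-ranked team.

    rank: dict mapping team -> ordinal rank (1 = best).
    results: list of (winner, loser) tuples.
    """
    violations = sum(1 for winner, loser in results if rank[winner] > rank[loser])
    return 100.0 * violations / len(results)

def spearman(rank_a, rank_b):
    """Spearman rank correlation of two rankings of the same teams (no ties)."""
    n = len(rank_a)
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Made-up data for illustration only.
rank = {"A": 1, "B": 2, "C": 3}
results = [("A", "B"), ("C", "A"), ("B", "C"), ("A", "C")]
print(violation_pct(rank, results))
print(spearman(rank, {"A": 3, "B": 2, "C": 1}))
```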

Todd Beck's prediction tracker documents predictions made a priori. It has proven difficult for any rating system to consistently be superior to the Vegas lines.

Explain the role that strength of schedule (SOS) plays in your ratings.

Results and SOS are the yin and yang of computer ratings. Simply put, a team's rating measures their performance relative to the schedule they faced. In a true computer rating, rating and SOS are inter-dependent, and are calculated in conjunction with each other. This relationship is implied by the solution to a large system of simultaneous equations, which represents an equilibrium of some mathematical model. Ratings are a function of SOS, and vice-versa.

Many fans are familiar with crude RPI type systems that may account for SOS by the records of opponents and opponents' opponents. A sophisticated computer rating system does not utilize such artificial and ad hoc factors. Instead, because SOS is an implicit part of the model, it accounts for opponent strength to an effectively infinite number of levels. It makes no sense to say that the SOS component is a certain percentage of the total rating. To gain a glimpse of how SOS can be implicit, consider this simple equation:

rating = performance + SOS

Here performance could be related to win-loss record, or some other objective measure of success. Each team has an equation of that form, which may be re-arranged to get:

SOS = rating - performance

Rating and schedule are functions of each other. All possible connections between teams are accounted for as the cream rises to the top and an equilibrium is reached. As a corollary, there is no need for explicit reference to conference affiliation. Conference strength is accounted for automatically as the model surveys the full topology of the team/game network.

Aggressive scheduling is not penalized, but instead raises the potential rating that a team may reach. However, scheduling alone doesn't earn a high rating - there must be success against it.

Faith without works is dead. -- James 2:20

For a single game, it is better to defeat a poor team than lose to a good team. However, that team's ranking may fall because another team had a more impressive win. Depending on the gap in SOS, an 11-1 team with a tough SOS may be rated higher than a 12-0 team that faced an easier schedule. Of course, there are symmetric forces at work toward the lower end of the rating spectrum.

Since the actual rating model involves non-linear equations, the notion of "average" SOS may be misleading. Games between equally matched teams are more influential to a team's overall rating. For example, the #1 team's risk/reward is greater for playing #2 and #80 than for playing #30 and #31.

Does conference affiliation affect the ratings?

I don't do any prior weighting of conferences, and therefore conference affiliation plays no direct role in the ratings. Schedule differences are implicit in the model. Conferences that perform well in inter-conference matchups will naturally be rated higher. Since these games provide a reference point for the entire conference, the rising tide lifts all boats. For this reason, non-conference games are in some sense more significant than conference games.

I want to develop my own rating system. Where do I start?

If you want to research existing rating systems, visit the theory page. In particular, I recommend David Wilson's directory.

To get the feel for how computer ratings work, you may want to try this iterative procedure:

  1. set each team's rating to zero
  2. calculate each team's SOS to be the average rating of their opponents
  3. calculate each team's rating to be their average net margin of victory plus their SOS
  4. go back to step 2 and repeat until the ratings converge
The data page provides data for many leagues.
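The four steps above can be sketched in Python as follows. This is a teaching example, not the production rating code; the function name `iterate_ratings`, the parameters `tol` and `max_iter`, and the sample scores are all made up for illustration. Note that the plain iteration is not guaranteed to settle for every schedule (for instance, two teams that have played only each other will oscillate), so the sketch caps the number of passes.

```python
def iterate_ratings(games, tol=1e-9, max_iter=10000):
    """Iterate rating = average net margin + SOS until ratings stop changing.

    games: list of (team_a, team_b, margin), margin = a's score minus b's score.
    Returns a dict mapping team -> rating.
    """
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    # For each team, keep a list of (opponent, net margin) pairs.
    sched = {t: [] for t in teams}
    for a, b, margin in games:
        sched[a].append((b, margin))
        sched[b].append((a, -margin))

    ratings = {t: 0.0 for t in teams}  # step 1
    for _ in range(max_iter):
        # Step 2: SOS is the average rating of a team's opponents.
        sos = {t: sum(ratings[o] for o, _ in sched[t]) / len(sched[t]) for t in teams}
        # Step 3: rating is average net margin plus SOS.
        new = {t: sum(m for _, m in sched[t]) / len(sched[t]) + sos[t] for t in teams}
        change = max(abs(new[t] - ratings[t]) for t in teams)
        ratings = new
        if change < tol:  # step 4: stop once the ratings have converged
            break
    return ratings

# Made-up round robin: A beat B by 7, B beat C by 3, A beat C by 10.
ratings = iterate_ratings([("A", "B", 7), ("B", "C", 3), ("A", "C", 10)])
for team, value in ratings.items():
    print(team, round(value, 2))
```

For this small example the ratings settle near A = 5.67, B = -1.33, C = -4.33, and the differences between them reproduce the game margins of 7, 3, and 10.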

How does the betting line get set?

Oddsmakers certainly use computer rating models as a tool when setting initial lines for games. However, they also incorporate data related to injuries, officials, matchups, motivation, travel, days off, weather, etc. Actual wagers then determine how the line moves, as the bookmaker adjusts it to balance the bets on each side and secure a profit regardless of the outcome.

What software do you use?

I use only open-source software, including the common LAMP stack (Linux, Apache, MySQL, PHP/Python). The main rating software is written in C++. Data is stored in a MySQL database, and rendered on the web with PHP and JavaScript. If you would like to write your own rating software, Octave is a good high-level mathematical language.

Why don't the pages display correctly?

The site is written and tested for the Firefox web browser. Limited resources prevent extensive research on compatibility issues with other browsers.

How do you deal with forfeits?

Forfeited results are not factored into the computer ratings. If an on-the-field outcome was later forfeited, the original score is used in the calculations, but the result is stricken from the win-loss records.

How do you deal with exhibition games?

Occasionally, especially in college basketball, one team counts a game in official records, but their opponent doesn't. This is a conundrum since we can't really judge the sincerity of strategy and effort in such contests. However, for record keeping practicality, I have decided that a game does not get marked exhibition unless both participating teams deem it so.

Are you satisfied with the BCS formula?

Over the years, the BCS has gotten criticized for fine-tuning its formula. Recent changes have simplified the system for the better and removed extraneous redundancies. The current setup is a good balance of the traditional human polls, which the fan base is most comfortable with, and the objective computer component. Over the years, the two methods have tended to converge as the computers have revealed to the human voters the dangers of regional bias and misunderstanding of schedule strength. There will always be controversy when the formula must split hairs between #2 and #3, but the system is stable and beginning to be accepted by the media and fans.

What would you change?

I think the biggest thing that is hurting college football is the lack of quality inter-conference games. Due to the premium placed by the media on win-loss records, most athletic directors are trying to assure themselves of 3-4 easy home wins each year in their out-of-conference schedule. Great matchups like Texas vs Ohio St in 2005 are few and far between. I would like to see something like an ACC-SEC challenge, whereby teams are matched up for 12 games over one weekend. This would get fans and media excited, and also provide a more solid basis for comparing teams from different conferences.

An increase in good matchups would require a shift in philosophy by the human polls. Currently they start with a preseason notion of who will be good, and adjust each week according to a predictable pattern. They are reluctant to go back and re-evaluate earlier results in light of new information, and thus prior biases are likely to compound as the season goes on. One result is that an 8-3 team that played a brutal schedule is often penalized, while an 11-0 team with a weak schedule is rewarded. Sometimes this strategy backfires (e.g. Auburn 2004), but in general padding one's record with wins means a higher poll ranking. Delaying the release of human polls until mid-season would help minimize bias and develop more respect for teams that play challenging early season non-conference schedules.

Do you favor a college football playoff?

The BCS system is the best college football has ever had to determine an undisputed champion. It is really a two-team playoff. So the correct question is whether I favor expanding the number of teams in the playoff. College football is unique among all sports in that every regular season game is vital. Also, the bowl system provides a great reward to players and fans, allowing many schools to finish the year on a high note. A playoff should not ruin this. I would favor a 4-team playoff (the so-called "plus one" system), or possibly 8 teams if it were done right, but no more than that.

What is the purpose of this web site?

I maintain this web site primarily as a hobby which creatively combines my interests in math, sports, and computer programming. I enjoy sharing my work with the visitors to my site. The Massey Ratings also serve an official purpose, most notably as a component of college football's BCS.

How long have you done ratings?

My college football ratings were initially posted in 1995 at RSFC. Since then, I have produced ratings for many pro, college, and high school sports. I continually work to improve the content and quality of my ratings and web site. The Massey Ratings have been part of the BCS since 1999.

What is the purpose of the computer ratings?

In any competitive league, there should be an objective and robust method to measure the performance of each team/individual. Win-loss records may be misleading if teams play disparate schedules, and polls suffer from human limitations and subjectivity.

After devising a mathematical model for the sport, an algorithm is implemented, and the resulting computer ratings objectively quantify the strength of each team based on the defining criteria.

How does your rating system work?

There is really no simple answer to this question (although I was once asked to provide one during a live interview on ESPN). Basically, the ratings are the solution to a large system of equations, which comes from a statistical model and actual game data. For more details, see the Massey Rating Description.

How much time does it take?

I have written fairly robust software to automate the calculations and web page generation. Daily updates require little intervention on my part. The bulk of my time is spent maintaining data files and writing computer code.

Can I get a copy of your software?

My software is a work in progress, and is not very user-friendly. I hope to one day have a version suitable for the public domain.

How big is your operation?

I am the sole proprietor of the Massey Ratings. Hats I wear include: researcher, developer, programmer, database manager, webmaster, and marketer. Of course, I don't work in a vacuum, and this effort would be impossible without internet resources and generous folks that have contributed over the years. See the credits.

Where do you get your data?

Most scores are collected electronically from a variety of public domain sources. Many individuals have graciously shared the results of their own data collection efforts. When convenient, basic consistency checks are run on multiple independent sources to verify the data's accuracy. I have written software that parses web pages and other sources, extracts the pertinent data, and merges it with my own database. Corrections and hard-to-find scores are entered manually.

How often do you update your pages?

Automated daily updates are scheduled for many of the mainstream sports. Each week (usually Monday), I do a full update of all the leagues.

Is the Massey computer system the best?

This depends on what goals you feel a rating system should meet. Should the rating system be predictive, or should it only measure and reward past performance (such as to determine who deserves the college football national championship)? What data is available? How is the model defined? Basically, any rating system can claim to be the best with respect to what it sets out to accomplish.

That said, I believe that the Massey Ratings satisfy all of the desirable properties of a rating system. The sophistication of the model and algorithm is beyond any other method I'm aware of. Every feature of my system is based on sound statistical assumptions regarding the nature of sports and games. There are rarely any skewed or highly abnormal results, and the Massey Ratings are highly correlated with the consensus.

My rating model has undergone several revisions. These changes are necessary to improve the quality of my ratings in light of new ideas, gained experience, and access to more historical data with which to refine the method.

What's the difference between "rating", "ranking", and "poll"?

A "ranking" is simply an ordinal number (such as 1st, 2nd, 3rd,...) that indicates a team's placement in a strictly non-quantitative sense. In contrast, a team's "rating" is generally a continuous scale measurement and must be interpreted on a scale by comparing it with other teams' ratings. For example, I can rank three teams as follows: (1) Team A, (2) Team B, (3) Team C. This tells me that according to my ranking criteria, A is better than B, and B is better than C. However, it does not tell me how much better. If ratings are assigned as (A = 9.7, B = 9.5, C = 1.2), then it is easy to see that in fact A and B are quite competitive while C is significantly inferior.
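The relationship can be shown in a few lines of Python, using the example ratings above. The helper name `ratings_to_ranking` is hypothetical. Sorting the ratings yields the ranking, but the gaps between teams are lost in the process.

```python
def ratings_to_ranking(ratings):
    """Convert team -> rating into team -> ordinal rank (1 = best)."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {team: place for place, team in enumerate(ordered, start=1)}

ratings = {"A": 9.7, "B": 9.5, "C": 1.2}
ranking = ratings_to_ranking(ratings)
print(ranking)  # {'A': 1, 'B': 2, 'C': 3}

# The ranking alone cannot distinguish these two very different gaps.
print(round(ratings["A"] - ratings["B"], 1))  # 0.2
print(round(ratings["B"] - ratings["C"], 1))  # 8.3
```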

A poll is fundamentally different from a rating. Polls typically result from the tabulation of votes. For example, each ballot in college football's AP poll is one writer's opinion of who should be #1, #2, #3, etc. So a poll is really a composite of many opinions or preferences. In contrast, a computer ranking is obtained from a single "measurement" of how good each team is based on the defining criteria.

Team A beat Team B, so why do your ratings still have B ahead of A?

This situation is usually called an "upset." It is generally impossible to order the teams so as to eliminate all inconsistencies in actual game outcomes. Teams are not evaluated on the basis of one game, in which there is potential for high deviation from typical performance levels. Instead, a team's rating is based on its "average" level of performance over the entire season.
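A small made-up example shows why some inconsistencies cannot be removed. If A beat B, B beat C, and C beat A, every possible ordering of the three teams contradicts at least one result. The Python sketch below checks this by brute force; the function name `count_violations` is hypothetical.

```python
from itertools import permutations

def count_violations(order, results):
    """Count games in which the winner is placed below the loser.

    order: sequence of teams from best to worst.
    results: list of (winner, loser) tuples.
    """
    position = {team: i for i, team in enumerate(order)}
    return sum(1 for winner, loser in results if position[winner] > position[loser])

# A circular set of results: no ordering is consistent with all three games.
results = [("A", "B"), ("B", "C"), ("C", "A")]
best = min(count_violations(order, results) for order in permutations("ABC"))
print(best)  # 1
```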

Your ratings stink! Why isn't my team ranked higher?

The implementation of a computer rating algorithm is completely objective. So if the computer gives your team a bad (or good) rating, it shouldn't be taken personally. You have the right to disagree with the computer, but more than likely this is evidence of your own subjectivity. I do not meddle with the algorithm to "fix" the ratings. The model defines certain criteria that determine a team's rating, and the results are published on this web site without any human intervention.

What about predictions?

For many sports, I post predictions of upcoming games and monitor their success. In most cases, I would trust a computer's prediction over a human's. However, while this is often the most popular and entertaining application of computer ratings, it is not my primary purpose.

Predictions are obtained by extrapolating the analysis of past performance to estimate future performance. Usually, the past is a reasonable indicator of what to expect in the future. However, sporting events are to a great extent random, so upsets will occur. Furthermore, computer ratings are ignorant of many important factors such as injury, weather, motivation, and other intangibles. With this in mind, it is not wise to hold unrealistic expectations of the predictions.

Do you encourage sports wagering using your numbers?

Absolutely not! Please read the disclaimer.

Why do you post three different rating systems?

While the algorithms that produce computer ratings are objective, the choice of the model itself is not. Multiple systems provide the opportunity to compare alternative interpretations of the same data. Although there is general agreement, computer ratings are also quite diverse. The Massey Ratings are my creation, while the Markov and Sauceda models were developed with help from friends of mine.

How did you get involved with the BCS?

I started working on college football ratings as an honors project in mathematics while at Bluefield College in 1995. Continuing this interest as a hobby, I developed a web page and helped pioneer the organization of college football rankings via my comparison. The BCS, which started in 1997, realized the need to expand its sample of computer ratings from three to seven. My web site became a central resource point as the BCS officials searched for quality, respected, and well-established computer ratings. I received a phone call from SEC commissioner Roy Kramer in the spring of 1999 to discuss the prospect of adding my ratings to the BCS formula. Mine were chosen because of their demonstrated accuracy and conformance to the consensus, and my personal expertise in the field.