Sunday, February 16, 2014

Winter Olympics Medal Predictions Gone Wrong

I recently came across an article about an analytics company, MicroStrategy, that used its dashboard to predict the number of medals each country would win at the 2014 Winter Olympics. They predicted that Canada would win the most medals (35), followed by Germany (31), with the US in 7th place (16 medals). They explain the results in their blog post. I think the company was just trying to do a simple, light-hearted analysis to show off the kinds of analyses its product can do. Unfortunately for the company, there are several flaws in their model and analysis. Two minor flaws have to do with the data itself:

  • In their post, they state: "Keep in mind that the algorithm is based on historical data, and doesn’t necessarily reflect more current information such as emerging stars, recent funding boosts, and an unexpectedly large addition of new events to the program." In other words, the model leaves out recent information that could matter a great deal for this year's results.
  • On a related note, they fail to explain how they handled the dissolution of old countries into new ones. For example, are the medals won by the USSR at past Olympics now attributed to Russia? What about newer countries, such as Ukraine, that used to be part of the USSR?

The biggest problem is over-fitting of the model. Over-fitting occurs when a complex model with many variables ends up describing random noise rather than the true signal. Because such a model fits the noise in the historical data and not the signal, it often produces inaccurate predictions (in this case, inaccurate medal predictions). If you want to make predictions from a model, one simple way to check for over-fitting is to leave out a subset of the data, fit the model to the rest, and then use that model to predict the data you left out. Since you know the "truth" for the held-out data, you can assess the prediction accuracy. For example, MicroStrategy could have fit the model using only data from before 2010 and then predicted medal counts for 2010. Comparing those predictions to the true 2010 medal counts would show how accurate the model is.
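The hold-out check described above can be sketched in a few lines of Python. This is only an illustration: the medal counts are made-up placeholders for three hypothetical countries (A, B, C), and the "model" is a deliberately simple per-country average, not MicroStrategy's algorithm, which their post does not spell out. The function names are my own.

```python
from statistics import mean

# HYPOTHETICAL placeholder counts, NOT real Olympic results.
# Keys are Games years; values map country -> total medals.
history = {
    1998: {"A": 10, "B": 20, "C": 5},
    2002: {"A": 12, "B": 18, "C": 7},
    2006: {"A": 14, "B": 19, "C": 6},
    2010: {"A": 13, "B": 21, "C": 9},
}

def fit_mean_model(training):
    """Stand-in model: predict each country's average past medal count."""
    countries = sorted({c for counts in training.values() for c in counts})
    return {
        c: mean(counts[c] for counts in training.values() if c in counts)
        for c in countries
    }

def holdout_errors(history, holdout_year, fit=fit_mean_model):
    """Fit on Games before holdout_year, then return predicted minus actual."""
    training = {year: counts for year, counts in history.items()
                if year < holdout_year}
    predictions = fit(training)
    actual = history[holdout_year]
    return {c: predictions[c] - actual[c] for c in actual if c in predictions}

# Pretend 2010 has not happened yet, predict it, then compare to the "truth".
errors = holdout_errors(history, 2010)
mean_abs_error = mean(abs(e) for e in errors.values())

for country, err in errors.items():
    print(f"{country}: predicted minus actual = {err}")
print(f"mean absolute error = {mean_abs_error}")
```

Any other model can be passed in through the `fit` argument. If a more complex model fits the training years closely but shows larger errors on the held-out year, that is a sign of over-fitting.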

The 2014 Winter Olympics are only halfway over, but let's compare their predictions to the current medal counts for selected countries:


Country        Predicted Medal Count    Actual Medal Count (as of 2/16)
Canada         35                       14
Germany        31                       12
Russia         18                       16
USA            16                       16
Netherlands    7                        17

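While it is still possible for Canada and Germany to reach their predicted medal counts, each would need to more than double its current total to do so. Meanwhile, Russia is within two medals of its predicted count, the USA has already matched its count, and the Netherlands has more than doubled its count, so all three look likely to greatly surpass their predictions.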
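To make the comparison concrete, here is a small Python sketch that works out, for each country above, how many more medals it needs to reach its prediction. It also gives a naive final total based on my own crude assumption that the second half of the Games yields as many medals as the first half. That is a rough pace estimate, not a forecast.

```python
# (predicted, actual as of 2/16) taken from the comparison above.
medals = {
    "Canada": (35, 14),
    "Germany": (31, 12),
    "Russia": (18, 16),
    "USA": (16, 16),
    "Netherlands": (7, 17),
}

def summarize(predicted, actual):
    """Return (medals still needed to reach the prediction, naive final total)."""
    still_needed = max(predicted - actual, 0)
    # ASSUMPTION of this sketch: the second half of the Games yields as many
    # medals as the first half. Crude pace estimate only.
    naive_final = actual * 2
    return still_needed, naive_final

summary = {country: summarize(p, a) for country, (p, a) in medals.items()}

for country, (still_needed, naive_final) in summary.items():
    predicted = medals[country][0]
    print(f"{country:<12} predicted={predicted:>2}  "
          f"still needed={still_needed:>2}  naive final={naive_final:>2}")
```

Under that rough assumption, Canada and Germany would finish short of their predictions, while Russia, the USA, and the Netherlands would finish well above theirs.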

Again, I realize that the company was doing this for fun and to generate some press. I just hope that customers don't look at this analysis, see how far off the predictions are, and decide not to buy the product. This goes to show how a little statistical knowledge can go a long way!