Tuesday, April 15, 2014

St. Louis Rams $100K Giveaway

It was announced today that the St. Louis Rams are offering $100,000 to anybody who correctly predicts their 2014 schedule. The 16 opponents are known, so you must guess which opponent the Rams play each week, along with their bye week. To make things more difficult, you must also predict whether each game is played on Sunday (as most are), Monday (typically one game per week), or Thursday (typically one game per week).

I did a quick back-of-the-envelope calculation to determine the approximate odds of winning such a contest and came up with 1 in 45,193,226,156,719,200, which is roughly "45 followed by 15 zeros". Here's how I got this number:

First, let's choose the bye week. Last year, no teams had a bye in weeks 1-3 or 13-17, which leaves 9 choices for the bye week (weeks 4-12).

Next, we can place the 16 opponents among the remaining 16 weeks. This can be done in 16! ways (reminder: 16! = 16 * 15 * 14 * ... * 2 * 1). That's a lot of choices. However, the Rams play each of the 3 teams in their division twice, and we can assume that they will not play the same team in consecutive weeks. Accounting for the bye week, this removes about 15 * 14 * 3 = 630 possibilities (although there is some double counting that I'm ignoring).

Finally, we need to select the day of each game. On average, each team plays one Monday night game and one Thursday night game per season. For simplicity, we will assume that the Rams play exactly one Thursday night game and one Monday night game. Ignoring the bye week, this leaves 16 * 15 = 240 possibilities: 16 choices for the Thursday game, then 15 remaining choices for the Monday game. (In reality, matchups that are thought to be "better" tend to land on Monday night, while less interesting games are played on Thursday, so the true odds are probably slightly different.)

To get the final number of possibilities, we multiply these together: 9 * (16! - 630) * 240 = 45,193,226,156,719,200. Although the chances are slim, I still took 5 minutes to fill out the schedule. So here's to hoping I'm $100,000 richer when I write my next post!
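The whole calculation is easy to check with a few lines of Python. This is just a direct transcription of the arithmetic above (including the rough 630 adjustment), not an exact combinatorial count:

```python
from math import factorial

# Back-of-the-envelope count of possible Rams schedules.
bye_choices = 9                   # bye allowed only in weeks 4-12
orderings = factorial(16) - 630   # 16! opponent orderings, minus the rough
                                  # adjustment for division-rematch weeks
day_choices = 16 * 15             # pick the Thursday game, then the Monday game

total = bye_choices * orderings * day_choices
print(total)  # 45193226156719200
```

Running this reproduces the 1-in-45,193,226,156,719,200 figure quoted above.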

P.S. - I'm sure I made some silly mistake, so please comment if you notice one and I'll get it fixed.

Sunday, February 16, 2014

Winter Olympics Medal Predictions Gone Wrong

I recently came across an article about an analytics company, MicroStrategy, that used their dashboard to predict the number of medals each country would win at the 2014 Winter Olympics. They predicted that Canada would win the most medals (35), followed by Germany (31), with the US in 7th place (16 medals). They explain the results in their blog post. I think the company was just trying to do a simple, light-hearted analysis to show off the types of analyses that their product can do. Unfortunately for the company, there are several flaws in their model and analysis. Two minor flaws have to do with the data itself:

  • In their post, they state "Keep in mind that the algorithm is based on historical data, and doesn’t necessarily reflect more current information such as emerging stars, recent funding boosts, and an unexpectedly large addition of new events to the program."
  • On a related note, they fail to explain how they handled the dissolution of old countries into new ones. For example, are the medals won by the USSR in past Olympics now attributed to Russia? What about newer countries, such as Ukraine, that used to be part of the USSR?

The biggest problem is over-fitting of the model. Over-fitting occurs when a complex model with many variables ends up describing random noise rather than the true signal. Because the model describes the noise in the historical data rather than the signal, it often produces inaccurate predictions (in this case, inaccurate medal predictions). One simple way to check whether a model is over-fit is to leave out a subset of the data, fit the model on the rest, and then use it to predict the data you left out. Since you know the "truth" for these held-out data, you can assess the prediction accuracy. For example, MicroStrategy could have fit the model using all data prior to 2010, predicted the 2010 medal counts, and then measured accuracy by comparing to the true 2010 results.
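Here is a minimal sketch of that holdout idea, using made-up medal counts purely for illustration (these are not any country's real numbers). It fits a simple linear trend to the earlier games and checks the fit against the held-out most recent games:

```python
from statistics import mean

# Hypothetical historical data for one country (illustrative only).
years  = [1998, 2002, 2006, 2010]   # past Winter Olympics
medals = [13, 17, 19, 26]           # made-up medal counts

# Hold out the most recent observation.
train_x, train_y = years[:-1], medals[:-1]
held_x, held_y = years[-1], medals[-1]

# Ordinary least squares for a line y = a + b*x (closed form).
mx, my = mean(train_x), mean(train_y)
b = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
     / sum((x - mx) ** 2 for x in train_x))
a = my - b * mx

prediction = a + b * held_x
print(f"predicted {prediction:.1f}, actual {held_y}")
```

If the fitted trend badly misses the held-out year, that's a warning that the model (or the assumption of a simple trend) does not generalize, which is exactly the check that seems to be missing from the MicroStrategy analysis.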

The 2014 Winter Olympics are only halfway over, but let's compare their predictions to the current medal counts for selected countries:

[Table: predicted medal count vs. actual medal count (as of 2/16) for selected countries; table data not preserved]

While it is possible that Canada and Germany could still reach their predicted medal counts, it looks like Russia, the USA, and the Netherlands will all greatly surpass theirs.

Again, I realize that the company was doing this for fun and to generate some press. I just hope that customers don't look at this analysis, see how bad these predictions are, and ultimately decide not to buy the product. It goes to show that a little statistical knowledge can go a long way!