Saturday, September 8, 2012

How accurate are football preseason polls?

I'm a few weeks late on this post, but I think it's still worth blogging about. Every year, the media (especially ESPN) makes a huge deal about the college football preseason polls. With no games yet played, these polls are little more than speculation. I wanted to look into how accurately these polls pick that season's national champion. In this post, I use only the AP poll's preseason and final rankings; the choice of poll matters for the years before the BCS, when different polls could crown different national champions.

This first plot shows the final ranking of preseason top 5 teams since 1990.  The bar furthest to the right shows the teams that were in the top 5 preseason poll but finished the year unranked.

A few interesting notes:
1. 14 of the past 22 national champions were ranked in the preseason top 5 (I think the last champion from outside the top 5 was Auburn, led by then-unknown Cam Newton).
2. More national champs were ranked #2 in the preseason than #1. This is good news for Alabama, which started this season ranked #2 behind USC (though it jumped to #1 after its first win).
3. Among the preseason #1 teams (blue bars), 3rd is the most common final ranking. Also, no preseason #1 has finished worse than #16 in the final poll.
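The first plot is essentially a count of final finishes for each preseason slot. Here is a minimal Python sketch of that tally, assuming a hypothetical list of (season, preseason rank, final rank) records for the preseason top 5 only, with None marking a team that finished unranked. The records are made-up placeholders, not the AP data behind the plot.

```python
from collections import Counter, defaultdict

# Hypothetical records for preseason top-5 teams only: (season, preseason_rank, final_rank).
# final_rank is None when the team finished unranked. Placeholder values, NOT real AP data.
records = [
    (1990, 1, 3), (1990, 2, 1), (1990, 5, None),
    (1991, 1, 16), (1991, 2, 3), (1991, 4, 1),
]

def finish_counts(records):
    """Tally final finishes for each preseason rank (one bar group per preseason rank)."""
    counts = defaultdict(Counter)
    for _season, pre, final in records:
        counts[pre]["unranked" if final is None else final] += 1
    return counts

def champions_from_top5(records):
    """Return (champions who started in the preseason top 5, number of seasons).

    Assumes `records` holds only preseason top-5 teams, so any final #1 here
    is an AP champion who started in the top 5.
    """
    seasons = {season for season, _pre, _final in records}
    champs = sum(1 for _season, _pre, final in records if final == 1)
    return champs, len(seasons)

print(dict(finish_counts(records)[1]))  # {3: 1, 16: 1}
print(champions_from_top5(records))     # (2, 2)
```

With the actual 1990-2011 AP data, the second function should reproduce the 14-of-22 count above.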

Next, I wanted to see whether teams ranked higher in the preseason poll tend to finish higher at the end of the season. A simple way to check this is to look at the median final ranking for each of the top 5 preseason spots.

Median Final Ranking since 1990
Preseason Rank    Median Final Rank
1                 3
2                 3
3                 6.5
4                 9.5
5                 8

So although the national champion is not always the preseason #1, teams ranked higher in the preseason generally finish higher in the final poll. The exception is that preseason #5 teams tend to finish ahead of preseason #4 teams. This result may reflect some bias: voters must somehow break ties between teams with the same final record, and those tie-breaks may be influenced by the preseason rankings.
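The medians in the table can be computed along these lines. This sketch rests on one loud assumption: the post does not say how unranked finishes enter the median, so the hypothetical constant UNRANKED = 26 (one spot below the AP Top 25) is only a placeholder, and the records are again made up.

```python
from collections import defaultdict
from statistics import median

UNRANKED = 26  # hypothetical placeholder: one spot below the AP Top 25

# Hypothetical (season, preseason_rank, final_rank) records; None = finished unranked.
records = [
    (1990, 1, 3), (1991, 1, 16), (1992, 1, 2), (1993, 1, 4),
    (1990, 4, None), (1991, 4, 9), (1992, 4, 10), (1993, 4, 12),
]

def median_final_rank(records):
    """Median final ranking for each preseason rank."""
    by_pre = defaultdict(list)
    for _season, pre, final in records:
        by_pre[pre].append(UNRANKED if final is None else final)
    return {pre: median(finals) for pre, finals in sorted(by_pre.items())}

print(median_final_rank(records))  # {1: 3.5, 4: 11.0}
```

With an even number of seasons (22 since 1990), the median averages the two middle finishes, which is how half-step values such as 6.5 and 9.5 can arise. Because the median only looks at the middle of the sorted list, the exact placeholder for unranked teams changes the result only when the middle values are unranked finishes.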

Finally, I wanted to see how the previous season's final rankings influence the next season's preseason poll. For example, does the team that finishes the year #1 tend to be the top-ranked preseason team the following year (even though roughly a quarter of its roster has likely graduated)?
This plot shows that the preseason top 5 teams tended to finish the previous season ranked highly. Since 1990, 9 of the 23 teams that finished the previous season #1 were ranked #1 in the next preseason poll. This is a somewhat questionable strategy, as only 2 teams have repeated as national champs since 1990: Nebraska in 1994-95 (which was not ranked #1 in the 1995 preseason poll) and USC in 2003-04.
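The repeat check is a simple matched count across seasons. Here is a minimal sketch, with hypothetical team labels standing in for the real polls, plus the post's own 9-of-23 figure expressed as a percentage.

```python
def repeat_count(prev_final_no1, next_preseason_no1):
    """Count seasons where last year's final #1 was also this year's preseason #1.

    Both lists are aligned by season: element i of each refers to the same transition.
    """
    if len(prev_final_no1) != len(next_preseason_no1):
        raise ValueError("lists must cover the same seasons")
    hits = sum(1 for prev, pre in zip(prev_final_no1, next_preseason_no1) if prev == pre)
    return hits, len(prev_final_no1)

# Hypothetical labels, not real poll results.
print(repeat_count(["Team A", "Team B", "Team C"], ["Team A", "Team D", "Team C"]))  # (2, 3)

# The post's figure: 9 of the 23 previous-season #1 teams opened the next season at #1.
print(f"{9 / 23:.0%}")  # 39%
```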

In summary, while the top preseason team usually does not win the national championship, it typically finishes the season ranked better than any other preseason team.

Thursday, August 16, 2012

Swimming and the Fast Suit Aftermath

At the end of 2009, FINA, the governing body of competitive swimming, banned high-tech full-body fast suits (try saying that 5 times!). See here for a summary. As evidence of the suits' influence, all but 2 world records (men's and women's combined) were broken in 2008 or 2009! The general consensus was that these records would be untouchable for a long time, and for the most part, that has held true. However, 8 world records were broken at the 2012 Olympics. I now want to answer the question, "Were the winning Olympic times significantly slower than the world records?"

First, let's look at a box plot of the difference between the winning Olympic time and the world record (a value less than zero means the world record was broken). Note: 2 men's world records were set in 2011, so I compare against these current records rather than the pre-2010 records.


While the differences are generally above zero (slower than the WR), the box plot whiskers do extend below zero. There is one clear outlier for the men: the 1500m free, where the WR was broken by 3 seconds. Most of the variation comes from the events covering different distances (50, 100, 200, 400, 800 and 1500m). To account for this, I normalized all time differences to 100m (multiply 50m differences by 2, divide 200m differences by 2, etc.). The following box plots show the normalized differences, which are much less variable and contain no clear outliers.
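The per-100m normalization and the box plot's outlier rule can be sketched as follows. Assumptions: the time differences are hypothetical placeholders (only the roughly 3-second 1500m margin comes from the post), and outliers use the common 1.5 × IQR whisker rule with the default quartile method of Python's `statistics.quantiles`; the plotting software behind the post may compute quartiles slightly differently.

```python
from statistics import quantiles

def per_100m(diff_seconds, distance_m):
    """Scale a time difference to a 100 m basis (x2 for 50 m, /2 for 200 m, /15 for 1500 m)."""
    return diff_seconds * 100 / distance_m

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical (distance_m, winning_time - world_record) pairs in seconds.
# Only the -3.0 s for the 1500 m mirrors the post; the rest are made up.
events = [(50, 0.2), (100, 0.4), (100, 0.1), (200, 0.6), (200, 0.9), (400, 1.0), (1500, -3.0)]

raw = [diff for _dist, diff in events]
normalized = [per_100m(diff, dist) for dist, diff in events]

print(tukey_outliers(raw))         # [-3.0]
print(tukey_outliers(normalized))  # []
```

On these toy numbers, the 1500m result is flagged as an outlier in raw seconds but falls inside the whiskers once every difference is put on a per-100m scale, which mirrors the pattern described above.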


To formally answer whether times were significantly slower without fast suits, I performed a t-test on the mean difference (equivalent to a paired t-test on the winning and world record times). Our null hypothesis is:

Ho: no difference between average world record time and winning Olympic time.

Leaving out the details, we obtain p-values of 0.18 for the men and 0.26 for the women. Since both p-values are large (above the typical 0.05 cutoff), we fail to reject the null hypothesis: there is no significant evidence that the winning times at the 2012 Olympics were slower than the world records. I also repeated the calculations after removing the 3 relays from the analysis and reached the same conclusion.
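For readers who want the left-out details, here is a minimal sketch of a t-test on the mean difference using only the Python standard library. Assumptions: the differences are hypothetical placeholders, not the 2012 results; the post does not say whether its test was one- or two-sided, so this returns a two-sided p-value (halve it for a one-sided test when the mean difference is positive); and the p-value comes from integrating the Student's t density numerically with Simpson's rule rather than from a statistics library.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # probability mass between 0 and |t|
    return max(0.0, 2 * (0.5 - area))

def mean_difference_t_test(diffs):
    """Test H0: mean difference = 0. Same as a paired t-test on the original pairs."""
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)

# Hypothetical normalized differences (seconds per 100 m), NOT the real 2012 data.
diffs = [0.12, 0.30, -0.05, 0.21, 0.08, 0.15, -0.10, 0.25]
t, df, p = mean_difference_t_test(diffs)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.3f}")
```

With SciPy installed, `scipy.stats.ttest_1samp(diffs, 0.0)` (or `scipy.stats.ttest_rel(winning_times, world_records)` on the paired times) should give the same t statistic and two-sided p-value; the hand-rolled version just makes the steps visible.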

We cannot tell from this analysis whether the fast suits had a smaller influence on times than originally thought, or whether swimmers are just training harder and getting stronger (I tend to believe the latter). It's also too early to tell if any of these records will come to be seen as unbreakable (example: Phelps's 2008 Olympic performance + fast suit = some really fast world records). But I think we can safely conclude that, unlike records from the steroid era in baseball, world records set in the fast-suit era will not require an asterisk.

Monday, August 13, 2012

How Usain Bolt can rival Michael Phelps

OK, so this post isn't exactly statistics related, but the Olympic coverage comparing Usain Bolt to Michael Phelps is ridiculous. First, let me say that Bolt is without a doubt the fastest man alive, and his performances are the highlight of Olympic track and field. But winning back-to-back gold medals in the 100 and 200 is nowhere near Phelps's 22 medals (18 gold) over 3 Olympics. Here are 3 ways, in my opinion, for Bolt to end his career on the same level as Phelps.

1. Compete in at least 4 Olympic Games. Phelps competed at Sydney as a 15-year-old, swimming the 200 fly. Counting Athens, Beijing and London, Phelps has swum in 4 Olympics. Bolt has raced in 3 (he went out in the first round of the 200 at Athens in 2004), but his medals have all come from the last 2.

2. Win both the 100 and 200 at Rio 2016. Phelps became the first male swimmer to win gold in the same event at 3 consecutive Olympics, and he did it twice (100 fly, 200 IM), while just missing a three-peat in the 200 fly.

3. Add more events. Every commentator who says that Bolt does not have as many opportunities to race as Phelps should be fired on the spot. Here are other reasonable events for him to race:

  • 400 m: This is just two 200s back to back.
  • 4 x 400 relay: This is the most reasonable race for him to add. Phelps swims the 4x100 free relay (turning in the 2nd fastest split at these Olympics), yet he has never swum the 100 free as an individual Olympic event. Bolt doesn't need to be the fastest 400 runner to win a medal; he just needs to be part of the fastest team.
  • 110 m hurdles: Yes, this involves hurdles, but he's tall enough to make it over the hurdles and would definitely be the fastest pure runner in the race.
  • Long jump: Jesse Owens won 4 gold medals at the 1936 Olympics: the 100, 200, 4x100 relay and long jump. Bolt has never had a single Olympics as successful as Owens's.
  • High jump and triple jump: see above.

I just listed 6 additional events for Bolt to possibly compete in. Yes, many of them would take him out of his comfort zone, but winning events outside one's specialty is what distinguishes legends from greats. If he added the 4x400 relay and another individual event and won gold in both, I would start to think of Bolt as competing on equal footing with Phelps.

And please don't argue that the schedule wouldn't work. Phelps (and Lochte, Franklin, etc.) won gold medals within an hour of swimming another final or semifinal. I've yet to see a top Olympic track athlete push themselves to race multiple finals or semifinals in the same day, so Bolt could be innovative in this respect too.

Finally, Ranomi Kromowidjojo, a female swimmer from the Netherlands, won gold in both the 50 and 100 free and silver in the 4x100 free relay at these Olympics. If she wins gold in all 3 events at the next Olympics and sets a few records in the process, will she be considered the greatest female swimmer ever? NO. But isn't her event schedule comparable to Bolt's (minus the relay)? YES!