No Sweat Statistics: Conspiracy


Tuesday, May 29, 2012

Djokovic and Federer: Semifinal Attraction

For the past 20 grand slam tournaments (5 years, including this French Open), Federer, Nadal and Djokovic have been ranked/seeded in the top 3. Nadal has been either the #1 or #2 seed in all of these tournaments, meaning that either Federer or Djokovic has been the #3 seed. A majority of the time, Murray has been the #4 seed, but there are some exceptions (Soderling, del Potro, Ferrer, Davydenko, Roddick).

Nadal Getting Lucky with Grand Slam Draws?

In tennis, the placing of seeded players in the draw includes some randomness. That is, half the time the #1 seed will draw the #3 seed in the semi-finals, and the other half of the time the #1 seed will draw the #4 seed. This is different from basketball and other tournaments, where the semis will always consist of #1 vs #4 and #2 vs #3 (barring upsets). This means that we would expect Federer and Djokovic to be in the same half of the draw (i.e., scheduled to meet in the semis) in 10 of these 20 Grand Slams. In actuality, Federer and Djokovic have been placed in the same half of the draw 15 times (actually playing 7 times in the semis, with Djokovic leading 4-3). This means that Nadal has been scheduled to play the #4 seed in 15 of the last 20 Grand Slams, giving him the easier path to the final. In fact, 5 of the 7 majors won by Nadal in this time span came when Federer and Djokovic were both on the other side of the draw (he defeated both players only at the 2008 French Open). Does this supply evidence that the draw is rigged in favor of Nadal, thus increasing his chances of winning?

This problem can be rephrased in terms of flipping a coin. If we flip a coin 20 times, how likely is it that we see 15 heads (i.e., Fed and Djokovic are on the same half of the draw)? Coin flips are independent, so we can easily calculate the probability of exactly 15 heads to be 0.0148, or about a 1.5% chance (the probability of 15 or more heads is 0.0207, or about 2.1%). This is very unlikely, but not completely unexpected.
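The coin-flip calculation can be reproduced in a few lines. Here is a minimal Python sketch, assuming a fair coin and independent flips; the function names `prob_exactly` and `prob_at_least` are just illustrative. It keeps the chance of exactly 15 heads separate from the chance of 15 or more.

```python
from math import comb

def prob_exactly(k, n=20, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n=20, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(prob_exactly(j, n, p) for j in range(k, n + 1))

p_exact = prob_exactly(15)    # exactly 15 same-half draws out of 20
p_tail = prob_at_least(15)    # 15 or more same-half draws out of 20

print(f"P(X = 15)  = {p_exact:.4f}")
print(f"P(X >= 15) = {p_tail:.4f}")
```

The tail probability is the one to compare against a simulation that counts "at least 15 heads".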

To better illustrate the point, I ran 10,000 simulations where I flip a coin 20 times and record the number of heads, shown in the following graph. These simulations show that we expect to see at least 15 heads about 2% of the time (proportion to the right of the red line), which agrees with the theoretical probability of at least 15 heads, about 2.1%.
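A simulation along these lines is easy to sketch. The version below is an assumption about the setup, not the code behind the graph: it uses Python's standard `random` module, a fixed seed chosen arbitrarily so the run is repeatable, and the illustrative name `simulate_head_counts`.

```python
import random

def simulate_head_counts(n_sims=10_000, n_flips=20, seed=2012):
    """Return the number of heads in each of n_sims runs of n_flips fair flips."""
    rng = random.Random(seed)  # arbitrary seed, only for repeatability
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips))
        for _ in range(n_sims)
    ]

counts = simulate_head_counts()

# Share of simulated runs with 15 or more heads
share_at_least_15 = sum(c >= 15 for c in counts) / len(counts)
print(f"Simulated P(at least 15 heads) = {share_at_least_15:.4f}")
```

With only 10,000 runs the estimate will wobble by a few tenths of a percentage point from seed to seed.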

Nice to See You Again...

Another interesting pattern is that Federer and Djokovic were placed in the same half of the draw for 7 straight Grand Slams (2008 Wimbledon - 2010 Australian Open). How likely is this occurrence? That is, how likely is it to see 7 heads in a row when flipping a coin 20 times? The theoretical calculation is more complicated, so I again ran 10,000 simulations of 20 flips and recorded the maximum number of heads to appear in a row.

The probability of seeing a run of 7 heads (i.e., 7 heads in a row) or more is 0.1112, or about 11%. This means that if you flip a coin 20 times, you will see a run of at least 7 heads 11% of the time, which is probably much more common than you might expect.
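For a run of heads specifically, the probability can also be counted exactly rather than simulated. The sketch below is one way to do it, assuming a fair coin: it counts the sequences of 20 flips that contain no run of 7 heads, then takes the complement. The helper name `count_no_run` is made up for illustration. For heads-only runs this count comes out near 5.8%, roughly half the simulated figure above; one possible explanation, and it is only a guess, is that the simulation counted runs of either heads or tails.

```python
def count_no_run(n_flips, run_len):
    """Count head/tail sequences of length n_flips with no run of run_len heads."""
    counts = []  # counts[m] = number of valid sequences of length m
    for m in range(n_flips + 1):
        if m < run_len:
            # Too short to contain the run, so every sequence is valid
            counts.append(2 ** m)
        else:
            # A valid sequence starts with j heads (j < run_len), then a tail,
            # then any valid sequence of length m - j - 1
            counts.append(sum(counts[m - j - 1] for j in range(run_len)))
    return counts[n_flips]

n_flips, run_len = 20, 7
p_run7 = 1 - count_no_run(n_flips, run_len) / 2 ** n_flips
print(f"P(run of at least {run_len} heads in {n_flips} flips) = {p_run7:.4f}")
```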

There was another stretch of 6 grand slams where Fed and Djokovic were again on the same side of the draw (2010 Wimbledon - 2011 US Open; combined with the previous run of 7, Federer and Djokovic were on the same half of the draw in 13 out of 14 grand slams!). What's the probability that the second longest run when flipping a coin 20 times is at least 6? It's about a 1% chance, which is much less common than seeing a run of 7.
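Here is a hedged sketch of how the longest and second-longest runs could be simulated. It assumes runs of heads only (same-half draws), a fair coin, and an arbitrary fixed seed; `head_run_lengths` and `simulate_runs` are illustrative names, not the original code. A simulation that defines runs differently will give different percentages.

```python
import random

def head_run_lengths(flips):
    """Lengths of the maximal runs of heads (True values) in a flip sequence."""
    runs, current = [], 0
    for is_head in flips:
        if is_head:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def simulate_runs(n_sims=10_000, n_flips=20, seed=2012):
    """Estimate P(longest run >= 7) and P(second-longest run >= 6)."""
    rng = random.Random(seed)  # arbitrary seed, only for repeatability
    longest_ge_7 = second_ge_6 = 0
    for _ in range(n_sims):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        # Pad with zeros so runs[0] and runs[1] always exist
        runs = sorted(head_run_lengths(flips), reverse=True) + [0, 0]
        longest_ge_7 += runs[0] >= 7
        second_ge_6 += runs[1] >= 6
    return longest_ge_7 / n_sims, second_ge_6 / n_sims

p_longest, p_second = simulate_runs()
print(f"P(longest run of heads >= 7)        = {p_longest:.4f}")
print(f"P(second-longest run of heads >= 6) = {p_second:.4f}")
```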

Will it Ever End?

While a run of 7 grand slams in a row is not very uncommon, having a second run of 6 or being placed on the same side of the draw in 15 out of 20 tournaments is not very likely. I do not actually believe that the draw is rigged, but a pattern like the one Fed and Djokovic have seen is only expected to occur about 1% of the time.

The Law of Large Numbers tells us that, eventually, we expect Djokovic and Federer to be on opposite sides of the draw 50% of the time. However, as neither player will be playing infinitely many more grand slams, this does not impact the probability at the next Grand Slam. That is, assuming the rankings stay the same, there is still a 50% chance that Federer and Djokovic will be placed on the same half of the draw at Wimbledon, the next Grand Slam tournament.

Wednesday, April 25, 2012

SFS Swimming Conspiracy? Part 3

This is the final part of my trilogy on referee bias against the St. Francis swim team at the state meet. You can read the first two parts here and here.

Is Greg celebrating another district win, or about to smash the trophy on someone's head after getting DQ'ed at states?

In the previous post, I proved that SFS being DQ'ed 5 out of 61 relays cannot be explained by random chance. So if this large number of DQ's cannot be explained by random chance, is there anything else besides referee bias that can explain this event? Here are some possibilities:

SFS swimmers are more prone to false starts. As mentioned previously, the 5 DQ's were all blamed on different swimmers. In fact, only 2 swimmers were on more than one DQ'ed relay (and they were only on 2 DQ'ed relays). Unless the murky water in the St. Francis pool is causing the swimmers' muscle fibers to twitch too quickly, I don't think the swimmers can explain the difference. But I will back this up with numbers in a minute.
While the swimmers have changed over the past 11 years, the coach, Keith Kennedy, has not. Is Keith to blame for these DQ's, or is he the muse of Lady (Bad) Luck? In my 4 years being coached by him, he never instructed us to false start. In fact, if anything, all of these DQ's have caused him to stress the importance of not pushing the starts. So I'm not buying this explanation either.

You may be thinking that I am too close to the situation to blame Keith or the swimmers. So let me finish with one last analysis that will put the "!" on this debate. In order to qualify for the state tournament, every relay must place high enough at the district tournament to advance. The OHSAA website also shows the Northwest Ohio District swimming results for 8 of the last 9 years (2006 is missing: apparently they are still waiting for one of the other districts to finish swimming before posting results). The coach and, for the most part, swimmers on the relay do not change from districts to states.

For the available data, 15 of the 529 non-SFS relay swims (2.8%) have been DQ'ed at the district meet. This is about twice the rate for the perennial contenders at the state meet (1.3%), which should be expected as the quality of the swimmers at districts is not as high as that of the swimmers and relays at states. Still, this shows that officials are DQ'ing more relays at the district meet than the state meet. SFS has not been DQ'ed at districts in the past 11 years, a total of 33 relays (even though I don't have the district results for all years, had a relay been DQ'ed at districts, it would not have qualified for the state meet, which I have data for). The probability that a team is not disqualified in 33 relays at the district meet, if all relays are swum independently, is

(1 - 0.028)³³ = 0.392

Let's do the same analysis, but using the DQ frequency of SFS at the state meet. That is, the probability that a team that is DQ'ed 8.2% of the time at states would go 33 straight relays without getting DQ'ed at districts is

(1 - 0.082)³³ = 0.059

In other words, it is not at all unusual for a given school to go 11 years without a DQ at districts (probability of about 40%). But schools that are DQ'ed as often as SFS at states would expect to be DQ'ed at least once at districts 94% of the time.
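The two "no DQ's in 33 relays" figures are simple powers, so they are easy to recheck. A minimal Python sketch, assuming independent relays and a constant DQ rate, with the rounded rates from the text plugged in; `prob_no_dq` is an illustrative name.

```python
def prob_no_dq(dq_rate, n_relays=33):
    """Chance of zero DQ's in n_relays independent relays at a fixed DQ rate."""
    return (1 - dq_rate) ** n_relays

p_clean_typical = prob_no_dq(0.028)    # district-wide non-SFS rate (15/529)
p_clean_sfs_like = prob_no_dq(0.082)   # SFS rate at the state meet (5/61)
p_at_least_one = 1 - p_clean_sfs_like  # at least one DQ in 33 relays

print(f"Typical district team, no DQ's: {p_clean_typical:.3f}")
print(f"SFS-like state rate, no DQ's:   {p_clean_sfs_like:.3f}")
print(f"SFS-like state rate, >= 1 DQ:   {p_at_least_one:.3f}")
```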

Now, one can argue that because the competition is weaker at districts than at states, the swimmers play the relay starts safe at districts but push them at states. But I would also expect this to hold for all of the perennial contenders, so this doesn't explain the difference in DQ's between SFS and other perennial contenders at the state meet. One could also argue that there is a referee bias for the perennial contenders at districts. Why? The officials may want the district to have a good showing at the state meet, so it would hurt the district to DQ one of the top relays even if there really was a false start. However, it is tough to quantify this bias in favor of SFS at the district meet.

This look at the district data should show that the large number of DQ's is most likely not fully explained by the coach or swimmers, as you would expect to see similar DQ patterns at the district meet. This three-part series conclusively shows that it is extremely unlikely that there is no referee bias against the SFS swim team at the state meet.

If you enjoyed this series, help out my ego and leave a comment or subscribe to follow my blog.

Monday, April 23, 2012

SFS Swimming: Conspiracy? Part 2

In the previous post, I introduced data showing a possible referee bias against the Toledo St. Francis de Sales (SFS) swim team. In this post, I will use the binomial distribution to show how unlikely it is that SFS would be DQ'ed in 5 of the 61 relay races (8.2%) at the state tournament.

First, let's see how likely these 5 DQ's for one team are compared to all boys' Division 1 relay swims. I previously showed that the frequency of DQ's for all non-SFS teams is 22/1259 (1.7%). Using the binomial distribution with n = sample size = 61 and p = probability of DQ = 22/1259, the probability of a team being DQ'ed 5 or more times is 0.0043. This means that if 1,000 teams each swam 61 relay races at the state tournament, we would only expect about 4 of the 1,000 to be DQ'ed at least 5 times. This seems very improbable, so we can conclude that being DQ'ed 5 times out of 61 cannot be explained by random chance.

In the last post, I also compared SFS to the other 4 "perennial contenders", teams that have finished in the top 3 of the team standings at least 3 times in the past 10 years. I previously argued that these teams are well coached and accustomed to regularly winning, so it makes sense that these teams should be DQ'ed less frequently than all state relay teams combined. I showed that this is true, as the frequency of a relay DQ for the non-SFS perennial contenders is 1.3%. Using the binomial distribution again, the probability that a perennial contender would be DQ'ed 5 or more times out of 61 swims is 0.0012. This means that if 1,000 perennial contender teams each swam 61 relay races at the state tournament, we would only expect about 1 of the 1,000 to be DQ'ed at least 5 times. This is even more improbable than the previous analysis, so we can conclude that a perennial contender being DQ'ed 5 times out of 61 cannot be explained by random chance.
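Both tail probabilities can be rechecked with the standard library alone. This is a minimal Python sketch, assuming (as the analysis does) that the 61 relay swims are independent with a constant DQ probability; `binom_tail` is an illustrative name, and the rates are the 22/1259 and 3/230 counts from Part 1.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n_swims = 61   # SFS relay swims at the state meet
min_dqs = 5    # SFS disqualifications

p_all_teams = binom_tail(min_dqs, n_swims, 22 / 1259)  # all non-SFS teams
p_contenders = binom_tail(min_dqs, n_swims, 3 / 230)   # non-SFS perennial contenders

print(f"P(>= 5 DQ's), all non-SFS rate:    {p_all_teams:.4f}")
print(f"P(>= 5 DQ's), contender-only rate: {p_contenders:.4f}")
```

Summing the upper tail (5, 6, ..., 61) is what "5 or more times" means here; the probability of exactly 5 alone would be slightly smaller.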

I have included the following plot to show the probability distribution of the number of DQ's for a perennial contender in 61 relay swims. This shows how highly unlikely it is that a perennial contender is DQ'ed even 4 times out of 61.

This analysis seems to confirm the suspicions that all SFS swimmers the past decade have had: that there is a referee bias against us. However, the binomial distribution assumes that all trials (relay races) are independent. This is surely not true, as many swimmers swim in multiple relays. Additionally, each team has a different coach, so the teams may have different strategies and may have been trained differently. I will discuss this last point in more detail in the next post.

The final part of this trilogy will try to identify other possible explanations for this large deviation from the expected number of DQ's for a perennial contender.

Sunday, April 22, 2012

SFS Swimming: Conspiracy? Part 1

Back in high school (2001-2004), I was a varsity swimmer at Toledo St. Francis de Sales (SFS). The swim team has quite a history: 4 state titles (1967, '68, '96, '98) and 46 of the past 47 district titles. In the past 3 years, the SFS swim team has placed 2nd (2010), 2nd (2011) and 3rd (2012). So what I'm trying to say is that our team is badass. And because we're badass, people obviously want to see us fail. You may ask, "How can there be a conspiracy theory against the SFS swim team?" This post and the follow-up posts will use statistics to prove a referee bias against SFS.

How can referee bias enter swimming? The easiest way is to disqualify a relay team by way of false start, where an official claims that one of the swimmers leaves the block before the previous swimmer touches the wall. The state swimming results for the past 11 years are available on the OHSAA website. There are 3 relays (200 yd Medley and 200 and 400 yd Free). 24 teams swim in prelims and the top 16 come back for finals, with the top 8 swimming in the championship heat and teams 9-16 competing in the consolation heat. This means that each team can swim 6 relays in a given year at the state meet. Here is a table of the number and frequency of disqualified relays at the D1 boys' state meet by school over the past 11 years:

**Number of DQ's from 2002-2012**
School	# Disqualifications	# Swims	Percent DQ'ed
SFS	5	61	8.2%
Solon	3*	39	7.7%
Centerville	2**	58	3.4%
New Albany	2	27	7.4%
15 other teams	1 each
Total	27	1320	2.0%
Total (minus SFS)	22	1259	1.7%

* = DQ'ed twice in 2006

** = DQ'ed twice in 2005

The frequency of DQ's for SFS (8.2%) isn't much more than that of Solon or New Albany, but it is much larger than the overall DQ rate (1.7%) for all non-SFS teams over the past 11 years. The number of SFS disqualifications looks very fishy (pun intended), especially considering that these 5 DQ's occurred in different years. Here are the 5 SFS relays that were DQ'ed:

**Breakdown of SFS DQ's**
Year	Event	Prelims/Finals
2002	200 Medley	Prelims
2003	200 Free	Finals
2006	200 Free	Finals
2009	200 Medley	Finals
2012	200 Free	Finals

There were 2 common swimmers on the 2002 and 2003 relays that were DQ'ed (neither of whom is me, I swear!), but if my memory serves me right, different swimmers were "blamed" for the DQ's (officials need to state which swimmer false started). So this result cannot be blamed on one bad relay swimmer on all 5 relays. I would also like to point out that SFS won 5 of the 6 relays in 2010-2011 and would have won the 200 Free relay in 2012 by 0.7 seconds (a fairly large margin for the event) had they not been DQ'ed. So unlike me, these kids know how to swim fast.

Over the past 10 years (I'm missing the final standings for 2002), there have been 5 different schools to finish in the top 3 at the state meet at least 3 times. Let's call these teams the perennial contenders. My thinking is that these teams are used to performing well at the state meet, so their greater experience should result in fewer DQ's.

**List of Perennial Contenders**
School	Top 3 State Finishes	Relays Won	DQ's	Swims	Frequency
Cincinnati St. Xavier	10*	13	0	66	0
Upper Arlington	7	6	1	63	1.6%
Columbus St. Charles	4**	4	1	64	1.6%
SFS	3	5	5	61	8.2%
HV University School	3***	0	1	37	2.7%
Total	28	28	8	291	2.7%
Total (minus SFS)	25	23	3	230	1.3%

* St. Xavier has won 10 of the last 11 state titles.
** Columbus St. Charles won the state title in 2008.
*** University School dropped to Division 2 in 2009. Results are for Division 1 swims only.

This table confirms my guess that the top teams are DQ'ed less often than "ordinary" relay teams. This also demonstrates that St. Francis has been DQ'ed 6 times more frequently than all of the other perennial contenders (and more times than the other 4 schools combined).
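The "6 times more frequently" comparison is just a ratio of two DQ rates. As a quick check, here is a small Python sketch that recomputes it from the DQ and swim counts in the table above; the dictionary layout and variable names are only for illustration.

```python
# (DQ's, swims) at the Division 1 state meet, copied from the table above
contenders = {
    "Cincinnati St. Xavier": (0, 66),
    "Upper Arlington": (1, 63),
    "Columbus St. Charles": (1, 64),
    "SFS": (5, 61),
    "HV University School": (1, 37),
}

sfs_dqs, sfs_swims = contenders["SFS"]
other_dqs = sum(d for name, (d, s) in contenders.items() if name != "SFS")
other_swims = sum(s for name, (d, s) in contenders.items() if name != "SFS")

sfs_rate = sfs_dqs / sfs_swims        # SFS DQ rate
other_rate = other_dqs / other_swims  # pooled rate for the other 4 schools
ratio = sfs_rate / other_rate

print(f"SFS rate:   {sfs_rate:.1%}")
print(f"Other rate: {other_rate:.1%}")
print(f"Ratio:      {ratio:.1f}x")
```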

In this post, I provided the background and data for the analysis. In the next post, I will show that this high number of SFS DQ's is statistically significant, implying a referee bias against SFS.