No Sweat Statistics: Novak Djokovic

Tuesday, May 29, 2012

Djokovic and Federer: Semifinal Attraction

For the past 20 grand slam tournaments (5 years, including this French Open), Federer, Nadal and Djokovic have been ranked/seeded in the top 3. Nadal has been either the #1 or #2 seed in all of these tournaments, meaning that either Federer or Djokovic has been the #3 seed. A majority of the time, Murray has been the #4 seed, but there are some exceptions (Soderling, del Potro, Ferrer, Davydenko, Roddick).

Nadal Getting Lucky with Grand Slam Draws?

In tennis, the placing of seeded players in the draw includes some randomness. That is, half the time the #1 seed will draw the #3 seed in the semi-finals, and the other half of the time will draw the #4 seed. This is different from basketball and other tournaments, where the semis will always consist of #1 vs #4 and #2 vs #3 (barring upsets). This means that Federer and Djokovic would expect to be in the same half of the draw (i.e., scheduled to meet in the semis) in 10 of these 20 Grand Slams. In actuality, Federer and Djokovic have been placed in the same half of the draw 15 times (actually playing 7 times in the semis, with Djokovic leading 4-3). This means that Nadal has been scheduled to play the #4 seed in 15 of the last 20 Grand Slams, giving him the easier path to the final. In fact, 5 of the 7 majors won by Nadal in this time span were won when Federer and Djokovic were both on the other side of the draw (only defeating both players in the 2008 French Open). Does this supply evidence that the draw is rigged in favor of Nadal, thus increasing his chances of winning?

This problem can be rephrased in terms of flipping a coin. If we flip a fair coin 20 times, how likely is it that we see exactly 15 heads (i.e., Fed and Djokovic are on the same half of the draw 15 times)? Coin flips are independent, so the number of heads follows a binomial distribution, and the probability of exactly 15 heads is 0.0148, or about a 1.5% chance. This is unlikely, but not out of the question.
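The coin-flip probability can be checked directly from the binomial formula. Here is a minimal Python sketch (the function names are mine, chosen for illustration) that computes the chance of exactly 15 heads and, for comparison, of at least 15 heads in 20 fair flips.

```python
from math import comb

def prob_exactly(k, n=20, p=0.5):
    """Binomial probability of exactly k heads in n flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n=20, p=0.5):
    """Binomial probability of k or more heads in n flips."""
    return sum(prob_exactly(j, n, p) for j in range(k, n + 1))

print(round(prob_exactly(15), 4))   # 0.0148
print(round(prob_at_least(15), 4))  # 0.0207
```

The "at least 15" number is the one to compare against a simulation that counts everything at or beyond 15 heads.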

To better illustrate the point, I ran 10,000 simulations where I flip a coin 20 times and record the number of heads, shown in the following graph. These simulations show that we expect to see at least 15 heads about 2% of the time (proportion to the right of the red line). This agrees with the theoretical probability of at least 15 heads, 0.0207; the 1.5% figure above is the probability of exactly 15 heads.
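For anyone who wants to repeat the experiment, here is a rough Python sketch of this kind of simulation. It is not the code used for the graph; the function name and the seed are arbitrary choices of mine, and the exact share will vary a little from seed to seed.

```python
import random

def simulate_head_counts(n_sims=10_000, n_flips=20, seed=2012):
    """Return the number of heads in each simulated series of fair coin flips."""
    rng = random.Random(seed)  # arbitrary seed, fixed only so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips))
        for _ in range(n_sims)
    ]

counts = simulate_head_counts()
share_at_least_15 = sum(c >= 15 for c in counts) / len(counts)
print(f"Share of simulations with at least 15 heads: {share_at_least_15:.4f}")
```

A histogram of `counts` would reproduce the general shape of the graph, with the red line drawn at 15.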

Nice to See You Again...

Another interesting pattern is that Federer and Djokovic were placed in the same half of the draw for 7 straight Grand Slams (2008 Wimbledon - 2010 Australian Open). How likely is this occurrence? That is, how likely is it to see 7 heads in a row when flipping a coin 20 times? The theoretical calculation is more complicated, so I again ran 10,000 simulations of 20 flips and recorded the maximum number of heads to appear in a row.

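In the simulations, the probability of seeing a run of 7 or more (i.e., at least 7 in a row) is 0.1112, or about 11%. One caveat: that figure is consistent with counting runs of either heads or tails (the exact value is about 11.5%); counting runs of heads only, the exact probability is about 5.8%. Either way, if you flip a coin 20 times, a run of at least 7 is probably much more common than you might expect.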
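The run-length question can be sketched in Python both by simulation and exactly. This is my own illustration, not the original simulation code; the helper names, the seed, and the `face` argument (which restricts the count to heads when set to "H") are assumptions made for the example. The exact calculation counts the sequences that contain no run of the required length.

```python
import random
from itertools import groupby

def longest_run(flips, face=None):
    """Longest run in flips; if face is given, only runs of that face count."""
    runs = [len(list(g)) for value, g in groupby(flips)
            if face is None or value == face]
    return max(runs, default=0)

def simulate_run(n_sims=10_000, n_flips=20, run_length=7, face=None, seed=2012):
    """Share of simulated series whose longest run is at least run_length."""
    rng = random.Random(seed)  # arbitrary seed for repeatability
    hits = 0
    for _ in range(n_sims):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        hits += longest_run(flips, face) >= run_length
    return hits / n_sims

def exact_heads_run(n_flips=20, run_length=7):
    """Exact P(some run of heads has length >= run_length)."""
    # no_run[n] = number of length-n sequences with no run of run_length heads
    no_run = [2**n for n in range(run_length)]
    for n in range(run_length, n_flips + 1):
        no_run.append(sum(no_run[n - run_length:n]))
    return 1 - no_run[n_flips] / 2**n_flips

print(round(exact_heads_run(20, 7), 4))  # 0.0582 (heads only)
# A run of 7 identical flips is a run of 6 "same as the previous flip"
# outcomes among the 19 transitions, which are themselves fair and independent.
print(round(exact_heads_run(19, 6), 4))  # 0.1151 (heads or tails)
```

Calling `simulate_run(face="H")` and `simulate_run()` gives simulated counterparts of the two exact values.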

There was another stretch of 6 grand slams where Fed and Djokovic were again on the same side of the draw (2010 Wimbledon - 2011 US Open; combined with the previous run of 7, Federer and Djokovic were on the same half of the draw in 13 out of 14 grand slams!). What's the probability that the second longest run when flipping a coin 20 times is at least 6? It's about a 1% chance, which is much less common than seeing a run of 7.
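A simulation of the second-longest run might look like the Python sketch below. The post does not spell out the definition, so I am assuming that "second longest run" means the second-longest run of heads, counted as 0 when there are fewer than two runs; the function name and seed are likewise mine. The estimate depends on that definition, so treat the printed share as illustrative only.

```python
import random
from itertools import groupby

def second_longest_run(flips, face="H"):
    """Length of the second-longest run of face (0 if fewer than two runs)."""
    runs = sorted(
        (len(list(g)) for value, g in groupby(flips) if value == face),
        reverse=True,
    )
    return runs[1] if len(runs) > 1 else 0

# The 14-slam stretch described above: 7 same-half draws, 1 miss, 6 more.
print(second_longest_run("HHHHHHHTHHHHHH"))  # 6

rng = random.Random(2012)  # arbitrary seed for repeatability
n_sims = 10_000
hits = sum(
    second_longest_run([rng.choice("HT") for _ in range(20)]) >= 6
    for _ in range(n_sims)
)
share = hits / n_sims
print(f"Share with a second-longest run of at least 6: {share:.4f}")
```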

Will it Ever End?

While a run of 7 grand slams in a row is not very uncommon, having a second run of 6 or being placed on the same side of the draw in 15 out of 20 tournaments is not very likely. I do not actually believe that the draw is rigged, but by chance alone we would expect to see Fed and Djokovic land in the same half this often only about 1% of the time.

The Law of Large Numbers tells us that, in the long run, the proportion of draws with Djokovic and Federer on opposite sides should approach 50%. However, neither player will be playing infinitely many more grand slams, and each draw is independent of the ones before it, so past draws do not change the probability at the next Grand Slam. That is, assuming the rankings stay the same, there is still a 50% chance that Federer and Djokovic will be placed on the same half of the draw at Wimbledon, the next Grand Slam tournament.

Friday, April 13, 2012

A New Tennis Statistic: BGO

In an earlier post where I ranted about baseball statistics, I said that all good statistics should be two things: simple and easy to understand/interpret. I'm proposing a new statistic that the tennis folk should start using. It's a modification of the break point conversion. For those of you not familiar with tennis, a break point is when you have an opportunity to break your opponent (win a game when your opponent is serving). Consider two scenarios where Player A loses:

Scenario 1: 1/7 (14%) break point conversion

Scenario 2: 1/7 (14%) break point conversion

These two scenarios are the same, right? WRONG!! Suppose that we have a bit more information about the scenarios:

Scenario 1: 1/7 break point conversion. Each break point occurred in a different game.
Scenario 2: 1/7 break point conversion. All 7 break points occurred in the same game (many deuces), a game which Player A eventually won.

Remember that tennis is scored in games and sets, and that the player who wins the most points doesn't always win the match. In scenario 1, had Player A won every break point, he would have broken his opponent 7 times and maybe would have won the match. In scenario 2, Player A only had the opportunity to win one game on his opponent's serve. Winning the first break point would increase his break point conversion and may have saved a lot of hard work, but it would not have changed the outcome of the match as he didn't have any more return games with break point opportunities.

I am proposing a new statistic, Break Game Opportunities (BGO)*, which is the percent of times that a player breaks (wins the game) when he has the opportunity (at least one break point in the game). If this percentage is high, even if the break point conversion is low, then a player takes advantage of his opportunities to break. If this percentage is low, then the (un-opportunistic) player lost a lot of games in which he could have broken. This means that the score and outcome could have been very different.

*[Part of developing good statistics is coming up with a catchy name. Previous statistics that I developed for my PhD research are SWISS and ReQON (pronounced recon). Feel free to comment if you have any better naming suggestions before ESPN scoops me.]

Returning to the earlier example, the BGO of scenario 1 is 1/7 (14%) and the BGO of scenario 2 is 1/1 (100%). Thus, in scenario 2 the scoreboard would not have been different had the break point conversion been higher (Player A broke in every game where he had a break opportunity), but in scenario 1 it could have been very different.
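To make the difference between the two statistics concrete, here is a small Python sketch that computes both for the two scenarios. The `ReturnGame` record and the function names are hypothetical, invented for this illustration; only return games with at least one break point are listed, since the scenarios give no information about the others.

```python
from dataclasses import dataclass

@dataclass
class ReturnGame:
    break_points: int  # break points the returner held in this game
    broken: bool       # True if the returner won the game

def break_point_conversion(games):
    """Breaks divided by total break points (None if there were no break points)."""
    chances = sum(g.break_points for g in games)
    return sum(g.broken for g in games) / chances if chances else None

def bgo(games):
    """Breaks divided by games with at least one break point (None if no such game)."""
    opportunities = [g for g in games if g.break_points > 0]
    if not opportunities:
        return None
    return sum(g.broken for g in opportunities) / len(opportunities)

# Scenario 1: seven games with one break point each, one of them converted.
scenario_1 = [ReturnGame(1, True)] + [ReturnGame(1, False) for _ in range(6)]
# Scenario 2: one game with seven break points, eventually won.
scenario_2 = [ReturnGame(7, True)]

for name, games in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(f"{name}: conversion {break_point_conversion(games):.0%}, BGO {bgo(games):.0%}")
# Scenario 1: conversion 14%, BGO 14%
# Scenario 2: conversion 14%, BGO 100%
```

Both statistics need the same raw input, a per-game count of break points, which is exactly the detail that a summary break point conversion figure throws away.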

On the ATP website, they report players' break point conversion and number of games in which they broke their opponent. This gets close to the idea, but not exactly. Here are 2 interesting cases.

Novak Djokovic, the number 1 ranked singles player:

converts 47% of break points - ranked 9th this year on tour
wins 37% of return games (opponents' service games) - ranked 2nd

Mardy Fish, the top-ranked American at #9:

converts 48% of break points - ranked 8th
wins 24% of return games - ranked 36th

So while they have nearly equal break point conversion, Djokovic breaks a lot more often. This means one of two things:

either Djokovic has more opportunities to break (which would give Fish a higher BGO), or
Djokovic has more opportunities in each game (which would give Djokovic a higher BGO). So while it takes Djokovic more chances to finally break, he is successful in more of those games.

It is impossible to calculate BGO from the data available from the ATP. But my hope is that BGO catches on so that the TV announcers don't solely blame a low break point conversion as the reason a player is losing.