
Sunday, August 4, 2013

Analyzing Pro Athletes' Physiological Dashboard

I recently came across an article about the "sports science" changes that Chip Kelly has implemented since becoming head coach of the Philadelphia Eagles.  Basically, the Eagles invested more than $1 million in new technology that measures physiological details (heart rate, amount of time spent running during practice, 3D views of how players are lifting weights, etc.) in the hopes of creating a "physiological dashboard" for each player.  They want to monitor each player's performance during practice to increase training efficiency, such as ending practice early for players reaching their endurance limits or ensuring that players receive the correct amount of hydration based on what was lost during practice.  A large portion of the article is dedicated to describing the Eagles' sports-science coordinator, who has previously served as a strength coach and nutritionist for colleges and the Navy SEALs.

Here are some interesting quotes:

  • "The result is a data driven approach to training"
  • "Players can log into their personal computers to check their own fitness profiles"
  • "Last season Catapult helped on of its NFL clients compare practice data ... in weeks when the team won compared to those when it lost.  A trend emerged: during Thursday practices before losses, offensive skill players were running a lot but not very quickly."
OK, so NFL teams are beginning to collect all of this data about their players.  But who exactly is mining all of this data to find useful information?  I can't believe that it's the sports-science coordinator (he doesn't have a statistics degree).  Plus, who can actually monitor and interpret all of this data in real time (i.e., during practice)?  It seems that Catapult, an IT consulting company focused on interpreting data, is doing some work after the season is over, but do any of these teams have the capacity to perform analysis in-house?  Here are a few things to think about:
  1. I'm sure most of the companies selling the equipment have guidelines or suggestions for how to interpret the data.  So maybe a bell goes off when a player's heart rate gets too high.  But how accurate are these baselines, especially when the same guidelines are applied to 180lb running backs and 350lb linemen?
  2. What is the goal of collecting all of this data?  Making real-time decisions about players' health during practice?  Drawing team-wide conclusions about what does/doesn't work at the end of the season?  These are 2 very different questions that could influence the most effective way to collect data.
  3. How much are teams investing into analyzing this data (either in-house or through outside companies)?  For current genomic sequencing projects, more money is spent on the analysis than on the sequencing experiment itself.  So are the Eagles planning to spend an additional $1 million on interpreting all of this data?  Or will this data just go to waste?
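Point 1 above is really an argument for per-player baselines rather than one-size-fits-all cutoffs. Here is a minimal sketch of what a player-specific alert might look like; the z-score threshold and all heart-rate numbers are made up for illustration, not anything the Eagles or their vendors actually use:

```python
from statistics import mean, stdev

def heart_rate_alert(history, current_bpm, z_threshold=2.0):
    """Flag a reading that is unusually high *for this player*,
    based on his own practice history rather than a league-wide cutoff."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current_bpm > mu
    return (current_bpm - mu) / sigma > z_threshold

# Hypothetical practice readings: the same 162 bpm means very different
# things for a 180-lb running back and a 350-lb lineman.
rb_history = [150, 155, 160, 158, 152, 149, 161]
ol_history = [130, 128, 135, 133, 131, 129, 136]

print(heart_rate_alert(rb_history, 162))  # False: normal for this player
print(heart_rate_alert(ol_history, 162))  # True: well above his baseline
```

The point of the sketch is that the alert fires relative to each player's own history, which sidesteps the problem of applying one guideline to running backs and linemen alike.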

Friday, March 15, 2013

Statistics Playing Major Role in College Football Playoffs


In the above article, Sports Illustrated sought the recommendations of 5 college football and basketball "stats gurus" to get a better feel for how the college football playoff committee should go about choosing the four teams to compete in the 2014 national championship playoffs.  They discussed three primary themes:


1. The need for accountability and transparency. Although the BCS releases their rankings and scoring/point totals every week, the actual formula used in these calculations is proprietary.  I am in agreement with the 5 experts in calling for full transparency in the system.  However, this makes it difficult to include an "eye test" in the decision (whether this should be included is another debate).  My favorite quote:
"I doubt this will happen, but I think they need to have a non-voting data person in the room as well. Someone to help the members interpret ratings and other data sources, answer questions that are posed and hold the group accountable to information that is shared."
2.  It's about more than wins and losses.  Should other factors, like injuries and margin of victory/defeat, be taken into account?
"Of course, the danger of using advanced stats or ignoring head-to-head results is the committee might wind up producing a bracket that the majority of the public -- accustomed to seeing rankings ordered largely by team records -- rejects."
3.  Strength of schedule isn't what it seems.
"There are many ways to measure schedule strength, and many of them are valid. I like to use this example. Imagine two schedules. Schedule A consists of the six best teams in the country and the six worst. Schedule B consists of the 12 most average teams in the country. Which is tougher? Ask Alabama, and they'll obviously say Schedule A. Alabama would have a much easier time running the table against Schedule B. But ask the worst team in the country which one is easier, and they'll say the opposite. The worst team in the country would have a hell of a time winning a single game against Schedule B. ... So depending on who you are, you can perceive the exact same schedule of teams very differently."

Saturday, February 2, 2013

Super Bowl Squares Strategy

With the Super Bowl just a day away, I am hearing a lot of talk about Super Bowl Squares, the game of chance that only gets played one day out of the year.  With most variations of the game, people sign up for squares, then once all squares have been taken, the numbers are randomly assigned to the rows and columns, thus making this purely a game of chance (I guess the outcome of the football game plays a role too).

But suppose that these numbers were not randomly assigned: you get to choose the numbers that you want.  Which pair of numbers gives you the best chance of winning?  I have seen a few articles online trying to answer this question, but all the ones that I have come across look at the score after each quarter of all previous Super Bowl games.  While I see the point of only looking at Super Bowls, some of these games were played over 40 years ago and the game has clearly evolved since then.  For example, I have to believe that field goals are much more common now than they were 40 years ago, as kickers are now able to routinely make 50+ yard field goals (I don't have data to back this up, so let me know if I'm wrong).  Therefore, I have decided to look at all football games from this past season, including the playoffs.  If my counting is correct, this covers 266 games.  I should probably look at the score after each quarter of every game, but this would cover 1,064 quarters, and I just don't have the time (or, really, the desire) to do this.  So I have decided to only analyze the final scores of the 266 games.  I also ignored whether the winning team was home or away, so to me, Team A winning by a score of 17-13 (making square 7,3 the winner) is equivalent to Team A losing 13-17.  That is, I treated squares (7,3) and (3,7) as the same.
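The tallying just described (last digit of each final score, with (7,3) and (3,7) merged) is easy to automate. A sketch, using a few made-up final scores in place of the full 266-game dataset:

```python
from collections import Counter

def square_counts(final_scores):
    """Count unordered last-digit pairs, treating (7,3) and (3,7) as the same."""
    counts = Counter()
    for winner, loser in final_scores:
        pair = tuple(sorted((winner % 10, loser % 10)))
        counts[pair] += 1
    return counts

# Made-up sample of (winning score, losing score) pairs:
games = [(17, 13), (23, 17), (34, 31), (27, 23)]
print(square_counts(games).most_common(1))  # [((3, 7), 3)]
```

Running this over all 266 real final scores would reproduce the pair counts discussed below.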

Let's first look at the most common point totals, with respect to the last digit.  As expected, the least likely final digits are 5 (3.8% of all final scores), 2 (4.3%) and 9 (5.1%).  The most common final digits are 3 (16.4%), 4 (16.0%), 7 (14.8%), and 0 (13.5%).

Now let's look at pairs of numbers.  If you played over the full 2012 season, 3 squares would have never won (when only looking at final scores): (1,2), (2,9) and (5,6).  This isn't too surprising because, as shown earlier, it is difficult to score total points ending in 2, 5 or 9.  The most likely pairs this past season were (3,6) and (3,7)*, which each occurred 16 times this season.  Combined, these 2 pairs would have won over 12% of the games.  Additional pairs that would have won over 10 times this past season include (0,3), (0,4), (0,7), (0,8), (1,4) and (3,4).

In conclusion, if numbers were not randomly assigned in Super Bowl Squares, it would be easy to come out ahead in the long run.

* SI writer Peter King picked the Ravens to beat the 49ers 27-23, so he's playing the odds with his final score prediction.

UPDATE (2/4/2013). The score after each quarter (with the Ravens always leading) was 7-3, 21-6, 28-23 and 34-31.  This means that the winning squares were (3,7), (1,6), (3,8) and (1,4).  Did anyone follow my advice and bet on (3,7) or (1,4)?

Monday, January 21, 2013

College Football 2012 Wrap-up

This BCS bowl season, two teams, Wisconsin and Northern Illinois, were coached by interim coaches in their BCS bowl game after their head coach left the team to accept a new position.  There seems to be an increasing trend of coaches leaving their teams before a bowl game to accept a new coaching position.  I began wondering whether schools that are searching for a new head coach should try to scoop up coaches before the bowl games are completed, or whether they should factor in the bowl game performance (maybe make the candidates feel some extra pressure to win)?  Schools tend to want to fill their coaching vacancies ASAP because this gives the new coach an extra month to put together his coaching staff and recruit.  But does this process of hiring coaches before the bowl game actually lead to better football success?

I chose to look at head coaches who led their team to one of the BCS bowl games and then accepted a new college coaching position the following year.  While this leaves a small sample size (n=9), it is easier to evaluate the performance of these coaches because it is assumed that the new school expects the new coach to lead his new team to BCS bowls.  Here is a summary of the 9 coaches:

Coach            Previous Team   Old BCS Record*   Year   Last Bowl   New Team         Record    New BCS Record
Steve Spurrier   Florida         2-1               2001   W           South Carolina   66-37     0-0
Urban Meyer      Utah            1-0               2004   W           Florida          65-15**   3-0
Walt Harris      Pitt            0-1               2004   L           Stanford         6-17**    0-0
Rich Rodriguez   West Virginia   1-0               2007   W*          Michigan         15-22**   0-0
June Jones       Hawaii          0-1               2007   L           SMU              31-34     0-0
Brian Kelly      Cincinnati      0-1               2009   L*          Notre Dame       28-11     0-1
Randy Edsall     UConn           0-1               2010   L           Maryland         6-18      0-0
Bret Bielema     Wisconsin       0-2               2012   L*          Arkansas         -         -
Dave Doeren      Northern Ill.   0-0               2012   L*          NC State         -         -

* = Coach left the team before the BCS bowl game, so the game was coached by an interim coach; such games are not reflected in the head coach's BCS record.
** = No longer with this team. Meyer retired; Harris and Rodriguez were fired.

A few interesting gems from looking at this table:
  • Only 3 of these 9 coaches won a BCS bowl game with their previous team (Spurrier, Meyer, Rodriguez).  
  • Only 2 have taken their new teams to a BCS bowl game (Meyer, Kelly), with Meyer being the only coach to win a game (actually 3, including 2 national championships).
  • The only coach to win a BCS bowl game with their new team (Meyer) had won a BCS bowl game with his previous team.
  • 3 of the 4 teams coached by interim coaches lost their bowl game, with West Virginia being the only exception.
Yes, programs that are hiring coaches have probably suffered some losing seasons and need time to rebuild, so these results could change in another year or 2.  Plus, this is a small sample size, so we would probably be better off by including all coaches who leave their teams, not just ones leaving after reaching a BCS bowl game.  In my opinion, schools that are hiring college football coaches are placing too much emphasis on reaching BCS bowl games and not enough on winning these games. Even if it is all about the money of BCS bowl games and not actually about winning, most of these big-name hires are struggling to take their new teams to a BCS bowl game.

If I were in charge of hiring a new football coach to turn around a struggling program and win national championships, here would be my one major piece of advice:
If you are serious about winning national championships, hire a coach who has actually won a BCS bowl game. If none of these coaches are available/interested, then don't settle for a coach who has taken his team to a BCS bowl game but lost - what makes you think he can do better next time (ahem, Brian Kelly)?  Save your money and take a chance by hiring a coach who hasn't been to a BCS bowl (but has preferably won other bowl games). You might just hire the next Les Miles (2-1 in BCS bowl games since 2005, including a national title).

Wednesday, January 2, 2013

NFL Pop Quiz

As a new resident of St. Louis, I've enjoyed having a local NFL team to cheer for (although maybe not for much longer if they move to LA).  Rookie punter Greg Zuerlein had some incredible special-teams plays this year and completed 3 of 3 pass attempts for 42 yards and 1 touchdown.  Can you guess which high-profile (and highly paid) quarterback threw for fewer yards?  Find out here.

Friday, October 12, 2012

Why the NFL is supporting the wrong cancer research

Unless you live at the bottom of the ocean, you have probably noticed that the NFL is showing support for breast cancer research by having the players and officials wear pink accessories (sounds a little girly when I say it like that).  As a cancer researcher, I think it's great that the league with the most exposure in the USA is joining the American Cancer Society in its fight to end cancer.  However, the NFL is making a huge mistake by choosing to support breast cancer research over prostate cancer research.  Here are a few statistics from the American Cancer Society that may surprise most people:

1. Approximately 1 out of every 6 men will develop prostate cancer in his lifetime.  In contrast, 1 out of every 8 women and less than 1 out of every 1,000 men will develop breast cancer in her/his lifetime.  

2. There will be an estimated 241,740 patients diagnosed with prostate cancer in 2012, compared to 229,060 new cases of breast cancer (<1% of those cases occurring in males).

3. Treatments are not as effective for breast cancer as for prostate cancer, and this is reflected in the 5-year survival rates: 99% of prostate cancer patients will survive 5 years, compared to only 89% of breast cancer patients.  However, until the 1990s, males with prostate cancer had a higher 5-year mortality rate than females with breast cancer.

4. It is estimated that 39,510 women and 410 men will die of breast cancer in 2012.  An estimated 28,170 men will die of prostate cancer this year.  That is, over 65 times more men will die of prostate cancer than breast cancer.

5. Prostate cancer is the second most deadly cancer type for males, behind only lung cancer.

6. African American males are 1.6 times more likely to develop prostate cancer and 2.5 times more likely to die from it than white males.   

Do these numbers surprise you?  While breast cancer is a more deadly disease than prostate cancer, all of the support for breast cancer month and "wearing pink" makes it seem like the disparity between the two diseases is much larger.  Considering that there is not a single female player in the NFL, the league is going out of its way to promote research for a cancer that its players are 150 times less likely to develop than prostate cancer (supporting evidence: allowing players to wear pink shoes and towels but fining them $5,000 for wearing a red undershirt).  Additionally, with the NFL consisting of a large proportion of African American males, you would think it would support research for a disease with significant racial disparities.

If you happen to meet Roger Goodell on the street and point these facts out to him, he will mention that the NFL is committed to promoting prostate health, and he is technically correct. However, I don't see the NFL encouraging players to wear blue during September.  A bit hypocritical, don't you think?

Saturday, September 8, 2012

How accurate are football preseason polls?

I'm a few weeks late on this post, but I think it's still worth blogging about.  Every year, the media (especially ESPN) makes such a huge deal about college preseason football polls.  Without any games yet played, these polls are little more than speculation.  I wanted to look into how accurate these polls are at choosing that season's national champion.  In this post, I will use the AP poll exclusively for both preseason and final results, which can make a difference in years before the BCS, when there could be multiple national champs depending on the poll used.

This first plot shows the final ranking of preseason top 5 teams since 1990.  The bar furthest to the right shows the teams that were in the top 5 preseason poll but finished the year unranked.

 A few interesting notes:
1. 14 of the past 22 national champions were ranked in the preseason top 5 (I think the last winner outside the top 5 was Auburn led by then-unknown Cam Newton).
2. More national champs were ranked preseason #2 than preseason #1.  This is good news for Alabama, who started this season ranked #2 behind USC (but jumped to #1 after their first win).
3. Looking only at the preseason #1 teams (blue bars), they are more likely to finish the season ranked 3rd than any other rank.  Also, no preseason #1 has finished worse than #16 in the final polls.

Next, I wanted to look at whether teams ranked higher in the preseason poll tended to be ranked higher at the end of the season.  A simple way to do this is to look at the median finish of the top 5 preseason teams.

Median Final Ranking since 1990

Preseason Rank    Median Final Rank
1                 3
2                 3
3                 6.5
4                 9.5
5                 8
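The medians in this table are straightforward to compute once each preseason top-5 team's final ranking is recorded. A sketch with hypothetical data (a real version would need every AP poll result since 1990, plus a convention for unranked finishes, e.g. coding them as 26):

```python
from statistics import median

# Hypothetical final AP rankings for teams that began a season ranked #1 or #2;
# these lists are invented for illustration, not the actual poll history.
final_ranks_by_preseason = {
    1: [1, 3, 3, 5, 2, 8, 1, 4, 16, 3],
    2: [1, 2, 3, 7, 1, 3, 5, 2, 10, 4],
}

for preseason_rank, finals in sorted(final_ranks_by_preseason.items()):
    print(preseason_rank, median(finals))
```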

So although the national champions are not always ranked preseason #1, the top preseason teams in general finish higher in the standings than the other preseason teams.  The exception is that teams ranked #5 tend to finish the season ranked better than the #4 preseason teams.  There could be some bias causing this result, as there must be some way to break ties between teams with the same final season record, and that tiebreaker may be influenced by the preseason rankings.

Finally, I wanted to see how the final rankings of the previous season influence the preseason polls of the next season.  For example, do teams who finish the year #1 tend to be the top ranked preseason team the following year (even though 1/4 of the team likely graduated)?
This plot shows that the preseason top 5 teams tended to finish the previous season ranked highly.  We see that, since 1990, 9 of the 23 teams finishing the previous season #1 were the top ranked preseason team.  This is a somewhat questionable strategy, as only 2 teams have repeated as national champs since 1990: Nebraska in 1994-95 (who was not ranked preseason #1 in 1995) and USC in 2003-04.

In summary, while the top preseason team more often than not does not win the national championship, on average they finish the season ranked better than any other preseason team.

Friday, June 15, 2012

Pissing off an NFL player

Chicago Bears cornerback Charles Tillman does not like your pro-Packers math!

  1. None of my homework or exam questions ever elicited this type of response.
  2. Question on her next hw assignment: If 100 NFL players attempted this stat problem, how many would answer correctly?  

Sunday, May 27, 2012

Plot of the Week 5

Last week, David Epstein (from Sports Illustrated) reported that, contrary to popular belief, recent studies have shown that NFL players actually have a longer life expectancy than non-NFL players.  You can find his original article here.  The printed SI article included some graphics, but I thought that the data could be visualized better.  So I am presenting the same data as in the article but in a more effective way.

The first plot shows that fewer NFL players have died than would be expected in the general US male population (*when looking at men of similar age and race to the NFL players in the study), 334 vs. 625, a 47% decrease.  The different colors represent different causes of death.

The second plot looks at the expected and actual death rates for 3 specific causes of death: suicide, heart disease and cancer.  In each of these categories, NFL players have a lower death rate than the US male population (including suicide, 9 vs 22, a 59% decrease!!!).
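These relative decreases are simply (expected - actual) / expected, which is easy to check from the counts reported in the article:

```python
def pct_decrease(actual, expected):
    # Percent by which observed deaths fall below the expected count.
    return 100.0 * (expected - actual) / expected

print(round(pct_decrease(334, 625)))  # all causes: 334 vs 625 expected
print(round(pct_decrease(9, 22)))     # suicides: 9 vs 22 expected
```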


The original academic paper that reported these numbers and results can be found here.  I only glanced through the paper for a minute, but the statistical methods seemed reasonably sound to me.  Even though this data was collected in 2007, these numbers seem to suggest that anecdotal evidence (i.e., the media) is responsible for exaggerating the link between playing in the NFL and suicide.

Saturday, May 19, 2012

College Football Playoff


It is looking more likely by the day that there will soon be a 4-team playoff to determine the national champion for college football.  One of the details that is still being determined is how the 4 playoff teams will be selected, described in this SI article.  There are 2 interesting issues raised by this article that I want to address.

1. Computer programs are currently used in the BCS system to determine which 2 teams will play for the national title.  The problem is that "computer programmers ... refuse to reveal the formulas that determine their rankings".  One of the current hot topics in statistics is reproducibility of research - when you get your work published in a journal, you need to describe your methods so that anybody else reading the article can reproduce the results.  If you don't reveal your method, then other researchers are not able to accurately evaluate your work and your conclusions become suspect.  How can we trust in a ranking where we don't know the input variables and model?  This is especially relevant in this context, as there is no way to evaluate the rankings (i.e., there is no "true" ranking that we can compare model performance with).
   If the authors are worried about others stealing their work, all they have to do is file for a patent/copyright.  If that is not the issue, I am left to believe that they are worried that others will improve upon their model and obtain "better" results. That doesn't instill much confidence.  And because no one knows the model, how can we be sure that the model isn't tweaked each week for someone's favorite team to move up the rankings? (I guess the BCS is overseeing things to make sure that this isn't the case, but who knows).

2. The article supports evaluating and selecting playoff teams only after the entire regular season is finished, rather than ranking teams after each week: "Because committee members ... would evaluate the entire body of work, schools will be more apt to schedule quality out-of-conference opponents."  What a novel idea ...  NOT!  One of the major rules in designing experiments is that you cannot modify your experiment halfway through because you do not like the preliminary results (exception: cancelling a clinical trial resulting in many deaths).   You have to wait until you have all of the data before testing a hypothesis.  Games are played to determine the best team on the field, so let's collect all of the results before trying to determine the final rankings.  Otherwise, the preseason polls are very likely to be the tiebreaker between evenly matched teams, which has absolutely nothing to do with performance on the field.
   For example, imagine that all of the preseason top 5 teams go undefeated.  After each week, none of the teams have a reason to slide down the rankings because they all won.  Similarly, it is tough for the team ranked #5 to jump ahead of the other teams that also did not lose.  From week to week, voters tend to assume that the previous ranking is truth, and so need exceptional evidence to move one team above another.  This also requires the voters to admit that their previous ranking was incorrect, and who wants to admit they are wrong?

With this post, I am officially throwing my hat into the ring of choosing the 4 college football playoff teams.  I promise to wait until the end of the season, when all of the data has been collected, to make my selections (even if this means not watching ESPN during the fall when they update their playoff teams every hour).  If I use a model to select the teams, I will be fully transparent, allowing everybody access to the methods I used, so that my results can be replicated.  Because some may not agree with choices related to my model, I would also be prepared to defend my model by evaluating its performance using previous years' data or using some other metric.  Finally, because I am a new PhD graduate, I would be willing to accept a discounted salary compared to other BCS executives (I'm thinking about $100k would be fair for the month I would be working each year).

Thursday, May 3, 2012

The Mystery Behind NFL Suicides

If you haven't heard, former NFL player Junior Seau was found dead in a possible suicide.  This event has the media up in arms again about the link between concussions, depression and suicide in football players.  I'm sure that it's only a matter of time before Congress steps in to voice its opinion*.

Pro athletes live glamorous lives while playing, but many struggle with depression after falling out of the limelight, along with other personal struggles (bankruptcy, family issues, not knowing what to do when retiring at age 30, etc.), regardless of sport.  I wanted to compare suicide numbers between the NFL (a high-contact sport) and non-contact professional sports, hoping to find a higher rate among NFL players than other athletes.

However, for all of the press that the NFL and suicide are getting right now, I challenge you to find the number of former NFL players who have committed suicide - I can't find it!  Here's what I did find:

  • Cricket is known to have the highest suicide rate in professional sports, with over 150 known cases in the 20th century.  See the link here for a good explanation of possible reasons for this.
  • As of 2005, there have been 76 suicides of former MLB players.
  • All we have for NFL suicide numbers are anecdotal stories, which appeal to the heart but not to the mind of a statistician.

I realize that researchers have shown a link between brain injuries sustained playing in the NFL and depression, which has at times led to suicide.  But without the data, we can't conclude that former NFL players are any more likely to commit suicide than other athletes, such as MLB players or cricketers.  Extra credit to anyone who can help me find this data.


*Showing how effective their MLB steroid hearings were, Roger Clemens is currently in court for possibly lying about possibly taking steroids, and Ryan Braun, who failed a drug test last year and got off on a technicality, is on the field making millions playing in the MLB.