Bayesians, Frequentists, and Lance Armstrong

by John Sides on June 26, 2012 · 12 comments

in Methodology, Sports

This is a guest post by Nathan Paxton.

*****

Lance Armstrong is back in the news, and the Tour de France starts in just a few days. For those of you who don’t follow professional cycling, the first thing that comes to mind about the sport is most likely doping. Indeed, that’s why Armstrong has reappeared in the sports pages: the United States Anti-Doping Agency (USADA) has opened an investigation of him, after the Justice Department dropped its case against him last February.


Rather than consider whether all these charges, counter-charges, and counter-counter-charges are true, let’s talk about a couple of different ways that social scientists might think about l’affaire Armstrong. (As justification for the break in election programming, I’d like to note that one of the founders of this blog was just a bit of a cycling nut.)


As social scientists, at least where matters can be assessed empirically, we tend to make statements of probability rather than of fact. So rather than say that Armstrong did or did not use performance enhancers, we would talk about how likely it is that he used them. It frustrates many people that we rarely make categorical statements, but we’re trying to be honest about what we know and don’t know.


The two major approaches to this kind of probabilistic reasoning are generally called “frequentist” and “Bayesian.” A “frequentist” viewpoint is the basis of almost any basic statistics class you took in college or grad school (unless you became a social methodologist or statistician). Basically, it asks, “Given an infinite number of trials or experiments or tests, what is the probability that the results I am getting are true?” At a sufficiently small, agreed-upon threshold (conventionally 0.05, or 1 in 20, in the social sciences), a frequentist would either reject or fail to reject the “null hypothesis” (the proposition that there is no real effect, so that any apparent pattern in the data is due to chance). They’re called “frequentists” because they define probability as how frequently something would occur over many repeated trials.


In the case of a doping test, a frequentist would look at the number of performance-enhancing substance (PES) tests that Lance Armstrong has taken (lots), see that they are all negative (so far as we know), and say, “The probability of a false negative is small but not zero. Under repeated sampling (which is, in essence, what each drug test is), we become more and more sure that we are getting the ‘true’ result.” This understanding of probability underlies the case from Armstrong’s camp: he has taken ALL of these tests over many years, they have ALL been negative, and so it is virtually impossible that he was using performance enhancers. The frequentist argument treats the probability of some event, such as a positive test result, as fixed, but that probability rests on specific conditions or assumptions that may not hold.
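To make that last point concrete, here is a minimal Python sketch (an illustration, not the author’s method) of how the “ALL negative” argument depends on its assumptions. It first assumes, hypothetically, that each test independently catches a doper with probability `p_detect`; it then compares that with a case where some share of dopers (the hypothetical `share_evaders`) use a method the test never detects. Under independence, the chance of passing every test shrinks toward zero as tests pile up; with undetectable evaders, it can never fall below `share_evaders`.

```python
def p_all_negative_independent(p_detect: float, n_tests: int) -> float:
    """P(a doper passes all n tests) if each test independently catches them with probability p_detect."""
    return (1.0 - p_detect) ** n_tests


def p_all_negative_with_evaders(p_detect: float, n_tests: int, share_evaders: float) -> float:
    """Same quantity when a hypothetical share of dopers uses a method the test never detects.

    Evaders pass every test with probability 1; all other dopers face independent tests.
    """
    return share_evaders + (1.0 - share_evaders) * p_all_negative_independent(p_detect, n_tests)


if __name__ == "__main__":
    # Hypothetical inputs: 5% per-test detection, half of all dopers are undetectable evaders.
    for n in (10, 100, 500):
        print(n, p_all_negative_independent(0.05, n), p_all_negative_with_evaders(0.05, n, 0.5))
```

The numbers are made up; the point is only that repeated negative tests buy much less certainty when the per-test detection probability is not the same for every doper.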


Bayesians, on the other hand, look at evidence differently. The world, for them, can be divided into “priors” (what you know, or can make an educated guess about, regarding the world), “data” (information you collect and assess), and “posteriors” (your revised beliefs about the world, which combine the priors with the data).

Bayesians also like to “iterate”: posterior beliefs from one round of analysis can become the prior beliefs for the next round of data examination.
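In symbols, the prior/data/posterior scheme is Bayes’ rule. Writing H for the hypothesis “the rider used PES” and D for the data (an illustrative notation, not taken from the post), the posterior P(H | D) combines the prior P(H) with the likelihood of the data, and the second line is the iteration step:

```latex
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}

P(H \mid D_1, D_2) = \frac{P(D_2 \mid H, D_1)\,P(H \mid D_1)}{P(D_2 \mid H, D_1)\,P(H \mid D_1) + P(D_2 \mid \neg H, D_1)\,P(\neg H \mid D_1)}
```

The posterior after the first batch of data, P(H | D_1), plays the role of the prior when the next batch D_2 arrives.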


Bayesian-inclined cycling fans might look at the brouhaha this way. Armstrong has never failed a drug test, and he has taken more than 500 of them. Armstrong was at the very top of this grueling sport for many, many years (7 Tour de France titles). His greatest, most consistent competitors (such as the Italian Ivan Basso and the German Jan Ullrich) and his potential American heirs (such as Tyler Hamilton and Floyd Landis) have all been found to have used PES. If these are the only people who were able to keep up with Armstrong over the years, and they have all been found to use some form of performance enhancement, then we may have more reason to think that something doesn’t quite add up.

  • Prior: LA did not use performance enhancers, because the tests put the probability that he did very low.

  • Data: All significant competitors used performance enhancers and tested negative, until they were caught (sometimes via a test, sometimes via old-fashioned police work).

  • Posterior: Perhaps our estimates of how often the tests catch those who use performance enhancers are wrong. Revise those probabilities in light of what we now know, make them the new “priors,” run the tests on the data, and assess how probable it is that Armstrong used those substances.
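The update in the last bullet can be sketched in Python. This is a toy model under stated assumptions, not a description of any real testing program: tests are conditionally independent given whether the rider doped, `p_detect` is the chance a test catches a doper, `false_pos` is the chance a test flags a clean rider, and the prior is whatever you choose. Each negative result is processed in turn, so every posterior becomes the prior for the next test.

```python
def update_on_negative(prior_doped: float, p_detect: float, false_pos: float = 0.0) -> float:
    """Posterior P(doped) after one negative test, by Bayes' rule.

    prior_doped: current belief that the rider doped
    p_detect:    P(positive | doped), i.e. one minus the false negative rate
    false_pos:   P(positive | clean)
    """
    like_doped = 1.0 - p_detect   # P(negative | doped)
    like_clean = 1.0 - false_pos  # P(negative | clean)
    numerator = like_doped * prior_doped
    return numerator / (numerator + like_clean * (1.0 - prior_doped))


def posterior_after_negatives(prior_doped: float, p_detect: float, n_tests: int,
                              false_pos: float = 0.0) -> float:
    """Iterate the update: each posterior becomes the prior for the next test."""
    belief = prior_doped
    for _ in range(n_tests):
        belief = update_on_negative(belief, p_detect, false_pos)
    return belief


if __name__ == "__main__":
    # Illustrative prior of 0.5 and two hypothetical per-test detection rates.
    for p_detect in (0.01, 0.001):
        print(p_detect, posterior_after_negatives(0.5, p_detect, 500))
```

With a prior of 0.5 and 500 negative tests, the posterior falls below 1% if each test catches a doper 1 time in 100, but stays near 38% if it catches a doper only 1 time in 1,000. The conclusion leans heavily on the detection rate, which is exactly the quantity the bullets suggest revising.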

My own interpretation: if Armstrong did not use performance enhancers, then he is even more extraordinary than previously reported, since he won unassisted against riders with PES assistance. It seems more plausible that either there is something wrong with the tests or LA used performance-enhancing substances than that he is an even greater statistical anomaly.


Why can we say this? Given what you might call the “hard” results of LA’s tests and the “soft,” circumstantial results of his competitors, iterating the posteriors and making them priors is exactly what we want to do.


The tests that these cyclists take tell us something not only about the cyclists but also about the tests. Because many of these riders were tested regularly and passed the tests until they did not, we gain a better appreciation of how well the tests detect people who are doping. It’s commonly accepted in sports discussions that the EPO test has a high false negative rate (an athlete tests “clean” even while using PES), even if no one seems to know the exact rate. The updated, iterated prior confirms that idea.
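One way to learn about the tests from the riders, sketched here under strong assumptions (a swapped-in textbook model, not the author’s calculation): suppose each test catches a doping rider with the same unknown probability p, independently across tests and riders, and put a Beta prior on p. A rider who passed k tests and was then caught by a test contributes p(1 − p)^k to the likelihood; a rider who passed k tests and was exposed some other way (police work, a confession) contributes (1 − p)^k. Both terms are conjugate to the Beta, so the update is just counting. The rider histories are hypothetical, not real testing records.

```python
from dataclasses import dataclass


@dataclass
class RiderHistory:
    """Hypothetical record of a rider later shown to have doped."""
    negatives_while_doping: int  # tests passed while doping
    caught_by_test: bool         # True if exposure came from a positive test


def posterior_detection_rate(histories, a: float = 1.0, b: float = 1.0):
    """Update a Beta(a, b) prior on the per-test detection rate p.

    Caught by a test after k negatives: likelihood p * (1 - p)**k.
    Exposed another way after k negatives: likelihood (1 - p)**k.
    Returns (a_post, b_post, posterior_mean).
    """
    hits = sum(1 for h in histories if h.caught_by_test)
    misses = sum(h.negatives_while_doping for h in histories)
    a_post, b_post = a + hits, b + misses
    return a_post, b_post, a_post / (a_post + b_post)


if __name__ == "__main__":
    # Made-up histories, for illustration only.
    riders = [
        RiderHistory(40, True),
        RiderHistory(60, False),
        RiderHistory(25, True),
        RiderHistory(80, False),
    ]
    print(posterior_detection_rate(riders))
```

A low posterior mean is the “high false negative rate” the paragraph describes; feeding that estimate into a model of one rider’s own negative tests is the iteration step.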


A Bayesian perspective can more easily contend with a world where athletes are actively trying to mislead, hide, and evade detection. When we know that competitors are trying to hide, that they did use banned PES, and that previous tests didn’t catch them, we can use the collective information about the population of athletes to make more realistic statements about the probability that a particular athlete’s results are right.


Importantly, I think, this “light” Bayesian perspective on cycling and Armstrong gives us better leverage for thinking through the problem. Adopting the perspective does not mean that you think Armstrong used PES. But it does mean that one has to take into account more information about the world (of cycling, at least) than just how likely a particular test result is: one also needs to know how well the system is being gamed and how often the tests caught known users. We can get some indication of that by looking at Armstrong’s peers.


Even if it turns out that Armstrong used performance enhancers, it may not diminish his Tour de France accomplishments. If he won using these substances while all his significant competitors did too, that still may mean that he’s the better cyclist. On the other hand, given how much he has made of never using banned substances and competing clean, it could significantly hurt his charitable works on behalf of cancer patients.

12 comments

Andrew Gelman June 26, 2012 at 12:08 pm

I recommend you read statistician Kaiser Fung’s thoughtful discussion:

I [Fung] don’t know if Armstrong doped or not. Given that his racing days are years in the rear-view mirror, there is little chance we will ever have direct evidence either way.

However, as I pointed out in Chapter 4 in Numbers Rule Your World, “never a failed test” is not a great basis on which to rest one’s case!

We have quite a few examples of athletes who never failed any drug test during their competitive careers but later confessed to doping. Marion Jones and Bjarne Riis are two examples I used in the book. Why is this the case?

The sad truth of steroid testing is that most dopers do not test positive. A recent example (discussed here) illustrates that about 50% of dopers would pass the test — and that was measured in a controlled laboratory experiment. The reason for such high false negative rates is that the anti-doping labs want to minimize the chance of a false positive error. The underlying statistics dictate a trade-off between false positives and false negatives; the harder one tries to eliminate false positives, the more false negative results will be produced!

I call for more false positives in drug testing in this post.
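The trade-off between false positives and false negatives described here can be seen in a toy threshold model. The sketch below assumes, purely for illustration, that some hypothetical marker is normally distributed with mean 0 in clean athletes and mean 1.5 in dopers (standard deviation 1 in both), and that a test is declared positive when the marker exceeds a threshold. These distributions are invented; real anti-doping tests are more complicated.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def error_rates(threshold: float, clean_mean: float = 0.0, doped_mean: float = 1.5,
                sd: float = 1.0) -> tuple[float, float]:
    """(false positive rate, false negative rate) for 'positive if marker > threshold'."""
    false_pos = 1.0 - normal_cdf(threshold, clean_mean, sd)  # clean athlete flagged
    false_neg = normal_cdf(threshold, doped_mean, sd)         # doper passes
    return false_pos, false_neg


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0, 4.0):
        fp, fn = error_rates(t)
        print(f"threshold={t}: false positive={fp:.4f}, false negative={fn:.4f}")
```

Raising the threshold drives the false positive rate toward zero while the false negative rate climbs toward one.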

***

The media has gotten the statistics totally backwards.

On the one hand, they faithfully report the colorful stories of athletes who fail drug tests pleading their innocence. (I have written about the Spanish cyclist Alberto Contador here.) On the other hand, they unquestioningly report athletes who claim “hundreds of negative tests” prove their honesty. Putting these two together implies that the media believes that negative test results are highly reliable while positive test results are unreliable.

The reality is just the opposite. When an athlete tests positive, it’s almost sure that he/she has doped. Sure, most of the clean athletes will test negative but what is often missed is that the majority of dopers will also test negative.

We don’t need to do any computation to see that this is true. In most major sports competitions, the proportion of tests declared positive is typically below 1%. If you believe that the proportion of dopers is higher than 1%, then it is 100% certain that some dopers got away. If you believe 10% are dopers, then at least 9 out of 10 dopers will test negative!

More at the link.
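Fung’s back-of-the-envelope bound fits in a one-line function. A minimal sketch, assuming (as his argument does) that the positive rate and the share of dopers are measured over the same population, and granting the testers the most favorable case, in which every positive is a real doper:

```python
def min_share_of_dopers_passing(positive_rate: float, doper_rate: float) -> float:
    """Lower bound on the fraction of dopers who test negative.

    Best case for the testers: every positive is a real doper (no false positives),
    so at most positive_rate / doper_rate of the dopers can have been caught.
    """
    if doper_rate <= 0.0:
        raise ValueError("doper_rate must be positive")
    return max(0.0, 1.0 - positive_rate / doper_rate)


if __name__ == "__main__":
    print(min_share_of_dopers_passing(0.01, 0.10))  # 1% positives, 10% dopers
```

With a 1% positive rate and 10% dopers the bound is 0.9, that is, at least 9 out of 10 dopers test negative; whenever the doper share exceeds the positive rate, the bound is above zero and some dopers must have got away.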

Matt Zimmerman June 26, 2012 at 1:00 pm

“Given an infinite number of trials or experiments or tests, what is the probability that the results I am getting are true?”

This isn’t what a null hypothesis test asks. It asks “what is the probability of the results given that a null hypothesis is true.” These are very different things, though NHTers like to convince themselves of the former – even though most know better.

Robert June 26, 2012 at 1:40 pm

I think social scientists have a lot more to say about cycling, and doping in cycling, than just the frequentist-Bayesian views. First, cycling is n-person game theory in action, with time perdurance on agreements, temporary coalitions, side payments, reneging on bargains, bluffs, calls, unenforceable contracts, payback, and backstab-your-buddy prisoner’s dilemmas. This stuff is great. Much more interesting than who was doping. This is what makes bike racing much, much better than sports where there are only two teams on the field at a time. But, second, if doping really is your cup of tea, there’s some behavioral economics research on the odd situational ethics of cheating, sociological research on the efficacy of punishment and penalties as deterrents, and research on the labor relationships between riders, teams, and sponsors that may inform behavioral choices. Cycling is great.

Eric June 26, 2012 at 4:14 pm

I don’t follow cycling, but as a frequentist statistician, I can say that your description of what a frequentist statistician would do is off the mark. For one thing, I almost never do significance tests. Just as a good Bayesian statistician would do, a good frequentist statistician would assess the evidence logically before analyzing it. (I can’t say how I would approach this question because I don’t know enough about the data.)

But I think your approach has a deep flaw. Lance Armstrong won the Tour de France 7 times, and his abilities are obviously at the extreme, even if he was doping. The extremes of probability distributions are highly variable, and the degree to which he was extreme is very poor evidence of anything except his own abilities. By your logic, anyone who wins a lottery must have been cheating, since, for any given person, it is extremely unlikely that person will win.
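The lottery analogy turns on the difference between the chance that a particular person wins and the chance that somebody wins. A minimal sketch with hypothetical numbers (a draw with 10 million equally likely combinations and 20 million independently, randomly chosen tickets):

```python
def p_specific_ticket_wins(combinations: int) -> float:
    """Chance that one particular ticket matches the draw."""
    return 1.0 / combinations


def p_some_ticket_wins(tickets: int, combinations: int) -> float:
    """Chance that at least one of `tickets` independent random tickets matches."""
    return 1.0 - (1.0 - 1.0 / combinations) ** tickets


if __name__ == "__main__":
    combos, tickets = 10_000_000, 20_000_000  # hypothetical lottery
    print(p_specific_ticket_wins(combos), p_some_ticket_wins(tickets, combos))
```

For any one ticket holder a win is a 1-in-10-million event, yet the chance that at least one ticket wins is about 86%. The same holds at the top of a race: somebody will be at the extreme.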

Alan G June 26, 2012 at 4:52 pm

Very interesting analysis, but the question is (a) misleading and (b) not so amenable to statistical analysis. I agree that it is unlikely that LA is simply a statistical anomaly. (And even more unlikely that he has just gotten lucky with doping tests.) Perhaps he has indeed found some way to “beat the system”, some “trick”. The implication of this article is that he is then using “PES”, which is implicitly equated with illegal PES. Yes, it could be that he has found a way to trick the doping tests. Or it could be that he has found a legal “trick”, a non-banned substance, perhaps a “drug” or perhaps a “nutritional supplement” (there is no clear distinction). Or perhaps he discovered a uniquely effective exercise (which he might not even be aware of, as it was part of a full regimen). The distinction between a legal “trick” and an illegal one can’t be subject to statistical analysis. Or perhaps he has a single gene which takes him from X standard deviations above the mean to X+1. I think there are too many unknown variables to make a judgement like this about a single individual.

mike ward June 26, 2012 at 5:04 pm

They’re all doped!

DJ June 26, 2012 at 8:07 pm

It’s not true that Armstrong “never tested positive”. This line is repeated over and over again as if it were a matter of fact, whereas looking more closely at the history of his testing, the truth is:

- tested positive three times during the 1990s for elevated testosterone
- tested positive for corticosteroids in Stage One of the 1999 Tour de France, used a forged, backdated prescription to claim “Therapeutic Use Exemption”
- blood samples taken from the 1999 Tour de France tested positive for exogenous erythropoietin (EPO).

Those are the proven ones, there are allegations of various other positive drug tests as well.

To claim “never tested positive” requires increasingly absurd semantic wrangling about the definition of “positive test”.
- Tested positive but due to unknown procedural irregularities, “B” sample was not tested to confirm the positive “A” sample. (Is this a “negative test”?)
- Tested positive, but explained it away with a fake prescription for the drug and false claim that the banned substance is “therapeutically necessary” for the athlete. (Is this a “negative test”?)
- Tested positive, but retroactively, because the test did not exist during the period of competition itself. Stored blood samples were positive for EPO. (Is this a “negative test”?)

Joel June 27, 2012 at 10:03 am

yes, DJ’s point.

if there is a probability question here, it is whether we can reject the null that armstrong’s irregular test results were simply coincidence.

d June 27, 2012 at 10:16 am

Enjoyed the article; it would be a useful starting point for an undergrad thesis.
An extra point to consider:

Armstrong’s teams were built around him, with the only objective being the yellow jersey. Most other teams during this period had riders encouraged to get stage wins on their own. Can you factor that in? Does it make a difference?

@DJ The only “official” time Armstrong failed a test was the corticosteroid incident. I’m not an Armstrong apologist, nor am I suggesting he did/didn’t take PEDs, but facts are facts. All the others need to be listed in your “allegations” category.

Roll on the Grand Départ… Liège is a great place for the fun to start!

Martin June 27, 2012 at 2:29 pm

This post is written as if its audience were in the first week of high school stats (or maybe college applied math for non-STEM majors). I doubt there are any readers who don’t know the basic definition of data, or that 1/20 = 0.05. The frequentist-Bayesian intro is simplified to the point of being inaccurate.

The Armstrong doping question is interesting but I don’t see how social scientists can add much, aside from reviewing testing/stats methodology, unless they have a significant biomedical background.

Stan Young August 7, 2012 at 2:25 pm

Two things: new or special compounds, and compounds that block known compounds. I think most of the old testing was for known compounds. The compound had to be an exact match in molecular weight and had to come off a separation column at the right time to be identified. The drug designers would stay one step ahead of the law by making new and slightly different compounds. Each time a compound was identified, a new one would be selected for use. There is chatter about blocking compounds, that is, compounds that would block the detection of a known compound. So far as I know, no real blocker has been found. Discussion of blockers is likely a red herring, used to fool the athlete into thinking he/she is getting an old, trusted compound. A particularly sophisticated drug design was a compound that would break down metabolically very quickly. If you don’t get the sample quickly or don’t know its metabolic breakdown products, you will not detect it. Design of new compounds is relatively easy in the world of medicinal chemistry. There is essentially no safety testing, so you can go straight to market. Passing a drug test is 100% easy if the tester does not know what compound to test for.
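The targeted screening described here, matching a detected substance against a library of known compounds by molecular weight and retention time, can be sketched as follows. Everything in it is hypothetical: the compound names, masses, retention times, and tolerances are placeholders chosen only to show why a slightly modified compound slips through a library match.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    mass: float            # molecular weight (hypothetical values)
    retention_time: float  # minutes on the separation column (hypothetical)


def matches(observed: Signature, library: dict[str, Signature],
            mass_tol: float = 0.01, rt_tol: float = 0.1) -> list[str]:
    """Names of library compounds whose mass AND retention time both match the observation."""
    return [
        name
        for name, ref in library.items()
        if abs(observed.mass - ref.mass) <= mass_tol
        and abs(observed.retention_time - ref.retention_time) <= rt_tol
    ]


if __name__ == "__main__":
    library = {"known_A": Signature(300.00, 5.0), "known_B": Signature(350.00, 7.2)}
    print(matches(Signature(300.005, 5.05), library))  # a known compound: flagged
    print(matches(Signature(314.02, 5.6), library))    # a tweaked analogue: no match
```

A screen of this shape can only report what is already in its library, which is the point of the comment: a new analogue returns an empty list.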

Fil October 17, 2012 at 1:45 pm

I don’t know much about Lance Armstrong but I do know something about the binomial distribution.

Suppose that Lance Armstrong had used performance enhancing drugs during the entirety of his post-cancer career. Suppose also that he was a good but not perfect cheat and that his probability of being detected as a fraud while he was masking drug use was one in a hundred. Finally let us accept Armstrong’s claim that he had been tested five hundred times in his career.

A statistician would ask what the probability of having no positive tests would be under the above circumstances.

Using the binomial distribution, the statistician would conclude that there was a chance of only seven in one thousand of evading detection over that span or, inversely, a chance of ninety-nine point three percent of being caught at least once. Using conventional norms of statistical significance, the statistician would reject the “null hypothesis” of cheating.

Suppose that we lengthen the odds against Mr. Armstrong and say that his cheating was so good that there was only a one in five hundred chance of him being detected on any given test. Then the chance of evading detection in five hundred tests would be almost thirty-seven percent or, inversely, the chance of at least one detection would be sixty-three percent.

While this is smaller than a ninety-nine point three chance of detection, it is still a hair-raising risk for someone with so much riding, as it were, on his reputation as a clean sportsman. When you add a team of at least seven other riders also subject to drug tests, the odds of being discovered in ten years of competition bloat.

To push the analysis to an extreme, if Mr. Armstrong had the genius to evade positive drug tests with 99.9% certainty, that is one in a thousand chance of being discovered on any given trial, over the course of five hundred tests there still would have been a chance of 39% of finding one or more positive outcomes in Mr. Armstrong’s tests, to say nothing of tests on teammates.

As you probably know, statistics “proves” nothing. It just allows us to put a number to our degree of uncertainty. But the above numbers for me do raise the question of why any organization even bothers testing athletes and whether the sports bureaucracy-industry that expends billions on testing should not recuse itself or just resign en masse.
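The binomial figures in this comment can be checked with a few lines of Python. Like the commenter’s calculation, this sketch assumes every test is an independent trial with the same per-test detection probability, and uses the 500-test count.

```python
def p_never_caught(p_detect_per_test: float, n_tests: int) -> float:
    """Binomial P(zero positives in n independent tests) = (1 - p)^n."""
    return (1.0 - p_detect_per_test) ** n_tests


if __name__ == "__main__":
    for p in (1 / 100, 1 / 500, 1 / 1000):
        miss = p_never_caught(p, 500)
        print(f"p={p:.4f}: P(no positives)={miss:.3f}, P(at least one)={1 - miss:.3f}")
```

The output agrees with the comment: roughly a 0.7%, 37%, and 61% chance of no positives at all for per-test detection rates of 1 in 100, 1 in 500, and 1 in 1,000.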
