Saturday, April 23, 2011

The Statistical Singleton Fallacy

Applying statistics to an individual could be called the statistical singleton fallacy: people (almost universally) take statistics that describe a population and inappropriately apply them to individuals. Statistics are only valid over populations (in the general sense). As the number of "things" in a population diminishes to 1, the confidence interval widens toward the full range of possible outcomes, i.e. you cannot apply statistical conclusions to a singleton.
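As a rough numerical illustration of how the interval behaves as the population shrinks, here is a minimal sketch using the Wilson score interval for a proportion. The 20% rate, the sample sizes, and the function names are assumptions chosen for illustration only; nothing here comes from a specific study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def interval_widths(rate=0.2, sizes=(10000, 1000, 100, 10, 1)):
    """Map each population size to the width of its interval.

    `rate` is a hypothetical event rate, not a measured one.
    """
    widths = {}
    for n in sizes:
        successes = round(rate * n)  # with n=1 this rounds to 0 events
        low, high = wilson_interval(successes, n)
        widths[n] = high - low
    return widths


if __name__ == "__main__":
    for n, width in interval_widths().items():
        print(f"n={n:>6}: interval width = {width:.3f}")
```

Under these assumptions the interval is a sliver for ten thousand "things" and covers most of the 0% to 100% range for a single one, which is the sense in which the population statistic says almost nothing about the singleton.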

A simple graphic illustration of this is smoking. A smoker's probability of dying of a smoking-related disease before age 65 is 15.6%[1]. However, my probability of dying of a smoking-related disease (assuming I smoked) is either 0% (I don't die of a smoking-related disease) or 100% (I die).
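The gap between the population figure and the individual outcome can be sketched with a small simulation. This assumes each smoker's outcome is an independent draw at the 15.6% rate quoted above, which is a simplification; the function name and the population size are hypothetical.

```python
import random


def simulate_smokers(n, rate=0.156, seed=0):
    """Return a list of 0/1 outcomes, one per simulated smoker.

    1 means the smoker dies of a smoking-related disease before 65,
    0 means he does not. Independence between smokers is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < rate else 0 for _ in range(n)]


if __name__ == "__main__":
    outcomes = simulate_smokers(100_000)
    population_rate = sum(outcomes) / len(outcomes)
    print(f"population rate: {population_rate:.3f}")
    # Every individual entry is 0 or 1; no single smoker is "15.6% dead".
    print(f"distinct individual outcomes: {sorted(set(outcomes))}")
```

The population rate lands close to the input rate, while every individual entry in the list is either 0 or 1.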

People don't understand why some people smoke when there is such clear evidence of the increased probability of death due to smoking-related disease. Well, for each smoker, the probability is either 0% or 100%. If the smoker believes his probability is 0%, he will continue to smoke. If he believes his probability is 100%, he is a hypochondriac.

Thus, smokers either believe they are untouchable or they are crazy.

This is why it is so hard to sell "it's good for you" things... they are almost invariably statistically good for a large population, but can make no guarantees of "goodness" when applied to a singleton.

This applies in spades to health-anything:

  • Individual's health: Lose weight and exercise - it's good for you... but then they advise you talk to your doctor first to make sure the exercise won't kill you before it becomes good for you.
  • Program's health: testing is not guaranteed to find any bugs - if it were, running the tests a second and third time would always find more bugs.

[1] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1646951/?page=3