This is but one small example of why I highly recommend the book Super Crunchers by Ian Ayres. More to come on this excellent book. For now, here is my take on the Democratic primaries.

Today is an important day in the Democratic Presidential Primary season. Barack Obama has won 11 straight primary contests over Hillary Clinton. With primaries being held in four states today (OH, TX, RI and VT), some strategists believe that Senator Clinton needs to win both Ohio and Texas to remain a viable contender for the Democratic nomination. While Senator Clinton appears to be safely ahead in Ohio, recent polling in Texas suggests that Senators Clinton and Obama are in a dead heat there.

The Texas poll indicates that despite Clinton holding a 47 percent to 44 percent lead over Obama, the data are consistent with either candidate winning today by a narrow margin. This claim seems sensible since the poll’s margin of error is +/- 3.4 percentage points. But is it?

For simplicity (the conclusion of the following analysis would be unchanged), let us assume that the nine percent of voters who are undecided split evenly between the two candidates, so that Clinton holds a 51.5 percent to 48.5 percent lead in the polls. The real point of interest is this: what is the probability that Senator Clinton is *actually* leading?
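As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are illustrative, and the even split of undecided voters is the simplifying assumption made above, not something the poll itself reports.

```python
# Poll figures quoted above, in percent.
clinton_poll = 47.0
obama_poll = 44.0

# Everyone not counted for either candidate is treated as undecided.
undecided = 100.0 - clinton_poll - obama_poll

# Assumption: undecided voters split evenly between the two candidates.
clinton_adj = clinton_poll + undecided / 2
obama_adj = obama_poll + undecided / 2

print(f"Undecided: {undecided:.1f}%")
print(f"Clinton: {clinton_adj:.1f}%  Obama: {obama_adj:.1f}%")
```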

To figure this out, it is important to know that the margin of error in a poll is related to what is called a “standard deviation.” A standard deviation is a simple measure of dispersion. It represents how widely data *deviate* from their mean. In a sample of data that follow what is popularly known as the bell curve (statisticians refer to this as a normal distribution), we know that about two-thirds of the observations fall within one standard deviation of the mean and that about 95 percent of the observations fall within two standard deviations of the mean.

For example, suppose we know that for a population of people the average income is $50,000 and the standard deviation is $10,000. Then we can say that two-thirds of the population has incomes within $10,000 of the average, that is, between $40,000 and $60,000. Ninety-five percent of the population would have incomes between $30,000 and $70,000. So, if we were asked for the probability that a person in this population has an income that exceeds $70,000, we would estimate it to be roughly 2.5 percent (because five percent have incomes that fall outside of the $30,000 to $70,000 range, and the bell curve's symmetry puts half of them above $70,000).
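The rules of thumb in this example can be compared with exact bell-curve values. The sketch below uses Python's standard-library `statistics.NormalDist` and assumes incomes are exactly normally distributed, which real incomes are not; it is only meant to illustrate the arithmetic.

```python
from statistics import NormalDist

# Hypothetical population from the example: mean $50,000, standard deviation $10,000.
incomes = NormalDist(mu=50_000, sigma=10_000)

# Share of the population within one and two standard deviations of the mean.
within_one_sd = incomes.cdf(60_000) - incomes.cdf(40_000)
within_two_sd = incomes.cdf(70_000) - incomes.cdf(30_000)

# Share of the population with income above $70,000 (the upper tail).
above_70k = 1 - incomes.cdf(70_000)

print(f"Within one SD:  {within_one_sd:.1%}")
print(f"Within two SDs: {within_two_sd:.1%}")
print(f"Above $70,000:  {above_70k:.1%}")
```

The exact figures come out close to 68 percent, 95 percent, and a little over 2 percent, so the two-thirds and 2.5 percent figures used here are reasonable approximations.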

The margin of error in opinion and other polls represents nothing more than two standard deviations. Thus, in the Texas poll, starting with the 51.5 percent that support Clinton, we can construct a range for her likely real level of support (because the sample results might not be representative of how Texans would actually vote). The margin of error is 3.4 points, so one standard deviation is 1.7 points. There is a 95 percent chance that 48.1 percent to 54.9 percent of likely voters support Clinton. And there is a 67 percent chance that 49.8 percent to 53.2 percent of likely voters support Clinton.
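The two ranges follow directly from the poll's margin of error. A minimal sketch, assuming (as the text does) that the margin of error equals two standard deviations; the variable names are illustrative:

```python
support = 51.5         # Clinton's share after splitting undecided voters, in percent
margin_of_error = 3.4  # reported by the poll, in percentage points

# Assumption from the text: the margin of error is two standard deviations.
std_dev = margin_of_error / 2

# Range covering roughly 95 percent of the bell curve (two standard deviations).
two_sd_range = (round(support - margin_of_error, 1), round(support + margin_of_error, 1))

# Range covering roughly two-thirds of the bell curve (one standard deviation).
one_sd_range = (round(support - std_dev, 1), round(support + std_dev, 1))

print("Two standard deviations:", two_sd_range)
print("One standard deviation: ", one_sd_range)
```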

It should be clear then that the race in Texas, according to this poll, is anything but a dead heat. Since there is a 67 percent chance that Clinton’s actual support in Texas is between 49.8 and 53.2 percent, there is a 33 percent chance that her actual support falls outside this range, either above or below. Since the bell curve is symmetric about its peak, there is just a 16.5 percent chance that Clinton’s support in Texas is less than 49.8 percent. Because 50 percent sits only slightly above 49.8 percent, the chance that her support is below 50 percent is only a little higher. Using a little bit of rounding, the polling data suggest that there is over an 80 percent chance that Senator Clinton is leading Senator Obama in the Texas primary.
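The same conclusion can be checked without rounding. The sketch below is an illustration under the assumptions used in this post, not the pollster's method: Clinton's true support is treated as normally distributed around 51.5 percent with a standard deviation of half the margin of error, and in a two-candidate race "leading" means support above 50 percent.

```python
from statistics import NormalDist

# Assumed distribution of Clinton's true support, in percent.
support = NormalDist(mu=51.5, sigma=3.4 / 2)

# Chance her support is below the one-standard-deviation lower bound.
p_below_one_sd = support.cdf(49.8)

# Chance she is actually ahead, i.e. her true support exceeds 50 percent.
p_leading = 1 - support.cdf(50.0)

print(f"P(support < 49.8%): {p_below_one_sd:.1%}")
print(f"P(Clinton leading): {p_leading:.1%}")
```

Under these assumptions the exact normal calculation puts the probability that Clinton is leading at roughly 81 percent, in line with the "over 80 percent" figure reached by rounding.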