13.1 Weighing the Evidence
Probability is a tool we can use to mathematically describe the outcome of weighing the evidence for and against a claim. Probabilities can be obtained from theoretical models, statistical generalizations, or subjective judgments.
13.1.1 What is Probability?

The probability of rolling double sixes is 1 in 36.
Evaluating the Premises
Most of the evidence we have for any premise falls short of certainty. All of the sources of evidence we have looked at are capable of failure. Sometimes we misperceive things. Sometimes testimony is deceptive. Sometimes memory fails us. Inductive arguments do not guarantee the truth of their conclusion. An inference to the best explanation is not always an inference to the correct explanation.
If we are going to function, then, we need to be willing to act under uncertainty. Probability is the best we can do. For instance, although the postal service is fairly reliable, there is no guarantee that the mail will arrive as it is supposed to. When I mail a letter, I think it is highly probable, but not certain, that it will arrive at the place to which I sent it. It is possible that the letter will not arrive, but that is far less likely than its arriving. I can say that it is probable that the letter will arrive, and that is enough for me to bother mailing it. Some evidence makes a conclusion much more likely to be true; other evidence raises the probability of a conclusion only slightly; but all the evidence we have informs us in some way about probability.
Representing Probabilities Mathematically
We can think of probability as how likely it is that something is (or will be) true, given a particular body of evidence. Using numbers between 0 and 1, we can express probabilities numerically. For example, if I have a full deck of cards and pick one at random, what is the probability that the card I pick is a queen? Since there are 52 cards in the deck, and only four of them are queens, the probability of picking a queen is 4/52, or .077. That is, I have about a 7.7% chance of picking a queen at random. In comparison, my chances of picking any “face” card would be much higher. There are three face cards in each suit and four different suits, which means there are 12 face cards total. So, 12/52 = .23 or 23%. In any case, the important thing here is that probabilities can be expressed numerically.
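Since these probabilities are just ratios, the arithmetic can be confirmed with a short sketch in Python (our own illustration; the text itself prescribes no particular tool):

```python
# A minimal sketch of the card calculations above, using the
# fractions module so the exact ratios stay visible.
from fractions import Fraction

deck_size = 52
queens = 4
face_cards = 12  # jack, queen, king in each of four suits

p_queen = Fraction(queens, deck_size)
p_face = Fraction(face_cards, deck_size)

print(p_queen, float(p_queen))  # 1/13, about 0.077
print(p_face, float(p_face))    # 3/13, about 0.231
```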
In using a numerical scheme to represent probabilities, we take 0 to represent an impossible event (such as a contradiction) and 1 to represent an event that is certain (such as a tautology). In a valid deductive argument, if we were certain of the premises, then we could be certain of the conclusion.
Calculating the probability of negations is simply a matter of subtracting the probability that some event, say event a, will occur from 1. The result is the probability that event a will not occur:
P(~a) = 1 – P(a)
For example, suppose I am playing Monopoly and I want to determine the probability that I do not roll a 12 (since if I roll a 12 I will land on Boardwalk, which my opponent owns with hotels). Since the probability of rolling a 12 (double sixes) is 1/36, or about .028, we can calculate the probability of not rolling a 12 thus:
1 – .028 = .972
Thus, I have a 97.2% chance of not rolling a 12, so it is highly likely that I won’t.
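The same complement rule can be checked in a few lines; the sketch below assumes two fair dice, so only one of the 36 equally likely outcomes sums to 12:

```python
from fractions import Fraction

# Theoretical probability of rolling a 12 (double sixes) with two fair dice.
p_twelve = Fraction(1, 36)
p_not_twelve = 1 - p_twelve  # P(~a) = 1 - P(a)

print(float(p_twelve))      # about 0.028
print(float(p_not_twelve))  # about 0.972
```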
Theoretical Probability
How exactly do we determine the probability of an event? There are three basic approaches. The first, and the one we have used so far on this page, is theoretical probability: the probability in principle, assuming we have no evidence beyond a set of equally weighted options and the certainty that exactly one of them will occur. The theoretical probability of one option being chosen out of n possible options is always 1 ÷ n. Theoretical probabilities are easy to calculate, and so they make for good logic text examples; they apply to games of chance, or other cases where the options are precisely known and the selection is random; ‘random’, that is, in the sense that we have no evidence to think one option is more likely to happen than another.
In real life, however, we always have more evidence than theoretical probabilities take into account, and the way in which we divide up options can look arbitrary. For instance, what is the probability that, when I go to the ATM and try to withdraw $100, I will get exactly $100? The probability ought to be very high, because ATMs are very reliable. But suppose there are two options:
(a) I get exactly $100
(b) I don’t get exactly $100
Then the theoretical probability of my getting $100 is only 50%. It gets worse if we allow 3 options:
(a) I get $100
(b) I get something else
(c) I don’t get anything
Now, the probability is 33.3%. We could allow 5 options:
(a) I get exactly $100
(b) I get more than $100
(c) I get between $1 and $100
(d) I get something other than money
(e) I get nothing
Obviously, the probability of my getting exactly $100 should not be 20%; if it were, then ATMs would be like slot machines. Probabilities in the real world are calculated with far more evidence about how often certain options do or do not occur, including a lot of background knowledge about what the roughly equally probable options are. Theoretical probabilities work only in highly constrained cases. So, there are two better ways we can make use of our evidence to calculate probabilities in the real world: statistical probability, from the frequency with which an event is observed in the world, and subjective probability, based on human estimates. We’ll look at each method in the following pages.
13.1.2 Statistical Probability

This data is randomly generated, but it is easy to assume that there are patterns in it.
Frequency Probability
An alternative way to derive the probability of an event is through its observed frequency. Statistical generalizations can support claims about probability. This method relies on inductive reasoning: given that similar events have occurred n% of the time in the past, we conclude there is an n% probability of the event occurring this time. For example, if on 70% of the past occasions when Jeff came home late from work his breath reeked of donuts he bought on the way home, and Jeff is late from work again today, then there is a 70% likelihood that Jeff’s breath will reek of donuts again, all other things being equal.
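A frequency-based estimate is just a ratio of observed outcomes to observed trials. In the sketch below, the record of Jeff’s late days is invented purely for illustration:

```python
# Hypothetical records: for each past day Jeff came home late,
# True means his breath reeked of donuts.
late_day_donuts = [True, True, False, True, True,
                   False, True, True, False, True]

p_donuts = sum(late_day_donuts) / len(late_day_donuts)
print(p_donuts)  # 0.7, i.e., a 70% frequency-based estimate
```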
The phrase “all other things being equal” is important, of course. If, for instance, we know that Jeff is late today for an unusual reason (like getting a filling at the dentist) then this may reduce the likelihood that Jeff will come home with donut breath. A statistical sample must be relevantly similar, or representative, before we draw probabilistic conclusions from it.
Representative Samples
There are two conditions that any statistical generalization must meet to be representative:
1. Adequate sample size: the sample size must be large enough to support the generalization.
2. Non-biased sample: the sample must not be biased.
A “sample” is simply a portion of a population. A population is the totality of members of some specified set of objects or events. For example, if I were determining the relative proportion of cars to trucks that drive down my street on a given day, the population would be the total number of cars and trucks that drive down my street on a given day. If I were to sit on my front porch from 12-2 pm and count all the cars and trucks that drove down my street, that would be a sample.
A good statistical generalization is one in which the sample is representative of the population. When a sample is representative, the characteristics of the sample match the characteristics of the population at large. For example, my method of sampling cars and trucks that drive down my street would be a good method as long as the proportion of trucks to cars that drove down my street between 12-2 pm matched the proportion of trucks to cars that drove down my street during the whole day. If for some reason the number of trucks that drove down my street from 12-2 pm was much higher than the average for the whole day, my sample would not be representative of the population I was trying to generalize about (i.e., the total number of cars and trucks that drove down my street in a day).
The “adequate sample size” condition and the “non-biased sample” condition are ways of making sure that a sample is representative. In the rest of this section, we will explain each of these conditions in turn. It is perhaps easiest to illustrate these two conditions by considering what is wrong with statistical generalizations that fail to meet one or more of these conditions.
Inadequate Sample Size
First, consider a case in which the sample size is too small (and thus the adequate sample size condition is not met). If I were to sit in front of my house for only fifteen minutes, from 12:00-12:15, and saw only one car, then my sample would consist of only one automobile, which happened to be a car. If I were to try to generalize from that sample, then I would have to say that only cars (and no trucks) drive down my street. But the evidence for this universal statistical generalization (i.e., “every automobile that drives down my street is a car”) is extremely poor, since I have sampled only a very small portion of the total population (i.e., the total number of automobiles that drive down my street). Taking this sample to be representative would be like going to Flagstaff, AZ for one day and saying that since it rained there on that day, it must rain every day in Flagstaff. Inferring such a generalization commits an informal fallacy called “hasty generalization.”
One commits the fallacy of hasty generalization when one infers a statistical generalization (either universal or partial) about a population from too few instances of that population. Hasty generalization fallacies are very common in everyday discourse, as when a person gives just one example of a phenomenon occurring and implicitly treats that one case as sufficient evidence for a generalization.
Biased Sample
The non-biased sample condition may fail to be met even when the adequate sample size condition is met. For example, suppose that I count all the automobiles on my street for a three-hour period from 11 am-2 pm during a weekday. Let’s assume that counting for three hours straight gives us an adequate sample size. However, suppose that during those hours (lunch hours) there is a much higher proportion of trucks to cars, since (let’s suppose) many work trucks are coming to and from worksites during those lunch hours. If that were the case, then my sample, although large enough, would not be representative because it would be biased. In particular, the proportion of trucks to cars in the sample would be higher than in the overall population, which would make the sample unrepresentative of the population (and hence biased).
Another good way of illustrating sampling bias is by considering polls. Consider candidate X, who is running for elected office, strongly supports gun rights, and is the candidate of choice of the NRA. Suppose an organization runs a poll to determine how candidate X is faring against candidate Y, who actively opposes gun rights. But suppose that the organization administers the poll by polling subscribers to a hunting and fishing magazine. Suppose the poll returned over 5,000 responses, which, let’s suppose, is an adequate sample size, and that out of those responses, 89% favored candidate X. If the organization were to take that sample to support the statistical generalization that “most voters are in favor of candidate X,” they would have made a mistake. We would expect that subscribers to a hunting magazine would have a much higher percentage of gun rights activists than the general population to which the poll is attempting to generalize. In that case, the sample would be unrepresentative and biased, and thus the poll would be useless.
Random sampling is a common sampling method that attempts to avoid any kinds of sampling bias by making selection of individuals for the sample a matter of random chance (i.e., anyone in the population is as likely as anyone else to be chosen for the sample). The basic justification behind the method of random sampling is that if the sample is truly random (i.e., anyone in the population is as likely as anyone else to be chosen for the sample), then the sample will be representative. The trick for any random sampling technique is to find a way of selecting individuals for the sample that doesn’t create any kind of bias, while also obtaining a large enough sample size to be representative.
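A simple random sample is easy to sketch in code. The population below is invented (1,000 automobiles, 80% cars and 20% trucks); the point is that `random.sample` gives every member the same chance of selection:

```python
import random

# Invented population: 800 cars and 200 trucks.
population = ["car"] * 800 + ["truck"] * 200

# Simple random sample: every automobile is equally likely to be chosen.
sample = random.sample(population, 100)

# The sample proportion should hover near the true 0.20,
# varying a bit from run to run purely by chance.
print(sample.count("truck") / len(sample))
```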
13.1.3 Statistical Significance

Tests with a small sample size make it harder to rule out chance.
Sample Size
Suppose you want to take a random sample of students at the University, to determine whether they prefer dining hall A or dining hall B. How many students should you ask?
Obviously, asking 10 students would be too few, because you might randomly happen to ask only students who like dining hall A. On the other hand, asking half of the students at the university would be far more than you need, since you’d expect a pattern to show up in a random sample long before that. So, how many do you ask?
Two Types of Errors
There are two ways a small sample size might cause problems. These are defined in terms of the null hypothesis, the hypothesis that there is no difference or no relationship. This might be very simple: the null hypothesis in our case might be that dining hall A and dining hall B are equally well liked. The null hypothesis might instead be more complicated: the null hypothesis might be that there is no relationship between dining hall preferences and some other factor, like being a freshman versus a junior or senior, or like being introverted or extroverted.
There are two possible errors:
- Type I Error = finding a difference that isn’t there, or rejecting the null hypothesis even though it is true. In other words, concluding there is a difference in dining hall preferences, when in fact there is no difference or relationship. This error would be due to the effects of chance on our random sample: ‘noise’, or sampling outliers who were not typical of the population.
- Type II Error = not finding a difference which is there, or not rejecting the null hypothesis even though it is false. In other words, not concluding that there is a difference in dining hall preferences, when in fact there is a difference. This error would be due to the difference being too subtle to detect: we say the sample wasn’t powerful enough to detect the difference.
The question of what our sample size should be depends on how much risk of each type of error we are willing to accept.
Step One: Estimate Variability (Proportions)
Variability is the degree to which the attributes or concepts being measured in the questions are distributed throughout the population. A heterogeneous population, divided more or less 50%-50% on an attribute or a concept, will be harder to measure precisely than a homogeneous population, divided, say, 80%-20%. Therefore, the more variability you expect in the distribution of a concept within your target audience, the larger the sample size must be to obtain the same level of precision. To come up with an estimate of variability, simply take a reasonable guess at the size of the smaller attribute or concept you’re trying to measure. If you estimate that 25% of the population in your county farms organically and 75% does not, then your variability would be .25… If variability is too difficult to estimate, it is best to use the most conservative figure of 50%.
In our example, suppose we estimate that 40% prefer A and 60% prefer B. Taking the smaller proportion, the variability would then be .40. On the other hand, suppose we estimate the proportion is 80% to 20%; the variability would then be .20. These are, of course, estimates made before gathering results, but the closer we expect the split to be to 50-50, the greater the variability, and the larger the sample size needed.
Step Two: Choose Precision (Confidence Interval)
The degree of precision or margin of error is widely known as the confidence interval. It is the closeness with which the sample predicts where the true values in the population lie. The difference between the sample and the real population is called the sampling error. If the sampling error is ±3%, this means we add or subtract 3 percentage points from the value in the survey to estimate the actual value in the population. For example, if a survey says that 65% of farmers use a particular pesticide, and the sampling error is ±3%, we know that in the real-world population, between 62% and 68% are likely to use this pesticide. This range is also commonly referred to as the margin of error. The level of precision you accept depends on balancing accuracy and resources. High levels of precision require larger sample sizes and higher costs to achieve those samples, but high margins of error can leave you with results that aren’t much more meaningful than human estimation.
If we allow for a very large margin of error, like ±20%, then our results are not very reliable. On the other hand, if the difference in dining hall preferences were only ±3% anyway, we probably wouldn’t care if we failed to detect that difference. It wouldn’t be an important difference, so we’d accept the risk of error within that range. The more precise we want the results, the larger the sample size must be.
Step Three: Choose Confidence Level
The confidence level concerns the risk you’re willing to accept that your sample misses the true population value. A confidence level of 90% means that, were the population sampled 100 times in the same manner, 90 of these samples would have the true population value within the range of precision (the confidence interval) specified earlier, and 10 would be unrepresentative samples. Higher confidence levels require larger sample sizes… If the confidence level chosen is too low, results will be “statistically insignificant”.
In other words, suppose we are willing to accept a 1 in 20 risk that our sample falls outside the margin of error we selected in step two. We’d then want a 95% confidence level that any difference between the sampled population and the real population is within the margin of error. The greater the confidence level desired, the larger the sample size.
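Putting the three steps together: one standard formula for the minimum sample size needed to estimate a proportion is n = z² × p(1 - p) / e², where p is the estimated variability, e the margin of error, and z the score for the chosen confidence level (1.96 for 95%, about 2.58 for 99%). The sketch below assumes this formula; published calculators may also apply a correction for small populations:

```python
import math

def min_sample_size(p, margin, z):
    """Minimum sample size to estimate a proportion p to within
    +/- margin, at the confidence level corresponding to z."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Conservative variability (p = 0.5), +/-5% margin, 95% confidence:
print(min_sample_size(0.5, 0.05, 1.96))  # 385
```

Note how the conservative 50% variability figure maximizes p(1 - p) and therefore the required sample size, which is exactly why it is the safe default.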
Statistical Significance
Suppose that our survey finds that people prefer dining hall A over B by 75% to 25%. This is a difference of 50 percentage points. Suppose the difference is greater than the margin of error of ±5%. Is this enough evidence to reject the null hypothesis, where the null hypothesis is the claim that there is no difference? It depends on whether the difference is statistically significant.
If the sample size was large enough, so that we reach our desired confidence level (say, 99%) that the difference is due to a real difference rather than chance, then the result is “statistically significant”, and it is safe to reject the null hypothesis. On the other hand, even if the results were very dramatic, if the sample size was too small, so that we can only be 70% or 80% sure the results aren’t due to chance, that may not be enough to reject the null hypothesis, and the result is “statistically insignificant”, meaning we’d need to do another study with a larger sample size to rule out Type I error.
Knowing the results of a study are statistically significant gives us reason to reject the null hypothesis. It means we have enough evidence to be reasonably certain that there is some relationship between the variables studied.
Statistical significance doesn’t mean that the difference is practically significant, though. Suppose that our survey found that 52% of students preferred dining hall A over B. That 2% difference could be statistically significant, provided that the sample size was very large, and so we could still reject the null hypothesis that there was no relationship. Still, a 2% difference would probably not be significant enough to take any sort of practical action, like renovating dining hall B.
The Insignificance of Statistical Insignificance
Unfortunately, many people do not realize that finding a statistically insignificant difference is not evidence of no significant difference. The fact that the only difference shown in a study is a “statistically insignificant” difference does not mean that there is not a difference, or not a practically significant difference. It only means one thing: the sample size in the study was too small to rule out that the differences found were due to chance. It does not mean that the null hypothesis is true; it only means that the null hypothesis hasn’t been proven false.
For example, suppose that you polled 5 people, and found that 80% of them preferred dining hall A over B. That looks like a significant difference! It isn’t statistically significant, though, because the sample size was too small to rule out the effects of chance. That doesn’t mean you’ve shown there’s no relationship; rather, it means that you’d need a larger sample size to show that there is a relationship.
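A quick simulation makes this vivid. Under the null hypothesis that preferences are really 50/50, splits at least as lopsided as 4-to-1 show up in five-person polls quite often by chance alone (the simulation below is our own illustration):

```python
import random

trials = 100_000
lopsided = 0
for _ in range(trials):
    # Poll 5 people under the null hypothesis: a 50/50 coin flip each.
    votes_for_a = sum(random.random() < 0.5 for _ in range(5))
    # Count splits at least as lopsided as 4-1, in either direction.
    if votes_for_a >= 4 or votes_for_a <= 1:
        lopsided += 1

print(lopsided / trials)  # about 0.375: chance alone does this often
```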
The next time you read a news headline that says, “No statistically significant relationship between cell phone use and brain cancer!”, or “No statistically significant relationship between seat belt use and injuries in auto accidents!”, or “No statistically significant relationship between organic foods and health!”, realize the headline might just as well say, “Researchers perform study with a sample size too small to be meaningful.”
The Power of a Test
Can we use statistical data to decide not to reject the null hypothesis, that is, to conclude that there is probably no relationship between two variables? Statistical significance, which has to do with the possibility of Type I error (finding a relationship which isn’t there), can’t do that on its own. Instead, what we need is something that tells us the possibility of Type II error: the risk of failing to find a relationship that is there. This is called the power of a test. There are a variety of ways to calculate it, but for our purposes it is a function of (a) the sample size, (b) the confidence level, and (c) how big the difference or relationship between the two variables is in the real world.
The idea is that, if there were in the real world a big difference in dining hall preferences, we would be able to detect that even with a small sample, whereas if there is only a small or subtle difference, we would only be able to detect that with a large sample.
Sample Size Calculator
So, return to the original question: how large should your sample size be? Well, now that you know all the factors involved in the calculation, and why they each matter, you can use a chart or an online calculator to determine the optimal sample size for your project. Here is one, though with a little searching you can easily find your own.
Sample Size Calculator: simply input your desired confidence level, estimated proportion (variability), desired confidence interval (margin of error), and the total population size, and the calculator will tell you the minimum sample size needed.
13.1.4 Subjective Probability

When is it rational to bet on a horse race?
Subjective Probability
So far, we have been studying probabilities based on how frequently an event occurs, as measured through statistics. The idea is that a good, objective way to evaluate how likely an event is to happen is how often it actually occurs. This is known as a frequentist view of probability.
What do we do, though, when an event is unprecedented? For instance, suppose that two teams are playing against one another in the Super Bowl, or that many horses are competing in the Kentucky Derby. What is the probability of either team winning, or of each horse winning? We can’t measure this statistically, because each Super Bowl and each Kentucky Derby is run only once, with different teams, players, or horses each time. There are few useful statistics, because the most important features of each event will happen only once. Still, clearly one team can be strongly favored to win, or one horse can be a long shot. People place bets on sporting events all of the time. So there is still a meaningful sense of probability even for events which happen only once.
We call this subjective probability; it is also sometimes called Bayesian probability, after the Reverend Thomas Bayes, whose work on probability in the 18th century laid the mathematical foundations for it. Subjective probabilities measure the likelihood of an event based on our own subjective estimate of how likely it seems to us, or how willing we would be to bet on it compared to other events.
Coherence
The probabilities we assign must be coherent. If we are 40% certain of A, we can’t also be 60% certain of A. If we think A is twice as likely as B, then we have to think B is half as likely as A. If we think A is more probable than B, and B is more probable than C, then we’d better think A is more probable than C. A simple way to describe coherence is that it means we assign every possible outcome a probability between 0 and 100% (making sure we don’t double-count any outcomes), and that all of the possible outcomes, added together, come to exactly 100%. For instance, suppose 4 horses are running in the race. We could assign the probabilities in any manner that adds up to 100%, such as 25, 25, 30, and 20, or 15, 15, 10, and 60. But we couldn’t assign probabilities that added up to less than 100% or more than 100%.
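Coherence is mechanical enough to check in code. The horses and numbers below are invented; the test is just that every credence lies between 0 and 1 and that the credences over all the outcomes total exactly 100%:

```python
# Hypothetical credences over the four horses in the race.
credences = {"Pegasus": 0.25, "Bucephalus": 0.25,
             "Rocinante": 0.30, "Trigger": 0.20}

# Each probability must lie between 0 and 1...
assert all(0 <= p <= 1 for p in credences.values())
# ...and together they must sum to exactly 1 (100%).
assert abs(sum(credences.values()) - 1.0) < 1e-9

print("coherent")
```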
The probabilities also have to be honest: you can’t just say “I’m 100% sure!” because you feel confident today. Someone’s beliefs about how probable events are will be consistently reflected in their behavior. For instance, any bet that looks favorable given our stated probabilities is one we rationally ought to take. If we really think Pegasus has a 20% chance of winning, and somebody offers us $1 if Pegasus wins in exchange for less than 20 cents, then we ought to take the bet. On the other hand, we shouldn’t accept the same bet if it costs more than 20 cents. Why not? Well, if there really is a 20% chance of Pegasus winning, we can imagine that Pegasus wins 1 out of every 5 races. If we take the bet at 20 cents, then we’ll break even over 5 races: we’ll have paid 20 cents five times and won $1 once. But if we take the bet at 25 cents, we’ll lose money: we’ll have paid a total of $1.25 and gotten back only $1. On the other hand, if we take the bet at 10 cents, we’ll make a profit: we’ll have paid only 50 cents, but received $1 in winnings.
(Note: saying this bet is “rational” assumes that every dollar is equally valuable to every other dollar; for instance, that the person has some savings and doesn’t need that dollar to meet a basic need; obviously, it isn’t rational to bet money required for basic needs).
Calculating Subjective Probability
Suppose that you know what bets a person (including yourself) is willing to accept. Assuming the person is rational, you can use that information to determine what they believe the subjective probabilities are. Let’s suppose that Danny accepts or rejects the following bets on the Pittsburgh Steelers winning their next game.
| Accepts? | Cost / Payout | Cost per Dollar Payout | Percentage |
|----------|---------------|------------------------|------------|
| Rejects | $0.70 / $1 | $0.70 | < 70% |
| Rejects | $15 / $20 | $0.75 | < 75% |
| Accepts | $0.20 / $2 | $0.10 | > 10% |
| Accepts | $0.25 / $0.50 | $0.50 | > 50% |
| Rejects | $6 / $10 | $0.60 | < 60% |
We calculate cost per dollar payout by dividing the cost by the payout. We can then translate cents into percentages, reading “accepting” a bet as considering the Steelers winning to be more likely than that percentage, and “rejecting” a bet as considering the Steelers winning to be less likely than that percentage. The highest cost-per-dollar payout Danny accepted was 50 cents, and the lowest cost-per-dollar payout Danny rejected was 60 cents. This means Danny believes the probability of the Steelers winning is between 50% and 60%.
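The same bookkeeping can be sketched in a few lines; the bets are the ones from the table above:

```python
# (decision, cost, payout) for each bet offered to Danny.
bets = [("reject", 0.70, 1.00),
        ("reject", 15.00, 20.00),
        ("accept", 0.20, 2.00),
        ("accept", 0.25, 0.50),
        ("reject", 6.00, 10.00)]

accepted = [cost / payout for d, cost, payout in bets if d == "accept"]
rejected = [cost / payout for d, cost, payout in bets if d == "reject"]

# Accepting a bet signals credence above its cost-per-dollar ratio;
# rejecting signals credence below it.
lower, upper = max(accepted), min(rejected)
print(f"Danny's credence is between {lower:.0%} and {upper:.0%}")  # 50% and 60%
```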
13.1.5 Fallacies and Probability

Don’t fall into the trap of the Gambler’s Fallacy.
Fallacies and Probability
We’ve looked in this submodule at what probability is and at ways to calculate probabilities: the theoretical method, the frequentist method (through statistics), and the subjective method. In the submodules to come, we’ll look more closely at how logical operators like and, or, and not apply to probabilities. We should conclude this submodule, though, with consideration of three very common fallacies in how people think about probabilities.
Base Rate Neglect
Consider the following scenario. You go in for some testing for some health problems you’ve been having and after a number of tests, you test positive for colon cancer. What are the chances that you really do have colon cancer?
Let’s suppose that the test is not perfect, but it has a 5% rate of false positives (people who test positive but don’t have colon cancer) and a 5% rate of false negatives (people who test negative but do have colon cancer). The test is 95% accurate. That is, for those who really do have colon cancer, the test will detect the cancer 95% of the time, and for those who do not have colon cancer, the test will wrongly diagnose them as having it 5% of the time.
Many people would be inclined to say that, given the test and its accuracy, there is a 95% chance that you have colon cancer. However, if you are like most people and are inclined to answer this way, you are wrong. In fact, you have committed the fallacy of ignoring the base rate (i.e., the base rate fallacy).
The base rate in this example is the rate of those who have colon cancer in a population. There is a very small percentage of the population that actually has colon cancer (let’s suppose it is .005 or .5%), so the probability that you have it must take into account the very low probability that you are one of the few that have it. That is, prior to the test (and not taking into account any other details about you), there was a very low probability that you have it—that is, a half of one percent chance (0.5%). The test is 95% accurate, but given the very low prior probability that you have colon cancer, we cannot simply now say that there is a 95% chance that you have it. Rather, we must temper that figure with the very low base rate.
Here is how we do it. Let’s suppose that our population is 100,000 people. Of those people, 500 have colon cancer (0.005), and 99,500 don’t. If we were to apply the test to that whole population, then it would identify 95% of those with cancer as having it (500 x .95 = 475). It would also identify 5% of those without cancer as having it (99,500 x .05 = 4,975). That means there would be 25 false negatives, and 4,975 false positives.
A false positive occurs when a test registers that some feature is present, when the feature isn’t really present. 4,975 people would register as having cancer who don’t have it. A false negative occurs when a test fails to register that a feature is present, even when it is. So, 25 people would register as being free of the cancer who in fact have it.
Now, what you want to know is the probability that you are one who tested positive and actually has colon cancer rather than one of the false positives. And what is the probability of that? It is simply the number of people who actually have colon cancer (500), multiplied by the accuracy of the test (500 x .95 = 475), and then divided by the number that the test would identify as having colon cancer. This latter number includes those the test would misidentify (4,975) as well as the number it would accurately identify (475)—thus the total number the test would identify as having colon cancer would be 5,450. (That is, 4975+475).
So the probability that you have cancer, given the evidence of the positive test is 8.7% (i.e., 475/5450).
Thus, contrary to our initial reasoning that there was a 95% chance that you have colon cancer, the chance is only a tenth of that—it is less than 10%! In thinking that the probability that you have cancer is closer to 95% you would be ignoring the base rate of the probability of having the disease in the first place (which, as we’ve seen, is quite low). This is the signature of any base rate fallacy.
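The whole calculation can be reproduced in a few lines, using the same numbers as above:

```python
population = 100_000
base_rate = 0.005       # 0.5% of the population has colon cancer
sensitivity = 0.95      # the test detects 95% of real cases
false_pos_rate = 0.05   # and wrongly flags 5% of healthy people

sick = population * base_rate                           # 500
true_positives = sick * sensitivity                     # 475
false_positives = (population - sick) * false_pos_rate  # 4,975

p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive, 3))  # 0.087, i.e., about 8.7%
```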
The Gambler’s Fallacy
The gambler’s fallacy occurs when one thinks that independent, random events can be influenced by each other. For example, suppose I have a fair coin and I have just flipped 4 heads in a row. Erik, on the other hand, has a fair coin that he has flipped 4 times and gotten all tails. We are each taking bets that our next flip will be heads. With whom should you place your bet?
If you are inclined to say that you should place the bet with Erik since he has been flipping all tails and since the coin is fair, the flips must even out soon, then you have committed the gambler’s fallacy. The fact is, each flip is independent of the next, so the fact that I have just flipped 4 heads in a row does not increase or decrease my chances of flipping a head. Likewise for Erik. It is true that as long as the coin is fair, then over a large number of flips we should expect that the proportion of heads to tails will be about 50/50. But there is no reason to expect that a particular flip will be more likely to be one or the other. Since the coin is fair, each flip has the same probability of being heads and the same probability of being tails—50%.
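Independence is easy to test by simulation. The sketch below flips a fair coin a million times and checks whether, immediately after a run of four heads, the next flip is any less likely to come up heads:

```python
import random

trials = 1_000_000
flips = [random.random() < 0.5 for _ in range(trials)]

runs_of_four, heads_after_run = 0, 0
for i in range(4, trials):
    if all(flips[i - 4:i]):       # the previous four flips were all heads
        runs_of_four += 1
        heads_after_run += flips[i]

print(heads_after_run / runs_of_four)  # hovers around 0.5, run or no run
```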
The Small Numbers Fallacy
Suppose a study showed that of the 3,141 counties of the United States, the incidence of kidney cancer was lowest in those counties which are mostly rural, sparsely populated, and located in traditionally Republican states. (In fact, this is true.) What accounts for this interesting finding? Most people would be tempted to look for a causal explanation—to look for features of the rural environment that account for the lower incidence of cancer. However, they would be wrong (in this case) to do so.
It is easy to see why once we consider the counties that have the highest incidence of kidney cancer: they are also counties that are mostly rural, sparsely populated, and located in traditionally Republican states! So whatever it was you thought might account for the lower cancer rates in rural counties can’t be the right explanation, since these counties also have the highest rates of cancer. It is important to understand that it isn’t the same counties that have the highest and lowest rates—for example, county X doesn’t have both a high and a low cancer rate (relative to other U.S. counties). That would be a contradiction (and so can’t possibly be true). Rather, what is the case is that counties that have the highest kidney cancer rates are “mostly rural, sparsely populated, and located in traditionally Republican states” but also counties that have the lowest kidney cancer rates are “mostly rural, sparsely populated, and located in traditionally Republican states.” How could this be?
The reason is that these counties have smaller populations, and so they will tend to have more extreme results (whether higher or lower rates). The less populated counties will tend to have cancer rates at the extremes, relative to the national average. And this is a purely statistical fact; it has nothing to do with features of those environments causing the cancer rate to be higher or lower.
The first take home lesson here is that smaller groups will tend towards the extremes in terms of their possession of some feature, relative to larger groups. We can call this the law of small numbers. The second take home message is that our brains are wired to look for causal explanations rather than mathematical explanations, and because of this we are prone to ignore the law of small numbers and look for a causal explanation of phenomena instead. The small numbers fallacy is our tendency to seek a causal explanation for some phenomenon when only the law of small numbers is needed to explain that phenomenon.
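The law of small numbers is also easy to see by simulation. In the sketch below every simulated county has exactly the same underlying cancer rate, yet the smallest counties still supply both the highest and the lowest observed rates, purely by chance (all figures invented):

```python
import random

true_rate = 0.001  # identical underlying rate for every county
# Fifty small counties and fifty large ones.
county_populations = [500] * 50 + [50_000] * 50

observed = []
for pop in county_populations:
    cases = sum(random.random() < true_rate for _ in range(pop))
    observed.append((cases / pop, pop))

observed.sort()
print("lowest rates:", observed[:3])    # dominated by small counties
print("highest rates:", observed[-3:])  # also dominated by small counties
```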
Licenses and Attributions
Key Sources:
- Van Cleave, Matthew (2016). Introduction to Logic and Critical Thinking, pp. 169-170, 178. Licensed under CC BY-SA 4.0.
- Watson, Jeffrey (2019). Introduction to Logic. Licensed under CC BY-SA.
- Dean, Susan; Illowsky, Barbara; et al. (2015). Introductory Statistics. OpenStax CNX, May 13, 2015.
- Watson, Jeff (2001). How to Determine a Sample Size: Tipsheet #60. University Park, PA: Penn State Cooperative Extension. Licensed under CC BY-NC-SA 4.0.