Birthdays and Pigeons

Continuing with the trend of discussing statistics, I figured I would repost an old entry from my blog about the Birthday problem. This isn’t so much a skeptical topic as a way to illustrate how statistics can provide counter-intuitive results to simple problems.

The birthday problem is a statistical problem: it’s not pseudo-science, but it is counter-intuitive at first glance. The question is as follows: if you have a group of randomly selected people whose birthdays are independent and equally likely to fall on any of the 365 days of the year, what is the probability that at least two of them share a birthday? If you are unfamiliar with this problem (or with statistics), you might guess that you would need close to 366 people before a shared birthday becomes likely, since that is how many it takes to guarantee one. As it turns out, you only need 23 people in order to have a better than 50% probability of two people sharing a birthday. Once you have a group of 57 people (or more), the probability that two will share a birthday is over 99%. The following is the graph of probabilities from Wikipedia:

This happens because of the mechanics of random clustering. The easiest way to think about it is by picturing a group of mailboxes. If you start slotting envelopes into the mailboxes at random, then with each successive envelope it becomes more and more likely that a mailbox that already has an envelope will be given another one. Keep in mind that we’re not talking about a specific mailbox, but rather we’re looking for two (or more) envelopes in any mailbox. So each envelope that lands in an empty mailbox increases the number of occupied mailboxes, and subsequently increases the odds that the next envelope will land in a mailbox that already has one.
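The mailbox picture can be tried out with a short simulation. This is only a rough sketch: the function names, the number of trials, and the fixed seed are my own choices for illustration, and it assumes 365 equally likely mailboxes.

```python
import random

def has_collision(envelopes, mailboxes=365, rng=random):
    """Drop envelopes into random mailboxes; True if any mailbox gets two."""
    occupied = set()
    for _ in range(envelopes):
        box = rng.randrange(mailboxes)  # each mailbox is equally likely
        if box in occupied:
            return True  # this mailbox already held an envelope
        occupied.add(box)
    return False

def estimate(envelopes, trials=20_000, seed=0):
    """Fraction of trials in which at least one mailbox received two envelopes."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = sum(has_collision(envelopes, rng=rng) for _ in range(trials))
    return hits / trials

for n in (10, 23, 57):
    print(n, estimate(n))
```

With 23 envelopes the estimate should land near one half, and with 57 it should be very close to 1, which matches the numbers quoted for birthdays.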

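Going back to the birthday problem, the pigeonhole principle tells us that a shared birthday is only guaranteed once there are more people than days in the year. With 365 or fewer people, we can instead calculate the probability that all of the birthdays are different by multiplying together the probability that each successive person’s birthday is unique. As an example, let’s say that we had four people in a room. The probability that those four people all had different birthdays would be calculated as follows: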

p(4) = 1 x (1 – [1/365]) x (1 – [2/365]) x (1 – [3/365])

because for each successive person, when we calculate the probability that their birthday is unique, there is one fewer day that their birthday can occupy. This could also be calculated using the formula:

p(4) = (365 x 364 x 363 x 362) / (365 x 365 x 365 x 365)
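As a quick check that the two ways of writing p(4) agree, here is a small sketch. The function names are made up for this example, and it assumes a 365-day year with equally likely birthdays.

```python
from math import prod

DAYS = 365  # assumption: no leap years, every day equally likely

def p_all_different_product(n, days=DAYS):
    """Multiply the terms (1 - k/days) for k = 0 .. n-1."""
    return prod(1 - k / days for k in range(n))

def p_all_different_ratio(n, days=DAYS):
    """(days x (days-1) x ... x (days-n+1)) / days^n."""
    numerator = prod(range(days - n + 1, days + 1))
    return numerator / days ** n

print(p_all_different_product(4))
print(p_all_different_ratio(4))
```

Both lines print roughly 0.9836, so four people have about a 98.4% chance of all having different birthdays.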

Then, to find the probability that at least two people share a birthday, we subtract the probability that all of the birthdays are different from 1. Of course, you can never be 100% sure that you will have two people who share a birthday until you have 366 people (ignoring leap years), which is the pigeonhole principle at work, but chances are pretty good (97%) that you’ll have two people who share a birthday by the time you have 50 people, and that percentage will have risen to 99% by the time you have 57 people.
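The group sizes quoted in this post can be reproduced with a few lines of code. Again, this is a sketch under the same assumptions (365 equally likely days), and the helper names are hypothetical.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_different = 1.0
    for k in range(n):
        p_different *= (days - k) / days  # person k+1 must avoid k taken days
    return 1 - p_different

def smallest_group(target, days=365):
    """Smallest group size whose shared-birthday probability reaches target (target <= 1)."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n

for n in (23, 50, 57, 366):
    print(n, round(p_shared(n), 4))

print(smallest_group(0.5), smallest_group(0.99))
```

The last line prints 23 and 57, and the entry for 366 people comes out as exactly 1.0 because the final factor in the product is zero.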

So the next time somebody asks you, or you wonder to yourself, what the chances are that two people share the same birthday — you now have an answer.
